October 10, 2024

Handling Newline Characters in Sitetran’s Page Doc Feature

David Peterman
Developer at SiteTran


How Browsers Handle Whitespace

When you're viewing text in a browser, it may seem like the formatting is exactly as you typed it in a text editor, but there’s something happening behind the scenes. Browsers treat whitespace—spaces, tabs, and newlines—differently than text editors do.

In most cases, browsers collapse runs of spaces, tabs, and newlines into a single space when rendering text. For example, if you have the following text with extra spaces:

"    Hello,     world!"

The browser will display it like this:

"Hello, world!"

This means that no matter how many extra spaces or newlines are present, the browser compresses them down to just one space between words. While this works well for presenting content neatly on web pages, it creates a challenge when you need to preserve the exact formatting after translating, such as when copying and pasting the translated text into a text editor.
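To make the collapsing rule concrete, here is a minimal Python sketch that roughly imitates what a browser does with whitespace under its default rendering. It is an approximation for illustration only: the function name collapse_like_browser is made up for this post, and real browsers apply more detailed rules than a single regular expression.

```python
import re

def collapse_like_browser(text: str) -> str:
    """Roughly imitate default browser whitespace handling (illustrative only)."""
    # Any run of spaces, tabs, or line breaks becomes a single space.
    collapsed = re.sub(r"[ \t\r\n\f]+", " ", text)
    # Whitespace at the very start and end of the text is not displayed.
    return collapsed.strip(" ")

print(collapse_like_browser("    Hello,     world!"))  # Hello, world!
```

A text editor, by contrast, would show every one of those spaces exactly as typed.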

Unlike browsers, text editors render every space, tab, and newline character. So, if whitespace is removed during the translation process and then copied into a text editor, the original formatting may be lost. This is especially frustrating when whitespace is important for readability, such as in code snippets or document structures.

The Issue: Lost Newlines in Translated Text

When phrases are imported into a translation management system such as SiteTran, it is common to strip the newlines. This does not affect how the text appears on a webpage, because browsers don’t render newlines. Problems arise, however, when users copy the translated content from Sitetran and paste it into a text editor, where newlines are visible and critical for formatting. Without those newlines, the content might look different, creating a frustrating experience for users who expect the original formatting to be preserved.

For example, imagine this scenario:

  • Original text:
    "\n    Hello, world\n    "
  • In Sitetran, this text would appear as:
    " Hello, world "
  • After translation, the spaces might be removed, resulting in:
    "Hola Mundo"

If you copy the translated text "Hola Mundo" and paste it into a text editor, the original newlines (\n) will not be preserved, leading to a loss of formatting.
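The import step in the scenario above can be sketched in a few lines of Python. This is an assumed stand-in, not Sitetran's actual import code: import_phrase is a hypothetical name, and the rule it applies (every run of whitespace becomes one space) is simply the smallest rule that reproduces the example.

```python
import re

def import_phrase(raw: str) -> str:
    """Hypothetical import step: collapse each whitespace run to one space."""
    return re.sub(r"\s+", " ", raw)

original = "\n    Hello, world\n    "
imported = import_phrase(original)

print(repr(imported))    # ' Hello, world '
print("\n" in imported)  # False: the newlines are gone after import
```

Note that the imported phrase still carries a single space at each end. Those edge spaces are the only remaining trace of the whitespace that surrounded the original phrase.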

How the Page Doc Feature Works


If you're using the Sitetran Page Doc feature and copying content from a text editor, you may notice that the text contains various whitespace characters, including newlines. These newlines, while invisible in a browser, play an important role when copying content into a text editor. However, Sitetran’s process for importing and translating phrases strips these newlines. This doesn't affect how the text appears visually in a browser, because browsers collapse all whitespace characters into a single space. But when you copy the translated text out of Sitetran and paste it into a text editor, the absence of these newlines can lead to formatting differences.

In this blog post, we’ll explain how Sitetran’s Page Doc feature works, the challenges surrounding whitespace and newlines, and how we developed a solution to preserve newline characters when necessary.


The Sitetran Page Doc feature is designed for users who want to translate documents or content that isn’t directly on a website. You can copy and paste your content into the Page Doc editor, which will maintain the styling and formatting of your original content—even after it's translated.

You can find the Page Doc feature in the ‘Pages & Phrases’ section by clicking the 'Add New Page' button. This allows you to create a translatable document in Sitetran. You can use the editor to manipulate the document visually or use the ‘Edit Raw HTML’ option to fine-tune the HTML directly.

One key feature of Page Doc is that it uses contentEditable, which preserves the formatting of any content pasted into it, such as font styles, alignment, and other aspects, including newlines. However, when phrases are imported for translation, Sitetran strips newline characters from the text.


Our Solution: Preserving Newlines Where Appropriate

To address this issue, we introduced a feature that ensures that newlines at the start and end of a phrase are preserved if they existed in the original text. However, this feature is designed with flexibility in mind, and it's only applied when it makes sense.

Here’s how it works:

  1. Conditionally Insert Newlines: When translating a phrase, Sitetran now checks whether the original phrase had leading or trailing newlines. If the translated phrase still has spaces at the start or end, we reinsert the corresponding newlines from the original text.
  2. Respecting the Translator’s Input: If the translator decides to remove the spaces from the beginning or end of the translated phrase, the newlines will not be preserved. This is because we prioritize the translator’s decisions over the original formatting in these cases.

This allows us to maintain formatting in cases where it matters, but also respect changes made during the translation process.
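The two rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Sitetran's implementation: the function names are hypothetical, and "reinserting the corresponding newlines" is interpreted here as restoring the original phrase's whole leading or trailing whitespace run whenever that run contained a newline and the translation still has whitespace on that side.

```python
def split_edges(text: str) -> tuple[str, str, str]:
    """Split text into (leading whitespace, core, trailing whitespace)."""
    core = text.strip()
    start = len(text) - len(text.lstrip())
    return text[:start], core, text[start + len(core):]

def restore_edge_newlines(original: str, translated: str) -> str:
    """Hypothetical sketch of the conditional newline reinsertion."""
    orig_lead, _, orig_trail = split_edges(original)
    lead, core, trail = split_edges(translated)
    # Rule 1: restore only if the original edge had a newline AND the
    # translation still has whitespace on that side.
    if lead and "\n" in orig_lead:
        lead = orig_lead
    if trail and "\n" in orig_trail:
        trail = orig_trail
    # Rule 2: if the translator removed the edge spaces, lead or trail is
    # empty here and nothing is reinserted.
    return lead + core + trail

source = "\n    Hello, world\n    "
print(repr(restore_edge_newlines(source, " Hola Mundo ")))  # '\n    Hola Mundo\n    '
print(repr(restore_edge_newlines(source, "Hola Mundo")))    # 'Hola Mundo'
```

Each side is checked independently, so a translation that keeps its leading space but drops its trailing one would get only the leading newline back.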

When Formatting May Still Be Lost

One important thing to note is that formatting will not be preserved in all cases. For example, if the original text contains leading and trailing spaces, but the translator removes those spaces, the newlines will also be lost. This is because we consider the removal of spaces to be intentional, and we don't add newlines if there are no spaces for them to wrap around.

Here’s an example:

  • Original text: "\n Hello world \n"
  • Imported text: " Hello world "
  • Translated text: "Hola Mundo"

In this case, since the translator removed the leading and trailing spaces, the newlines will also be removed when you copy the translated content out of Sitetran and paste it into a text editor.

Conclusion

In summary, Sitetran’s feature to preserve newline characters solves a formatting issue that occurs when translating content with leading or trailing whitespace. By reinserting newlines only when the translated text still contains spaces at its edges, we ensure that formatting is preserved where possible while respecting the translator’s decisions. This strikes a balance between maintaining the integrity of the translation and preserving the formatting for users copying content into a text editor.

If you're using Sitetran’s Page Doc feature and care about retaining formatting when copying translated content into text editors, enabling sitetran.preserve_newline_characters can help maintain that consistency.
