
One thing I've always struggled with in i18n is having to split messages into often less coherent chunks in order to add things like links, tooltips, or styling elements in the middle of the text. That makes it harder to localize a message as a holistic piece, independent of the source language.

For a slightly contrived example to demonstrate this, let's say you have a string like this:

"Please click here 7 times to confirm"

Where you want to make the "click here 7 times" look like a link by wrapping it in an <a> tag, or just styled differently using a styled <span>.

Using something like react-intl, which is what I've used in the past, you'd have to do something like this:

  <FormattedMessage
    id="confirm"
    defaultMessage={`Please { confirmLink } to confirm`}
    values={{
      confirmLink: 
        <a>
          <FormattedMessage 
            id="confirm-link"
            defaultMessage={`click here {clickCount, number} {clickCount, plural,
              one {time}
              other {times}
            }`}
            values={{ clickCount: 7 }}
          />
        </a>
    }}
  />

If some language then happens to require a completely different sentence structure, one where the "to confirm" part needs to be interleaved somewhere in the middle of the "click here 7 times" message to sound fluent, this approach cannot accommodate it.

I'm wondering how people generally deal with this, in React and elsewhere.



This is a great point and something that we've seen come up very often in building UIs. The good practice which we recommend to developers at Mozilla is to avoid splitting or nesting messages, because it makes it harder for translators to see the entire translation at once.

We've taken a layered approach to designing Fluent: what we're announcing today is the 1.0 of the syntax and file format specification. The implementations are still maturing towards 1.0 quality, but let me quickly describe what our current thinking is.

For JavaScript, we're working on a low-level library which implements a parser of Fluent files and offers an agnostic API for formatting translations. On top of it we hope to see an ecosystem of glue-code libraries, or bindings, each satisfying the needs of a different use-case or framework.

I've been working on one such binding library called fluent-react. It's still in its 0.x days, but it's already used in a number of Mozilla projects (e.g. in Firefox DevTools). In fluent-react translations can contain limited markup. During rendering, the markup is sanitized and then matched against props defined by the developer in the source code, in a way that overlays the translation onto the source's structure. Hence, this feature is called Overlays. See https://github.com/projectfluent/fluent.js/wiki/React-Overla....
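To make the overlay idea concrete, here is a minimal sketch in plain JavaScript. It is not fluent-react's actual implementation: the `overlay` function, the `allowed` map, and the shape of the returned parts are all hypothetical. It only illustrates the principle that a translation may name an element, while the attributes come exclusively from the developer.

```javascript
// Hypothetical sketch, NOT fluent-react's code: match simple <tag>...</tag>
// markup in a translation against elements the developer has allowed.
function overlay(translation, allowed) {
  const parts = [];
  const tagPattern = /<([a-z]+)>(.*?)<\/\1>/gs;
  let last = 0;
  for (const match of translation.matchAll(tagPattern)) {
    const [whole, name, text] = match;
    if (match.index > last) {
      parts.push(translation.slice(last, match.index));
    }
    if (Object.prototype.hasOwnProperty.call(allowed, name)) {
      // Attributes are taken from the developer's map, never from the translation.
      parts.push({ tag: name, props: allowed[name], text });
    } else {
      // Elements the developer did not allow are reduced to their text content.
      parts.push(text);
    }
    last = match.index + whole.length;
  }
  if (last < translation.length) {
    parts.push(translation.slice(last));
  }
  return parts;
}

const parts = overlay("Please <a>click here</a> to confirm.", {
  a: { href: "/confirm" },
});
```

A real binding would turn the object parts into framework elements and would need to handle nesting and attributes written by translators, both of which this sketch ignores.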

Here's how you could re-implement your example using fluent-react. Note that the <a>'s href is only defined in the prop to the Localized component.

    <Localized
        id="confirm"
        $clickCount={7}
        a={<a href="..."></a>}
    >
        {"Please <a>click here {$clickCount ->
            [one] 1 time
           *[other] {$clickCount} times
        }</a> to confirm."}
    </Localized>

I'd love to get more feedback on ideas in fluent-react. Please feel free to reach out if you have more questions!


I've seen a few libraries that use a similar approach of parsing strings for pseudo-elements and then matching them with React elements to avoid splitting up messages, but I've always felt a lot of resistance towards adopting something like that, because it means incurring the runtime cost of parsing a string for elements when you can easily have hundreds or thousands of messages being rendered at once. (Call it a premature optimization if you must, but I've been bitten enough times in the past by adopting libraries or approaches that scaled poorly performance-wise, and had to pay the cost in untimely, painful refactors.)

I feel there's a fundamental impedance mismatch here because we're defining messages as strings but the rest of our UI as React components. I described here a potentially different component-oriented approach as an attempt to get rid of this impedance mismatch: https://news.ycombinator.com/item?id=19681129

I'd love to hear some thoughts on that approach from folks with more real-world experience working with i18n than I do (which is not a whole lot sadly, given the nature of the kinds of projects I've worked on in the past).


I’ve been trying to solve this problem in the resource4j library for Java, which can cache rendered strings, but in the end it’s always a memory vs. performance trade-off. I didn’t publish any artificial benchmarks, but in a couple of real projects (resource4j+thymeleaf) the performance impact was usually negligible.


> `[one] 1 time *[other] {$clickCount} times`

How does this work for languages that have more complex pluralization rules?

E.g. in Russian it's "1 раз", "2 раза", "11 раз", "12 раз", "22 раза" and "55 раз" - the case depends on the number ending, with exceptions for 11, 12, 13 and 14.


That's a great question!

Fluent relies on Unicode Plural Rules [0] which allow us to handle all (as far as Unicode knows) pluralization rules for cardinal and ordinal (and range) categories :)

[0] http://cldr.unicode.org/index/cldr-spec/plural-rules


It's up to the localizer to define variants corresponding to the language's plural categories. For Russian, that's (one, few, many). Interestingly, this particular example could be simplified to (few, *), because "раз" is good for 1, 5, 11, 55, etc. See https://projectfluent.org/play/?id=7d22f87c04b23b86d9f9149d5... for an example of this in action.
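The same CLDR plural categories are exposed in JavaScript through the standard `Intl.PluralRules` API, so the (few, *) simplification can be sketched outside Fluent. This is only an illustration, assuming a runtime with full ICU data (as in recent Node.js releases); the `variants` table and `formatTimes` helper are made up for the example.

```javascript
// Illustration only: pick a Russian variant by CLDR plural category.
const rules = new Intl.PluralRules("ru");

// Only "few" needs its own form here; everything else falls back to "other".
const variants = { few: "раза", other: "раз" };

function formatTimes(count) {
  const category = rules.select(count); // "one", "few", "many" or "other"
  const word = Object.prototype.hasOwnProperty.call(variants, category)
    ? variants[category]
    : variants.other;
  return `${count} ${word}`;
}

const samples = [1, 2, 11, 12, 22, 55].map(formatTimes);
// ["1 раз", "2 раза", "11 раз", "12 раз", "22 раза", "55 раз"]
```

The exceptions for 11 through 14 come from the CLDR rules themselves, so the message author never has to encode them.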

Authoring tools can help here, too. Pontoon, Mozilla's translation management system, pre-populates plural variants based on the number of plural categories defined in Unicode's CLDR.



Would this also allow for translating e.g. the <a>'s `title` attribute, or e.g. an `aria-label`?


It's something that I definitely plan to add. There's even an open issue about it! https://github.com/projectfluent/fluent.js/issues/185


Great! It sounds like a very cool project to work on :)


What's stopping you from doing that now? Just add a new translation for those labels in the source document and bind it to the attribute tag.


You should take a look at js-lingui. Child components are automatically converted to symbols by way of a babel-macro (no run time parsing of complex translations).


Why fluent-react and not (in addition to) fluent-web-components? :(


One advantage I can imagine is that you can prerender the React components, outputting e.g. plain HTML. With tools like e.g. React Static, that means you can somewhat ergonomically generate different static websites for different languages, avoiding the runtime costs of looking up the correct strings.


Using the Svelte JS library (https://svelte.technology) you can have both: server-side rendered components [1] and compile to web components (custom elements) [2]

Another advantage is that the components compile to vanilla JavaScript, so we don't rely on a runtime library to run the application.

[1] https://svelte.technology/guide#server-side-rendering

[2] https://svelte.technology/guide#custom-elements


More and more, I'm starting to think maybe plain text isn't always the best abstraction to be using for defining messages for i18n. If messages were to be defined in terms of whatever primitive you're using to build your UI (i.e. React components if you're using React, and raw html template nodes if you're working with plain html), then all of this impedance mismatch might disappear.

In the React case, a component oriented approach to i18n could maybe look something like this:

  const DefaultMessage = ({ clickCount }) => 
    <span>
      Please <a>click here {clickCount} 
      {pluralize(clickCount, {one: "time", other: "times"})}
      </a> to confirm
    </span>

  // some theoretical language that requires putting "to confirm" between 
  // "click here" and "7 times", using English for clarity
  const MessageInSomeOtherLanguage = ({ clickCount }) => 
    <span>
      Please <a>click here to confirm {clickCount}
      {pluralize(clickCount, {one: "time", other: "times"})}
      </a> 
    </span>

  // some theoretical component that renders different components 
  // based on the language and passes through props
  <FormattedComponent
    id="confirm"
    defaultMessage={DefaultMessage}
    props={{ clickCount: 7 }}
  />

This feels a lot more elegant and flexible to me, though it would make it more difficult for non-technical folks to contribute to translations, which might not be much of a concern if your company has the resources to support localization teams in-house. Am I overlooking any other obvious downsides to this approach? Does anyone know of any libraries that offer a similar API, or have experience using a similar approach?


The problem here is tooling and workflows. Often you'll be using a SaaS product to manage translations, like Lokalise or similar products. At the end of the day, these just give you a whole bunch of strings to put in your app.

I've found this really hard. At my last place we just ate the cost (and ugliness) of including HTML in these strings and dangerously inserting them into the page.


That's always a problem, even with templates. Angular does sanitization, which is of course not ideal from a runtime performance point of view, but they probably weighed the costs and benefits and found it to be an okay trade-off.

Though if the i18n string files are present at build time, then this sanitization step could be done there.
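A build-time check along those lines could look roughly like the sketch below. It assumes the translations are available as a flat object of strings; `findDisallowedTags` and the allowlist are hypothetical names, and the check only looks at tag names, so an allowed tag carrying a dangerous attribute would still pass and would need separate handling.

```javascript
// Hypothetical build-time check: report tags in translation strings that are
// not on an allowlist. Checks tag names only, not attributes.
function findDisallowedTags(messages, allowedTags) {
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  const problems = [];
  for (const [key, text] of Object.entries(messages)) {
    const seen = new Set();
    for (const match of text.matchAll(tagPattern)) {
      const tag = match[1].toLowerCase();
      if (!allowedTags.has(tag) && !seen.has(tag)) {
        seen.add(tag);
        problems.push({ key, tag });
      }
    }
  }
  return problems;
}

const problems = findDisallowedTags(
  {
    ok: "Click <b>here</b> to continue",
    bad: "<script>alert(1)</script>",
  },
  new Set(["a", "b"])
);
// A build script could fail when problems.length > 0.
```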


Definitely a problem. I think what will really make Fluent take off is if someone can provide good translation tools and workflow. The syntax looks great, but for most projects, I suspect it's impractical to expect translators to write Fluent manually.


Our goal was to design a simple DSL which is easy to read and make small edits to. Copying and pasting is a powerful learning method :)

We're also working on creating richer and more streamlined authoring experience in Pontoon, Mozilla's translation management system. You can read about the current state of Fluent support in Pontoon in my colleague's post at https://blog.mozilla.org/l10n/2019/04/11/implementing-fluent....


Cool! That looks like a great start :) I don't really do any l10n work myself, so this was really just a bystander's perspective.


It will be way easier to maintain if the L10n resources aren't mixed with the markup or code. Check this for example: https://github.com/resource4j/resource4j


This is an interesting approach, but it's not tooling independent. If you can rely on your translations only seeing use in React components, then it may be just what you need.


That's a great point. Although maybe one way to make the approach more generic is to treat them as functions that happen to return React Nodes as opposed to stateless functional components.

Then you could write an alternative set of functions that returns say Vue components or raw html templates.

It still doesn't make individual translations _always_ reusable across paradigms, but I'm not so sure the translation portability of that de-facto approach is worth the impedance mismatch that comes with working with raw strings in modern UI frameworks. And at the end of the day, the only translation functions you'd have to duplicate are the ones that return more than raw strings, so it's not the end of the world.


From an accessibility point of view, it's also recommended to avoid links that only span over such half-sentences.

Screen reader users will often navigate your page by cycling through the links that are on the page and then they'll get only the link-text read out, not the surrounding text.


This is how I handle those cases in vue:

  clickHere: {
    one: 'Please',
    two: {
      singular: ' click here {0} time ',
      plural: ' click here {0} times '
    },
    three: 'to continue',
    link: 'google.com/en/'
  }

  <span>
    {{ this.$i18n('clickHere.one') }}
    <a :href="this.$i18n('clickHere.link')">{{ this.count > 1 ? this.$i18n('clickHere.two.plural', this.count) : this.$i18n('clickHere.two.singular', this.count) }}</a>
    {{ this.$i18n('clickHere.three') }}
  </span>

Don't forget to localize your links! English might not need it but many languages eventually will point to a different url.


I'm currently fighting (again) the same kind of battles with react-intl. I'd also be keen to hear others' ideas to make this easier / better.

Another example is when you need to inject an image as a text decoration that includes text, or only makes sense in a particular part of the sentence.

One workaround I've considered is to add the decoration directly to the font you're using, so you can literally translate the decoration as text, but that doesn't usually feel like a reasonable solution, and still might not solve the problem for all languages.


GNU gettext is a well working framework for i18n.

For the sentence ordering we include all variables in the translations, but split translations on styles. We then give the translator the text in order of html appearance in source code for context. Translators can then rearrange everything but the variable across the string and also leave stuff blank when necessary. It's not perfect, but works in most cases.

And in the end, developers and designers must consider the translatability of the UI. It's always possible to create an untranslatable UI.


Hi! We worked with and evaluated Gettext when we started Fluent.

Our opinion is similar to Unicode's - Gettext is a fundamentally flawed design for internationalization purposes.

Here you can find a more detailed explanation of our position - https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...

Please, don't take it as a criticism of using it. We just don't think it scales and we don't think it's possible to produce high quality, sophisticated multilingual UIs with it, but if it works for you, don't touch it :)


You make a lot of good points in the linked article, but you lose some credibility right from the start:

> Secondly, it makes it impossible to introduce multiple messages with the same source string which should be translated differently.

This is false. The gettext message format uses msgctxt to deal with this. It's a fundamental part of the format. The unique identifier is the combination of msgctxt and the singular string. I wonder how you could miss that? We actually use an automatically generated msgctxt for some part of our app to avoid accidentally translating the same source text incorrectly in different context.
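As a rough illustration of that combined identifier: GNU gettext's compiled catalogs join the context and the source string with an EOT character (U+0004). The lookup below is a toy in-memory sketch, not a real gettext binding; the `catalog` contents and the `pgettext` helper (named after the C macro) are assumptions made for the example.

```javascript
// Toy sketch of context-aware lookup; not a real gettext binding.
const CONTEXT_GLUE = "\u0004"; // separator GNU gettext uses between msgctxt and msgid

function catalogKey(msgid, msgctxt) {
  return msgctxt === undefined ? msgid : `${msgctxt}${CONTEXT_GLUE}${msgid}`;
}

// The same English source string, translated differently per context (German).
const catalog = new Map([
  [catalogKey("Open", "menu action"), "Öffnen"],
  [catalogKey("Open", "door status"), "Offen"],
]);

function pgettext(msgctxt, msgid) {
  const key = catalogKey(msgid, msgctxt);
  // Fall back to the source string when no translation exists.
  return catalog.has(key) ? catalog.get(key) : msgid;
}

const menuLabel = pgettext("menu action", "Open");   // "Öffnen"
const doorLabel = pgettext("door status", "Open");   // "Offen"
```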

Also I couldn't quite follow the point about interpolation of fluent vs gettext (probably because I don't know fluent). Message interpolation in gettext works and can be absolutely readable. E.g. "You have {count} items". The big drawback is that you can't move this variable across strings. Can you do that with fluent?


> but you lose some credibility right from the start

Thank you for the feedback! I updated the article to include the mention about `msgctxt`.

In my experience, many project environments end up with only partial support for this feature (for example, many react/angular extractors don't support it), which leads to limited use and requires the localizer to ask the developer to add a context.

I did not include that since it's just my personal experience and I assume more mature projects tend to recognize the feature and use it, hopefully, extensively :)

> Message interpolation in gettext works and can be absolutely readable. E.g. "You have {count} items".

As far as I understand, this is not part of the system (gettext) but of its bindings, and as a result it is underspecified and differs between implementations. For example, [0] uses `%{ count }` while [1] uses `{{ count }}`. If I'm mistaken here, please point me to the spec :)

Since it is a higher-level replacement, this approach likely suffers from multiple limitations. First of all, I highly doubt that there is any BiDi isolation between interpolated arguments and the string, leading to a common bug when RTL text (say, Arabic) contains an LTR variable (say, a Latin-based name of a person). Fluent resolves it by wrapping all interpolated placeables in BiDi isolation marks.
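The isolation idea can be sketched with the Unicode characters FSI (U+2068, First Strong Isolate) and PDI (U+2069, Pop Directional Isolate). The tiny `formatMessage` interpolator below is hypothetical and much simpler than a real resolver; it only shows where the marks go.

```javascript
// Hypothetical interpolator: wrap every substituted value in bidi isolation marks.
const FSI = "\u2068"; // First Strong Isolate
const PDI = "\u2069"; // Pop Directional Isolate

function isolate(value) {
  return `${FSI}${value}${PDI}`;
}

function formatMessage(template, args) {
  // Replace { name } placeholders; leave unknown placeholders untouched.
  return template.replace(/\{\s*(\w+)\s*\}/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(args, name)
      ? isolate(String(args[name]))
      : whole
  );
}

const greeting = formatMessage("مرحبا { name }", { name: "John" });
// The Latin name is now isolated from the surrounding RTL text.
```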

Secondly, I must assume that any internationalization, such as number formatting, date formatting, etc., is also not done from within the resolver in gettext. That, in turn, means that it may be tricky to verify that a number is formatted using Eastern Arabic numerals when used in an Arabic translation, while formatted using Western Arabic numerals when used in an English translation. Fluent formats all placeables using Unicode-backed intl formatters (for example, in JS we use ECMA-402), allowing for consistency and high quality translations where placeables get formatted together with the message.
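For a sense of what locale-aware formatting of a placeable means in practice, here is a small sketch using the ECMA-402 `Intl.NumberFormat` API directly. It is not Fluent's resolver; the `formatCount` helper is made up, and the `-u-nu-arab` extension is used to request Arabic-Indic digits explicitly rather than relying on a locale's default numbering system.

```javascript
// Sketch: format the same number for different locales with ECMA-402.
function formatCount(count, locale) {
  return new Intl.NumberFormat(locale).format(count);
}

const english = formatCount(5, "en");          // Western Arabic digit
const arabic = formatCount(5, "ar-u-nu-arab"); // Eastern Arabic (Arabic-Indic) digit

// The formatted number is then placed into the translated message.
const arabicMessage = `لديك ${arabic} عناصر`;
```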

For example, in your example, will the `You have { count } items` be translated to `لديك 5 عناصر` or `لديك ٥ عناصر`? And what will happen if instead of `count`, you'd have `name: "John"`? Will it be RTL or LTR?

[0] https://hexdocs.pm/gettext/Gettext.html#content [1] https://angular-gettext.rocketeer.be/dev-guide/api/angular-g...


Yes, I agree. It only solves a subset of the problems. Formatting and RTL/LTR is difficult to solve with gettext.


> Our opinion is similar to Unicode's - Gettext is a fundamentally flawed design for internationalization purposes.

Did the Unicode Consortium express any criticism of gettext? Could you provide a reference for this?


I don't know if there's any public statement about this. I base my position on experience at Unicode Conference and work on CLDR and ICU. I understand that it diminishes the value of my claim.

I can also point to ICU MessageFormat, which was designed much later than Gettext and, I'd dare say on purpose, bears no resemblance to it.


I agree that CLDR plural forms and ICU MessageFormat are somehow an implicit critique of gettext's design :-)


Hahaha, thank you! I still feel ashamed of making a strong claim based on informal conversations, but I feel a bit vindicated by your agreement! :)



