
L20n looks hard (complexity vs. simplicity)

Most of the strings in most projects are simple. By simple, we mean short, common commands like "Open", "Close" or "Print".

But every project has a limited number of strings that, in order to sound right, have to be translated using the grammatical features of the given language.

What's more, those complex cases tend to be extremely important for the user experience. They are the strings that make or break the UI: things like data deletion, notifications about other people's actions, status updates, etc. In L20n we walk the line between keeping easy things easy and making complex things possible.

One of the features we offer is the ability to build special formulas that select a string among possible choices. An example of such a formula is a plural-form macro. Since plural rules are standardized, we want to provide default global macros, while letting localizers add more formulas for the grammatical features of their language. Those may also go into a "common" resource for the given language.
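The selection mechanism can be sketched in plain JavaScript. This is a simplified, English-style plural macro for illustration only; the real L20n runtime ships locale-specific macros as globals, and the names below (`plural`, `resolve`) are assumptions, not the actual API.

```javascript
// A toy plural macro: maps a number to a plural category.
// Real L20n macros follow each locale's own plural rules.
function plural(n) {
  return n === 1 ? "one" : "many";
}

// An entity with variants, indexed by the macro's result.
const unreadMessages = {
  one: () => "You have one unread message.",
  many: (n) => `You have ${n} unread messages.`,
};

// Resolve an entity by selecting the variant the macro picks.
function resolve(entity, n) {
  return entity[plural(n)](n);
}
```

The point is that the localizer, not the developer, decides which variants exist and which macro indexes them.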

While we don't expect localizers to have a technical background or programming skills, we recognize their passion for excellence and drive for quality. We see them spending days and weeks debating single strings to make them meaningful, understandable and natural.

When we remove these limitations, many localizers are able to dive deeper into the features of the framework they work in and use the limited algorithmic features that L20n offers to build truly natural-sounding messages. That, in turn, improves the UX of the product.

If the context data is global, how do I build multi-item UIs?

In L20n, we try to minimize the assumptions about how localizers build strings. One such limitation in most APIs is the concept of passing variables with particular calls, like:

 l10n.get('id1', {'key': 'value'});

where the second argument is an object with variables to be used within the string `id1`.

While L20n has supported this syntax from the very beginning, we believe it is more beneficial for the developer to expose the variables to the whole context and let the localizers decide which strings will depend on them. After all, the developer is not the best person to decide which string on the account screen will require the user's gender.

 document.l10n.ctxdata = {
   'unreadMessages': 5
 };

Per-call data passing may still be important for multi-item UIs (like building lists of strings with changing variables), so we decided to keep it, but we encourage developers to use context data to pass variables for localizers to use.
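The two styles can be contrasted with a toy resolver. Everything here is illustrative: `strings`, `getFromContext` and `get` are stand-ins for the real L20n machinery, assuming only that per-call data overrides the global context data.

```javascript
// A toy string table; real L20n strings live in localization resources.
const strings = {
  unreadNotice: (data) => `Unread messages: ${data.unreadMessages}`,
};

// Global context data: set once, visible to every string.
const ctxdata = { unreadMessages: 5 };

// Context-data style: no variables at the call site.
function getFromContext(id) {
  return strings[id](ctxdata);
}

// Per-call style: per-item overrides, useful for multi-item UIs.
function get(id, data) {
  return strings[id]({ ...ctxdata, ...data });
}

// Building a list where the variable changes per item.
const listItems = [1, 2, 3].map((n) => get("unreadNotice", { unreadMessages: n }));
```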

Why is L20n's API asynchronous?

One of the challenges of localization on the web is its modular and asynchronous nature. Like CSS, JS and other resources, localization resources may load fast or slow, and some of them may be unavailable.

The most natural reaction to errors in localization is to use a user-provided fallback chain to switch to the next locale: if the user prefers "es-MX" but it's not available, use "es".

That's simple in theory, but in practice synchronous APIs severely limit what can happen with localizations. L20n provides a full stack of asynchronous APIs, letting developers write clean code that allows for fallback resource loading, language changes on the fly and error recovery.
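The fallback step itself can be sketched as a small negotiation function. The name `negotiateLocale` and the last-resort default are assumptions for this sketch, not L20n API.

```javascript
// Pick the best available locale from a user-preferred fallback chain.
// Exact-match first, then the base language ("es-MX" -> "es").
function negotiateLocale(preferred, available) {
  for (const locale of preferred) {
    if (available.has(locale)) return locale;
    const base = locale.split("-")[0];
    if (available.has(base)) return base;
  }
  return "en-US"; // assumed last-resort default
}
```

In the asynchronous API, this negotiation can happen again at runtime when a resource fails to load, without blocking the rest of the page.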

How is L20n different from I18n?

I18n provides features like date formatting, number formatting, currency and sorting.

L20n, through its mechanism of globals, enables localizers to use I18n even in places where the developer does not use it properly.

That means developers can internationalize, say, a currency value and pass it to the localizer, but they can also just pass a number, and the localizer will choose the right internationalization format, returning a ready-to-use string with the proper currency or date within.
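Using the standard `Intl` API, the idea looks like this: the developer passes only a raw number, and the locale-aware formatting is applied where the string is built. The helper name `formatPrice` is illustrative, not part of L20n.

```javascript
// The developer supplies just a number; locale-aware currency
// formatting is applied via the standard Intl.NumberFormat API.
function formatPrice(amount, locale, currency) {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}
```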

Are you planning for L20n to become a standard?

L20n has been modeled with ECMA/WebAPI standardization in mind.

While designing the L20n APIs for HTML and JS, we worked with Mozilla's W3C, layout, security and DOM teams. We also introduced L20n to the L10n/I18n teams at Google, Facebook and Apple for feedback.

We're confident in the current API scheme, but we still believe that the path to standardization requires more public discussion, especially with stakeholders outside of Mozilla.

The next steps for us are to prove the API in the Firefox OS context as a client-side library, prototype HTML5 Gecko bindings using shadowRoot, and start the conversation with other projects interested in client-side HTML5 localization.

Why create a new file format?

We would have preferred to use an existing file format, and we tried to fit L20n into several of them. Here's what we learned from that experience.

L20n is conceptually different in that it puts a very limited programming or templating language in the hands of localizers.

If we start with the actual functionality and try to encode it in text, we face a few caveats. L20n is mostly data, with some limited programming concepts like referencing other values, string concatenation, value formatting, and a limited set of logical expressions. Mostly data, some programming. If we want to bolt onto existing file formats, we can choose between generic data formats and generic programming languages.

So what happens if we use a general-purpose programming language (GPL)? Once you serialize to a GPL, you're either stuck with supporting the full language, or you need to deal with people who try to use the full language. As L20n is language-agnostic, you'd either have to re-implement parts of your chosen language, or embed it. And even where you'd think you have one, you may not: Firefox on Android has JavaScript, but on startup you don't want to wait for Gecko to be up in order to talk to its JS engine. Also, just defining a data mapping in a GPL is generally awkward, in particular in a way that can be extended seamlessly to hook up code. This isn't a good fit for a mostly-data-some-logic scenario.

The next contestants are pure data formats: JSON, YAML, XML, you name it. Once you serialize to a data format, you face the same problem as with programming languages: you have to bridge the gap between data and programming logic. The data comes easily, but the programming logic is tricky. You can compromise by not serializing a full AST into data objects, for example by serializing expressions as strings. But then you need to create a parser for those expressions anyway, for tools that actually want to support L20n in full. On top of that, if you use a generic data format, you need to validate the input from the data-format parser against the constraints of your AST. Which is basically writing a parser: you parse data objects instead of text, but you're still parsing.
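The expressions-as-strings compromise can be made concrete. This is a hypothetical JSON encoding of an L20n-like entity, not the actual L20n serialization; the point is only what JSON can and cannot see.

```javascript
// A hypothetical JSON shape for an entity with a selector expression.
const entity = {
  id: "unreadMessages",
  index: "plural($n)", // to JSON, this is just an opaque string
  variants: {
    one: "You have one unread message.",
    many: "You have {{ $n }} unread messages.",
  },
};

// The data format round-trips fine...
const roundTripped = JSON.parse(JSON.stringify(entity));
// ...but the JSON parser returns the expression untouched; a tool
// that wants to understand it still needs its own expression parser.
```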

That's the experience we had when trying both categories of existing file formats.

The L20n file format came from the conclusion that if we want to provide a modern, powerful localization technology that cleans up the dirty-hack story of current localization technologies, we need to stop pretending that localization data structures are just key-value pairs with some funny minor additions. In L20n, the things that enable plural/gender forms, flexible string building, etc. are core features. As a result, they are separate data units. Different. And they need a format that can store them and keep them readable and writable.

Such a format has never existed before, in the same way that a cascading-style-sheets file format never existed before CSS. You could write CSS in XML. But it ain't gonna be pretty. :)
