(Lots more on language tags) |
(talk a little about BCP47 in the concepts introduction, talk more about language tags) |
| Line 9: | Line 9: | ||
== Key concepts == | == Key concepts == | ||
...talk about | Most of the concepts used by the Internationalization API are defined in [http://tools.ietf.org/html/bcp47 BCP 47], a living aggregation of RFCs (currently RFC 5646 and RFC 4647; the set may change over time as its RFCs are obsoleted and replaced) that specifies language tags and how they are matched. Full details on these concepts should generally be looked up there: ECMA-402 defines most underlying concepts only by reference. | ||
...talk about collators, date formats, and how all the stuff is implemented using what ICU primitives...copiously link to BCP47... | |||
=== Language tags === | === Language tags === | ||
Every operation is performed in terms of locales, specified as [http://tools.ietf.org/html/bcp47#section-2.1 language tags]: <code>en-US</code>, <code>nan-Hant-TW</code>, <code>und</code>, and so on. The components of a language tag are the language and optionally a script, region, and variations that might exist within these | Every operation is performed in terms of locales, specified as [http://tools.ietf.org/html/bcp47#section-2.1 language tags]: <code>en-US</code>, <code>nan-Hant-TW</code>, <code>und</code>, and so on. The main components of a language tag are the language and, optionally, a script, a region, and variants (variations that might exist within these). An optional extension component may follow, permitting inclusion of extra structured data (usually to contextualize a use of the language tag). Finally, an optional private-use component may include implementation-defined data. All components consist of ASCII alphanumeric characters and are case-insensitive. The components are joined into a language tag using hyphens; individual components can be distinguished by their length and internal syntax (a prefix, for example). The precise details of language tag structure are quite complex, and they include a list of irregular forms kept for legacy compatibility. See [http://tools.ietf.org/html/bcp47#section-2.1 BCP 47] for all the gory details. | ||
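As a rough illustration of how components can be told apart by length and syntax, here is a minimal sketch. The function name <code>splitLanguageTag</code> is hypothetical and is not part of SpiderMonkey or ECMA-402. It assumes a well-formed tag and deliberately ignores extended-language subtags, tags consisting only of a private-use component, and the irregular legacy forms.

```javascript
// Simplified sketch: classify the subtags of a well-formed language tag.
// Assumptions: no extlang subtags, no irregular/legacy tags, and the tag
// does not consist solely of a private-use component.
function splitLanguageTag(tag) {
  // Tags are case-insensitive, so normalize to lowercase before comparing.
  const subtags = tag.toLowerCase().split("-");
  const result = {
    language: subtags.shift(),
    script: null,
    region: null,
    variants: [],
    extensions: [],
    privateUse: null,
  };

  // Script: exactly four letters.
  if (subtags.length && /^[a-z]{4}$/.test(subtags[0])) {
    result.script = subtags.shift();
  }
  // Region: two letters or three digits.
  if (subtags.length && /^([a-z]{2}|[0-9]{3})$/.test(subtags[0])) {
    result.region = subtags.shift();
  }
  // Variants: 5-8 alphanumerics, or a digit followed by three alphanumerics.
  while (subtags.length && /^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/.test(subtags[0])) {
    result.variants.push(subtags.shift());
  }
  // Extensions: a one-character prefix other than "x", then longer subtags.
  while (subtags.length && /^[0-9a-wy-z]$/.test(subtags[0])) {
    const extension = [subtags.shift()];
    while (subtags.length && subtags[0].length >= 2) {
      extension.push(subtags.shift());
    }
    result.extensions.push(extension.join("-"));
  }
  // Private use: everything from the "x" prefix to the end of the tag.
  if (subtags.length && subtags[0] === "x") {
    result.privateUse = subtags.join("-");
  }
  return result;
}
```

For example, <code>splitLanguageTag("nan-Hant-TW")</code> yields the language <code>nan</code>, the script <code>hant</code>, and the region <code>tw</code> (lowercased by this sketch), with no variants, extensions, or private-use data.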
One particular subcomponent worth noting is the ''Unicode extension component'', which lives within the extension component. It has the basic form <code>"-u(-[a-z0-9]{2,8})+"</code>, with precise details in [https://tools.ietf.org/html/rfc6067 RFC 6067]. The Unicode extension component permits specifying additional details such as sort order, numbering system, and calendar system. | |||
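To make the basic form concrete, the following sketch splits a Unicode extension component into attributes and keywords. The helper <code>parseUnicodeExtension</code> is hypothetical. It relies on the convention that a two-character subtag is a key and that longer subtags following a key form its value; RFC 6067 and its references remain the authority on the details.

```javascript
// Sketch: split a Unicode extension component such as "-u-co-phonebk-nu-latn"
// into attributes and keywords. parseUnicodeExtension is a hypothetical name.
function parseUnicodeExtension(extension) {
  // Check the basic form quoted in the text above.
  if (!/^-u(-[a-z0-9]{2,8})+$/i.test(extension)) {
    throw new TypeError("not a Unicode extension component: " + extension);
  }
  // Drop the empty leading piece and the "u" prefix itself.
  const subtags = extension.toLowerCase().split("-").filter(Boolean).slice(1);

  const attributes = [];
  const keywords = {};
  let currentKey = null;
  for (const subtag of subtags) {
    if (subtag.length === 2) {
      // A two-character subtag starts a new keyword.
      currentKey = subtag;
      keywords[currentKey] = [];
    } else if (currentKey === null) {
      // Longer subtags before the first key are attributes.
      attributes.push(subtag);
    } else {
      // Longer subtags after a key belong to that key's value.
      keywords[currentKey].push(subtag);
    }
  }
  return { attributes, keywords };
}
```

With the input <code>"-u-co-phonebk-nu-latn"</code>, the sketch reports the key <code>co</code> (collation) with value <code>phonebk</code> and the key <code>nu</code> (numbering system) with value <code>latn</code>.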
SpiderMonkey mostly ignores the language, script, region, and variant components of a language tag. It will pass these components to ICU in language tags provided by the user, but it generally doesn't examine them, or do much of interest with them. The one exception is for ''old-style language tags''. '''XXX add details about the old-style mapping code in Intl.js, and why ICU doesn't perform that mapping itself''' | SpiderMonkey mostly ignores the language, script, region, and variant components of a language tag. It will pass these components to ICU in language tags provided by the user, but it generally doesn't examine them, or do much of interest with them. The one exception is for ''old-style language tags''. '''XXX add details about the old-style mapping code in Intl.js, and why ICU doesn't perform that mapping itself''' | ||
SpiderMonkey ''does'', however, sometimes have to (very briefly) care about | SpiderMonkey ''does'', however, sometimes have to (very briefly) care about a Unicode extension component of a language tag -- but only to remove it. ECMA-402 often has better-structured means of specifying the same information, and so its algorithms require the Unicode extension component be removed before processing continues. | ||
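The removal step might look roughly like the sketch below. This is not SpiderMonkey's actual code; <code>stripUnicodeExtension</code> is a hypothetical name, and the sketch assumes a well-formed tag. The one subtlety it handles is that anything after a private-use <code>x</code> prefix is opaque data and must be left alone, even if it happens to look like <code>-u-...</code>.

```javascript
// Sketch: remove the Unicode extension component from a well-formed language
// tag. stripUnicodeExtension is a hypothetical name for illustration only.
function stripUnicodeExtension(tag) {
  // Locate the private-use component, if any: "x-" at the start or "-x-".
  const privateUse = /(^|-)x-/i.exec(tag);
  const privateUseStart = privateUse ? privateUse.index : tag.length;

  // Only the part before the private-use component may hold an extension.
  const head = tag.slice(0, privateUseStart);
  const tail = tag.slice(privateUseStart);

  // Remove "-u" plus every 2-8 character subtag that follows it. The match
  // stops at the next one-character prefix, which starts another extension.
  return head.replace(/-u(-[a-z0-9]{2,8})+/i, "") + tail;
}
```

For example, <code>stripUnicodeExtension("de-DE-u-co-phonebk")</code> returns <code>"de-DE"</code>, while <code>stripUnicodeExtension("en-US-x-u-foo")</code> returns its input unchanged because the <code>u</code> there is private-use data.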
=== ... === | === ... === | ||