User:Waldo/Internationalization API: Difference between revisions

Revision as of 20:29, 1 May 2013

352 bytes added , 1 May 2013

m

Slight changes/elaboration of the Language Tag section info

Waldo

@@ Line 19: / Line 19: @@
 One particular subcomponent worth noting specifically is the ''Unicode extension component'', living within the extension component.  The Unicode extension component has the basic form <code>"-u(-[a-z0-9]{2,8})+"</code>, with precise details in [https://tools.ietf.org/html/rfc6067 RFC 6067].  The Unicode component permits specifying additional details about sort order, numeric system, calendar system, and others.
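As a rough illustration of the basic form quoted above, the following sketch looks for a Unicode extension component in a tag. The helper name <code>findUnicodeExtension</code> is hypothetical, and the pattern is only the simplified one from the text, not the full RFC 6067 grammar (for instance, it does not account for a <code>-u-</code> sequence inside a private-use component).

```javascript
// Simplified pattern taken from the text; RFC 6067 has the precise grammar.
const UNICODE_EXTENSION = /-u(-[a-z0-9]{2,8})+/;

// Hypothetical helper: returns the Unicode extension component of a
// language tag (including the leading "-u"), or null if there is none.
function findUnicodeExtension(tag) {
  // Language tags are case-insensitive, so match against lowercase.
  const match = UNICODE_EXTENSION.exec(tag.toLowerCase());
  return match ? match[0] : null;
}

console.log(findUnicodeExtension("de-DE-u-co-phonebk")); // "-u-co-phonebk"
console.log(findUnicodeExtension("en-US"));              // null
```

Note that <code>en-US</code> does not match even though it contains <code>-U</code>: the pattern requires at least one further subtag of two to eight characters after the <code>u</code> singleton.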
-SpiderMonkey verifies structural validity of language tags and brings them into a canonical form, but generally doesn't interpret the language, script, region, variant, and private-use components of a language tag itself, passing them on to ICU for interpretation instead.  The one exception is for ''old-style language tags'': A small set of language tags for languages that according to BCP 47 don't have a default script, but that are commonly used without the script code based on older standard (RFCs 1766 and 3066) that didn't recognize script codes. The implementation maps such language tags to their modern equivalents so that they can be found in the lists of available locales provided by ICU.
+SpiderMonkey verifies structural validity of language tags and brings them into a canonical form, but generally doesn't interpret the components of a language tag.  Instead, SpiderMonkey passes the tag to ICU, and ICU interprets the components.
-As required by ECMA-402, SpiderMonkey separates the Unicode extension component of a language tag from the base language tag for processing. For the base language tag, a simple fallback mechanism is commonly used to find an available locale supporting it; while key-value pairs in the Unicode extension are compared separately against the feature set supported by the language found.
+One exception is for ''old-style language tags'': a small set of language tags for languages that, according to BCP 47, don't have a default script, but that are commonly used without the script code, following older standards ([https://tools.ietf.org/html/rfc1766 RFC 1766] and [https://tools.ietf.org/html/rfc3066 RFC 3066]) that didn't recognize script codes. The implementation maps such language tags to their modern equivalents so that they can be found in the lists of available locales provided by ICU.
+Another exception is when determining language fallback, as required by ECMA-402: when a requested language isn't supported, the language tag's internal structure implicitly encodes a list of fallbacks.  For example, the tag <code>en-US</code> suggests a fallback to <code>en</code>.
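The fallback list implied by a tag's structure can be sketched by repeatedly dropping the last subtag. This is a minimal sketch and not SpiderMonkey's actual code: the function names and the <code>available</code> list are hypothetical, and the input is assumed to be a canonicalized tag whose Unicode extension has already been removed.

```javascript
// Hypothetical helper: lists a tag followed by its progressively
// shorter fallbacks, e.g. "zh-Hant-TW" -> "zh-Hant" -> "zh".
function fallbackCandidates(tag) {
  const candidates = [];
  let candidate = tag;
  while (candidate.length > 0) {
    candidates.push(candidate);
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) break;
    // If truncating would leave a dangling singleton such as "-x",
    // drop the singleton too so the candidate stays well-formed.
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.substring(0, pos);
  }
  return candidates;
}

// Hypothetical helper: picks the first candidate found in a list of
// available locales, or undefined if none matches.
function bestAvailableLocale(available, tag) {
  return fallbackCandidates(tag).find((c) => available.includes(c));
}

console.log(fallbackCandidates("en-US"));                    // ["en-US", "en"]
console.log(bestAvailableLocale(["de", "en"], "en-US"));     // "en"
```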
+The last exception is that, again as required by ECMA-402, SpiderMonkey separates the Unicode extension component of a language tag from the base language tag during processing.  The key-value pairs in the Unicode extension are compared separately against the feature set supported by the language found, and ICU uses the language tag without its Unicode extension once the feature set is determined.
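Separating the Unicode extension from the base tag might look roughly like the sketch below. The function name <code>splitUnicodeExtension</code> and the shape of its return value are assumptions made for illustration; the pattern is the simplified one quoted earlier, and attributes that precede the first key are ignored.

```javascript
// Hypothetical helper: splits a language tag into the base tag (without
// the Unicode extension) and the extension's key-value pairs.
function splitUnicodeExtension(tag) {
  const match = /-u(-[a-z0-9]{2,8})+/i.exec(tag);
  if (!match) return { base: tag, keywords: {} };

  // Cut the extension out of the tag; anything after it is kept.
  const base =
    tag.slice(0, match.index) + tag.slice(match.index + match[0].length);

  // In RFC 6067, keys are two characters long and type subtags are
  // three to eight characters long.
  const keywords = {};
  let key = null;
  for (const part of match[0].toLowerCase().split("-").slice(2)) {
    if (part.length === 2) {
      key = part;
      keywords[key] = "";
    } else if (key !== null) {
      keywords[key] = keywords[key] ? keywords[key] + "-" + part : part;
    }
  }
  return { base, keywords };
}

const result = splitUnicodeExtension("de-DE-u-co-phonebk");
console.log(result.base);     // "de-DE"
console.log(result.keywords); // { co: "phonebk" }
```

Each key-value pair (here, collation <code>co</code> with value <code>phonebk</code>) could then be checked against what the chosen locale supports, while the base tag alone is used for locale lookup.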
 === Currency codes ===