User:Waldo/Internationalization API: Difference between revisions

ICU updating tweaks
m (Slight changes/elaboration of the Language Tag section info)
(ICU updating tweaks)
 
(3 intermediate revisions by the same user not shown)
Line 117: Line 117:
=== Implementation ===
=== Implementation ===


ECMA-402 currently exposes <code>Intl.Collator</code>, <code>Intl.DateTimeFormat</code>, and <code>Intl.NumberFormat</code> objects.  The spec also permits initializing an existing object as one of these, for a small wrinkle.  The fundamental ICU data structures providing the relevant functionality are <code>UCollator*</code>, <code>UNumberFormat*</code>, and <code>UDateFormat*</code>, opaque pointers all.  Instances are created using <code>u{col,num,date}_open</code>, passing in appropriate arguments.  For objects ''created'' by the constructor, the pointer is stored in a reserved slot as a private value.  For objects merely ''initialized'' by the constructor, the ICU data structures must be (inefficiently!) created anew every time.  (This difference should not be observable, except through performance-timing, because the only structures consulted to create the ICU structure are internal ones , operations on which aren't observable.)
ECMA-402 currently exposes <code>Intl.Collator</code>, <code>Intl.DateTimeFormat</code>, and <code>Intl.NumberFormat</code> objects.  The spec also permits initializing an existing object as one of these, for a small wrinkle.  The fundamental ICU data structures providing the relevant functionality are <code>UCollator*</code>, <code>UNumberFormat*</code>, and <code>UDateFormat*</code>, opaque pointers all.  Instances are created using <code>u{col,num,date}_open</code>, passing in appropriate arguments.  For objects ''created'' by the constructor, the pointer is stored in a reserved slot as a private value.  For objects merely ''initialized'' by the constructor, the ICU data structures must be (inefficiently!) created anew every time.  (This difference should not be observable, except through performance-timing, because the only structures consulted to create the ICU structure are internal ones, operations on which aren't observable.)


Every object initialized as an Intl object has an associated set of internal properties.  In ECMA-402 these properties are represented using ES5's traditional double-bracket notation: <code><nowiki>[[calendar]]</nowiki></code>, <code><nowiki>[[initializedIntlObject]]</nowiki></code>, and so on.  The "ideal" means of implementing these properties would probably be ES6 private names, but they're not stable or well-understood enough to be specified yet (let alone implemented).  In the meantime we associate ECMA-402 internal properties with objects using a weak map.  Any object initialized as an <code>Intl</code> object has an internal <code><nowiki>[[initializedIntlObject]]</nowiki></code> property.  This is implemented by placing all such objects as keys in a weak map (<code>internalsMap</code> in <code>builtin/Intl.js</code>).  The corresponding value is an ''internals object''.
Every object initialized as an Intl object has an associated set of internal properties.  In ECMA-402 these properties are represented using ES5's traditional double-bracket notation: <code><nowiki>[[calendar]]</nowiki></code>, <code><nowiki>[[initializedIntlObject]]</nowiki></code>, and so on.  The "ideal" means of implementing these properties would probably be ES6 private names, but they're not stable or well-understood enough to be specified yet (let alone implemented).  In the meantime we associate ECMA-402 internal properties with objects using a weak map.  Any object initialized as an <code>Intl</code> object has an internal <code><nowiki>[[initializedIntlObject]]</nowiki></code> property.  This is implemented by placing all such objects as keys in a weak map (<code>internalsMap</code> in <code>builtin/Intl.js</code>).  The corresponding value is an ''internals object''.


Checking whether an object has been initialized as an <code>Intl</code> object is encapsulated by the <code>isInitializedIntlObject</code> method in {{source|js/src/builtin/Intl.js}}.  The <code>getInternals</code> function in the same file is used to encapsulate weak map access to an internals object.  These methods ensure the weak map mechanism is only an implementation detail encoded in a very few places.
Checking whether an object has been initialized as an <code>Intl</code> object is encapsulated by the <code>isInitializedIntlObject</code> method in {{source|js/src/builtin/Intl.js}}.  The <code>getIntlObjectInternals</code> and (less preferred) <code>getInternals</code> functions in the same file are used to encapsulate weak map access to an internals object.  These methods ensure the weak map mechanism is only an implementation detail encoded in a very few places.


Internals objects are objects with null <code><nowiki>[[Prototype]]</nowiki></code>, with properties corresponding to the other internal properties on the object, named naturally — "calendar", "initializedDateTimeFormat", and so on (no brackets).  Accessing any internal property is simply a matter of doing <code>internals.calendar</code>: this is safe because, with the <code><nowiki>[[Prototype]]</nowiki></code> nulled out, property accesses can't touch any script-visible state.  Internal properties are added and set during the initialization process.  They are lazily consulted to construct an ICU structure when collation/formatting/etc. actually occurs in the <code>js::intl_CompareStrings</code>, <code>js::intl_FormatNumber</code>, and <code>js::intl_FormatDateTime</code> functions.  (Although not ''directly'' there, but rather in sub-methods called when the ICU structure isn't cached, or when the object was initialized as an <code>Intl</code> object but wasn't actually one — see again the "inefficiently" bit above.)
Internals objects are objects with null <code><nowiki>[[Prototype]]</nowiki></code> and the properties <code>type</code>, <code>lazyData</code>, and <code>internalProps</code>.  This structure permits internals objects to be ''lazily'' initialized.  Initially, <code>type</code> is <code>"partial"</code>; lazy initialization changes this to <code>"Collator"</code>, <code>"DateTimeFormat"</code>, or <code>"NumberFormat"</code> and sets <code>lazyData</code> to the information necessary to compute full initialization info; finally, first use fully initializes, converting <code>lazyData</code> into an <code>internalProps</code> object containing the actual ECMA-402-defined internal properties.  (For more details on this scheme, see <code>initializeIntlObject</code> and adjacent functions, as well as the class-specific initialization methods, in {{source|js/src/builtin/Intl.js}}.)
 
The <code>internalProps</code> object stores the internal properties (other than <code><nowiki>[[initializedIntlObject]]</nowiki></code>) of the object, named naturally — "calendar", "initializedDateTimeFormat", and so on (no brackets).  Accessing any internal property is simply a matter of doing <code>internals.calendar</code>: this is safe because, with the <code><nowiki>[[Prototype]]</nowiki></code> nulled out, property accesses can't touch any script-visible state.  These internal properties are lazily computed to construct an ICU structure when collation/formatting/etc. actually occurs in the <code>js::intl_CompareStrings</code>, <code>js::intl_FormatNumber</code>, and <code>js::intl_FormatDateTime</code> functions.  (Although not ''directly'' there, but rather in sub-methods called when the ICU structure isn't cached, or when the object was initialized as an <code>Intl</code> object but wasn't actually one — see again the "inefficiently" bit above.)
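
The weak-map and lazy-initialization scheme described above can be sketched in plain JavaScript.  This is a simplified illustration only, not the code in <code>builtin/Intl.js</code>: the function bodies, the <code>setLazyData</code> helper, the <code>computeInternalProps</code> callback, and the <code>requestedCalendar</code> field are all assumptions made for the example.

```javascript
"use strict";

// Hypothetical stand-in for the weak map described in the text.
const internalsMap = new WeakMap();

// Mark obj as initialized: attach a null-prototype internals object whose
// type starts out as "partial".
function initializeIntlObject(obj) {
  const internals = Object.create(null);
  internals.type = "partial";
  internals.lazyData = null;
  internals.internalProps = null;
  internalsMap.set(obj, internals);
  return internals;
}

// Corresponds to having [[initializedIntlObject]]: membership in the map.
function isInitializedIntlObject(obj) {
  return internalsMap.has(obj);
}

// Assumed helper: record the class and the data needed for full initialization.
function setLazyData(internals, type, lazyData) {
  internals.type = type;
  internals.lazyData = lazyData;
}

// On first use, convert lazyData into internalProps; later calls reuse it.
// computeInternalProps is a hypothetical callback standing in for the
// class-specific resolution step.
function getInternals(obj, computeInternalProps) {
  const internals = internalsMap.get(obj);
  if (internals === undefined) {
    throw new TypeError("object was not initialized as an Intl object");
  }
  if (internals.type === "partial") {
    throw new TypeError("object is only partially initialized");
  }
  if (internals.internalProps === null) {
    internals.internalProps = computeInternalProps(internals.lazyData);
    internals.lazyData = null;
  }
  return internals.internalProps;
}

// Usage: initialize an object, then resolve its internal properties on demand.
const obj = {};
const internals = initializeIntlObject(obj);
setLazyData(internals, "DateTimeFormat", { requestedCalendar: "gregory" });
const props = getInternals(obj, (lazy) => {
  const p = Object.create(null); // null prototype: no script-visible lookups
  p.calendar = lazy.requestedCalendar;
  p.initializedDateTimeFormat = true;
  return p;
});
console.log(props.calendar);
```

Keeping every weak-map access behind a couple of small functions, as in this sketch, means callers never touch the map directly, which is the encapsulation the text describes.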


=== Care and feeding of the Internationalization API ===
=== Care and feeding of the Internationalization API ===
Line 131: Line 133:
==== ICU ====
==== ICU ====


ICU has major releases once or twice a year, and minor releases as needed. Releases are announced on the [https://lists.sourceforge.net/lists/listinfo/icu-announce icu-announce mailing list]. Each release includes the latest versions of the CLDR locale data, the IANA time zone database, and the ISO 4217 currency data, so it's generally worth it for Mozilla to update its copy each time. As of April 2013, upgrades are unfortunately blocked by [http://bugs.icu-project.org/trac/ticket/10043 ICU bug 10043]. To import the latest version, use the {{source|intl/update-icu.sh}} script.
ICU has major releases once or twice a year, and minor releases as needed. Releases are announced on the [https://lists.sourceforge.net/lists/listinfo/icu-announce icu-announce mailing list]. Each release includes the latest versions of the CLDR locale data, the IANA time zone database, and the ISO 4217 currency data, so it's generally worth it for Mozilla to update its copy each time. To import the latest version, use the {{source|intl/update-icu.sh}} script.  Doing so will likely require updating Mozilla's set of local ICU patches -- a tedious process the burden of which we attempt to minimize by upstreaming patches whenever possible (and only patching locally with good reason).


Bugs in ICU should be reported into the [http://bugs.icu-project.org/trac/ ICU bug database]. Bug fixes can be [http://site.icu-project.org/processes/contribute contributed]; as of April 2013, one contribution is in progress ({{bug|866359}}).
Bugs in ICU should be reported into the [http://bugs.icu-project.org/trac/ ICU bug database]. Bug fixes can be [http://site.icu-project.org/processes/contribute contributed]; as of February 2014, one contribution is in progress ({{bug|866359}}).


==== Language subtag registry ====
==== Language subtag registry ====
Line 156: Line 158:


The Test402 conformance test suite can be updated by Ecma members with contribution agreements at any time. Updates should be announced on the [https://mail.mozilla.org/listinfo/test262-discuss test262-discuss mailing list]. To import the updates, run the {{source|js/src/tests/update-test402.sh}} script.
The Test402 conformance test suite can be updated by Ecma members with contribution agreements at any time. Updates should be announced on the [https://mail.mozilla.org/listinfo/test262-discuss test262-discuss mailing list]. To import the updates, run the {{source|js/src/tests/update-test402.sh}} script.
SpiderMonkey won't always pass all official tests, so a mechanism for marking tests as failing is needed.  JS tests are run by generating test lists, then processing any extra jstests.list files already present and merging in their changes.  The resulting data determines which tests are packaged up to be run as J builds on tinderbox.  Thus to skip internationalization tests that fail, we list and skip them in {{source|js/src/tests/jstests.list}}.  One benefit of this is that we can fix such failures without having to rerun the import script and without having to change the failing tests themselves.  (Using jstests.list this way is arguably a kludge, but it's not too bad as hacks go.)


=== Known issues ===
=== Known issues ===