I18n:Updating Unicode version: Difference between revisions

From MozillaWiki
Latest revision as of 10:26, 28 June 2016


This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require

  • UnicodeData.txt
  • Scripts.txt
  • EastAsianWidth.txt
  • BidiMirroring.txt
  • HangulSyllableType.txt
  • SpecialCasing.txt
  • ReadMe.txt (to record version/date of the UCD)
  • Unihan_Variants.txt (from Unihan.zip)

though this may change if we find a need for additional properties.

The Unicode data files listed above should be together in one directory.

We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.

We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
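The directory layout described above can be checked before running the generator. The following is a minimal sketch, not part of the Mozilla tree: the function name `missing_ucd_files` and the `REQUIRED` list are made up for illustration, and the file names are copied from the text above (including the revision-13 file name, which will change with later UTR50 revisions).

```python
from pathlib import Path

# Relative paths taken from the list in this section. The two sub-directory
# names ("security", "vertical") follow the layout described in the text.
REQUIRED = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
    "security/xidmodifications.txt",
    "vertical/VerticalOrientation-13.txt",
]


def missing_ucd_files(ucd_dir):
    """Return the required relative paths that are not files under ucd_dir."""
    root = Path(ucd_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

Calling `missing_ucd_files("/path/to/UCD-directory")` returns an empty list when everything is in place, and otherwise the names that still need to be downloaded or moved.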

From intl/unicharutil/util, run the command:

perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory

(where hb-common.h is found in the gfx/harfbuzz/src directory).

This will generate (or overwrite!) the files

  • nsUnicodePropertyData.cpp
  • nsUnicodeScriptCodes.h

in the current directory.
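For anyone who prefers to script this step, the invocation above could be wrapped roughly as follows. This is only a sketch: `src_root` (the top of a Mozilla source checkout) and both function names are hypothetical, and the only facts taken from this page are the working directory, the script path, and the location of hb-common.h.

```python
import subprocess
from pathlib import Path


def property_data_command(src_root, ucd_dir):
    """Build the perl command line and working directory described above."""
    root = Path(src_root).resolve()
    # hb-common.h lives in gfx/harfbuzz/src, as noted in the text.
    header = root / "gfx" / "harfbuzz" / "src" / "hb-common.h"
    # The command is run from intl/unicharutil/util.
    cwd = root / "intl" / "unicharutil" / "util"
    argv = [
        "perl",
        "../tools/genUnicodePropertyData.pl",
        str(header),
        str(Path(ucd_dir).resolve()),
    ]
    return argv, cwd


def regenerate_property_data(src_root, ucd_dir):
    """Run the generator; it overwrites the two output files in cwd."""
    argv, cwd = property_data_command(src_root, ucd_dir)
    subprocess.run(argv, cwd=cwd, check=True)
```

Absolute paths are passed for the header and the UCD directory so that they remain valid after the working directory changes.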

Casing

We require Unicode data files from http://www.unicode.org/Public/UNIDATA/
As well as UnicodeData.txt downloaded in the previous step, we need

  • SpecialCasing.txt

From intl/unicharutil/util, run the command:

perl ../tools/genSpecialCasingData.pl /path/to/UCD-directory/UnicodeData.txt /path/to/UCD-directory/SpecialCasing.txt > nsSpecialCasingData.cpp

This will generate (or overwrite!) the files

  • nsSpecialCasingData.cpp
  • all-lower-ref.html
  • all-lower.html
  • all-title-ref.html
  • all-title.html
  • all-upper-ref.html
  • all-upper.html

in the current directory.

Then move the six *.html files to layout/reftests/text-transform.
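The move step could be scripted along these lines. This is an illustrative sketch only: the function name `move_reftests` and its parameters are assumptions, and the six file names are the ones listed above.

```python
import shutil
from pathlib import Path

# The six generated reftest files named in this section.
REFTEST_FILES = [
    "all-lower-ref.html",
    "all-lower.html",
    "all-title-ref.html",
    "all-title.html",
    "all-upper-ref.html",
    "all-upper.html",
]


def move_reftests(generated_dir, reftest_dir):
    """Move the six generated HTML files into reftest_dir.

    Raises FileNotFoundError before moving anything if a file is missing,
    so a partial generator run does not leave a half-updated test directory.
    """
    sources = [Path(generated_dir) / name for name in REFTEST_FILES]
    absent = [str(p) for p in sources if not p.is_file()]
    if absent:
        raise FileNotFoundError("missing generated files: " + ", ".join(absent))
    moved = []
    for src in sources:
        dst = Path(reftest_dir) / src.name
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Run from the top of the source tree, a call such as `move_reftests("intl/unicharutil/util", "layout/reftests/text-transform")` would match the paths given on this page.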

Normalization

Currently our normalization data is frozen at Unicode 3.2 to conform to RFC 3454 (Stringprep); see Bug 728180.

JavaScript Unicode support

To update SpiderMonkey's Unicode support:

  • move into js/src/vm/
  • run python ./make_unicode.py
  • verify that UnicodeData.txt, CaseFolding.txt, and the derived files were correctly updated

Note that running python ./make_unicode.py FILENAME1 FILENAME2 instead reads FILENAME1 as UnicodeData.txt and FILENAME2 as CaseFolding.txt. This is useful if you ever want to generate new data without overwriting the current js/src/vm/UnicodeData.txt and js/src/vm/CaseFolding.txt.
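One possible way to carry out the verification step is to compare content hashes of the files in js/src/vm/ before and after running the script. The sketch below is an assumption about how one might do this, not something make_unicode.py provides; `snapshot` and `changed_files` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file name in directory to the SHA-256 of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(directory).iterdir()
        if p.is_file()
    }


def changed_files(before, after):
    """Return the sorted names that were added, removed, or modified."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Take `before = snapshot("js/src/vm")`, run the script, take `after = snapshot("js/src/vm")`, and inspect `changed_files(before, after)`. After a Unicode version bump the result should include UnicodeData.txt, CaseFolding.txt, and the derived files; an empty list suggests nothing was regenerated.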