I18n:Updating Unicode version

From MozillaWiki
Latest revision as of 10:26, 28 June 2016


This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
Not all of the files there are needed; currently, we require

  • UnicodeData.txt
  • Scripts.txt
  • EastAsianWidth.txt
  • BidiMirroring.txt
  • HangulSyllableType.txt
  • SpecialCasing.txt
  • ReadMe.txt (to record version/date of the UCD)
  • Unihan_Variants.txt (from Unihan.zip)

though this may change if we find a need for additional properties.

The Unicode data files listed above should be together in one directory.

We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.

We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
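Before running the generator, it can help to confirm that the UCD directory has the layout described above. The following is a minimal sketch, not part of the Mozilla tooling: the helper name missing_ucd_files is hypothetical, and it assumes that the UTR50 file keeps its download name (VerticalOrientation-13.txt) inside the "vertical" sub-directory.

```python
from pathlib import Path

# File names are taken from the list above. The security/ and vertical/
# sub-directories follow the layout described in this section.
# Assumption: the UTR50 data file keeps its download name.
REQUIRED_FILES = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
    "security/xidmodifications.txt",
    "vertical/VerticalOrientation-13.txt",
]


def missing_ucd_files(ucd_dir):
    """Return the required files (as relative paths) not found under ucd_dir."""
    root = Path(ucd_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling missing_ucd_files("/path/to/UCD-directory") returns an empty list when everything is in place; otherwise it lists the files that still need to be downloaded or moved.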

From intl/unicharutil/util, run the command:

perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory

(where hb-common.h is found in the gfx/harfbuzz/src directory).

This will generate (or overwrite!) the files

  • nsUnicodePropertyData.cpp
  • nsUnicodeScriptCodes.h

in the current directory.

Casing

This step uses Unicode data files from http://www.unicode.org/Public/UNIDATA/
In addition to UnicodeData.txt, downloaded in the previous step, we need

  • SpecialCasing.txt

From intl/unicharutil/util, run the command:

perl ../tools/genSpecialCasingData.pl /path/to/UCD-directory/UnicodeData.txt /path/to/UCD-directory/SpecialCasing.txt > nsSpecialCasingData.cpp

This will generate (or overwrite!) the files

  • nsSpecialCasingData.cpp
  • all-lower-ref.html
  • all-lower.html
  • all-title-ref.html
  • all-title.html
  • all-upper-ref.html
  • all-upper.html

in the current directory.

Then move the six *.html files to layout/reftests/text-transform.
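The move of the generated reftest pages could be scripted along the following lines. This is only a sketch: the helper name move_reftests is hypothetical, the file names are the six listed above, and the destination directory is assumed to exist already.

```python
import shutil
from pathlib import Path

# The six reftest pages listed above as output of genSpecialCasingData.pl.
REFTEST_FILES = [
    "all-lower-ref.html",
    "all-lower.html",
    "all-title-ref.html",
    "all-title.html",
    "all-upper-ref.html",
    "all-upper.html",
]


def move_reftests(src_dir, dest_dir):
    """Move the generated reftest pages from src_dir to dest_dir.

    Returns the names that were actually moved, so that a missing page
    is easy to spot. dest_dir must already exist.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    moved = []
    for name in REFTEST_FILES:
        if (src / name).is_file():
            shutil.move(str(src / name), str(dest / name))
            moved.append(name)
    return moved
```

For example, move_reftests("intl/unicharutil/util", "layout/reftests/text-transform"), run from the top of the source tree, should return all six names after a successful generation run.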

Normalization

Currently our normalization data is frozen at Unicode 3.2 to conform to RFC 3454 (Stringprep); see Bug 728180.

JavaScript Unicode support

To update SpiderMonkey's Unicode support:

  • change into the js/src/vm/ directory
  • run python ./make_unicode.py
  • verify that UnicodeData.txt, CaseFolding.txt, and the derived files were correctly updated

To generate new data without overwriting the current js/src/vm/UnicodeData.txt and js/src/vm/CaseFolding.txt, run python ./make_unicode.py FILENAME1 FILENAME2 instead. In that form, FILENAME1 is used as the UnicodeData.txt and FILENAME2 as the CaseFolding.txt.
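The two invocation forms described in this section can be captured in a small wrapper. This is a sketch under stated assumptions: the helper name make_unicode_command is hypothetical, and the rule that both file names must be given together is inferred from the FILENAME1 FILENAME2 form above, not from the script itself.

```python
def make_unicode_command(unicode_data=None, case_folding=None):
    """Build the argument list for one of the two documented invocations.

    With no arguments, the script works on the files in js/src/vm/.
    With both arguments, the first is used as UnicodeData.txt and the
    second as CaseFolding.txt.
    """
    cmd = ["python", "./make_unicode.py"]
    if unicode_data is None and case_folding is None:
        return cmd
    # Assumption: only the two-file form is documented, so a single
    # file name is rejected here rather than passed through.
    if unicode_data is None or case_folding is None:
        raise ValueError("give both file names or neither")
    return cmd + [unicode_data, case_folding]
```

The resulting list can be passed to subprocess.run(cmd, cwd="js/src/vm", check=True) from the top of the source tree.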