I18n:Updating Unicode version: Difference between revisions

From MozillaWiki
Revision as of 12:10, 30 September 2012


This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require

  • UnicodeData.txt
  • Scripts.txt
  • EastAsianWidth.txt
  • BidiMirroring.txt
  • HangulSyllableType.txt
  • ReadMe.txt (to record version/date of the UCD)
  • Unihan_Variants.txt (from Unihan.zip)

though this may change if we find a need for additional properties.

The Unicode data files listed above should be together in one directory.

We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.

From intl/unicharutil/util, run the command:

perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory

(where hb-common.h is found in the gfx/harfbuzz/src directory).

This will generate (or overwrite!) the files

  • nsUnicodePropertyData.cpp
  • nsUnicodeScriptCodes.h

in the current directory.
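
Because the tool expects a specific directory layout, it can help to check the layout before running it. The following is a minimal sketch, not part of the Mozilla tree: the function name check_ucd_dir is hypothetical, and the file list is simply the one given above, so it would need updating if the required files change.

```shell
#!/bin/sh
# Hypothetical helper (not in the Mozilla tree): check that a UCD directory
# contains the files listed above, including security/xidmodifications.txt.
# Prints one "missing: NAME" line per absent file and returns non-zero
# if anything is missing.
check_ucd_dir() {
    ucd_dir=$1
    missing=0
    for f in UnicodeData.txt Scripts.txt EastAsianWidth.txt BidiMirroring.txt \
             HangulSyllableType.txt ReadMe.txt Unihan_Variants.txt \
             security/xidmodifications.txt; do
        if [ ! -f "$ucd_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example use, from intl/unicharutil/util (paths are placeholders):
#   check_ucd_dir /path/to/UCD-directory &&
#       perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory
```

Running the generator only when the check succeeds avoids overwriting the generated files from an incomplete set of inputs.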

Normalization

Transliteration

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the mozilla source tree.
  3. Run perl gentransliterate.pl in intl/unicharutil/tools. This creates a new version of intl/unicharutil/tables/transliterate.properties.
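
The copy in step 2 can be guarded with a basic format check, so that a truncated download or an HTML error page is not fed to the generator. This is a sketch under stated assumptions: stage_unicode_data is a hypothetical name, and the check relies only on the fact that every record in UnicodeData.txt has 15 semicolon-separated fields.

```shell
#!/bin/sh
# Hypothetical helper (not in the Mozilla tree): copy a downloaded
# UnicodeData.txt into the tools directory under the name
# UnicodeData-Latest.txt, after a basic format check.
stage_unicode_data() {
    src=$1        # path to the downloaded UnicodeData.txt
    tools_dir=$2  # e.g. /path/to/mozilla/intl/unicharutil/tools
    # Count lines that do not have exactly 15 semicolon-separated fields.
    bad=$(awk -F';' 'NF != 15 { n++ } END { print n + 0 }' "$src") || return 1
    if [ "$bad" -ne 0 ]; then
        echo "error: $bad malformed line(s) in $src" >&2
        return 1
    fi
    cp "$src" "$tools_dir/UnicodeData-Latest.txt"
}

# Example use (paths are placeholders):
#   stage_unicode_data ~/Downloads/UnicodeData.txt /path/to/mozilla/intl/unicharutil/tools
```

The same helper applies to the other procedures on this page that copy UnicodeData.txt to UnicodeData-Latest.txt; only the target directory differs.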

Bidi

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/util/UnicodeData-Latest.txt in the mozilla source tree.
  3. Run perl genbidicattable.pl in intl/unicharutil/util. This creates a new version of intl/unicharutil/util/bidicattable.h.
  4. The previous step will probably issue warnings like the following:
WARNING, Unicode Database now contain characters which we have not considered.
change this program !!!
Problem- U+010900 - U+010907 range

In this case, you will need to edit the @range array in genbidicattable.pl to include the new ranges.
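
When many ranges are reported, it may be easier to collect them in one pass than to read them off the terminal. The sketch below assumes the warning lines have exactly the shape shown in the sample above ("Problem- U+XXXXXX - U+XXXXXX range"); the function name list_new_bidi_ranges is hypothetical, and whether the script writes its warnings to standard output or standard error is not stated here, so the example merges both streams.

```shell
#!/bin/sh
# Hypothetical helper (not in the Mozilla tree): read genbidicattable.pl
# output on stdin and print each reported range as "START END"
# (hex code points without the U+ prefix). All other lines are dropped.
list_new_bidi_ranges() {
    sed -n 's/^Problem- U+\([0-9A-Fa-f]*\) - U+\([0-9A-Fa-f]*\) range.*$/\1 \2/p'
}

# Example use (assumes warnings may go to either output stream):
#   perl genbidicattable.pl 2>&1 | list_new_bidi_ranges
```

Each printed pair corresponds to one range that still has to be added to @range by hand.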

Default ignorable characters

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the mozilla source tree
  3. Download the latest version of DerivedCoreProperties.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt.
  4. Copy this file to intl/unicharutil/tools/DerivedCoreProperties.txt in the mozilla source tree
  5. In intl/unicharutil/tools, run the command: perl genignorable.pl | perl ccmapbin.pl - gIgnorableCCMapExt ignorable
     This creates a new version of gfx/thebes/src/ignorable.x-ccmap.