I18n:Updating Unicode version

This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

(1) Download the current Unicode data files from
        http://www.unicode.org/Public/UNIDATA/
    NB: not all the files are actually needed; currently, we require
      - UnicodeData.txt
      - Scripts.txt
      - EastAsianWidth.txt
      - BidiMirroring.txt
      - HangulSyllableType.txt
      - ReadMe.txt (to record version/date of the UCD)
      - Unihan_Variants.txt (from Unihan.zip)
    though this may change if we find a need for additional properties.
    The Unicode data files listed above should be together in one directory.
    We also require the file
        http://www.unicode.org/Public/security/latest/xidmodifications.txt
    This file should be in a sub-directory "security" immediately below the
    directory containing the other Unicode data files.
(2) Run genUnicodePropertyData.pl using a command line of the form
        perl genUnicodePropertyData.pl \
                /path/to/hb-common.h   \
                /path/to/UCD-directory
    (where hb-common.h is found in the gfx/harfbuzz/src directory).
    This will generate (or overwrite!) the files
        nsUnicodePropertyData.cpp
        nsUnicodeScriptCodes.h
    in the current directory.
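
Because the tool overwrites its output files, it can help to confirm the input directory is complete before running it. The following is a minimal sketch, not part of the Mozilla tree: check_ucd_dir is a hypothetical helper name, and the file list is simply the one given in step (1), so it will need updating if the required files change.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Mozilla tree): verify that a UCD
# directory contains every file listed in step (1), including the
# "security" sub-directory. Prints each missing file to stderr and
# returns non-zero if anything is absent.
check_ucd_dir() {
    ucd_dir=$1
    missing=0
    for f in \
        UnicodeData.txt \
        Scripts.txt \
        EastAsianWidth.txt \
        BidiMirroring.txt \
        HangulSyllableType.txt \
        ReadMe.txt \
        Unihan_Variants.txt \
        security/xidmodifications.txt
    do
        if [ ! -f "$ucd_dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as check_ucd_dir /path/to/UCD-directory, made just before the perl command in step (2), returns success only when all eight files are present.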

Normalization

Transliteration

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the mozilla source tree.
  3. Run perl gentransliterate.pl in intl/unicharutil/tools. This creates a new version of intl/unicharutil/tables/transliterate.properties.
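
The copy in step 2 can be scripted so that an empty download or a wrong tree path is caught early. This is only a sketch: stage_unicode_data is a hypothetical helper name, and the destination path is taken from step 2 above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Mozilla tree): copy a downloaded
# UnicodeData.txt into the tools directory under the name that step 2
# expects, and print the destination path on success.
stage_unicode_data() {
    downloaded=$1     # path to the downloaded UnicodeData.txt
    tree=$2           # root of the mozilla source tree
    dest="$tree/intl/unicharutil/tools/UnicodeData-Latest.txt"
    if [ ! -s "$downloaded" ]; then
        echo "error: $downloaded is missing or empty" >&2
        return 1
    fi
    if [ ! -d "$tree/intl/unicharutil/tools" ]; then
        echo "error: $tree does not look like a mozilla source tree" >&2
        return 1
    fi
    cp "$downloaded" "$dest" && echo "$dest"
}
```

After stage_unicode_data ./UnicodeData.txt /path/to/mozilla succeeds, step 3 can be run from the tools directory as described.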

Bidi

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/util/UnicodeData-Latest.txt in the mozilla source tree.
  3. Run perl genbidicattable.pl in intl/unicharutil/util. This creates a new version of intl/unicharutil/util/bidicattable.h.
  4. The previous step will probably issue warnings like the following:
       WARNING, Unicode Database now contain characters which we have not considered.
       change this program !!!
       Problem- U+010900 - U+010907 range
     In this case, you will need to edit @range in genbidicattable.pl to include the new ranges.
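
When many ranges are reported, it may be convenient to pull them out of the warning text before editing the script. The sketch below assumes the warning lines have exactly the "Problem- U+XXXX - U+YYYY range" shape shown above; extract_problem_ranges is a hypothetical helper name, and the layout of @range in genbidicattable.pl is deliberately not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Mozilla tree): read the output of
# genbidicattable.pl on stdin and print one "START END" pair (hex digits,
# as they appear in the warning) for each reported problem range.
# All other lines are ignored.
extract_problem_ranges() {
    sed -n 's/^Problem- U+\([0-9A-Fa-f]*\) - U+\([0-9A-Fa-f]*\) range.*$/\1 \2/p'
}
```

For instance, if the warnings were saved to a file (hypothetically named bidi-warnings.log), running extract_problem_ranges < bidi-warnings.log lists the ranges that still need to be added to @range by hand.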

Default ignorable characters

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the mozilla source tree.
  3. Download the latest version of DerivedCoreProperties.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt.
  4. Copy this file to intl/unicharutil/tools/DerivedCoreProperties.txt in the mozilla source tree.
  5. In intl/unicharutil/tools, run the pipeline
       perl genignorable.pl | perl ccmapbin.pl - gIgnorableCCMapExt ignorable
     This creates a new version of gfx/thebes/src/ignorable.x-ccmap.
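
To see what changed between Unicode versions before regenerating the map, the Default_Ignorable_Code_Point entries can be listed directly from DerivedCoreProperties.txt. The sketch below relies only on the standard UCD line format ("code point or range ; property # comment"). It does not reproduce what genignorable.pl does, and list_default_ignorable is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Mozilla tree): print every code
# point or range that DerivedCoreProperties.txt assigns the
# Default_Ignorable_Code_Point property, one per line.
list_default_ignorable() {
    # Strip trailing comments, split on ';', trim whitespace from both
    # fields, and keep only lines for the property of interest.
    sed 's/#.*//' "$1" | awk -F';' '
        { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
        $2 == "Default_Ignorable_Code_Point" { print $1 }
    '
}
```

Running the function on the old and new copies of the file and comparing the two listings with diff shows which code points were added or removed.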