I18n:Updating Unicode version
Revision as of 11:53, 30 September 2012
This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.
Unicode properties
To regenerate the tables in nsUnicodePropertyData.cpp:
(1) Download the current Unicode data files from
http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require
- UnicodeData.txt
- Scripts.txt
- EastAsianWidth.txt
- BidiMirroring.txt
- HangulSyllableType.txt
- ReadMe.txt (to record version/date of the UCD)
- Unihan_Variants.txt (from Unihan.zip)
though this may change if we find a need for additional properties.
The Unicode data files listed above should be together in one directory.
We also require the file
http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the
directory containing the other Unicode data files.
(2) Run this tool using a command line of the form
    perl genUnicodePropertyData.pl \
        /path/to/hb-common.h \
        /path/to/UCD-directory
(where hb-common.h is found in the gfx/harfbuzz/src directory).
This will generate (or overwrite!) the files
    nsUnicodePropertyData.cpp
    nsUnicodeScriptCodes.h
in the current directory.
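Because the tool expects a specific directory layout, it can help to check the layout before running it. The following is a minimal sketch, not part of the Mozilla tree: the function name check_ucd_layout is hypothetical, and the file list is simply the one given in step (1), so it would need updating if the set of required files changes.

```shell
#!/bin/sh
# Hypothetical helper: verify that a UCD directory contains the files
# listed in step (1), including the "security" sub-directory.
# Prints each missing file to stderr and returns non-zero if any is missing.
check_ucd_layout() {
    ucd_dir=$1
    missing=0
    for f in UnicodeData.txt Scripts.txt EastAsianWidth.txt \
             BidiMirroring.txt HangulSyllableType.txt ReadMe.txt \
             Unihan_Variants.txt security/xidmodifications.txt; do
        if [ ! -f "$ucd_dir/$f" ]; then
            echo "missing: $ucd_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is to run the generator only when the check passes, for example: check_ucd_layout /path/to/UCD-directory && perl genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory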
Normalization
Transliteration
- Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
- Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the mozilla source tree.
- Run perl gentransliterate.pl in intl/unicharutil/tools. This creates a new version of intl/unicharutil/tables/transliterate.properties.
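The copy step above can be sketched with a small sanity check, so that a failed download (for instance a saved HTML error page) is not passed to the generator. This is only a sketch: stage_unicode_data is a hypothetical name, and the check relies on the fact that every record in UnicodeData.txt has 15 semicolon-separated fields.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded UnicodeData.txt into a tools
# directory under the name UnicodeData-Latest.txt, refusing files that
# are empty or contain a line without exactly 15 ';'-separated fields.
stage_unicode_data() {
    src=$1
    tools_dir=$2
    if [ ! -s "$src" ] || ! awk -F';' 'NF != 15 { exit 1 }' "$src"; then
        echo "not a valid UnicodeData.txt: $src" >&2
        return 1
    fi
    cp "$src" "$tools_dir/UnicodeData-Latest.txt"
}
```

A possible invocation from the top of the source tree would be: stage_unicode_data ~/Downloads/UnicodeData.txt intl/unicharutil/tools (the download location is an assumption).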
Bidi
- Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
- Copy this file to intl/unicharutil/util/UnicodeData-Latest.txt in the mozilla source tree.
- Run perl genbidicattable.pl in intl/unicharutil/util. This creates a new version of intl/unicharutil/util/bidicattable.h.
- The previous step will probably issue warnings like the following:
WARNING, Unicode Database now contain characters which we have not considered. change this program !!! Problem- U+010900 - U+010907 range
In this case, you will need to edit the @range array in genbidicattable.pl to include the new ranges.
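When many warnings are printed, it may be convenient to reduce them to a plain list of ranges before editing @range. The sketch below assumes only the warning format quoted above; the function name list_unhandled_ranges is hypothetical, and whether the script prints its warnings to stdout or stderr is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: read genbidicattable.pl output on stdin and print
# one "START END" pair (hex code points, without the U+ prefix) for each
# "Problem- U+XXXX - U+YYYY range" warning. All other lines are ignored.
list_unhandled_ranges() {
    sed -n 's/.*Problem- U+\([0-9A-Fa-f]*\) - U+\([0-9A-Fa-f]*\) range.*/\1 \2/p'
}
```

Since the output stream is unknown, a cautious invocation merges both streams: perl genbidicattable.pl 2>&1 | list_unhandled_ranges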
Default ignorable characters
- Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
- Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the mozilla source tree
- Download the latest version of DerivedCoreProperties.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt.
- Copy this file to intl/unicharutil/tools/DerivedCoreProperties.txt in the mozilla source tree
- Run perl genignorable.pl | perl ccmapbin.pl - gIgnorableCCMapExt ignorable in intl/unicharutil/tools. This creates a new version of gfx/thebes/src/ignorable.x-ccmap
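Since this procedure combines two separately downloaded files, it is worth confirming which Unicode version the derived file belongs to before running the pipeline. The sketch below assumes the usual UCD convention that the first line of a derived data file has the form "# DerivedCoreProperties-X.Y.Z.txt"; ucd_file_version is a hypothetical name. UnicodeData.txt itself carries no such header, so this check only covers files that do.

```shell
#!/bin/sh
# Hypothetical helper: print the version number found in the first-line
# header of a UCD data file, e.g. "# DerivedCoreProperties-6.2.0.txt".
# Prints nothing if the first line does not match that form.
ucd_file_version() {
    sed -n '1s/^# [A-Za-z_]*-\([0-9][0-9.]*\)\.txt.*/\1/p' "$1"
}
```

For example, ucd_file_version intl/unicharutil/tools/DerivedCoreProperties.txt would print the version string recorded in that file's header, which can then be compared with the version given in the UCD's ReadMe.txt.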