User:GPHemsley/BCP 47

From MozillaWiki

Summary

This is the plan for addressing bug 356038, the implementation of BCP 47.

Source

Discussion

Plan

  1. Create master list of languages
  2. Tie in spellcheckers
  3. Improve Languages UI
  4. Upgrade Languages UI to use central list
  5. Convert L10n mechanisms to use central list

Tasks

  • Update the JS code to handle all the new requirements in BCP 47.
  • Update list of language names ('Primary Language Subtags').
    • Do we exclude extinct/historical languages? If so, based on what criteria?
    • How do we choose the best name when a subtag has multiple 'Description' values in the IANA registry?
      • We should listen to the language communities themselves if possible.
  • Update list of region names ('Region Subtags').
    • Who or what decides when to deviate from the region names listed in the IANA registry?
  • Add list of script names ('Script Subtags').
  • Intentionally ignore 'Extended Language Subtags', as they are generally for backwards compatibility with 'Primary Language Subtags' that represent macrolanguages.
    • Are we sure we want to do this?
  • Intentionally ignore 'Redundant Registrations', as they are generally for backwards compatibility and can be composed of other valid subtags.
  • Decide how to handle 'Variant Subtags', 'Extension Subtags', 'Private Use Subtags', and 'Grandfathered Registrations', since it is unclear how they would come into play in language selection or localization.
  • Decide how to clean up and/or reorganize language groups.
    • Can they be superseded by 'Script Subtags'?
  • Decide whether specifying the "accepted" languages is necessary.
    • Languages marked as "true" in this file are the ones that are shown in the Accept-Languages dialog; this may be moot with an improved dialog.
    • What are the reasons for a language not being "accepted"?
      • Consensus seems to be 'no'.
  • Decide how to separate the l10n-necessary language names from the l10n-unnecessary language names.
    • Do we separate 2-char vs. 3-char, or do we use another method?
      • See comment below.
  • Decide how to improve the Languages selection interface.
    • Needs input from UX about both the 'Languages' UI and the 'Fonts & Colors' UI.
  • Decide how we should handle 'q' values in the Accept-Language header.
    • Should we just allow them to be automatically generated from the given order, as is apparently the existing behavior?
      • Code is in C/C++; there doesn't seem to be any motivation to change it.
  • Decide what to do with language subtags that have a 'Scope' value.
  • Do we want to allow a user to specify languages which they explicitly do not speak or understand (q=0)?
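For reference when deciding the 'q' value questions above, here is a minimal sketch of deriving Accept-Language 'q' values from list order, with an optional q=0 entry for languages the user explicitly does not understand. The fixed step of 0.1 (clamped at 0.1) is an assumption, not necessarily what the existing C/C++ code does, and `buildAcceptLanguage` is a hypothetical name.

```javascript
// Hypothetical sketch: derive Accept-Language q values from the user's ordering.
// Assumes a fixed step of 0.1 per position, clamped so that no preferred
// language drops to q=0; the real C/C++ implementation may differ.
function buildAcceptLanguage(preferred, excluded = []) {
  const parts = preferred.map((tag, i) => {
    if (i === 0) return tag; // no q parameter means q=1
    // Work in tenths to avoid floating-point output such as 0.7000000000000001.
    const tenths = Math.max(10 - i, 1);
    return `${tag};q=0.${tenths}`;
  });
  // Languages the user explicitly does not want (q=0 means "not acceptable").
  for (const tag of excluded) {
    parts.push(`${tag};q=0`);
  }
  return parts.join(",");
}
```

For example, `buildAcceptLanguage(["en-US", "en", "fr"], ["de"])` returns `"en-US,en;q=0.9,fr;q=0.8,de;q=0"`. With a fixed step, any list longer than ten entries ends in a run of equal q=0.1 values, so ordering information is lost at the tail; a smaller step for longer lists is one way around that.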

Bugs

Note: This list is incomplete. Use the query links below for the full list.

  • bug 356038 (bcp47) – BCP 47 (RFC 5646 and 4647; IANA Language Subtag Registry) support
    • bug 666662 – Implement master list of language subtags (language, script, region, variant, etc.)
    • (language UI)
    • (l10n)
    • (spellchecking)
    • bug 556237 – Implement font and encoding negotiation based on BCP 47
      • bug 192636 – Map *-Latn languages to Western script (ISO 15924 script codes)
    • bug 656750 – Enhance hyphenation
    • (accessibility)
      • bug 481389 – Make sure only valid language attribute values are exposed
    • bug 716321 – Update existing list of language subtags to reflect more modern usage
      • bug 586085 – Add localizable language names to Firefox: Hawaiian, Hiligaynon, Kashubian
      • bug 531849 – Haitian Creole Language is listed as Haitian

Areas of focus

Master list

Bugs

  • bug 666662 – Implement master list of language subtags (language, script, region, variant, etc.)

Files

  •  ???

Current state

  • There is no true master list of language names.
  • All 2-letter and a handful of 3-letter codes have associated language names within the 'en-US' locale.
  • Language names are essentially arbitrary and subjective, and some politically charged language and place names have been changed.

Desired state

  • A master list of language names and associated information, based on the official IANA database.
  • Allow locales (including 'en-US') to localize (override) the names in the master list.
    • Politically charged names would be changed at this level.
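The IANA Language Subtag Registry that the master list would be based on is a plain-text file in the record-jar format described in RFC 5646: records are separated by lines containing only `%%`, each field is written as `Name: value`, a field may repeat (e.g. several `Description` lines), and a line starting with whitespace continues the previous field. A minimal sketch of loading it into a lookup table, assuming the registry text has already been fetched; `parseRegistry`, the `type:subtag` key format, and the returned object shape are hypothetical.

```javascript
// Hypothetical sketch: parse IANA registry text into a Map keyed by
// "type:subtag" (or "type:tag" for grandfathered/redundant records).
// Keys are lowercased because subtags are case-insensitive.
function parseRegistry(text) {
  const table = new Map();
  for (const chunk of text.split(/\r?\n%%\r?\n/)) {
    const record = {};
    let lastField = null;
    for (const line of chunk.split(/\r?\n/)) {
      if (/^\s/.test(line) && lastField) {
        // Folded continuation line: append to the last value of the last field.
        const values = record[lastField];
        values[values.length - 1] += " " + line.trim();
        continue;
      }
      const idx = line.indexOf(":");
      if (idx === -1) continue;
      lastField = line.slice(0, idx).trim();
      (record[lastField] ??= []).push(line.slice(idx + 1).trim());
    }
    const id = (record.Subtag ?? record.Tag)?.[0];
    if (!record.Type || !id) continue; // e.g. the leading File-Date record
    table.set(`${record.Type[0]}:${id.toLowerCase()}`, {
      descriptions: record.Description ?? [],
      suppressScript: record["Suppress-Script"]?.[0],
      scope: record.Scope?.[0],
      deprecated: record.Deprecated?.[0],
      preferredValue: record["Preferred-Value"]?.[0],
    });
  }
  return table;
}
```

Keeping every `Description` value, rather than only the first, leaves the choice of display name open, which matters for the question above about subtags with multiple names.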

Language preferences UI

Bugs

  •  ???

Files

  • browser/components/preferences/languages.js
  • browser/components/preferences/languages.xul
  • browser/locales/en-US/chrome/browser/preferences/preferences.properties
  • intl/locale/src/langGroups.properties
  • intl/locale/src/language.properties
  •  ???

Current state

  • Support for language codes is very limited: only 'Primary Language Subtags' and (in a limited way) 'Region Subtags' are supported.
    • It is not up to date with BCP 47.
  • No support for additional subtags (including 'Script Subtags') or hard-coded 'q' values.
    • The language UI merely takes the value of the 'intl.accept_languages' preference, splits it on commas, and then splits each item on a single hyphen.
    • Proper parsing (which is already limited) is done only if the language tag has the form 'xx' or 'xx-ZZ' and the corresponding names are available; otherwise, the item is displayed unparsed.
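As a contrast to the comma-then-hyphen splitting described above, here is a minimal sketch of parsing a preference value such as 'en-US, zh-Hant-TW' into its BCP 47 components. It covers only the common langtag shape from RFC 5646 section 2.1 (2-3 letter language, extlang, script, region, variants), keeps any extension or private-use part as an opaque tail, and does not handle grandfathered tags or 4-8 letter primary subtags. Unparseable items fall back to the raw string, as the current UI does. `parseAcceptLanguages` and the returned object shape are hypothetical.

```javascript
// Hypothetical sketch; simplified BCP 47 langtag grammar (RFC 5646, section 2.1).
const LANGTAG = new RegExp(
  "^([a-z]{2,3})" +                                // primary language
  "((?:-[a-z]{3}){0,3})" +                         // extlang(s)
  "(?:-([a-z]{4}))?" +                             // script
  "(?:-([a-z]{2}|[0-9]{3}))?" +                    // region
  "((?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)" +   // variants
  "(?:-([0-9a-wy-z]-.+|x-.+))?$",                  // extensions / private use, kept opaque
  "i"
);

function parseAcceptLanguages(prefValue) {
  return prefValue
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0)
    .map((tag) => {
      const m = LANGTAG.exec(tag);
      if (!m) return { tag, valid: false }; // display unparsed, as today
      // Normalize case per RFC 5646 conventions: script in title case,
      // region in upper case, everything else in lower case.
      return {
        tag,
        valid: true,
        language: m[1].toLowerCase(),
        extlangs: m[2] ? m[2].slice(1).toLowerCase().split("-") : [],
        script: m[3] ? m[3][0].toUpperCase() + m[3].slice(1).toLowerCase() : undefined,
        region: m[4] ? m[4].toUpperCase() : undefined,
        variants: m[5] ? m[5].slice(1).toLowerCase().split("-") : [],
        tail: m[6],
      };
    });
}
```

Because the regular expression is anchored, a script subtag such as 'Hant' is never mistaken for an extlang, and a tag like 'de-CH-1901' yields a region and a variant instead of an unparseable item.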

Desired state

  • All possible valid combinations of subtags are supported, with corresponding names available.
  • More intuitive manipulation in the UI.

L10n

Bugs

  • (l10n)

Files

  • toolkit/locales/en-US/chrome/global/languageNames.properties
  • toolkit/locales/en-US/chrome/global/regionNames.properties
  •  ???

Current state

  • A full list of language and region names must be re-localized for each locale.
  • Subtags without a localized name are handled poorly (for example, the language UI displays such tags unparsed).
  • Localization teams for languages without a 2-letter language code must get their 3-letter code added manually to all locales.

Desired state

  • Localization teams only localize language names (etc.) that are commonly localized in their locale. Otherwise, they default to the value on the master list.
  • (Lower priority) Localization of codes like "en-IE" would be pieced together from the translations of the subtags (probably as "English (Ireland)" in this case), but it would be nice to allow l10n teams to override this behavior and give their own translation of a full code (e.g. en-IE = Hiberno-English).
  • To simplify the work of localizers, a list of "commonly localized" language names should be provided, extending the list currently in languageNames.properties (perhaps according to Kevin Scannell's suggestions in bug 356038, Comment 37).
  • Any valid combination of subtags (per BCP 47) can be used as the code for a Firefox localization team and locale.
  • Locales that are not valid subtag combinations are phased out (e.g. 'jp-JP-mac').
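The "pieced together" naming described above, with an l10n override for a full code and a fallback from the locale's own names to the master list, could look roughly like the following. This is a sketch only: the `type:subtag` and `tag:` key formats, the `localeNames` and `masterList` maps, the "Language (Region)" pattern, and the function names are all assumptions, and subtags are classified by shape alone (4 letters = script, 2 letters or 3 digits = region).

```javascript
// Hypothetical sketch: classify a non-initial subtag by its shape.
function subtagType(sub) {
  if (/^[a-z]{4}$/i.test(sub)) return "script";
  if (/^(?:[a-z]{2}|[0-9]{3})$/i.test(sub)) return "region";
  return null; // extlangs, variants, extensions: left as raw subtags here
}

// Hypothetical sketch of display-name composition with locale overrides.
function displayName(tag, localeNames, masterList) {
  // 1. A localizer-provided name for the full code wins (e.g. en-IE = Hiberno-English).
  const full = localeNames.get("tag:" + tag.toLowerCase());
  if (full !== undefined) return full;

  // 2. Otherwise prefer the locale's name for each subtag, then the master list.
  const lookup = (key) => localeNames.get(key) ?? masterList.get(key);
  const [language, ...rest] = tag.split("-");
  const langName = lookup("language:" + language.toLowerCase());
  if (langName === undefined) return tag; // unknown language: show the raw tag

  const qualifiers = rest.map((sub) => {
    const type = subtagType(sub);
    return (type && lookup(`${type}:${sub.toLowerCase()}`)) ?? sub;
  });
  return qualifiers.length ? `${langName} (${qualifiers.join(", ")})` : langName;
}
```

Typing the keys matters because subtags of different types can collide: 'ie' is a language subtag as well as the region 'IE'. With a master list naming 'en' as English and 'IE' as Ireland, 'en-IE' would display as "English (Ireland)" unless the locale supplies a full-code override.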

Spellchecking

Bugs

  •  ???

Files

  •  ???

Current state

  • There are a limited number of built-in spellcheckers.
  • Additional spellcheckers can be added as extensions.
  • Spellcheckers are not approved on AMO unless Firefox has a corresponding language name string.

Desired state

  • Have built-in language names for all languages listed in the IANA Language Subtag Registry.
  • Allow spellcheckers (and AMO) to use master list for default language names.
  • Automatically detect language on page to use appropriate spellchecker.
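Choosing a dictionary for a given language tag is a natural fit for the 'Lookup' matching scheme of RFC 4647 (section 3.4), one of the specifications bug 356038 covers: each requested tag is progressively truncated from the end until it matches an available tag, and any single-character subtag left dangling at the end is removed as well. A minimal sketch, assuming installed dictionaries are identified by language tags; `lookupDictionary` is a hypothetical name, and wildcard ranges ('*') are not supported.

```javascript
// Hypothetical sketch of RFC 4647 "Lookup" (section 3.4) for choosing a dictionary.
// `requested` is a priority-ordered list of tags (e.g. the page's lang, then the
// user's languages); `available` holds installed dictionary tags.
function lookupDictionary(requested, available, defaultTag = null) {
  // Matching is case-insensitive, but return the tag as it was registered.
  const byLower = new Map(available.map((t) => [t.toLowerCase(), t]));
  for (const range of requested) {
    const subtags = range.toLowerCase().split("-");
    while (subtags.length > 0) {
      const candidate = subtags.join("-");
      if (byLower.has(candidate)) return byLower.get(candidate);
      subtags.pop();
      // Never leave a dangling singleton such as "x" or "u" at the end.
      if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
        subtags.pop();
      }
    }
  }
  return defaultTag;
}
```

Lookup only ever falls back to less specific tags: a request for 'de-CH-1996' can be satisfied by a 'de' dictionary, but a request for 'de' will not pick 'de-DE'. Whether that is the right behavior for spellchecking is a design decision, not something the scheme settles.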

Font negotiation

(this needs work)

Bugs

  • bug 556237 – Implement font and encoding negotiation based on BCP 47
    • bug 192636 – Map *-Latn languages to Western script (ISO 15924 script codes)

Files

  • browser/components/preferences/fonts.xul
  • browser/locales/en-US/chrome/browser/preferences/fonts.dtd
  • gfx/thebes/public/gfxPlatform.h
  • gfx/thebes/src/gfxAtomList.h
  • gfx/thebes/src/gfxFontconfigUtils.cpp
  • gfx/thebes/src/gfxPlatform.cpp
  • gfx/thebes/src/gfxWindowsFonts.cpp
  • gfx/thebes/src/nsUnicodeRange.cpp
  • gfx/thebes/src/nsUnicodeRange.h
  • intl/locale/src/langGroups.properties
  • intl/locale/src/language.properties
  • modules/libpref/src/init/all.js
  •  ???

Current state

  • Private-use language tags such as 'x-western' are used to identify font categories.

Desired state

  • Better font negotiation, keeping in mind the forward-compatible drive towards UTF-8.
    • Would probably involve the 'Script Subtag' and possibly-associated 'Suppress-Script' value.
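To illustrate how the 'Script Subtag' and 'Suppress-Script' value could feed font selection: use an explicit script subtag if the tag has one, otherwise fall back to the language's Suppress-Script value from the registry, and finally map the script to a font category. Everything here is hypothetical: the small registry excerpt, the `SCRIPT_TO_FONT_GROUP` table (whose group names are modeled on the existing 'x-western' style but not checked against langGroups.properties), the 'x-unicode' fallback, and the function name.

```javascript
// Hypothetical registry excerpt: language subtag -> Suppress-Script value.
const SUPPRESS_SCRIPT = new Map([
  ["en", "Latn"], ["fr", "Latn"], ["ru", "Cyrl"], ["el", "Grek"], ["he", "Hebr"],
]);

// Hypothetical mapping from ISO 15924 script codes to font categories.
const SCRIPT_TO_FONT_GROUP = new Map([
  ["Latn", "x-western"], ["Cyrl", "x-cyrillic"], ["Grek", "el"], ["Hebr", "he"],
  ["Hans", "zh-CN"], ["Hant", "zh-TW"],
]);

function fontGroupFor(tag, fallback = "x-unicode") {
  const subtags = tag.split("-");
  const language = subtags[0].toLowerCase();
  // Look for an explicit 4-letter script subtag before any extension/private use.
  let explicit;
  for (const s of subtags.slice(1)) {
    if (s.length === 1) break; // a singleton starts extensions or private use
    if (/^[a-z]{4}$/i.test(s)) {
      explicit = s;
      break;
    }
  }
  const script = explicit
    ? explicit[0].toUpperCase() + explicit.slice(1).toLowerCase()
    : SUPPRESS_SCRIPT.get(language);
  return (script && SCRIPT_TO_FONT_GROUP.get(script)) ?? fallback;
}
```

Under these assumptions 'sr-Latn-RS' and 'en-US' both land in the Western category (the *-Latn case from bug 192636), while a bare 'zh', which has no single implied script, falls through to the generic category.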

Hyphenation

(this needs work)

Bugs

Files

  •  ???

Current state

  • Automatic hyphenation support is provisionally implemented in Firefox 6 as '-moz-hyphens'.
    • It is limited to 'en-US' and is based on a dictionary from OpenOffice.org. (See bug 253317.)
    • It relies on the CSS3 Text specification, which is still in flux.
  •  ???

Desired state

Accessibility

(this needs work)

Bugs

  • bug 481389 – Make sure only valid language attribute values are exposed

Files

  •  ???

Current state

  • @lang values are exposed as-is, even if invalid.
  •  ???

Desired state

  • Validate @lang values before exposing them.
  •  ???
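One possible first step towards validating @lang is a purely syntactic well-formedness check against the BCP 47 grammar (RFC 5646 section 2.1), done before any lookup in the registry. The sketch below accepts langtags and private-use-only tags but rejects grandfathered tags such as 'i-klingon', a known limitation. It also says nothing about validity: 'english' is well-formed (the grammar allows 5-8 letter primary subtags) but is not a registered subtag. An empty value (lang="") is meaningful in HTML (it marks the language as unknown) and would presumably need separate handling. `isWellFormedLangTag` is a hypothetical name.

```javascript
// Hypothetical sketch: syntax-only BCP 47 check (grandfathered tags not handled).
const LANGTAG_RE = new RegExp(
  "^(?:" +
    "(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})" + // language (+ extlangs)
    "(?:-[a-z]{4})?" +                                      // script
    "(?:-(?:[a-z]{2}|[0-9]{3}))?" +                          // region
    "(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*" +            // variants
    "(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*" +                // extensions
    "(?:-x(?:-[a-z0-9]{1,8})+)?" +                          // private use suffix
  "|x(?:-[a-z0-9]{1,8})+" +                                  // private-use-only tag
  ")$",
  "i"
);

function isWellFormedLangTag(value) {
  return LANGTAG_RE.test(value);
}
```

A full validity check would additionally confirm each subtag against the registry and reject duplicate variants and duplicate extension singletons, which the grammar alone does not catch.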

Meeting Notes

References

Resources