Update:Remora Localization

From MozillaWiki

Localizing 101

Note: Expressions are converted to tags according to the standards described below (error_blah and such), so that multiple occurrences of the same expression only have to be localized once.

Note 2: I assume you have the GNU gettext tools installed, otherwise the scripts won't work and, more importantly, the wrath of the merciless babelfish[TM] will come upon you.

L10n standards

  • If the string is a "widget" as defined in the shavictionary (labels, common navigational elements like "Home", "Top" or "Next")
    • element_name_additional
  • If the string is prose for explanations, error messages, instructions
    • type_name_additional
  • If the string is structural text like headers, titles, breadcrumbs, etc.
    • If the string is not in a form:
      • namespace_pagename_name_additional
    • If the string is inside of a form:
      • namespace_pagename_formname_element_name_additional

Where:

  • namespace is the location of the view in Cake, so if it's under /views/developers/ the namespace is "developers"
  • pagename is the name of the file, with underscores taken out
  • formname is whatever you think the form should be called, with "form" appended, since Cake is actually naming them (should we make this more specific?)
  • name is a unique name for the element (preferably its id)
  • element is the closest tag or attribute, so for an image's alt attribute it would be "alt"; for a label it would be "label"
  • additional is anything else needed to make a string unique
  • type is a global category, like "error"
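The naming rules above are mechanical enough to sketch in Python. The helpers `page_name` and `make_tag` are hypothetical (nothing on this page says Remora provides them), and the file path `views/developers/addon_status.thtml` and the name `header` are made-up inputs.

```python
import os


def page_name(filename):
    """pagename rule: the file name without its extension, underscores taken out."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.replace("_", "")


def make_tag(*parts):
    """Join the non-empty parts of a tag with underscores (hypothetical helper)."""
    return "_".join(part for part in parts if part)


# Structural text outside a form: namespace_pagename_name_additional
print(make_tag("developers", page_name("views/developers/addon_status.thtml"), "header"))
# prints: developers_addonstatus_header

# Prose such as an error message: type_name_additional
print(make_tag("error", "empty", "glass"))
# prints: error_empty_glass
```

Empty parts (for example an unused "additional") are skipped, so no double underscores appear.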


Static Strings (PHP and gettext)

Use php's gettext functions to make localized strings, for example:

echo _('error_empty_glass');

To do string replacement, use sprintf like this:

echo sprintf(_('refill_something'), $glass, $beer);

Localizers can then translate it as something like:

The waiter pours some more %2$s into your %1$s.

and PHP will put in the value of $glass for %1$s and $beer for %2$s.

Note that we use numbered parameters (%1$s) rather than plain %s, which allows localizers to reorder the parameters (or drop some altogether).
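To see why numbered parameters matter, here is a toy Python imitation of only the `%N$s` part of sprintf. `positional_format` is a hypothetical helper, not how PHP implements sprintf, and the English template is invented for the example since the page only shows the localized one.

```python
import re

# Matches argument-numbered conversions such as %1$s and %2$s.
# Only this %N$s subset is handled; other sprintf conversions are left as-is.
POSITIONAL = re.compile(r"%(\d+)\$s")


def positional_format(template, *args):
    """Replace each %N$s with str(args[N - 1]); arguments never referenced are dropped."""
    return POSITIONAL.sub(lambda m: str(args[int(m.group(1)) - 1]), template)


english = "You refill your %1$s with %2$s."                   # invented English text
localized = "The waiter pours some more %2$s into your %1$s."  # translation from above

print(positional_format(english, "glass", "beer"))
# prints: You refill your glass with beer.
print(positional_format(localized, "glass", "beer"))
# prints: The waiter pours some more beer into your glass.
```

The calling code passes the arguments in one fixed order; only the template decides where each one lands.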

Static pluralization

Gettext also supports pluralization (for example, adding an 's' to the end of a word when the count is not 1). To support that, we need to add a Plural-Forms header to the .po file, like so:

"Plural-Forms: nplurals=2; plural=n != 1;\n"

This Plural-Forms header is appropriate for English; headers for other languages are given here.

Once you have the Plural-Forms header, you need to change the translation in the .po file. For example, this:

msgid "addons_display_a_previous_releases"
msgstr "View %d previous releases"

becomes:

msgid "addons_display_a_previous_releases"
msgid_plural "addons_display_a_previous_releases"
msgstr[0] "View %d previous release"
msgstr[1] "View %d previous releases"

Note that our singular and plural msgids are the same; that's because we're using placeholder tags rather than English text as msgids.

The final change is in the code: instead of calling _(), you'll need to use ngettext():

echo sprintf(ngettext('addons_display_x_comments_total', 'addons_display_x_comments_total', $total_comments), $total_comments);
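The runtime side of this can be sketched with a hypothetical in-memory catalog: gettext evaluates the Plural-Forms expression (`n != 1` for English) to pick an index into msgstr[], and falls back to msgid or msgid_plural when there is no translation. `CATALOG`, `plural_index` and this `ngettext` are illustrative stand-ins, not PHP's implementation.

```python
# Hypothetical catalog mirroring the .po entry above: msgid -> [msgstr[0], msgstr[1]]
CATALOG = {
    "addons_display_a_previous_releases": [
        "View %d previous release",
        "View %d previous releases",
    ],
}


def plural_index(n):
    """English Plural-Forms rule plural=n != 1: 0 selects msgstr[0], 1 selects msgstr[1]."""
    return int(n != 1)


def ngettext(msgid, msgid_plural, n):
    forms = CATALOG.get(msgid)
    if forms is None:
        # No translation: like gettext, fall back to one of the msgids.
        return msgid if n == 1 else msgid_plural
    return forms[plural_index(n)]


for count in (0, 1, 5):
    # Equivalent of sprintf(ngettext(...), $count) in the PHP code
    print(ngettext("addons_display_a_previous_releases",
                   "addons_display_a_previous_releases", count) % count)
# prints: View 0 previous releases / View 1 previous release / View 5 previous releases
```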

The full gettext manual is available and intimidating, but it will probably answer any other questions.

Updating gettext files in the remora tree

These steps are most commonly executed in order.

extracting

After l10n strings have been added to the PHP code files etc., they have to be extracted into gettext's .po files. There's a shell script that goes through the application source tree (.php and .thtml files) and searches for gettext strings. The extracted strings are written to ./messages.po.

Execute from the app dir:
./locale/extract-po.sh
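The extraction script itself isn't reproduced on this page. As a rough sketch of the idea only, the Python below scans PHP/template source for `_('tag')` and the first argument of `ngettext('tag', ...)` and emits empty msgid/msgstr pairs. The regex and the helper names are assumptions, and the sketch ignores msgid_plural, escaped quotes and comments, which real extraction tools handle.

```python
import re

# _('tag') / _("tag") and the first argument of ngettext('tag', ...).
# Assumption: tags are plain literals without escaped quotes.
CALL_RE = re.compile(r"""\b(?:_|ngettext)\(\s*(['"])([^'"]+)\1""")


def extract_tags(source):
    """Return the tags used in source, in first-seen order, without duplicates."""
    tags = []
    for match in CALL_RE.finditer(source):
        if match.group(2) not in tags:
            tags.append(match.group(2))
    return tags


def to_po(tags):
    """Render each tag as an untranslated entry, one blank line apart."""
    return "\n".join(f'msgid "{tag}"\nmsgstr ""\n' for tag in tags)


sample = """
<?php echo _('error_empty_glass'); ?>
<?php echo sprintf(ngettext('addons_display_x_comments_total',
                            'addons_display_x_comments_total', $n), $n); ?>
<?php echo _("error_empty_glass"); ?>
"""
print(to_po(extract_tags(sample)))
```

Because the same tag is emitted only once, a string used on several pages still needs only one translation.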

merging

To bring the respective .po files of the individual locales up to date, execute from the app directory:

./locale/merge-po.sh messages.po ./locale
Here messages.po is the file created by the extraction step, and ./locale is the directory containing all the locales. The merge script then merges the new strings from messages.po into every *.po file underneath ./locale.

Note that translations already made will not be overwritten. New tags will be inserted and strings that aren't used anymore will be deprecated (i.e. commented out and put at the end of the file).
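The merge rules just described can be sketched as plain Python over dictionaries. This only illustrates the behaviour (keep existing translations, add new tags with an empty msgstr, move unused entries to the end as `#~` comments, gettext's marker for obsolete entries). `merge` and `render` are hypothetical helpers and the data is made up; real merge tools such as msgmerge also perform fuzzy matching, which is left out here.

```python
def merge(template_ids, existing):
    """Merge a fresh template into one locale's translations.

    template_ids: msgids extracted into messages.po, in order.
    existing: {msgid: msgstr} currently in the locale's .po file.
    Returns (active, obsolete).
    """
    # Existing translations are kept; new msgids start with an empty msgstr.
    active = {msgid: existing.get(msgid, "") for msgid in template_ids}
    # Anything no longer extracted from the code becomes obsolete.
    obsolete = {msgid: msgstr for msgid, msgstr in existing.items() if msgid not in active}
    return active, obsolete


def render(active, obsolete):
    """Write active entries first, then obsolete ones commented out with '#~'."""
    lines = []
    for msgid, msgstr in active.items():
        lines += [f'msgid "{msgid}"', f'msgstr "{msgstr}"', ""]
    for msgid, msgstr in obsolete.items():
        lines += [f'#~ msgid "{msgid}"', f'#~ msgstr "{msgstr}"', ""]
    return "\n".join(lines)


active, obsolete = merge(
    ["error_empty_glass", "refill_something"],                             # new template
    {"error_empty_glass": "Your glass is empty.", "old_tag": "Old text"},  # one locale
)
print(render(active, obsolete))
```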

"compiling"

After translation, the plain text .po files (PO = portable object) have to be compiled into binary .mo files (MO = machine object). There's a third script you can run for that:

./locale/compile-mo.sh ./locale

This creates a .mo file in the same directory as each .po file.
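To illustrate what the compile step produces, here is a minimal Python writer for the GNU .mo binary layout (little-endian, no hash table), checked by loading the result with the standard library's `gettext.GNUTranslations`. It is a sketch, not a replacement for the compile script; `write_mo` is a hypothetical name and the translated text is made up.

```python
import gettext
import io
import struct

MO_MAGIC = 0x950412DE  # GNU .mo magic number, written little-endian


def write_mo(catalog):
    """Serialize {msgid: msgstr} into .mo bytes.

    Plural entries follow the .mo convention: singular and plural msgid are
    joined by a NUL character, and so are the msgstr forms. The "" key
    carries the header (charset, Plural-Forms).
    """
    keys = sorted(catalog, key=lambda k: k.encode("utf-8"))  # sorted by msgid
    ids = [k.encode("utf-8") for k in keys]
    strs = [catalog[k].encode("utf-8") for k in keys]
    count = len(keys)
    orig_table = 7 * 4                  # header: seven 32-bit fields
    trans_table = orig_table + 8 * count
    data_start = trans_table + 8 * count

    data = b""
    entries = []
    for s in ids + strs:                # all msgids first, then all msgstrs
        entries.append(struct.pack("<2I", len(s), data_start + len(data)))
        data += s + b"\0"               # strings are NUL-terminated

    # magic, revision 0, count, table offsets, hash size 0, hash offset
    header = struct.pack("<7I", MO_MAGIC, 0, count, orig_table, trans_table,
                         0, data_start)
    return header + b"".join(entries) + data


catalog = {
    "": ("Content-Type: text/plain; charset=UTF-8\n"
         "Plural-Forms: nplurals=2; plural=n != 1;\n"),
    "error_empty_glass": "Your glass is empty.",
    "addons_display_a_previous_releases\0addons_display_a_previous_releases":
        "View %d previous release\0View %d previous releases",
}
translations = gettext.GNUTranslations(io.BytesIO(write_mo(catalog)))
print(translations.gettext("error_empty_glass"))
print(translations.ngettext("addons_display_a_previous_releases",
                            "addons_display_a_previous_releases", 3) % 3)
```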


Dynamic Strings

We need strings from the database to be localizable as well. This includes all English content in the Remora code (categories, etc.) as well as the add-ons themselves (title, description, etc.).

Our original method was related closely to Pear::Translation2, but Lars showed us the error of our ways with the great Lars Digression of 2006. He came up with a new method that allowed referential integrity and a more stable table structure, which is detailed below:

Make the translations table a simple three-column table: a non-unique id, a locale and a string. Together, the non-unique id and the locale make up the primary key. For each column in another table that needs a translation, replace that column with a partial key to the translations table's id column. Whenever you select a row needing translations from a table, you simply add a join to the translations table using the partial key and the locale.

+------------------+------------------+------+-----+---------+-------+
| Field            | Type             | Null | Key | Default | Extra | 
+------------------+------------------+------+-----+---------+-------+
| id               | int(11) unsigned |      | PRI | 0       |       |       
| locale           | varchar(10)      |      | PRI |         |       |       
| localized_string | text             |  yes |     | NULL    |       |       
+------------------+------------------+------+-----+---------+-------+
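A minimal sketch of this layout, using Python's built-in sqlite3 module instead of MySQL (so the column types are approximations, and SQLite does not enforce the varchar length), shows the composite key at work: the same id may appear once per locale, but a repeated (id, locale) pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE translations (
        id               INTEGER NOT NULL,      -- shared by all locales of a string
        locale           VARCHAR(10) NOT NULL,
        localized_string TEXT,                  -- NULL allowed
        PRIMARY KEY (id, locale)                -- unique together, not separately
    )
""")
conn.execute("INSERT INTO translations VALUES (1, 'en-us', 'hello')")
conn.execute("INSERT INTO translations VALUES (1, 'de', 'Guten Tag')")  # new locale: OK

try:
    conn.execute("INSERT INTO translations VALUES (1, 'de', 'Hallo')")  # same pair again
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Adding a locale never changes the schema; it is just more rows under existing ids.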

pros

  • referential integrity
  • extensible - new or changed locale just means additional rows

cons

  • more complicated SQL: enough to affect performance? It depends on how many translations are needed in one query. I'll start to get worried at six in a single query.

example - fetching values

translation table

+----+--------+------------------+
| id | locale | localized_string |
+----+--------+------------------+
| 1  | en-us  | hello            |
| 1  | de     | Guten Tag        |
| 2  | en-us  | help             |
| 2  | de     | Hilfe            |
+----+--------+------------------+

A table

+----+----------+----------+
| id | greeting | danger   |
+----+----------+----------+
| 1  | 1        | 2        |
+----+----------+----------+

sql for english

select a.id, g.localized_string as greeting, d.localized_string as danger
from ((a left outer join translations as g on a.greeting = g.id and g.locale = 'en-us')
         left outer join translations as d on a.danger = d.id and d.locale = 'en-us')
where
  a.id = myTargetForFetching;

result:
+----+----------+----------+
| id | greeting | danger   |
+----+----------+----------+
| 1  | hello    | help     |
+----+----------+----------+
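The same query can be tried against SQLite with Python's sqlite3 module (an assumption for the sketch; Remora itself uses MySQL). The locale is passed as a parameter, each translated column gets its own aliased LEFT OUTER JOIN on its own key column, and a locale with no rows comes back as NULL (None in Python). `fetch_a` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE translations (
        id INTEGER NOT NULL, locale VARCHAR(10) NOT NULL,
        localized_string TEXT, PRIMARY KEY (id, locale));
    CREATE TABLE a (id INTEGER PRIMARY KEY, greeting INTEGER, danger INTEGER);
    INSERT INTO translations VALUES
        (1, 'en-us', 'hello'), (1, 'de', 'Guten Tag'),
        (2, 'en-us', 'help'),  (2, 'de', 'Hilfe');
    INSERT INTO a VALUES (1, 1, 2);
""")

# One LEFT JOIN per translated column; the locale is part of each join condition.
FETCH_A = """
    SELECT a.id, g.localized_string AS greeting, d.localized_string AS danger
    FROM a
    LEFT OUTER JOIN translations AS g ON a.greeting = g.id AND g.locale = ?
    LEFT OUTER JOIN translations AS d ON a.danger = d.id AND d.locale = ?
    WHERE a.id = ?
"""


def fetch_a(conn, row_id, locale):
    return conn.execute(FETCH_A, (locale, locale, row_id)).fetchone()


print(fetch_a(conn, 1, "en-us"))  # (1, 'hello', 'help')
print(fetch_a(conn, 1, "de"))     # (1, 'Guten Tag', 'Hilfe')
print(fetch_a(conn, 1, "fr"))     # (1, None, None): no fallback to English
```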

Now, there are a couple of problems with this simplified technique. First, if you have a query that requires ten localized strings, you need to add ten joins to the query, and nesting joins that deep makes the SQL nearly unreadable. Second, if a localized string is missing, there is no simple (non-hacky) way to fetch a default value without resorting to another query.

example - inserting new values

Let's say we need to add a new row to the A table above. We must insert the localized strings into the translations table before inserting the new row in the A table. And, since we cannot rely on the database to automatically generate a new primary key, we'll need to get a new key from a sequence first.

To get a guaranteed unique new key:

UPDATE translations_seq SET id=LAST_INSERT_ID(id+1);
SELECT LAST_INSERT_ID();
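MySQL's `LAST_INSERT_ID(expr)` remembers the value per connection, which is why the two statements above hand each client its own key. SQLite has no equivalent function, so the sketch below (Python's sqlite3 module, with a hypothetical `next_translation_id` helper) gets the same effect by taking the write lock with `BEGIN IMMEDIATE` before incrementing and reading the single-row counter.

```python
import sqlite3

# isolation_level=None: sqlite3 won't open transactions implicitly; we do it below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE translations_seq (id INTEGER NOT NULL)")
conn.execute("INSERT INTO translations_seq VALUES (0)")  # single-row counter


def next_translation_id(conn):
    """Increment the counter and return the new value, like the MySQL idiom above."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        conn.execute("UPDATE translations_seq SET id = id + 1")
        (new_id,) = conn.execute("SELECT id FROM translations_seq").fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return new_id


print(next_translation_id(conn), next_translation_id(conn))  # 1 2
```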

With that new key, we can insert a new translation:

insert into translations (id, locale, localized_string) values (newKey, 'en-us', 'howdy');

Now assume we've done those steps twice (once for each localized string required by a new row in the A table), yielding newKey and secondNewKey:

insert into a (greeting, danger) values (newKey, secondNewKey);

To add a second language for this new row, we need both newKey and secondNewKey. Either they have been stored programmatically, or we refetch them by querying the A table.

insert into translations (id, locale, localized_string) values (newKey, 'de', 'Gruss Gott');
insert into translations (id, locale, localized_string) values (secondNewKey, 'de', 'der Himmel fällt');
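Putting the insertion steps together, here is a sketch against SQLite via Python's sqlite3 module. The helpers `next_key` and `insert_translation` are hypothetical, `next_key` is a simplified single-connection stand-in for the sequence statements above, and the English text for `danger` is made up. Wrapping everything in one transaction means a failure part-way through leaves no orphaned translations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE translations (
        id INTEGER NOT NULL, locale VARCHAR(10) NOT NULL,
        localized_string TEXT, PRIMARY KEY (id, locale));
    CREATE TABLE a (id INTEGER PRIMARY KEY, greeting INTEGER, danger INTEGER);
    CREATE TABLE translations_seq (id INTEGER NOT NULL);
    INSERT INTO translations_seq VALUES (0);
""")


def next_key(conn):
    """Simplified sequence: safe only because this sketch uses one connection."""
    conn.execute("UPDATE translations_seq SET id = id + 1")
    return conn.execute("SELECT id FROM translations_seq").fetchone()[0]


def insert_translation(conn, key, locale, text):
    conn.execute(
        "INSERT INTO translations (id, locale, localized_string) VALUES (?, ?, ?)",
        (key, locale, text))


with conn:  # commit on success, roll back everything on error
    # 1. Reserve one key per translated column and store the English strings.
    new_key = next_key(conn)
    second_new_key = next_key(conn)
    insert_translation(conn, new_key, "en-us", "howdy")
    insert_translation(conn, second_new_key, "en-us", "the sky is falling")
    # 2. Only now insert the row in A that refers to those keys.
    conn.execute("INSERT INTO a (greeting, danger) VALUES (?, ?)",
                 (new_key, second_new_key))
    # 3. A second language only needs the same two keys.
    insert_translation(conn, new_key, "de", "Gruss Gott")
    insert_translation(conn, second_new_key, "de", "der Himmel fällt")
```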

example - deleting rows

Unfortunately, the database's cascading deletes do not help us much in deleting translations when their parent row is deleted. Cascading deletes only remove rows that refer to a deleted primary key. If we were to delete a row from the A table, the corresponding rows in the translations table would not disappear, because they do not refer to A's primary key; it is A that refers to them. This behavior could be repaired at the expense of complicating the new-value insertion technique.

For now, we must manually delete the translations after deleting the target row in the A table. However, we must save the keys to the translations from the target row before we delete it. Once the target row is deleted, we can iterate through the saved list of keys and delete the translations.

Here's an example paraphrased from the Python migration script:

# 1. save the translation keys held by the add-on row
listOfTranslationsToDelete = newDB.executeSql("""
    select name from addons where id = targetAddonIDForDeletion
    union
    select homepage from addons where id = targetAddonIDForDeletion
    union
    select description from addons where id = targetAddonIDForDeletion
    union
    select summary from addons where id = targetAddonIDForDeletion
    union
    select developercomments from addons where id = targetAddonIDForDeletion
    union
    select eula from addons where id = targetAddonIDForDeletion
    union
    select privacypolicy from addons where id = targetAddonIDForDeletion""")
# 2. delete the add-on row itself
newDB.executeSql("delete from addons where id = targetAddonIDForDeletion")
# 3. delete the saved translations (all locales share each key)
newDB.executeManySql("delete from translations where id = %s", listOfTranslationsToDelete)
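A runnable variant of the deletion, again using SQLite through Python's sqlite3 module with made-up sample data. Instead of a UNION it reads all translated columns of the add-on in one row, skips NULL keys (e.g. an add-on without a EULA), and runs the three steps in a single transaction. `delete_addon` is a hypothetical helper, not part of the migration script.

```python
import sqlite3

# Columns of addons that hold keys into translations (from the script above).
TRANSLATED_COLUMNS = ("name", "homepage", "description", "summary",
                      "developercomments", "eula", "privacypolicy")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE translations (
        id INTEGER NOT NULL, locale VARCHAR(10) NOT NULL,
        localized_string TEXT, PRIMARY KEY (id, locale));
    CREATE TABLE addons (id INTEGER PRIMARY KEY, name INTEGER, homepage INTEGER,
        description INTEGER, summary INTEGER, developercomments INTEGER,
        eula INTEGER, privacypolicy INTEGER);
    -- Made-up sample data: add-on 7 has no EULA or privacy policy.
    INSERT INTO addons VALUES (7, 10, 11, 12, 13, 14, NULL, NULL);
    INSERT INTO translations VALUES
        (10, 'en-us', 'Sample Add-on'), (10, 'de', 'Beispiel-Add-on'),
        (11, 'en-us', 'http://example.com/'), (12, 'en-us', 'A description'),
        (13, 'en-us', 'A summary'), (14, 'en-us', 'Developer notes'),
        (20, 'en-us', 'Belongs to some other row');
""")


def delete_addon(conn, addon_id):
    columns = ", ".join(TRANSLATED_COLUMNS)  # fixed names, not user input
    with conn:  # one transaction: no orphaned translations if a step fails
        row = conn.execute(f"SELECT {columns} FROM addons WHERE id = ?",
                           (addon_id,)).fetchone()
        if row is None:
            return
        # 1. Save the keys before the row holding them is gone.
        keys = {key for key in row if key is not None}
        # 2. Delete the add-on row itself.
        conn.execute("DELETE FROM addons WHERE id = ?", (addon_id,))
        # 3. Delete every locale's string for each saved key.
        conn.executemany("DELETE FROM translations WHERE id = ?",
                         [(key,) for key in keys])


delete_addon(conn, 7)
print(conn.execute("SELECT id FROM translations").fetchall())  # [(20,)]
```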