Bugzilla:L10n:Maketext


The idea was introduced by mkanat (bug 407752). If implemented, it would greatly reduce localizers' workload.

Design considerations

There are several decisions to be made before one starts hacking templates and code:

  • Syntax for localizable texts in Perl and TT
  • Lexicon format
  • Directory structure

Perl syntax

xgettext.pl supports:

translate()
maketext()
gettext()
loc()
x()
_()
__()

In Bugzilla Perl code we will be using

loc()

Template Toolkit syntax

Formats supported by xgettext.pl are:

[% |l(args) %]string[% END %]
[% 'string' | l(args) %]
[% 'string' FILTER l(args) %]
[% l('string',args) %]
[% loc('string',args) %]

In fact, l and loc are synonyms, and so are | and FILTER.

In Bugzilla templates we will be using

[% |l %]
Large boilerplate text, no lines changed, filter added as separate lines.
[% END %]
[% l("Short text",arg1,arg2) %]

See also #Default template look and feel below
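To make the two chosen forms concrete, here is a minimal extractor sketch. It is written in Python purely for illustration and is not how xgettext.pl works; the regular expressions and the `extract_strings` name are assumptions, and only double-quoted first arguments and argument-less `[% |l %]` blocks are handled.

```python
import re

# Block form: [% |l %] ... [% END %]  (assumed to carry no arguments)
BLOCK_RE = re.compile(r"\[%\s*\|\s*l\s*%\](.*?)\[%\s*END\s*%\]", re.DOTALL)
# Call form: [% l("text", args...) %] with a double-quoted first argument
CALL_RE = re.compile(r'\[%\s*l\(\s*"((?:[^"\\]|\\.)*)"')

def extract_strings(template_text):
    """Return localizable strings: block-form matches first, then call-form."""
    found = [m.group(1).strip() for m in BLOCK_RE.finditer(template_text)]
    found += [m.group(1) for m in CALL_RE.finditer(template_text)]
    return found

sample = '''[% |l %]
Large boilerplate text.
[% END %]
[% l("Short text",arg1,arg2) %]'''

print(extract_strings(sample))
```

Restricting templates to two forms keeps a check like this small; each additional synonym would need its own pattern.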

Lexicon format

Available formats also vary:

  • simple .pm modules with explicit %Lexicon entries
  • classic .po files, more convenient for the localizer community because of mature tools
  • database backend is in progress

IMHO we should stick to .po, see #Terms retirement below.

The directory structure

For .pm files there is no good place besides Bugzilla/L10N/LANG.pm

Bugzilla/L10N/LANG.po 
Most obvious, used by many projects.
template/LANG/{default,custom}/*.po 
Advantages: site and project level customizations possible. Drawbacks: need to iterate through multiple .po files to merge the lexicon; unneeded exposure to http server.
po/LANG.po or locale/LANG.po
Why not create a separate dir?

Another question is: how many .po files do we need for a single instance? Is a single file enough, or do we split the web, command-line and email domains?

Features

Terms retirement

In the object-oriented spirit of Maketext, the application should not care about how to say something, just what to say. So the default English templates may safely refer to 'bugs' and leave the rest to the l10n logic. Example:

template/en/default/bug/create/create.html.tmpl:

[%# no need to PROCESS "global/variables.none.tmpl" %]
[%# no need to PROCESS "global/field-descs.none.tmpl" %]
...
[% |l %]Bugzilla – Enter Bug[% END %]

customized en.po:

msgid "Bugzilla – Enter Bug"
msgstr "Bugzilla@mysite – Submit a Service Request"

To ease such customizations (and to provide backwards compatibility) en.po would be preprocessed from en.po.tmpl:

[% PROCESS "global/variables.none.tmpl" %]
...
msgid "Bugzilla – Enter Bug"
msgstr "$terms.Bugzilla – Enter $terms.Bug"

This way global/variables.none.tmpl would be used only while checksetup.pl runs, to compile the .po files.
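The preprocessing step could look roughly like the following sketch. It is a Python illustration, not Bugzilla code: the `TERMS` table and `expand_terms` function are hypothetical stand-ins for what Template Toolkit would do with global/variables.none.tmpl, and multi-line msgstr continuation lines are not handled.

```python
import re

# Hypothetical terms table; Bugzilla keeps the real one in
# global/variables.none.tmpl.
TERMS = {"Bugzilla": "Bugzilla@mysite", "Bug": "Service Request"}

TERM_RE = re.compile(r"\$terms\.(\w+)")

def expand_terms(po_template, terms):
    """Expand $terms.NAME on msgstr lines; msgid lines are left untouched."""
    out = []
    for line in po_template.splitlines():
        if line.startswith("msgstr"):
            # Unknown term names are kept as-is rather than dropped.
            line = TERM_RE.sub(lambda m: terms.get(m.group(1), m.group(0)), line)
        out.append(line)
    return "\n".join(out)

PO_TMPL = 'msgid "Bugzilla – Enter Bug"\nmsgstr "$terms.Bugzilla – Enter $terms.Bug"'
print(expand_terms(PO_TMPL, TERMS))
```

Leaving msgid lines alone matters: the msgid must keep matching the literal text in the English template.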

Backwards compatibility

Maketext support by itself does nothing to change templates. All existing templates, custom or default, English or localized, would still work. Template search path logic is not changed either: translated template/LANG/default would take precedence over maketext-converted template/en/default and LANG.po file.

Language negotiation

Best language is not so easy to glark anymore: suppose we have

  1. template/en/custom -- custom set, refit with maketext
  2. template/ru/custom -- old translated set
  3. en.po -- English messages for template/en/custom
  4. fr.po -- French messages
  5. ja.po -- Japanese messages

and then we get a request whose language preference is ja, ru, fr. With the current logic, which looks only at template directories, Bugzilla would serve Russian instead of Japanese, because ja exists only as a .po file.
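The mismatch can be reproduced with a small sketch. Both functions are illustrative Python, not Bugzilla's actual negotiation code; `pick_with_lexicons` is one assumed fix in which a .po lexicon also makes a language servable.

```python
def pick_by_templates(preferred, template_langs, default="en"):
    """Template-only logic: a language counts only if it has a template dir."""
    for lang in preferred:
        if lang in template_langs:
            return lang
    return default

def pick_with_lexicons(preferred, template_langs, po_langs, default="en"):
    """Assumed alternative: a .po lexicon also makes a language servable."""
    available = set(template_langs) | set(po_langs)
    for lang in preferred:
        if lang in available:
            return lang
    return default

templates = ["en", "ru"]        # template/en/custom, template/ru/custom
lexicons = ["en", "fr", "ja"]   # en.po, fr.po, ja.po
request = ["ja", "ru", "fr"]    # user's preference order

print(pick_by_templates(request, templates))             # ru
print(pick_with_lexicons(request, templates, lexicons))  # ja
```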

Default templates refit

After the English lexicon shakedown, localizers may start the bulk of the translation work. However, nearly all required texts already exist somewhere in templates. How do we automate this?

A message catalog with its msgid/msgstr pairs is very similar to a unified diff between the template/en and template/LANG trees. One can merge both, based on line numbers of the English templates, and copy the other (LANG) side of the diff into msgstr.

Is it practical?
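One rough way to probe that question is a sketch like the one below. It is illustrative Python under a strong assumption: the English and translated templates have exactly the same line layout, so lines can be paired by position. The `seed_catalog` name and the sample lines are made up.

```python
def seed_catalog(en_lines, lang_lines):
    """Pair lines by position and keep those that differ as msgid -> msgstr."""
    if len(en_lines) != len(lang_lines):
        raise ValueError("templates differ in length; line pairing is unsafe")
    catalog = {}
    for en, translated in zip(en_lines, lang_lines):
        en, translated = en.strip(), translated.strip()
        # Identical lines (directives, markup) carry no translation.
        if en and en != translated:
            catalog[en] = translated
    return catalog

en = ["<h1>Enter Bug</h1>", "[% PROCESS header %]", "<p>Product</p>"]
fr = ["<h1>Saisir un bogue</h1>", "[% PROCESS header %]", "<p>Produit</p>"]

for msgid, msgstr in seed_catalog(en, fr).items():
    print(f'msgid "{msgid}"\nmsgstr "{msgstr}"\n')
```

The length check is where the idea is most fragile: translated templates that have drifted from the English line layout would need a real diff alignment rather than positional pairing.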

Drawbacks and concerns

Real-life examples can be seen in bug 412161 attachments.

Default template look and feel

Frequent [% l() %] calls may obfuscate templates. Example: before (without numeric inflection logic)

[% nbugs %] $terms.bugs were found in [% nproducts %] products.

and after

[% l("[_1] bug(s) were found in [_2] products.", nbugs, nproducts) %]

Another option to consider is Locale::Maketext::Fuzzy:

[% |l %]
[% nbugs %] bug(s) were found in [% nproducts %] product(s).
[% END %]

And then in en.po:

msgid "0 bug(s) were found in 0 product(s)."
msgstr "Zarro boogs found."
msgid "[_1] bug(s) were found in [_2] product(s)."
msgstr "%quant([_1],$terms.bug was,$terms.bugs were) found in %quant([_2],product)."

Note exact match for a special case.
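The "exact match first, pattern second" behaviour can be sketched as follows. This is illustrative Python, not the algorithm of Locale::Maketext::Fuzzy; `quant` is a simplified English-only stand-in, the lexicon entries are written as callables for brevity, and placeholders are assumed to appear in ascending order and to hold integers.

```python
import re

def quant(n, singular, plural=None):
    """Simplified English-only stand-in for Maketext's quant."""
    if plural is None:
        plural = singular + "s"
    return f"{n} {singular if n == 1 else plural}"

# Hypothetical lexicon mirroring the en.po entries above.
LEXICON = {
    "0 bug(s) were found in 0 product(s).": lambda: "Zarro boogs found.",
    "[_1] bug(s) were found in [_2] product(s).":
        lambda n, p: f"{quant(n, 'bug was', 'bugs were')} found in {quant(p, 'product')}.",
}

def key_to_regex(key):
    """Turn a key with [_N] placeholders into a regex capturing integers."""
    parts = re.split(r"\[_\d+\]", key)
    return re.compile(r"(\d+)".join(re.escape(p) for p in parts) + r"\Z")

def translate(rendered):
    """Look up already-rendered text: exact keys win over pattern keys."""
    entry = LEXICON.get(rendered)
    if entry is not None and "[_" not in rendered:
        return entry()
    for key, entry in LEXICON.items():
        if "[_" not in key:
            continue
        match = key_to_regex(key).match(rendered)
        if match:
            return entry(*(int(g) for g in match.groups()))
    return rendered  # fall back to the text itself, like an auto entry

print(translate("0 bug(s) were found in 0 product(s)."))
print(translate("3 bug(s) were found in 1 product(s)."))
```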

Template security

The real situation is even worse, because most variable pieces are unsafe and require FILTER html.

[% nbugs FILTER html %] $terms.bugs were found in [% nproducts FILTER html %] products.

and after

[% bugs = BLOCK; nbugs FILTER html; END;
   products = BLOCK; nproducts FILTER html; END;
   l("[_1] bug(s) were found in [_2] products.", bugs, products) %]

Perhaps we should add function helpers to more filters.
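A helper of that kind might escape every argument before substitution, so templates would not need the intermediate BLOCK assignments. The sketch below is Python for illustration only; `l_html` is a hypothetical name, not an existing Bugzilla or Template Toolkit function.

```python
import html
import re

def l_html(msg, *args):
    """Substitute [_N] placeholders, HTML-escaping each argument first."""
    def repl(match):
        index = int(match.group(1)) - 1  # [_1] refers to the first argument
        return html.escape(str(args[index]))
    return re.sub(r"\[_(\d+)\]", repl, msg)

print(l_html("[_1] bug(s) were found in [_2] products.", 3, "<b>all</b>"))
```

Escaping inside the helper means the message text from the lexicon is trusted while the arguments are not, which is itself a design decision worth stating in the localization docs.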

Large keys

Long texts as %Lexicon keys are not so handy, with their multiple explicit newlines, leading spaces, etc. Other projects using Maketext and Template Toolkit also confront this:

Act! ended up with keys cut at first newline:

msgid "This is your personal page."
msgstr ""
"This is your personal page.\n"
"From here you can manage everything regarding your participation\n"

Request Tracker does use long keys, up to 400 characters on a single line. This is not a problem with dedicated PO editing tools, but their template files look the same way, i.e. paragraphs are not wrapped at all.
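The first approach, cutting the key at the first newline, can be sketched in a few lines. This is illustrative Python and not taken from Act's source; `short_key` is a hypothetical name.

```python
def short_key(text):
    """Use the first non-empty line of a long text as its lexicon key."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""

long_text = """
This is your personal page.
From here you can manage everything regarding your participation
"""

print(short_key(long_text))
```

The trade-off is that two long texts sharing the same first line would collide on one key, so such a scheme needs either unique opening lines or a check for duplicates.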

Site maintainers' skills

Before Locale::Maketext, custom template maintainers needed only one skill: Template Toolkit. Now they'll also need to work with message catalogs.

Work Breakdown

  1. Generic Locale::Maketext capability (bug 469732)
    1. Specify and implement (English) %quant() and %numerate() calls
    2. maketext() support for Templates
    3. maketext() support for command line scripts
    4. Bugzilla::L10N setup and lexicon loader
  2. Build tools
    1. xgettext.pl: collect localizable strings from templates and code.
  3. Default template refit
    1. Script to hunt for localizable texts in templates (inspired by mkanat's attachment to bug 407752)
    2. Convert all templates to l() filter calls (bug 412161)
    3. Convert all Perl code to loc() calls (bug 469734)
    4. Extract default (English) lexicon
    5. Translate English lexicon: expand tokens to full error messages, replace bug(s) with %quant([_1],bug) calls, and bug with $terms.bug.
    6. Test and debug English lexicon
    7. Clone and translate English lexicon to other languages
  4. checksetup.pl refit
    1. Preprocess .po files from .po.tmpl files to accommodate $terms changes (bug 469734)
  5. Documenting
  6. Test suite
    1. Detect localizable strings missing in lexicon (would fall back as auto entries anyway)
    2. Decommission 009bugwords.t

What have I missed?

References

  1. Web localization in Perl
  2. Localization and Perl: gettext breaks, Maketext fixes a.k.a. TPJ13
  3. Template Toolkit, look-and-feel and internationalization
  4. i18n with Template Toolkit
  5. Cpanel localization roadmap
  6. How to support a new language in Act