Bugzilla:L10n:Maketext

From MozillaWiki

Revision as of 04:54, 30 November 2008

The idea was introduced by mkanat (bug 407752). If implemented, it will greatly reduce localizers' workload.

Design considerations

There are several decisions to be made before one starts hacking templates and code:

  • Syntax for localizable texts in Perl and TT
  • Lexicon format
  • Directory structure

Perl syntax

xgettext.pl supports:

translate()
maketext()
gettext()
loc()
x()
_()
__()

In Bugzilla Perl code we will be using

_()

Template Toolkit syntax

Formats supported by xgettext.pl are:

[% |l(args) %]string[% END %]
[% 'string' | l(args) %]
[% 'string' FILTER l(args) %]
[% l('string',args) %]
[% loc('string',args) %]

In fact, l and loc are synonyms, as are | and FILTER.

In Bugzilla templates we will be using

[% |l %]
Large boilerplate text, no lines changed, filter added as separate lines.
[% END %]
[% l("Short text",arg1,arg2) %]

See also "Default template look and feel" below.

Lexicon format

Available formats also vary:

  • simple .pm modules with explicit %Lexicon entries
  • classic .po files, more convenient for the localizer community because of mature tools
  • a database backend, which is in progress

IMHO we should stick to .po; see "Terms retirement" below.

The directory structure

For .pm files there is no good place besides Bugzilla/L10N/LANG.pm. For .po files there are several candidates:

  • Bugzilla/L10N/LANG.po: most obvious, used by many projects.
  • template/LANG/{default,custom}/*.po: site- and project-level customizations are possible. Drawbacks: the lexicon has to be merged from multiple .po files, and the files are needlessly exposed to the HTTP server.
  • po/LANG.po or locale/LANG.po: why not create a separate directory?

Another question is: how many .po files do we need for a single instance? Is a single file enough, or do we split the web, command line and email domains?
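To gauge what merging several .po files would involve, here is a minimal sketch. It is written in Python purely for illustration (Bugzilla itself is Perl), it parses only single-line msgid/msgstr pairs, and the names parse_po and merge_lexicons are hypothetical. It assumes later catalogs override earlier ones, so that a custom file wins over the default one.

```python
import re

# Matches single-line entries only: msgid "..." or msgstr "..."
_ENTRY = re.compile(r'^(msgid|msgstr)\s+"(.*)"\s*$')


def parse_po(text):
    """Parse single-line msgid/msgstr pairs from .po text into a dict."""
    lexicon = {}
    msgid = None
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if not match:
            continue  # comments, blank lines, unsupported syntax
        keyword, value = match.groups()
        if keyword == "msgid":
            msgid = value
        elif msgid is not None:
            lexicon[msgid] = value
            msgid = None
    return lexicon


def merge_lexicons(*po_texts):
    """Merge catalogs in order; later catalogs override earlier ones."""
    merged = {}
    for text in po_texts:
        merged.update(parse_po(text))
    return merged


# Made-up sample catalogs: a default one and a site customization.
default_po = 'msgid "Enter Bug"\nmsgstr "Enter Bug"\nmsgid "Search"\nmsgstr "Search"\n'
custom_po = 'msgid "Enter Bug"\nmsgstr "Submit a Service Request"\n'
lexicon = merge_lexicons(default_po, custom_po)
```

With a single file per language the merge step disappears, which is the main argument for the simpler layouts.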

Features

Terms retirement

In the object-oriented spirit of Maketext, the application should not care about how to say something, just what to say. So the default English templates may safely refer to 'bugs' and leave the rest to the l10n logic. Example:

template/en/default/bug/create/create.html.tmpl:

[%# no need to PROCESS "global/variables.none.tmpl" %]
[%# no need to PROCESS "global/field-descs.none.tmpl" %]
...
[% |l %]Bugzilla – Enter Bug[% END %]

customized en.po:

msgid "Bugzilla – Enter Bug"
msgstr "Bugzilla@mysite – Submit a Service Request"

To ease such customizations (and to provide backwards compatibility) en.po would be preprocessed from en.po.tmpl:

[% PROCESS "global/variables.none.tmpl" %]
...
msgid "Bugzilla – Enter Bug"
msgstr "$terms.Bugzilla – Enter $terms.Bug"

This way global/variables.none.tmpl would be used only during checksetup.pl, to compile the .po files.
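The preprocessing step could look roughly like the sketch below. It is Python for illustration only; the real step would presumably run the .po.tmpl file through Template Toolkit. The TERMS dictionary is a hypothetical stand-in for the values in global/variables.none.tmpl, and expand_terms is an invented name. Only msgstr lines are expanded, so msgids stay stable across sites.

```python
import re

# Hypothetical stand-in for the values defined in global/variables.none.tmpl.
TERMS = {"Bugzilla": "Bugzilla@mysite", "Bug": "Service Request", "bug": "service request"}

_TERM = re.compile(r"\$terms\.(\w+)")


def expand_terms(po_template, terms):
    """Expand $terms.NAME in msgstr lines only; msgid lines stay untouched."""
    out = []
    for line in po_template.splitlines():
        if line.startswith("msgstr"):
            # Unknown terms are left as-is rather than silently dropped.
            line = _TERM.sub(lambda m: terms.get(m.group(1), m.group(0)), line)
        out.append(line)
    return "\n".join(out)


po_tmpl = 'msgid "Bugzilla – Enter Bug"\nmsgstr "$terms.Bugzilla – Enter $terms.Bug"'
po = expand_terms(po_tmpl, TERMS)
```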

Backwards compatibility

Maketext support by itself does nothing to change templates. All existing templates, custom or default, English or localized, would still work. The template search path logic is not changed either: a translated template/LANG/default would take precedence over the maketext-converted template/en/default and the LANG.po file.

Language negotiation

The best language is not so easy to determine anymore: suppose we have

  1. template/en/custom -- custom set, refit with maketext
  2. template/ru/custom -- old translated set
  3. en.po -- English messages for template/en/custom
  4. fr.po -- French messages
  5. ja.po -- Japanese messages

and then we get a request with the language preference ja, ru, fr. With the current logic, Bugzilla would serve Russian instead of Japanese.
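The example above can be played through in a small sketch (Python for illustration; both function names are hypothetical). It assumes, as the example implies, that the current logic looks only at template directories, and shows one possible fix: treating a language as available if it has either a template set or a catalog.

```python
def negotiate_templates_only(preferred, template_langs, default="en"):
    """Assumed current behaviour: only template directories count."""
    for lang in preferred:
        if lang in template_langs:
            return lang
    return default


def negotiate(preferred, template_langs, catalog_langs, default="en"):
    """Return the first preferred language that has templates or a catalog."""
    available = set(template_langs) | set(catalog_langs)
    for lang in preferred:
        if lang in available:
            return lang
    return default


preferred = ["ja", "ru", "fr"]
template_langs = ["en", "ru"]       # template/en/custom, template/ru/custom
catalog_langs = ["en", "fr", "ja"]  # en.po, fr.po, ja.po
```

Here negotiate_templates_only picks ru, while negotiate picks ja. Whether an old translated template set should rank above or below a catalog for the same language is a separate decision that this sketch does not address.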

Default templates refit

After the English lexicon shakedown, localizers may start the bulk of the translation work. However, nearly all the required texts already exist somewhere in templates. How do we automate this?

A message catalog with its msgid/msgstr pairs is very similar to a unified diff between the template/en and template/LANG trees. One can merge both, based on the line numbers of the English templates, and copy the other (LANG) side of the diff into msgstr.

Is it practical?
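One way to probe this is a rough sketch with Python's difflib (for illustration only; pair_translations is a hypothetical name and the sample lines are made up). It pairs lines only where a changed block has the same length on both sides and leaves everything else for manual review.

```python
import difflib


def pair_translations(en_lines, lang_lines):
    """Pair changed English lines with their translated counterparts.

    Only 'replace' blocks of equal length are paired; insertions,
    deletions and uneven blocks would need manual review.
    """
    pairs = {}
    matcher = difflib.SequenceMatcher(a=en_lines, b=lang_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for en, tr in zip(en_lines[i1:i2], lang_lines[j1:j2]):
                pairs[en.strip()] = tr.strip()
    return pairs


# Made-up sample template lines.
en = ["<h1>", "Enter Bug", "</h1>", "<p>Search</p>"]
fr = ["<h1>", "Saisir un bogue", "</h1>", "<p>Rechercher</p>"]
pairs = pair_translations(en, fr)
```

Note that the pairs are whole lines, markup included, so they match extracted msgids only where the l() filter wraps exactly one line. That is probably the main practical limit of the approach.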

Drawbacks and concerns

Default template look and feel

Frequent [% l() %] calls may obfuscate templates. For example, before (without numeric inflection logic):

[% nbugs %] $terms.bugs were found in [% nproducts %] products.

and after

[% l("[_1] bug(s) were found in [_2] products.", nbugs, nproducts) %]

Another option to consider is Locale::Maketext::Fuzzy:

[% |l %]
[% nbugs %] bug(s) were found in [% nproducts %] product(s).
[% END %]

And then in en.po:

msgid "0 bug(s) were found in 0 product(s)."
msgstr "Zarro boogs found."
msgid "[_1] bug(s) were found in [_2] product(s)."
msgstr "%quant([_1],$terms.bug was,$terms.bugs were) found in %quant([_2],product)."

Note the exact match for the special case.
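The lookup order (exact entry first, then entries with placeholders) can be sketched as follows. This is Python for illustration and not the Locale::Maketext::Fuzzy implementation; fuzzy_lookup is a hypothetical name, the second msgstr is simplified, and %quant() handling is left out.

```python
import re

LEXICON = {
    "0 bug(s) were found in 0 product(s).": "Zarro boogs found.",
    "[_1] bug(s) were found in [_2] product(s).": "Bugs found: [_1], products: [_2].",
}

_PLACEHOLDER = re.compile(r"\[_(\d+)\]")


def _to_regex(msgid):
    """Turn a msgid with [_n] placeholders into a regex with named groups."""
    parts = []
    pos = 0
    for m in _PLACEHOLDER.finditer(msgid):
        parts.append(re.escape(msgid[pos:m.start()]))
        parts.append("(?P<a%s>.+?)" % m.group(1))
        pos = m.end()
    parts.append(re.escape(msgid[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def fuzzy_lookup(text, lexicon):
    """Exact entries win; otherwise try entries that contain placeholders."""
    if text in lexicon:
        return lexicon[text]
    for msgid, msgstr in lexicon.items():
        if not _PLACEHOLDER.search(msgid):
            continue
        m = _to_regex(msgid).match(text)
        if m:
            # Copy the captured arguments into the translated string.
            return _PLACEHOLDER.sub(lambda p: m.group("a" + p.group(1)), msgstr)
    return text  # no entry: fall back to the text itself, like an auto entry
```

The price of this approach is a regex match per candidate entry on every lookup, so it may be worth measuring before adopting it for all templates.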

Site maintainers' skills

Before Locale::Maketext, custom template maintainers needed only one skill: Template Toolkit. Now they will also need to work with message catalogs.

Work Breakdown

  1. Generic Locale::Maketext capability
    1. Specify and implement (English) %quant() and %numerate() calls
    2. maketext() support for Templates
    3. maketext() support for command line scripts
    4. Bugzilla::L10N setup and lexicon loader
  2. Build tools
    1. xgettext.pl: collect localizable strings from templates and code.
  3. Default template refit
    1. Script to hunt for localizable texts in templates (inspired by mkanat's attachment to bug 407752)
    2. Convert all templates to l() filter calls
    3. Extract default (English) lexicon
    4. Translate English lexicon: expand tokens to full error messages, replace bug(s) with %quant([_1],bug) calls, and bug with $terms.bug.
    5. Test and debug English lexicon
    6. Clone and translate English lexicon to other languages
  4. checksetup.pl refit
    1. Preprocess .po files from .po.tmpl files to accommodate $terms changes
  5. Documenting
  6. Test suite
    1. Detect localizable strings missing in lexicon (would fall back as auto entries anyway)
    2. Decommission 009bugwords.t
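
The test for strings missing from the lexicon (item 6.1) might start from something like this sketch. It is Python for illustration only, missing_msgids is a hypothetical name, and the regex covers just the short [% l("...") %] form with a double-quoted first argument; a real test would reuse whatever xgettext.pl extracts.

```python
import re

# Matches the short form [% l("text", ...) %] with a double-quoted first argument.
_L_CALL = re.compile(r'\[%\s*l\(\s*"((?:[^"\\]|\\.)*)"')


def missing_msgids(template_text, lexicon):
    """Return strings used in l("...") calls that have no lexicon entry."""
    found = _L_CALL.findall(template_text)
    return sorted(set(found) - set(lexicon))


# Made-up sample template and lexicon.
template = '[% l("Short text", arg1) %]\n<p>[% l("Another text") %]</p>\n'
lexicon = {"Short text": "Short text"}
```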

What have I missed?
