Bugzilla:L10n:Maketext
Revision as of 04:54, 30 November 2008
The idea was introduced by mkanat (bug 407752). If implemented, it will greatly reduce localizers' workload.
Design considerations
There are several decisions to be made before one starts hacking templates and code:
- Syntax for localizable texts in Perl and TT
- Lexicon format
- Directory structure
Perl syntax
xgettext.pl recognizes these function names in Perl code:
translate() maketext() gettext() loc() x() _() __()
In Bugzilla Perl code we will be using
_()
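A minimal sketch of how `_()` could sit on top of Locale::Maketext. It assumes a hypothetical base class `Demo::L10N` (standing in for the planned Bugzilla::L10N) and an English class that relies on `_AUTO`, so that every msgid doubles as its own English text:

```perl
use strict;
use warnings;

{
    package Demo::L10N;    # hypothetical stand-in for the planned Bugzilla::L10N
    use parent 'Locale::Maketext';
}
{
    package Demo::L10N::en;
    use parent -norequire, 'Demo::L10N';
    # _AUTO: a missing key is compiled and used as its own translation.
    our %Lexicon = (_AUTO => 1);
}

my $lh = Demo::L10N->get_handle('en') or die "No language handle\n";

# _() is a thin wrapper: the first argument is the msgid, the rest fill
# the [_1], [_2], ... placeholders.
sub _ { return $lh->maketext(@_) }

my $msg = _('Hello, [_1]!', 'world');
print "$msg\n";    # Hello, world!
```

With `_AUTO` set, an English string missing from the lexicon is returned as-is instead of making `maketext()` die, which matches the "auto entries" fallback mentioned in the work breakdown.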
Template Toolkit syntax
Formats supported by xgettext.pl are:
[% |l(args) %]string[% END %]
[% 'string' | l(args) %]
[% 'string' FILTER l(args) %]
[% l('string',args) %]
[% loc('string',args) %]
In fact, l and loc, | and FILTER are synonyms.
In Bugzilla templates we will be using:
[% |l %]
Large boilerplate text, no lines changed, filter added as separate lines.
[% END %]
[% l("Short text",arg1,arg2) %]
See also "Default template look and feel" below.
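Template Toolkit lets a filter take arguments through a dynamic filter factory (registered as `FILTERS => { l => [ \&factory, 1 ] }`). A hedged sketch of an `l` filter built that way; the `Demo::L10N` classes are hypothetical, and Template Toolkit itself is not loaded, so the filter is called by hand the way TT would call it for `[% |l(3, 2) %]...[% END %]`:

```perl
use strict;
use warnings;

{
    package Demo::L10N;    # hypothetical stand-in for Bugzilla::L10N
    use parent 'Locale::Maketext';
}
{
    package Demo::L10N::en;
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (_AUTO => 1);
}

my $lh = Demo::L10N->get_handle('en') or die "No language handle\n";

# Dynamic filter factory: TT calls it with the context and the filter
# arguments, and uses the returned code ref to filter the block text.
sub l_filter_factory {
    my ($context, @args) = @_;
    return sub {
        my ($text) = @_;
        # [% |l %] tags sit on their own lines, so trim the surrounding
        # whitespace before using the text as a lexicon key.
        $text =~ s/^\s+|\s+$//g;
        return $lh->maketext($text, @args);
    };
}

# Simulate [% |l(3, 2) %]...[% END %] without loading Template Toolkit.
my $filter = l_filter_factory(undef, 3, 2);
print $filter->("\n[_1] bug(s) were found in [_2] product(s).\n"), "\n";
```

Trimming is a design choice here: it keeps lexicon keys independent of the line breaks around the filter tags, at the cost of dropping that whitespace from the output.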
Lexicon format
Available formats also vary:
- simple .pm modules with explicit %Lexicon entries
- classic .po files, more convenient for the localizer community because of mature tools
- database backend is in progress
IMHO we should stick to .po; see "Terms retirement" below.
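For comparison, a .pm lexicon is just one Perl package per language holding a `%Lexicon` hash. A minimal sketch with hypothetical `Demo::L10N` classes and illustrative French strings; a .po catalog carries the same pairs as msgid/msgstr entries, and Locale::Maketext::Lexicon can load such files into a lexicon instead:

```perl
use strict;
use warnings;

{
    package Demo::L10N;    # hypothetical base class
    use parent 'Locale::Maketext';
}
{
    package Demo::L10N::en;    # English: msgids double as the text
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (_AUTO => 1);
}
{
    package Demo::L10N::fr;    # French: explicit entries keyed by the English text
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (
        'Enter Bug' => 'Saisir un bogue',
        'Search'    => 'Rechercher',
    );
}

my $en = Demo::L10N->get_handle('en') or die "No English handle\n";
my $fr = Demo::L10N->get_handle('fr') or die "No French handle\n";

print $en->maketext('Enter Bug'), "\n";    # Enter Bug
print $fr->maketext('Enter Bug'), "\n";    # Saisir un bogue
```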
The directory structure
For .pm files there is no good place besides Bugzilla/L10N/LANG.pm. For .po files the options are:
- Bugzilla/L10N/LANG.po: the most obvious choice, used by many projects.
- template/LANG/{default,custom}/*.po: allows site- and project-level customizations. Drawbacks: multiple .po files must be iterated over to merge the lexicon; unnecessary exposure to the HTTP server.
- po/LANG.po or locale/LANG.po: why not create a separate directory?
Another question: how many .po files do we need for a single instance? Is a single file enough, or do we split the web, command-line and email domains?
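If catalogs live in several places (for example template/LANG/default and template/LANG/custom), the loader has to merge them, with the customized catalog winning. A rough sketch; the parser handles only single-line msgid/msgstr pairs and is a stand-in for a real .po reader such as Locale::PO, and the catalog contents are made up:

```perl
use strict;
use warnings;

# Undo the escapes handled here: \n plus any backslash-escaped character.
sub unescape_po {
    my ($s) = @_;
    $s =~ s/\\(.)/$1 eq 'n' ? "\n" : $1/ge;
    return $s;
}

# Minimal reader: single-line msgid/msgstr pairs only (no multi-line
# strings, plurals or contexts).
sub parse_po {
    my ($text) = @_;
    my (%lex, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^msgid\s+"((?:[^"\\]|\\.)*)"\s*$/) {
            $id = unescape_po($1);
        }
        elsif ($line =~ /^msgstr\s+"((?:[^"\\]|\\.)*)"\s*$/ and defined $id) {
            my $str = unescape_po($1);
            # Skip the header (empty msgid) and untranslated (empty msgstr) entries.
            $lex{$id} = $str if length $id and length $str;
            undef $id;
        }
    }
    return \%lex;
}

# Later catalogs override earlier ones: pass default first, custom last.
sub merge_lexicons {
    my %merged;
    %merged = (%merged, %$_) for @_;
    return \%merged;
}

my $default = parse_po(<<'PO');
msgid "Enter Bug"
msgstr "Enter Bug"

msgid "Search"
msgstr "Search"
PO

my $custom = parse_po(<<'PO');
msgid "Enter Bug"
msgstr "Submit a Service Request"
PO

my $lex = merge_lexicons($default, $custom);
print "$_ => $lex->{$_}\n" for sort keys %$lex;
```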
Features
Terms retirement
In the object-oriented spirit of Maketext, the application should care only about what to say, not how to say it. So default English templates may safely refer to 'bugs' and leave the rest to the l10n logic. Example:
template/en/default/bug/create/create.html.tmpl:
[%# no need to PROCESS "global/variables.none.tmpl" %]
[%# no need to PROCESS "global/field-descs.none.tmpl" %]
...
[% |l %]Bugzilla – Enter Bug[% END %]
customized en.po:
msgid "Bugzilla – Enter Bug"
msgstr "Bugzilla@mysite – Submit a Service Request"
To ease such customizations (and to provide backwards compatibility) en.po would be preprocessed from en.po.tmpl:
[% PROCESS "global/variables.none.tmpl" %]
...
msgid "Bugzilla – Enter Bug"
msgstr "$terms.Bugzilla – Enter $terms.Bug"
This way, global/variables.none.tmpl would be used only during checksetup.pl, to compile the .po files.
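The en.po.tmpl step boils down to expanding `$terms.*` references once, at checksetup.pl time. A hedged stand-in that does this with a regex instead of running the file through Template Toolkit; the `%terms` values are hypothetical site customizations, and ASCII hyphens stand in for the en dashes:

```perl
use strict;
use warnings;

# Hypothetical values; in Bugzilla they would come from
# global/variables.none.tmpl.
my %terms = (
    Bugzilla => 'Bugzilla@mysite',
    Bug      => 'Service Request',
    bug      => 'service request',
);

# Expand $terms.NAME; unknown names are left untouched so they stay visible.
sub expand_terms {
    my ($tmpl, $terms) = @_;
    $tmpl =~ s/\$terms\.(\w+)/exists $terms->{$1} ? $terms->{$1} : "\$terms.$1"/ge;
    return $tmpl;
}

my $po_tmpl = <<'TMPL';
msgid "Bugzilla - Enter Bug"
msgstr "$terms.Bugzilla - Enter $terms.Bug"
TMPL

print expand_terms($po_tmpl, \%terms);
```

Only the msgstr side changes, so templates keep looking up the plain English msgid while the site wording lives in the compiled en.po.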
Backwards compatibility
Maketext support by itself does not change any templates. All existing templates, custom or default, English or localized, would still work. The template search path logic does not change either: a translated template/LANG/default would take precedence over the maketext-converted template/en/default plus the LANG.po file.
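One plausible reading of that precedence rule, as a hypothetical lookup function: translated templates for the requested language win; otherwise the maketext-converted English template is used with LANG.po, falling back to en.po. The file list and the `%exists` hash are stand-ins for real file-system checks:

```perl
use strict;
use warnings;

# Returns (template path, catalog or undef). %$exists stands in for -e checks.
sub resolve_template {
    my ($lang, $name, $exists, $has_po) = @_;
    if ($lang ne 'en') {
        # Translated templates need no catalog.
        for my $dir ("template/$lang/custom", "template/$lang/default") {
            return ("$dir/$name", undef) if $exists->{"$dir/$name"};
        }
    }
    # Maketext-converted English templates, rendered through a catalog.
    for my $dir ('template/en/custom', 'template/en/default') {
        next unless $exists->{"$dir/$name"};
        return ("$dir/$name", $has_po->{$lang} ? "$lang.po" : 'en.po');
    }
    return;    # template not found
}

my %exists = map { $_ => 1 } qw(
    template/en/default/index.html.tmpl
    template/ru/default/index.html.tmpl
    template/en/default/search.html.tmpl
);
my %has_po = map { $_ => 1 } qw(en ru fr);

for my $case (['ru', 'index.html.tmpl'], ['ru', 'search.html.tmpl'], ['fr', 'index.html.tmpl']) {
    my ($path, $po) = resolve_template(@$case, \%exists, \%has_po);
    printf "%s %s -> %s (%s)\n", @$case, $path, $po // 'no catalog needed';
}
```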
Language negotiation
The best language is no longer easy to determine: suppose we have
- template/en/custom -- custom set, refit with maketext
- template/ru/custom -- old translated set
- en.po -- English messages for template/en/custom
- fr.po -- French messages
- ja.po -- Japanese messages
and then we get a request with the language preference ja, ru, fr. With the current logic, which only looks at translated template directories, it would serve Russian instead of Japanese.
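The problem in miniature: a sketch comparing a simplified version of the current rule (only template directories count) with one possible fix (a language is available if it has either translated templates or a catalog). Both functions are hypothetical and ignore the custom/default distinction:

```perl
use strict;
use warnings;

# Resources from the example above.
my %template_langs = map { $_ => 1 } qw(en ru);     # template/LANG/custom
my %po_langs       = map { $_ => 1 } qw(en fr ja);  # LANG.po

# Simplified current rule: only template directories count.
sub pick_by_templates {
    for my $lang (@_) { return $lang if $template_langs{$lang} }
    return 'en';
}

# Possible fix: templates or a catalog make a language available.
sub pick_by_templates_or_po {
    for my $lang (@_) {
        return $lang if $template_langs{$lang} || $po_langs{$lang};
    }
    return 'en';
}

my @prefs = qw(ja ru fr);
print 'current:  ', pick_by_templates(@prefs), "\n";          # ru
print 'proposed: ', pick_by_templates_or_po(@prefs), "\n";    # ja
```

One consequence to weigh: with ja selected, the English custom templates are rendered through ja.po, so that catalog must also cover the customized strings.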
Default templates refit
After the English lexicon shakedown, localizers may start the bulk of the translation work. However, nearly all required texts already exist somewhere in the translated templates. How do we automate this?
A message catalog, with its msgid/msgstr pairs, is very similar to a unified diff between the template/en and template/LANG trees. One could merge both, based on the line numbers of the English templates, and copy the other (LANG) side of the diff into msgstr.
Is it practical?
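A naive sketch of that merge, pairing lines by number. It assumes the translated template kept the English line structure (a real tool would align unified-diff hunks instead), keeps HTML markup inside the msgids, and uses made-up template text:

```perl
use strict;
use warnings;

sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }

# Pair English and translated lines by line number; every line that differs
# becomes a (msgid, msgstr) candidate.
sub seed_catalog {
    my ($en_text, $tr_text) = @_;
    my @en = split /\n/, $en_text;
    my @tr = split /\n/, $tr_text;
    die "Line counts differ; align diff hunks instead\n" if @en != @tr;
    my @pairs;
    for my $i (0 .. $#en) {
        my ($id, $str) = (trim($en[$i]), trim($tr[$i]));
        next if $id eq '' or $id eq $str;    # blank or untranslated line
        next if $id =~ /^\[%.*%\]$/;         # pure TT directive
        push @pairs, [$id, $str];
    }
    return @pairs;
}

# Quote a string for a .po file (only " and \ are escaped here).
sub po_quote { my $s = shift; $s =~ s/(["\\])/\\$1/g; return qq{"$s"} }

my $en = <<'EN';
[% PROCESS global/header.html.tmpl %]
<h2>Enter Bug</h2>
<p>Please select a product.</p>
EN
my $fr = <<'FR';
[% PROCESS global/header.html.tmpl %]
<h2>Saisir un bogue</h2>
<p>Veuillez choisir un produit.</p>
FR

for my $pair (seed_catalog($en, $fr)) {
    print 'msgid ', po_quote($pair->[0]), "\n", 'msgstr ', po_quote($pair->[1]), "\n\n";
}
```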
Drawbacks and concerns
Default template look and feel
Frequent [% l() %] calls may obfuscate templates. Example: before (without numeric inflection logic)
[% nbugs %] $terms.bugs were found in [% nproducts %] products.
and after
[% l("[_1] bug(s) were found in [_2] products.", nbugs, nproducts) %]
Another option to consider is Locale::Maketext::Fuzzy:
[% |l %]
[% nbugs %] bug(s) were found in [% nproducts %] product(s).
[% END %]
And then in en.po:
msgid "0 bug(s) were found in 0 product(s)."
msgstr "Zarro boogs found."
msgid "[_1] bug(s) were found in [_2] product(s)."
msgstr "%quant([_1],$terms.bug was,$terms.bugs were) found in %quant([_2],product)."
Note the exact match for the special case of zero bugs in zero products.
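For reference, the same entry written directly in Locale::Maketext's bracket notation, where `quant` picks the singular or plural form. `Demo::L10N` is hypothetical, "bug" and "product" are literal instead of `$terms` values, and the zero case is special-cased in plain Perl as a stand-in for the exact-match entry that Locale::Maketext::Fuzzy would provide:

```perl
use strict;
use warnings;

{
    package Demo::L10N;    # hypothetical base class
    use parent 'Locale::Maketext';
}
{
    package Demo::L10N::en;
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (
        _AUTO => 1,
        '[_1] bug(s) were found in [_2] product(s).' =>
            '[quant,_1,bug was,bugs were] found in [quant,_2,product].',
    );
}

my $lh = Demo::L10N->get_handle('en') or die "No language handle\n";

# Plain-Perl stand-in for the exact-match special case.
sub found_message {
    my ($nbugs, $nproducts) = @_;
    return 'Zarro boogs found.' if $nbugs == 0 && $nproducts == 0;
    return $lh->maketext('[_1] bug(s) were found in [_2] product(s).', $nbugs, $nproducts);
}

print found_message(@$_), "\n" for [1, 1], [3, 2], [0, 0];
```

Locale::Maketext's `quant` also accepts an optional third form used when the number is zero (for example `[quant,_1,bug,bugs,No bugs]`), which covers part of what the exact-match entry does.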
Site maintainers' skills
Before Locale::Maketext, custom template maintainers needed only one skill: Template Toolkit. Now they will also need to work with message catalogs.
Work Breakdown
- Generic Locale::Maketext capability
  - Specify and implement (English) %quant() and %numerate() calls
  - maketext() support for templates
  - maketext() support for command-line scripts
  - Bugzilla::L10N setup and lexicon loader
- Build tools
  - xgettext.pl: collect localizable strings from templates and code.
- Default template refit
  - Script to hunt for localizable texts in templates (inspired by mkanat's attachment to bug 407752)
  - Convert all templates to l() filter calls
  - Extract default (English) lexicon
  - Translate English lexicon: expand tokens to full error messages, replace bug(s) with %quant([_1],bug) calls, and bug with $terms.bug.
  - Test and debug English lexicon
  - Clone and translate English lexicon to other languages
- checksetup.pl refit
  - Preprocess .po files from .po.tmpl files to accommodate $terms changes
- Documenting
- Test suite
  - Detect localizable strings missing in lexicon (they would fall back as auto entries anyway)
  - Decommission 009bugwords.t
What have I missed?
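The hunting script and the missing-string test could start out as something like this rough sketch. It recognizes only `[% |l %]...[% END %]` blocks and `l()` calls with a literal first argument (no filter arguments, `loc()` or escaped quotes); xgettext.pl remains the real extraction tool, and the lexicon here is made up:

```perl
use strict;
use warnings;

# Collect localizable strings from template text.
sub extract_strings {
    my ($tmpl) = @_;
    my @found;
    # Block form: [% |l %] ... [% END %], trimmed of surrounding whitespace.
    while ($tmpl =~ /\[%\s*\|\s*l\s*%\](.*?)\[%\s*END\s*%\]/gs) {
        (my $s = $1) =~ s/^\s+|\s+$//g;
        push @found, $s;
    }
    # Function form: l("...") or l('...') with a literal first argument.
    while ($tmpl =~ /\bl\(\s*(["'])(.*?)\1/g) {
        push @found, $2;
    }
    return @found;
}

# Strings with no lexicon entry, each reported once.
sub missing_from_lexicon {
    my ($lexicon, @strings) = @_;
    my %seen;
    return grep { !exists $lexicon->{$_} && !$seen{$_}++ } @strings;
}

my $tmpl = <<'TT';
[% |l %]
Bugzilla - Enter Bug
[% END %]
[% l("[_1] bug(s) were found in [_2] product(s).", nbugs, nproducts) %]
[% l('Search') %]
TT

my %lexicon = (
    'Bugzilla - Enter Bug' => 'Bugzilla - Enter Bug',
    'Search'               => 'Search',
);

my @strings = extract_strings($tmpl);
print "found:   $_\n" for @strings;
print "missing: $_\n" for missing_from_lexicon(\%lexicon, @strings);
```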