Bugzilla:L10n:Maketext: Difference between revisions

Revision as of 04:54, 30 November 2008

The idea was introduced by mkanat (bug 407752). If implemented, it will greatly reduce localizers' workload.

Design considerations

There are several decisions to be made before one starts hacking templates and code:

Syntax for localizable texts in Perl and TT
Lexicon format
Directory structure

Perl syntax

translate()
maketext()
gettext()
loc()
x()
_()
__()

In Bugzilla Perl code we will be using

_()

Template Toolkit syntax

Formats supported by xgettext.pl are:

[% |l(args) %]string[% END %]
[% 'string' | l(args) %]
[% 'string' FILTER l(args) %]
[% l('string',args) %]
[% loc('string',args) %]

In fact, l and loc are synonyms, as are | and FILTER.

In Bugzilla templates we will be using

[% |l %]
Large boilerplate text, no lines changed, filter added as separate lines.
[% END %]

[% l("Short text",arg1,arg2) %]

See also "Default template look and feel" below.

Lexicon format

Available formats also vary:

simple .pm modules with explicit %Lexicon entries
classic .po files, more convenient for the localizer community because of mature tools
database backend is in progress
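
For comparison, the first option could look roughly like the sketch below, which uses only the core Locale::Maketext module. The Sketch::L10N namespace, the sample keys and the French wording are placeholders invented for illustration; nothing here is taken from Bugzilla code.

```perl
use strict;
use warnings;

# Hypothetical project base class (the name is an assumption).
package Sketch::L10N;
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

# English lexicon: _AUTO turns any unknown key into its own translation,
# so only overrides have to be listed explicitly.
package Sketch::L10N::en;
our @ISA = ('Sketch::L10N');
our %Lexicon = (
    _AUTO       => 1,
    'Enter Bug' => 'Submit a Service Request',
);

# French lexicon with a single explicit entry (illustrative wording).
package Sketch::L10N::fr;
our @ISA = ('Sketch::L10N');
our %Lexicon = (
    'Enter Bug' => 'Saisir un bogue',
);

package main;

my $en = Sketch::L10N->get_handle('en') or die "no English lexicon";
my $fr = Sketch::L10N->get_handle('fr') or die "no French lexicon";

print $en->maketext('Enter Bug'), "\n";    # explicit English entry
print $en->maketext('Search'),    "\n";    # not listed, handled by _AUTO
print $fr->maketext('Enter Bug'), "\n";    # explicit French entry
```

Every new language in this style is a new Perl module, which is part of why the .po route is attractive for localizers.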

IMHO we should stick to .po; see "Terms retirement" below.

The directory structure

For .pm files there is no good place besides Bugzilla/L10N/LANG.pm

Bugzilla/L10N/LANG.po: Most obvious, used by many projects.

template/LANG/{default,custom}/*.po: Advantages: site and project level customizations possible. Drawbacks: need to iterate through multiple .po files to merge the lexicon; unneeded exposure to http server.

po/LANG.po or locale/LANG.po: Why not create a separate dir?

Another question is: how many .po files do we need for a single instance? Is a single file enough, or do we split the web, command line and email domains?

Features

Terms retirement

In the object-oriented spirit of Maketext, the application should not care about how to say something, just what to say. So the default English templates may safely refer to 'bugs' and leave the rest to the l10n logic. Example:

template/en/default/bug/create/create.html.tmpl:

[%# no need to PROCESS "global/variables.none.tmpl" %]
[%# no need to PROCESS "global/field-descs.none.tmpl" %]
...
[% |l %]Bugzilla – Enter Bug[% END %]

customized en.po:

msgid "Bugzilla – Enter Bug"
msgstr "Bugzilla@mysite – Submit a Service Request"

To ease such customizations (and to provide backwards compatibility) en.po would be preprocessed from en.po.tmpl:

[% PROCESS "global/variables.none.tmpl" %]
...
msgid "Bugzilla – Enter Bug"
msgstr "$terms.Bugzilla – Enter $terms.Bug"

This way global/variables.none.tmpl would be used only during checksetup.pl, to compile the .po files.
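
A minimal sketch of that preprocessing step follows. It is a simplification: the real step would presumably run the .po.tmpl through Template Toolkit, whereas this stand-in only substitutes $terms.NAME placeholders with a regular expression. The function name expand_terms, the sample entry and the term values are all made up for illustration.

```perl
use strict;
use warnings;

# Expand $terms.NAME placeholders on msgstr lines only, so that msgid keys
# stay identical to the strings extracted from the templates.
sub expand_terms {
    my ($po_tmpl, $terms) = @_;
    my @out;
    for my $line (split /\n/, $po_tmpl) {
        if ($line =~ /^msgstr\b/) {
            # Unknown terms are left untouched rather than silently dropped.
            $line =~ s{\$terms\.(\w+)}
                      {exists $terms->{$1} ? $terms->{$1} : "\$terms.$1"}ge;
        }
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

# Illustrative site-specific terms (assumed values).
my %site_terms = (Bugzilla => 'Bugzilla@mysite', Bug => 'Service Request');

# A tiny stand-in for en.po.tmpl.
my $en_po_tmpl = <<'PO';
msgid "Bugzilla: Enter Bug"
msgstr "$terms.Bugzilla: Enter $terms.Bug"
PO

print expand_terms($en_po_tmpl, \%site_terms);
```

Restricting the substitution to msgstr lines keeps the lexicon keys stable, so a site can change its terms without touching the templates.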

Backwards compatibility

Maketext support by itself does nothing to change templates. All existing templates, custom or default, English or localized, would still work. The template search path logic is not changed either: a translated template/LANG/default would take precedence over the maketext-converted template/en/default and the LANG.po file.

Language negotiation

The best language is not so easy to glark anymore: suppose we have

template/en/custom -- custom set, refit with maketext
template/ru/custom -- old translated set
en.po -- English messages for template/en/custom
fr.po -- French messages
ja.po -- Japanese messages

and then we get a request with the language preference ja, ru, fr. With the current logic, which only considers translated template directories, it would serve Russian instead of Japanese.
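
The scenario above can be sketched as follows. Both functions are hypothetical: the first models the current behaviour under the assumption that only translated template sets are considered, the second shows one possible revision in which a message catalog also makes a language available.

```perl
use strict;
use warnings;

# Assumed current behaviour: only translated template sets count.
sub best_language_templates_only {
    my ($preferred, $template_langs) = @_;
    my %has_templates = map { ($_ => 1) } @$template_langs;
    for my $lang (@$preferred) {
        return $lang if $has_templates{$lang};
    }
    return 'en';    # fall back to the default language
}

# Possible revision: a language is available if it has either a translated
# template set or a message catalog.
sub best_language {
    my ($preferred, $template_langs, $catalog_langs) = @_;
    my %available = map { ($_ => 1) } @$template_langs, @$catalog_langs;
    for my $lang (@$preferred) {
        return $lang if $available{$lang};
    }
    return 'en';
}

my @preferred = qw(ja ru fr);
my @templates = qw(en ru);       # template/en/custom, template/ru/custom
my @catalogs  = qw(en fr ja);    # en.po, fr.po, ja.po

print best_language_templates_only(\@preferred, \@templates), "\n";    # ru
print best_language(\@preferred, \@templates, \@catalogs), "\n";       # ja
```

This still leaves open which source wins when a language has both an old translated template set and a .po file.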

Default templates refit

After the English lexicon shakedown, localizers may start the bulk of the translation work. However, nearly all the required texts already exist somewhere in the existing templates. How do we automate this?

A message catalog with its msgid/msgstr pairs is very similar to a unified diff between the template/en and template/LANG trees. One can merge both, based on the line numbers of the English templates, and copy the other (LANG) side of the diff into msgstr.
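
A rough sketch of the pairing idea, assuming the translated template keeps the same line layout as the English one. The function names and the sample lines (including the French text) are invented for illustration; a real tool would also have to strip markup and directives from the paired lines.

```perl
use strict;
use warnings;

# Pair line N of the English template with line N of the translated one,
# keeping only lines that actually differ.
sub pair_lines {
    my ($en_lines, $lang_lines) = @_;
    my @pairs;
    for my $i (0 .. $#{$en_lines}) {
        my ($en, $tr) = ($en_lines->[$i], $lang_lines->[$i]);
        next unless defined $tr and $en =~ /\S/;    # missing or blank line
        next if $en eq $tr;                          # markup-only or untranslated
        push @pairs, [$en, $tr];
    }
    return @pairs;
}

# Format one catalog entry, escaping quotes and backslashes.
sub po_entry {
    my ($msgid, $msgstr) = @_;
    s/(["\\])/\\$1/g for $msgid, $msgstr;
    return qq{msgid "$msgid"\nmsgstr "$msgstr"\n};
}

my @en = ('<h1>Enter Bug</h1>', '<hr>', 'Please describe the problem.');
my @fr = ('<h1>Saisir un bogue</h1>', '<hr>', 'Veuillez decrire le probleme.');

print po_entry(@$_), "\n" for pair_lines(\@en, \@fr);
```

The approach breaks down as soon as a translator has added or removed lines, so the line-alignment assumption would need to be checked per template.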

Is it practical?

Drawbacks and concerns

Default template look and feel

Frequent [% l() %] calls may obfuscate templates. Example: before (without numeric inflection logic)

[% nbugs %] $terms.bugs were found in [% nproducts %] products.

and after

[% l("[_1] bug(s) were found in [_2] products.", nbugs, nproducts) %]

Another option to consider is Locale::Maketext::Fuzzy:

[% |l %]
[% nbugs %] bug(s) were found in [% nproducts %] product(s).
[% END %]

And then in en.po:

msgid "0 bug(s) were found in 0 product(s)."
msgstr "Zarro boogs found."

msgid "[_1] bug(s) were found in [_2] product(s)."
msgstr "%quant([_1],$terms.bug was,$terms.bugs were) found in %quant([_2],product)."

Note the exact match for the special case.
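
The lookup order can be imitated with core Locale::Maketext alone, as in the sketch below. It does not use Locale::Maketext::Fuzzy; fuzzy_maketext is a hypothetical helper that tries an exact entry first and then a parametric one. The %quant(...) notation from the catalog above is rewritten into Locale::Maketext's native bracket notation, $terms is left out, and the Sketch::Fuzzy::L10N namespace is an assumption.

```perl
use strict;
use warnings;

package Sketch::Fuzzy::L10N;
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

package Sketch::Fuzzy::L10N::en;
our @ISA = ('Sketch::Fuzzy::L10N');
our %Lexicon = (
    # Exact match for the special case.
    '0 bug(s) were found in 0 product(s).' => 'Zarro boogs found.',
    # Parametric entry in native bracket notation.
    '[_1] bug(s) were found in [_2] product(s).' =>
        '[quant,_1,bug was,bugs were] found in [quant,_2,product].',
);

package main;

# Stand-in for a fuzzy lookup: exact entries win, otherwise the longest
# parametric key whose [_N] slots match the interpolated text is used.
# Only the handle's own lexicon is searched, not its parent classes.
sub fuzzy_maketext {
    my ($lh, $text) = @_;
    my $lexicon = do { no strict 'refs'; \%{ ref($lh) . '::Lexicon' } };

    return $lh->maketext($text) if exists $lexicon->{$text};

    for my $key (sort { length($b) <=> length($a) } keys %$lexicon) {
        next unless $key =~ /\[_\d+\]/;
        my ($pattern, @slots) = ('');
        for my $piece (split /(\[_\d+\])/, $key) {
            if ($piece =~ /^\[_(\d+)\]$/) {
                push @slots, $1 - 1;        # remember the argument position
                $pattern .= '(.+?)';
            }
            else {
                $pattern .= quotemeta $piece;
            }
        }
        my @found = $text =~ /^$pattern$/ or next;
        my @args;
        @args[@slots] = @found;
        return $lh->maketext($key, @args);
    }
    return $text;    # nothing matched: pass the text through
}

my $lh = Sketch::Fuzzy::L10N->get_handle('en') or die "no English lexicon";
for my $counts ([0, 0], [1, 1], [3, 2]) {
    my ($nbugs, $nproducts) = @$counts;
    print fuzzy_maketext($lh, "$nbugs bug(s) were found in $nproducts product(s)."), "\n";
}
```

With the sample lexicon this prints "Zarro boogs found.", "1 bug was found in 1 product." and "3 bugs were found in 2 products.".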

Site maintainers' skills

Before Locale::Maketext, custom template maintainers needed only one skill: Template Toolkit. Now they'll also need to work with message catalogs.

Work Breakdown

1. Generic Locale::Maketext capability
   1. Specify and implement (English) %quant() and %numerate() calls
   2. maketext() support for templates
   3. maketext() support for command line scripts
   4. Bugzilla::L10N setup and lexicon loader
2. Build tools
   1. xgettext.pl: collect localizable strings from templates and code
3. Default template refit
   1. Script to hunt for localizable texts in templates (inspired by mkanat's attachment to bug 407752)
   2. Convert all templates to l() filter calls
   3. Extract default (English) lexicon
   4. Translate English lexicon: expand tokens to full error messages, replace bug(s) with %quant([_1],bug) calls, and bug with $terms.bug
   5. Test and debug English lexicon
   6. Clone and translate English lexicon to other languages
4. checksetup.pl refit
   1. Preprocess .po files from .po.tmpl files to accommodate $terms changes
5. Documenting
6. Test suite
   1. Detect localizable strings missing in lexicon (would fall back as auto entries anyway)
   2. Decommission 009bugwords.t

What have I missed?

References

Web localization in Perl
Localization and Perl: gettext breaks, Maketext fixes a.k.a. TPJ13
Template Toolkit, look-and-feel and internationalization
i18n with Template Toolkit
Cpanel localization roadmap
How to support a new language in Act

@@ Line 1: / Line 1: @@
-The idea was introduced by mkanat ({{Bug|407752}}).  If implemented, will greatly reduce localizers' workload.  However, there are some implementation obstacles.
+The idea was introduced by mkanat ({{Bug|407752}}).  If implemented, will greatly reduce localizers' workload.
 == Design considerations ==
@@ Line 8: / Line 8: @@
 * Lexicon format
 * Directory structure
-* Usage of auto lexicon entries and exceptions
-* Backwards compatibility
-* '''$terms.bug''' retirement
 === Perl syntax ===
@@ Line 40: / Line 37: @@
 In fact, <tt>l</tt> and <tt>loc</tt>, <tt>|</tt> and <tt>FILTER</tt> are synonyms.
-First syntax has obvious advantages for large boilerplate texts which we do not want to chop in perverse ways.  However, it is too verbose for short strings and unintuitive for parametric calls.
+In Bugzilla templates we will be using
+ <nowiki>[% |l %]</nowiki>
+ Large boilerplate text, no lines changed, filter added as separate lines.
+ <nowiki>[% END %]</nowiki>
+  <nowiki>[% l("Short text",arg1,arg2) %]</nowiki>
+See also [[#Default template look and feel]] below
 === Lexicon format ===
 Available formats also vary:
 * simple '''.pm''' modules with explicit <tt>%Lexicon</tt> entries
@@ Line 61: / Line 66: @@
 ; <tt>po/<i>LANG</i>.po</tt> or <tt>locale/<i>LANG</i>.po</tt>: Why not create a separate dir?
+Another question is: how many '''.po''' files we need for a single instance?  Is single file enough or do we split ''web'', ''command line'' and ''email'' domains?
+== Features ==
 === Terms retirement ===
@@ Line 86: / Line 95: @@
 This way <tt>global/variables.none.tmpl</tt> would be used only during '''checksetup.pl''', to compile '''.po''' files
+=== Backwards compatibility ===
+Maketext support by itself does nothing to change templates.  All existing templates, custom or default, English or localized, would still work.  Template search path logic is not changed either: translated <tt>template/''LANG''/default</tt> would take precedence over maketext-converted <tt>template/en/default</tt> and <tt>''LANG''.po</tt> file.
+=== Language negotiation ===
+Best language is not so easy to glark anymore: suppose we have
+# <tt>template/en/custom</tt> -- custom set, refit with maketext
+# <tt>template/ru/custom</tt> -- old translated set
+# <tt>en.po</tt> -- English messages for <tt>template/en/custom</tt>
+# <tt>fr.po</tt> -- French messages
+# <tt>ja.po</tt> -- Japanese messages
+and then we get a request for '''ja, ru, fr''' language preference.  With current logic, it would serve Russian instead of Japanese.
+=== Default templates refit ===
+After English lexicon shakedown localizers may start the bulk of translation work.  However, nearly all required texts already exist somewhere in templates. How do we automate this?
+Message catalog with its ''msgid/msgtxt'' pairs is very similar to unified diff between <tt>template/en</tt> and <tt>template/''LANG''</tt> trees.  One can merge both, based on line numbers of English templates, and copy other (''LANG'') side of a diff into ''msgtxt''.
+Is it practical?
+== Drawbacks and concerns ==
+=== Default template look and feel ===
+Frequent <tt>[% l() %]</tt> calls may obfuscate templates.  Example: before (without numeric inflection logic)
+ [% nbugs %] $terms.bugs were found in [% nproducts %] products.
+and after
+ [% l("[_1] bug(s) were found in [_2] products.", nbugs, nproducts) %]
+Another option to consider is '''Locale::Maketext::Fuzzy''':
+ [% |l %]
+ [% nbugs %] bug(s) were found in [% nproducts %] product(s).
+ [% END %]
+And then in '''en.po''':
+ msgid "0 bug(s) were found in 0 product(s).
+ msgstr "Zarro boogs found."
+ msgid "[_1] bug(s) were found in [_2] product(s)."
+ msgstr "%quant([_1],$terms.bug was,$terms.bugs were) found in %quant([_2],product)."
+Note exact match for a special case.
+=== Site maintainers' skills ===
+Before '''Locale::Maketext''' only single knowledge was required from custom template maintainers: Template Toolkit.  Now they'll need to work with message catalogs.
 == Work Breakdown ==
-# Generic '''Locale::Maketext''' capability
+# Generic <tt>Locale::Maketext</tt> capability
-## Specify and implement(en) '''%quant()''' and '''%numerate()''' calls
+## Specify and implement (English) <tt>%quant()</tt> and <tt>%numerate()</tt> calls
-## '''maketext()''' support for Templates
+## <tt>maketext()</tt> support for Templates
-## '''maketext()''' support for command line scripts
+## <tt>maketext()</tt> support for command line scripts
-## '''Bugzilla::L10N''' setup and lexicon loader
+## <tt>Bugzilla::L10N</tt> setup and lexicon loader
 # Build tools
-## '''xgettext.pl''': collect localizable strings from templates and code.
+## <tt>xgettext.pl</tt>: collect localizable strings from templates and code.
-# Default template transition
+# Default template refit
 ## Script to hunt for localizable texts in templates (inspired by mkanat's attachment to {{bug|407752}})
 ## Convert all templates to '''l()''' filter calls
-## Extract and test English lexicon
+## Extract default (English) lexicon
-# '''checksetup.pl''' refit
+## Translate English lexicon: expand tokens to full error messages, replace ''bug(s)'' with <tt>%quant([_1],bug)</tt> calls, and ''bug'' with <tt>$terms.bug</tt>.
-## Preprocess '''.po''' files from '''
+## Test and debug English lexicon
+## Clone and translate English lexicon to other languages
+# <tt>checksetup.pl</tt> refit
+## Preprocess '''.po''' files from '''.po.tmpl''' files to accomodate '''$terms''' changes
 # Documenting
 # Test suite
-## Missed localizable strings
+## Detect localizable strings missing in lexicon (would fallback as ''auto'' entries anyway)
+## Decommission <tt>009bugwords.t<tt>
+What I have missed?
 == References ==
