L10n:Pain points

From MozillaWiki

Gandalf's approach

I'm describing the pitfalls a localizer runs into, from the very beginning of his work on a first translation until he becomes the full maintainer of the localization.

Initial localization

  • need to know CVS
  • need to install CVS
  • need to read long instructions just to learn the trick that you can unzip the .jar file with WinZip on your Windows machine
  • need to understand that the source l10n structure and the final l10n structure are very different
  • need to suffer the consequences of the previous points, and reorganize the structure soon after the initial idea of localizing the ./chrome/en-US.jar file
  • need to be super-cautious about the formatting of the files. ANY change, including the often invisible BOM characters at the beginning of a file (added automatically by many editors), or any mistake that makes a DTD file unparseable by Gecko is fatal
  • the results of such mistakes are very hard to discover early; you usually learn about them only after a new build is produced by Tinderbox
  • or you have to set up the WHOLE Firefox build environment, which is horribly complicated on every OS.
  • even if you do discover the mistake, it's hardly debuggable. Usually the localizer sees a yellow screen with some red text that is useless, since it doesn't show the whole message. If the mistake was made in a .properties file, you're lost: until you have a debug build and a console, you cannot find out why Firefox doesn't work for you; it fails silently.
  • It's impossible to know where an entity appears on the screen
  • Mozilla localizers cannot use the well-known helpers used by all other projects on earth, which let them check with a single click where a given word is used elsewhere in the application, to keep the translation consistent
  • the same goes for easily searching through the messages
  • Mozilla i18n does not expect that many languages have declensions, multiple plural forms, etc.
  • The localizer sees "<ENTITY foo.x "Make $1%S go $2%S over $3%S">" and we wish him a pleasant nightmare
  • there's no automated way to manage team work. You have to physically select files, send them to localizers, gather the results, combine them into the structure and cross your fingers, hoping there's no syntax bug, missing entity, etc. that will bring everything down
  • At every stage of localization you have absolutely no idea whether you're doing the right thing
  • once you're ready, you still have to wait for someone to build Firefox with your locale to test it, or get it accepted into CVS and wait for Tinderbox to provide a build. If you fix a bug, you can confirm the fix only after waiting for a new build.
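Several of the fatal mistakes above (a stray BOM, an unterminated entity, a malformed .properties line) could in principle be caught locally before committing, without waiting for a build. Below is a minimal Python sketch of such a pre-commit check, under simplifying assumptions: it only understands simple `<!ENTITY name "value">` declarations and comments (parameter entities such as `%brandDTD;` would be reported), and it expects every non-comment .properties line to contain `=` (so continuation lines would be flagged). The names `check_dtd` and `check_properties` are hypothetical; this is not an existing Mozilla tool.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Assumption: only simple declarations like <!ENTITY name "value"> or 'value'.
ENTITY_RE = re.compile(r"""<!ENTITY\s+[\w.\-]+\s+("[^"]*"|'[^']*')\s*>""")
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def _strip_bom_and_decode(raw: bytes, problems: list[str]) -> str:
    """Record a BOM or encoding problem in `problems`; return the decoded text."""
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[len(UTF8_BOM):]
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc.reason}")
        # Decode leniently so the structural checks can still run.
        return raw.decode("utf-8", errors="replace")


def check_dtd(raw: bytes) -> list[str]:
    """Return the problems found in the raw bytes of a .dtd file."""
    problems: list[str] = []
    text = _strip_bom_and_decode(raw, problems)
    # Remove comments and every well-formed entity; anything left is suspect.
    leftover = ENTITY_RE.sub("", COMMENT_RE.sub("", text)).strip()
    if leftover:
        problems.append(f"unparsed text near: {leftover[:40]!r}")
    return problems


def check_properties(raw: bytes) -> list[str]:
    """Return the problems found in the raw bytes of a .properties file."""
    problems: list[str] = []
    text = _strip_bom_and_decode(raw, problems)
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue  # blank lines and comments are fine
        if "=" not in stripped:
            problems.append(f"line {lineno}: no '=' separator")
    return problems
```

For example, `check_dtd(b'<!ENTITY foo.label "Foo>')` reports the unterminated value immediately, instead of the problem surfacing as a yellow error screen in the next Tinderbox build.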

Working with localization

  • To get an idea of what has changed since your last localization round and to find new entities, you either have to learn cvs diff or dig through bonsai.mozilla.org. Each of these is good for getting an IDEA that something has changed, but horrible to actually work with.
  • Localizers have to develop an internal diff algorithm in their heads to see every en-US change and apply it to their locale by hand.
  • Have I already screamed loudly enough that literally everything is done by hand? Apart from the compare-locales script.
  • These are the two things I do most often as a localizer:
    • Translate new strings
      What can we offer? In an ideal world, the localizer would be notified in a friendly way that there are new/untranslated strings, would reach them with one click and localize them in the same UI, and all the other localizers on the team would instantly know that it's already done.
    • Check how my updated string looks in the build
      In an ideal world we would repackage the ab-CD.jar in his Firefox and relaunch Firefox to let him see the change live.
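The "diff algorithm in their heads" can be approximated mechanically: parse the entities of the previous en-US, the current en-US and your locale, then list what is missing, what is obsolete, and what changed in English since your last round. A rough Python sketch, assuming simple double-quoted DTD entities only; it is in the spirit of compare-locales but is not its code, and `parse_entities` and `compare` are hypothetical names.

```python
import re

# Assumption: simple double-quoted entities only: <!ENTITY name "value">
ENTITY_RE = re.compile(r'<!ENTITY\s+([\w.\-]+)\s+"([^"]*)"\s*>')


def parse_entities(text: str) -> dict[str, str]:
    """Map entity name -> value for every simple declaration in a .dtd text."""
    return dict(ENTITY_RE.findall(text))


def compare(old_en: dict[str, str], new_en: dict[str, str],
            locale: dict[str, str]) -> dict[str, list[str]]:
    """Classify entity names by what the localizer has to do with them."""
    return {
        # New in en-US and not yet translated.
        "missing": sorted(k for k in new_en if k not in locale),
        # Still in the locale but removed from en-US.
        "obsolete": sorted(k for k in locale if k not in new_en),
        # Already translated, but the English text changed since the last round.
        "changed": sorted(
            k for k in new_en
            if k in locale and k in old_en and old_en[k] != new_en[k]
        ),
    }


old_en = parse_entities('<!ENTITY a "Open">\n<!ENTITY b "Close">')
new_en = parse_entities('<!ENTITY a "Open File">\n<!ENTITY c "Save">')
fr = parse_entities('<!ENTITY a "Ouvrir">\n<!ENTITY b "Fermer">')
print(compare(old_en, new_en, fr))
```

Here `c` is reported as missing, `b` as obsolete, and `a` as changed in English, which is exactly the list a localizer would otherwise reconstruct by reading cvs diff output.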

Dwayne's approach

This is a description of my localisation work using the Translate Toolkit and Pootle. I think the pain is considerably less than with Gandalf's approach, but some pains at either end are common to both.

Starting

  • Lots of hoops: bug reports to get CVS access and to get permission to actually do the localisation in that language, then more bug reports to create components, etc. Luckily you only have to do this once. Unfortunately, the knowledge of what to do is scattered far and wide. It seems easy to those who work in it every day but opaque to those coming in.
  • Working out what to do: language packs, jars, building, getting the files you actually need to translate from CVS. It has got better with tools that can check out only the l10n work and build the structure in l10n/
  • Finding out which branch you should be on, what the tag is, how to work on multiple products. Black magic.

Translating

OK, this is where I think our process is way, way easier.

  • Use l10n make to create an l10n/en-US
  • Use moz2po to create POT files from the files in l10n/en-US
  • Update your PO files from the new POT files, using a translation memory to make sure you get good reuse
  • Edit in your favourite PO editor, which gives you a familiar interface for ALL your localisation. You have full access to LOCALIZATION NOTEs, accelerator keys are mostly merged, and certain config options are highlighted.
  • Do some QA with pofilter, which traps all the common bugs that usually break a build
  • Use po2moz to convert the PO files back into a set of Mozilla files (.dtd, .properties, etc.) ready for committing to Mozilla CVS
  • Make a diff, check it and make sure that nothing terribly damaging is happening
  • Use a script to create a language pack
  • Test the language pack. Iterate
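Much of what the QA step protects against is broken variables: a translation that drops or mangles a placeholder such as `%1$S` or an entity reference such as `&brandShortName;`. The sketch below illustrates the idea of such a check in Python. It is not pofilter's implementation, and the placeholder pattern (printf-style `%S`, `%s` or `%d` with an optional `n$` position, plus `&name;` references) is an assumption that will not cover every format Mozilla files use.

```python
import re
from collections import Counter

# Assumed placeholder shapes: %S, %s, %d, %1$S, %2$d ... and &entity.name;
PLACEHOLDER_RE = re.compile(r"%(?:\d+\$)?[Ssd]|&[\w.\-]+;")


def placeholder_problems(source: str, target: str) -> list[str]:
    """Compare placeholders in an English source and its translation.

    Placeholders are compared as multisets, so reordering positional
    placeholders (%1$S, %2$S) is allowed, but dropping or adding one is not.
    """
    src = Counter(PLACEHOLDER_RE.findall(source))
    tgt = Counter(PLACEHOLDER_RE.findall(target))
    problems = [f"missing {ph}" for ph in sorted((src - tgt).elements())]
    problems += [f"unexpected {ph}" for ph in sorted((tgt - src).elements())]
    return problems


print(placeholder_problems("Welcome to &brandShortName;", "Bienvenue dans Firefox"))
```

Running a check like this over every translated string before po2moz is what lets a team stop worrying about langpacks broken by variables.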

Completing

  • Check that you've been working on the right stuff
  • Make a bug report with your diff
  • Watch it like a hawk, as it might slip through the cracks
  • Get it approved
  • Check in the difference
  • Check the builds on Tinderbox (which I think you also need to ask to have enabled)

Last minute

  • Discovering snippets, license approval and other pains at a very late stage of the whole process.

Heartache

  • Seeing your hard work miss a deadline and sit in limbo for years without a hope of exposure
  • Taking ages to go from beta to live (luckily now solved by exposing beta builds)
  • Starting it all over again :)

Love

  • But once it's live, it's amazingly rewarding.
    • Firefox makes a real difference in people's lives
    • Seeing the download figures makes you realise the work is pulling in new people, all of whom pull in others, and hopefully a few will join you in the localisation.

Gandalf comparison

Honestly, I think we've eliminated 100% of the translation pain:

  • We don't worry at all about breaking langpacks through broken variables
  • Almost all accelerators are correctly placed
  • We have all the context we need, including LOCALIZATION NOTEs
  • All Unicode-related issues are fixed: no BOMs, etc.
  • We know what was added and what changed, and we can find and fix it quickly
  • With Pootle added, we make it very easy for occasional translators to submit fixes and to translate without installing any tools.
  • We translate almost all files in one tool and in one format

filip's approach (Frenchmozilla)

Team presentation

  • On the net since 1999
  • First localisation is M??
  • Team size: more than 10

Release

  • French Calendar
  • French Camino
  • French ChatZilla
  • French Firefox
  • French Lightning
  • French Mozilla
  • French MozMan
  • French Nvu
  • French SeaMonkey
  • French Sunbird
  • French Thunderbird
  • XULRunner
  • MDC
  • Bugzilla

Tools used

  • Site on sourceforge.net (http://frenchmozilla.sf.net)
  • Wiki
  • Blog
  • CVS on sourceforge.net for SeaMonkey, Mozilla, Nvu, ChatZilla
  • CVS on mozilla.org for Sunbird/Lightning, Firefox, Thunderbird, Minimo
  • Bugzilla
  • LXR / MXR
  • Tinderbox
  • mailing lists (for CVS, patches and discussion)
  • IRC for direct contact
  • Text editor

Feedback

Having the localisation embedded in Mozilla CVS is a great improvement:

  • Nightly builds
  • Tinderboxes
  • Compare locales (.pl)
  • Auto release

But the tools are sometimes hard for beginners to use and to master.

Wishes and Improvements

  • Landing of SeaMonkey's localisation in CVS
  • Accesskeys
  • Better (and up-to-date) help on how to start a localisation
  • CVS sandbox (a training place for new teams to learn CVS and to store their localisation before it goes official)
  • Help on the web is a bad idea
  • The usual rants about trademark policy overload in the final stages before releases
  • SUMO – AMO – Remora localisation
  • We need a tool to ease localisation (important for beginners)

Why help on the web is a bad idea

  • Tracking changes in help would be a lot harder for L10n teams if help is moved to the web (cf. the discussion on MDC).
  • Help in the French locale uses entities declared in the DTDs (better quality). This will not work if help is moved to the web.

<!ENTITY % brandDTD SYSTEM "chrome://branding/locale/brand.dtd" >
<dd>Par défaut, quand &brandShortName; <em>&startupHomePage.label;</em>. <em>&startupBlankPage.label;</em>.<br/><br/>

MDC (and future SUMO?) discussion

4 active members of the team work on MDC.

Prioritizing content: how do we choose what to work on?

  • Personal interest
    • I like technology X, I find it cool (e.g. canvas, CSS3 columns)
    • People are not aware enough of problem Y (e.g. standards, accessibility issues)
  • Popularity
    • The 200 most viewed pages in English: people need this information (could be more than 200, depending on the team size)

MDC – Quality

  • Follow some guidelines (the translation guide from Sun), but not too strictly
  • Proofreading is important. Overall quality grows over time, but everyone makes mistakes (tiredness, an unfamiliar subject, etc.). In an ideal world: 1 translator, 1 editor, 1 proofreader. Often one person has two or three of these roles (OK for short technical references, problematic for long articles/tutorials)

MDC The Big Issue

  • Keeping the existing content up to date while extending our number of localised articles
  • English "Recent Changes" is not enough (explain why?)
  • Sudden structural changes in some parts of the English wiki

MDC – What we need

  • To know whether pages we translated have changed in a significant manner (the "minor change" flag is an indicator, but maybe not enough)
  • Possible solutions include automatic tagging by a bot: "This translation might be obsolete"
  • Prioritize according to page views, delta changed, time elapsed
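One possible way to turn "page views, delta changed, time elapsed" into a single ordering is a score per translated page. The weighting below is purely hypothetical (views times the fraction of the English text that changed since the translation, boosted as the translation ages); the `Page` fields and the `ramp_days` parameter are illustrative assumptions, not data MDC is known to expose.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    monthly_views: int           # views of the English original (hypothetical field)
    changed_fraction: float      # share of English text changed since our translation, 0..1
    days_since_translation: int  # time elapsed since the translation was last updated


def priority(page: Page, ramp_days: float = 30.0) -> float:
    """Hypothetical score: popular, heavily changed, old translations first."""
    # Staleness rises from 0 towards 1 as the translation ages.
    age = page.days_since_translation
    staleness = age / (age + ramp_days)
    return page.monthly_views * page.changed_fraction * (1 + staleness)


def rank(pages: list[Page]) -> list[Page]:
    """Most urgent page first."""
    return sorted(pages, key=priority, reverse=True)


pages = [
    Page("Canvas tutorial", 5000, 0.1, 60),
    Page("CSS columns", 800, 0.9, 10),
    Page("DOM reference", 10000, 0.0, 300),
]
print([p.title for p in rank(pages)])
```

Multiplying by the changed fraction means a heavily viewed page whose English original has not changed scores zero; that fits the goal of spotting obsolete translations, while choosing which untranslated popular pages to start would still need a separate list.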

MDC – Dependencies

  • Interlang bot should work fast (faster?)
  • Tools (bots?) should not alter the stats significantly (page views)

MDC – Nice to have

  • Subjects in need of attention (fast-growing interest: new entries in the top 100, 500) → make an RSS feed of that?
  • From those subjects, which ones are not translated yet?