Intellego/Research: Difference between revisions

From MozillaWiki

Revision as of 20:12, 26 November 2013

The problems

  • One of the features where Chrome has beaten Firefox is automatic translation of web content using Google Translate. Google has spent a lot of time and incorporated some interesting strategies into building a complex, proprietary machine translation engine to handle this. The feature in Chrome not only lets users request and retrieve machine translation output from the Google Translate engine; Google Translate also has an interface that lets users recommend improvements to a translation, allowing the engine to become more sophisticated and accurate.
  • Before Chrome, Google Translate had an open API, which allowed Google to collect content for use in its engine and also made the web a generally more multilingual place. Using this open API, any website could add a snippet of code and see its content translated on the fly. Over three years ago, Google closed this API and began charging for the service, resulting in many websites becoming monolingual once again. Closing the Google Translate API has left a massive gap in the web, and nothing has yet been able to fill the need.
  • Many Mozilla l10n teams consist of only 1-2 people. While they would love to provide coverage in their language for all of the support content and websites used to market to users and assist them with issues, they do not have the time to commit. Users thus have a localized Firefox, but lack troubleshooting support in their language.
  • More and more Mozillians are non-English speakers or do not have English writing skills. There have been efforts to provide language education for Mozillians; however, the opportunities are limited to a small percentage of Mozillians. These Mozillians are thus limited in their participation by the significant language barrier.
  • Language support selection for machine translation projects is driven, in part, by ROI and availability of resources. This often results in minority languages, and even some majority languages (see Indic languages), being under-represented in the machine translation ecosystem. As long as ROI remains a primary motivator for incorporating support for these languages, they will remain under-represented and unsupported.
  • Data for machine translation corpuses is often collected by crawling the web or by consuming data that users unknowingly hand over when they agree to the obscure terms and conditions of an MT service. Open data collection for MT corpuses is either non-existent or an obscure practice.

Our proposed solutions

Research questions

How does machine translation work?

What are the benefits and drawbacks to each methodology?

What parts does a machine translation engine consist of?

How do you measure the output quality of a machine translation engine?

What prominent machine translation engines are out there and what are they known for?

Name | Owner | Method | Open/Closed | # of supported languages | Noteworthy
Google Translate
Microsoft Translator
Babelfish
MosesMT
Other
Other
Other
Other

What prominent corpuses are currently available?

Name | Owner | Method | Open/Closed | # of languages | Noteworthy
Google Translate
Microsoft Translator
Babelfish

What are the pros and cons of having a Mozilla MT engine?

What technology resources would be needed to build our own MT engine?

What human resources would be needed to build our own MT engine?

What partnership opportunities could be available for this project?

User stories

Firefox end-users

  • I want to automatically translate web sites into my native language in Firefox desktop.
  • I want to automatically translate web sites into my native language in Firefox for Android.
  • I want to automatically translate web sites into my native language in the Firefox OS browser.
  • I want to be able to give feedback and make corrections to machine translation output within these products.
  • My minority language has a very small presence online, but it's my native language and I want to see the web translated into that language.

Browser users in general

  • I want or need language tools in my browser, but am currently forced to either use Chrome or go without.
  • I want language tools in my browser of choice.

Web admins

  • I want an open API to an open MT engine that will allow my users to automatically translate the page's content into their native language with the press of a button.

Businesses

  • My product is popular in many countries, but I just don't have the resources to offer support in other languages. I want to better serve and retain customers who don't speak my language.

Non-English speaking Mozillians

  • I want to be able to read emails sent to me in my native language.
  • I want to be able to send emails to other Mozillians who don't speak my language, knowing that my message will be understood by anyone who reads it.
  • I want to be able to participate in Mozilla forum discussions in my native language.

Non-English speaking potential Mozillians

  • I want to support Mozilla, but my English is not good enough (or I have none) to participate.

Mozilla localizers

  • I want to translate support pages (or marketing campaigns, or other projects) for my localization of Firefox, but translating them from scratch takes a lot of time. I want to be able to post-edit MT output so that I can still provide language coverage without the massive time commitment.