*Data for machine translation corpora is often collected via web crawling, or from text that users unknowingly hand over to these engines by agreeing to the obscure terms and conditions of an MT service. Open data collection for MT corpora is either non-existent or an obscure practice.
=Research questions=
===How does machine translation work?===
There are four general approaches to machine translation. Most of the early work, before massive corpora were available, was done with rule-based machine translation ([http://en.wikipedia.org/wiki/Rule-based_machine_translation http://en.wikipedia.org/wiki/Rule-based_machine_translation]). However, most current work uses statistical machine translation ([http://en.wikipedia.org/wiki/Statistical_machine_translation http://en.wikipedia.org/wiki/Statistical_machine_translation]). A brief description of each is given below.
====Rule-Based Machine Translation====
====Statistical Machine Translation====
Uses statistical information to choose the "best" translation from the possible translations of a text. As far as I know, all work with statistical machine translation requires a bilingual corpus for calculating the necessary probabilities.
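As an illustration of the statistical approach, the sketch below ranks candidate translations with the classic noisy-channel decomposition: pick the candidate e that maximizes P(e) · P(f | e). All the probabilities here are invented stand-ins; a real system would estimate the translation model from a bilingual corpus and the language model from monolingual text.

```python
# Toy illustration of noisy-channel scoring in statistical MT: choose the
# candidate translation e that maximizes P(e) * P(f | e).
# All probabilities below are made up for the German source
# "das Haus ist klein"; a real system estimates them from corpora.

language_model = {            # P(e): how fluent each English candidate is
    "the house is small": 0.4,
    "the house is little": 0.3,
    "small the house is": 0.001,
}

translation_model = {         # P(f | e): how well each candidate matches the source
    "the house is small": 0.5,
    "the house is little": 0.4,
    "small the house is": 0.5,
}

def best_translation(candidates):
    """Return the candidate with the highest combined model score."""
    return max(candidates, key=lambda e: language_model[e] * translation_model[e])

print(best_translation(language_model))  # -> the house is small
```

Note how the fluent-but-less-literal candidate still wins: the language model penalizes the ungrammatical word order even though its translation-model score is just as high.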
====Example-based Machine Translation====
Uses cases and analogies, along with a parallel corpus, to determine the best translation. Somewhat similar to Rule-Based ([http://en.wikipedia.org/wiki/Example-based_machine_translation http://en.wikipedia.org/wiki/Example-based_machine_translation]).
====Hybrid Machine Translation====
A combination of the previously mentioned approaches.
===What are the benefits and drawbacks to each methodology?===
===How do you measure the output quality of a machine translation engine?===
;Automated evaluation
* BLEU Score - http://en.wikipedia.org/wiki/BLEU
** Compares MT output against reference translations produced by professional human translators, assigning a score (based on n-gram precision) that measures how close the MT output comes to the human translation.
* NIST - http://en.wikipedia.org/wiki/NIST_(metric)
** Similar to BLEU; however, not all correct n-grams are treated equally: they are weighted according to how rarely they occur.
* METEOR - http://en.wikipedia.org/wiki/METEOR
** Evaluation based on the harmonic mean of unigram precision and recall, weighted toward recall (unlike BLEU and NIST, which are precision-based).
* LEPOR - http://en.wikipedia.org/wiki/LEPOR
** A newer MT evaluation metric that combines precision, recall, a sentence-length penalty, and n-gram-based word order.
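To make the BLEU idea concrete, here is a simplified single-sentence sketch of its core computation: clipped n-gram precision combined in a geometric mean, times a brevity penalty. The real metric is corpus-level, supports multiple references, and uses different smoothing, so treat this only as an illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count at how often it appears in the reference,
        # so repeating a correct word cannot inflate the score.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values()) or 1
        precisions.append(max(overlap, 1e-9) / total)  # crude zero smoothing
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 3))  # -> 1.0
```

A perfect match scores 1.0, and any divergence from the reference lowers the n-gram overlap and hence the score; this is also why BLEU can unfairly punish valid translations that merely use different wording than the reference.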
===What prominent machine translation engines are out there and what are they known for?===
{| class="wikitable sortable" border="1"
! Open/Closed
! # of supported languages
! Web hosted?
|-
| Google Translate
| Closed
| 70+
| translate.google.com
|-
| Microsoft Translator
|-
| Babelfish
| Yahoo!
|
| Closed
|
|
| MosesMT
|
| Statistical
| Open
|
|
|-
| Apertium
|
| Rule-based
| Open
|
|
|}
See also [https://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications https://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications] & [http://www.computing.dcu.ie/~mforcada/fosmt.html http://www.computing.dcu.ie/~mforcada/fosmt.html].
===What prominent corpora are currently available?===
===What human resources would be needed to build our own MT engine?===
===What partnership opportunities could be available for this project?===
See [https://www.taus.net/taus-machine-translation-showcase https://www.taus.net/taus-machine-translation-showcase].
=User stories=
==Firefox end-users==