Intellego/GSoC/2014: Difference between revisions

Revision as of 11:53, 5 March 2014

GSoC Project Outline

Intellego is an initiative to develop a machine translation platform from open corpus data, open corpus gathering techniques, and open web service APIs, in order to lower the linguistic accessibility barrier for users and websites and to further promote the exploration of freedom of linguistic expression on the web.

This piece of the project will lay the foundational code for Intellego by targeting its first key milestone: an automatic translation tool for web sites that translates all key source terminology in a site into its target-language equivalents. The tool will scan the DOM of a site, extract the translatable text nodes, search them for source terminology matches in a bilingual termbase, and return the target-language terminology within the rendered page. The project will aim to perform these tasks on the Mozilla support sites.
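The scanning and extraction step described above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: it assumes Python's standard-library html.parser is adequate for the pages involved, and the names SKIPPED_TAGS, TextNodeExtractor, and extract_translatable_text, as well as the "contains at least one letter" filter, are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumption: text inside these elements is code or markup, not prose,
# so it is treated as non-translatable.
SKIPPED_TAGS = {"script", "style", "code", "pre", "noscript"}


class TextNodeExtractor(HTMLParser):
    """Collects the text nodes of a page that look translatable."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.text_nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only text outside skipped elements that contains a letter;
        # whitespace-only and purely numeric nodes are dropped.
        if self._skip_depth == 0 and any(ch.isalpha() for ch in text):
            self.text_nodes.append(text)


def extract_translatable_text(html):
    """Return the translatable text nodes of an HTML string, in document order."""
    parser = TextNodeExtractor()
    parser.feed(html)
    parser.close()
    return parser.text_nodes
```

In practice the HTML would first be fetched from the URL the user submits, and a more forgiving parser might be needed for real-world markup; both concerns are left out of this sketch.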

If the student completes the basic scope of the project before the allotted eight weeks are up, the stretch goal is to add context-sensitive retrieval of target terminology.

Skills Needed

  • DOM manipulation (JavaScript)
  • Information retrieval
  • XML
  • Understanding of open web service APIs
  • Python
  • Ability to quickly create an intuitive front-end web UI using an existing framework (e.g., Django)

Timeline

8-week project timeline (the whole program runs three months from beginning to end, but only 8 weeks may be allocated to actual coding work; see the timeline link below):

Week 1
  • Create a bilingual termbase of terminology consisting of Mozilla-specific terminology from Mozilla l10n resources.
Week 2
  • Create a front-end web portal UI in which the user simply enters a URL and clicks a button to run the machine translation (MT) and view the results.
  • Create a back-end, Python-based program that will, given a URL, extract the DOM text nodes from the associated webpage.
Week 3
  • Filter out DOM text nodes with untranslatable (or non-translatable) text.
Week 4
  • Search the translatable DOM text nodes (the source) for source terminology matches in the bilingual termbase.
Week 5
  • Map the source terminology to the matching target terminology from the termbase.
Week 6
  • All-At-Once Replacement Method: Regenerate the DOM with the replaced terminology, output to a new webpage, and render it.
Week 7
  • On-the-Fly Replacement Method: Perform the terminology replacement operation on the DOM segment by segment, instead of extracting all text nodes from the DOM at once.
Week 8
  • Evaluate each method (all-at-once or on-the-fly) for efficiency and analyze whether it would be beneficial to use one method over the other, or whether it would be better to offer a choice of either.
Final deliverable
  • Automatic terminology translation tool consisting of a web interface and a server-side tool. A user will insert the URL of a source language web site and the tool will return the rendered target language website containing partially translated content.
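The matching, mapping, and replacement steps from Weeks 4 through 7 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the planned design: the TERMBASE entries are hypothetical English-to-Spanish pairs, all function names are invented for this example, matching is done with a case-insensitive, whole-word regular expression, and the text nodes are assumed to have already been extracted from the DOM.

```python
import re

# Hypothetical English -> Spanish termbase entries, for illustration only.
# Keys must be lowercase because matches are lowercased before lookup.
TERMBASE = {
    "bookmark": "marcador",
    "private browsing": "navegación privada",
    "add-on": "complemento",
}


def build_term_pattern(termbase):
    """Compile one regex matching any source term as a whole word."""
    # Longest terms first, so multi-word terms win over shorter overlaps.
    terms = sorted(termbase, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:%s)\b" % alternation, re.IGNORECASE)


def replace_terms(text, termbase, pattern):
    """Replace every source term found in one text node with its target term."""
    return pattern.sub(lambda m: termbase[m.group(0).lower()], text)


def replace_all_at_once(text_nodes, termbase):
    """All-at-once method: process the full list of nodes and return a new list."""
    pattern = build_term_pattern(termbase)
    return [replace_terms(node, termbase, pattern) for node in text_nodes]


def replace_on_the_fly(text_nodes, termbase):
    """On-the-fly method: yield each node as soon as it has been processed."""
    pattern = build_term_pattern(termbase)
    for node in text_nodes:
        yield replace_terms(node, termbase, pattern)
```

Because both functions share the same matching logic, the Week 8 comparison could time them on identical input. Note that whole-word matching leaves inflected forms such as "bookmarks" untouched, and that regenerating and rendering the DOM is not covered here.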


Google Summer of Code Interested Parties

Please add a row below the header row containing the appropriate information if you are interested in this project.

Name | Contact info | Profile/site | Description of interest
Akshay Aurora (:system64) | akshayaurora[at]yahoo.com | Website (http://iakshay.net) // Github (http://github.com/iakshay) // LinkedIn (http://linkedin.com/in/akshayaurora) | Full stack developer passionate about open technologies

Contact

Mentor & Reporter