Intellego/Meetings/Status/2014-02-06

From MozillaWiki

https://intellego.etherpad.mozilla.org/ep/pad/view/ro.8xafZ2WPK5AlVPYWfo0Do/rev.5872

Meeting Details

Talking Points

  • Action item follow-up
  • GSoC project?
  • Possible Phase 1 milestones

Possible Phase 1

Previous Action Items

  • [Kensie] Flesh it out and put it onto the wiki
  • Determine Phase 1 milestones
    • Research scope of proposed milestones
      • [Gordon]:
        • Read a few of the more notable papers on MT output evaluation.
          • Status: Forwarded some more recent results from NIST's comparison of MT engines.
        • Contact mitcho about leveraging hyperlink constituencies for use in Intellego.
          • Status: E-mailed mitcho on Feb. 3; no response yet.
        • Look into what it would take to translate snippets, including investigating existing open source machine translation engines and tying them into Pontoon.
          • Status: Deferred.
      • [Jeff]:
        • Research what it would entail to translate only specific terminology.
  • [All] Rework research questions within the travel metaphor paradigm

Action Items

  • [Gordon] Look into what it would take to translate snippets, including investigating existing open source machine translation engines and tying them into Pontoon.
  • [Gordon] Try to identify contacts at UD and UMD to get on board with Intellego.
  • [Kensie, Mekki] Sprint: Turn backgrounder bullet points into prose.

Google Summer of Code

Google Summer of Code application due a week from tomorrow (Feb. 14).

  • [Jeff] Reach out to Chad et al. about GSoC.
  • [Jeff] Reach out to TAUS & BYU
  • [Gordon] Reach out to UD and UMD for possible student participants in GSoC project.
  • [Mekki] Pull the GSoC application and look it over for requirements. Compare it to CREDIL's previously successful application.
  • [Jeff, Mekki] Reach out to CREDIL for a possible collaboration effort.
  • [Jeff, Mekki, Gordon] Sprint: Fill out the Google Summer of Code application.

Research

Places to get initial code base

Google Translate Team
(Mekki has some contacts at Google who may be able to release some snippets, algorithms, methods, etc. Preliminarily, however, barring an official high-level agreement between Google and Mozilla, the actual code base will not be released to the Intellego project any time soon, for various reasons.)
Language Immersion for Chrome Project
(Details here: https://chrome.google.com/webstore/detail/language-immersion-for-ch/bedbecnakfcpmkpddjfnfihogkaggkhl )
This is essentially a Chrome version of the "Terminology output" potential milestone 1 idea above.
DARPA Broad Operational Language Translation (BOLT) project
(Already open source. Details here: http://www.darpa.mil/Our_Work/I2O/Programs/Broad_Operational_Language_Translation_%28BOLT%29.aspx )
Maybe one of the classic open source IRC bots like MegaHAL?
We don't want the Siri-like behaviour of these projects so much as the code that takes input and analyzes it, which might be useful. (Maybe start at the MegaHAL Sourceforge page to dig through the code? http://sourceforge.net/projects/megahal/ ) (The creator also explains how the code works here: http://megahal.alioth.debian.org/How.html . He seems to be active in Debian circles.)
There's a huge list of Natural Language Processing (NLP) software projects on this page.
Many of them are likely to be open source: http://en.wikipedia.org/wiki/Outline_of_natural_language_processing
Stanford CoreNLP
( http://nlp.stanford.edu/software/corenlp.shtml )
Science Toolbox ( http://www.sciencetoolbox.org/ )
https://github.com/louismullie/stanford-core-nlp
https://github.com/arrigonialberto86/ruby-band
http://www.sciencetoolbox.org/tag/natural%20language%20processing
http://www.sciencetoolbox.org/tag/machine%20learning

Potential collaborators/sponsors

Canadian Internet Registry Authority (CIRA)
They currently have a call for proposals for their Community Investment Program, funded from their current fiscal year budget (which must be disbursed by the end of March). We can get up to $100,000 for a project like this. Their interest will be in making the Internet more accessible to all Canadians, especially in English and French, but also in Canada's other languages. (Details of the call for proposals here: http://cira.ca/about-cira/community-investment-program?utm_source=memberEnglish&utm_medium=email&utm_campaign=cip )
The Sloan Foundation
They fund research related to this sort of project. We probably fall under their "basic research" initiative, but if we pitch it right, they are likely to be interested. Mekki has contacts who have applied to the Sloan Foundation successfully before, so we can ask them for advice. People at the Mozilla Foundation will be familiar with it as well, since they have successfully received Sloan Foundation funding for the Software Carpentry Project (http://software-carpentry.org/ ). (Sloan Foundation site: http://www.sloan.org/ )
Center for Research and Experimental Development in Informatics Libre (CREDIL)
They're a FLOSS not-for-profit group based in Ottawa, of which Mekki is a member. They have experience working with language technologies (amongst other tech projects). They have some old-school talent (from the pre-Mozilla Netscape era, but without all the baggage) and lots of experience working with government and FLOSS. They could potentially also apply separately for a CIRA grant to work on Intellego from a different angle (so that we can both get funded). I've planted the seed with them to look at the CIRA grant, and there is interest. (Details about CREDIL here: http://www.credil.org/ )
Linguistech
A collaborator with CREDIL, funded by the Canadian federal government, that focuses on translation technologies. They have already developed numerous tools (mostly for English<->French) and could benefit greatly from what we have to offer. (Details about Linguistech here: http://linguistech.ca/Home )
Google
For several reasons, Google needs a competitor for Google Translate. 1) For antitrust reasons, it's in their best interest to not be the monopolistic leader in translation services. Sponsoring a competitor (as they do with Firefox) actually allows them to pass antitrust scrutiny on the range of their activities more readily. 2) There are innovation-related reasons for them to sponsor and collaborate on Mozilla Intellego. In short, it prevents them from becoming complacent and prompts them to continually innovate. Google Translate has been stagnating in recent years, so this boost will be useful. 3) Many of the arguments for Chrome->Firefox and Android->FirefoxOS also apply to Google Translate -> Mozilla Intellego.
Google's probably out, given previous (confidential) discussion.
Microsoft
For many of the same reasons as Google. Kensie has knowledge about related initiatives at Microsoft.
DARPA (See BOLT above)
They have lots and lots and lots of money for this sort of project. Language tech is a high priority for them right now, given the number of theaters in which the US military is operating. They have sponsored numerous initiatives to help on-the-ground personnel communicate with locals. Big sponsorship. A key issue with this partner is ensuring that we can maintain our focus on Translation for the Open Web and not get drawn into war politics. DARPA has historically been good about this and is quite happy to contribute to open source projects, both money and employee time, so it should be fine. (Details on some funding opportunities here: http://www.darpa.mil/Opportunities/Solicitations/DARPA_Solicitations.aspx )
NATO Allied Command Transformation (ACT)
Similar to DARPA. They have an interest in anything that promotes training/learning/communication between NATO allies. Being able to do on-the-fly MLT with Intellego would seem to nicely fit that bill. (Some details here: http://www.act.nato.int/ffci)
Alelo Inc.
Originally known for their Tactical Iraqi language training product (using the Unreal 2003 engine, no less!), this company has worked extensively with the military, academia, and governments on immersive language training (primarily for soldiers). Currently, all of their language training relies on learners internalizing the vocabulary through exercises, etc. Our pitch to them is that Intellego can be an additional option they incorporate into their training under the category of "unexpected", where the person can use it as a tool when they otherwise can't understand. Improvements to Intellego can be progressively incorporated into their training over time (they've been at it for more than 10 years already), which makes them look forward-looking (and since they get a lot of R&D money, that's a huge incentive for them). (Their homepage is here: http://www.alelo.com/index.html )
Government of Canada
The Canadian Federal Government is always interested in anything related to English <-> French translation given the two official languages in Canada. They throw money at this sort of work (See Linguistech above for example). There are several different departments that may be interested. Some are listed below:
The Translation Bureau, for example, has developed several tools for their translation staff to use.
We could offer them a new option. Mekki has some contacts there that he could poke to see how we might be able to submit a proposal. (Some details about the Translation Bureau here: http://www.bt-tb.tpsgc-pwgsc.gc.ca/btb.php?lang=eng&cont=001)
The Literacy and Essential Skills program at Employment and Social Development Canada funds programs that have the potential to advance literacy.
We can pitch Mozilla Intellego as also helping with language learning for people who are new to English/French (helping translate from their native language during the learning process). (Some details about the funding opportunities are here: http://www.esdc.gc.ca/eng/jobs/les/funding/index.shtml )
Citizenship and Immigration Canada.
Similar to above. They have money to help promote the integration of new Canadians who might not have English/French as their first language. (Some details of the funding opportunities are here: http://www.cic.gc.ca/english/department/grants-contributions-funding/index.asp )
Many others.
Mekki can do a more thorough search as things move forward.
Provincial and Territorial Governments in Canada
Similar to Federal Government above, but 13 different sources with different rules, metrics, and amounts of available funds. However, there's a good chance of getting money from several of them. Requires some extensive digging/footwork. Mekki can do that as things move forward.
Language Industry Association (AILIA)
They have funding opportunities for organizations developing linguistic tools (They funded parts of Linguistech above). (Some details here. May need to dig a bit more: http://www.ailia.ca/Canadian+Language+Sector+Enhancement+Program+%28CLSEP%29++Funded+Projects )
Center for Canadian Language Benchmarks
They have a similar interest (from the measurement side) of promoting language proficiency as the other government-esque places listed above (Some details here: http://www.language.ca/)
University partners
For example, the University of Ottawa has a translation department (http://www.translation.uottawa.ca/ ), York University has programs in translation and interpretation (http://www.glendon.yorku.ca/translation/ ), etc. Most major universities around the world are likely to have such a program. We may be able to convince them to offer a joint degree/diploma with a computer science department specializing in the areas related to this project. Through Mozillians, we already have many contacts in computer science departments. We can approach the translation departments to partner once we have a pitch.
College partners
There are two possible approaches. The first is similar to the approach for universities. The second is to focus on colleges with technical programs where students do hands-on work with open source projects, just as Seneca College already does through its CDOT area with Mozilla, OpenOffice.org, Fedora, NexJ, and other projects. We could contact some professors and ask whether they would be willing to partner with us and have all of their students work on Intellego for a semester as part of their coursework. The professors could teach them the theory behind machine language translation, and the students could then apply what they have learned as contributors to the project. An advantage of this approach is that several Canadian colleges (including Seneca, see: http://www.canadainternational.gc.ca/brazil-bresil/study-etudie/swb-ssf.aspx?lang=en ) are party to an agreement that brings a large number of students from foreign countries to complete special diplomas, usually for 1 year. Currently, Brazil is sending many students who speak Brazilian Portuguese. It's a great opportunity to get students who speak another language (one with a huge FirefoxOS deployment, no less!) to participate in Intellego. Mekki has contacts deep within this program and has already seeded interest. They have dozens (or more) of students looking for "placements" with open source projects to earn their "work credit hours". It's a great fit to have them work on improving Brazilian Portuguese translation as a starting point (and to contribute to developing ways of understanding regional variants of a language, as with Portuguese in Portugal and Brazil, or French in France and Quebec).
Google Summer of Code (GSoC)
Always a great source of talented students, who can be recruited from around the world with skills in different languages. All we need to apply is someone who can direct and manage the students' efforts. Organizational applications are due in 1 WEEK (February 14, 2014), so we'll have to get right on this! (Details: http://www.google-melange.com/gsoc/homepage/google/gsoc2014 )
Engineers Without Borders
Intellego could be useful in nearly all of EwB's ventures. I'm unsure how they collaborate with projects such as Intellego. Mekki has some contacts at EwB and will ask whether this is the sort of thing they would want to partner on.
Médecins Sans Frontières / Doctors Without Borders (MSF)
Similar to EwB. They have an interest in the outcomes of Intellego, which could simplify a lot of their work. They may not have funding for us, but they may be able to broker funding from others. (Their main Canadian site is: http://www.msf.ca/ )
Ushahidi
They're best known for their open source crowdsourced information platform. They also have CrowdMap, a hosted version. One of the challenges they face is that information is typically recorded in the language in which the report is submitted. If the validator has to speak the same language (or dialect) as the reporter, that limits who can validate reports (e.g., election observers from foreign countries validating locals' reports of violence at a polling station). Intellego could promote communication across languages on the crowdsourcing platform. We're unsure how much funding they have, but they have their own open source community, code base, and various other things that we could tap into or collaborate on. Mekki also has a contact there who can provide more information and gauge interest. (Site: http://www.ushahidi.com )
Humanitarian Open Street Map
(Similar to Ushahidi. Details here: http://hot.openstreetmap.org/ )
CrisisCommons
(Similar to Humanitarian Open Street Map: http://crisiscommons.org/ )
Open Knowledge Foundation
Their mandate overlaps considerably with Mozilla's open web mandate. They're champions of open source and have a worldwide presence. I suspect they will have a lot of interest in Intellego. Mekki has a contact there who can help us uncover more details and opportunities (Site: http://okfn.org/)

Things to keep in mind

  • It's never too early in a project to put together a Contributor License Agreement. Are we just going to go with the standard Mozilla CLA? That's probably easiest. Having something to point to that is already established will make discussions with potential collaborators (above) that much more straightforward.