GSoC Update 3 - HTML5 Speech API

From MozillaWiki

Six weeks have passed since coding started. Progress on the project is looking good, though I would have liked to have finished a lot more by now. I lost about 10 days to examinations, but the last week has been quite productive.

Things accomplished since last time:

  • Got audio recording to work on Mac. (It turns out that the issue I was running into was fixed in a newer version of PortAudio.)
  • Figured out how to send audio and receive results. This turned out to be easier than expected: a simple HTTP POST with the audio data returns the recognition results.
curl -H "Content-Type: audio/x-flac; rate=16000" -F"myfile=@untitle.flac" ""

gives me the result in JSON:

           "utterance":"this is a audio recording",
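
For reference, the same exchange can be sketched in Python. This is only a rough sketch under a few assumptions: the endpoint URL is not given above, so RECOGNIZER_URL is a made-up placeholder; the audio is sent as the raw request body rather than as the form upload that curl's -F performs; and since only the "utterance" field of the response is shown, the helper searches the whole JSON reply for that key instead of assuming where it is nested. The function names are hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not given in this update.
RECOGNIZER_URL = "http://speech.example.invalid/recognize"


def build_request(flac_bytes, rate=16000, url=RECOGNIZER_URL):
    """Build (but do not send) a POST carrying raw FLAC audio.

    Assumption: the audio is the raw request body, not a multipart form.
    """
    return urllib.request.Request(
        url,
        data=flac_bytes,
        headers={"Content-Type": "audio/x-flac; rate=%d" % rate},
        method="POST",
    )


def find_utterances(node):
    """Collect every "utterance" string found anywhere in decoded JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "utterance" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_utterances(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_utterances(item))
    return found


def recognize(flac_bytes, rate=16000, url=RECOGNIZER_URL):
    """Send the audio and return the utterances found in the JSON reply."""
    with urllib.request.urlopen(build_request(flac_bytes, rate, url)) as reply:
        return find_utterances(json.loads(reply.read().decode("utf-8")))
```

Calling recognize() with the bytes of a 16 kHz FLAC file and a real endpoint URL would return a list of recognized strings; build_request() and find_utterances() can be exercised on their own without any network access.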

I'm working on getting the same request to work using XMLHttpRequest.

Lots more to be done this week:

  • UI to get user permission for speech.
  • Integrating endpointing, speechrecognizer and everything else.