GSoC Update 2 - HTML5 Speech API

From MozillaWiki

Three more weeks have gone by since my last post, and since coding started. I should probably blog a lot more about my project, and I will try to do so from now on.

Things accomplished -

  • Got microphone audio capture to work (presently only on Linux). I initially tried the same on the Mac, but it failed because of an internal PortAudio issue. It works perfectly using ALSA on Linux, so, on my mentor's suggestion, I decided to get everything working on one platform first. I will eventually implement audio capture on the Mac using CoreAudio.
  • Exposed nsSpeechRequest as a JavaScript object. In the process I learnt a lot about cycle collection, QueryInterface implementation and other XPCOM internals.

Things I'm working on -

  • A simple program to send audio data to Google's speech server and get its response back. I will have to decide on a mechanism for doing this from SpeechRequest (possibly XMLHttpRequest, nsHTTPChannel or EventSource), so I'm currently looking into their implementations.
  • Audio Encoding - The captured data needs to be encoded in a compatible format. Since the Speech Incubator Group hasn't decided on a format yet, I'm going to use Speex for now, because Google's speech service supports it.
  • Endpointing - The end of speech needs to be detected by measuring audio energy levels. I might just reuse some code from the Chromium project for this.
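Before wiring anything into SpeechRequest, the "simple program" step amounts to an HTTP POST that carries already-encoded audio and reads back a response. The sketch below uses only Python's standard library to show that shape. The endpoint URL, the `audio/x-speex; rate=16000` content type and the JSON response format are all assumptions made for illustration, not the actual parameters of Google's service, and `build_speech_request`/`send_speech_request` are hypothetical helper names.

```python
import json
import urllib.request

# Hypothetical endpoint: the real address and query parameters of
# Google's speech service are not given in this post.
SPEECH_URL = "https://speech.example.com/recognize"


def build_speech_request(encoded_audio: bytes,
                         content_type: str = "audio/x-speex; rate=16000",
                         url: str = SPEECH_URL) -> urllib.request.Request:
    """Wrap already-encoded audio (e.g. Speex frames) in an HTTP POST request.

    The default content type is an assumed value, not a documented one.
    """
    return urllib.request.Request(
        url,
        data=encoded_audio,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send_speech_request(req: urllib.request.Request,
                        timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the request easy to inspect offline. Inside Gecko, the same POST would instead go through whichever of XMLHttpRequest, nsHTTPChannel or EventSource ends up being chosen.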
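Energy-based endpointing can be illustrated with a much simpler scheme than Chromium's endpointer, which is not reproduced here. The sketch computes the RMS energy of each frame of 16-bit little-endian mono PCM. It reports end of speech once speech has been heard and a run of consecutive frames then stays below a threshold. The threshold of 500, the 25-frame "hangover" (about 0.5 s with 20 ms frames) and the class name are illustrative assumptions, not tuned or source-provided values.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EnergyEndpointer:
    """Hypothetical endpointer: end of speech = enough quiet frames after speech.

    threshold and hangover_frames are illustrative values, not tuned ones.
    """

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 25):
        self.threshold = threshold
        self.hangover_frames = hangover_frames  # e.g. 25 x 20 ms = 0.5 s
        self.in_speech = False
        self.silent_run = 0

    def process(self, frame: bytes) -> bool:
        """Feed one frame; return True once speech has started and then ended."""
        if frame_rms(frame) >= self.threshold:
            # Loud frame: we are in speech, so reset the silence counter.
            self.in_speech = True
            self.silent_run = 0
            return False
        if self.in_speech:
            # Quiet frame after speech: count it toward the hangover.
            self.silent_run += 1
            return self.silent_run >= self.hangover_frames
        # Quiet frame before any speech: keep waiting.
        return False
```

Requiring a run of quiet frames rather than a single one keeps short pauses between words from being mistaken for the end of an utterance. A real endpointer would also adapt the threshold to background noise.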

I have been talking to my mentor, smaug, quite a bit about the project, and he's been really helpful whenever I'm stuck. I will be busy all of next week with examinations, as noted in my schedule, and will make up for the lost time once exams are done. Exciting times ahead; I'm looking forward to writing lots more code.