Changes

SpeechRTC - Speech enabling the open web

35 bytes removed, 18:19, 21 April 2014

no edit summary

== SpeechRTC - Speech enabling the open web ==

Speech recognition is now almost a standard feature on modern handsets. SpeechRTC aims to bring it to Firefox OS and other Mozilla products by creating a scalable, flexible services platform focused on delivering a great experience to users and empowering developers, ultimately offering full support for the Web Speech API and other tools.

== Starting point ==

SpeechRTC is already used in two published Firefox OS apps[1] and has been shown to run both online and offline on any device running Firefox OS 1.3, even the Unagi. The fast track is therefore to first integrate it into FxOS as a few OS-level apps, to build the foundations, and then release the Web Speech API to developers.

== 1. The Client ==

In online mode, audio is captured and encoded as Opus through MediaRecorder, then streamed over WebSockets to a Node.js application on the server that handles the connection with the decoder. The client also has methods to change the language model when necessary.
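One way to picture the online flow is to send audio chunks as binary frames and language-model changes as small JSON text frames on the same socket. The sketch below assumes exactly that; the `Sender` interface, the `setModel` message shape and the function names are hypothetical and are not taken from the SpeechRTC source.

```typescript
// Hypothetical wire format: binary frames carry Opus audio,
// text frames carry control messages encoded as JSON.

// Minimal subset of the browser WebSocket interface that the sketch needs.
interface Sender {
  send(data: string | Uint8Array): void;
}

// Assumed control messages; the real protocol may differ.
type ControlMessage =
  | { type: "setModel"; model: string }
  | { type: "endOfSpeech" };

// Send one encoded audio chunk, for example the bytes of a
// MediaRecorder "dataavailable" blob.
function sendAudioChunk(socket: Sender, chunk: Uint8Array): void {
  if (chunk.length === 0) return; // nothing to send for empty chunks
  socket.send(chunk);
}

// Ask the server to switch language model or mark the end of an utterance.
function sendControl(socket: Sender, message: ControlMessage): void {
  socket.send(JSON.stringify(message));
}
```

A browser `WebSocket` can be passed wherever a `Sender` is expected, which keeps the framing logic testable without a live connection.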

In offline mode, audio is captured as PCM from getUserMedia (gUM) and streamed to a web worker "thread", which preprocesses it and hands it to the decoder API, ported to JavaScript with Emscripten. As in online mode, language model switching is supported. Although this has been shown to work, the ideal approach is to run the decoder in a separate C++ process and communicate with it through IPC, so that it runs even on phones with a constrained CPU.
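The preprocessing a worker might do before handing audio to the decoder can be sketched as follows. It assumes the decoder wants 16-bit signed PCM at a lower sample rate than the capture rate; that requirement, and both function names, are assumptions for illustration rather than details of the SpeechRTC worker.

```typescript
// Convert Web Audio float samples (nominal range -1..1) to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    // Negative and positive halves of the int16 range differ by one step.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Reduce the sample rate by an integer factor by averaging each group of
// `factor` samples. Averaging is only a crude low-pass filter; a production
// resampler would use a proper filter.
function downsample(samples: Float32Array, factor: number): Float32Array {
  const out = new Float32Array(Math.floor(samples.length / factor));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) sum += samples[i * factor + j];
    out[i] = sum / factor;
  }
  return out;
}
```

For example, audio captured at 48 kHz could be passed through `downsample(samples, 3)` and then `floatTo16BitPCM` to obtain 16 kHz PCM.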

== 2. The Server ==

In online mode, the Node.js application that receives audio and grammars from peers is responsible for handling the connection with the voice server, which decodes Opus to PCM and passes it to the decoder for recognition, or switches the language model when requested. Some have argued in the past for also running the decoder on Node, but I decided it would be better for the project to decouple it, since we may need to use different decoders and voice servers that may not run on JavaScript.
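The decoupling argued for above can be illustrated with a relay that only knows an abstract voice-server interface. Everything here is a sketch under assumptions: the `VoiceServer` interface, its method names and the `setModel` control message are hypothetical, and a real backend would wrap whatever connection the actual voice server exposes.

```typescript
// Abstract view of the voice server. Any decoder backend that can accept
// Opus audio and switch language models could implement this.
interface VoiceServer {
  pushAudio(opusChunk: Uint8Array): void;
  switchModel(model: string): void;
}

// Route one incoming frame from a peer: binary frames are audio, text frames
// are JSON control messages. Returns false when the frame is not understood.
function relayFrame(frame: string | Uint8Array, backend: VoiceServer): boolean {
  if (typeof frame !== "string") {
    backend.pushAudio(frame);
    return true;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(frame);
  } catch {
    return false; // malformed control message
  }
  if (typeof parsed === "object" && parsed !== null) {
    const msg = parsed as { type?: unknown; model?: unknown };
    if (msg.type === "setModel" && typeof msg.model === "string") {
      backend.switchModel(msg.model);
      return true;
    }
  }
  return false;
}
```

Because the relay depends only on the interface, swapping in a different decoder or voice server does not require changing the Node.js side.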

== 3. The speech decoder ==

Third-party licensing is extremely costly (the usual unit is millions) and leads to an unwanted dependency. Writing a decoder from scratch is tough and requires highly specialized, hard-to-find engineers.

For LVCSR we can also consider Julius and benchmark it, since it has strong proven results.

* Automatic retrain

We should also build scripts that automatically adapt the acoustic model to each user's own voice, so that the service keeps improving both for that individual user and overall.

* Privacy

Some have raised privacy concerns about online services. In the ideal scenario, online recognition is required only for LVCSR, while FSG recognition can be handled offline if architected correctly. I think letting users choose whether or not we may use their voice to improve the models is how other OSes handle this issue.
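The split described above can be written down as a small routing rule. This is a sketch under assumptions: the type and function names are hypothetical, and the rule that only audio sent to the server can be retained for training, and only with opt-in, is one possible reading of the consent idea rather than a stated policy.

```typescript
type RecognitionKind = "FSG" | "LVCSR";
type Route = "offline" | "online" | "unavailable";

// Grammar-based (FSG) requests stay on the device; large-vocabulary (LVCSR)
// requests need the server and are unavailable without a network.
function chooseRoute(kind: RecognitionKind, networkUp: boolean): Route {
  if (kind === "FSG") return "offline";
  return networkUp ? "online" : "unavailable";
}

// Assumed policy: audio may be kept for model improvement only when it was
// sent to the server anyway and the user explicitly opted in.
function mayKeepAudioForTraining(userOptedIn: boolean, route: Route): boolean {
  return userOptedIn && route === "online";
}
```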

* Offline and online

The same speech server can be designed to run both online and offline, leaving responsibility for transmission to the middleware that handles the connections with the front end.

== 4. Web Speech API ==

Once both the online and offline backends are built in a scalable way, we connect them to the Web Speech API implementation that already exists in Gecko and release the API to developers. This automatically adds support for every web app already developed against the Web Speech API; such apps currently run only on Chrome.
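For context, this is roughly what a web app written against the Web Speech API looks like from the developer's side. The sketch declares only the small part of the `SpeechRecognition` interface it uses as local types, so it does not depend on DOM typings; `recognizeOnce` and the local type names are hypothetical helpers, while `lang`, `interimResults`, `onresult`, `onerror` and `start()` are members of the API itself.

```typescript
// Local structural types for the subset of the Web Speech API used here.
interface RecognitionAlternative { transcript: string; confidence: number; }
interface RecognitionEvent {
  results: ArrayLike<ArrayLike<RecognitionAlternative>>;
}
interface Recognizer {
  lang: string;
  interimResults: boolean;
  onresult: ((event: RecognitionEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}

// Configure a recognizer, start listening and report the top alternative
// of the first result, or the error code if recognition fails.
function recognizeOnce(
  rec: Recognizer,
  lang: string,
  onText: (text: string) => void,
  onFail: (reason: string) => void
): void {
  rec.lang = lang;
  rec.interimResults = false; // only final results
  rec.onresult = (event) => onText(event.results[0][0].transcript);
  rec.onerror = (event) => onFail(event.error);
  rec.start();
}

// In a browser, the recognizer would come from the page's global object:
//   const Ctor = (window as any).SpeechRecognition
//             || (window as any).webkitSpeechRecognition;
//   recognizeOnce(new Ctor(), "en-US", handleText, handleError);
```

Taking the recognizer as a parameter means the same app code works with whichever engine the browser provides.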

== 8. Demos, Links and references ==

* The crab
** Video: https://www.youtube.com/watch?v=pnCRH-Iznrc
** App: https://marketplace.firefox.com/app/the-crab
* Voicity
** Video: https://www.youtube.com/watch?v=cjjFvyH3kdc
** App: https://marketplace.firefox.com/app/voicity
* Emscripten Offline recognition on Peak
** Video: https://www.youtube.com/watch?v=FXKXhrRDEb8
* SpeechRTC Github
** https://github.com/andrenatal/speechrtc
* ChatterThing - Telefonica Hackaton Campus Party BR Winner
** https://www.youtube.com/watch?v=mTlcjPG7ogM (Portuguese)

Andrenatal
