SpeechRTC - Speech enabling the open web

== The speech decoder ==
=== Decoder ===

Third-party licensing is extremely costly (deals are usually measured in millions) and leads to an unwanted dependency. Writing a decoder from scratch is tough, and requires highly specialized and hard-to-find engineers. The good news is that great open source toolkits exist that we can use and enhance. I am a long-time supporter of and contributor to CMU Sphinx, which has a number of quality models openly available for different languages. Plus, pocketsphinx can run very fast and accurately when well tuned, for both FSG and LVCSR language models. For LVCSR we can also consider Julius and benchmark it, since it has well-proven results. A minimal decoding sketch follows below.
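
As a rough illustration of what integrating pocketsphinx looks like, here is a minimal decoding sketch using its Python bindings. The model paths, the grammar file name, and the WAV file are placeholders, and the exact binding API varies between pocketsphinx releases.

<syntaxhighlight lang="python">
# Minimal decoding sketch with the pocketsphinx Python bindings.
# Model and grammar paths are placeholders; adjust to your installation.
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')          # acoustic model directory
config.set_string('-dict', 'model/cmudict.dict')  # pronunciation dictionary

# FSG-style decoding: constrain the search with a JSGF grammar...
config.set_string('-jsgf', 'commands.gram')
# ...or LVCSR-style decoding: comment out the line above and load a
# statistical language model instead:
# config.set_string('-lm', 'model/en-us.lm.bin')

decoder = Decoder(config)

# Feed raw 16 kHz, 16-bit mono PCM; here we simply skip a 44-byte WAV header.
with open('utterance.wav', 'rb') as f:
    f.read(44)
    audio = f.read()

decoder.start_utt()
decoder.process_raw(audio, False, True)  # no_search=False, full_utt=True
decoder.end_utt()

hyp = decoder.hyp()
print(hyp.hypstr if hyp is not None else '(no hypothesis)')
</syntaxhighlight>

Switching between the JSGF grammar and the statistical language model is the whole difference between the FSG and LVCSR modes at the API level; the decoder instance is otherwise identical.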
=== Automatic retrain ===

We should also build scripts to automatically adapt the acoustic model to each user's own voice, so the service constantly auto-improves both for the individual user and for the service as a whole. A sketch of such an adaptation script follows below.
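
Below is a sketch of how such a per-user adaptation script might drive the standard sphinxtrain command-line tools (sphinx_fe, bw, mllr_solve) to compute an MLLR transform from a user's recordings. All paths and file names are hypothetical, and the flags follow the CMU Sphinx adaptation tutorial, so they may differ between versions.

<syntaxhighlight lang="python">
# Sketch of a per-user MLLR adaptation pass driving the sphinxtrain
# command-line tools. All paths, control files, and transcription files
# are hypothetical; flags follow the CMU Sphinx adaptation tutorial
# and may vary by toolkit version.
import subprocess

MODEL = 'model/en-us'   # baseline acoustic model (assumed layout)
USER = 'userdata'       # this user's recordings + transcriptions

def run(*cmd):
    print('+', ' '.join(cmd))
    subprocess.check_call(cmd)

# 1. Extract MFCC features from the user's recordings.
run('sphinx_fe',
    '-argfile', MODEL + '/feat.params',
    '-samprate', '16000',
    '-c', USER + '/adapt.fileids',
    '-di', USER, '-do', USER,
    '-ei', 'wav', '-eo', 'mfc', '-mswav', 'yes')

# 2. Collect adaptation statistics against the baseline model
#    (mdef.txt is the text-format model definition, which may need
#    conversion from the binary mdef first).
run('bw',
    '-hmmdir', MODEL,
    '-moddeffn', MODEL + '/mdef.txt',
    '-ts2cbfn', '.ptm.',
    '-feat', '1s_c_d_dd',
    '-cmn', 'current',
    '-dictfn', 'model/cmudict.dict',
    '-ctlfn', USER + '/adapt.fileids',
    '-lsnfn', USER + '/adapt.transcription',
    '-accumdir', USER)

# 3. Solve for an MLLR transform, which pocketsphinx can then load
#    at decode time with its -mllr option.
run('mllr_solve',
    '-meanfn', MODEL + '/means',
    '-varfn', MODEL + '/variances',
    '-outmllrfn', USER + '/mllr_matrix',
    '-accumdir', USER)
</syntaxhighlight>

A script like this could run periodically server-side as new consented recordings accumulate, keeping the heavier MAP-style retraining of the shared model as a separate batch job.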
=== Privacy ===

Some people have argued with me about privacy in online services. In the ideal scenario, online recognition is actually required only for LVCSR, while FSG can be handled offline if architected correctly. Letting users choose whether or not to let us use their voice to improve the models is how other OSes handle this issue.
=== Offline and online ===

The same speech server can be designed to run both online and offline, leaving the responsibility for handling transmission to the middleware that manages the connections with the front end. One possible shape for that split is sketched below.
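
The following is one possible shape for that middleware split, assuming FSG jobs go to a local decoder while LVCSR jobs are forwarded to the online server. The LocalDecoder interface, the endpoint URL, and the response format are all invented for illustration.

<syntaxhighlight lang="python">
# Hypothetical middleware sketch: route FSG (grammar) jobs to a local
# pocketsphinx decoder and LVCSR jobs to the online speech server.
# The local decoder object, the endpoint URL, and the JSON response
# shape are all assumptions made for this example.
import json
import urllib.request

class SpeechMiddleware:
    def __init__(self, local_decoder,
                 online_url='https://speech.example.org/decode'):
        self.local = local_decoder    # offline decoder with a .decode(bytes) method
        self.online_url = online_url  # online LVCSR endpoint (assumed)

    def recognize(self, audio, mode):
        if mode == 'fsg':
            # FSG: small grammar, cheap enough to decode on-device,
            # and the audio never leaves the machine (privacy-friendly).
            return self.local.decode(audio)
        # LVCSR: large vocabulary, ship to the online server. The same
        # server code can also be run locally when resources allow.
        req = urllib.request.Request(
            self.online_url, data=audio,
            headers={'Content-Type': 'audio/l16; rate=16000'})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)['hypothesis']
</syntaxhighlight>

Because the front end only ever talks to the middleware, moving a user between offline and online operation becomes a routing decision rather than an application change.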
== Web Speech API ==