== The speech decoder ==
===Decoder===
Third-party licensing is extremely costly (typically in the millions) and leads to an unwanted dependency. Writing a decoder from scratch is tough and requires highly specialized engineers who are difficult to find.
The good news is that great open source toolkits exist that we can use and enhance. I am a long-time supporter of and contributor to CMU Sphinx, which has a number of quality models openly available for different languages. In addition, pocketsphinx can run very fast and accurately when well tuned, for both FSG and LVCSR language models.
For LVCSR we can also consider Julius and benchmark it, since it has proven good results.
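As a rough illustration of how pocketsphinx covers both cases, here is a minimal sketch using its Python bindings (the pre-5.0 Decoder API). The model, dictionary and grammar paths are placeholders, not files we ship today:

<syntaxhighlight lang="python">
from pocketsphinx import Decoder

# Build a decoder configuration; the paths below stand in for whatever
# acoustic model, dictionary and language model / grammar we decide to ship.
config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us/acoustic')        # acoustic model directory
config.set_string('-dict', 'model/en-us/cmudict.dict')   # pronunciation dictionary

# LVCSR: use a statistical language model...
config.set_string('-lm', 'model/en-us/en-us.lm.bin')
# ...or, for command-and-control (FSG), a JSGF grammar instead:
# config.set_string('-jsgf', 'grammars/commands.gram')

decoder = Decoder(config)

# Decode one mono 16 kHz, 16-bit raw PCM recording.
decoder.start_utt()
with open('utterance.raw', 'rb') as f:
    decoder.process_raw(f.read(), False, True)  # no_search=False, full_utt=True
decoder.end_utt()

if decoder.hyp() is not None:
    print('hypothesis:', decoder.hyp().hypstr)
</syntaxhighlight>

Switching between FSG and LVCSR is mostly a matter of which search (grammar vs. language model) the decoder is configured with, so the same engine can serve both use cases.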
===Automatic retraining===
We should also build scripts to automatically adapt the acoustic model to each user's own voice, constantly improving recognition both for that individual user and for the service as a whole.
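A hedged sketch of what such a per-user adaptation loop could look like. The directory layout, the utterance threshold and the run_map_adaptation() placeholder are assumptions of this sketch; the real adaptation step would invoke the SphinxTrain tools (sphinx_fe, bw, map_adapt) as described in the CMU Sphinx acoustic model adaptation tutorial:

<syntaxhighlight lang="python">
import os
import shutil

ADAPT_THRESHOLD = 50  # adapt once a user has contributed this many utterances


def record_utterance(user_dir, audio_path, transcript):
    """Store one opted-in utterance plus its transcript for later adaptation."""
    os.makedirs(user_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(audio_path))[0]
    shutil.copy(audio_path, os.path.join(user_dir, name + '.wav'))
    with open(os.path.join(user_dir, 'transcripts.txt'), 'a') as f:
        f.write('%s (%s)\n' % (transcript, name))


def run_map_adaptation(base_model_dir, user_dir, adapted_model_dir):
    """Placeholder for the real pipeline (sphinx_fe -> bw -> map_adapt);
    the exact SphinxTrain invocations are deliberately left out of this sketch."""
    raise NotImplementedError


def maybe_adapt(user_dir, base_model_dir, adapted_model_dir):
    """Run MAP adaptation once enough per-user data has accumulated."""
    wavs = [f for f in os.listdir(user_dir) if f.endswith('.wav')]
    if len(wavs) < ADAPT_THRESHOLD:
        return False
    run_map_adaptation(base_model_dir, user_dir, adapted_model_dir)
    return True
</syntaxhighlight>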
===Privacy===
Some have raised privacy concerns with me about online services. Ideally, online recognition is required only for LVCSR, while FSG can be handled offline if architected correctly. Letting users choose whether or not to let us use their voice to improve the models is how other OSes handle this issue.
===Offline and online===
The same speech server can be designed to run both online and offline, leaving the responsibility for handling transmission to the middleware that manages connections with the front end.
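A minimal sketch of that routing idea, assuming a hypothetical online recognition endpoint and a local_decode() wrapper around the on-device decoder shown earlier; both names are illustrative, not part of any existing API:

<syntaxhighlight lang="python">
import urllib.request

# Hypothetical online LVCSR endpoint; not a real service today.
SPEECH_SERVER_URL = 'https://speech.example.org/recognize'


def recognize(audio_bytes, grammar=None, online=True):
    """Route a recognition request to the local or remote decoder."""
    if grammar is not None or not online:
        # Command-and-control: small FSG grammars can be decoded on-device.
        return local_decode(audio_bytes, grammar)
    # Open dictation (LVCSR): send the raw PCM to the online speech server.
    req = urllib.request.Request(SPEECH_SERVER_URL, data=audio_bytes,
                                 headers={'Content-Type': 'audio/raw'})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode('utf-8')


def local_decode(audio_bytes, grammar):
    """Placeholder for the on-device pocketsphinx decoder sketched above."""
    raise NotImplementedError
</syntaxhighlight>

The front end only ever talks to one interface; whether decoding happens on-device or on the server is a decision made by this middleware layer.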
== Web Speech API ==