= WebSpeech API - Speech Recognition =
== Frequently Asked Questions ==
===== What is it? =====
The speech recognition part of the WebSpeech API allows websites to enable speech input within their experiences. Examples include Duolingo, Google Translate, and voice search on Google.com.
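As an illustration of what a site does with the API, here is a minimal sketch using the interface names from the spec. Chromium exposes the prefixed ''webkitSpeechRecognition''; Firefox exposes ''SpeechRecognition'' when the feature is enabled. The ''bestTranscript'' helper is our own illustrative function, not part of the API:

```javascript
// Resolve whichever SpeechRecognition constructor the browser provides,
// or null outside a browser / when the feature is disabled.
const Recognition =
  (typeof SpeechRecognition !== "undefined" && SpeechRecognition) ||
  (typeof webkitSpeechRecognition !== "undefined" && webkitSpeechRecognition) ||
  null;

// Pure helper (illustrative, not part of the API): pick the
// highest-confidence alternative from an array-like of array-likes of
// { transcript, confidence } objects, as delivered in event.results.
function bestTranscript(results) {
  let best = { transcript: "", confidence: -1 };
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    for (let j = 0; j < result.length; j++) {
      if (result[j].confidence > best.confidence) best = result[j];
    }
  }
  return best.transcript;
}

if (Recognition) {
  const rec = new Recognition();
  rec.lang = "en-US";           // recognition language
  rec.interimResults = false;   // only deliver final results
  rec.onresult = (event) => {
    console.log("You said:", bestTranscript(event.results));
  };
  rec.start(); // prompts for microphone access, then streams audio
}
```

In practice the browser forwards the captured audio to the configured recognition service and fires ''onresult'' with the transcripts it returns.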
===== What is it not? =====
It is not a data-collection mechanism. For Firefox, we can choose whether to retain users’ audio to train our own speech services. Audio collection currently defaults to off; eventually we would like to let users opt in if they choose.
 
===== Where are our servers and who manages it? =====
The entire backend is managed by Mozilla's Cloud Ops and Services team. Here is the current architecture:
 
[[File:Wsa architecture.png|1000px|]]
 
===== There are three parts to this process - the website, the browser and the server. Which part does the current WebSpeech work cover? =====
The current work being added to Firefox Nightly is the browser portion of the process. It provides the path for the website to access the speech recognition engine on the server.
 
===== Can we not send audio to Google? =====
We can send the audio to any speech recognition service we choose. Mozilla is currently developing our own service called Deep Speech which we hope to validate in 2020 as a replacement for Google, at least in English. We may eventually use a variety of recognition engines for different languages.
 
===== Who pays for Google Cloud? =====
The license for Google Cloud STT is currently handled by our Cloud Ops team under our general contract with Google Cloud.
 
===== How can I test with Deep Speech? =====
Assuming you are running a Nightly build with the API enabled, you only need to change a preference so that it points at our Deep Speech-enabled endpoint (currently only English is available):
# Go to about:config
# Set the preference '''''media.webspeech.service.endpoint''''' to ''https://dev.speaktome.nonprod.cloudops.mozgcp.net/''
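If you want the setting to persist or be scriptable, the same preference can go in a ''user.js'' file in your Firefox profile directory, which Firefox applies at startup. The endpoint URL is the dev (non-production) one from step 2 above:

```javascript
// user.js — Firefox reads this at startup and applies the pref to the profile.
// Dev/non-production endpoint from the steps above; English-only Deep Speech.
user_pref("media.webspeech.service.endpoint",
          "https://dev.speaktome.nonprod.cloudops.mozgcp.net/");
```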
 
===== Why not do it offline? =====
To do speech recognition offline, the speech recognition engine must be embedded within the browser. This is possible and we may do it eventually, but it is not currently planned for Firefox. We will, however, be testing offline speech recognition in Firefox Reality for Chinese users early in 2020. Depending on how those tests go, we may plan to extend the functionality elsewhere.
 
At one point back in 2015, there was an offline speech recognition engine (Pocketsphinx) embedded into Gecko for FirefoxOS. It was removed from the codebase in 2017 because it could not match the quality of recognition offered by server-based engines using deep neural nets.
 
===== But I still want to run offline =====
You can easily set up your own Deep Speech service locally on your machine using our Docker images. Just follow the steps below:
# First, install and start the Deep Speech Docker image from [https://github.com/mozilla-services/deepspeech-server/ here].
# Then install and start the speech-proxy Docker image from [https://github.com/mozilla/speech-proxy/blob/master/README.md here], with the environment variable ''ASR_URL'' set to the address of your Deep Speech instance.
# Set the preference '''''media.webspeech.service.endpoint''''' of your Nightly to the address of your speech-proxy instance.
# Navigate to [https://translate.google.com Google Translate], click the microphone, and test it.
# If it works, recognition is happening 100% offline on your own system.
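The two Docker steps above might look roughly like the following. This is only a sketch: the image names, ports, and the ''ASR_URL'' path are illustrative assumptions, and the authoritative instructions are in the two repositories' READMEs linked above.

```shell
# Sketch only: image names, ports, and paths below are illustrative
# assumptions; consult the deepspeech-server and speech-proxy READMEs.
if command -v docker >/dev/null 2>&1; then
  # 1. Start the Deep Speech server (assumed here to listen on port 8888).
  docker run -d --name deepspeech -p 8888:8888 deepspeech-server

  # 2. Start the speech proxy, pointing ASR_URL at the Deep Speech instance.
  docker run -d --name speech-proxy -p 9001:9001 \
    -e ASR_URL=http://localhost:8888/stt \
    speech-proxy

  # 3. Then set media.webspeech.service.endpoint in Nightly to the proxy,
  #    e.g. http://localhost:9001/
fi
```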
 
===== Why are we holding WebSpeech support in Nightly? =====
We are in the process of working with Google and other partners to update the WebSpeech API spec. While that process is underway, we will need to make updates to our implementation to match. We will not ship WebSpeech support more broadly until there is consensus that the Standard is appropriate for Mozilla.
 
===== Are you adding voice commands to Firefox? =====
There are experiments testing the user value of voice operations within the browser itself, but that effort is separate from general WebSpeech API support. For more details on that work, dig in here.
 
===== What’s next? =====
You can see the full picture of Voice work going on at Mozilla here.
 
===== Have a question not addressed here? =====
[add emails?]