= WebSpeech API =

== Frequently Asked Questions ==

===== What is it? =====
The WebSpeech API allows websites to enable speech input within their experiences. Examples include Duolingo, Google Translate, and voice search on Google.com.
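For concreteness, here is a minimal sketch of what a speech-enabled page does with the API. This is illustrative rather than any particular site’s code; the vendor-prefixed constructor is what Chromium-based browsers currently expose.

<syntaxhighlight lang="typescript">
// Minimal sketch: recognize a single utterance.
// SpeechRecognition types are not yet in TypeScript's lib.dom.d.ts,
// hence the (window as any) lookups.
const SpeechRecognition =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognition();
recognition.lang = "en-US";
recognition.onresult = (event: any) => {
  console.log("You said:", event.results[0][0].transcript);
};
recognition.start(); // triggers the microphone permission prompt
</syntaxhighlight>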

===== What is it not? =====
*Speech recognition performed by the browser itself (audio is recognized server-side; see below)
*A translation service
*Text-to-speech/narration
*Always-on listening
*A voice assistant (see below)
*Voice search

===== Why are we doing it? =====
Chrome, Edge, Safari, and Opera currently support a form of this API for speech-to-text, which means sites that rely on it work in those browsers but not in Firefox. As speech input becomes more prevalent, a consistent way to implement it on the web helps developers, and it helps users, who can then take advantage of speech-enabled web experiences in any browser they choose. We can also offer a more private speech experience, as we do not keep identifiable information alongside users’ audio recordings.

If nothing else, our lack of support for voice experiences is a web compatibility (webcompat) issue that will only become more of a handicap as voice grows more prevalent on the web. We have therefore included the work needed to start closing this gap in our 2019 OKRs for Firefox, beginning with WebSpeech API support in Firefox Nightly.

===== What does it do? =====
When a user visits a speech-enabled website, they use that site’s UI to start the process. It is up to individual sites to determine how voice is integrated into their experience, how it is triggered, and how recognition results are displayed.

As an example, a user might see a microphone button in a text field. When they click it, they are prompted to grant the browser temporary permission to access the microphone. They can then speak what they want to say (an utterance). Once the utterance is finished, the browser passes the audio to a server, where it is run through a speech recognition engine. The recognizer decodes the audio and sends a transcript back to the browser, which displays it on the page as text.
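A sketch of that flow, wiring a microphone button to a text field (the element IDs here are hypothetical):

<syntaxhighlight lang="typescript">
// Sketch of the flow described above: a microphone button next to a
// text field. "mic-button" and "search-box" are made-up element IDs.
const SpeechRecognition =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const button = document.getElementById("mic-button") as HTMLButtonElement;
const field = document.getElementById("search-box") as HTMLInputElement;

button.addEventListener("click", () => {
  const recognition = new SpeechRecognition();
  recognition.lang = "en-US";
  recognition.interimResults = false; // wait for the final transcript
  recognition.maxAlternatives = 1;

  // start() prompts the user for temporary microphone permission; the
  // browser then sends the utterance to the recognition server.
  recognition.start();

  recognition.onresult = (event: any) => {
    // The server's transcript comes back as a SpeechRecognitionResult.
    field.value = event.results[0][0].transcript;
  };

  recognition.onerror = (event: any) => {
    // e.g. "not-allowed" if the user declined the permission prompt
    console.warn("Recognition error:", event.error);
  };
});
</syntaxhighlight>

Note that the page never handles the raw audio through this API; it only receives the transcript (and a confidence score) once the recognition server responds.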

===== Where does the audio go? =====
Firefox can specify which server receives users’ audio data. Currently we send audio to Google’s Cloud Speech-to-Text service. Google leads the industry in this space and offers speech recognition in 120 languages.

Before sending the data to Google, however, Mozilla routes it through our own servers to strip out user-identifying information. This means Google cannot connect the audio to a user’s account, because every request appears to come from Mozilla. We also opt out of allowing Google to store our voice requests, so, unlike when a user inputs speech using Chrome, recordings are not saved and cannot be attached to a profile indefinitely.
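To make the shape of that routing concrete, here is a conceptual sketch of an anonymizing relay. This is '''not''' Mozilla’s actual service: the relay itself, its port, and the audio encoding parameters are assumptions for illustration. Only the Google Cloud Speech-to-Text <code>recognize</code> endpoint and request shape are real.

<syntaxhighlight lang="typescript">
// Conceptual sketch of an anonymizing relay (NOT Mozilla's real service).
// Runs on Node 18+, which provides a global fetch().
import http from "node:http";

const GOOGLE_STT = "https://speech.googleapis.com/v1/speech:recognize";
const API_KEY = process.env.GOOGLE_API_KEY; // the relay's key, shared by all users

http.createServer(async (req, res) => {
  // Read the raw audio uploaded by the browser.
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const audio = Buffer.concat(chunks).toString("base64");

  // Forward only the audio. The user's cookies, IP address, and other
  // identifying headers stop here; to Google, every request appears to
  // originate from the relay itself.
  const upstream = await fetch(`${GOOGLE_STT}?key=${API_KEY}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      config: { encoding: "FLAC", sampleRateHertz: 16000, languageCode: "en-US" },
      audio: { content: audio },
    }),
  });

  // Return the transcript to the browser; nothing is logged or stored.
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify(await upstream.json()));
}).listen(8080);
</syntaxhighlight>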

For Firefox, we can choose whether to hold on to users’ data to train our own speech services. Audio collection is currently off by default, but we would eventually like to let users opt in if they choose.