Web Speech API - Speech Recognition

Chrome, Edge, Safari and Opera currently support a form of this API for speech-to-text, which means sites that rely on it work in those browsers, but not in Firefox. As speech input becomes more prevalent, it helps developers to have a consistent way to implement it on the web. It helps users because they will be able to take advantage of speech-enabled web experiences on any browser they choose. We can also offer a more private speech experience, as we do not keep identifiable information along with users' audio recordings.
If nothing else, our lack of support for voice experiences is a web compatibility issue that will only become more of a handicap as voice becomes more prevalent on the web. We've therefore included the work needed to start closing this gap among our 2019 OKRs for Firefox, beginning with providing Web Speech API support in Firefox Nightly.
===== What does it do? =====
<ol>
<li>The Web Speech API code in the browser is responsible for prompting the user for permission to record from the microphone, determining when speech has ended, and submitting the data to our speech proxy server. There are four headers that can be used by the client to alter the proxy's behavior (see the request sketch after this list) [https://github.com/mozilla/speech-proxy/blob/master/Makefile#L13]:</li>
* Accept-Language-STT: determines the language to be decoded by the STT service
* Store-Sample: determines if the user allows '''''Mozilla''''' to store the '''audio''' sample on our own servers for further use (training our own models, for example)
* Store-Transcription: determines if the user allows '''''Mozilla''''' to store the '''transcription''' on our own servers for further use (training our own models, for example)
* Product-Tag: determines which product is making use of the API. It can be: vf for Voicefill, fxr for Firefox Reality, wsa for Web Speech API, and so on.
<li>Once the proxy receives the request with the audio sample, it looks for the headers that were set. Nothing other than what was requested by the user, plus a timestamp and the user-agent, is saved. You can check it here: [https://github.com/mozilla/speech-proxy/blob/master/server.js#L324] </li>
<li>The proxy then detects the format of the audio file and decodes it to raw PCM. </li>
<li>A request containing '''just the audio file''' is made to the STT provider set in the proxy's configuration file. </li>
<li>Once the STT provider returns a response containing a transcription and a confidence score, it is forwarded to the client, which is then responsible for taking an action according to the user's request.</li>
</ol>
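For illustration, here is a minimal sketch of what the client-to-proxy request from step 1 could look like. The endpoint URL, the audio encoding, and the exact response shape are assumptions made for this example; only the four headers come from the list above.
<syntaxhighlight lang="typescript">
// Minimal sketch of a client request to the speech proxy.
// The endpoint URL and the response shape are assumptions for this example;
// only the four headers come from the list above.
async function transcribe(audio: Blob): Promise<{ text: string; confidence: number }> {
  const response = await fetch("https://speech-proxy.example.mozilla.org/", {
    method: "POST",
    headers: {
      "Accept-Language-STT": "en-US",   // language the STT service should decode
      "Store-Sample": "0",              // do not let Mozilla keep the audio sample
      "Store-Transcription": "0",       // do not let Mozilla keep the transcription
      "Product-Tag": "wsa",             // which product is calling (wsa = Web Speech API)
      "Content-Type": "audio/opus",     // assumed codec for the recorded audio
    },
    body: audio,
  });

  if (!response.ok) {
    throw new Error(`Speech proxy returned ${response.status}`);
  }

  // Assumed response shape: the steps above mention a transcription plus a
  // confidence score being returned to the client.
  const result = await response.json();
  return { text: result.transcription, confidence: result.confidence };
}
</syntaxhighlight>
In Firefox's case it is the browser code, not the web page, that issues this request; pages only see the standard SpeechRecognition interface.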
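The proxy-side flow from steps 2 to 5 can be sketched the same way. This is not the actual server.js implementation; the types and helper signatures are assumptions made purely to show the decision logic.
<syntaxhighlight lang="typescript">
// Sketch of the proxy-side flow from steps 2-5 above (not the real server.js).
interface ProxyRequest {
  headers: Record<string, string>;
  audio: Uint8Array;            // encoded audio as received from the client
  userAgent: string;
}

interface SttResult {
  transcription: string;
  confidence: number;
}

interface Dependencies {
  decodeToPcm(audio: Uint8Array): Uint8Array;                              // step 3
  callSttProvider(pcm: Uint8Array, language: string): Promise<SttResult>;  // step 4
  store(record: { timestamp: number; userAgent: string; audio?: Uint8Array; transcription?: string }): void;
}

async function handleSpeechRequest(req: ProxyRequest, deps: Dependencies): Promise<SttResult> {
  const language = req.headers["accept-language-stt"] ?? "en-US";
  const storeSample = req.headers["store-sample"] === "1";
  const storeTranscription = req.headers["store-transcription"] === "1";

  // Step 3: decode the incoming audio to raw PCM.
  const pcm = deps.decodeToPcm(req.audio);

  // Step 4: only the audio is sent to the STT provider, no user information.
  const result = await deps.callSttProvider(pcm, language);

  // Step 2: persist only what the user opted into, plus timestamp and user-agent.
  deps.store({
    timestamp: Date.now(),
    userAgent: req.userAgent,
    audio: storeSample ? req.audio : undefined,
    transcription: storeTranscription ? result.transcription : undefined,
  });

  // Step 5: the transcription and confidence score go back to the client.
  return result;
}
</syntaxhighlight>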
===== How does your proxy server work? Why do we have it? =====
There are both technical and practical reasons to have a proxy server. We wanted the flexibility to abstract the redirection of the user's requests to different STT services without changing the client code, and also to have a single protocol to be used across all projects at Mozilla. But the most beneficial reason was to keep users anonymous when we need to use a 3rd party provider. In this case, the requests to the provider are made from our own server and only the audio sample is submitted to them to get a transcription. Some benefits of routing the data through our speech proxy:
# Before sending the data to any 3rd party STT provider, we have the chance to strip all identifying information and make an anonymous request to the provider, preserving the user's identity.
# When we need to use 3rd party, paid STT services, we don't need to ship the service's key along with the client's code.
# Centralizing and funneling the requests through our servers decreases the chance of abuse from clients, and allows us to implement mechanisms like throttling, blacklists, etc.
# We can switch between STT services in real time as needed, and redirect requests to any service we choose, without changing any code in the client. For example: send English requests to provider A and pt-BR requests to provider B without shipping any update to the client (see the routing sketch after this list).
# We can support STT services both on premises and off premises without any extra logic in the client.
# We can centralize all requests coming from different products into a single speech endpoint, making it easier to measure the engines both quantitatively and qualitatively.
# We can support different audio formats without adding extra logic to the clients, regardless of the format supported by the STT provider, for example by adding compression or streaming between the client and the proxy.
# If users desire to contribute to Mozilla's mission and let us save their audio samples, we can do so without sending them to 3rd party providers.
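As an illustration of the real-time switching described above, a per-language routing table could live entirely in the proxy's configuration. The configuration shape and provider names below are invented for this sketch, not taken from the real proxy.
<syntaxhighlight lang="typescript">
// Illustrative only: a per-language routing table that could live in the
// proxy's configuration file. Provider names and the config shape are
// invented for this sketch.
type SttProvider = "providerA" | "providerB";

const routingConfig: { default: SttProvider; byLanguage: Record<string, SttProvider> } = {
  default: "providerA",
  byLanguage: {
    "en-US": "providerA",  // English goes to provider A
    "pt-BR": "providerB",  // Brazilian Portuguese goes to provider B
  },
};

// Picking a provider is a pure lookup on the proxy, so switching providers
// never requires shipping an update to any client.
function pickProvider(language: string): SttProvider {
  return routingConfig.byLanguage[language] ?? routingConfig.default;
}
</syntaxhighlight>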