Web Speech API - Speech Recognition
For Firefox, we can choose whether we hold on to users' data to train our own speech services. Currently, audio collection is off by default, but eventually we would like to allow users to opt in if they choose.
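As a minimal sketch of that default, assuming a hypothetical settings flag (the real preference name is not specified here):

<pre>
// Hypothetical user setting; the name `storeAudioSamples` is an
// illustrative assumption, not an actual Firefox preference.
interface SpeechDataSettings {
  storeAudioSamples: boolean;
}

// Audio collection is off by default; the user must opt in explicitly.
const defaults: SpeechDataSettings = { storeAudioSamples: false };

function optIn(settings: SpeechDataSettings): SpeechDataSettings {
  return { ...settings, storeAudioSamples: true };
}
</pre>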
 
===== How does your proxy server work? Why do we have it? =====
There are both technical and practical reasons for it. We wanted the flexibility to redirect users' requests to different STT services without changing the client code, but also a single protocol to be used across all projects at Mozilla. The most important reason, though, was to keep our users anonymous whenever we need to use a 3rd-party provider: in that case, the requests to the provider are made from our own server and only the audio sample is submitted to it. Once the provider returns a transcription and a confidence score, the proxy forwards the result to the client, which is then responsible for acting on it according to the user's request. See below a list of some benefits of routing the data through our speech proxy, followed by a sketch of the client flow:
# Before sending the data to any 3rd-party STT provider, we have the chance to strip all user information and make an anonymous request to the provider, protecting the user's identity.
# If we need to use 3rd-party, paid STT services, we don't have to ship the service's key with the client code.
# Centralizing and funneling the requests through our servers prevents abuse from clients and allows us to implement mechanisms like throttling and blacklists.
# We can switch between STT services in real time as we need and redirect requests to any service we choose without changing any code in the client. For example: send English requests to provider A and pt-BR requests to provider B without shipping any update to the client.
# We can support STT services both on premises and off premises without any extra logic in the client.
# We can centralize all requests coming from different products into a single speech endpoint, making it easier to measure the engines both quantitatively and qualitatively.
# We can support different audio formats without adding extra logic to the clients, regardless of the format supported by the STT provider, for example by adding compression or streaming between the client and the proxy.
# If the user wants to contribute to Mozilla's mission and lets us save the samples, we can do so without sending them to 3rd-party providers.
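Here is a minimal TypeScript sketch of the client side of this flow. The endpoint URL, header usage, and response fields are illustrative assumptions, not Mozilla's actual speech-proxy protocol:

<pre>
// Illustrative sketch only: the endpoint and response shape are assumptions.
const PROXY_ENDPOINT = "https://speech-proxy.example.org/stt"; // hypothetical

interface SttResult {
  transcription: string; // hypothesized field names
  confidence: number;    // e.g. 0.0 - 1.0
}

async function recognize(audio: Blob, lang: string): Promise<SttResult> {
  // The client only knows the proxy. The proxy strips identifying
  // information and picks an STT provider (for example, by language),
  // so no provider key ever ships with the client.
  const response = await fetch(PROXY_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "audio/opus",  // the proxy can transcode if the provider needs another format
      "Accept-Language": lang,       // lets the proxy route en to provider A, pt-BR to provider B
    },
    body: audio,
  });
  if (!response.ok) {
    // The proxy can also reject abusive clients (throttling, blacklists).
    throw new Error(`Proxy error: ${response.status}`);
  }
  // The proxy forwards the provider's transcription and confidence score.
  return (await response.json()) as SttResult;
}
</pre>

Note that the client never learns which provider handled the request; switching providers or routing by language happens entirely on the proxy side.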
===== Where are our servers and who manages it? =====
 
 
===== There are three parts to this process - the website, the browser and the server. Which part does the current WebSpeech work cover? =====