== 1. Speech Input API ==
The speech input API aims to provide an alternative input method for web applications that does not require a keyboard or other physical input device. It can be used to issue commands, fill in input elements, give directions, and so on. It is based on the [http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0023/speechrequest.xml.html SpeechRequest] proposal.
The API consists of two main components:
*A Media Capture API to capture a stream of raw audio. Its implementation is platform-dependent. Mac, Windows and Linux are to be supported first, with Android support added later.
*A streaming API to asynchronously stream microphone data to a speech recognition server and receive the results. This will work much like XMLHttpRequest. The API should support local engines, remote engines, or a combination of both, depending on the available network connection.
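The two components can be modeled roughly as follows. This is a minimal Python sketch of the design, not browser code: `capture`, `Engine`, `LocalEngine`, `RemoteEngine`, `choose_engine` and `stream_request` are hypothetical names, the capture step yields silent placeholder audio, and the engines only count bytes instead of recognizing speech. It shows audio chunks being forwarded as soon as they are captured, the result being delivered through a callback in the style of XMLHttpRequest, and a fallback from a remote to a local engine when no network is available.

```python
import asyncio
from typing import AsyncIterator, Callable


class Engine:
    """Hypothetical recognizer interface (not part of any real API)."""

    name = "base"

    def __init__(self) -> None:
        self.received = 0  # bytes of audio received so far

    async def send(self, chunk: bytes) -> None:
        # A real engine would decode audio here; this one only counts bytes.
        self.received += len(chunk)

    async def result(self) -> str:
        return f"{self.name} engine heard {self.received} bytes"


class LocalEngine(Engine):
    name = "local"


class RemoteEngine(Engine):
    name = "remote"

    async def send(self, chunk: bytes) -> None:
        await asyncio.sleep(0)  # stands in for a network write
        await super().send(chunk)


def choose_engine(network_available: bool) -> Engine:
    # Prefer the remote server when online; fall back to the local engine.
    return RemoteEngine() if network_available else LocalEngine()


async def capture(n_chunks: int, chunk_size: int) -> AsyncIterator[bytes]:
    # Stand-in for the platform-dependent Media Capture component:
    # yields chunks of silent raw audio.
    for _ in range(n_chunks):
        await asyncio.sleep(0)
        yield b"\x00" * chunk_size


async def stream_request(
    audio: AsyncIterator[bytes],
    engine: Engine,
    onresult: Callable[[str], None],
) -> None:
    # Forward each chunk as it arrives, then deliver the result through a
    # callback, much like an XMLHttpRequest load handler.
    async for chunk in audio:
        await engine.send(chunk)
    onresult(await engine.result())


if __name__ == "__main__":
    results = []
    engine = choose_engine(network_available=False)
    asyncio.run(stream_request(capture(4, 320), engine, results.append))
    print(results[0])  # prints "local engine heard 1280 bytes"
```

A combined mode, such as a quick local pass refined by a remote engine, is not shown; it would amount to sending the same chunks to two engines and merging their results.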
==== Security/Privacy Issues ====
*A speech input session should be allowed only with the user's consent, which could be requested through a doorhanger notification.
*The user should be notified while audio is being recorded, possibly by a record symbol in the browser UI itself, such as the URL bar or status bar.
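The two requirements above amount to a small state machine: recording may begin only after consent, and the indicator stays visible exactly as long as recording is active. The following Python sketch illustrates that rule under stated assumptions: `SpeechSession`, `State` and `ask_user` are hypothetical names, and `ask_user` merely stands in for the doorhanger prompt.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    IDLE = auto()
    DENIED = auto()
    RECORDING = auto()
    STOPPED = auto()


class SpeechSession:
    """Hypothetical session object; names are illustrative only."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        # ask_user stands in for the browser's doorhanger consent prompt.
        self.ask_user = ask_user
        self.state = State.IDLE
        self.indicator_visible = False  # e.g. a record symbol in the URL bar

    def start(self, origin: str) -> bool:
        # No audio is captured unless the user explicitly agrees.
        if not self.ask_user(origin):
            self.state = State.DENIED
            return False
        self.state = State.RECORDING
        self.indicator_visible = True  # shown for as long as audio is captured
        return True

    def stop(self) -> None:
        if self.state is State.RECORDING:
            self.state = State.STOPPED
        self.indicator_visible = False


session = SpeechSession(ask_user=lambda origin: True)
session.start("https://example.com")
session.stop()
```

Tying the indicator to the same transitions as recording, rather than to a separate call, keeps a page from capturing audio while the indicator is hidden.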
==== API Design ====
The API will follow the interface described in the SpeechRequest proposal.
*The developer should be able to specify a grammar (using SRGS) for the speech, which is useful when the set of possible commands is limited.
*The recognition response would be in the form of an EMMA document.
*The developer should be allowed to set thresholds for accuracy and sensitivity to improve performance.
*The developer should be able to choose which speech engine to use.
*The developer should be able to start and stop recognition, handle errors, and manage multiple requests as required.
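One plausible reading of the EMMA response and threshold requirements is that the page receives several candidate interpretations, each with a confidence score, and keeps only those above a developer-chosen cutoff. The Python sketch below assumes that reading. It uses the W3C EMMA namespace with the `emma:interpretation` element and its `emma:confidence` and `emma:tokens` attributes. The sample document is invented, and `hypotheses` is a hypothetical helper, not part of the proposal.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

EMMA_NS = "http://www.w3.org/2003/04/emma"


def hypotheses(emma_xml: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (tokens, confidence) pairs at or above threshold, best first."""
    root = ET.fromstring(emma_xml)
    results = []
    # ElementTree expands "emma:interpretation" to "{namespace}interpretation".
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        conf = float(interp.get(f"{{{EMMA_NS}}}confidence", "0"))
        tokens = interp.get(f"{{{EMMA_NS}}}tokens", "")
        if conf >= threshold:
            results.append((tokens, conf))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Invented example response with two competing interpretations.
SAMPLE = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75" emma:tokens="go back"/>
    <emma:interpretation id="int2" emma:confidence="0.20" emma:tokens="go black"/>
  </emma:one-of>
</emma:emma>"""

print(hypotheses(SAMPLE, threshold=0.5))  # [('go back', 0.75)]
```

When an SRGS grammar restricts the possible phrases, the list of interpretations is typically short, so a simple cutoff like this is often enough to pick a command.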
== 2. Text-to-Speech API ==