SpeechAPI

Revision as of 12:38, 5 April 2012

Introduction


This project is an extension of the GSoC Speech project. It adds support for voice commands inside the Firefox browser, and has led to extensions of both the SpeechRecognition API and the text-to-speech API.


Initial contributors:

Roshan Vidyashankar and Anant Narayanan are the initial contributors to the Speech Project.

Contributors for this extended SpeechAPI project are:

  1. Rohan Dalvi
  2. Harshank Vengurlekar
  3. Jagannath Ramesh


Technical Stuff


1. Speech Input API

The Speech Input API aims to provide an alternative input method for web applications, without using a keyboard or other physical device. It can be used to input commands, fill input elements, give directions, etc. It is based on the SpeechRequest proposal [http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0023/speechrequest.xml.html].

The API consists of two main components:

  • A Media Capture API to capture a stream of raw audio. Its implementation is platform dependent; Mac, Windows and Linux are to be supported first, with Android support added eventually.
  • A streaming API to asynchronously stream microphone data to a speech recognition server and get the results back, similar to how XMLHttpRequest is implemented. The API should be able to support local engines, remote engines, or a combination of both, depending on the network connection available.
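The two components above can be sketched as a small mock. This is illustrative only: `MockRecognizer` stands in for a recognition engine, and the `SpeechRequest` shape (its `send`/`onresult`/`onerror` names) is an assumption modeled on XMLHttpRequest, not the interface the proposal actually defines.

```javascript
// MockRecognizer stands in for a local or remote speech engine; a real
// implementation would decode the audio instead of just counting chunks.
class MockRecognizer {
  constructor() { this.chunks = []; }
  feed(chunk) { this.chunks.push(chunk); }   // accept one raw audio buffer
  finish() {
    return { transcript: "go back", chunksReceived: this.chunks.length };
  }
}

// SpeechRequest mirrors the XMLHttpRequest shape: configure callbacks,
// then send the captured audio and receive the result via onresult.
class SpeechRequest {
  constructor(engine) {
    this.engine = engine;
    this.onresult = null;
    this.onerror = null;
  }
  send(audioChunks) {
    try {
      for (const chunk of audioChunks) this.engine.feed(chunk);
      const result = this.engine.finish();
      if (this.onresult) this.onresult(result);  // a real API would deliver this asynchronously
    } catch (e) {
      if (this.onerror) this.onerror(e);
    }
  }
}

// Simulated microphone capture: three raw PCM buffers from the capture API.
const capturedAudio = [new Uint8Array(320), new Uint8Array(320), new Uint8Array(320)];
const req = new SpeechRequest(new MockRecognizer());
req.onresult = (r) => console.log(`recognized "${r.transcript}" from ${r.chunksReceived} chunks`);
req.send(capturedAudio);
```

The split keeps capture platform-specific while the streaming layer stays engine-agnostic: swapping a local engine for a remote one only changes which object is handed to the request.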

Security/Privacy issues

  • A speech input session should be allowed only with the user's consent, which could be obtained using a doorhanger notification.
  • The user should be notified when audio is being recorded, possibly with a record symbol somewhere in the web browser UI itself, such as the URL bar or status bar.
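The two rules above can be sketched as a small gate. This is a minimal illustration, not browser code: `ConsentGate` and its member names are hypothetical, standing in for the doorhanger grant and the recording indicator.

```javascript
// ConsentGate models the two privacy rules: no recording without an
// explicit grant, and a visible indicator whenever audio is captured.
class ConsentGate {
  constructor() {
    this.granted = false;
    this.recordingIndicatorVisible = false;   // e.g. a record symbol in the URL bar
  }
  grant() { this.granted = true; }            // e.g. the user clicks "Allow" on a doorhanger
  beginRecording() {
    if (!this.granted) throw new Error("speech input requires user consent");
    this.recordingIndicatorVisible = true;
  }
  endRecording() { this.recordingIndicatorVisible = false; }
}

const gate = new ConsentGate();
gate.grant();            // user consents via the notification
gate.beginRecording();   // indicator turns on while audio is captured
gate.endRecording();     // indicator turns off again
```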

API Design

The API will look like the interface described in the SpeechRequest proposal.

  • The developer should be able to specify a grammar (using SRGS) for the speech, which is useful when the set of possible commands is limited. The recognition response would be in the form of an EMMA document.
  • The developer should be allowed to set thresholds for accuracy and sensitivity to improve performance.
  • The developer should be able to choose which speech engine to use.
  • The developer should be able to start and stop recognition, handle errors, and manage multiple requests as required.
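One way the configuration surface listed above might look, as a sketch. Every property name here (`grammarSRGS`, `confidenceThreshold`, `engineURI`) is an assumption for illustration; the SpeechRequest proposal defines the authoritative interface, and a real result would be a full EMMA document rather than the plain hypothesis objects used here.

```javascript
// SpeechInputSession bundles the knobs from the list above. filterResults()
// keeps only hypotheses at or above the confidence threshold, shaped like
// entries of an EMMA interpretation list.
class SpeechInputSession {
  constructor(options = {}) {
    this.grammarSRGS = options.grammarSRGS || null;          // SRGS grammar source, if any
    this.confidenceThreshold = options.confidenceThreshold ?? 0.5;
    this.engineURI = options.engineURI || "local:default";   // which engine to use
    this.active = false;
  }
  start() {
    if (this.active) throw new Error("a session is already running");
    this.active = true;
  }
  stop() { this.active = false; }
  filterResults(hypotheses) {
    return hypotheses.filter((h) => h.confidence >= this.confidenceThreshold);
  }
}

// Usage: a session tuned for a small command grammar with a strict threshold.
const session = new SpeechInputSession({
  grammarSRGS: "<grammar><rule id='nav'>go back | go forward</rule></grammar>",
  confidenceThreshold: 0.7,
  engineURI: "https://example.org/recognize",   // hypothetical remote engine
});
session.start();
const kept = session.filterResults([
  { text: "go back", confidence: 0.91 },
  { text: "cobalt", confidence: 0.32 },
]);
session.stop();
console.log(kept);   // only the high-confidence hypothesis survives
```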


nsSpeechRequest

Related text here

endpointer

Related text here

browser.js

Related text here

browser.xul

Related text here

Bugs solved:


Mention all bugs solved in the form of a list

Tentative Schedule


  • 28th November - 4th December - SpeechRequest + endpointer code compiled.
  • 5th December - 11th December - Fixes to microphone handling on Linux, some other small fixes, getting familiar with the code.
  • 12th December - 18th December - Continue with some small fixes (for example, simplify thread handling).
  • 19th December - 25th December - Christmas; not much progress, but continue fixing the SpeechRequest API and possibly add some new features.
  • 26th December - 1st January - Holiday season; not much progress, but continue with the SpeechRequest implementation.
  • 2nd January - 8th January - Get TTS working.
  • 9th January - 15th January - Enhancements to the TTS implementation.
  • 16th January - 22nd January - First speech commands: for example, browser go back & go forward.
  • 23rd January - 29th January - More speech commands; maybe read entire text aloud.
  • Demos & Examples
  • Patches/Updates
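The "go back"/"go forward" speech commands from the schedule could be wired up roughly as follows. This is a standalone sketch: `BrowserStub` and `dispatchSpeechCommand` are invented names for illustration; the real hooks would live in browser.js against Firefox's actual session history.

```javascript
// BrowserStub tracks a tiny session history; dispatchSpeechCommand maps a
// recognized transcript onto the matching browser action, ignoring phrases
// it does not know.
class BrowserStub {
  constructor() {
    this.history = ["page-a", "page-b", "page-c"];
    this.index = 2;   // currently on the newest page
  }
  goBack() { if (this.index > 0) this.index--; }
  goForward() { if (this.index < this.history.length - 1) this.index++; }
  get currentPage() { return this.history[this.index]; }
}

function dispatchSpeechCommand(browser, transcript) {
  const commands = {
    "go back": () => browser.goBack(),
    "go forward": () => browser.goForward(),
  };
  const action = commands[transcript.trim().toLowerCase()];
  if (!action) return false;   // unrecognized phrase: do nothing
  action();
  return true;
}

const browser = new BrowserStub();
dispatchSpeechCommand(browser, "go back");
console.log(browser.currentPage);   // "page-b"
```

Keeping the command table small matches the SRGS-grammar approach above: with a closed set of phrases, the recognizer can be constrained and the dispatch stays a simple lookup.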