SpeechAPI

Revision as of 12:38, 5 April 2012

Introduction


This project is an extension of the GSoC Speech project. It adds support for voice commands inside the Firefox browser, and has led to extensions of both the SpeechRecognition API and the text-to-speech API.


Initial contributors:

Roshan Vidyashankar and Anant Narayanan are the initial contributors to the Speech Project.

Contributors for this extended SpeechAPI project are:

  1. Rohan Dalvi
  2. Harshank Vengurlekar
  3. Jagannath Ramesh


Technical Stuff


1. Speech Input API

The Speech Input API aims to provide an alternative input method for web applications, without using a keyboard or other physical device. It can be used to input commands, fill input elements, give directions, etc. It is based on the SpeechRequest proposal [http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0023/speechrequest.xml.html].

The API consists of two main components:

  • A Media Capture API to capture a stream of raw audio. Its implementation is platform dependent; Mac, Windows and Linux are to be supported first, with Android support added eventually.
  • A streaming API to asynchronously stream microphone data to a speech recognition server and get the results back, similar to how XMLHttpRequest is implemented. The API should be able to support local engines, remote engines, or a combination of both, depending on the network connection available.
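The two components above can be sketched as a small mock. This is illustrative only: `MockRecognizer` stands in for a recognition engine, and the `SpeechRequest` shape (its `send`/`onresult`/`onerror` names) is an assumption modeled on XMLHttpRequest, not the interface the proposal actually defines.

```javascript
// MockRecognizer stands in for a local or remote speech engine; a real
// implementation would decode the audio instead of just counting chunks.
class MockRecognizer {
  constructor() { this.chunks = []; }
  feed(chunk) { this.chunks.push(chunk); }   // accept one raw audio buffer
  finish() {
    return { transcript: "go back", chunksReceived: this.chunks.length };
  }
}

// SpeechRequest mirrors the XMLHttpRequest shape: configure callbacks,
// then send the captured audio and receive the result via onresult.
class SpeechRequest {
  constructor(engine) {
    this.engine = engine;
    this.onresult = null;
    this.onerror = null;
  }
  send(audioChunks) {
    try {
      for (const chunk of audioChunks) this.engine.feed(chunk);
      const result = this.engine.finish();
      if (this.onresult) this.onresult(result);  // a real API would deliver this asynchronously
    } catch (e) {
      if (this.onerror) this.onerror(e);
    }
  }
}

// Simulated microphone capture: three raw PCM buffers from the capture API.
const capturedAudio = [new Uint8Array(320), new Uint8Array(320), new Uint8Array(320)];
const req = new SpeechRequest(new MockRecognizer());
req.onresult = (r) => console.log(`recognized "${r.transcript}" from ${r.chunksReceived} chunks`);
req.send(capturedAudio);
```

The split keeps capture platform-specific while the streaming layer stays engine-agnostic: swapping a local engine for a remote one only changes which object is handed to the request.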

Security/Privacy issues

  • A speech input session should be allowed only with the user's consent, which could be obtained using a doorhanger notification.
  • The user should be notified when audio is being recorded, possibly with a record symbol somewhere in the web browser UI itself, such as the URL bar or status bar.
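The two rules above can be sketched as a small gate. This is a minimal illustration, not browser code: `ConsentGate` and its member names are hypothetical, standing in for the doorhanger grant and the recording indicator.

```javascript
// ConsentGate models the two privacy rules: no recording without an
// explicit grant, and a visible indicator whenever audio is captured.
class ConsentGate {
  constructor() {
    this.granted = false;
    this.recordingIndicatorVisible = false;   // e.g. a record symbol in the URL bar
  }
  grant() { this.granted = true; }            // e.g. the user clicks "Allow" on a doorhanger
  beginRecording() {
    if (!this.granted) throw new Error("speech input requires user consent");
    this.recordingIndicatorVisible = true;
  }
  endRecording() { this.recordingIndicatorVisible = false; }
}

const gate = new ConsentGate();
gate.grant();            // user consents via the notification
gate.beginRecording();   // indicator turns on while audio is captured
gate.endRecording();     // indicator turns off again
```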

API Design

The API will look like the interface described in the SpeechRequest proposal.

  • The developer should be able to specify a grammar (using SRGS) for the speech, which is useful when the set of possible commands is limited. The recognition response would be in the form of an EMMA document.
  • The developer should be allowed to set thresholds for accuracy and sensitivity to improve performance.
  • The developer should be able to choose which speech engine to use.
  • The developer should be able to start and stop recognition, handle errors, and manage multiple requests as required.
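One way the configuration surface listed above might look, as a sketch. Every property name here (`grammarSRGS`, `confidenceThreshold`, `engineURI`) is an assumption for illustration; the SpeechRequest proposal defines the authoritative interface, and a real result would be a full EMMA document rather than the plain hypothesis objects used here.

```javascript
// SpeechInputSession bundles the knobs from the list above. filterResults()
// keeps only hypotheses at or above the confidence threshold, shaped like
// entries of an EMMA interpretation list.
class SpeechInputSession {
  constructor(options = {}) {
    this.grammarSRGS = options.grammarSRGS || null;          // SRGS grammar source, if any
    this.confidenceThreshold = options.confidenceThreshold ?? 0.5;
    this.engineURI = options.engineURI || "local:default";   // which engine to use
    this.active = false;
  }
  start() {
    if (this.active) throw new Error("a session is already running");
    this.active = true;
  }
  stop() { this.active = false; }
  filterResults(hypotheses) {
    return hypotheses.filter((h) => h.confidence >= this.confidenceThreshold);
  }
}

// Usage: a session tuned for a small command grammar with a strict threshold.
const session = new SpeechInputSession({
  grammarSRGS: "<grammar><rule id='nav'>go back | go forward</rule></grammar>",
  confidenceThreshold: 0.7,
  engineURI: "https://example.org/recognize",   // hypothetical remote engine
});
session.start();
const kept = session.filterResults([
  { text: "go back", confidence: 0.91 },
  { text: "cobalt", confidence: 0.32 },
]);
session.stop();
console.log(kept);   // only the high-confidence hypothesis survives
```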


nsSpeechRequest

Related text here

endpointer

Related text here

browser.js

Related text here

browser.xul

Related text here

Bugs solved:


Mention all bugs solved in the form of a list

Tentative Schedule


  • 28th November - 4th December - SpeechRequest + endpointer code compiled.
  • 5th December - 11th December - Fixes to microphone handling on Linux, some other small fixes, getting familiar with the code.
  • 12th December - 18th December - Continue with some small fixes (for example, simplify thread handling).
  • 19th December - 25th December - Christmas; not much progress, but continue fixing the SpeechRequest API and possibly add some new features.
  • 26th December - 1st January - Holiday season; not much progress, but continue with the SpeechRequest implementation.
  • 2nd January - 8th January - Get TTS working.
  • 9th January - 15th January - Enhancements to the TTS implementation.
  • 16th January - 22nd January - First speech commands: for example, browser go back & go forward.
  • 23rd January - 29th January - More speech commands; maybe read entire text aloud.
  • Demos & Examples
  • Patches/Updates
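The "go back"/"go forward" speech commands from the schedule could be wired up roughly as follows. This is a standalone sketch: `BrowserStub` and `dispatchSpeechCommand` are invented names for illustration; the real hooks would live in browser.js against Firefox's actual session history.

```javascript
// BrowserStub tracks a tiny session history; dispatchSpeechCommand maps a
// recognized transcript onto the matching browser action, ignoring phrases
// it does not know.
class BrowserStub {
  constructor() {
    this.history = ["page-a", "page-b", "page-c"];
    this.index = 2;   // currently on the newest page
  }
  goBack() { if (this.index > 0) this.index--; }
  goForward() { if (this.index < this.history.length - 1) this.index++; }
  get currentPage() { return this.history[this.index]; }
}

function dispatchSpeechCommand(browser, transcript) {
  const commands = {
    "go back": () => browser.goBack(),
    "go forward": () => browser.goForward(),
  };
  const action = commands[transcript.trim().toLowerCase()];
  if (!action) return false;   // unrecognized phrase: do nothing
  action();
  return true;
}

const browser = new BrowserStub();
dispatchSpeechCommand(browser, "go back");
console.log(browser.currentPage);   // "page-b"
```

Keeping the command table small matches the SRGS-grammar approach above: with a closed set of phrases, the recognizer can be constrained and the dispatch stays a simple lookup.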