SpeechSynthesis

Introduction

Speech synthesis is part of the HTML5 Speech API; it does the text-to-speech work.
You can check the details at https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

WebAPI

The following is sample code that we can use in a web page.
Example

   var txt = new SpeechSynthesisUtterance('Hello Mozilla');
   speechSynthesis.speak(txt);

To make the speech synthesis API speak the sentence or words you want, you have to wrap them in a SpeechSynthesisUtterance object and pass it to speechSynthesis.speak() as a parameter. There are more details in the following sections.

SpeechSynthesisUtterance

SpeechSynthesisUtterance is an interface that contains the information about what content the system should speak and how it should speak it.

attribute DOMString text;  //The content the system should speak
attribute DOMString lang;  //The language the system should choose
attribute SpeechSynthesisVoice? voice;  //The voice the system should use
attribute float volume;  //The speech volume
attribute float rate;  //The speech rate
attribute float pitch;  //The speech pitch
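
For example, a web page can fill in these attributes before handing the utterance to the speech system. This is a minimal sketch; the voice assignment assumes the system has at least one voice installed:

   var txt = new SpeechSynthesisUtterance();
   txt.text = 'Hello Mozilla';   // the content to speak
   txt.lang = 'en-US';           // the language to use
   txt.volume = 1.0;             // 0.0 to 1.0
   txt.rate = 1.0;               // 1.0 is the normal speaking rate
   txt.pitch = 1.0;              // 1.0 is the normal pitch
   // Assumes at least one voice is installed on the system:
   txt.voice = speechSynthesis.getVoices()[0];
   speechSynthesis.speak(txt);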

SpeechSynthesis

SpeechSynthesis is the interface to the speech system. Developers can control it through the following methods.

void speak(SpeechSynthesisUtterance utterance);  //Make the system speak the SpeechSynthesisUtterance you pass.
void cancel();  //Cancel the current speech request and all the requests in the speak queue.
void pause();  //Pause the current speech task.
void resume();  //Resume the task from the paused state.
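
A short sketch of how these methods fit together; the timeout values here are arbitrary, chosen only to leave time for the speech to start:

   var txt = new SpeechSynthesisUtterance('Hello Mozilla');
   speechSynthesis.speak(txt);            // start speaking
   setTimeout(function() {
     speechSynthesis.pause();             // pause mid-utterance
     setTimeout(function() {
       speechSynthesis.resume();          // pick up where we paused
     }, 1000);
   }, 500);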

Developers can also use SpeechSynthesis.getVoices() to obtain the available voices on the system.
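
For instance, a page can pick a voice by language. Note that in some browsers getVoices() returns an empty list until the voiceschanged event has fired:

   var voices = speechSynthesis.getVoices();
   var txt = new SpeechSynthesisUtterance('Hello Mozilla');
   for (var i = 0; i < voices.length; i++) {
     if (voices[i].lang == 'en-US') {
       txt.voice = voices[i];  // use the first en-US voice found
       break;
     }
   }
   speechSynthesis.speak(txt);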

Components

There are three main components of speech synthesis in Gecko: SpeechSynthesis, nsSynthVoiceRegistry, and nsISpeechService.

SpeechSynthesis

SpeechSynthesis is in charge of managing all the speech requests. It puts incoming requests in a queue and passes each request to nsSynthVoiceRegistry when the speech system is available. It also lets web developers control request execution through the pause(), resume(), and cancel() APIs.
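
This queueing is observable from the web API side; the snippet below uses nothing beyond the methods listed above:

   // speak() queues utterances instead of interrupting the current one,
   // so these two sentences are spoken one after the other:
   speechSynthesis.speak(new SpeechSynthesisUtterance('First sentence'));
   speechSynthesis.speak(new SpeechSynthesisUtterance('Second sentence'));
   // cancel() stops the current utterance and empties the queue:
   speechSynthesis.cancel();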

nsSynthVoiceRegistry

Speech services register their voices with nsSynthVoiceRegistry when they start. The registry wraps a request from SpeechSynthesis into a speech task and sends it from the child process to the chrome process. It binds the task to mStream, finds the best mVoice for the task, and sends the task to the corresponding speech service.
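
As a rough illustration, a JavaScript-implemented service might register a voice like this. The contract ID, the addVoice() parameters, and the voice URI below are assumptions based on nsISynthVoiceRegistry.idl and may differ between Gecko versions:

   var registry = Components.classes["@mozilla.org/synth-voice-registry;1"]
                            .getService(Components.interfaces.nsISynthVoiceRegistry);
   // (service, uri, name, lang, isLocal) -- check the .idl for the exact
   // parameters; the URI here is a made-up example.
   registry.addVoice(mySpeechService, "urn:moz-tts:sample:en",
                     "Sample Voice", "en-US", true);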

nsISpeechService

nsISpeechService is the actual component that does the text-to-speech; it takes the initiative to register its voices with nsSynthVoiceRegistry. There are two kinds of voice services:
1. Indirect audio - the service is responsible for outputting audio. The service calls the nsISpeechTask dispatch methods directly (see the sketch after this list), starting with dispatchStart() and ending with dispatchEnd() or dispatchError().
2. Direct audio - the service provides us with PCM-16 data, and we output it. The service does not call the dispatch methods directly. Instead, audio information is provided at setup(), and audio data is sent with sendAudio(). The utterance is terminated with an empty sendAudio().
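
A minimal sketch of an indirect-audio service in JavaScript. The speak() signature, the dispatch arguments, and the nativeEngineSpeak() helper are assumptions; check nsISpeechService.idl and nsISpeechTask.idl for the exact shapes:

   var SampleIndirectService = {
     // Assumed entry point; the real signature lives in nsISpeechService.idl.
     speak: function(aText, aUri, aRate, aPitch, aTask) {
       aTask.dispatchStart();                     // we are about to produce audio
       try {
         nativeEngineSpeak(aText, aRate, aPitch); // hypothetical native call
         aTask.dispatchEnd();                     // finished normally
       } catch (e) {
         aTask.dispatchError();                   // something went wrong
       }
     }
   };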

nsISpeechService implementation

In Firefox OS, we use Pico as our TTS system. We are trying to enable SpeechSynthesis on other platforms. You can track this in [Bug 1003439](https://bugzilla.mozilla.org/show_bug.cgi?id=1003439)