User:Tedd/Speech Recognition

Architecture

API limitation

Access to speech recognition is limited to certified apps (at the time of writing). The restriction is enforced through the 'Func' extended attribute on the WebIDL constructor:

 Func="SpeechRecognition::IsAuthorized"

IsAuthorized is implemented inside the SpeechRecognition class:

bool
SpeechRecognition::IsAuthorized(JSContext* aCx, JSObject* aGlobal)
{
  bool inCertifiedApp = IsInCertifiedApp(aCx, aGlobal);
  bool enableTests = Preferences::GetBool(TEST_PREFERENCE_ENABLE);
  bool enableRecognitionEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_ENABLE);
  bool enableRecognitionForceEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE);
  return (inCertifiedApp || enableRecognitionForceEnable || enableTests) && enableRecognitionEnable;
}
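
To make the effect of that return expression easier to see, here is a minimal standalone sketch of the same boolean logic. The names AuthInputs, IsAuthorizedSketch and CheckAuthorizationCases are hypothetical and exist only for this illustration; the real function reads its inputs from IsInCertifiedApp and the preferences service rather than from a struct.

```cpp
#include <cassert>

// Hypothetical stand-in for the four values IsAuthorized reads.
struct AuthInputs {
  bool inCertifiedApp;          // result of IsInCertifiedApp(aCx, aGlobal)
  bool enableTests;             // TEST_PREFERENCE_ENABLE
  bool recognitionEnable;       // TEST_PREFERENCE_RECOGNITION_ENABLE
  bool recognitionForceEnable;  // TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE
};

// Same boolean expression as the return statement of IsAuthorized.
bool IsAuthorizedSketch(const AuthInputs& aIn) {
  return (aIn.inCertifiedApp || aIn.recognitionForceEnable || aIn.enableTests) &&
         aIn.recognitionEnable;
}

// A few representative rows of the truth table.
void CheckAuthorizationCases() {
  // Certified app with the enable pref set: allowed.
  assert(IsAuthorizedSketch({true, false, true, false}));
  // Enable pref unset: denied, whatever the other inputs are.
  assert(!IsAuthorizedSketch({true, true, false, true}));
  // Non-certified caller with the force-enable pref set: allowed.
  assert(IsAuthorizedSketch({false, false, true, true}));
  // Non-certified caller with only the enable pref set: denied.
  assert(!IsAuthorizedSketch({false, false, true, false}));
}
```

In other words, the recognition enable preference acts as a master switch, and being a certified app is only one of three ways to satisfy the remaining condition.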

Relevant files in the Gecko source tree

pocketsphinx library code:

./media/pocketsphinx

Speech recognition WebIDL:

./dom/webidl/SpeechRecognitionResultList.webidl
./dom/webidl/SpeechRecognitionResult.webidl
./dom/webidl/SpeechRecognitionAlternative.webidl
./dom/webidl/SpeechRecognition.webidl
./dom/webidl/SpeechRecognitionEvent.webidl
./dom/webidl/SpeechRecognitionError.webidl

WebIDL implementation (C++):

./dom/media/webspeech/recognition/SpeechRecognitionResult.h
./dom/media/webspeech/recognition/SpeechRecognition.h
./dom/media/webspeech/recognition/SpeechRecognitionResultList.h
./dom/media/webspeech/recognition/SpeechRecognitionResult.cpp
./dom/media/webspeech/recognition/SpeechRecognitionAlternative.cpp
./dom/media/webspeech/recognition/SpeechRecognitionResultList.cpp
./dom/media/webspeech/recognition/SpeechRecognition.cpp
./dom/media/webspeech/recognition/SpeechRecognitionAlternative.h

Recognition service IDL:

./dom/media/webspeech/recognition/nsISpeechRecognitionService.idl

Implementation of the IDL interface (C++):

./dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp
./dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.h

Events implementation (C++):

./dom/events/SpeechRecognitionError.h
./dom/events/SpeechRecognitionError.cpp

Association between components for speech recognition

Speech recognition is exposed to JavaScript through a WebIDL interface that is bound to a C++ class, which in turn uses the nsISpeechRecognitionService interface to communicate with the actual recognition service. This section illustrates how the components are associated with one another.

In JavaScript (given the right permissions) a 'SpeechRecognition' object can be created:

var speech = new SpeechRecognition();
speech.start(stream);

The invoked function is defined inside a WebIDL file (SpeechRecognition.webidl):

interface SpeechRecognition : EventTarget {
    ...
    void start(optional MediaStream stream);
    ...
}

The SpeechRecognition interface, including its start method, is implemented in a C++ class (SpeechRecognition::Start):

void
SpeechRecognition::Start(const Optional<NonNull<DOMMediaStream>>& aStream, ErrorResult& aRv)
{
  ...
  nsresult rv;
  rv = mRecognitionService->Initialize(this);
  ...
}

mRecognitionService is an instance of the class that implements the nsISpeechRecognitionService interface:

interface nsISpeechRecognitionService : nsISupports {
    void initialize(in SpeechRecognitionWeakPtr aSpeechRecognition);
    ...
}

In the case of pocketsphinx, this class is PocketSphinxSpeechRecognitionService, declared in PocketSphinxSpeechRecognitionService.h, which implements the Initialize function:

NS_IMETHODIMP
PocketSphinxSpeechRecognitionService::Initialize(
    WeakPtr<SpeechRecognition> aSpeechRecognition)
{
...
}
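
The chain described so far (a DOM-facing object that hands a weak reference to itself to a service behind an interface) can be modelled outside Gecko as follows. This is only a sketch under stated assumptions: every class name ending in Sketch is hypothetical, std::shared_ptr and std::weak_ptr stand in for Gecko's reference counting and WeakPtr, and a bool return value stands in for nsresult.

```cpp
#include <memory>
#include <utility>

class SpeechRecognitionSketch;

// Hypothetical stand-in for the nsISpeechRecognitionService interface.
class RecognitionServiceSketch {
 public:
  virtual ~RecognitionServiceSketch() = default;
  virtual bool Initialize(std::weak_ptr<SpeechRecognitionSketch> aRecognition) = 0;
};

// Hypothetical stand-in for the C++ class behind the WebIDL interface.
// Instances must be owned by a std::shared_ptr before Start() is called.
class SpeechRecognitionSketch
    : public std::enable_shared_from_this<SpeechRecognitionSketch> {
 public:
  explicit SpeechRecognitionSketch(std::shared_ptr<RecognitionServiceSketch> aService)
      : mRecognitionService(std::move(aService)) {}

  // Mirrors the call made in SpeechRecognition::Start: the object passes a
  // weak reference to itself to the recognition service.
  bool Start() {
    std::weak_ptr<SpeechRecognitionSketch> self = shared_from_this();
    return mRecognitionService->Initialize(std::move(self));
  }

 private:
  std::shared_ptr<RecognitionServiceSketch> mRecognitionService;
};

// Hypothetical stand-in for the pocketsphinx-backed service.
class PocketSphinxServiceSketch : public RecognitionServiceSketch {
 public:
  bool Initialize(std::weak_ptr<SpeechRecognitionSketch> aRecognition) override {
    mRecognition = std::move(aRecognition);
    return !mRecognition.expired();
  }

  // True while the recognition object that initialized us is still alive.
  bool HasLiveRecognition() const { return !mRecognition.expired(); }

 private:
  std::weak_ptr<SpeechRecognitionSketch> mRecognition;
};
```

Because the service only holds a weak reference, it does not keep the recognition object alive; once the last strong reference goes away, HasLiveRecognition() in this sketch reports false.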

This class uses the pocketsphinx library for the actual speech recognition. An example of the library in use:

rv = ps_process_raw(mPs, &mAudiovector[0], mAudiovector.Length(), FALSE,
                    FALSE);

rv = ps_end_utt(mPs);
confidence = 0;

Library functions used in pocketsphinx service class

The following functions exported by the library are used inside the pocketsphinx speech recognition service (PocketSphinxSpeechRecognitionService):

int ps_start_utt(ps_decoder_t *ps);
int ps_process_raw(ps_decoder_t *ps, int16 const *data, size_t n_samples, int no_search, int full_utt);
int ps_end_utt(ps_decoder_t *ps);
char const *ps_get_hyp_final(ps_decoder_t *ps, int32 *out_is_final);
int32 ps_get_prob(ps_decoder_t *ps);
logmath_t *ps_get_logmath(ps_decoder_t *ps);
arg_t const *ps_args(void);
ps_decoder_t *ps_init(cmd_ln_t *config);
int ps_set_jsgf_string(ps_decoder_t *ps, const char *name, const char *jsgf_string);
int ps_set_search(ps_decoder_t *ps, const char *name);
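
The utterance-related functions in this list are meant to be called in a fixed order: start an utterance, feed it audio, then end it before asking for a hypothesis. The sketch below illustrates only that ordering. It deliberately does not link against pocketsphinx: FakeDecoder and DecodeWholeBuffer are hypothetical stand-ins, and their methods merely record which step was taken, mirroring the roles of ps_start_utt, ps_process_raw and ps_end_utt.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder handle; NOT the pocketsphinx API.
class FakeDecoder {
 public:
  // Role of ps_start_utt: open a new utterance.
  int StartUtt() {
    if (mInUtterance) return -1;  // an utterance is already open
    mInUtterance = true;
    mSamplesSeen = 0;
    mCalls.push_back("start_utt");
    return 0;
  }

  // Role of ps_process_raw: accept 16-bit samples for the open utterance.
  int ProcessRaw(const std::int16_t* aData, std::size_t aCount) {
    if (!mInUtterance || aData == nullptr) return -1;
    mSamplesSeen += aCount;
    mCalls.push_back("process_raw");
    return 0;
  }

  // Role of ps_end_utt: close the utterance.
  int EndUtt() {
    if (!mInUtterance) return -1;
    mInUtterance = false;
    mCalls.push_back("end_utt");
    return 0;
  }

  std::size_t SamplesSeen() const { return mSamplesSeen; }
  const std::vector<std::string>& Calls() const { return mCalls; }

 private:
  bool mInUtterance = false;
  std::size_t mSamplesSeen = 0;
  std::vector<std::string> mCalls;
};

// Feeds one complete audio buffer through the decoder in a single call,
// similar in shape to the ps_process_raw snippet shown earlier.
// Returns 0 on success and a negative value on failure.
int DecodeWholeBuffer(FakeDecoder& aDecoder,
                      const std::vector<std::int16_t>& aAudio) {
  if (aAudio.empty()) return -1;
  int rv = aDecoder.StartUtt();
  if (rv < 0) return rv;
  rv = aDecoder.ProcessRaw(aAudio.data(), aAudio.size());
  if (rv < 0) {
    aDecoder.EndUtt();  // close the utterance even when processing failed
    return rv;
  }
  return aDecoder.EndUtt();
}
```

In the real service, the hypothesis and its score would be read with ps_get_hyp_final and ps_get_prob after the utterance has ended; that step is left out here because it depends on the actual decoder.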