Firefox OS/Vaani/Custom Command

What is Custom Command?

Custom Command is a feature that lets third-party apps register their own voice commands. In the short term, voice commands will be predefined actions. We may use the Action type from schema.org as a base and extend it into an action list. Third-party apps should use JSON-LD to declare which actions they support. Once an action is detected, Vaani calls the corresponding app to handle it.

Prerequisite

Install Vaani to your Firefox OS device

  1. Have the Web Speech API enabled on your device. (Currently, the API is only available in B2G nightly builds.)
  2. Install Vaani on your device.
    1. Download Vaani from https://github.com/kdavis-mozilla/fxos-voice-commands
    2. Connect your device.
    3. In Firefox, select Tools -> Web Developer -> WebIDE.
    4. Select Project -> Open Packaged App and choose the folder where you placed the downloaded app.
    5. Click the Play icon.

Implementation

(TBD)(brief introduction)

Architecture

(TBD)(graph and explain)

The following picture shows the system architecture:

CustomerCommandArchitecture.png

We will explain this architecture by components.

ActionCollector

ActionCollector parses the actions from ActionDescriptions and stores them. Once it receives an action trigger event, it forwards the action to the corresponding web app.
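A minimal sketch of this idea in JavaScript (the class shape, method names, and the `app://dialer` origin are assumptions for illustration, not the actual Gaia code):

```javascript
// Hypothetical ActionCollector: parses "custom_commands" entries from app
// manifests and stores them, then forwards a trigger to the app that
// registered the matching action type.
class ActionCollector {
  constructor() {
    this.actions = new Map(); // action @type -> { origin, command }
  }

  // Collect the supported actions declared by one app.
  collect(appOrigin, manifest) {
    for (const cmd of manifest.custom_commands || []) {
      this.actions.set(cmd['@type'], { origin: appOrigin, command: cmd });
    }
  }

  // Forward a detected action to the corresponding web app. Here we just
  // return the registered target; the real system would launch the app.
  trigger(actionType) {
    const entry = this.actions.get(actionType);
    return entry ? entry.command.target : null;
  }
}

const collector = new ActionCollector();
collector.collect('app://dialer', {
  custom_commands: [{ '@type': 'DialAction', target: { name: 'dial' } }]
});
console.log(collector.trigger('DialAction').name); // → dial
```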

InteractionManager

InteractionManager is the main controller of this system. It listens for recognition results from SpeechRecognition and asks ActionCollector to trigger the related action.
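The control flow could be sketched as follows (all names here are assumptions; a stub stands in for the Web Speech API so the example is self-contained):

```javascript
// Hypothetical InteractionManager flow: receive a recognition result,
// translate it into an action, and ask the collector to trigger it.
function createInteractionManager(translate, trigger) {
  return {
    onResult(sentence) {
      const action = translate(sentence); // raw sentence -> action
      if (action) trigger(action);        // forward to the web app
      return action;
    }
  };
}

// Stub translator and trigger standing in for ActionTranslater /
// ActionCollector in the real system.
const triggered = [];
const manager = createInteractionManager(
  sentence => sentence.startsWith('call') ? { '@type': 'DialAction' } : null,
  action => triggered.push(action['@type'])
);

manager.onResult('call five five five');
console.log(triggered); // → [ 'DialAction' ]
```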

ActionTranslater

When InteractionManager receives recognition results from SpeechRecognition, it gets only raw sentences. These results have to be translated by ActionTranslater into real actions.
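As a rough sketch of such a translation step (the rules below mirror the DialAction/OpenAction grammars used later in this page; the function and its behaviour are assumptions, not the actual ActionTranslater):

```javascript
// Map spoken digit words to characters, matching the DialAction grammar.
const DIGITS = { zero: '0', one: '1', two: '2', three: '3', four: '4',
                 five: '5', six: '6', seven: '7', eight: '8', nine: '9' };

// Turn a raw recognized sentence into an action plus its variables.
function translate(sentence) {
  const words = sentence.trim().toLowerCase().split(/\s+/);
  if (words[0] === 'call' && words.length > 1 &&
      words.slice(1).every(w => w in DIGITS)) {
    return { '@type': 'DialAction',
             variables: { number: words.slice(1).map(w => DIGITS[w]).join('') } };
  }
  if (words[0] === 'open' && words.length > 1) {
    return { '@type': 'OpenAction',
             variables: { app: words.slice(1).join(' ') } };
  }
  return null; // sentence matches no known action
}

console.log(translate('call five five five').variables.number); // → 555
console.log(translate('open calendar').variables.app);          // → calendar
```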

Vaani

Vaani is the UI of this system. It tells users the state of the system and interacts with them to give them access to it.

Our ultimate goal is to implement actions in the JSON-LD format, which is covered in the following section. At the current stage we do not implement ActionTranslater either; we just register predefined grammars with SpeechRecognition.

Data structure

(TBD)(definition and explain)

JSON-LD and manifest.webapp

Since JSON-LD is not implemented yet, we use manifest.webapp to host the supported-action information. To support custom commands, an app should add the following section to its manifest.webapp:

 "custom_commands": [{
   "@context": "http://schema.org",
   "@type": "WebApplication",
   "url": "/calendar/index.html",
   "potentialAction": {
     "@type": "DialAction",
     "target": {
       "url": "/dailer/index.html?dail={telephone}"
     }
   },
   "potentialObject": {
     "@type": ["Text", "Person"]
   }
 }]

In this case, we create a dial command whose object is a telephone number (text) or a person. Once a user speaks this command, /dialer/index.html?dial={telephone} is called.
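One way the placeholder in the registered target URL could be expanded once the object has been recognized (a sketch; the helper and its name are assumptions):

```javascript
// Hypothetical helper: replace every {name} placeholder in the target URL
// template with the matching recognized value.
function buildTargetUrl(templateUrl, object) {
  return templateUrl.replace(/\{(\w+)\}/g, (_, key) =>
    encodeURIComponent(object[key] != null ? object[key] : ''));
}

const url = buildTargetUrl('/dialer/index.html?dial={telephone}',
                           { telephone: '5551234' });
console.log(url); // → /dialer/index.html?dial=5551234
```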

Predefined actions

(TBD)

Bug list

Dependencies

  1. JSON-LD support in Browser API: bug 1178491

Tasks

Fast prototype

Before clarifying the real tasks, we will create a fast prototype based on Gaia. The design of actions will be implemented in Gaia. We will use predefined action-to-grammar mappings, like:

   # DialAction
   #JSGF v1.0; grammar fxosVoiceCommands;
   public <dial> = call (one | two | three | four | five | six | seven | eight | nine | zero)+
   
   # OpenAction
   #JSGF v1.0; grammar fxosVoiceCommands;
   public <open> = open `app names...`

Some grammars, like the one for OpenAction, will be generated dynamically.
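Dynamic generation of the OpenAction grammar could look roughly like this (a sketch assuming the installed app names are already available as strings; the function name and output shape are illustrative):

```javascript
// Build the OpenAction JSGF grammar from the list of installed app names,
// following the grammar header used in the prototype above.
function buildOpenGrammar(appNames) {
  const alternatives = appNames
    .map(name => name.toLowerCase())
    .join(' | ');
  return '#JSGF v1.0; grammar fxosVoiceCommands;\n' +
         `public <open> = open ( ${alternatives} );`;
}

console.log(buildOpenGrammar(['Dialer', 'Calendar', 'Settings']));
// public <open> = open ( dialer | calendar | settings );
```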

Manifest change at dialer/Vaani app

We add the following settings to the dialer app:

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "DialAction",
       "target": {
         "@type": "WebActivity",
         "name": "dial",
         "data": {
           "type": "webtelephony/number",
           "number": "@number"
         }
       }
     }]


We add the following settings to the Vaani app:

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "OpenAction",
       "object": "SoftwareApplication",
       "target": {
         "@type": "WebActivity",
         "name": "open-app",
         "data": {
           "type": "webapp",
           "manifest": "@manifest",
           "entry_point": "@entry_point"
         }
       }
     }]


Since we do not have NLP, we use predefined actions for this case. In this version, we support DialAction and OpenAction. All variables of these two actions are also predefined: the @number variable will be replaced with the recognized number, and @manifest and @entry_point will be replaced with the app's manifest and entry point.
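The variable substitution step could be sketched as follows (a hypothetical helper, not the actual Gaia code; in the real system the filled-in data would then be passed to a Web Activity such as `dial` or `open-app`):

```javascript
// Hypothetical helper: fill the predefined "@variable" placeholders in an
// activity's data with values taken from the recognition result.
function fillVariables(data, values) {
  const out = {};
  for (const [key, value] of Object.entries(data)) {
    out[key] = (typeof value === 'string' && value.startsWith('@'))
      ? values[value.slice(1)] // "@number" -> values.number
      : value;                 // keep literal fields as-is
  }
  return out;
}

// DialAction: @number is replaced with the recognized number.
const dialData = fillVariables(
  { type: 'webtelephony/number', number: '@number' },
  { number: '5551234' });
console.log(dialData.number); // → 5551234
```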

Since nothing handles the open-app activity yet, we also add an open-app activity handler to the Vaani app.

Development Plan

  • Stage 1 : Single customized command.
    • Define data format and supported actions.
    • Predefined grammar structure.
    • Single command.
  • Stage 2 : Compound command
    • Compound command.
  • Stage 3 : NLP service
    • Apply an NLP processor to the recognition result.

FAQ

(TBD)

  • JSON-LD is written in JavaScript scope, which means we can only get the information after the app is launched.
  • To achieve NLP recognition, SpeechRecognition should support grammar-free recognition.