Firefox OS/Vaani/Custom Command

From MozillaWiki

What is Custom Command?

Custom Command is a feature that lets 3rd-party apps register their own voice commands. In the short term, voice commands are limited to predefined actions. We may use the Action type from schema.org as the base and extend it into an action list. 3rd-party apps should use JSON-LD to state which actions they support. Once an action is detected, Vaani calls the app to handle it.

Prerequisite

Install Vaani on your Firefox OS device

  1. Make sure the Web Speech API is available on your device. (Currently, the API is only available on B2G nightly builds.)
  2. Install Vaani on your device.
    1. Download Vaani from https://github.com/mozilla/vaani
      1. Follow the build instructions in the README.md file
    2. Connect your device
    3. Open your Firefox browser and select Tools -> Web Developer -> WebIDE
    4. Select Project -> Open Packaged App, then choose the "build/" directory inside the folder where you downloaded the app.
    5. Click the Play icon

Implementation

(TBD)(brief introduction) To support a 3rd-party command registry, we use the linked data defined in schema.org as the command format, and have Vaani parse and manage the registered commands.

Data structure

(TBD)(definition and explain)

JSON-LD and manifest.webapp

Since JSON-LD support is not implemented yet, we use manifest.webapp to host the information about supported actions. To support custom commands, an app should add the following section to its manifest.webapp:

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "OpenAction",
       "object": "SoftwareApplication",
       "target": {
         "@type": "WebActivity",
         "name": "open-app",
         "data": {
           "type": "webapp",
           "manifest": "@manifest",
           "entry_point": "@entry_point"
         }
       }
     }]

In this case, we create an open command whose object is a software application. Once a user says this command, an activity is created.

Structure of Code


[Figure: Introduced-modules.png]

predefined actions (predefined-actions/*.js)

A predefined action is responsible for:

  • parsing the target field of manifest.webapp
  • generating the grammar
  • parsing the result from the transcript

DialAction is in predefined-actions/dial-action.js and OpenAppAction is in predefined-actions/open-app-action.js.
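
The three responsibilities can be illustrated with a minimal sketch of a dial action. This is not the actual content of predefined-actions/dial-action.js: the object name dialActionSketch and the method names parseTarget, generateGrammar and parseTranscript are assumptions made for illustration only.

```javascript
// Hypothetical sketch of a predefined action. All names are illustrative.
// The index of each word in DIGIT_WORDS equals the digit it stands for.
var DIGIT_WORDS = ['zero', 'one', 'two', 'three', 'four',
                   'five', 'six', 'seven', 'eight', 'nine'];

var dialActionSketch = {
  type: 'DialAction',

  // 1. Parse the "target" field of a custom_commands entry.
  //    This sketch only accepts WebActivity targets.
  parseTarget: function (target) {
    if (!target || target['@type'] !== 'WebActivity') {
      return null;
    }
    return { name: target.name, data: target.data };
  },

  // 2. Generate the JSGF rule for this action.
  generateGrammar: function () {
    return 'public <dial> = call (' + DIGIT_WORDS.join(' | ') + ')+;';
  },

  // 3. Parse a recognized transcript such as "call one two three".
  //    Returns null when the transcript does not match the rule.
  parseTranscript: function (transcript) {
    var words = transcript.trim().toLowerCase().split(/\s+/);
    if (words[0] !== 'call' || words.length < 2) {
      return null;
    }
    var number = '';
    for (var i = 1; i < words.length; i++) {
      var digit = DIGIT_WORDS.indexOf(words[i]);
      if (digit === -1) {
        return null;
      }
      number += digit; // string concatenation, e.g. '' + 1 -> '1'
    }
    return { number: number };
  }
};
```

With this sketch, dialActionSketch.parseTranscript('call one two three') yields an object whose number field is '123', and a transcript that does not start with "call" yields null.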


launchers (launchers/*.js)

A launcher executes the command parsed by Vaani. Currently, the only launcher is the WebActivity launcher, launchers/activity-launcher.js. In the future, we will also support IAC.
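
As a rough illustration of what a WebActivity launcher does, the sketch below starts an activity from a parsed command. It is not the code in launchers/activity-launcher.js; the function name launchWebActivity and the shape of the command argument are assumptions. The activity constructor is passed in as a parameter so that the sketch can be exercised outside a device; on Firefox OS one would pass MozActivity.

```javascript
// Hypothetical launcher sketch. `command.target` is assumed to be the
// "target" object of a custom_commands entry with its variables already
// replaced. `ActivityCtor` stands in for the Firefox OS MozActivity
// constructor, which takes an options object with `name` and `data`.
function launchWebActivity(command, ActivityCtor) {
  var target = command.target;
  if (!target || target['@type'] !== 'WebActivity') {
    throw new Error('Unsupported target type');
  }
  return new ActivityCtor({ name: target.name, data: target.data });
}
```

On a device, the call would look like launchWebActivity(command, MozActivity). Rejecting other target types up front leaves room for a separate IAC launcher later.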


actions parser (store/app-actions.js)

The actions parser parses action definitions from manifest.webapp. It looks up the custom_commands field in the manifest.webapp file. Once it finds the field, it asks the launcher to parse it.
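
The lookup step might look like the following sketch. It is not the code in store/app-actions.js: the function name collectCustomCommands, its arguments and the shape of the returned records are assumptions, and the manifest is assumed to be already parsed into an object.

```javascript
// Hypothetical sketch of the lookup step. `manifest` is the parsed content
// of an app's manifest.webapp and `manifestURL` identifies the app.
function collectCustomCommands(manifestURL, manifest) {
  var entries = manifest && manifest.custom_commands;
  if (!Array.isArray(entries)) {
    return []; // the app does not declare any custom command
  }
  return entries
    .filter(function (entry) {
      // Skip entries that lack an action type or a target.
      return entry && entry['@type'] && entry.target;
    })
    .map(function (entry) {
      return {
        manifestURL: manifestURL,
        type: entry['@type'],
        object: entry.object || null,
        target: entry.target
      };
    });
}
```

Apps without a custom_commands field simply contribute an empty list, so the caller can concatenate the results for all installed apps.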


core (action/standing-by.js)

We extend the original design of Vaani but change the setupSpeech and _interpreter functions in standing-by.js:

  • setupSpeech: we use predefined actions to generate grammars.
  • _interpreter: we use predefined actions to parse transcript and use launcher to execute command.
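
The division of work between the two functions can be sketched as follows. This is not the code in standing-by.js: buildGrammar and interpret are hypothetical helpers, and an action is assumed to be any object that offers generateGrammar() and parseTranscript() methods.

```javascript
// Hypothetical helpers mirroring the roles of setupSpeech and _interpreter.

// Role of setupSpeech: combine the rules of all predefined actions into
// one JSGF grammar string.
function buildGrammar(actions) {
  var rules = actions.map(function (action) {
    return action.generateGrammar();
  });
  return '#JSGF v1.0; grammar fxosVoiceCommands; ' + rules.join(' ');
}

// Role of _interpreter: hand the transcript to each action in turn and
// launch the first one that recognizes it. `launch` is a callback that
// receives the matching action and the parsed result.
function interpret(transcript, actions, launch) {
  for (var i = 0; i < actions.length; i++) {
    var parsed = actions[i].parseTranscript(transcript);
    if (parsed) {
      launch(actions[i], parsed);
      return true;
    }
  }
  return false; // no predefined action matched the transcript
}
```

In a browser with the Web Speech API, the string returned by buildGrammar could be registered through SpeechGrammarList.addFromString(). Whether Vaani wires it up exactly this way is not stated here.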


Predefined actions

(TBD)

Dial Action

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "DialAction",
       "target": {
         "@type": "WebActivity",
         "name": "dial",
         "data": {
           "type": "webtelephony/number",
           "number": "@number"
         }
       }
     }]

Open Action

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "OpenAction",
       "object": "SoftwareApplication",
       "target": {
         "@type": "WebActivity",
         "name": "open-app",
         "data": {
           "type": "webapp",
           "manifest": "@manifest",
           "entry_point": "@entry_point"
         }
       }
     }]

Bug list

Dependencies

  1. JSON-LD support in Browser API: bug 1178491

Development Plan

  • Stage 1 : Single customized command.
    • Define data format and supported actions.
    • Predefined grammar structure.
    • Compound command through IAC.
  • Stage 2 : Compound command API
    • Compound command API for 3rd party apps.
  • Stage 3 : NLP service
    • Apply an NLP processor to the recognition result.

Stage 1

In stage 1, we will implement experimental code in Gaia with the following prerequisites:

  • We use WebActivity to support UI switching interaction, and use IAC to support background interaction.
  • For multi-step interactions, we use IAC as the basis for asking apps to help Vaani fill in the missing elements and perform the command.
  • We only support predefined actions since we don't have NLP.

Repos

We may find all sources at the following places:

Current status

Only WebActivity is supported.

Grammars/Actions

The predefined actions are DialAction and OpenAppAction, which are mapped to grammars such as:

   #JSGF v1.0; grammar fxosVoiceCommands;
   // DialAction
   public <dial> = call (one | two | three | four | five | six | seven | eight | nine | zero)+;
   
   #JSGF v1.0; grammar fxosVoiceCommands;
   // OpenAppAction; `app names...` stands for the list of app names
   public <open> = open `app names...`;

DialAction can be mapped into a concrete grammar. Since an app name may need a separate pronounceable name (bug 1180113), the grammar of OpenAppAction will be defined for built-in apps only.
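
A grammar for the open command could be generated from a list of app names along these lines. The function name buildOpenGrammar is hypothetical, and so is the rule that names containing anything other than letters and spaces are skipped; that filter is only one possible way to handle names without an obvious pronunciation.

```javascript
// Hypothetical sketch: build the <open> rule from a list of app names.
// Names that contain characters other than letters and spaces are skipped,
// on the assumption that they have no obvious spoken form.
function buildOpenGrammar(appNames) {
  var alternatives = appNames
    .map(function (name) { return name.trim().toLowerCase(); })
    .filter(function (name) { return /^[a-z]+( [a-z]+)*$/.test(name); });
  if (alternatives.length === 0) {
    return null; // nothing that can be spoken, so no rule is produced
  }
  return '#JSGF v1.0; grammar fxosVoiceCommands; ' +
         'public <open> = open (' + alternatives.join(' | ') + ');';
}
```

For example, buildOpenGrammar(['Camera', 'Clock']) returns a grammar whose rule is public <open> = open (camera | clock);.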

The OpenAppAction can be expressed as a generic OpenAction with a SoftwareApplication object. So we change the definition of OpenAppAction to an OpenAction with an object field whose value is SoftwareApplication; the resulting definition is shown below.

Manifest change at dialer/Vaani app

We add the following settings to the dialer app:

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "DialAction",
       "target": {
         "@type": "WebActivity",
         "name": "dial",
         "data": {
           "type": "webtelephony/number",
           "number": "@number"
         }
       }
     }]


We add the following settings to the Vaani app:

     "custom_commands": [{
       "@context": "http://schema.org",
       "@type": "OpenAction",
       "object": "SoftwareApplication",
       "target": {
         "@type": "WebActivity",
         "name": "open-app",
         "data": {
           "type": "webapp",
           "manifest": "@manifest",
           "entry_point": "@entry_point"
         }
       }
     }]


All variables of DialAction and OpenAction are also predefined. In this case, the variable @number is replaced with the recognized number, and @manifest and @entry_point are replaced with the app's manifest and entry point.
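
The replacement of predefined variables can be sketched as a simple substitution over the data object of a target. The function name fillVariables and the convention that a variable is any string value starting with "@" are assumptions of this sketch, not a description of Vaani's actual code.

```javascript
// Hypothetical sketch: replace "@name" placeholders in a target's data
// object with recognized values. Returns a new object; the input is not
// modified. Placeholders without a matching value are left as they are.
function fillVariables(data, values) {
  var filled = {};
  Object.keys(data).forEach(function (key) {
    var value = data[key];
    if (typeof value === 'string' && value.charAt(0) === '@') {
      var name = value.slice(1);
      filled[key] = Object.prototype.hasOwnProperty.call(values, name)
        ? values[name]
        : value;
    } else {
      filled[key] = value;
    }
  });
  return filled;
}

// Example with the data object of the DialAction target:
var dialData = fillVariables(
  { type: 'webtelephony/number', number: '@number' },
  { number: '5550123' }
);
// dialData is { type: 'webtelephony/number', number: '5550123' }
```

Returning a copy keeps the registered command reusable for the next utterance.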

Since no app handles the open-app activity, we also add the open-app activity to the Vaani app.

Stage 2

In stage 1, we use IAC for interaction between Vaani and apps. However, IAC is only available to certified apps, and we want to extend this to any 3rd-party app. So in this stage, we will design a communication API for general packaged apps.

FAQ

(TBD)

  • JSON-LD is written in JavaScript scope, which means we can only get the information after the app is launched.
  • To achieve NLP recognition, SpeechRecognition should support grammar-free recognition.