Labs/Ubiquity/Parser Documentation
The Epic Journey
We will now follow the journey of a string through Ubiquity's parser.
Suppose that the user enters the following into the parser:
trans hello world to ja
Finding the Verb
First, the parser finds the verb in the sentence.
To make internationalization easier, every operation that depends on the human language in use is handled by a language-specific plugin. Finding the verb is one such operation: in some languages the verb comes before the object, in others after it, and so on.
So if Ubiquity is in English mode, then the sentence is dispatched to the English-specific parser plugin for verb detection. Currently this is done by simply taking the part of the string that occurs before the first space character: in other words, trans is our verb, and hello world to ja is the rest of our sentence.
(Note: this simplistic way of getting the verb currently limits us to having only single-word command names. It needs to be changed to support multi-word commands.)
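The English verb-detection step described above can be sketched in a few lines. This is a minimal illustration, not Ubiquity's actual code (Ubiquity itself is written in JavaScript); the function name split_verb is hypothetical, and Python is used only for brevity.

```python
def split_verb(sentence: str) -> tuple[str, str]:
    """Split the input at the first space character.

    Returns (verb, rest). If the input contains no space, the whole
    input is treated as the verb and the rest is an empty string.
    """
    # str.partition splits on the FIRST occurrence of the separator only,
    # so everything after the first space stays together as the rest.
    verb, _, rest = sentence.partition(" ")
    return verb, rest


print(split_verb("trans hello world to ja"))
# -> ('trans', 'hello world to ja')
```

Because the split happens at the first space, a command whose name contains a space could never be recognized as a verb, which is the single-word limitation mentioned in the note above.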
Looking for a Verb Match
The parser maintains a list of every command installed in Ubiquity. The next step is to compare the input verb with the name of each command, determine which commands are potential matches, rank those by match quality, and eliminate the rest.
When attempting to match an input verb to a command name, the parser tries the following types of matches:
- Direct match of input verb to the beginning of the command name, e.g. matching "trans" to "translate". This is considered the best type of match.
- Match in the middle or end of the command name, e.g. matching "ans" to "translate". This type of match ranks lower than a match at the beginning: if you type "ans", the parser assumes you are more likely to mean "answer" than "translate". But a mid-word match is still better than nothing.
- Match to one of the command's synonyms. A command can define any number of synonyms. For example, "tweet" is a synonym for "twitter", so if the input is "twee" then "twitter" will go in this category. A match to a synonym is not as good as a match to the primary command name.
- Match in the middle or end of a synonym. For example, matching "eet" to "twitter" because it matches the end of "tweet". This type of match is unlikely to be what the user is looking for, so we suggest it only if there are no better matches.
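The four match types above can be sketched as a small classifier. This is a hedged illustration rather than Ubiquity's implementation: the names MatchTier, classify_match, and rank_commands are hypothetical, the command table is made up from the examples in this section, and the sketch assumes the order of the list above is also the ranking order. It also assumes that an empty verb matches nothing, since empty input is handled separately.

```python
from enum import IntEnum
from typing import Optional


class MatchTier(IntEnum):
    """Lower value = better match (assumed to follow the list order above)."""
    NAME_PREFIX = 0      # "trans" -> "translate"
    NAME_INNER = 1       # "ans"   -> "translate"
    SYNONYM_PREFIX = 2   # "twee"  -> "twitter" via synonym "tweet"
    SYNONYM_INNER = 3    # "eet"   -> "twitter" via synonym "tweet"


def classify_match(verb: str, name: str, synonyms: list[str]) -> Optional[MatchTier]:
    """Return the best tier at which verb matches the command, or None."""
    if not verb:
        return None  # assumption: empty input is not a verb match
    if name.startswith(verb):
        return MatchTier.NAME_PREFIX
    if verb in name:
        return MatchTier.NAME_INNER
    if any(s.startswith(verb) for s in synonyms):
        return MatchTier.SYNONYM_PREFIX
    if any(verb in s for s in synonyms):
        return MatchTier.SYNONYM_INNER
    return None


def rank_commands(verb: str, commands: dict[str, list[str]]) -> list[tuple[str, MatchTier]]:
    """Keep only matching commands, best tier first (ties keep input order)."""
    matches = []
    for name, synonyms in commands.items():
        tier = classify_match(verb, name, synonyms)
        if tier is not None:
            matches.append((name, tier))
    return sorted(matches, key=lambda m: m[1])


# Hypothetical command table: command name -> list of synonyms.
COMMANDS = {"translate": [], "answer": [], "twitter": ["tweet"]}

for verb in ("ans", "twee", "eet"):
    print(verb, [(name, tier.name) for name, tier in rank_commands(verb, COMMANDS)])
# ans [('answer', 'NAME_PREFIX'), ('translate', 'NAME_INNER')]
# twee [('twitter', 'SYNONYM_PREFIX')]
# eet [('twitter', 'SYNONYM_INNER')]
```

The sketch only assigns a coarse tier to each match. It does not attempt any finer-grained scoring of match quality or frequency, and it compares strings case-sensitively for simplicity.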
If there is no verb match: Noun-First Suggestions
If the input string is empty
(Noun-first suggestion based on the selection)
Scoring the Quality of the Verb Match
Scoring the Frequency of the Verb Match
Meanwhile: Parsing the Arguments
(Also language-dependent)
Assigning input substrings to command arguments
To create PartiallyParsedSentences
Interpolating magic pronouns like "this" (or not)
Getting suggestions from NounTypes
This can be asynchronous