[[Image:Ubiquity_NLParser_Whiteboard_Diagram.jpg|200px|thumb|right|The parser architecture, as depicted by Jono DiCarlo.]]

= The Epic Journey =

We will now follow the journey of a string through Ubiquity's parser. Suppose the user's input is the string:

 trans hello world to ja
== Part 1: Parser plugin to PartiallyParsedSentences ==

=== Language-specific parser plugins ===
The parser maintains a list of every command installed in Ubiquity. The next step in the parsing process is to compare the input verb against the name of each command, determine which commands are potential matches, rank them by match quality, and eliminate the rest.
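A rough sketch of that step, with hypothetical names throughout (the verbMatchScore() helper stands in for the scoring heuristic described under "Scoring the Quality of the Verb Match" below; none of this is the actual parser.js code):

 // Hypothetical sketch: compare the input verb against every
 // installed command, keep the potential matches, and rank them
 // best-first by match quality.
 function rankVerbMatches(inputVerb, commands) {
   return commands
     .map(function(cmd) {
       return {cmd: cmd, score: verbMatchScore(inputVerb, cmd)};
     })
     .filter(function(match) { return match.score > 0; })
     .sort(function(a, b) { return b.score - a.score; });
 }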
=== Meanwhile: Parsing the Arguments ===

(Also language-dependent)

=== Assigning input substrings to command arguments ===

To create PartiallyParsedSentences
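In the running example, the argument string "hello world to ja" must be split so that "hello world" fills the translate command's direct object and "ja" fills its "to" (goal) argument. A toy sketch of one such assignment, with entirely hypothetical names and structures:

 // Toy sketch, not the real parser.js logic: split the argument
 // string on known prepositions and assign the pieces to the
 // command's named arguments.
 function assignArguments(argString, prepositions) {
   var assignments = {object: argString};
   prepositions.forEach(function(prep) {
     var marker = " " + prep + " ";
     var pos = argString.indexOf(marker);
     if (pos >= 0) {
       assignments.object = argString.slice(0, pos);
       assignments[prep] = argString.slice(pos + marker.length);
     }
   });
   return assignments;
 }
 // assignArguments("hello world to ja", ["to"])
 //   -> {object: "hello world", to: "ja"}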
=== If there is no verb match: Noun-First Suggestions ===

In the unlikely event that the input and the selection are both empty, the parser returns an empty suggestion list.
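The shape of that fallback might look like the following sketch (all names are hypothetical; the real logic lives in parser.js):

 // Hypothetical sketch of the noun-first fallback, used when no
 // installed command name matches the input verb.
 function nounFirstSuggestions(input, selection, verbs) {
   var noun = input || selection;
   if (!noun)
     return [];  // input and selection both empty: no suggestions
   // Treat the text as a noun and suggest every verb with an
   // argument that could plausibly accept it.
   return verbs.filter(function(verb) {
     return verb.canPossiblyAccept(noun);  // hypothetical predicate
   });
 }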
== Scoring and Sorting Suggestions ==

=== Scoring the Quality of the Verb Match ===

The current heuristic is extremely ad hoc, but produces the ordering we want... so far.

When attempting to match an input verb to a command name, the parser tries the following types of matches (a sketch of the whole heuristic follows the list):
#A perfect match between the input word and the verb name earns the maximum score, 1.0.
#If the input word matches the beginning of the verb name (e.g. the input is "trans" and the command is "translate"), it gets a good score, between 0.75 and 1.0. The greater the percentage of the verb name that is matched, the higher the score within this range. (All else being equal, this ranks short verb names before longer ones; I'm not sure this is really needed or correct.)
#If the input word matches the verb name, but not at the beginning (e.g. the input is "cal" and the verb is "add-to-calendar"), it gets a score between 0.5 and 0.75, again scaled by the percentage of the verb name that is matched. A match in the middle is not as good as a match at the beginning, because it's assumed that a user who types "ans" is more likely to want "answers" than "translate".
#If the input word matches the beginning of a ''synonym'' of the verb name, it gets a score between 0.25 and 0.5. For example, "tweet" is a synonym for "twitter", so if the input is "twee" then "twitter" will go in this category. (Note this means that any match to a synonym is ranked below any match to a primary name. This is to prevent synonyms from colonizing the namespace.)
#If the input word matches a synonym of the verb name, but not at the beginning, it gets a score between 0 and 0.5: for example, "eet" matches "twitter" because it matches the end of "tweet". This type of match is considered a real long shot, but better than nothing.

No match at all gets a 0.
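The ranges above translate almost directly into code. The sketch below follows the quoted ranges literally and interprets "percentage of the verb name that is matched" as input.length / name.length; the names (verbMatchScore(), a verb object with name and synonyms fields) are illustrative assumptions, not the actual parser.js internals:

 // Hypothetical sketch of the scoring heuristic, following the
 // ranges quoted above.
 function verbMatchScore(input, verb) {
   function frac(name) { return input.length / name.length; }
   if (input === verb.name)
     return 1.0;                                  // 1. perfect match
   var pos = verb.name.indexOf(input);
   if (pos === 0)
     return 0.75 + 0.25 * frac(verb.name);        // 2. start of the name
   if (pos > 0)
     return 0.5 + 0.25 * frac(verb.name);         // 3. inside the name
   var best = 0;
   (verb.synonyms || []).forEach(function(syn) {
     var p = syn.indexOf(input);
     if (p === 0)                                 // 4. start of a synonym
       best = Math.max(best, 0.25 + 0.25 * frac(syn));
     else if (p > 0)                              // 5. inside a synonym
       best = Math.max(best, 0.5 * frac(syn));
   });
   return best;                                   // 0 if no match at all
 }

With the input "twee" and a "twitter" verb whose synonyms include "tweet", this sketch returns 0.25 + 0.25 × (4/5) = 0.45, squarely in the synonym-prefix band.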
( [https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/71b710040206/ubiquity/modules/parser/parser.js#l127 NLParser.Parser._sortSuggestionList() in parser.js] )
=== Interpolating magic pronouns like "this" (or not) ===