Labs/Ubiquity/Parser Documentation: Difference between revisions

Revision as of 01:15, 31 January 2009

328 bytes removed, 31 January 2009

no edit summary

Jdicarlo

@@ Line 3: / Line 3: @@
 [[Image:Ubiquity_NLParser_Whiteboard_Diagram.jpg|200px|thumb|right|The parser architecture, as depicted by Jono DiCarlo.]]
-== The Epic Journey ==
+= The Epic Journey =
 We will now follow the journey of a string through Ubiquity's parser.
@@ Line 10: / Line 10: @@
    trans hello world to ja
+== Part 1: Parser plugin to PartiallyParsedSentences ==
 === Language-specific parser plugins ===
@@ Line 37: / Line 39: @@
 The parser maintains a list of every command that's installed in Ubiquity, and the next step in the parsing process is to take the input verb and compare it to the name of each command to figure out which ones are potential matches, rank them by match quality, and eliminate the rest.
-When attempting to match an input verb to a command name, the parser tries the following types of matches:
-# Direct match of input verb to the beginning of the command name, e.g. matching "trans" to "translate".  This is considered the best type of match.
+=== Meanwhile: Parsing the Arguments ===
-# Match in the middle or end of the command name, e.g. matching "ans" to "translate".  This type of match is considered not as good: that is, if you type "ans" then the parser assumes you are more likely to mean "answer" than "translate".  But a mid-word match is still better than nothing.
-# Match to one of the command's synonyms.  A command can define any number of synonyms.  For example, "tweet" is a synonym for "twitter", so if the input is "twee" then "twitter" will go in this category. A match to a synonym is not as good as a match to the primary command name.
+(Also language-dependent)
-# Match in the middle or end of a synonym.  For example, matching "eet" to "twitter" because it matches the end of "tweet".  This type of match is unlikely to be what the user is looking for, and so we try only to suggest it if there are no better matches.
+=== Assigning input substrings to command arguments ===
+To create PartiallyParsedSentences
 === If there is no verb match: Noun-First Suggestions ===
@@ Line 68: / Line 72: @@
 In the unlikely event that the input and the selection are both empty, then the parser returns an empty suggestion list.
+== Scoring and Sorting Suggestions ==
 === Scoring the Quality of the Verb Match ===
@@ Line 75: / Line 81: @@
 The current heuristic is extremely ad-hoc, but produces the ordering we want... so far.
-A perfect match between the input word and the verb name earns the maximum score, 1.0.
+When attempting to match an input verb to a command name, the parser tries the following types of matches:
-If the input word matches the beginning of the verb name, it gets a good score, between 0.75 and 1.0.  The more percentage of the verb name that is matched, the higher the score is within this range.  (All else being equal, this will rank short verb names before longer ones; I'm not sure this is really needed or correct.)
+#A perfect match between the input word and the verb name earns the maximum score, 1.0.
+#If the input word matches the beginning of the verb name (e.g. input is "trans", command is "translate"), it gets a good score, between 0.75 and 1.0.  The larger the fraction of the verb name that is matched, the higher the score within this range.  (All else being equal, this ranks short verb names before longer ones; I'm not sure this is really needed or correct.)
-If the input word matches the verb name, but not at the beginning, (e.g.:  input is "cal", verb is "add-to-calendar") then it gets a score between 0.5 and 0.75, again scaled by the percentage of the verb name that is matched.
+#If the input word matches the verb name, but not at the beginning (e.g. input is "cal", verb is "add-to-calendar"), it gets a score between 0.5 and 0.75, again scaled by the fraction of the verb name that is matched.  A match in the middle is not as good as a match at the beginning, because it's assumed that a user who types "ans" is more likely to want "answers" than "translate".
+#If the input word matches the beginning of a ''synonym'' of the verb name, it gets a score between 0.25 and 0.5.  For example, "tweet" is a synonym for "twitter", so if the input is "twee" then "twitter" will go in this category.  (Note this means that any match to a synonym is ranked below any match to a primary name.  This is to prevent synonyms from colonizing the namespace.)
-If the input word matches the beginning of a ''synonym'' of the verb name, it gets a score between 0.25 and 0.5.  (Note this means that any match to a synonym is ranked below any match to a primary name.)
+#If the input word matches a synonym of the verb name, but not at the beginning, it gets a score between 0 and 0.25.  For example, "eet" matches "twitter" because it matches the end of the synonym "tweet".  This type of match is considered a real long shot, but better than nothing.
-If the input word matches a synonym of the verb name, but not at the beginning, it gets a score between 0 and 0.5.
 No match at all gets a 0.
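
The tiers above can be sketched roughly as follows. This is an illustrative sketch only, not the code in parser.js: the function names <code>scoreVerbMatch</code> and <code>rankVerbs</code> and the <code>{ name, synonyms }</code> verb shape are hypothetical, matching is assumed to be case-insensitive, and the lowest tier is assumed to top out at 0.25 so that it never overlaps the tier above it.

```javascript
// Hypothetical sketch of the tiered verb-match heuristic described above.
// Assumed verb shape (not from parser.js): { name: string, synonyms: string[] }.
function scoreVerbMatch(input, verb) {
  const word = input.toLowerCase();
  if (word.length === 0) return 0;

  const name = verb.name.toLowerCase();
  if (word === name) return 1.0; // perfect match

  // Fraction of the verb name covered by the input word.
  const fraction = word.length / name.length;
  const index = name.indexOf(word);
  if (index === 0) return 0.75 + 0.25 * fraction; // start of name
  if (index > 0) return 0.5 + 0.25 * fraction;    // middle or end of name

  // Synonym matches always rank below any match on the primary name.
  let best = 0;
  for (const synonym of verb.synonyms || []) {
    const syn = synonym.toLowerCase();
    const synIndex = syn.indexOf(word);
    if (synIndex < 0) continue;
    const synFraction = word.length / syn.length;
    const score = synIndex === 0
      ? 0.25 + 0.25 * synFraction // start of synonym
      : 0.25 * synFraction;       // middle or end of synonym (assumed cap: 0.25)
    best = Math.max(best, score);
  }
  return best; // 0 when nothing matched
}

// Score every installed verb, drop non-matches, and sort best-first.
function rankVerbs(input, verbs) {
  return verbs
    .map((verb) => ({ verb, score: scoreVerbMatch(input, verb) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score);
}
```

With verbs named "answers" and "translate", <code>rankVerbs("ans", verbs)</code> would list "answers" first, since a match at the start of a name lands in a higher tier than a mid-word match.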
@@ Line 93: / Line 97: @@
 ( [https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/71b710040206/ubiquity/modules/parser/parser.js#l127 NLParser.Parser._sortSuggestionList() in parser.js] )
-=== Meanwhile: Parsing the Arguments ===
-(Also language-dependent)
-=== Assigning input substrings to command arguments ===
-To create PartiallyParsedSentences
 === Interpolating magic pronouns like "this" (or not) ===