Labs/Ubiquity/Parser Documentation

   trans hello world to ja


== Early Parsing: Parser plugin to PartiallyParsedSentences ==


=== Language-specific parser plugins ===


To create PartiallyParsedSentences

== Noun-First Parsing ==


=== If there is no verb match: Noun-First Suggestions ===

In the unlikely event that the input and the selection are both empty, the parser returns an empty suggestion list.
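
A minimal sketch of this fallback, under the assumption that noun-first parsing pairs the available text with each candidate verb; the function and field names here are hypothetical, not the actual parser API:

 // Hypothetical sketch of the noun-first fallback described above; the
 // names nounFirstSuggestions, verb, and directObject are assumptions.
 function nounFirstSuggestions(input, selection, verbList) {
   // With neither typed input nor a selection, there is nothing to
   // build suggestions from, so return the empty suggestion list.
   if (!input && !selection)
     return [];
   // Otherwise treat the available text as a noun and pair it with
   // each candidate verb.
   var noun = input || selection;
   return verbList.map(function (verb) {
     return {verb: verb, directObject: noun};
   });
 }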


== Late Parsing: Noun Suggestions, FullyParsedSentences ==


=== Filling missing arguments with the NounType's defaults ===

== Scoring and Sorting Suggestions ==

=== Scoring the Quality of the Verb Match ===

([https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/71b710040206/ubiquity/modules/parser/parser.js#l829 NLParser.Verb.match()] in parser.js)

The current heuristic is extremely ad hoc, but produces the ordering we want... so far.

When attempting to match an input verb to a command name, the parser tries the following types of matches (the whole heuristic is sketched in code below):

#A perfect match between the input word and the verb name earns the maximum score, 1.0.
#If the input word matches the beginning of the verb name (e.g. the input is "trans" and the command is "translate"), it gets a good score, between 0.75 and 1.0.  The greater the percentage of the verb name that is matched, the higher the score within this range.  (All else being equal, this will rank short verb names before longer ones; I'm not sure this is really needed or correct.)
#If the input word matches the verb name, but not at the beginning (e.g. the input is "cal" and the verb is "add-to-calendar"), it gets a score between 0.5 and 0.75, again scaled by the percentage of the verb name that is matched.  A match in the middle is not as good as a match at the beginning, because it's assumed that if the user types "ans" they are more likely to want "answers" than "translate".
#If the input word matches the beginning of a ''synonym'' of the verb name, it gets a score between 0.25 and 0.5.  For example, "tweet" is a synonym for "twitter", so if the input is "twee" then "twitter" will go in this category.  (Note this means that any match to a synonym is ranked below any match to a primary name.  This is to prevent synonyms from colonizing the namespace.)
#If the input word matches a synonym of the verb name, but not at the beginning, it gets a score between 0 and 0.5.  For example, the input "eet" matches "twitter" because it matches the end of the synonym "tweet".  This type of match is considered a real long shot, but better than nothing.

No match at all gets a 0.

The algorithm doesn't currently recognize disjoint or out-of-order matches.  E.g. if the user typed "add cal", they might mean "add-to-calendar", and we might detect that fact if we did disjoint matches; but we don't, so this will get a score of 0.  It might be worth experimenting with matches like this, but how to rank them?
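
Concretely, the heuristic above can be read as a small scoring function. The following is a minimal illustration, not the actual NLParser.Verb.match() implementation; the name scoreVerbMatch and the exact linear interpolation within each range are assumptions.

 // Minimal sketch of the verb-match heuristic described above.  This is
 // NOT the actual NLParser.Verb.match() code; the function name and the
 // linear interpolation within each range are assumptions.
 function scoreVerbMatch(inputWord, verbName, synonyms) {
   // Scale a score within [low, high] by the fraction of the matched
   // name that the input covers, so matching more of a name (or a
   // shorter name) scores higher within the range.
   function scaled(low, high, name) {
     return low + (high - low) * (inputWord.length / name.length);
   }
   if (inputWord === verbName)
     return 1.0;                               // perfect match
   if (verbName.indexOf(inputWord) === 0)
     return scaled(0.75, 1.0, verbName);       // start of the verb name
   if (verbName.indexOf(inputWord) > 0)
     return scaled(0.5, 0.75, verbName);       // inside the verb name
   for (var i = 0; i < synonyms.length; i++)   // start of a synonym
     if (synonyms[i].indexOf(inputWord) === 0)
       return scaled(0.25, 0.5, synonyms[i]);
   for (var j = 0; j < synonyms.length; j++)   // inside a synonym
     if (synonyms[j].indexOf(inputWord) > 0)
       return scaled(0, 0.5, synonyms[j]);
   return 0;                                   // no match at all
 }

Under this sketch, scoreVerbMatch("trans", "translate", []) scores about 0.89, scoreVerbMatch("twee", "twitter", ["tweet"]) scores 0.45, and the disjoint input "add cal" falls through to 0, as noted above.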
=== Scoring the Frequency of the Verb Match ===

([https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/71b710040206/ubiquity/modules/parser/parser.js#l127 NLParser.Parser._sortSuggestionList()] in parser.js)


=== Scoring the Quality of the Noun Matches ===