Labs/Ubiquity/Parser 2: Difference between revisions

add normalizeArgument
No edit summary
(add normalizeArgument)
Line 12: Line 12:
# split words/arguments + case markers
# split words/arguments + case markers
# pick possible verbs
# pick possible verbs
# (pick possible clitics - for the (near) future)
# pick possible clitics
# group into arguments (argument structure parsing)
# group into arguments (argument structure parsing)
# anaphora (magic word) substitution  
# anaphora (magic word) substitution  
# verb suggestion
# suggest normalized arguments
# suggest verbs for parses without one
# noun type detection
# noun type detection
# argument noun suggestion
# replace arguments with their nountype suggestions
# score + rank
# rank


==parser files==
==parser files==
Line 88: Line 89:
Each language has a set of "anaphora" or "magic words", like the English <code>["this", "that", "it", "selection", "him", "her", "them"]</code>. This step will search for any occurrences of these in the parses' arguments and make substituted alternatives, if there is a selection text.
Each language has a set of "anaphora" or "magic words", like the English <code>["this", "that", "it", "selection", "him", "her", "them"]</code>. This step will search for any occurrences of these in the parses' arguments and make substituted alternatives, if there is a selection text.
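
The substitution described above can be sketched roughly as follows. This is only an illustration, not the parser's actual code: the function name <code>substituteAnaphora</code>, the helper <code>escapeRegExp</code>, and the parse shape (<code>{verb, args}</code> with plain-string arguments) are all assumptions made for the example.

```javascript
// Hypothetical sketch: names and the parse shape are assumptions, not the Ubiquity API.
const ANAPHORA = ["this", "that", "it", "selection", "him", "her", "them"];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the original parse plus one alternative for each
// (argument, magic word) pair found, with the word replaced by the selection.
function substituteAnaphora(parse, selection) {
  const results = [parse];
  if (!selection) return results; // no selection text: nothing to substitute

  parse.args.forEach((arg, i) => {
    for (const word of ANAPHORA) {
      const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
      if (pattern.test(arg)) {
        const args = parse.args.slice();
        // Use a function so "$" in the selection is not treated as a replacement pattern.
        args[i] = arg.replace(pattern, () => selection);
        results.push({ ...parse, args });
      }
    }
  });
  return results;
}
```

The original parse is kept alongside the substituted copies, so a later ranking step can still prefer the literal reading when that scores better.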


=step 6: noun type detection=
=step 6: suggest normalized arguments=
 
[http://mitcho.com/blog/projects/solving-another-romantic-problem/ > see blog post on argument normalization] and its use cases
 
For languages with a <code>normalizeArgument()</code> method, this method is applied to each argument. If any normalized alternatives are returned, a copy of the parse is made with that suggestion. Prefixes and suffixes stripped off through argument normalization are put in the <code>inactivePrefix</code> and <code>inactiveSuffix</code> properties of the argument.
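
A minimal sketch of this step, under stated assumptions: only <code>normalizeArgument()</code>, <code>inactivePrefix</code> and <code>inactiveSuffix</code> come from the description above. The return shape of <code>normalizeArgument()</code> (an array of <code>{prefix, newInput, suffix}</code> objects), the <code>exampleLanguage</code> object with its elided-article rule, and the function <code>suggestNormalizedArguments</code> are hypothetical and chosen only to make the example runnable.

```javascript
// Hypothetical language object. The return shape of normalizeArgument()
// is an assumption: a list of {prefix, newInput, suffix} alternatives.
const exampleLanguage = {
  normalizeArgument(input) {
    // Strip an elided article such as "l'" from the front of the argument.
    const match = /^(l')(.+)$/i.exec(input);
    return match ? [{ prefix: match[1], newInput: match[2], suffix: "" }] : [];
  },
};

// Returns the original parse plus one copy per normalized alternative.
function suggestNormalizedArguments(parse, language) {
  const results = [parse];
  if (typeof language.normalizeArgument !== "function") return results;

  parse.args.forEach((arg, i) => {
    for (const alt of language.normalizeArgument(arg.input)) {
      const args = parse.args.slice();
      args[i] = {
        ...arg,
        input: alt.newInput,
        inactivePrefix: alt.prefix, // stripped material is kept, not discarded
        inactiveSuffix: alt.suffix,
      };
      results.push({ ...parse, args });
    }
  });
  return results;
}
```

Keeping the stripped prefix and suffix on the argument means the full original text can still be reconstructed for display.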
 
=step 7: noun type detection=
For each parse, send each argument string to the noun type detector. The noun type detector will cache detection results, so it only checks each string once. This returns a list of possible noun types with their "scores".
For each parse, send each argument string to the noun type detector. The noun type detector will cache detection results, so it only checks each string once. This returns a list of possible noun types with their "scores".
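
The caching behaviour can be illustrated with a small sketch. Everything here is hypothetical: the <code>nounTypes</code> list, its scoring functions, and <code>makeDetector</code> are invented for the example and are not the parser's real noun types or detector.

```javascript
// Hypothetical noun types; each scoring function returns a value between 0 and 1.
const nounTypes = [
  { name: "number", score: (s) => (/^\d+$/.test(s) ? 1 : 0) },
  { name: "arb", score: (s) => (s.length > 0 ? 0.7 : 0) },
];

// Builds a detector that checks each distinct string only once.
function makeDetector(types) {
  const cache = new Map();
  let misses = 0; // how many strings actually had to be checked

  function detect(text) {
    if (!cache.has(text)) {
      misses += 1;
      const found = types
        .map((t) => ({ type: t.name, score: t.score(text) }))
        .filter((r) => r.score > 0); // keep only noun types that accept the string
      cache.set(text, found);
    }
    return cache.get(text);
  }

  detect.misses = () => misses;
  return detect;
}
```

Because many parses share the same argument strings, a cache keyed on the string itself avoids repeating the same detection work for every parse.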


Line 95: Line 102:
  'my calendar' -> [{type: service, score: 1},{type: arb, score: .7}]
  'my calendar' -> [{type: service, score: 1},{type: arb, score: .7}]


=step 7: ranking=
=step 9: replace arguments with nountype suggestions=
 
 
 
=step 10: ranking=


  foreach parse (w/o V)
  foreach parse (w/o V)