(New page: <h1>Parser: The Next Generation</h1> ==Intro== ===High level overview:=== 0. receive input 1. (split words/arguments) 2. pick possible V's 2'. (pick possible clitics - for the (near) fut...) |
No edit summary |
||
| Line 4: | Line 4: | ||
===High level overview:=== | ===High level overview:=== | ||
# (split words/arguments) | |||
# pick possible V's | |||
# (pick possible clitics - for the (near) future) | |||
# group into arguments | |||
# noun type detection | |||
# rank | |||
===each language will have:=== | ===each language will have:=== | ||
| Line 18: | Line 17: | ||
<b>EX:</b> <code>add lunch with Dan tomorrow to my calendar</code> | <b>EX:</b> <code>add lunch with Dan tomorrow to my calendar</code> | ||
==step 1== | ==step 1: split words/arguments== | ||
Japanese: split on common particles... in the future get feedback from user for this | Japanese: split on common particles... in the future get feedback from user for this | ||
Chinese: split on common functional verbs and prepositions | Chinese: split on common functional verbs and prepositions | ||
| Line 24: | Line 23: | ||
(Maybe split case marking prefixes/suffixes into individual words here?) | (Maybe split case marking prefixes/suffixes into individual words here?) | ||
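A minimal sketch of the step 1 idea for Japanese, assuming a small hand-picked particle list (`PARTICLES` and `split_on_particles` are illustrative names, not part of Ubiq). Naive splitting like this will also break words that merely contain a particle character, which is presumably one reason user feedback is planned.

```python
import re

# Hypothetical, deliberately short list of common Japanese particles.
PARTICLES = ["から", "を", "に", "で"]

# Capture the particle so it can be kept as its own token.
PARTICLE_RE = re.compile("(" + "|".join(re.escape(p) for p in PARTICLES) + ")")

def split_on_particles(text):
    """Insert spaces around every particle occurrence, then split on whitespace."""
    return PARTICLE_RE.sub(r" \1 ", text).split()
```

For example, `split_on_particles("これを英語に翻訳")` yields the tokens `これ`, `を`, `英語`, `に`, `翻訳`.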
==step 2== | ==step 2: pick possible V's== | ||
Ubiq will cache a regexp for detection of substrings of verb names. For example: <code>(a|ad|add|add-|...|add-to-calendar|g|go|...google...)</code> | Ubiq will cache a regexp for detection of substrings of verb names. For example: <code>(a|ad|add|add-|...|add-to-calendar|g|go|...google...)</code> | ||
| Line 33: | Line 32: | ||
<b>EX</b>: <code>('add','lunch with Dan tomorrow to my calendar'), ('','add lunch with Dan tomorrow to my calendar')</code> | <b>EX</b>: <code>('add','lunch with Dan tomorrow to my calendar'), ('','add lunch with Dan tomorrow to my calendar')</code> | ||
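The verb-picking step above could be sketched roughly as follows. This assumes a hypothetical verb list (`VERB_NAMES`), checks only the first word of the input, and always keeps the verbless reading; the function names are illustrative, not Ubiq's.

```python
import re

# Hypothetical verb list; the real set would come from the installed commands.
VERB_NAMES = ["add-to-calendar", "google"]

def prefix_pattern(names):
    """Build a regexp that matches any non-empty prefix of any verb name."""
    prefixes = {name[:i] for name in names for i in range(1, len(name) + 1)}
    # Sorted only so the generated pattern is deterministic (and cacheable).
    alternatives = sorted(prefixes)
    return re.compile("(?:" + "|".join(re.escape(p) for p in alternatives) + ")")

# Built once and reused, mirroring the cached regexp described above.
VERB_PREFIX = prefix_pattern(VERB_NAMES)

def verb_candidates(text):
    """Return (verb, rest) pairs: the verb reading if any, then the verbless one."""
    parses = []
    first, _, rest = text.partition(" ")
    if VERB_PREFIX.fullmatch(first):
        parses.append((first, rest))
    parses.append(("", text))
    return parses
```

With the example input, `verb_candidates("add lunch with Dan tomorrow to my calendar")` returns the two pairs shown in the EX line above.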
==step 3== | ==step 3: pick possible clitics== | ||
TODO | |||
==step 4: group into arguments== | |||
Find delimiters (see above). | Find delimiters (see above). | ||
| Line 64: | Line 67: | ||
(Note: for words which are not incorporated into an oblique argument (aka "modifier argument"), they are pushed onto the DO list.) | (Note: for words which are not incorporated into an oblique argument (aka "modifier argument"), they are pushed onto the DO list.) | ||
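As a rough illustration of the grouping and of the DO note above, here is a sketch that produces a single grouping only. It assumes English delimiters `with` and `to` (a hypothetical list) and keys each oblique argument by its delimiter; the real step may well generate several alternative parses.

```python
# Hypothetical English delimiters; each one opens an oblique argument.
DELIMITERS = {"with", "to"}

def group_arguments(text):
    """Group words into a DO list plus one argument per delimiter seen."""
    parse = {"DO": []}
    current = None  # delimiter of the oblique argument being filled, if any
    for word in text.split():
        if word in DELIMITERS:
            current = word
            parse.setdefault(current, [])
        elif current is None:
            # Not inside any oblique argument, so the word goes on the DO list.
            parse["DO"].append(word)
        else:
            parse[current].append(word)
    return {role: " ".join(words) for role, words in parse.items()}
```

For `'lunch with Dan tomorrow to my calendar'` this gives `lunch` as the DO, `Dan tomorrow` under `with`, and `my calendar` under `to`.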
step | ==step 5: noun type detection== | ||
For each parse, send each argument string to the noun type detector. The noun type detector will cache detection results, so it only checks each string once. This returns a list of possible noun types with their "scores". | For each parse, send each argument string to the noun type detector. The noun type detector will cache detection results, so it only checks each string once. This returns a list of possible noun types with their "scores". | ||
| Line 71: | Line 74: | ||
'my calendar' -> [{type: service, score: 1},{type: arb, score: .7}] | 'my calendar' -> [{type: service, score: 1},{type: arb, score: .7}] | ||
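The caching behaviour described above might look like the sketch below. The two detectors and their scores are hypothetical stand-ins chosen to reproduce the `'my calendar'` example; `CALLS` exists only to show that a repeated string is not re-checked.

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation only, to make the cache visible

# Hypothetical detectors: each maps an argument string to a score in [0, 1].
def detect_service(text):
    return 1.0 if text == "my calendar" else 0.0

def detect_arb(text):
    return 0.7  # arbitrary text: matches anything, at a lower score

DETECTORS = {"service": detect_service, "arb": detect_arb}

@lru_cache(maxsize=None)
def detect_noun_types(text):
    """Return (type, score) pairs with non-zero score, best first."""
    CALLS["count"] += 1
    results = [(name, detect(text)) for name, detect in DETECTORS.items()]
    results.sort(key=lambda pair: -pair[1])
    return tuple(pair for pair in results if pair[1] > 0)
```

Calling `detect_noun_types("my calendar")` twice runs the detectors once; the second call is served from the cache.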
step | ==step 6: ranking== | ||
<code> | |||
foreach parse (w/o V) | foreach parse (w/o V) | ||
by semantic roles in the parse, find appropriate verbs | by semantic roles in the parse, find appropriate verbs | ||
foreach possible verb | foreach possible verb | ||
score = \prod_{each semantic role in the verb} score(the content of that argument being the appropriate nountype) | score = \prod_{each semantic role in the verb} score(the content of that argument being the appropriate nountype) | ||
</code> | |||
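The pseudocode above can be made concrete along these lines. The verb table, role names, and the handling of edge cases are assumptions for illustration: a verb counts as "appropriate" when it accepts every role present in the parse, and a role the verb has but the parse lacks is skipped rather than scored.

```python
from math import prod

# Hypothetical verb table: semantic role -> noun type the verb expects there.
VERBS = {
    "add-to-calendar": {"object": "arb", "goal": "service"},
    "google": {"object": "arb"},
}

def score_verbs(parse_args, verbs=VERBS):
    """parse_args maps role -> {noun_type: score} for one verbless parse."""
    scored = {}
    for verb, roles in verbs.items():
        # Assumption: skip verbs that cannot take a role present in the parse.
        if set(parse_args) - set(roles):
            continue
        # Product over the verb's roles of the score for the expected noun type.
        scored[verb] = prod(
            parse_args[role].get(noun_type, 0.0)
            for role, noun_type in roles.items()
            if role in parse_args  # assumption: missing arguments are skipped
        )
    return scored
```

For a parse with `object` scored `{arb: 0.7}` and `goal` scored `{service: 1, arb: 0.7}`, only `add-to-calendar` qualifies, with score 0.7 × 1 = 0.7.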
EX: | <b>EX:</b> | ||
{V: null, | {V: null, | ||
| Line 103: | Line 109: | ||
score = score * (1-0.5**(#DO-1)) (example algorithm) | score = score * (1-0.5**(#DO-1)) (example algorithm) | ||
EX: score = 1, with 2 direct objects, so | <b>EX:</b> score = 1, with 2 direct objects, so | ||
score = 1 * (1-0.5**1) = 1 * 0.5 = 0.5 | score = 1 * (1-0.5**1) = 1 * 0.5 = 0.5 | ||
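The example adjustment can be written out directly; the function name is illustrative and the formula is applied exactly as stated above.

```python
def adjust_for_direct_objects(score, num_direct_objects):
    """Apply the example factor (1 - 0.5 ** (#DO - 1)) exactly as written."""
    return score * (1 - 0.5 ** (num_direct_objects - 1))
```

`adjust_for_direct_objects(1, 2)` gives 0.5, matching the EX line. Taken literally, the factor is 0 when there is exactly one direct object and grows toward 1 as more are added, so a real implementation would presumably restrict or revise it.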