User:Mitcho/ParserTNG

<h1>Parser: The Next Generation</h1>
==Intro==


===High level overview:===
Starting from the received input:
# (split words/arguments)
# pick possible V's
# (pick possible clitics - for the (near) future)
# group into arguments
# noun type detection
# rank


===each language will have:===
<b>EX:</b> <code>add lunch with Dan tomorrow to my calendar</code>


==step 1: split words/arguments==
Japanese: split on common particles; in the future, get feedback from the user on this.
Chinese: split on common functional verbs and prepositions.
(Maybe split case-marking prefixes/suffixes into individual words here?)
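A minimal sketch of the Japanese particle split, in Python. The particle list (`PARTICLES`) and the function name `split_on_particles` are hypothetical, not taken from Ubiquity. Particles are kept as their own tokens because they mark the roles used later when grouping arguments. Matching raw characters is naive: a word that merely contains a particle character (e.g. にんじん) would also be split, which is one reason user feedback is mentioned above.

```python
import re

# Hypothetical subset of common Japanese case particles (illustrative only).
PARTICLES = ["を", "に", "と", "で", "へ"]

# Capturing group so re.split keeps each particle as its own token.
# Longest first, in case multi-character particles are added later.
_PARTICLE_RE = re.compile(
    "(" + "|".join(re.escape(p) for p in sorted(PARTICLES, key=len, reverse=True)) + ")"
)

def split_on_particles(text):
    """Split `text` on particles, keeping them, and drop empty pieces."""
    return [piece for piece in _PARTICLE_RE.split(text) if piece]

print(split_on_particles("明日ダンとランチをカレンダーに追加"))
# ['明日ダン', 'と', 'ランチ', 'を', 'カレンダー', 'に', '追加']
```

The same approach could serve Chinese by swapping in a list of functional verbs and prepositions.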


==step 2: pick possible V's==
Ubiq will cache a regexp for detection of substrings of verb names. For example: <code>(a|ad|add|add-|...|add-to-calendar|g|go|...google...)</code>
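A sketch of building that cached regexp in Python, assuming verb names are plain strings and that, as in the example, prefixes of each name are what need to be matched. `build_verb_regex` and the two-verb list are hypothetical.

```python
import re

def build_verb_regex(verb_names):
    """Build one alternation matching any prefix of any verb name."""
    prefixes = set()
    for name in verb_names:
        for end in range(1, len(name) + 1):
            prefixes.add(name[:end])
    # Longest alternatives first, so match() prefers the longest prefix.
    ordered = sorted(prefixes, key=lambda p: (-len(p), p))
    return re.compile("|".join(re.escape(p) for p in ordered))

# Compiled once and reused for every keystroke (hypothetical verb list).
VERB_REGEX = build_verb_regex(["add-to-calendar", "google"])
```

Because alternatives are ordered longest-first, `VERB_REGEX.match("add lunch")` yields `"add"`, while `fullmatch` answers whether a whole word could be the start of a verb name.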


<b>EX:</b> <code>('add','lunch with Dan tomorrow to my calendar'), ('','add lunch with Dan tomorrow to my calendar')</code>
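A sketch of how the candidate pairs in this example could be produced. It assumes, hypothetically, that only the first word is tested against verb names (which suits verb-initial English input), and it always keeps the reading in which no verb was typed (`''`). `verb_candidates` is an illustrative name.

```python
def verb_candidates(text, verb_names):
    """Return (V, rest) pairs: the first word as a possible verb, plus a V-less parse."""
    first, _, rest = text.partition(" ")
    candidates = []
    # The first word counts as a verb candidate if it is a prefix of a verb name.
    if first and any(name.startswith(first) for name in verb_names):
        candidates.append((first, rest))
    # Always keep the reading where the whole input is arguments.
    candidates.append(("", text))
    return candidates
```

Called with the example input and the verbs `add-to-calendar` and `google`, this returns exactly the two pairs shown above.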


==step 3: pick possible clitics==
TODO

==step 4: group into arguments==
Find delimiters (see above).


(Note: words that are not incorporated into an oblique argument (a.k.a. "modifier argument") are pushed onto the DO (direct object) list.)
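A sketch of enumerating the possible groupings, assuming a hypothetical delimiter table (`DELIMITERS`, with made-up role names `modifier` and `goal`). Each delimiter either heads an argument of 1..k following words (stopping at the next delimiter) or, with length 0, is not treated as a delimiter at all; every word left outside an oblique argument goes onto the DO list, as the note says.

```python
from itertools import product

# Hypothetical English delimiter -> semantic role table (illustrative only).
DELIMITERS = {"with": "modifier", "to": "goal"}

def group_into_arguments(words, delimiters=DELIMITERS):
    """Return every parse of `words` as {"DO": [...], role: [arg, ...], ...}."""
    positions = [i for i, w in enumerate(words) if w in delimiters]
    bounds = positions[1:] + [len(words)]
    # Possible argument lengths per delimiter (0 = not used as a delimiter).
    choices = [range(0, end - pos) for pos, end in zip(positions, bounds)]
    parses = []
    for lengths in product(*choices):
        covered = set()
        obliques = {}
        for pos, n in zip(positions, lengths):
            if n == 0:
                continue
            role = delimiters[words[pos]]
            obliques.setdefault(role, []).append(" ".join(words[pos + 1:pos + 1 + n]))
            covered.update(range(pos, pos + 1 + n))
        # Words not absorbed into an oblique argument become direct objects.
        dos = [w for i, w in enumerate(words) if i not in covered]
        parses.append({"DO": dos, **obliques})
    return parses
```

For `lunch with Dan tomorrow to my calendar` this yields 9 parses, one of which is `{"DO": ["lunch", "tomorrow"], "modifier": ["Dan"], "goal": ["my calendar"]}`.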


==step 5: noun type detection==
For each parse, send each argument string to the noun type detector. The noun type detector caches its detection results, so it checks each string only once. It returns a list of possible noun types with their "scores".


'my calendar' -> [{type: service, score: 1},{type: arb, score: .7}]
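A sketch of a cached detector using `functools.lru_cache`, so each distinct string is scored only once. The detector functions, `KNOWN_SERVICES`, and the fixed 0.7 score for `arb` (assumed here to be the catch-all arbitrary-text type) are hypothetical values chosen to reproduce the `'my calendar'` example.

```python
from functools import lru_cache

# Hypothetical detectors: each maps an argument string to a score in [0, 1].
KNOWN_SERVICES = ("calendar", "google", "twitter")

def detect_service(text):
    return 1.0 if any(s in text.lower() for s in KNOWN_SERVICES) else 0.0

def detect_arb(text):
    # Catch-all type: accepts any non-empty string at a lower score.
    return 0.7 if text else 0.0

DETECTORS = {"service": detect_service, "arb": detect_arb}

@lru_cache(maxsize=None)
def detect_noun_types(text):
    """Return ((type, score), ...) for nonzero scores, best first.

    A tuple is returned so callers cannot mutate the cached value.
    """
    scored = ((name, fn(text)) for name, fn in DETECTORS.items())
    hits = [(name, score) for name, score in scored if score > 0]
    return tuple(sorted(hits, key=lambda pair: pair[1], reverse=True))
```

`detect_noun_types("my calendar")` returns `(("service", 1.0), ("arb", 0.7))`; a second call with the same string is served from the cache.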


==step 6: ranking==
<code>
foreach parse (w/o V)
   by semantic roles in the parse, find appropriate verbs
   foreach possible verb
     score = \prod_{each semantic role in the verb} score(the content of that argument being the appropriate nountype)
</code>
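A runnable sketch of the ranking pseudocode above. The verb signatures in `VERBS`, the rule that a verb is "appropriate" when it accepts every role the parse uses, and treating a verb role the parse leaves unfilled as a factor of 1 are all assumptions. Arguments are passed as strings (e.g. DO words joined with spaces), and `noun_scores` stands in for the step 5 detector output.

```python
import math

# Hypothetical verb signatures: semantic role -> expected noun type.
VERBS = {
    "add-to-calendar": {"DO": "arb", "goal": "service"},
    "google": {"DO": "arb"},
}

def rank(parses, noun_scores, verbs=VERBS):
    """Score every (verb, parse) pair and return them best first.

    parses: V-less parses, each mapping role -> argument string.
    noun_scores: argument string -> {noun type: score}.
    """
    results = []
    for parse in parses:
        for verb, roles in verbs.items():
            # Appropriate verb (assumed rule): it accepts every role in the parse.
            if not set(parse) <= set(roles):
                continue
            # Product over the verb's roles that the parse fills;
            # a noun type the detector did not report scores 0.
            score = math.prod(
                noun_scores.get(parse[role], {}).get(noun_type, 0.0)
                for role, noun_type in roles.items()
                if role in parse
            )
            results.append((verb, parse, score))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

With the parse `{"DO": "lunch with Dan tomorrow", "goal": "my calendar"}` and the scores from step 5, only `add-to-calendar` qualifies, scoring 0.7 × 1 = 0.7.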

<b>EX:</b>

{V:    null,
score = score * (1-0.5**(#DO-1)) (example algorithm)


<b>EX:</b> score = 1, with 2 direct objects, so
score = 1 * (1-0.5**1) = 1 * 0.5 = 0.5
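The example factor can be checked directly; `penalize_direct_objects` is a hypothetical name. As written, the factor is 0 for a single direct object and approaches 1 as #DO grows.

```python
def penalize_direct_objects(score, num_dos):
    """Apply the example factor: score * (1 - 0.5**(#DO - 1))."""
    return score * (1 - 0.5 ** (num_dos - 1))

# Factor as written, for #DO = 1, 2, 3:
print([penalize_direct_objects(1, n) for n in (1, 2, 3)])
# [0.0, 0.5, 0.75]
```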