Finding the Head of a Noun Phrase in NLTK and Stanford Parse According to the Rules for Finding the Head of an NP
Solution 1:
NLTK has a built-in way to turn a bracketed parse string into a Tree object (http://www.nltk.org/_modules/nltk/tree.html); see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.
>>> from nltk.tree import Tree
>>> parsestr = '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...         print(i)
...
(NP
  (NP (DT The) (JJ old) (NN oak) (NN tree))
  (PP (IN from) (NP (NNP India))))
(NP (DT The) (JJ old) (NN oak) (NN tree))
(NP (NNP India))
>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves())
...
['The', 'old', 'oak', 'tree', 'from', 'India']
['The', 'old', 'oak', 'tree']
['India']
Note that the rightmost noun is not always the head noun of an NP, e.g.
>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])])
>>> for i in Tree.fromstring(s).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves()[-1])
...
Magnificent
talk
Arguably, Magnificent can still be the head noun. Another example is when the NP includes a relative clause:
(NP (NP the person) that gave (NP the talk)) went home
The head noun of the subject is person, but the last leaf node of the NP "the person that gave the talk" is talk.
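The relative-clause case can be checked directly. The sketch below assumes a fully tagged bracketing of the same sentence (the tags and the SBAR structure are an assumption, since the example above is only partially bracketed) and compares the last leaf of the subject NP with the last leaf of its first NP child.

```python
from nltk.tree import Tree

# Assumed full bracketing of "the person that gave the talk went home".
parse = (
    "(ROOT (S (NP (NP (DT the) (NN person)) "
    "(SBAR (WHNP (WDT that)) (S (VP (VBD gave) (NP (DT the) (NN talk)))))) "
    "(VP (VBD went) (NP (NN home)))))"
)
tree = Tree.fromstring(parse)

# ROOT -> S -> first child, which is the subject NP.
subject = tree[0][0]

# The "last leaf" heuristic picks a word inside the relative clause.
rightmost_leaf = subject.leaves()[-1]

# The first child of the subject is the inner NP "the person".
inner_np_last = subject[0].leaves()[-1]

print(rightmost_leaf)   # talk
print(inner_np_last)    # person
```

So any heuristic based only on the last leaf has to be restricted to the right sub-constituent before it gives a sensible answer here.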
Solution 2:
I was looking for a Python script using NLTK that does this task and stumbled across this post. Here's the solution I came up with. It's a little bit noisy and arbitrary, and it definitely doesn't always pick the right answer (e.g. for compound nouns). But I wanted to post it in case it's helpful for others to have a solution that mostly works.
#!/usr/bin/env python
from nltk.tree import Tree

examples = [
    '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))',
    "(ROOT\n (S\n (NP\n (NP (DT the) (NN person))\n (SBAR\n (WHNP (WDT that))\n (S\n (VP (VBD gave)\n (NP (DT the) (NN talk))))))\n (VP (VBD went)\n (NP (NN home)))))",
    '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
]

def find_noun_phrases(tree):
    return [subtree for subtree in tree.subtrees(lambda t: t.label() == 'NP')]

def find_head_of_np(np):
    noun_tags = ['NN', 'NNS', 'NNP', 'NNPS']
    ## keep only children that are subtrees (skip bare string leaves)
    top_level_trees = [t for t in np if isinstance(t, Tree)]
    ## search for a top-level noun
    top_level_nouns = [t for t in top_level_trees if t.label() in noun_tags]
    if len(top_level_nouns) > 0:
        ## if you find some, pick the rightmost one, just 'cause
        return top_level_nouns[-1][0]
    else:
        ## search for a top-level np
        top_level_nps = [t for t in top_level_trees if t.label() == 'NP']
        if len(top_level_nps) > 0:
            ## if you find some, pick the head of the rightmost one, just 'cause
            return find_head_of_np(top_level_nps[-1])
        else:
            ## search for any noun
            nouns = [p[0] for p in np.pos() if p[1] in noun_tags]
            if len(nouns) > 0:
                ## if you find some, pick the rightmost one, just 'cause
                return nouns[-1]
            else:
                ## return the rightmost word, just 'cause
                return np.leaves()[-1]

for example in examples:
    tree = Tree.fromstring(example)
    for np in find_noun_phrases(tree):
        print("noun phrase:", " ".join(np.leaves()))
        head = find_head_of_np(np)
        print("head:", head)
For the examples discussed in the question and in the other answers, this is the output:
noun phrase: The old oak tree from India
head: tree
noun phrase: The old oak tree
head: tree
noun phrase: India
head: India
noun phrase: the person that gave the talk
head: person
noun phrase: the person
head: person
noun phrase: the talk
head: talk
noun phrase: home
head: home
noun phrase: Carnac the Magnificent
head: Magnificent
noun phrase: a talk
head: talk
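A less ad hoc variant of the same idea is to drive the search from a priority-ordered rule table, in the spirit of Collins-style head rules. The following is only a sketch: the table NP_HEAD_RULES and the helpers head_child and head_word are hypothetical names, and the labels and their order are an assumption for illustration, not an authoritative copy of any published head table.

```python
from nltk.tree import Tree

# Priority lists loosely modeled on Collins-style NP head rules.
# Assumption: the exact labels and their order are illustrative only.
NP_HEAD_RULES = [
    ("right", {"NN", "NNP", "NNPS", "NNS", "NX", "POS", "JJR"}),
    ("left", {"NP"}),
    ("right", {"$", "ADJP", "PRN"}),
    ("right", {"CD"}),
    ("right", {"JJ", "JJS", "RB", "QP"}),
]

def head_child(node):
    """Pick the head child of `node`, or None if it has no subtree children."""
    children = [c for c in node if isinstance(c, Tree)]
    if not children:
        return None
    for direction, labels in NP_HEAD_RULES:
        # "right" scans from the rightmost child, "left" from the leftmost.
        ordered = reversed(children) if direction == "right" else children
        for child in ordered:
            if child.label() in labels:
                return child
    return children[-1]  # nothing matched: default to the rightmost child

def head_word(np):
    """Follow head children down from `np` until a word is reached."""
    node = np
    while True:
        child = head_child(node)
        if child is None:
            # No subtree children left (e.g. a preterminal): take its last leaf.
            return node.leaves()[-1]
        node = child

np1 = Tree.fromstring(
    "(NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India))))"
)
np2 = Tree.fromstring(
    "(NP (NP (DT the) (NN person)) "
    "(SBAR (WHNP (WDT that)) (S (VP (VBD gave) (NP (DT the) (NN talk))))))"
)
print(head_word(np1))  # tree
print(head_word(np2))  # person
```

Unlike the script above, this sketch prefers the leftmost NP child rather than the rightmost one, and it applies the same table to every node it descends through. Both choices are simplifications, and compound nouns are still resolved to the rightmost noun.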