Is Wordnet Path Similarity Commutative?
Solution 1:
Technically, without the dummy root, the car and automobile synsets would have no link to each other:
>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> print(x.shortest_path_distance(y))
None
>>> print(y.shortest_path_distance(x))
None
Now, let's look at the dummy root issue closely. Firstly, there is a neat function in NLTK that says whether a synset needs a dummy root:
>>> x._needs_root()
False
>>> y._needs_root()
True
Next, when you look at the path_similarity code (http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity), you can see:
def path_similarity(self, other, verbose=False, simulate_root=True):
    distance = self.shortest_path_distance(other,
        simulate_root=simulate_root and self._needs_root())
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)
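To make the last two lines of that function concrete, here is a minimal sketch of the distance-to-similarity mapping on its own. The helper name similarity_from_distance is hypothetical and is not part of NLTK; it only restates the arithmetic in the quoted code.

```python
def similarity_from_distance(distance):
    """Map a path distance to a similarity, as in the quoted return logic."""
    if distance is None or distance < 0:
        return None  # no path found, so no similarity can be computed
    return 1.0 / (distance + 1)


for d in (0, 1, 4, None):
    print(d, similarity_from_distance(d))
# 0 1.0
# 1 0.5
# 4 0.2
# None None
```

A distance of 0 (the same synset) gives 1.0, and a missing path gives None, so everything hinges on what shortest_path_distance() returns.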
So for the automobile synset, the argument simulate_root=simulate_root and self._needs_root() evaluates to True when you call y.path_similarity(x), and to False when you call x.path_similarity(y), since x._needs_root() is False:
>>> True and y._needs_root()
True
>>> True and x._needs_root()
False
Now, when path_similarity() passes down to shortest_path_distance() (https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance) and then to hypernym_distances(), it asks for the list of hypernyms and their distances. Without simulate_root=True, the automobile synset will not connect to the car synset, and vice versa:
>>> y.hypernym_distances(simulate_root=True)
set([(Synset('automobile.v.01'), 0), (Synset('*ROOT*'), 2), (Synset('travel.v.01'), 1)])
>>> y.hypernym_distances()
set([(Synset('automobile.v.01'), 0), (Synset('travel.v.01'), 1)])
>>> x.hypernym_distances()
set([(Synset('object.n.01'), 8), (Synset('self-propelled_vehicle.n.01'), 2), (Synset('whole.n.02'), 8), (Synset('artifact.n.01'), 7), (Synset('physical_entity.n.01'), 10), (Synset('entity.n.01'), 11), (Synset('object.n.01'), 9), (Synset('instrumentality.n.03'), 5), (Synset('motor_vehicle.n.01'), 1), (Synset('vehicle.n.01'), 4), (Synset('entity.n.01'), 10), (Synset('physical_entity.n.01'), 9), (Synset('whole.n.02'), 7), (Synset('conveyance.n.03'), 5), (Synset('wheeled_vehicle.n.01'), 3), (Synset('artifact.n.01'), 6), (Synset('car.n.01'), 0), (Synset('container.n.01'), 4), (Synset('instrumentality.n.03'), 6)])
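The asymmetry can be reproduced without WordNet at all. The following is a minimal sketch on a hypothetical five-node graph: HYPERNYMS, needs_root, and the node names are made up for illustration, and the functions only imitate the logic quoted above rather than reproduce NLTK's implementation. In particular, placing *ROOT* one step beyond the farthest hypernym is an assumption of this sketch.

```python
# Toy hypernym graph (hypothetical, far smaller than WordNet).
HYPERNYMS = {
    "car.n": ["vehicle.n"],
    "vehicle.n": ["entity.n"],
    "entity.n": [],
    "automobile.v": ["travel.v"],
    "travel.v": [],
}
ROOT = "*ROOT*"


def needs_root(synset):
    """Assumption for this sketch: only verbs lack a shared root."""
    return synset.endswith(".v")


def hypernym_distances(synset, simulate_root=False):
    """Map the synset and each of its ancestors to its shortest distance."""
    distances = {synset: 0}
    frontier = [synset]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in HYPERNYMS[node]:
                if parent not in distances:
                    distances[parent] = distances[node] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    if simulate_root:
        # Assumption: the dummy root sits one step past the farthest ancestor.
        distances[ROOT] = max(distances.values()) + 1
    return distances


def shortest_path_distance(a, b, simulate_root=False):
    """Shortest distance through any common ancestor, or None if there is none."""
    dist_a = hypernym_distances(a, simulate_root)
    dist_b = hypernym_distances(b, simulate_root)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    return min(dist_a[s] + dist_b[s] for s in common)


def path_similarity(a, b, simulate_root=True):
    # As in the quoted code, only the calling synset `a` is consulted.
    distance = shortest_path_distance(
        a, b, simulate_root=simulate_root and needs_root(a)
    )
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)


print(path_similarity("car.n", "automobile.v"))  # None
print(path_similarity("automobile.v", "car.n"))  # 1/6, about 0.167
```

Only the verb asks for a simulated root, so only the call that starts from the verb finds a common node.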
So theoretically, the correct path_similarity between the two is 0 or None in both directions, but because of the simulate_root=simulate_root and self._needs_root() argument, nltk.corpus.wordnet.path_similarity() in NLTK's API is not commutative.
That said, the code is not actually wrong or buggy: any path that goes through the dummy *ROOT* will always be long, because the position of *ROOT* never changes. So the best practice is to compute path_similarity in both directions and combine the results:
>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
# When you ALWAYS want a non-zero value (never None), since going through
# the *ROOT* will always give you some distance from synset x to synset y.
# (Relies on Python 2, where None sorts below every number; on Python 3,
# max() and min() raise TypeError if either argument is None.)
>>> max(wn.path_similarity(x,y), wn.path_similarity(y,x))
# When you can allow None in the synset similarity comparison
>>> min(wn.path_similarity(x,y), wn.path_similarity(y,x))
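On Python 3, comparing None with a float raises TypeError, so the None case has to be handled explicitly. Here is a minimal sketch of two None-aware helpers. The names symmetric_max and symmetric_min are hypothetical, and the numbers passed in are placeholders standing in for the two path_similarity results, not real NLTK output.

```python
def symmetric_max(sim_xy, sim_yx):
    """Return the larger similarity, ignoring None; None only if both are None."""
    values = [s for s in (sim_xy, sim_yx) if s is not None]
    return max(values) if values else None


def symmetric_min(sim_xy, sim_yx):
    """Return None if either direction is None, otherwise the smaller value."""
    if sim_xy is None or sim_yx is None:
        return None
    return min(sim_xy, sim_yx)


# Placeholder inputs: one direction found no path, the other did.
print(symmetric_max(None, 0.25))  # 0.25
print(symmetric_min(None, 0.25))  # None
```

In real use the two arguments would be wn.path_similarity(x, y) and wn.path_similarity(y, x).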
Solution 2:
I don't think it is a bug in WordNet per se. In your case, automobile is specified as a verb and car as a noun, so you will need to look through the synsets to see what the graph looks like and decide whether the nets are labeled correctly.
from nltk.corpus import wordnet as wn
A = 'car.n.01'
B = 'automobile.v.01'
C = 'automobile.n.01'
wn.synset(A).path_similarity(wn.synset(B))
wn.synset(B).path_similarity(wn.synset(A))
wn.synset(A).path_similarity(wn.synset(C)) # is 1
wn.synset(C).path_similarity(wn.synset(A)) # is also 1
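Since the mismatch here comes from comparing a noun with a verb, it can help to check the part of speech before comparing. This is a minimal sketch that only parses the 'lemma.pos.nn' names used above. parse_synset_name and same_pos are hypothetical helpers, not NLTK functions, and they do not consult WordNet.

```python
def parse_synset_name(name):
    """Split a 'lemma.pos.nn' name into (lemma, pos, sense_number)."""
    # rsplit keeps any dots inside the lemma itself intact.
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)


def same_pos(name_a, name_b):
    """True if both synset names carry the same part-of-speech tag."""
    return parse_synset_name(name_a)[1] == parse_synset_name(name_b)[1]


A = 'car.n.01'
B = 'automobile.v.01'
C = 'automobile.n.01'
print(same_pos(A, B))  # False
print(same_pos(A, C))  # True
```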