Extract the Candidate's Name from a Text File Using Python and NLTK
Solution 1:
import re
import nltk
from nltk.corpus import stopwords
from nltk.corpus import wordnet

stop = stopwords.words('english')  # loaded but not used below

String = 'Ravana was killed in a war'

# Split the text into sentences and tokenize each sentence into words
Sentences = nltk.sent_tokenize(String)
Tokens = []
for Sent in Sentences:
    Tokens.append(nltk.word_tokenize(Sent))

# Tag every token list with parts of speech
Words_List = [nltk.pos_tag(Token) for Token in Tokens]

# Keep only the words whose tag starts with 'NN' (nouns)
Nouns_List = []
for List in Words_List:
    for Word in List:
        if re.match('NN.*', Word[1]):
            Nouns_List.append(Word[0])

# A noun with no WordNet synsets is treated as a candidate name
Names = []
for Nouns in Nouns_List:
    if not wordnet.synsets(Nouns):
        Names.append(Nouns)

print(Names)
Check this code; I am getting 'Ravana' as the output.
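To see why the filter keeps both nouns, here is roughly what the tagger returns for the sample sentence (a minimal sketch; the exact tags can vary with your NLTK version and the tagger model you have downloaded):

import nltk

Tagged = nltk.pos_tag(nltk.word_tokenize('Ravana was killed in a war'))
print(Tagged)
# Typically something like:
# [('Ravana', 'NNP'), ('was', 'VBD'), ('killed', 'VBN'),
#  ('in', 'IN'), ('a', 'DT'), ('war', 'NN')]

Both 'Ravana' (NNP) and 'war' (NN) pass the noun filter; it is the WordNet check afterwards that narrows the result down to 'Ravana'.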
EDIT:
I used a few sentences from my resume to create a text file, and gave it as input to my program. Only the changed portion of the code is shown below:
import io

# Read the resume text from the file (path is relative to the working directory)
File = io.open("Documents\\Temp.txt", 'r', encoding='utf-8')
String = File.read()
File.close()

# Strip slashes, dots, @, %, and digits before tokenizing
String = re.sub(r'[/.@%\d]', '', String)
And it is returning all the names that are not in the WordNet corpus, such as my name, my house name, my place, and my college's name and place.
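For reference, here is how the changed file-reading part fits together with the rest of Solution 1 as one function. This is only a sketch; the extract_names name is mine, and the path is the same placeholder used above:

import io
import re
import nltk
from nltk.corpus import wordnet

def extract_names(path):
    # Read the raw text and strip slashes, dots, @, %, and digits
    with io.open(path, 'r', encoding='utf-8') as f:
        text = re.sub(r'[/.@%\d]', '', f.read())

    # Tokenize, tag, and keep nouns that WordNet does not recognise
    names = []
    for sent in nltk.sent_tokenize(text):
        for word, tag in nltk.pos_tag(nltk.word_tokenize(sent)):
            if tag.startswith('NN') and not wordnet.synsets(word):
                names.append(word)
    return names

print(extract_names("Documents\\Temp.txt"))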
Solution 2:
After parts-of-speech tagging the word list, extract all the words that carry a noun tag using a regular expression:
# Tag the word list and keep only the words whose tag starts with 'NN' (nouns)
Nouns_List = []
for Word in nltk.pos_tag(Words_List):
    if re.match('NN.*', Word[1]):
        Nouns_List.append(Word[0])
For each word in the Nouns_List, check whether it is an English word. This can be done by checking whether synsets are available for that word in wordnet:
from nltk.corpus import wordnet

Names = []
for Nouns in Nouns_List:
    if not wordnet.synsets(Nouns):
        # Not an English word, so treat it as a name
        Names.append(Nouns)
Since Indian names are unlikely to appear as entries in an English dictionary, this can be a possible method for extracting them from a text.
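As a quick illustration of that check, using only the example from Solution 1: a common noun such as 'war' has synsets, while 'Ravana' does not (per the output reported above), which is why it ends up in Names:

from nltk.corpus import wordnet

print(wordnet.synsets('war')[:1])  # Non-empty: an ordinary English word
print(wordnet.synsets('Ravana'))   # Empty list, per the output reported in Solution 1

One caveat: a name that also happens to be an ordinary dictionary word will have synsets and be filtered out, so the heuristic is not foolproof.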