
Extract The Name Of Candidate From Text File Using Python And Nltk

import re
import spacy
import nltk
from nltk.corpus import stopwords
stop = stopwords.words('english')
from nltk.corpus import wordnet
inputfile = open('inputfile.txt', 'r')
Strin…

Solution 1:

import re
import nltk
from nltk.corpus import wordnet

String = 'Ravana was killed in a war'

# Split the text into sentences, then each sentence into word tokens
Sentences = nltk.sent_tokenize(String)
Tokens = []
for Sent in Sentences:
    Tokens.append(nltk.word_tokenize(Sent))

# POS-tag each sentence: one list of (word, tag) pairs per sentence
Words_List = [nltk.pos_tag(Token) for Token in Tokens]

# Keep nouns: Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS)
Nouns_List = []
for List in Words_List:
    for Word in List:
        if re.match(r'NN', Word[1]):
            Nouns_List.append(Word[0])

# Nouns that have no WordNet synsets are treated as likely names
Names = []
for Nouns in Nouns_List:
    if not wordnet.synsets(Nouns):
        Names.append(Nouns)

print(Names)

Check this code. I am getting Ravana as output.
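A note on the noun filter: a pattern such as '[NN.*]' is a regular-expression character class. It matches a single leading character that is 'N', '.' or '*', not the literal prefix "NN". The check below uses only the standard re module on a hand-picked list of Penn Treebank tags. It shows that the punctuation tag '.' slips through the character class, while matching the prefix 'NN' keeps it out:

```python
import re

# Hand-picked Penn Treebank tags, including the tag nltk uses for a full stop
tags = ['NN', 'NNP', 'NNS', 'VBD', '.', 'IN']

# '[NN.*]' matches ONE character that is 'N', '.' or '*' (all literal inside [...])
loose = [t for t in tags if re.match('[NN.*]', t)]

# re.match anchors at the start, so r'NN' accepts only tags beginning with "NN"
strict = [t for t in tags if re.match(r'NN', t)]

print(loose)   # ['NN', 'NNP', 'NNS', '.']
print(strict)  # ['NN', 'NNP', 'NNS']
```

Because re.match is anchored at the start of the string, r'NN' behaves the same as tag.startswith('NN').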

EDIT:

I used a few sentences from my resume to create a text file, and gave it as input to my program. Only the changed portion of the code is shown below:

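import io

# Read the whole UTF-8 text file into one string
with io.open("Documents\\Temp.txt", 'r', encoding='utf-8') as File:
    String = File.read()

# Strip slashes, periods, @, % and digits before tokenizing
String = re.sub(r'[/.@%\d]', '', String)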

It returns every word that has no entry in the WordNet corpus, such as my name, my house name, my place, and my college's name and place.
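Inside square brackets, '|' and '+' are ordinary characters, not "or" and "one or more". A substitution written as '[/|.|@|%|\d+]' therefore also deletes every '|' and '+'. The sketch below compares that pattern with one that lists only the characters that seem to be the target. The sample line is hypothetical, and the "intended" pattern is an assumption about what the cleanup was meant to remove:

```python
import re

# Hypothetical resume line, used only to illustrate the substitution
sample = "B.Tech | 85% | C++"

# As written: '|' and '+' inside [...] are literal, so they are removed too
as_written = re.sub(r'[/|.|@|%|\d+]', '', sample)

# Assumed intent: strip only slashes, periods, @, % and digits
intended = re.sub(r'[/.@%\d]', '', sample)

print(repr(as_written))
print(repr(intended))
```

With the original pattern, "C++" collapses to "C". Removing periods also removes most sentence boundaries, so nltk.sent_tokenize will find fewer sentences in the cleaned text.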

Solution 2:

From the word list obtained after part-of-speech tagging, extract all the words that have a noun tag, using a regular expression:

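import re
import nltk

# Words_List here is a flat list of word tokens, e.g. from nltk.word_tokenize()
Nouns_List = []

for Word in nltk.pos_tag(Words_List):
    # Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS)
    if re.match(r'NN', Word[1]):
        Nouns_List.append(Word[0])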
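The same noun filter can be written as a small function over (word, tag) pairs, which is the shape nltk.pos_tag returns. This is a minimal sketch. The function name extract_nouns is hypothetical, and the sample is tagged by hand rather than captured from NLTK:

```python
from typing import List, Tuple

def extract_nouns(tagged_tokens: List[Tuple[str, str]]) -> List[str]:
    """Return words whose Penn Treebank tag starts with 'NN' (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged_tokens if tag.startswith('NN')]

# Hand-tagged sample in the (word, tag) format produced by nltk.pos_tag
sample = [('Ravana', 'NNP'), ('was', 'VBD'), ('killed', 'VBN'),
          ('in', 'IN'), ('a', 'DT'), ('war', 'NN'), ('.', '.')]

print(extract_nouns(sample))  # ['Ravana', 'war']
```

In the real pipeline you would pass nltk.pos_tag(Words_List) instead of the hand-written list.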

For each word in Nouns_List, check whether it is an English word. You can do this by checking whether WordNet has any synsets for that word:

from nltk.corpus import wordnet

Names = []
for Nouns in Nouns_List:
    if not wordnet.synsets(Nouns):
        # No WordNet entry, so probably not an English word
        Names.append(Nouns)
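To test the filtering logic without downloading WordNet, the dictionary lookup can be passed in as a function. This is a sketch only. unknown_words, TOY_VOCAB and in_toy_vocab are hypothetical names, and the toy set merely stands in for WordNet. With NLTK you would pass `lambda w: bool(wordnet.synsets(w))` as the lookup instead:

```python
from typing import Callable, Iterable, List

def unknown_words(words: Iterable[str], is_known: Callable[[str], bool]) -> List[str]:
    """Keep the words that the dictionary check does not recognise."""
    return [w for w in words if not is_known(w)]

# Hypothetical stand-in for WordNet: a tiny lowercase vocabulary
TOY_VOCAB = {'war', 'college', 'house', 'engineering'}

def in_toy_vocab(word: str) -> bool:
    # Case-insensitive membership test against the toy vocabulary
    return word.lower() in TOY_VOCAB

nouns = ['Ravana', 'war', 'College', 'Thrissur']
print(unknown_words(nouns, in_toy_vocab))  # ['Ravana', 'Thrissur']
```

The place name "Thrissur" survives the filter just like the person's name. This matches the observation above that the method also returns place and college names.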

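Since most Indian names are not entries in an English dictionary, this is one possible way to extract them from text. However, it also returns any other word that WordNet does not know, such as place and college names, and it misses names that are also English words.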
