
How To Make Unique Data From Strings

I have data like this. The strings are separated by commas: 'India1,India2,myIndia ' 'Where,Here,Here ' 'Here,Where,India,uyete' 'AFD,TTT'. What I am trying to do is to put

Solution 1:

stack.txt below contains this:

"India1,India2,myIndia"
"Where,Here,Here"
"Here,Where,India,uyete"
"AFD,TTT"

Here you go:

from collections import OrderedDict

with open("stack.txt", "r") as f:
    # each line is a quoted string: strip() drops the newline, eval() drops the surrounding quotes
    data = [eval(line.strip()) for line in f.readlines()]
    # split every row on commas to get the individual words into one flat list
    individual_elements = [word for row in data for word in row.split(",")]
    # remove duplicates while preserving order
    uniques = OrderedDict.fromkeys(individual_elements)
    # convert from OrderedDict back to a plain list
    final = list(uniques)

print(final)

Which yields this:

['India1', 'India2', 'myIndia', 'Where', 'Here', 'India', 'uyete', 'AFD', 'TTT']
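
As an aside, on Python 3.7 and later a plain dict already preserves insertion order, so the same de-duplication can be written without OrderedDict. A minimal sketch:

individual_elements = ["India1", "India2", "myIndia", "Where", "Here", "Here"]
# dict.fromkeys() keeps only the first occurrence of each word, in order
final = list(dict.fromkeys(individual_elements))
print(final)  # ['India1', 'India2', 'myIndia', 'Where', 'Here']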

Edit: To get your desired output, just print the list in the format you want:

print("\n".join(final))

Which is equivalent, from an output standpoint, to this:

for x in final:
    print(x)

Which yields this:

India1
India2
myIndia
Where
Here
India
uyete
AFD
TTT

Solution 2:

Why use numpy? Also, I'm not sure whether you want to use the same file for input and output.

#!/usr/bin/env python


# give a name to my data 
inputData = """India1,India2,myIndia
Where,Here,Here   
Here,Where,India,uyete
AFD,TTT"""

# if you want to read the data from a file
#inputData = open(fileName, 'r').readlines()

outputData = ""
tempData = list()
for line in inputData.split("\n"):
    lineStripped = line.strip()
    lineSplit = lineStripped.split(',')
    lineElementsStripped = [element.strip() for element in lineSplit]
    tempData.extend(lineElementsStripped)
# set() removes the duplicates (order is not preserved)
tempData = set(tempData)
outputData = "\n".join(tempData)
print("\nInputdata: \n%s" % inputData)
print("\nOutputdata: \n%s" % outputData)

Solution 3:

It sounds like you probably have a csv file. You don't need numpy for that; the included batteries are all you need.

import csv

data = []
with open('test.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        data.extend(row)

You can .extend a list rather than .append to it. It's basically like saying:

for thing in row:
    data.append(thing)

That will still leave the duplicates, though. If you don't care about order you can just make it a set and call .update() instead of extend:

data = set()
with open('test.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        data.update(row)

And now everything is unique. But if you care about order you'll have to filter things down a bit:

unique_data = []
for thing in data:
    if thing not in unique_data:
        unique_data.append(thing)
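
For larger inputs, a common variant (just a sketch, not part of the original answer) keeps a separate set of already-seen items, so each membership test is a constant-time lookup while the list still preserves order:

seen = set()
unique_data = []
for thing in data:
    if thing not in seen:  # O(1) lookup instead of scanning the whole list
        seen.add(thing)
        unique_data.append(thing)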

If your test.txt file contains this text:

"India1,India2,myIndia     "
"Where,Here,Here   "
"Here,Where,India,uyete"
"AFD,TTT"

And not

India1,India2,myIndia     
Where,Here,Here   
Here,Where,India,uyete
AFD,TTT

Then you don't quite have a csv. You can either fix whatever is generating your csv, manually remove the quotes, or just fix them on the fly.

def remove_quotes(file):
    for line in file:
        yield line.strip('"\n')

reader = csv.reader(remove_quotes(f))
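
Putting the pieces together, here is a minimal, self-contained sketch that combines the quote-stripping generator with the set-based reading from earlier (assuming the quoted test.txt shown above):

import csv

def remove_quotes(file):
    # drop the surrounding quotes (and the newline) from each raw line
    for line in file:
        yield line.strip('"\n')

data = set()
with open('test.txt') as f:
    reader = csv.reader(remove_quotes(f))
    for row in reader:
        data.update(row)

print(data)  # unique values (surrounding whitespace, if any, is not stripped here)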
