Add Bi-grams To A Pandas Dataframe

I have a list of bi-grams like this:

[['a','b'],['e','f']]

Now I want to add these bigrams to a DataFrame with their frequencies, like this:

   b  f
a  1  0
e  0  1

I tried doing

Solution 1:

import pandas as pd
from collections import Counter

bigrams = [[['a','b'],['e', 'f']], [['a','b'],['e', 'g']]]
pairs = []
for bg in bigrams:
    pairs.append((bg[0][0], bg[0][1]))
    pairs.append((bg[1][0], bg[1][1]))
c = Counter(pairs)

>>> pd.Series(c).unstack()  # optional:  .fillna(0)
     b    f    g
a  2.0  NaN  NaN
e  NaN  1.0  1.0

The loop above is spelled out for clarity. The same counting fits in a single generator expression:

pd.Series(Counter((bg[i][0], bg[i][1]) for bg in bigrams for i in range(2))).unstack()
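The code above assumes each element of `bigrams` holds exactly two bigrams (hence `range(2)`). For the flat list shown in the question, one bigram per element, a minimal sketch is to count `tuple(bg)` directly. Passing `fill_value=0` to `unstack` gives zeros instead of NaN, which matches the table in the question:

```python
from collections import Counter

import pandas as pd

# Flat list of bigrams, as in the question.
bigrams = [['a', 'b'], ['e', 'f']]

# Lists are unhashable, so convert each bigram to a tuple before counting.
counts = Counter(tuple(bg) for bg in bigrams)

# Tuple keys become a two-level MultiIndex; unstack moves the second word
# into the columns, and fill_value=0 fills pairs that never occurred.
df = pd.Series(counts).unstack(fill_value=0)
print(df)
```

Because no NaN is introduced, the counts stay integers and no separate `.fillna(0)` step is needed.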

Solution 2:

You can use Counter from the collections module. Note that I changed the contents of the list to be tuples rather than lists. This is because Counter keys (like dict keys) must be hashable, and lists are not.
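A quick illustration of the hashability point, separate from the answer's own code: counting the inner lists directly raises `TypeError`, while the same pairs converted to tuples count fine. The variable names here are illustrative only.

```python
from collections import Counter

pairs_as_lists = [['a', 'b'], ['e', 'f'], ['a', 'b']]

# Lists are mutable and therefore unhashable, so Counter cannot use them as keys.
list_error = None
try:
    Counter(pairs_as_lists)
except TypeError as exc:
    list_error = exc
    print("lists fail:", exc)

# Tuples are hashable, so converting each pair first works.
counts = Counter(tuple(p) for p in pairs_as_lists)
print(counts)
```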

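import pandas as pd
from collections import Counter

l = [('a','b'),('e', 'f')]
rows, cols = zip(*l)
# Use unique labels so a word that starts (or ends) several bigrams
# does not produce duplicate rows (or columns).
df = pd.DataFrame(0, index=sorted(set(rows)), columns=sorted(set(cols)))
counts = Counter(l)

# Write each count into its (first word, second word) cell.
for (row, col), count in counts.items():
    df.loc[row, col] = count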
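Another option, not taken from the answers above, is `pd.crosstab`, which builds a frequency table from two sequences and fills missing combinations with 0. A sketch, assuming the bigrams are a flat list of pairs as in the question; the `rownames`/`colnames` labels are arbitrary choices:

```python
import pandas as pd

bigrams = [('a', 'b'), ('e', 'f')]

# Split the pairs into a sequence of first words and a sequence of second words.
first, second = zip(*bigrams)

# crosstab counts each (first, second) combination and fills absent ones with 0.
df = pd.crosstab(list(first), list(second), rownames=['first'], colnames=['second'])
print(df)
```

Repeated bigrams are simply counted more than once, so no separate Counter or pre-sized DataFrame is needed.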
