Python: Replace One Word In A Sentence With A List Of Words And Put The New Sentences In Another Column In Pandas
Solution 1:
Use:
# STEP 1
df1 = data['sentences'].str.extract(
    r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)")
# STEP 2 (regex=True is needed on pandas >= 2.0, where str.replace is literal by default)
df1['clock'] = df1['clock'].str.replace(
    r'\w+', ','.join(my_list), regex=True).str.split(',')
# STEP 3
data['new_sentences'] = df1.dropna().explode('clock').agg(
    ' '.join, axis=1).groupby(level=0).agg(', '.join)
# STEP 4
data['new_sentences'] = data['new_sentences'].fillna(data['sentences'])
Explanation/Steps:
STEP 1: Use Series.str.extract with the given regex pattern to create a three-column dataframe. The first column holds the part of the sentence before the clock word, the middle column holds the clock word itself (e.g. ten), and the last column holds the part of the sentence after the clock word. Sentences that do not match the pattern produce a row of NaN.
# df1
                  before  clock    after
0      I have a class at    ten  o'clock
1                    NaN    NaN      NaN
2  she goes to school at  eight  o'clock
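To see what the pattern captures without pandas, here is a minimal sketch that applies the same regex to single strings with the standard re module. Series.str.extract takes the groups from the first match in each row, so re.search is used here as a stand-in; the two test sentences come from the example data.

```python
import re

# Same pattern as STEP 1; the named groups become the column names in df1.
pattern = re.compile(
    r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)"
)

for sentence in ["I have a class at ten o'clock", "she is my friend"]:
    match = pattern.search(sentence)
    # A sentence without "o'clock" has no match, which is why row 1 is all NaN.
    print(match.groupdict() if match else None)
```

The lookahead (?=\so'clock) only checks that " o'clock" follows the word, so "o'clock" itself ends up in the after group rather than in clock.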
STEP 2: Use Series.str.replace to replace the token in the clock column with all the items in my_list, joined by commas. Then use Series.str.split to split the replaced string on the delimiter , so that each matched row holds a list of words.
# df1
                  before                    clock    after
0      I have a class at  [two, three, five, ten]  o'clock
1                    NaN                      NaN      NaN
2  she goes to school at  [two, three, five, ten]  o'clock
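The replace-then-split trick in STEP 2 can be traced on a single value. This is only a sketch using plain re.sub in place of Series.str.replace; the token variable is a hypothetical stand-in for one cell of the clock column.

```python
import re

my_list = ["two", "three", "five", "ten"]  # replacement words from the example

token = "eight"  # hypothetical: one cell of the clock column after STEP 1

# \w+ matches the whole token once, so it is swapped for the comma-joined list.
replaced = re.sub(r"\w+", ",".join(my_list), token)
print(replaced)             # two,three,five,ten

# Splitting on the same delimiter turns the string back into a list.
print(replaced.split(","))  # ['two', 'three', 'five', 'ten']
```

Note that this approach assumes none of the items in my_list contains a comma; otherwise the split would cut that item in two.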
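STEP 3: Use DataFrame.explode to explode the dataframe df1 on the clock column, so that each word in the list gets its own row. Then use .agg to join the columns along axis 1, which rebuilds one sentence per row. Finally, group by index level 0 and aggregate again to join the sentences that came from the same original row.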
# data
                             sentences                                      new_sentences
0        I have a class at ten o'clock  I have a class at two o'clock, I have a class ...
1                     she is my friend                                                NaN
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
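The reason groupby(level=0) works in STEP 3 is that explode repeats the original index label for every list item. A small sketch on a toy frame (the values below are illustrative, not the example data) makes that visible:

```python
import pandas as pd

# Toy frame shaped like df1 after STEP 2 (illustrative values only).
df1 = pd.DataFrame({
    "before": ["class at", "school at"],
    "clock": [["two", "ten"], ["five"]],
    "after": ["o'clock", "o'clock"],
})

# One row per list item; the original index label is repeated.
exploded = df1.explode("clock")
print(exploded.index.tolist())  # [0, 0, 1]

# Join the three columns of each row into a sentence.
sentences = exploded.agg(" ".join, axis=1)

# Rows that share an index label came from the same original sentence.
combined = sentences.groupby(level=0).agg(", ".join)
print(combined.tolist())
```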
STEP 4: Finally, use Series.fillna to fill the missing values in the new_sentences column with the corresponding values from the sentences column.
# data
                             sentences                                      new_sentences
0        I have a class at ten o'clock  I have a class at two o'clock, I have a class ...
1                     she is my friend                                   she is my friend
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
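Solution 1 assumes that data and my_list already exist. For anyone who wants to run the four steps as a script, here is a self-contained sketch. The sample inputs are borrowed from Solution 2 and wrapped in a DataFrame, which is an assumption about what the original data looked like.

```python
import pandas as pd

# Assumed inputs: the sample sentences and word list shown in Solution 2.
data = pd.DataFrame({"sentences": [
    "I have a class at ten o'clock",
    "she is my friend",
    "she goes to school at eight o'clock",
]})
my_list = ["two", "three", "five", "ten"]

pattern = r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)"

# STEP 1: split each sentence into before / clock / after (NaN where no match).
df1 = data["sentences"].str.extract(pattern)

# STEP 2: swap the matched word for the comma-joined list, then split into a list.
df1["clock"] = (
    df1["clock"].str.replace(r"\w+", ",".join(my_list), regex=True).str.split(",")
)

# STEP 3: one row per word, rebuild each sentence, regroup by original row.
rebuilt = df1.dropna().explode("clock").agg(" ".join, axis=1)
data["new_sentences"] = rebuilt.groupby(level=0).agg(", ".join)

# STEP 4: rows without "o'clock" keep their original sentence.
data["new_sentences"] = data["new_sentences"].fillna(data["sentences"])

print(data["new_sentences"].tolist())
```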
Solution 2:
Is this in line with what you were expecting?
import re
data = {"sentences": ["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]}
my_list = ['two', 'three', 'five', 'ten']
# Match the word (and the space) that comes right before "o'clock"
regex = re.compile(r"(\w+) (?=o'clock)", re.IGNORECASE)
new = []
for i in data["sentences"]:
    for j in my_list:
        new.append(re.sub(regex, j + ' ', i))
# set() drops the duplicates but does not preserve order
new = list(set(new))
print(*new, sep='\n')
Output (the order may vary, since a set is unordered):
I have a class at two o'clock
I have a class at ten o'clock
she goes to school at two o'clock
she goes to school at five o'clock
I have a class at five o'clock
I have a class at three o'clock
she goes to school at ten o'clock
she goes to school at three o'clock
she is my friend
Or, equivalently, as a list comprehension:
import re
data = {"sentences": ["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]}
my_list = ['two', 'three', 'five', 'ten']
regex = re.compile(r"(\w+) (?=o'clock)", re.IGNORECASE)
x = list(set([re.sub(regex, j + ' ', i) for j in my_list for i in data["sentences"]]))
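Because set() discards both the order and the link back to the source sentence, the flat list above is hard to put into a column next to the originals. One possible variation, sketched below, keeps one joined string per input sentence. The helper name expand and the slightly different regex (a lookahead that leaves the space in place) are my own choices, not part of the original answer.

```python
import re

sentences = [
    "I have a class at ten o'clock",
    "she is my friend",
    "she goes to school at eight o'clock",
]
my_list = ["two", "three", "five", "ten"]

# Match only the word before " o'clock"; the lookahead leaves the rest untouched.
regex = re.compile(r"\w+(?= o'clock)", re.IGNORECASE)


def expand(sentence, words):
    """Hypothetical helper: return one variant of the sentence per word."""
    if not regex.search(sentence):
        return [sentence]  # nothing to replace, keep the sentence as-is
    # A function replacement avoids backslash escapes in the word being interpreted.
    return [regex.sub(lambda m, w=word: w, sentence) for word in words]


# One entry per input sentence, in the original order.
new_sentences = [", ".join(expand(s, my_list)) for s in sentences]
for line in new_sentences:
    print(line)
```

The resulting list has the same length as the input, so it could be assigned directly as a new column of a DataFrame built from the same sentences.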