Pandas Melt 2 Groups Of Columns
I have a table as id name sec1 sec2 sec3 sec4 sec5 sec6 1 abc mp(6) up(7) sp(8) cp(7) 2 4 and i want output as id name tests values slots 1 abc
Solution 1:
First we start with set_index
and stack()
.
We then extract the test value and place it with its own column and reset the index:
s = df.set_index(['id','name'],append=True).stack().to_frame('tests')
s['values'] = s['tests'].str.extract('(\d+)')[0]
df1 = s.dropna(subset=['values']).reset_index([1,2,3])
df1['tests'] = df1['tests'].str.replace('\(.*\)','')
print(df1)
id name level_3 tests values
0 1 abc sec1 mp 6
0 1 abc sec2 up 7
0 1 abc sec3 sp 8
0 1 abc sec4 cp 7
Okay, now we need to something very similar to a cross join in SQL, that is repeat the dataframe according the len
of the slots and add each value accordingly.
Note, this is making a product of your dataframe, a better way would be to use merge but you've not made your input clear, as in how multiple rows will appear.
Let's try this with pd.concat
making use of the keys
argument:
vals = s.loc[s['values'].isna(),'tests'].values
final = pd.concat([df1 ]* len(vals),0,keys=vals)\
.reset_index(0).rename(columns={'level_0' : 'slots'})
print(final[['id','name','tests','values','slots']])
id name tests values slots
0 1 abc mp 6 2
0 1 abc up 7 2
0 1 abc sp 8 2
0 1 abc cp 7 2
0 1 abc mp 6 4
0 1 abc up 7 4
0 1 abc sp 8 4
0 1 abc cp 7 4
Avoiding a total product using merge
:
vals = s.loc[s['values'].isna(), 'tests'].reset_index([2,3],drop=True)\
.to_frame('slots').reset_index(1)
print(pd.merge(df1,vals,on=['id']))
id name level_3 tests values slots
0 1 abc sec1 mp 6 2
1 1 abc sec1 mp 6 4
2 1 abc sec2 up 7 2
3 1 abc sec2 up 7 4
4 1 abc sec3 sp 8 2
5 1 abc sec3 sp 8 4
6 1 abc sec4 cp 7 2
7 1 abc sec4 cp 7 4
Solution 2:
One way about it is to melt
twice, then extract the text and numbers into separate columns:
(
df.melt(["id", "name", "sec5", "sec6"], var_name="var1", value_name="tests")
.melt(["id", "name", "var1", "tests"], value_name="slots")
.assign(values=lambda df: df.tests.str.extract(r"(\d)"),
tests=lambda df: df.tests.str.extract(r"([a-z]+)"))
.loc[:, ["id", "name", "tests", "values", "slots"]]
)
id name tests values slots
01 abc mp 6211 abc up 7221 abc sp 8231 abc cp 7241 abc mp 6451 abc up 7461 abc sp 8471 abc cp 74
Post a Comment for "Pandas Melt 2 Groups Of Columns"