Pandas Split Column Name
Solution 1:
Here is another way. It assumes that each low/high group name ends with the word Low or High, respectively, so that we can use .str.endswith() to identify which rows are Low and which are High.
Here is the sample data:
import numpy as np
import pandas as pd
df = pd.DataFrame('group0Low group0High group1Low group1High routeLow routeHigh landmarkLow landmarkHigh'.split(), columns=['group_level'])
df
    group_level
0     group0Low
1    group0High
2     group1Low
3    group1High
4      routeLow
5     routeHigh
6   landmarkLow
7  landmarkHigh
Using np.where, we can do the following:
df['level'] = np.where(df['group_level'].str.endswith('Low'), 'Low', 'High')
df['group'] = np.where(df['group_level'].str.endswith('Low'), df['group_level'].str[:-3], df['group_level'].str[:-4])
df
    group_level  level     group
0     group0Low    Low    group0
1    group0High   High    group0
2     group1Low    Low    group1
3    group1High   High    group1
4      routeLow    Low     route
5     routeHigh   High     route
6   landmarkLow    Low  landmark
7  landmarkHigh   High  landmark
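Note that np.where labels every row that does not end with Low as High. If some rows might end with neither word, one possible guard is np.select with an explicit fallback. This is only a sketch: the 'unknown' label and the routeMid row are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; 'routeMid' is a hypothetical row matching neither suffix.
df = pd.DataFrame({"group_level": ["group0Low", "group0High", "routeLow", "routeMid"]})

is_low = df["group_level"].str.endswith("Low")
is_high = df["group_level"].str.endswith("High")

# np.select takes the first condition that matches in each row, else the default.
df["level"] = np.select([is_low, is_high], ["Low", "High"], default="unknown")

# Start from the full name, then strip the suffix only where one was found.
df["group"] = df["group_level"]
df.loc[is_low, "group"] = df.loc[is_low, "group_level"].str[:-3]
df.loc[is_high, "group"] = df.loc[is_high, "group_level"].str[:-4]

print(df)
```

Rows that match neither suffix keep their full name in group, so they are easy to spot afterwards.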
Solution 2:
I suppose it depends on how general the strings you're working with are. Assuming the level always starts at the last capital letter in the string, you can do:
In [30]:
s = pd.Series(['routeHigh', 'routeLow', 'landmarkHigh',
'landmarkLow', 'routeMid', 'group0Level'])
s.str.extract(r'([\d\w]*)([A-Z][\w\d]*)')
Out[30]:
          0      1
0     route   High
1     route    Low
2  landmark   High
3  landmark    Low
4     route    Mid
5    group0  Level
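To see why the split lands where it does, here is a small sketch using the standard re module on its own. The first group is greedy, so it grabs as much as it can and then backs off until the second group can begin with a capital letter; in effect the split happens at the last capital. The helper name split_last_capital and the inputs 'groupALow' and 'route' are hypothetical, chosen only to show the edge cases.

```python
import re

# Same pattern as above; the raw string keeps "\d" and "\w" intact for the regex engine.
pattern = re.compile(r"([\d\w]*)([A-Z][\w\d]*)")

def split_last_capital(name):
    """Return (group, level), or None when the name contains no capital letter."""
    match = pattern.search(name)
    return match.groups() if match else None

print(split_last_capital("routeHigh"))  # ('route', 'High')
print(split_last_capital("groupALow"))  # ('groupA', 'Low'): split at the last capital
print(split_last_capital("route"))      # None: no capital letter to split on
```

In pandas, a string with no match produces NaN in the extracted columns rather than None.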
You can even name the columns of the result in the same line by using named groups:
s.str.extract(r'(?P<group>[\d\w]*)(?P<Level>[A-Z][\w\d]*)')
So in your use case you can do:
group_level_df = stacked.group_level.str.extract(r'(?P<group>[\d\w]*)(?P<Level>[A-Z][\w\d]*)')
stacked = pd.concat([stacked, group_level_df], axis=1)
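For a runnable end-to-end version, here is a minimal sketch. The stacked frame below is a hypothetical stand-in (the value column is invented), since the real one is not shown here. It goes through the .str accessor and passes axis=1 to pd.concat so that the extracted columns sit beside the existing ones instead of being appended as extra rows.

```python
import pandas as pd

# Hypothetical stand-in for the `stacked` frame; the 'value' column is made up.
stacked = pd.DataFrame({
    "group_level": ["routeHigh", "routeLow", "landmarkHigh", "group0Level"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Named groups become the column names of the extracted frame.
parts = stacked["group_level"].str.extract(r"(?P<group>[\d\w]*)(?P<Level>[A-Z][\w\d]*)")

# axis=1 places the new columns next to the old ones, aligned on the index.
stacked = pd.concat([stacked, parts], axis=1)
print(stacked)
```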
Here's another approach which assumes only knowledge of the level names in advance. Suppose you have three levels:
lower = stacked.group_level.str.lower()
for level in ['low', 'mid', 'high']:
    rows_in = lower.str.contains(level)
    stacked.loc[rows_in, 'level'] = level.capitalize()
    # case=False so that the lowercase level also strips 'Low', 'Mid' or 'High'
    stacked.loc[rows_in, 'group'] = stacked.group_level[rows_in].str.replace(level, '', case=False, regex=True)
This should work as long as the level doesn't also appear in the group name, e.g. 'highballHigh'. In rows where group_level doesn't contain any of these levels, you would end up with null values in the level and group columns.
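If the level is always a suffix, one way to sidestep the 'highballHigh' problem is to test with endswith and slice the suffix off instead of using contains and replace. This is a variation on the loop above, not the same method, and the sample rows (including 'other', which matches no level) are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'highballHigh' has the level inside the group name too,
# and 'other' matches none of the levels.
stacked = pd.DataFrame({"group_level": ["routeLow", "routeMid", "highballHigh", "other"]})

lower = stacked["group_level"].str.lower()
for level in ["low", "mid", "high"]:
    # Only look at the end of the string, so 'high' inside 'highball' is ignored.
    rows_in = lower.str.endswith(level)
    stacked.loc[rows_in, "level"] = level.capitalize()
    # Drop exactly len(level) trailing characters from the matching rows.
    stacked.loc[rows_in, "group"] = stacked.loc[rows_in, "group_level"].str[: -len(level)]

print(stacked)
```

The 'other' row is left with null values in both new columns, which makes unmatched names easy to filter with isna().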