Python/pandas Dataframe Replace 0 With Median Value
I have a Python pandas DataFrame with several columns, and one column has 0 values. I want to replace the 0 values with the median or mean of this column. data is my dataframe arti
Solution 1:
Use the pandas replace method:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4,0,0,0,0], 'b': [2,3,4,6,0,5,3,8]})
df
a b
0 1 2
1 2 3
2 3 4
3 4 6
4 0 0
5 0 5
6 0 3
7 0 8
df['a'] = df['a'].replace(0, df['a'].mean())  # mean() includes the zeros: (1+2+3+4)/8 = 1.25
df
a b
0 1.00 2
1 2.00 3
2 3.00 4
3 4.00 6
4 1.25 0
5 1.25 5
6 1.25 3
7 1.25 8
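The question asks for the median, and the mean above is computed over the zeros as well. A minimal sketch, assuming the zeros are placeholders that should not count toward the statistic, computes the median over the nonzero values only and then uses the same replace call:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 0, 0, 0, 0], 'b': [2, 3, 4, 6, 0, 5, 3, 8]})

# Median of the nonzero entries only: the median of [1, 2, 3, 4] is 2.5
median_a = df.loc[df['a'] != 0, 'a'].median()

# Replace every 0 in column 'a' with that median; column 'b' is untouched
df['a'] = df['a'].replace(0, median_a)
print(df)
```

Excluding the zeros matters here: half of column a is zeros, so including them would pull the median down to 0.5.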
Solution 2:
I think you can use mask and add the parameter skipna=True to mean instead of using dropna (skipna=True is already the default). You also need to change the condition to data.artist_hotness == 0 if you want to replace 0 values, or to data.artist_hotness.isnull() if you want to replace NaN values:
import pandas as pd
import numpy as np
data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan]})
print (data)
artist_hotness
0 0.0
1 1.0
2 5.0
3 NaN
mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0
data['artist_hotness'] = data.artist_hotness.mask(data.artist_hotness == 0, mean_artist_hotness)
print (data)
artist_hotness
0 2.0
1 1.0
2 5.0
3 NaN
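The answer mentions swapping the condition to data.artist_hotness.isnull() to target NaN instead of 0. A short sketch of that variant on the same four-value column, where the mean of the non-missing values 0, 1 and 5 is 2.0:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# mean() skips NaN by default, so this is (0 + 1 + 5) / 3 = 2.0
mean_artist_hotness = data['artist_hotness'].mean()

# mask() replaces values where the condition is True; here only the NaN row
data['artist_hotness'] = data['artist_hotness'].mask(
    data['artist_hotness'].isnull(), mean_artist_hotness
)
print(data)
```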
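Alternatively, use loc, but call it on the DataFrame itself (omit the column before .loc) and pass the row condition and the column name together: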
data.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)
artist_hotness
0 2.0
1 1.0
2 5.0
3 NaN
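Selecting the column first and then passing the column name to loc again does not work, because a Series has only one axis:
data.artist_hotness.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness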
print (data)
IndexingError: (0 True 1 False 2 False 3 False Name: artist_hotness, dtype: bool, 'artist_hotness')
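Another solution is DataFrame.replace with the column specified in a nested dict, so other columns are left unchanged (the output below is from a DataFrame that also has an aa column, like the one in the next example):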
data=data.replace({'artist_hotness': {0: mean_artist_hotness}})
print (data)
aa artist_hotness
0 0.0 2.0
1 1.0 1.0
2 5.0 5.0
3 NaN NaN
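The nested-dict form of DataFrame.replace also lets each column get its own replacement value. A sketch, using made-up values for aa so the two column means differ, that replaces each column's zeros with that column's own mean:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 2, 4, 6]})

# One mean per column, NaN skipped: artist_hotness -> 2.0, aa -> 3.0
means = data.mean()

# One {old: new} mapping per column; columns not listed would be left unchanged
data = data.replace({'artist_hotness': {0: means['artist_hotness']},
                     'aa': {0: means['aa']}})
print(data)
```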
Or, if you need to replace all 0 values in all columns:
import pandas as pd
import numpy as np
data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan], 'aa': [0,1,5,np.nan]})
print (data)
aa artist_hotness
0 0.0 0.0
1 1.0 1.0
2 5.0 5.0
3 NaN NaN
mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0
data = data.replace(0, mean_artist_hotness)
print (data)
aa artist_hotness
0 2.0 2.0
1 1.0 1.0
2 5.0 5.0
3 NaN NaN
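The call above uses the mean of artist_hotness for every column. If each column should instead get its own statistic, one straightforward sketch loops over the columns. It assumes the zeros are placeholders, so each column's median is taken over its nonzero values only; the aa values are made up for illustration:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan], 'aa': [0, 2, 4, 7]})

for col in data.columns:
    # Median over this column's nonzero values (median() also skips NaN)
    col_median = data.loc[data[col] != 0, col].median()
    # Replace only this column's zeros with its own median
    data[col] = data[col].replace(0, col_median)

print(data)
```

A loop keeps each column's logic explicit; for a handful of columns the overhead is negligible.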
If you need to replace NaN in all columns, use DataFrame.fillna (the output below starts again from the original data):
data=data.fillna(mean_artist_hotness)
print (data)
aa artist_hotness
0 0.0 0.0
1 1.0 1.0
2 5.0 5.0
3 2.0 2.0
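DataFrame.fillna also accepts a Series keyed by column name, so each column can be filled with its own mean rather than one shared value. A minimal sketch, with aa values made up so the two means differ:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan], 'aa': [0, 2, 7, np.nan]})

# data.mean() gives one value per column (NaN skipped): artist_hotness 2.0, aa 3.0
data = data.fillna(data.mean())
print(data)
```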
But if you only need it in some columns, use Series.fillna:
data['artist_hotness'] = data.artist_hotness.fillna(mean_artist_hotness)
print (data)
aa artist_hotness
0 0.0 0.0
1 1.0 1.0
2 5.0 5.0
3 NaN 2.0
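Alternatively, passing a dict to DataFrame.fillna restricts the fill to the listed columns without assigning each column back separately. A sketch on the same two-column data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan], 'aa': [0, 1, 5, np.nan]})
mean_artist_hotness = data['artist_hotness'].mean()  # (0 + 1 + 5) / 3 = 2.0

# Only the columns named in the dict are filled; 'aa' keeps its NaN
data = data.fillna({'artist_hotness': mean_artist_hotness})
print(data)
```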
Solution 3:
mean_artist_hotness = data['artist_hotness'].mean()  # compute the mean once, not once per element
data['artist_hotness'] = data['artist_hotness'].map(lambda x: mean_artist_hotness if x == 0 else x)
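The map call above runs a Python lambda once per element. A vectorized sketch of the same replacement uses Series.where, which keeps values where the condition is True and substitutes the other argument elsewhere; the sample data is the small column used in Solution 2:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

mean_value = data['artist_hotness'].mean()  # (0 + 1 + 5) / 3 = 2.0

# Keep every value that is not 0 and put the precomputed mean where it is 0.
# NaN != 0 is True, so the NaN row stays NaN, matching the map version.
data['artist_hotness'] = data['artist_hotness'].where(
    data['artist_hotness'] != 0, mean_value
)
print(data)
```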
Solution 4:
I found these very useful, although mask is really slow for me (not sure why).
I did this instead:
df.loc[(df['artist_hotness'] == 0) | np.isnan(df['artist_hotness']), 'artist_hotness'] = df['artist_hotness'].median()  # parentheses needed: | binds tighter than ==
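In Python, | binds more tightly than ==, so each comparison in a combined pandas condition needs its own parentheses. A small sketch on a hypothetical Series contrasting the two spellings:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 5.0, np.nan])

# Intended: True where the value is 0 or missing
intended = (s == 0) | s.isna()

# Without parentheses this parses as s == (0 | s.isna()), which compares s
# element-wise against a boolean Series instead of combining two conditions
unparenthesized = s == 0 | s.isna()

print(intended.tolist())
print(unparenthesized.tolist())
```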