Replacing NaN With Mean
I would like to replace missing data points with the mean of each column in a text file, using Python. So, my idea was: read each column from the text file, calculate the mean of each column, Repla
Solution 1:
Remember that str.replace does not modify the string in place. It returns a new string, so you have to assign the result back, and the replacement value has to be a string:
a1 = a1.replace("nan", str(y1))
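To see why the assignment matters, here is a minimal sketch using a made-up line of text and a made-up mean value (the names line, unchanged and col_mean are hypothetical, not taken from the question):

```python
# Hypothetical sample data; `line` and `col_mean` are illustrative names.
line = "1 2 nan"
col_mean = 3.5

# str.replace returns a new string; the original is left untouched.
line.replace("nan", str(col_mean))
unchanged = line

# Assigning the result back keeps the replacement.
line = line.replace("nan", str(col_mean))

print(unchanged)  # 1 2 nan
print(line)       # 1 2 3.5
```

Note that calling replace on a whole line substitutes the same value for every "nan" in it, so for per-column means the replacement has to be applied to one column's values at a time.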
Solution 2:
Is the problem that y1 is not a string? If so, convert it with str() before replacing: a1 = a1.replace("nan", str(y1))
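A short sketch of what goes wrong without the conversion, assuming the mean is held as a float (value and col_mean are hypothetical names used only for illustration):

```python
value = "nan"
col_mean = 2.5  # hypothetical column mean, stored as a float

try:
    # str.replace only accepts strings, so a float argument raises TypeError.
    value.replace("nan", col_mean)
    error = None
except TypeError as exc:
    error = exc

# Converting the mean with str() first makes the call valid.
fixed = value.replace("nan", str(col_mean))

print(type(error).__name__)  # TypeError
print(fixed)                 # 2.5
```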
Solution 3:
You could use the masked array filled method:
import numpy as np
filename = '/tmp/data'
with open(filename, 'w') as f:
    f.write('''
1 2 nan
2 nan 3
nan 3 4
nan nan nan
''')
arr = np.genfromtxt(filename)
print(arr)
# [[ 1. 2. nan]
# [ 2. nan 3.]
# [ nan 3. 4.]
# [ nan nan nan]]
mask = np.isnan(arr)
masked_arr = np.ma.masked_array(arr, mask)
means = np.mean(masked_arr, axis=0)
print(means)
# [1.5 2.5 3.5]
With the above setup,
print(masked_arr.filled(means))
yields
[[ 1. 2. 3.5]
[ 2. 2.5 3. ]
[ 1.5 3. 4. ]
[ 1.5 2.5 3.5]]
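As an alternative to masked arrays (not part of the answer above), the same result can probably be obtained with np.nanmean and np.where. This is a minimal sketch that builds the example array directly instead of reading it from a file. It assumes every column has at least one non-NaN value; for a column that is entirely NaN, np.nanmean returns NaN and emits a RuntimeWarning.

```python
import numpy as np

# Same values as the example file above, built in memory.
arr = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 3.0],
    [np.nan, 3.0, 4.0],
    [np.nan, np.nan, np.nan],
])

# Column means computed while ignoring NaN entries.
col_means = np.nanmean(arr, axis=0)

# Where an entry is NaN, take the column mean (broadcast across rows);
# otherwise keep the original value.
filled = np.where(np.isnan(arr), col_means, arr)

print(col_means)
print(filled)
```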
Then, to write the array to a file, you could use np.savetxt:
np.savetxt(filename, masked_arr.filled(means), fmt='%0.2f')
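To check what np.savetxt actually writes, the sketch below saves a small filled array to a temporary file and reads it back. The file name filled.txt and the sample values are hypothetical; a temporary directory is used so that no real data is overwritten.

```python
import os
import tempfile

import numpy as np

# Small hypothetical array standing in for masked_arr.filled(means).
filled = np.array([[1.0, 2.0, 3.5], [2.0, 2.5, 3.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "filled.txt")  # hypothetical file name
    # fmt='%0.2f' writes each value with two decimal places.
    np.savetxt(path, filled, fmt="%0.2f")
    with open(path) as f:
        text = f.read()
    # Read the file back to confirm the round trip.
    reloaded = np.loadtxt(path)

print(text)
```

Because fmt='%0.2f' rounds to two decimals, means with more digits lose precision in the file; a format such as '%.6f' keeps more of them.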