
Replacing Nan With Mean

I would like to replace missing data points with the mean of each column in a text file, using Python. My idea was: read each column from the text file, calculate the mean of each column, Repla

Solution 1:

Remember that str.replace does not modify the string in place. Strings are immutable, so the method returns a new string that you have to assign back:

a1 = a1.replace("nan", str(y1))
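A minimal sketch of why the reassignment matters. The values of a1 and y1 here are hypothetical stand-ins, since the question's own data is not shown in full:

```python
# Hypothetical inputs: a1 is one line of the text file, y1 is a column mean.
a1 = "1 2 nan"
y1 = 3.5

# str.replace returns a new string; the original is left untouched.
a1.replace("nan", str(y1))
unchanged = a1  # still "1 2 nan"

# Assigning the result back keeps the replacement.
a1 = a1.replace("nan", str(y1))
print(a1)  # 1 2 3.5
```

Note that this replaces every "nan" on the line with the same value, so it only suits lines where a single column is missing.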

Solution 2:

Is the problem that y1 is not a string? If so, convert it with str() first: a1.replace("nan", str(y1))

Solution 3:

You could use the masked array filled method:

import numpy as np

filename = '/tmp/data'
with open(filename, 'w') as f:
    f.write('''
1 2 nan
2 nan 3
nan 3 4
nan nan nan
''')

arr = np.genfromtxt(filename)
print(arr)
# [[  1.   2.  nan]
#  [  2.  nan   3.]
#  [ nan   3.   4.]
#  [ nan  nan  nan]]

mask = np.isnan(arr)
masked_arr = np.ma.masked_array(arr, mask)
means = np.mean(masked_arr, axis=0)

print(means)
# [1.5 2.5 3.5]

With the above setup,

print(masked_arr.filled(means))

yields

[[ 1.   2.   3.5]
 [ 2.   2.5  3. ]
 [ 1.5  3.   4. ]
 [ 1.5  2.5  3.5]]
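As an alternative to the masked-array approach above (not part of the original answer), the same fill can be sketched with np.nanmean and np.where. The names col_means and filled are illustrative, and the array is built inline rather than read from a file:

```python
import numpy as np

# Same data as the example file, built inline for a self-contained sketch.
arr = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 3.0],
    [np.nan, 3.0, 4.0],
    [np.nan, np.nan, np.nan],
])

# Column means that ignore NaN entries.
col_means = np.nanmean(arr, axis=0)

# Where a value is NaN, take its column's mean (broadcast across rows);
# otherwise keep the original value.
filled = np.where(np.isnan(arr), col_means, arr)
print(filled)
```

If a column consists entirely of NaN, np.nanmean returns NaN for it and emits a RuntimeWarning, so that case would need separate handling.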

Then, to write the array to a file, you could use np.savetxt:

np.savetxt(filename, masked_arr.filled(means), fmt='%0.2f')
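To check what np.savetxt wrote, the file can be read back. The sketch below assumes a small already-filled array and writes to a hypothetical path in a temporary directory instead of overwriting the input file:

```python
import os
import tempfile

import numpy as np

# A small array with the missing values already filled in (illustrative data).
filled = np.array([[1.0, 2.0, 3.5], [2.0, 2.5, 3.0]])

# Hypothetical output path; a temp directory avoids clobbering the input file.
path = os.path.join(tempfile.mkdtemp(), "filled.txt")

# fmt='%0.2f' writes each value with two decimal places.
np.savetxt(path, filled, fmt="%0.2f")

with open(path) as f:
    text = f.read()
print(text)

# Reading the file back confirms that no NaN values remain.
reloaded = np.loadtxt(path)
```

Because the format string keeps only two decimals, the written values are rounded. A wider format such as '%.6g' may be preferable if precision matters.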
