Python: Pandas np.where vs. df.loc With Multiple Conditions

np.where has been giving me a lot of errors, so I am looking for a solution with df.loc instead. This is the np.where error I have been getting: C:\Users\xxx\AppData\Local\Continuu

Solution 1:

I think your boolean values are not strings, so you need to remove the quotes (') around True:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                   'checked': ['0','0','1','0'],
                   'duplicate': [True, True, False, False]})

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0
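
To see why the quotes matter, here is a minimal sketch that compares the boolean column against the string 'True' and against the boolean True. It reuses the sample frame from this answer; the names mask_str and mask_bool are only illustrative, and the all-False result is what I would expect on recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# 'duplicate' has dtype bool; 'checked' holds strings
# (object or string dtype, depending on the pandas version)
print(df.dtypes)

# Comparing booleans against the string 'True' matches no rows
mask_str = df['duplicate'] == 'True'

# Comparing against the boolean True matches rows 0 and 1
mask_bool = df['duplicate'] == True

print(mask_str.tolist())   # [False, False, False, False]
print(mask_bool.tolist())  # [True, True, False, False]
```

Printing df.dtypes first is a quick way to check which kind of literal each column needs.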

Or, when comparing against a boolean column, == True can be omitted:

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate']), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0
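
As a quick check that dropping == True changes nothing, the sketch below builds both masks and compares them. The variable names explicit and implicit are only for illustration. It also keeps each comparison in parentheses, because & binds more tightly than == in Python.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# A boolean column is already a valid mask, so "== True" is redundant
explicit = (df['checked'] == '0') & (df['duplicate'] == True)
implicit = (df['checked'] == '0') & df['duplicate']

# Parentheses around each comparison are required: without them,
# Python would evaluate '0' & df['duplicate'] before the == comparison
print(explicit.equals(implicit))  # True
print(implicit.tolist())          # [True, True, False, False]
```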

Also, to test the checked column you need quotes ('), because its values are strings:

df['flag'] = np.where((df['checked'] == '0') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0
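
If you are not sure whether checked holds strings or numbers, you can inspect the Python type of its values. The sketch below does that and also shows one possible alternative, converting the column with astype(int) so it can be compared against the number 0. The names value_types and numeric_mask are hypothetical, and the conversion assumes every value in the column is a digit string.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# Each value in 'checked' is a Python str, not an int
value_types = {type(v) for v in df['checked']}
print(value_types)  # {<class 'str'>}

# In plain Python a string never equals an integer
print('0' == 0)  # False

# Either compare against the string '0' ...
string_mask = df['checked'] == '0'

# ... or convert the column to integers first (assumes digit strings only)
numeric_mask = df['checked'].astype(int) == 0

print(string_mask.tolist())   # [True, True, False, True]
print(numeric_mask.tolist())  # [True, True, False, True]
```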

EDIT:

Solution with loc:

df['flag'] = '0'
mask = (df['checked'] == '0') &(df['duplicate'])
df.loc[mask, 'flag'] = 'Y'
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0
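
Both approaches should give the same column. The sketch below builds the flag once with np.where and once with loc, then compares the results. The column names flag_where and flag_loc are made up for this comparison and are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

mask = (df['checked'] == '0') & df['duplicate']

# Approach 1: np.where builds the whole column in one step
df['flag_where'] = np.where(mask, 'Y', '0')

# Approach 2: fill a default value, then overwrite the matching rows
df['flag_loc'] = '0'
df.loc[mask, 'flag_loc'] = 'Y'

print(df['flag_where'].tolist())  # ['Y', 'Y', '0', '0']
print(df['flag_where'].tolist() == df['flag_loc'].tolist())  # True
```

np.where is convenient when every row gets one of two values. The loc form is easier to extend, since you can apply further masks one after another to the same column.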
