
Avoiding KeyError in a DataFrame

I am validating my DataFrame with the code below: df = df[(df[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1)) & ((df['plan_year'].notnull()) &

Solution 1:

You can use intersection:

L = ['name', 'issuer_id', 'service_area_id']
cols = df.columns.intersection(L)

(df[cols].notnull().all(axis=1))
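As a minimal sketch of how this avoids the KeyError, the example below uses a small made-up frame (the sample data and the names mask and result are assumptions for illustration) in which two of the three listed columns are absent:

```python
import pandas as pd

# Hypothetical sample frame: 'issuer_id' and 'service_area_id' are absent.
df = pd.DataFrame({
    'name': ['a', None, 'c'],
    'plan_year': [2015, 2015, 2016],
})

L = ['name', 'issuer_id', 'service_area_id']

# Index.intersection keeps only the labels present in both, so selecting
# df[cols] cannot raise a KeyError for a missing column.
cols = df.columns.intersection(L)

# Rows where every *existing* required column is non-null.
mask = df[cols].notnull().all(axis=1)
result = df[mask]
print(result)
```

Here only 'name' survives the intersection, so the row with a missing name is the only one dropped.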

EDIT:

df = pd.DataFrame({
    'name': list('abcdef'),
    'plan_year': [2015, 2015, 2015, 5, 5, 4],
})
print (df)
  name  plan_year
0    a       2015
1    b       2015
2    c       2015
3    d          5
4    e          5
5    f          4

The idea is to first create a dictionary with a valid placeholder value for each column:

valid = {'name': 'a',
         'issuer_id': 'a',
         'service_area_id': 'a',
         'plan_year': 2015,
         ...}

Then keep only the dictionary entries whose columns are missing from the original DataFrame, and assign them to it to create a new DataFrame:

d1 = {k: v for k, v in valid.items() if k in set(valid.keys()) - set(df.columns)}
print (d1)
{'issuer_id': 'a', 'service_area_id': 'a'}


df1 = df.assign(**d1)
print (df1)
  name  plan_year issuer_id service_area_id
0    a       2015         a               a
1    b       2015         a               a
2    c       2015         a               a
3    d          5         a               a
4    e          5         a               a
5    f          4         a               a

Finally, apply the filter:

m1 = (df1[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1)) 
m2 = ((df1['plan_year'].notnull()) & 
      (df1['plan_year'].astype(str).str.isdigit()) & 
      (df1['plan_year'].astype(str).str.len() == 4))

df1 = df1[m1 & m2]
print (df1)
  name  plan_year issuer_id service_area_id
0    a       2015         a               a
1    b       2015         a               a
2    c       2015         a               a

Finally, you can remove the helper columns:

df1 = df1[m1 & m2].drop(d1.keys(), axis=1)
print (df1)
  name  plan_year
0    a       2015
1    b       2015
2    c       2015
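The steps above can be collected into a single helper. This is only a sketch: the function name filter_valid is hypothetical, and the valid dictionary is limited to the four keys shown in the answer (the elided entries are not filled in). It filters once and then drops the helper columns, so the mask and the frame always share the same index.

```python
import pandas as pd

def filter_valid(df, valid):
    """Hypothetical helper: filter rows while tolerating missing columns."""
    # Placeholder values only for the columns that df lacks.
    missing = {k: v for k, v in valid.items() if k not in df.columns}
    tmp = df.assign(**missing)

    # Same two masks as in the answer, built on the padded frame.
    m1 = tmp[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1)
    year = tmp['plan_year'].astype(str)
    m2 = tmp['plan_year'].notnull() & year.str.isdigit() & (year.str.len() == 4)

    # Filter once, then drop the helper columns again.
    return tmp[m1 & m2].drop(columns=list(missing))

df = pd.DataFrame({
    'name': list('abcdef'),
    'plan_year': [2015, 2015, 2015, 5, 5, 4],
})
# Assumption: only the four keys visible in the answer are used here.
valid = {'name': 'a', 'issuer_id': 'a', 'service_area_id': 'a', 'plan_year': 2015}

out = filter_valid(df, valid)
print(out)
```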

Solution 2:

Store the column names in a variable called columns and keep only the ones that exist in df:

columns = ['name', 'issuer_id', 'service_area_id']
existing = [i for i in columns if i in df.columns]
df = df[(df[existing]...

EDIT: You could also assign each condition to a variable and use it later, like this:

cond1 = df['is_age_29_plan'].astype(str).isin(['True', 'False', 'nan']) if 'is_age_29_plan' in df.columns else True

Then use cond1 in your filtering statement.
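A minimal sketch of how the two ideas in this solution might fit together, using a made-up frame that lacks the is_age_29_plan column (the sample data and the names cond2 and filtered are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample frame without the 'is_age_29_plan' column.
df = pd.DataFrame({
    'name': ['a', None, 'c'],
    'plan_year': [2015, 2015, 2016],
})

columns = ['name', 'issuer_id', 'service_area_id']
existing = [c for c in columns if c in df.columns]

# Fall back to the scalar True when the column is missing.
cond1 = (
    df['is_age_29_plan'].astype(str).isin(['True', 'False', 'nan'])
    if 'is_age_29_plan' in df.columns else True
)
cond2 = df[existing].notnull().all(axis=1)

# A boolean Series combined with the scalar True via & stays a Series.
filtered = df[cond2 & cond1]
print(filtered)
```

One caveat with the scalar fallback: if every condition falls back to True, the combined mask is a plain bool rather than a Series, and df[True] would be treated as a column lookup. Using pd.Series(True, index=df.index) as the fallback is one way to sidestep that.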
