Avoiding KeyError In DataFrame
I am validating my DataFrame with the code below:
df = df[(df[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1)) & ((df['plan_year'].notnull()) &
Solution 1:
You can use intersection:
L = ['name', 'issuer_id', 'service_area_id']
cols = df.columns.intersection(L)
(df[cols].notnull().all(axis=1))
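As a minimal, self-contained sketch of the intersection idea above: the sample data and the names required, mask and filtered are assumptions made for illustration, not part of the original question. The point is that only the columns actually present in df are selected, so no KeyError is raised for the missing ones.

```python
import pandas as pd

# Hypothetical sample: 'issuer_id' and 'service_area_id' are absent on purpose.
df = pd.DataFrame({
    'name': ['a', None, 'c'],
    'plan_year': [2015, 2016, 2017],
})

required = ['name', 'issuer_id', 'service_area_id']

# Keep only the required columns that are actually present in df.
cols = df.columns.intersection(required)

# True for rows where every present required column is non-null.
mask = df[cols].notnull().all(axis=1)

filtered = df[mask]
print(filtered)
```

Note that this silently skips validation for any column that is missing, which may or may not be what you want.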
EDIT:
import pandas as pd

df = pd.DataFrame({
    'name':list('abcdef'),
    'plan_year':[2015,2015,2015,5,5,4],
})
print (df)
name plan_year
0 a 2015
1 b 2015
2 c 2015
3 d 5
4 e 5
5 f 4
The idea is to first create a dictionary of valid values for each column:
valid = {'name':'a',
'issuer_id':'a',
'service_area_id':'a',
'plan_year':2015,
...}
Then filter that dictionary down to the columns missing from the original DataFrame and assign them, which creates a new DataFrame:
d1 = {k: v for k, v in valid.items() if k in set(valid.keys()) - set(df.columns)}
print (d1)
{'issuer_id': 'a', 'service_area_id': 'a'}
df1 = df.assign(**d1)
print (df1)
name plan_year issuer_id service_area_id
0 a 2015 a a
1 b 2015 a a
2 c 2015 a a
3 d 5 a a
4 e 5 a a
5 f 4 a a
Finally, filter:
m1 = (df1[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1))
m2 = ((df1['plan_year'].notnull()) &
(df1['plan_year'].astype(str).str.isdigit()) &
(df1['plan_year'].astype(str).str.len() == 4))
df1 = df1[m1 & m2]
print (df1)
name plan_year issuer_id service_area_id
0 a 2015 a a
1 b 2015 a a
2 c 2015 a a
Lastly, you can remove the helper columns:
df1 = df1[m1 & m2].drop(d1.keys(), axis=1)
print (df1)
name plan_year
0 a 2015
1 b 2015
2 c 2015
Solution 2:
Add another variable called columns and filter it down to the ones that exist in df:
columns = ['name', 'issuer_id', 'service_area_id']
existing = [i for i in columns if i in df.columns]
df = df[(df[existing]...
EDIT: You could also assign each condition to a variable and use it later, like this:
cond1 = df['is_age_29_plan'].astype(str).isin(['True', 'False', 'nan']) if 'is_age_29_plan' in df.columns else True
Then use cond1 in your filtering statement.
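One way to combine such per-column conditions is sketched below. The helper column_condition, the sample data, and the names cond2, cond3 and result are hypothetical and introduced only for illustration. Instead of a bare True, the fallback here is an all-True Series, on the assumption that you want the final expression to remain a valid row mask even if every checked column turns out to be missing.

```python
import pandas as pd

# Hypothetical sample; 'is_age_29_plan' is deliberately missing.
df = pd.DataFrame({
    'name': ['a', 'b', None],
    'plan_year': [2015, 5, 2015],
})

def column_condition(frame, column, check):
    """Return check(frame[column]) if the column exists, else an all-True mask."""
    if column in frame.columns:
        return check(frame[column])
    # Missing column: do not filter anything out on its account.
    return pd.Series(True, index=frame.index)

cond1 = column_condition(
    df, 'is_age_29_plan',
    lambda s: s.astype(str).isin(['True', 'False', 'nan']),
)
cond2 = column_condition(df, 'name', lambda s: s.notnull())
cond3 = column_condition(df, 'plan_year', lambda s: s.astype(str).str.len() == 4)

# Every condition is a boolean Series aligned to df.index, so they combine with &.
result = df[cond1 & cond2 & cond3]
print(result)
```

Keeping each condition in its own variable also makes it easier to see which check rejected a given row.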