Automatic (whisker-sensitive) Ylim In Boxplots
Solution 1:
I think a combination of the seaborn style and the way matplotlib draws boxplots is hiding your outliers here.
If I generate some skewed data:
import seaborn as sns
import pandas as pd
import numpy as np
x = pd.DataFrame(np.random.lognormal(size=(100, 6)),
columns=list("abcdef"))
And then use the boxplot method on the dataframe, I see something similar:
x.boxplot()
But if you change the symbol used to plot outliers, you get
x.boxplot(sym="k.")
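The same effect can probably be had with matplotlib's flierprops keyword, which pandas forwards to matplotlib along with any other extra keyword arguments. This is only a minimal sketch: the seeded generator and the particular marker settings are my own assumptions, chosen to make the outliers visible as small black dots.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, only so the sketch is reproducible)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.lognormal(size=(100, 6)), columns=list("abcdef"))

# flierprops styles the outlier ("flier") markers explicitly
ax = x.boxplot(
    flierprops=dict(marker=".", markerfacecolor="k", markeredgecolor="k")
)
```

Passing showfliers=False instead would hide the outliers entirely.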
Alternatively, you can use the seaborn boxplot function, which does the same thing but with some nice aesthetics:
sns.boxplot(data=x)
Solution 2:
Building on eumiro's answer in this SO post (I just extend it to pandas data frames), you could do the following:
import numpy as np
import pandas as pd
def reject_outliers(df, col_name, m=2):
    """ Returns data frame without outliers in the col_name column """
    return df[np.abs(df[col_name] - df[col_name].mean()) < m * df[col_name].std()]
# Create fake data
N = 10
df = pd.DataFrame(dict(a=np.random.rand(N), b=np.random.rand(N)))
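# DataFrame.append was removed in pandas 2.0, so add the outlier row with pd.concat
df = pd.concat([df, pd.DataFrame([dict(a=0.1, b=10)])], ignore_index=True)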
# Strip outliers from the "b" column
df = reject_outliers(df, "b")
bp = df.boxplot()
The argument m is the number of standard deviations from the column mean beyond which a row is treated as an outlier and dropped.
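Note that the mean and standard deviation are themselves pulled around by extreme values. A variant based on quartiles, which mirrors the rule the whiskers use, might look like the sketch below. The function name reject_outliers_iqr and its whis parameter are hypothetical, not part of pandas or of eumiro's answer.

```python
import numpy as np
import pandas as pd

def reject_outliers_iqr(df, col_name, whis=1.5):
    """Keep rows whose col_name value lies within whis * IQR of the quartiles.

    Hypothetical helper, shown only as an illustration.
    """
    q1, q3 = df[col_name].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[col_name].between(q1 - whis * iqr, q3 + whis * iqr)]

# Same kind of fake data as above, with a seeded generator (an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(dict(a=rng.random(10), b=rng.random(10)))
df = pd.concat([df, pd.DataFrame([dict(a=0.1, b=10)])], ignore_index=True)

# Strip outliers from the "b" column using the quartile rule
filtered = reject_outliers_iqr(df, "b")
```

With whis=1.5 the cut-offs match the default whisker fences, so anything this filter drops is a point the boxplot would have drawn as an outlier.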
EDIT:
Why do the whiskers not include the maximum outliers in the first place?
There are several types of boxplots, as described on Wikipedia. The pandas boxplot method calls matplotlib's boxplot. If you take a look at the documentation for this, the argument whis "Defines the length of the whiskers as a function of the inner quartile range." So the whiskers won't cover the entire range by design.
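If what you actually want is a y-axis that follows the whiskers rather than the outliers, one possible approach is to compute the whisker ends yourself and set the limits from them. The sketch below uses matplotlib.cbook.boxplot_stats for that. The 5% padding and the seeded data are my own assumptions.

```python
import numpy as np
import pandas as pd
from matplotlib import cbook

# Skewed example data (seeded only so the sketch is reproducible)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.lognormal(size=(100, 6)), columns=list("abcdef"))

# boxplot_stats returns one dict per column, including the whisker ends
stats = cbook.boxplot_stats(x.to_numpy(), whis=1.5)
lo = min(s["whislo"] for s in stats)
hi = max(s["whishi"] for s in stats)

# Arbitrary 5% margin so the whisker caps are not flush with the frame
pad = 0.05 * (hi - lo)

ax = x.boxplot(whis=1.5)
ax.set_ylim(lo - pad, hi + pad)
```

Outliers beyond the limits are still drawn, they just fall outside the visible area. The whis value passed to boxplot_stats should match the one used for the plot (1.5 is matplotlib's default).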