
Automatic (whisker-sensitive) Ylim In Boxplots

When plotting the columns of a DataFrame with pandas, e.g. df.boxplot(), the automatic adjustment of the y-axis can leave a large amount of unused space in the plot. I wonder if thi…

Solution 1:

I think a combination of the seaborn style and the way matplotlib draws boxplots is hiding your outliers here.

Suppose I generate some skewed data:

import seaborn as sns
import pandas as pd
import numpy as np

x = pd.DataFrame(np.random.lognormal(size=(100, 6)),
             columns=list("abcdef"))

If I then use the DataFrame's boxplot method, I see something similar:

x.boxplot()


But if you change the symbol used to plot outliers, they become visible:

x.boxplot(sym="k.")


Alternatively, you can use seaborn's boxplot function, which does the same thing with nicer aesthetics:

sns.boxplot(data=x)
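If the goal is the tight y-axis the question asks for rather than visible outliers, one option is to stop drawing the fliers at all. This is a minimal sketch, assuming a reasonably recent pandas and matplotlib: showfliers is a keyword of matplotlib's boxplot that DataFrame.boxplot forwards, and with no flier markers on the axes, autoscaling only has the boxes, whiskers and caps to fit. The fixed seed is only there for reproducibility.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display

# Same kind of skewed data as above, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.lognormal(size=(100, 6)), columns=list("abcdef"))

# showfliers=False is passed through to matplotlib's boxplot; since no flier
# markers are drawn, the autoscaled y-limits end just beyond the whiskers.
ax = x.boxplot(showfliers=False)
print(ax.get_ylim())
```

The trade-off is that the outliers disappear from the figure entirely, so this suits overview plots better than exploratory ones.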


Solution 2:

Building on eumiro's answer in this SO post (I just extend it to pandas data frames), you could do the following:

import numpy as np
import pandas as pd

def reject_outliers(df, col_name, m=2):
    """Return the data frame without outliers in the col_name column."""
    return df[np.abs(df[col_name] - df[col_name].mean()) < m * df[col_name].std()]

# Create fake data
N = 10
df = pd.DataFrame(dict(a=np.random.rand(N), b=np.random.rand(N)))
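# Add one obvious outlier to column "b" (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df, pd.DataFrame([dict(a=0.1, b=10)])], ignore_index=True)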

# Strip outliers from the "b" column
df = reject_outliers(df, "b")
bp = df.boxplot()

The argument m is the cutoff in standard deviations: rows whose col_name value lies m or more standard deviations from the column mean are dropped.
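A standard-deviation cutoff is not the rule the whiskers use, so it can keep points that the boxplot still draws as outliers, or drop points inside the whiskers. If you would rather drop the same points a default boxplot treats as fliers, one option (swapped in here, not part of eumiro's answer) is to filter on the interquartile range instead. reject_outliers_iqr and its whis parameter are hypothetical names chosen to mirror matplotlib's keyword; the fences should match matplotlib's as long as both use linear-interpolated quartiles, which is the default in pandas and NumPy.

```python
import pandas as pd


def reject_outliers_iqr(df, col_name, whis=1.5):
    """Hypothetical helper: keep rows whose col_name value lies inside the
    boxplot fences [Q1 - whis*IQR, Q3 + whis*IQR]. Rows with NaN are dropped too."""
    q1 = df[col_name].quantile(0.25)
    q3 = df[col_name].quantile(0.75)
    iqr = q3 - q1
    # between() is inclusive on both ends by default and returns False for NaN
    inside = df[col_name].between(q1 - whis * iqr, q3 + whis * iqr)
    return df[inside]


df = pd.DataFrame({"a": [0.2, 0.4, 0.6, 0.8, 0.1],
                   "b": [1.0, 2.0, 3.0, 4.0, 100.0]})
print(reject_outliers_iqr(df, "b"))  # the row with b == 100 is removed
```

Here Q1 = 2 and Q3 = 4, so the fences are [-1, 7] and only the value 100 falls outside them.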

EDIT:

Why do the whiskers not include the maximum outliers in the first place?

There are several types of box plots, as described on Wikipedia. The pandas boxplot calls matplotlib's boxplot, and according to its documentation the argument whis "defines the length of the whiskers as a function of the inner quartile range". With the default whis=1.5, each whisker extends only to the most extreme data point within 1.5 times the IQR of the box, so by design the whiskers do not cover the entire range of the data.
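To get whisker-sensitive y-limits while still drawing the outliers, you can compute where the whiskers end and set the limits yourself. This is a minimal sketch, assuming your matplotlib provides matplotlib.cbook.boxplot_stats (the helper that computes the whisker positions boxplot draws); whisker_ylim and its pad argument are hypothetical names for illustration, and the seeded data only stands in for your own frame.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
from matplotlib.cbook import boxplot_stats


def whisker_ylim(df, whis=1.5, pad=0.05):
    """Hypothetical helper: return (bottom, top) spanning every column's
    whiskers, widened by `pad` times the whisker range on each side."""
    # One 1-D array per column, NaNs removed, as boxplot itself would see it
    stats = boxplot_stats([df[c].dropna().to_numpy() for c in df.columns], whis=whis)
    low = min(s["whislo"] for s in stats)
    high = max(s["whishi"] for s in stats)
    margin = pad * (high - low)
    return low - margin, high + margin


rng = np.random.default_rng(0)
x = pd.DataFrame(rng.lognormal(size=(100, 6)), columns=list("abcdef"))
ax = x.boxplot(sym="k.")
ax.set_ylim(*whisker_ylim(x))  # points beyond the whiskers fall outside the view
```

Use the same whis value for the plot and for whisker_ylim so the limits line up with the drawn whiskers. Conversely, if you want the whiskers themselves to span all the data, recent matplotlib versions accept a pair of percentiles for whis, e.g. whis=(0, 100).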
