Multiple histograms with plotnine

Libraries

To create this chart, we need to load the following libraries:

import pandas as pd
from plotnine import *

Dataset

Since a histogram displays the distribution of a numerical variable, we need a dataset that contains numerical values.

For this, we will create a simple dataset with 2 columns: numerical and categorical. The numerical column contains the values that we want to plot, and the categorical column contains the group labels (Group1 or Group2) that we use to split those values.

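# Generate data: two normally distributed groups of 300 values each
import numpy as np
group1_num = np.random.normal(loc=0, scale=1, size=300)  # mean 0, standard deviation 1
group2_num = np.random.normal(loc=8, scale=2, size=300)  # mean 8, standard deviation 2
group1_cat = np.repeat('Group1', 300)  # label for each value of the first group
group2_cat = np.repeat('Group2', 300)  # label for each value of the second group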

df = pd.DataFrame({
    'numerical': np.concatenate([group1_num, group2_num]),
    'categorical': np.concatenate([group1_cat, group2_cat])
})
df.head()
   numerical categorical
0   0.316677      Group1
1  -0.577093      Group1
2   0.716630      Group1
3  -1.756055      Group1
4   0.895396      Group1
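
Because the values are drawn at random, the numbers shown above change on every run. The following is a minimal sketch of a reproducible variant, assuming a seeded NumPy generator is acceptable for this tutorial. The seed value 42 and the summary variable are illustrative choices that are not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same values are produced on every run
# (the seed 42 is an arbitrary choice for this sketch)
rng = np.random.default_rng(42)

df = pd.DataFrame({
    'numerical': np.concatenate([
        rng.normal(loc=0, scale=1, size=300),  # Group1: mean 0, sd 1
        rng.normal(loc=8, scale=2, size=300),  # Group2: mean 8, sd 2
    ]),
    # 300 'Group1' labels followed by 300 'Group2' labels
    'categorical': np.repeat(['Group1', 'Group2'], 300),
})

# Per-group count, mean and standard deviation as a quick sanity check
summary = df.groupby('categorical')['numerical'].agg(['count', 'mean', 'std'])
print(summary)
```

The summary should show 300 rows per group, with means close to 0 and 8, which is the separation we expect to see in the histograms.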

Double histogram

We use the geom_histogram() function to create a histogram. To display two histograms on the same chart, we map the fill aesthetic to the grouping variable, so that each group gets its own color.

The value of fill must be the name of the column that we want to use to group the data. In this case, we use the categorical column.

(
    ggplot(df, aes(x='numerical', fill='categorical')) +
    geom_histogram(bins=20) +
    theme_minimal()
)

Mirror histogram

To create a mirror histogram, we need two numerical variables to display.

For this, we add 2 new columns to our dataset: num_top and num_bottom.

Then we call the geom_histogram() function twice, once for each variable, and we use the y aesthetic to set the direction of each histogram: the density for the top one, and the negated density for the bottom one, which flips it below the x axis.

# Two independent samples of 600 values (one per row of df), one for each histogram
df['num_top'] = np.random.normal(loc=5, scale=2, size=600)
df['num_bottom'] = np.random.normal(loc=0, scale=2, size=600)

(
    ggplot(df) +
    geom_histogram(aes(x='num_top', y='..density..'), bins=20, fill='lightblue') +
    geom_histogram(aes(x='num_bottom', y='-..density..'), bins=20, fill='darkred') +
    xlab('Value') +
    theme_minimal()
)