Python Pandas - Histograms
A histogram is a graphical representation of the distribution of a dataset. It shows how frequently values fall within defined intervals, called bins. A histogram looks similar to a bar plot, but the two differ: a histogram represents the distribution of numerical data grouped into ranges (bins), whereas a bar plot represents categorical data, with each bar corresponding to a specific category.
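Binning is the counting step that happens before anything is drawn. The sketch below uses ten made-up values (not from this tutorial) and splits the range 0 to 10 into five equal-width bins with pd.cut(). It then compares the result with np.histogram(), which matplotlib's hist() builds on. The bar heights of a histogram are exactly these counts.

```python
import numpy as np
import pandas as pd

# Ten made-up measurements, used only to illustrate binning
values = pd.Series([1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9, 8.3, 9.6])

# Five equal-width bins covering 0 to 10: edges at 0, 2, 4, 6, 8, 10
edges = np.linspace(0, 10, 6)

# pd.cut assigns each value to a bin; right=False makes the bins left-closed
# ([0, 2), [2, 4), ...), matching np.histogram (which additionally includes
# the right edge, 10, in its last bin)
binned = pd.cut(values, bins=edges, right=False)
counts = binned.value_counts(sort=False)  # counts in bin order
print(counts)

# np.histogram returns the counts per bin and the bin edges
np_counts, np_edges = np.histogram(values, bins=edges)
print(np_counts)
```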
In this tutorial, we will learn how to create and customize histograms using the Pandas library through several examples.
Creating Histograms in Pandas
In Pandas, histograms can be created with the plot.hist() method, which is available on both Series and DataFrame objects. The method returns a matplotlib Axes object containing the histogram plot.
DataFrame.plot.hist(): Creates a histogram of one or more columns in a DataFrame.
Series.plot.hist(): Creates a histogram of a single Series, such as one column selected from a DataFrame.
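The examples in this tutorial all call the DataFrame version. As a sketch of the Series version, the code below selects one column with df["a"] and plots it. The column names, bin count, and edgecolor are illustrative choices. The Agg backend line is an addition so the code also runs without a display; drop it in a notebook and call plt.show() instead of plt.close().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((50, 2)), columns=["a", "b"])

# Series.plot.hist(): a histogram of a single column
ax = df["a"].plot.hist(bins=5, edgecolor="black", title="Histogram of column 'a'")

# The method returns a matplotlib Axes, so the plot can be customized further
ax.set_xlabel("Value")
print(type(ax).__name__)
print(len(ax.patches))  # one bar (Rectangle patch) per bin

plt.close(ax.figure)  # use plt.show() instead when working interactively
```

Because the return value is an ordinary Axes, any matplotlib method (set_xlabel(), set_xlim(), and so on) can be applied to it after plotting.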
Syntax
Following is the syntax of the plot.hist() method −
DataFrame.plot.hist(by=None, bins=10, **kwargs)
Where,
by: Column in the DataFrame to group by. A separate histogram is drawn for each group.
bins: The number of bins to use for the histogram. The default value is 10.
**kwargs: Additional keyword arguments, documented under DataFrame.plot() and passed on to matplotlib, used to customize the plot.
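To illustrate how bins and **kwargs interact, the sketch below uses bins=8 for a two-column DataFrame. Both columns are counted on the same eight bins. edgecolor and alpha are not named parameters of plot.hist(); they pass through **kwargs to matplotlib's Axes.hist(). The data and styling values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((40, 2)), columns=["a", "b"])

# bins=8: both columns are counted on the same 8 equal-width bins.
# edgecolor and alpha travel through **kwargs to matplotlib
ax = df.plot.hist(bins=8, edgecolor="black", alpha=0.5)

print(len(ax.patches))  # 8 bars for "a" plus 8 bars for "b"
plt.close(ax.figure)
```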
Example
Here is a basic example of creating a histogram for a DataFrame using the plot.hist() method.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7, 4]
# Create a DataFrame with random data
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
# Plot histogram
ax = df.plot.hist()
plt.title("Simple Histogram")
plt.show()
Following is the output of the above code −

Plotting a Stacked Histogram
A stacked histogram draws the bars of multiple numerical columns on top of each other, so the top of each stack shows the combined count for that bin. It is created by passing the stacked=True parameter.
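To check what stacking does to the bars, the sketch below plots a small two-column DataFrame with stacked=True and inspects ax.patches. It assumes the bars are stored in plotting order: the five bars of column "a" first, then the five bars of column "b". Each "b" bar should start exactly where the "a" bar in the same bin ends.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=["a", "b"])

ax = df.plot.hist(stacked=True, bins=5)

# ax.patches: the 5 bars of column "a", followed by the 5 bars of column "b"
a_bars, b_bars = ax.patches[:5], ax.patches[5:]
for a_bar, b_bar in zip(a_bars, b_bars):
    # Each "b" bar starts (get_y) at the top of the "a" bar in the same bin
    print(f"a count={a_bar.get_height():.0f}  "
          f"b starts at {b_bar.get_y():.0f}, stack top={b_bar.get_y() + b_bar.get_height():.0f}")

plt.close(ax.figure)
```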
Example
This example creates a stacked histogram for a DataFrame using the stacked=True parameter.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7, 4]
# Create a DataFrame with random data
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
# Plot the stacked histogram
df.plot.hist(stacked=True, bins=20, alpha=0.7, title="Stacked Histogram")
plt.show()
On executing the above code we will get the following output −

Creating Horizontal Histograms
To create a horizontal histogram, use the orientation='horizontal' parameter of the plot.hist() method. The bins are then laid out along the y-axis and the counts along the x-axis.
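In a horizontal histogram the roles of the axes swap, so each bar's width (not its height) holds the count. The sketch below plots a single column with orientation="horizontal" and reads the counts back from the bar widths. Using one column keeps each bar aligned with its bin; the data is random and illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.random((30, 2)), columns=["a", "b"])

ax = df["a"].plot.hist(orientation="horizontal", bins=5,
                       title="Horizontal histogram of 'a'")

# Counts run along the x-axis: every bar starts at x=0 and its width is the count
counts = [bar.get_width() for bar in ax.patches]
print(counts)

plt.close(ax.figure)
```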
Example
This example creates a horizontal histogram for a DataFrame using the orientation='horizontal' parameter.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7, 4]
# Create a DataFrame with random data
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
# Plot the horizontal histogram
df.plot.hist(orientation='horizontal', bins=20, alpha=0.7, title="Horizontal Histogram")
plt.show()
Following is the output of the above code −

Plotting a Cumulative Histogram
A cumulative histogram shows the cumulative frequency distribution: each bar counts the values in its own bin plus all bins to its left, so the last bar equals the total number of values. To plot one, set the cumulative parameter to True.
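The relationship between an ordinary and a cumulative histogram can be checked directly. The sketch below draws both for the same column and bins, side by side, and compares the running total of the ordinary bar heights with the cumulative bar heights. The data and bin count are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame(rng.random((20, 2)), columns=["a", "b"])

fig, (ax_plain, ax_cum) = plt.subplots(1, 2, figsize=(7, 4))
df["a"].plot.hist(bins=5, ax=ax_plain, title="Counts per bin")
df["a"].plot.hist(bins=5, cumulative=True, ax=ax_cum, title="Cumulative counts")

plain = [bar.get_height() for bar in ax_plain.patches]
cumulative = [bar.get_height() for bar in ax_cum.patches]

# The cumulative bars are the running total of the ordinary bars
print(np.cumsum(plain))
print(cumulative)

plt.close(fig)
```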
Example
This example demonstrates plotting a cumulative histogram for a DataFrame using the cumulative=True parameter of the plot.hist() method.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7, 4]
# Create a DataFrame with random data
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
# Plot the cumulative histogram
df.plot.hist(cumulative=True, bins=20, alpha=0.7, title="Cumulative Histogram")
plt.show()
On executing the above code we will get the following output −

Subplots for Histograms
You can draw a separate histogram subplot for each column of a DataFrame by calling the DataFrame.hist() method directly, instead of plot.hist().
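Unlike plot.hist(), DataFrame.hist() returns a NumPy array of Axes, one per column, and its layout parameter controls the subplot grid. The sketch below uses a three-column DataFrame and layout=(1, 3) to place all three histograms in one row. The column names and figure size are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])

# One subplot per column, arranged as 1 row x 3 columns
axes = df.hist(bins=5, layout=(1, 3), figsize=(9, 3))

print(axes.shape)
for ax in axes.ravel():
    print(ax.get_title(), len(ax.patches))  # each subplot is titled with its column name

plt.close("all")
```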
Example
This example creates a separate histogram subplot for each DataFrame column using the DataFrame.hist() method.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7, 4]
# Create a DataFrame with random data
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
# Subplots for each column
df.hist(color='lightgreen', bins=20)
plt.suptitle("Histograms into Subplots")
plt.show()
Following is the output of the above code −

Grouped Histograms
Grouped histograms let you compare the distribution of data across categories. Use the by parameter to draw a separate histogram for each group in a column.
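The by parameter draws one subplot per group. When the groups are easier to compare on shared axes, a common alternative (a different technique from by, not what by does internally) is to loop over df.groupby() and draw every group on the same Axes. The sketch below reuses the tutorial's Letter/Numbers layout with its own random data; alpha keeps overlapping bars visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
x = ["A"] * 30 + ["B"] * 70
y = rng.standard_normal(100)
df = pd.DataFrame({"Letter": x, "Numbers": y})

fig, ax = plt.subplots(figsize=(7, 4))
for letter, group in df.groupby("Letter"):
    # Each group gets its own 10 bins, drawn on the shared Axes
    group["Numbers"].plot.hist(ax=ax, bins=10, alpha=0.5, label=letter)
ax.legend(title="Letter")
ax.set_title("Grouped Histogram (overlaid on one Axes)")

plt.close(fig)
```

Because each group is binned separately here, bar widths can differ between groups; pass an explicit shared bins value if the bars need to line up.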
Example
This example creates grouped histograms of the Numbers column, one for each value of the Letter column, using the by parameter.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7, 4]
# Create a DataFrame with random data
x = ['A']*30 + ['B']*70
y = np.random.randn(100)
df = pd.DataFrame({'Letter': x, 'Numbers': y})
# Plot the Grouped histogram
df.plot.hist(by='Letter', bins=20, alpha=0.7, title="Grouped Histogram")
plt.show()
Following is the output of the above code −