Mastering Subplots in Matplotlib for Enhanced Data Visualization
Chapter 1: Introduction to Subplots
Subplots let you combine several visualizations in a single figure, which is invaluable for summarizing a large dataset compactly. This guide shows how to use subplots efficiently, from uniform grids to fine-grained control over grid layouts.
We will start with the basic plt.subplots function, which generates a grid of uniformly sized plots. First, import the necessary libraries:
# Jupyter-only: render figures inline in the notebook (omit this line in a plain Python script)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Next, we define a basic subplot structure in Matplotlib, creating two rows and three columns of equal dimensions:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(9, 6))
fig.tight_layout(pad=3.0)
The sharex='col' argument makes plots in the same column share an x-axis, while sharey='row' makes plots in the same row share a y-axis. Shared axes make the panels easier to compare, but they get in the way when the panels show variables on different scales, as we will see later.
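To see what "sharing" means in practice, here is a minimal sketch that reuses the same 2x3 layout and changes the limits of one panel. The change propagates along the shared column or row and leaves the other panels alone. plt.subplots also hides the inner tick labels of shared axes, so only the bottom row shows x tick labels and only the left column shows y tick labels.

```python
import matplotlib.pyplot as plt

# Same 2x3 layout as above: columns share x, rows share y.
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(9, 6))

# Setting x-limits on the top-left panel also moves the panel below it
# (same column); panels in other columns keep their own x-limits.
ax[0, 0].set_xlim(0, 10)

# y-limits propagate along a row in the same way.
ax[1, 0].set_ylim(-5, 5)

# Print every panel's limits to see which ones moved together.
for a in ax.flat:
    print(a.get_xlim(), a.get_ylim())
```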
Section 1.1: Accessing Subplots
The Axes objects are returned in a two-dimensional NumPy array. Let's display ax to see its structure:
ax
Output:
array([[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>],
[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>]], dtype=object)
The output confirms that ax is a 2D array, so reaching every element takes a nested loop: the outer loop goes over the rows, the inner loop over the Axes in each row. Let's put some text in each panel:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(9, 6))
fig.tight_layout(pad=2)
text = 'I am a plot'
for row in ax:        # each row is a 1D array of Axes
    for a in row:     # each a is a single subplot
        a.annotate(text, (0.3, 0.45), fontsize=12)
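If you don't need the row and column indices, the nested loop can be flattened. The sketch below is one way to do it, not the only one. It iterates over ax.flat, which visits the panels row by row. It also shows how the squeeze argument of plt.subplots (True by default) changes the shape of what comes back when the grid has a single row or a single panel.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, ax = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(9, 6))
fig.tight_layout(pad=2)
print(ax.shape)  # rows first, then columns

# ax.flat visits the panels row by row, so one loop reaches all six.
for i, a in enumerate(ax.flat):
    a.annotate(f'I am plot {i}', (0.3, 0.45), fontsize=12)

# With the default squeeze=True, length-1 dimensions are dropped:
_, single = plt.subplots()                   # a single Axes, not an array
_, row = plt.subplots(1, 3)                  # a 1D array of 3 Axes
_, grid = plt.subplots(1, 3, squeeze=False)  # always 2D, shape (1, 3)
print(isinstance(single, Axes), row.shape, grid.shape)
```

Passing squeeze=False is a convenient way to keep ax[row, col] indexing valid regardless of the grid size.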
The goal is to fill these panels with meaningful graphs, which requires a dataset. The examples use the NHANES 2015-2016 data saved as nhanes_2015_2016.csv.
Feel free to download it and follow along.
Here’s how to load the dataset using pandas:
df = pd.read_csv('nhanes_2015_2016.csv')
df.columns
Output:
Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR',
'RIDAGEYR', 'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL',
'DMDHHSIZ', 'WTINT2YR', 'SDMVPSU', 'SDMVSTRA', 'INDFMPIR',
'BPXSY1', 'BPXDI1', 'BPXSY2', 'BPXDI2', 'BMXWT', 'BMXHT',
'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC', 'BMXWAIST', 'HIQ210'],
dtype='object')
Now let's create another 2x3 grid and draw a different kind of plot in several of the panels. Each panel is addressed as ax[row, col], with zero-based indices:
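fig, ax = plt.subplots(2, 3, figsize=(15, 10))
fig.tight_layout(pad=2)
# Top-left: weight against height
ax[0, 0].scatter(df['BMXWT'], df['BMXHT'])
# Bottom-middle: BMI of each respondent, in row order
ax[1, 1].plot(df['BMXBMI'])
# Top-right: histogram of household size
df['DMDHHSIZ'].hist(ax=ax[0, 2])
# Top-middle: mean systolic blood pressure per education level, as a pie chart
df.groupby('DMDEDUC2')['BPXSY1'].mean().plot(ax=ax[0, 1], kind='pie', colors=['lightgreen', 'coral', 'pink', 'violet', 'skyblue'])
plt.show()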
Note that I omitted sharex and sharey here. The panels plot different variables on very different scales, so forcing them onto shared axes would squash some of them. A pie chart also has no conventional x- or y-axis to share in the first place. Try adding the parameters back to see the effect for yourself.
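Sharing does not have to be all or nothing. The minimal sketch below links only the two panels that plot the same variable on their x-axis, using Axes.sharex (Matplotlib 3.3+). It then switches off any panel left empty, so the figure shows no blank frames. Random stand-in numbers replace the NHANES columns so the sketch runs without the CSV file. Those arrays are hypothetical placeholders, not real data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data (random numbers), only so the sketch runs
# without the NHANES file; swap in df['BMXWT'] and df['BMXHT'].
rng = np.random.default_rng(0)
weight = rng.normal(80, 15, 500)
height = rng.normal(168, 10, 500)

fig, ax = plt.subplots(2, 3, figsize=(15, 10))
ax[0, 0].scatter(weight, height)
ax[1, 0].hist(weight, bins=30)

# Link only the two panels whose x-axis shows the same variable (weight);
# every other panel keeps an independent scale. Needs Matplotlib 3.3+.
ax[1, 0].sharex(ax[0, 0])

# Hide the frame and ticks of every panel that received no data.
for a in ax.flat:
    if not a.has_data():
        a.set_axis_off()
```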
Subsection 1.1.1: Customizing Plot Sizes
The plots above all have the same dimensions. To create subplots of different sizes, use the Figure.add_gridspec method. Its width_ratios and height_ratios arguments set the relative sizes of the columns and rows:
fig = plt.figure(constrained_layout=True, figsize=(8, 8))
# 3x3 grid: columns in width ratio 2:3:4, rows in height ratio 3:3:2
s = fig.add_gridspec(3, 3, width_ratios=[2, 3, 4], height_ratios=[3, 3, 2])
for row in range(3):
    for col in range(3):
        ax = fig.add_subplot(s[row, col])  # one subplot per grid cell
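The same layout can also be built in a single plt.subplots call by forwarding the ratios through gridspec_kw. Recent Matplotlib versions (3.6+) should also accept width_ratios and height_ratios directly as keyword arguments. The sketch below sets wspace and hspace to 0 only so that the 2:3:4 and 3:3:2 proportions are exact and easy to check. Drop them for normal spacing.

```python
import matplotlib.pyplot as plt

# Same 3x3 layout as above in one call; gridspec_kw is passed to the GridSpec.
fig, ax = plt.subplots(
    3, 3, figsize=(8, 8),
    gridspec_kw={'width_ratios': [2, 3, 4],
                 'height_ratios': [3, 3, 2],
                 'wspace': 0, 'hspace': 0},  # zero gaps: only for an exact check
)

# Column widths and row heights, in figure-fraction units.
widths = [ax[0, col].get_position().width for col in range(3)]
heights = [ax[row, 0].get_position().height for row in range(3)]
print(widths, heights)
```

Because ax is still a regular 2D array, everything from the previous section (ax[row, col], ax.flat) keeps working.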
Chapter 2: Advanced Control with GridSpec
For even finer control over the layout, we can create a GridSpec directly and let individual subplots span several of its cells. Here's an example:
plt.figure(figsize=(15, 12))
grid = plt.GridSpec(3, 4, wspace=0.3, hspace=0.3)
plt.subplot(grid[0, :3])
plt.subplot(grid[0, 3])
plt.subplot(grid[1:, :2])
plt.subplot(grid[1, 2:])
plt.subplot(grid[2, 2:])
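Let's break down what's happening here. plt.GridSpec(3, 4, ...) creates a grid of 3 rows and 4 columns. wspace and hspace set the horizontal and vertical gaps between cells, as fractions of the average cell width and height. Indexing the grid with NumPy-style slices selects a block of cells for one subplot: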
plt.subplot(grid[0, :3])
This creates a plot that spans the first three columns (indices 0 to 2) of the first row. The other calls work the same way. For example, grid[1:, :2] covers the second and third rows and the first two columns, which gives a tall panel in the lower left.
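To see exactly which cells each slice covers, the sketch below labels every panel with its own slice and prints the rows and columns it spans. It uses fig.add_subplot instead of plt.subplot, which should be equivalent here. It also relies on SubplotSpec.rowspan and colspan, which are available in Matplotlib 3.2 and later.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15, 12))
grid = plt.GridSpec(3, 4, wspace=0.3, hspace=0.3)

# The same five slices as above, keyed by their text for labelling.
slices = {
    'grid[0, :3]': grid[0, :3],
    'grid[0, 3]': grid[0, 3],
    'grid[1:, :2]': grid[1:, :2],
    'grid[1, 2:]': grid[1, 2:],
    'grid[2, 2:]': grid[2, 2:],
}

for label, spec in slices.items():
    ax = fig.add_subplot(spec)
    # Write the slice in the middle of its panel so the layout describes itself.
    ax.text(0.5, 0.5, label, ha='center', va='center', transform=ax.transAxes)
    # rowspan/colspan are range objects listing the grid rows/columns covered.
    print(label, ax.get_subplotspec().rowspan, ax.get_subplotspec().colspan)
```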
Now that you understand how to manipulate the grid and create custom-shaped plots, let’s proceed to generate another set with real data.
plt.figure(figsize=(12, 12))
grid = plt.GridSpec(4, 4, wspace=0.3, hspace=0.8)
g1 = plt.subplot(grid[0:2, :3])
g2 = plt.subplot(grid[2:, 0:2])
g3 = plt.subplot(grid[:2, 3])
g4 = plt.subplot(grid[2:, 2:])
df.groupby('DMDMARTL')['BPXSY1'].mean().plot(kind='bar', ax=g1)
g1.set_title("Bar Plot of Systolic Blood Pressure by Marital Status")
df['BPXSY1'].hist(ax=g3, orientation='horizontal', color='gray')
g3.set_title("Distribution of Systolic Blood Pressure")
df.plot(x='BPXSY1', y='BPXDI1', kind='scatter', ax=g2, alpha=0.3)
g2.set_title("Systolic vs. Diastolic Blood Pressure")
df['BMXHT'].plot(ax=g4, color='gray')
# BMXHT is standing height, so the title must say height rather than weight
g4.set_title('Line Plot of Population Height')
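Matplotlib also offers plt.subplot_mosaic (3.3+), an alternative this article does not otherwise use. It describes the same kind of layout as a picture of labelled cells instead of GridSpec slices. The sketch below reproduces the 4x4 arrangement of g1 to g4. The panel names are hypothetical labels chosen for readability, and each panel only receives a title here.

```python
import matplotlib.pyplot as plt

# Each inner list is one row of a 4x4 grid; repeating a name makes that
# panel span those cells. The names are hypothetical labels for g1..g4.
mosaic = [
    ['bar',     'bar',     'bar',  'hist'],
    ['bar',     'bar',     'bar',  'hist'],
    ['scatter', 'scatter', 'line', 'line'],
    ['scatter', 'scatter', 'line', 'line'],
]
fig, axd = plt.subplot_mosaic(mosaic, figsize=(12, 12),
                              gridspec_kw={'wspace': 0.3, 'hspace': 0.8})

# axd is a dict keyed by panel name; title each panel with the
# GridSpec variable it corresponds to in the code above.
for name, g in [('bar', 'g1'), ('scatter', 'g2'), ('hist', 'g3'), ('line', 'g4')]:
    axd[name].set_title(f'{name} panel (replaces {g})')
```

Plotting then works as before, for example by passing ax=axd['bar'] to the pandas plot call instead of ax=g1.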
Conclusion
If you have run the code and followed the explanations, you now have the main tools for laying out subplots in Matplotlib: uniform grids with plt.subplots, unequal rows and columns with width_ratios and height_ratios, and panels that span several cells with GridSpec. For a video explanation of this content:
The first video, "Matplotlib Tutorial (Part 10): Subplots", gives an in-depth overview of how to create and manage subplots effectively.
Additionally, you might find this video beneficial:
The second video, "Matplotlib Subplots - A Helpful Illustrated Guide," serves as a practical complement to the concepts discussed here.