Matplotlib (2) - micheledpierri.com

Following our general introduction to the matplotlib environment, let’s explore the types of graphs this Python library can produce.

Among the most commonly used graphs for data presentation are:

Histograms
Box plots
Scatter plots
Bar charts
Line graphs

Histograms

Histograms show the distribution of a continuous variable. The range of the data is divided into equal-width intervals (bins), and the height of each bar shows how many observations fall into that bin.
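plt.hist() does this binning for us. To see the computation on its own, NumPy's np.histogram() returns the bin edges and the count in each bin without drawing anything. Here is a minimal sketch on a made-up ten-value dataset, small enough to check by hand:

```python
import numpy as np

# A tiny made-up dataset so the binning can be verified by hand
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Split the range [1, 4] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=4)

print(edges.tolist())   # [1.0, 1.75, 2.5, 3.25, 4.0]
print(counts.tolist())  # [1, 2, 3, 4]
```

Each bin is half-open except the last one, which also includes the maximum value (4), so every observation is counted exactly once.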

You can create histograms using the following function:

plt.hist()

This function accepts several arguments:

bins: number of intervals for dividing the data
color and edgecolor: fill color of the bars and color of their edges
alpha: controls the transparency of the bars

The following program generates 1,000 normally distributed values and displays them as a histogram:

import matplotlib.pyplot as plt
import numpy as np

# Create a random dataset with normal distribution
data = np.random.randn(1000)

# Create a histogram
plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, color='skyblue', edgecolor='black', alpha=0.7)

# Add title and labels
plt.title('Data Distribution', fontsize=16, fontweight='bold')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', linestyle='--', alpha=0.6)

plt.show()

The resulting graph looks as follows:

To increase the detail of the distribution, we can increase the number of bins. In the following example, we’ve increased the number of bins from 30 to 300:
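A sketch of that comparison, drawing both versions side by side on the same data (the seed, the subplot layout and the variable names are our own choices, not part of the original example). Axes.hist() also returns the bin counts, which lets us confirm that every observation is counted exactly once whatever the number of bins:

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # fixed seed so both panels show identical data
data = np.random.randn(1000)

fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, figsize=(12, 5))

# 30 bins: a smooth overall shape
counts_30, edges_30, _ = ax_coarse.hist(data, bins=30, color='skyblue',
                                        edgecolor='black', alpha=0.7)
ax_coarse.set_title('30 bins')

# 300 bins: more detail, but each bin holds only a few observations
counts_300, edges_300, _ = ax_fine.hist(data, bins=300, color='skyblue', alpha=0.7)
ax_fine.set_title('300 bins')

for ax in (ax_coarse, ax_fine):
    ax.set_xlabel('Values')
    ax.set_ylabel('Frequency')

plt.tight_layout()
plt.show()

# The counts always add up to the number of observations
print(len(counts_30), counts_30.sum())    # 30 1000.0
print(len(counts_300), counts_300.sum())  # 300 1000.0
```

With 300 bins the outline becomes more detailed but also noisier, so a moderate number of bins is usually easier to read.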

Box Plots

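Box plots (also known as box-and-whisker plots) summarize the distribution of a continuous variable through its quartiles. The box spans the first to the third quartile (the interquartile range, IQR), and a line inside it marks the median. By default, Matplotlib extends the whiskers to the most extreme data points lying within 1.5 × IQR of the box and plots any point beyond them individually as an outlier.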
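The quantities a box plot draws can be computed directly. The sketch below uses a made-up sample with one extreme value and NumPy's default percentile method, and applies Matplotlib's default whisker rule (points more than 1.5 × IQR beyond the box are outliers):

```python
import numpy as np

# Made-up sample with one extreme value
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# Quartiles: Q1 and Q3 bound the box, the median is the line inside it
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Default whisker rule (whis=1.5): anything beyond these fences is an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(q1, median, q3)  # 3.25 5.5 7.75
print(outliers)        # [50]
```

Here the upper fence is 7.75 + 1.5 × 4.5 = 14.5, so only the value 50 would be drawn as a separate point.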

Box plots are created with the function:

plt.boxplot()

The key parameters are:

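x: the data, either a single array or a sequence of arrays (one box per array)
patch_artist: boolean; if True, the boxes are drawn as filled patches that can be colored, otherwise only their outlines are drawn
notch: boolean; if True, the box is notched, and the notch represents a confidence interval around the median
vert: boolean; if True (the default), the boxes are drawn vertically, otherwise horizontally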
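To see what patch_artist changes, here is a short sketch that uses the same kind of data as the example below and colors each box individually. plt.boxplot() returns a dictionary of the artists it drew; with patch_artist=True, the entries under 'boxes' are filled patches, so their face color can be set (the colors chosen here are arbitrary):

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)
data = [np.random.normal(0, std, 100) for std in range(1, 4)]

plt.figure(figsize=(8, 5))
# patch_artist=True draws each box as a filled patch with a face color
bp = plt.boxplot(data, patch_artist=True, notch=True)

# 'boxes' holds one patch per dataset; give each its own color
box_colors = ['lightblue', 'lightgreen', 'lightsalmon']
for box, color in zip(bp['boxes'], box_colors):
    box.set_facecolor(color)

plt.title('Box Plot with Custom Colors')
plt.show()
```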

Let’s create a box plot using normally distributed data. We’ll generate a list (data) containing three groups of 100 random numbers. Each group will have a mean of 0 and standard deviations of 1, 2, and 3 respectively.

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)  # Set a seed for reproducibility
data = [np.random.normal(0, std, 100) for std in range(1, 4)]

# Create the boxplot
plt.figure(figsize=(10, 6))
plt.boxplot(data, patch_artist=True, notch=True, vert=True)

# Add title and labels
plt.title('Data Distribution with Boxplot', fontsize=16, fontweight='bold')
plt.xlabel('Dataset')
plt.ylabel('Values')
plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.grid(axis='y', linestyle='--', alpha=0.6)

plt.show()

The resulting graph will look like this:

Scatter Plots

Scatter plots highlight the relationship between two variables, revealing trends and correlations. They are particularly useful for visualizing how one variable changes in relation to another.

To generate a scatter plot, use the following function:

plt.scatter()

The key parameters for this function are:

x and y: the variables containing the values to be compared
color and edgecolor: colors of the points and their borders
alpha: transparency level, which can help highlight overlapping points

Here’s an example of how to create a scatter plot comparing two variables:

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # Set a seed for reproducibility
x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)

plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='teal', alpha=0.7, edgecolor='k')

# Add title and labels
plt.title('Scatter Plot', fontsize=16, fontweight='bold')
plt.xlabel('X Variable')
plt.ylabel('Y Variable')
plt.grid(True, linestyle='--', alpha=0.6)

plt.show()
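The scatter plot suggests a roughly linear, increasing relationship. To put numbers on it, the sketch below uses np.corrcoef() for the Pearson correlation coefficient and np.polyfit() for a least-squares straight line, which is then overlaid on the points (the fitted line and the legend are our additions to the original example):

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)

# Pearson correlation coefficient: close to 1 for a strong positive linear trend
r = np.corrcoef(x, y)[0, 1]

# Least-squares line: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='teal', alpha=0.7, edgecolor='k', label='Data')

# Draw the fitted line across the observed range of x
x_line = np.linspace(x.min(), x.max(), 100)
plt.plot(x_line, slope * x_line + intercept, color='darkorange', linewidth=2,
         label=f'Fit: y = {slope:.2f}x + {intercept:.2f} (r = {r:.2f})')

plt.xlabel('X Variable')
plt.ylabel('Y Variable')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
```

With this data the fitted slope should come out close to 2, the value used to generate y, and r close to 1.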

Bar Charts

Bar charts effectively display the count or frequency of categorical data. They provide a clear visual representation of data categories and their corresponding values.

To generate a bar chart, use the following function:

plt.bar()

This function accepts several key parameters:

x and height: the categories (or bar positions) and the corresponding bar heights, such as counts or frequencies
color: fill color of the bars
edgecolor: color of the bar borders

Here’s an example of code generating a bar chart:

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [15, 30, 45, 10]

plt.figure(figsize=(10, 6))
plt.bar(categories, values, color='cadetblue', edgecolor='black')

# Add title and labels
plt.title('Category Count', fontsize=16, fontweight='bold')
plt.xlabel('Categories')
plt.ylabel('Count')
plt.grid(axis='y', linestyle='--', alpha=0.6)

plt.show()

The resulting graph looks like this:

Line Graphs

Line graphs are well suited to time series: with time on the x-axis, the line shows how the values change over time.

The command that allows us to create line graphs is:

plt.plot()

which accepts as parameters:

x, y: the dates (or other x-axis values) and the value at each of them
color, linewidth, linestyle and other parameters that adjust the appearance of the line

Here’s an example of a line graph:

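import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # Set a seed for reproducibility
# Daily dates from 1 January to 31 May 2024 (the end value is excluded)
dates = np.arange('2024-01', '2024-06', dtype='datetime64[D]')
# Random walk: cumulative sum of normally distributed steps
values = np.random.randn(len(dates)).cumsum()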

plt.figure(figsize=(12, 6))
plt.plot(dates, values, color='dodgerblue', linewidth=2)

# Add title and labels
plt.title('Time Series', fontsize=16, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Cumulative Value')
plt.xticks(rotation=45)
plt.grid(True, linestyle='--', alpha=0.6)

plt.show()
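Rotating the tick labels is one way to keep dates readable. An alternative sketch uses the matplotlib.dates module to place one tick per month and format it explicitly; it switches to the object-oriented fig, ax interface, and the seed and date format are our own choices:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1)  # reproducible random walk
# Daily dates from 1 January to 31 May 2024 (the end value is excluded)
dates = np.arange('2024-01', '2024-06', dtype='datetime64[D]')
values = np.random.randn(len(dates)).cumsum()

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(dates, values, color='dodgerblue', linewidth=2)

# One major tick at the start of each month, labelled like "Jan 2024"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
fig.autofmt_xdate()  # rotates and right-aligns the date labels

ax.set_xlabel('Date')
ax.set_ylabel('Cumulative Value')
ax.grid(True, linestyle='--', alpha=0.6)
plt.show()
```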

Matplotlib's capabilities can be extended further with the Seaborn library. Built on top of matplotlib, Seaborn provides a higher-level interface for creating complex, visually appealing graphs with less code, widening the range of options for presenting and analyzing data.
