The t-test is a widely used statistical method for comparing the means of one or two groups. It helps determine if there’s a significant difference between these means.
William Sealy Gosset, a statistician at the Guinness brewery in Dublin, developed the t-test in the early 20th century. Gosset’s task was to analyze the quality of raw materials (barley and hops) to improve beer production. The statistical methods of his time required large samples, but Gosset had access to only small ones. This constraint led him to develop a new statistical technique for analyzing data from small samples.
The work describing the method was published in 1908. Since the company prohibited employees from publishing scientific articles under their names, the author signed with the pseudonym “Student.” This led to the naming of the “Student’s distribution” and its associated t-test.
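The t-test is used to test the null hypothesis (H₀) that there is no difference between the means of two populations or groups. In other words, under H₀ any observed difference between the sample means is attributed to chance. The test estimates how likely a difference at least as large as the observed one would be if H₀ were true.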
There are three main types of t-tests:
- Single-sample t-test
- Independent samples t-test
- Paired samples t-test
As a parametric test, the t-test assumes that the data are normally (or approximately normally) distributed and that observations are sampled randomly and independently. The paired t-test is the exception: the two measurements within each pair are related by design. Additionally, the independent samples t-test requires similar variance in the two groups.
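The normality assumption can be checked before running a t-test. The text above does not name a specific check; the sketch below assumes the Shapiro-Wilk test (scipy.stats.shapiro) as one common option, with simulated data and a significance level of 0.05 chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
sample = rng.normal(loc=120, scale=10, size=30)  # hypothetical measurements

# Shapiro-Wilk: the null hypothesis is that the data come from a normal distribution
w_stat, p_normality = stats.shapiro(sample)

alpha = 0.05  # assumed significance level
if p_normality < alpha:
    print(f"W = {w_stat:.3f}, p = {p_normality:.3f}: normality is doubtful")
else:
    print(f"W = {w_stat:.3f}, p = {p_normality:.3f}: no evidence against normality")
```

A non-significant result does not prove normality; it only means the test found no evidence against it.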
Single-sample t-test
The single-sample t-test is used when we have one sample with a mean value we want to compare against a reference point. This reference point can be:
- A known value from previous research or industry standards
- A theoretical value from mathematical models or hypotheses
- A value calculated from other statistical measures
This t-test determines whether our sample’s mean differs significantly from the reference point. It reveals whether our sample data aligns with or deviates from expected or established values. The single-sample t-test is particularly useful for assessing how a specific group or population compares to a standard or benchmark.
For example, if we know that the average systolic blood pressure in a given region is 120 mmHg, we can use this value as a benchmark to determine whether the blood pressure data from our study sample differs significantly from this regional average.
A p-value below the significance level leads us to reject the null hypothesis (that the sample mean does not differ from the reference value) in favor of the alternative hypothesis (that a significant difference exists between the sample mean and the reference value).
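To make the calculation concrete, here is a minimal sketch that computes the one-sample t-statistic by hand, as (sample mean − reference value) divided by the standard error of the mean, and compares it with scipy.stats.ttest_1samp. The data are simulated and the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=125, scale=10, size=30)  # hypothetical readings (mmHg)
reference = 120  # benchmark value, as in the blood pressure example

n = sample.size
sample_mean = sample.mean()
sample_sd = sample.std(ddof=1)            # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / np.sqrt(n)   # standard error of the mean

t_manual = (sample_mean - reference) / standard_error
# Two-sided p-value from Student's t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, reference)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should print the same values, which shows that the library call is doing nothing more than this formula.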
Independent samples t-test
The independent samples t-test compares the means of two randomly and independently collected data samples to determine whether the difference between them is due to chance (null hypothesis) or is statistically significant (alternative hypothesis).
As a parametric test, it assumes that the values follow a normal distribution and have similar variance between groups.
A practical example is comparing mean systolic blood pressure values between two independent populations—one undergoing pharmacological treatment and the other not. Another example is comparing a parameter between male and female subjects.
A p-value below the significance level (typically 0.05) allows us to reject the null hypothesis, indicating that the observed differences are unlikely to be due to chance.
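The role of the similar-variance assumption becomes visible when the statistic is computed by hand: the two sample variances are pooled into a single estimate. The sketch below does this on simulated data (hypothetical treated and untreated groups) and compares the result with scipy.stats.ttest_ind using equal_var=True.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
treated = rng.normal(loc=128, scale=10, size=30)    # hypothetical treated group
untreated = rng.normal(loc=135, scale=10, size=30)  # hypothetical untreated group

n1, n2 = treated.size, untreated.size
var1, var2 = treated.var(ddof=1), untreated.var(ddof=1)  # sample variances

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (treated.mean() - untreated.mean()) / standard_error
# Two-sided p-value with n1 + n2 - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(treated, untreated, equal_var=True)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```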
The independent samples t-test assumes that the two groups have equal variance (homoscedasticity). This assumption can be evaluated using Levene’s test, whose null hypothesis states that the variances of the two groups are equal. If the test produces a p-value lower than the significance level, we reject that hypothesis and treat the variances as different. Unequal variances make the samples unsuitable for the standard (Student’s) t-test, which may then give unreliable results.
In such cases, Welch’s test can be used to compare means. This variant of the t-test doesn’t require the assumption of equal variances and is particularly useful when the two groups have significantly different sample sizes.
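The decision described above can be sketched as follows: run Levene’s test first, then pass the outcome to the equal_var parameter of scipy.stats.ttest_ind. The groups are simulated with deliberately different spreads and sizes, and the 0.05 threshold is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=130, scale=8, size=25)   # hypothetical group, smaller spread
group_b = rng.normal(loc=135, scale=20, size=60)  # hypothetical group, larger spread

alpha = 0.05  # assumed significance level

# Levene's test: the null hypothesis is that the group variances are equal
levene_stat, levene_p = stats.levene(group_a, group_b)
equal_variances = bool(levene_p >= alpha)

# equal_var=True runs Student's t-test, equal_var=False runs Welch's t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_variances)
test_name = "Student's t-test" if equal_variances else "Welch's t-test"

print(f"Levene: statistic = {levene_stat:.4f}, p = {levene_p:.4f}")
print(f"{test_name}: t = {t_stat:.4f}, p = {p_value:.4f}")
```

Some analysts skip the preliminary test and use Welch’s test by default, since it remains valid when the variances happen to be equal.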
Paired samples t-test
The paired samples t-test compares two measurements taken on the same subjects at different times, for example before and after a treatment or at two different follow-up intervals.
A classic example is measuring blood pressure in the same subject before and after starting antihypertensive therapy.
In this case, the null hypothesis states that measurement differences are due to chance, while the alternative hypothesis posits that they result from the intervention performed between measurements.
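One way to see what the paired test does is to note that it is equivalent to a one-sample t-test on the within-subject differences, tested against zero. The sketch below illustrates this on simulated before/after readings; the size of the simulated change is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(loc=140, scale=12, size=30)        # hypothetical baseline readings
after = before + rng.normal(loc=-8, scale=6, size=30)  # same subjects after treatment

differences = before - after  # one difference per subject

t_paired, p_paired = stats.ttest_rel(before, after)
t_diff, p_diff = stats.ttest_1samp(differences, 0.0)

print(f"paired t-test:             t = {t_paired:.4f}, p = {p_paired:.4f}")
print(f"one-sample on differences: t = {t_diff:.4f}, p = {p_diff:.4f}")
```

Because only the differences enter the calculation, the normality assumption applies to the differences rather than to the raw measurements.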
Python and t-test
Among Python libraries, the scipy.stats module of SciPy is commonly used for t-test calculations.
scipy.stats offers four functions for t-test calculations:
- ttest_1samp: for the single-sample t-test
- ttest_ind: for the independent samples t-test
- ttest_rel: for the paired samples t-test
- ttest_ind_from_stats: for calculating the independent samples t-test from summary statistics (means, standard deviations, and sample sizes)
The scipy.stats.ttest_ind function accepts an equal_var parameter, which defaults to True. Setting it to False drops the equal-variance assumption and applies Welch’s test.
Additionally, Levene’s test can be performed using scipy.stats.levene(sample1, sample2), where sample1 and sample2 are the raw observations of each group, not their means.
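When only summary statistics are available, for instance from a published table, ttest_ind_from_stats can be used instead of ttest_ind. The sketch below computes the summaries from simulated data only so that the two results can be compared; in practice these numbers would come from the report. The standard deviations are sample standard deviations (ddof=1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
treated = rng.normal(loc=190, scale=15, size=30)  # hypothetical treated group
control = rng.normal(loc=200, scale=15, size=30)  # hypothetical control group

# t-test from summary statistics only
t_summary, p_summary = stats.ttest_ind_from_stats(
    mean1=treated.mean(), std1=treated.std(ddof=1), nobs1=treated.size,
    mean2=control.mean(), std2=control.std(ddof=1), nobs2=control.size,
    equal_var=True,
)

# Same test on the raw observations, for comparison
t_raw, p_raw = stats.ttest_ind(treated, control, equal_var=True)

print(f"from summary statistics: t = {t_summary:.4f}, p = {p_summary:.4f}")
print(f"from raw data:           t = {t_raw:.4f}, p = {p_raw:.4f}")
```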
Below are examples demonstrating how to use Python and scipy.stats to calculate the three types of t-tests.
Import libraries
import numpy as np
import scipy.stats as stats
np.random.seed(42) # For reproducibility
One-sample t-test
# One-sample t-test (comparing a sample mean to a known population mean)
# Simulate blood pressure readings from a sample of patients
blood_pressure_sample = np.random.normal(loc=125, scale=10, size=30) # mean 125, std dev 10
population_mean_bp = 120 # Known population mean
# Perform one-sample t-test
t_stat_1samp, p_value_1samp = stats.ttest_1samp(blood_pressure_sample, population_mean_bp)
# Display results
print("One-Sample t-test (Blood Pressure):")
print(f"t-statistic: {t_stat_1samp:.4f}, p-value: {p_value_1samp:.4f}")
Independent two-sample t-test
# Independent two-sample t-test (comparing two independent groups)
# Simulate cholesterol levels for two groups: treated and control
cholesterol_treated = np.random.normal(loc=190, scale=15, size=30) # Treated group, mean 190
cholesterol_control = np.random.normal(loc=200, scale=15, size=30) # Control group, mean 200
# Perform independent two-sample t-test (assuming equal variances)
t_stat_2samp, p_value_2samp = stats.ttest_ind(cholesterol_treated, cholesterol_control)
# Display results
print("Independent Two-Sample t-test (Cholesterol Levels):")
print(f"t-statistic: {t_stat_2samp:.4f}, p-value: {p_value_2samp:.4f}")
Paired t-test
# Paired t-test (comparing two related groups: before and after treatment)
# Simulate blood pressure readings before and after treatment on the same group of patients
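blood_pressure_before = np.random.normal(loc=140, scale=12, size=30) # Before treatment
# After treatment: each patient's own baseline plus an individual change (mean -10), so the readings stay paired
blood_pressure_after = blood_pressure_before + np.random.normal(loc=-10, scale=5, size=30)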
# Perform paired t-test
t_stat_paired, p_value_paired = stats.ttest_rel(blood_pressure_before, blood_pressure_after)
# Display results
print("Paired t-test (Blood Pressure Before vs After Treatment):")
print(f"t-statistic: {t_stat_paired:.4f}, p-value: {p_value_paired:.4f}")