Beyond Statistical Significance: Understanding Effect Size
Effect size measures the magnitude of a difference or relationship, independent of sample size. While p-values tell you whether an effect is statistically significant, effect sizes tell you how practically meaningful that effect is. This distinction is crucial for evidence-based decision making in research, medicine, education, and business.
Consider a pharmaceutical trial where a new drug shows a statistically significant improvement (p < 0.001) over a placebo. Without effect size, you don't know if the improvement is 0.1% or 50%. Effect size provides this crucial context, helping stakeholders determine whether the effect is worth the cost, side effects, or implementation effort.
The most common effect size measure for comparing two groups is Cohen's d, which expresses the difference between means in standard deviation units. This standardization allows comparison across different studies and measurement scales.
Why Effect Size Matters
Statistical significance is heavily influenced by sample size. With a large enough sample, even trivial differences become "significant." Conversely, important effects may not reach significance in small samples. Effect size solves this problem by providing a sample-size-independent measure.
The Significance Trap
A p-value, however small, only says that the observed difference would be unlikely if the null hypothesis were true; it says nothing about how large the difference is. With tens of thousands of observations, a difference of a fraction of a point can clear any significance threshold while remaining practically irrelevant.
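This trap is easy to demonstrate with a quick simulation. The sketch below uses synthetic normal data with an assumed true shift of only 0.05 standard deviations; the sample size and seed are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000                            # very large samples per group
control = rng.normal(0.0, 1.0, n)
treatment = rng.normal(0.05, 1.0, n)   # true effect: only 0.05 SD

# Highly "significant" t-test despite a trivial difference
t_stat, p_value = stats.ttest_ind(treatment, control)

# Effect size stays tiny regardless of n
pooled_sd = np.sqrt((np.var(control, ddof=1) + np.var(treatment, ddof=1)) / 2)
d = (np.mean(treatment) - np.mean(control)) / pooled_sd

print(f"p = {p_value:.2e}, d = {d:.3f}")
```

With 100,000 observations per group even a 0.05 SD difference is overwhelmingly "significant", yet d remains far below Cohen's threshold for a small effect.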
Key reasons to use effect size:
- Meta-analysis: Effect sizes can be combined across studies to estimate overall effects
- Power analysis: Required to calculate necessary sample sizes for future studies
- Practical decisions: Helps determine if interventions are worth implementing
- Replication: Provides a target for replication studies to match
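As a sketch of the power-analysis use case, the required sample size per group for a two-sided, two-sample t-test can be approximated with the standard normal-approximation formula n ≈ 2((z₁₋α/₂ + z_power) / d)²; the exact t-based answer is slightly larger:

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided,
    two-sample t-test (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z / d) ** 2

print(round(n_per_group(0.5)))   # medium effect
print(round(n_per_group(0.2)))   # small effect: far more subjects needed
```

Note how sharply the requirement grows as the expected effect shrinks: halving d roughly quadruples the sample size.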
Cohen's d: The Standard Effect Size Measure
Cohen's d expresses the difference between two group means in units of pooled standard deviation:
d = (M₁ − M₂) / sₚ
where M₁ and M₂ are the group means, and sₚ is the pooled standard deviation, calculated as:
sₚ = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]
The sign of d indicates direction: positive when M₁ > M₂, negative when M₁ < M₂. Often the absolute value |d| is reported when direction is obvious from context.
Why Pool the Standard Deviation?
Pooling weights each group's variance by its degrees of freedom (nᵢ − 1), giving a single, more stable estimate of the common population standard deviation than either group provides alone. It assumes the two populations have roughly equal variances; when that assumption is doubtful, Glass's Δ (below) is an alternative.
Alternative Effect Size Measures
While Cohen's d is most common, alternatives exist for specific situations:
Hedges' g: Bias-Corrected Effect Size
Cohen's d slightly overestimates the population effect size in small samples. Hedges' g applies a correction factor:
g = d × (1 − 3 / (4(n₁ + n₂) − 9))
For samples above 20 per group, the difference is negligible. For small samples (n < 20), Hedges' g is preferred.
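To see how quickly the correction fades, here is a small sketch of the correction factor 1 − 3/(4(n₁ + n₂) − 9) at a few sample sizes:

```python
def hedges_correction(n1, n2):
    """Small-sample bias correction factor applied to Cohen's d."""
    return 1 - 3 / (4 * (n1 + n2) - 9)

for n in (5, 10, 20, 100):
    print(n, round(hedges_correction(n, n), 4))
```

At 20 per group the correction shrinks d by about 2%; by 100 per group it is under half a percent, which is why it only matters for small studies.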
Glass's Δ: When Variances Differ
When one group is a control with known variability, use only the control group's standard deviation as the denominator:
Δ = (M₁ − M₂) / s_control
This is useful when the treatment might affect variance (e.g., an intervention that helps low performers more than high performers).
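A minimal sketch of Glass's Δ, standardizing by the control group's SD only (the data and variable names are illustrative):

```python
import numpy as np

def glass_delta(treatment, control):
    """Glass's delta: mean difference in units of the control group's SD."""
    return (np.mean(treatment) - np.mean(control)) / np.std(control, ddof=1)

# If the treatment inflates variance (helping some students much more
# than others), Glass's delta is unaffected by that inflation,
# unlike Cohen's d, which would pool the inflated variance in.
control = [10, 12, 11, 13, 12, 11, 12, 13, 11, 12]
treatment = [14, 18, 12, 20, 11, 19, 13, 21, 10, 22]
print(round(glass_delta(treatment, control), 2))
```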
Interpreting Effect Sizes: Cohen's Guidelines
Jacob Cohen proposed these conventions for interpreting d values:
| Effect Size (d) | Interpretation | Overlap |
|---|---|---|
| 0.2 | Small | 85% overlap between groups |
| 0.5 | Medium | 67% overlap between groups |
| 0.8 | Large | 53% overlap between groups |
| 1.2 | Very Large | 38% overlap between groups |
| 2.0 | Huge | 19% overlap between groups |
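The overlap column can be reproduced from the normal model: the figures correspond to Cohen's U1 measure of (non)overlap for two equal-variance normal distributions. A sketch:

```python
from scipy.stats import norm

def overlap_u1(d):
    """Percent overlap (1 - Cohen's U1) between two normal
    distributions with equal SDs whose means differ by d."""
    d = abs(d)
    if d == 0:
        return 1.0
    phi = norm.cdf(d / 2)
    u1 = (2 * phi - 1) / phi      # proportion of non-overlap
    return 1 - u1

for d in (0.2, 0.5, 0.8, 1.2, 2.0):
    print(d, f"{overlap_u1(d):.0%}")
```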
Context Matters
Cohen offered these thresholds as rough defaults for when no better benchmark exists, not as rules. A "small" effect can matter enormously when the outcome is important and the intervention is cheap to scale, while a "large" effect on a trivial outcome may not be worth pursuing. Whenever possible, compare against typical effect sizes in your own field.
Worked Example: Educational Intervention
A school tests a new reading program. Control group (n=25): mean=72, SD=12. Treatment group (n=30): mean=79, SD=14. Calculate Cohen's d:
Step 1: pooled variance
sₚ² = [(25 − 1)(12²) + (30 − 1)(14²)] / (25 + 30 − 2) = (3456 + 5684) / 53 ≈ 172.45
Step 2: pooled standard deviation
sₚ = √172.45 ≈ 13.13
Step 3: Cohen's d
d = (79 − 72) / 13.13 ≈ 0.53
Step 4: interpretation. By Cohen's guidelines, d ≈ 0.53 is a medium effect.
This means that if you took a random student from the treatment group and a random student from the control group, the treatment student would score higher about 65% of the time (the common-language effect size, Φ(d/√2), assuming normally distributed scores).
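The arithmetic above can be verified from the summary statistics alone; the last line computes the probability that a random treatment score beats a random control score, the common-language effect size Φ(d/√2), under a normality assumption:

```python
import numpy as np
from scipy.stats import norm

n1, m1, s1 = 30, 79, 14   # treatment group
n2, m2, s2 = 25, 72, 12   # control group

# Pooled SD, then Cohen's d
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp

# P(random treatment score > random control score)
cles = norm.cdf(d / np.sqrt(2))
print(f"d = {d:.2f}, P(treatment > control) = {cles:.0%}")
```

This reproduces d ≈ 0.53 and a superiority probability of about 65%.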
Python Implementation
Calculate effect sizes programmatically:
```python
import numpy as np


def cohens_d(group1, group2):
    """Calculate Cohen's d for two independent groups."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    # Pooled standard deviation
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std
    return d


def hedges_g(group1, group2):
    """Calculate Hedges' g (bias-corrected effect size)."""
    n1, n2 = len(group1), len(group2)
    d = cohens_d(group1, group2)
    # Correction factor for small-sample bias
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Example usage
control = [68, 72, 75, 70, 69, 74, 71, 73, 76, 72]
treatment = [75, 79, 82, 78, 80, 77, 81, 76, 83, 79]

d = cohens_d(treatment, control)
g = hedges_g(treatment, control)
print(f"Cohen's d: {d:.3f}")
print(f"Hedges' g: {g:.3f}")
```
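Effect sizes should ideally be reported with uncertainty. A rough confidence interval can be sketched from the approximate large-sample standard error of d; this is the normal-approximation formula, and exact intervals use the noncentral t distribution instead:

```python
import numpy as np
from scipy.stats import norm

def cohens_d_ci(d, n1, n2, alpha=0.05):
    """Approximate CI for Cohen's d via its large-sample standard error."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = norm.ppf(1 - alpha / 2)
    return d - z * se, d + z * se

lo, hi = cohens_d_ci(0.53, n1=30, n2=25)
print(f"d = 0.53, 95% CI [{lo:.2f}, {hi:.2f}]")
```

For the reading-program example (d ≈ 0.53 with groups of 30 and 25), the interval is wide, roughly [−0.01, 1.07], a reminder that small studies pin down effect sizes poorly even when the point estimate looks respectable.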