Beyond Statistical Significance: Understanding Effect Size
Effect size measures the magnitude of a difference or relationship, independent of sample size. While p-values tell you whether an effect is statistically significant, effect sizes tell you how practically meaningful that effect is. This distinction is crucial for evidence-based decision making in research, medicine, education, and business.
Consider a pharmaceutical trial where a new drug shows a statistically significant improvement (p < 0.001) over a placebo. Without effect size, you don't know if the improvement is 0.1% or 50%. Effect size provides this crucial context, helping stakeholders determine whether the effect is worth the cost, side effects, or implementation effort.
The most common effect size measure for comparing two groups is Cohen's d, which expresses the difference between means in standard deviation units. This standardization allows comparison across different studies and measurement scales.
Why Effect Size Matters
Statistical significance is heavily influenced by sample size. With a large enough sample, even trivial differences become "significant." Conversely, important effects may not reach significance in small samples. Effect size solves this problem by providing a sample-size-independent measure.
The Significance Trap
A p-value, however small, only says that the observed difference would be unlikely if the null hypothesis were true; it says nothing about how large the difference is. With tens of thousands of observations, a difference of a fraction of a point can clear any significance threshold while remaining practically irrelevant.
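This trap is easy to demonstrate with a quick simulation. The sketch below uses synthetic normal data with an assumed true shift of only 0.05 standard deviations; the sample size and seed are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000                            # very large samples per group
control = rng.normal(0.0, 1.0, n)
treatment = rng.normal(0.05, 1.0, n)   # true effect: only 0.05 SD

# Highly "significant" t-test despite a trivial difference
t_stat, p_value = stats.ttest_ind(treatment, control)

# Effect size stays tiny regardless of n
pooled_sd = np.sqrt((np.var(control, ddof=1) + np.var(treatment, ddof=1)) / 2)
d = (np.mean(treatment) - np.mean(control)) / pooled_sd

print(f"p = {p_value:.2e}, d = {d:.3f}")
```

With 100,000 observations per group even a 0.05 SD difference is overwhelmingly "significant", yet d remains far below Cohen's threshold for a small effect.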
Key reasons to use effect size:
- Meta-analysis: Effect sizes can be combined across studies to estimate overall effects
- Power analysis: Required to calculate necessary sample sizes for future studies
- Practical decisions: Helps determine if interventions are worth implementing
- Replication: Provides a target for replication studies to match
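As a sketch of the power-analysis use case, the required sample size per group for a two-sided, two-sample t-test can be approximated with the standard normal-approximation formula n ≈ 2((z₁₋α/₂ + z_power) / d)²; the exact t-based answer is slightly larger:

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided,
    two-sample t-test (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z / d) ** 2

print(round(n_per_group(0.5)))   # medium effect
print(round(n_per_group(0.2)))   # small effect: far more subjects needed
```

Note how sharply the requirement grows as the expected effect shrinks: halving d roughly quadruples the sample size.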
Cohen's d: The Standard Effect Size Measure
Cohen's d expresses the difference between two group means in units of pooled standard deviation:
d = (M₁ − M₂) / sₚ
where M₁ and M₂ are the group means, and sₚ is the pooled standard deviation, calculated as:
sₚ = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]
The sign of d indicates direction: positive when M₁ > M₂, negative when M₁ < M₂. Often the absolute value |d| is reported when direction is obvious from context.
Why Pool the Standard Deviation?
Pooling weights each group's variance by its degrees of freedom (nᵢ − 1), giving a single, more stable estimate of the common population standard deviation than either group provides alone. It assumes the two populations have roughly equal variances; when that assumption is doubtful, Glass's Δ (below) is an alternative.
Alternative Effect Size Measures
While Cohen's d is most common, alternatives exist for specific situations:
Hedges' g: Bias-Corrected Effect Size
Cohen's d slightly overestimates the population effect size in small samples. Hedges' g applies a correction factor:
g = d × (1 − 3 / (4(n₁ + n₂) − 9))
For samples above 20 per group, the difference is negligible. For small samples (n < 20), Hedges' g is preferred.
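To see how quickly the correction fades, here is a small sketch of the correction factor 1 − 3/(4(n₁ + n₂) − 9) at a few sample sizes:

```python
def hedges_correction(n1, n2):
    """Small-sample bias correction factor applied to Cohen's d."""
    return 1 - 3 / (4 * (n1 + n2) - 9)

for n in (5, 10, 20, 100):
    print(n, round(hedges_correction(n, n), 4))
```

At 20 per group the correction shrinks d by about 2%; by 100 per group it is under half a percent, which is why it only matters for small studies.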
Glass's Δ: When Variances Differ
When one group is a control with known variability, use only the control group's standard deviation as the denominator:
Δ = (M₁ − M₂) / s_control
This is useful when the treatment might affect variance (e.g., an intervention that helps low performers more than high performers).
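A minimal sketch of Glass's Δ, standardizing by the control group's SD only (the data and variable names are illustrative):

```python
import numpy as np

def glass_delta(treatment, control):
    """Glass's delta: mean difference in units of the control group's SD."""
    return (np.mean(treatment) - np.mean(control)) / np.std(control, ddof=1)

# If the treatment inflates variance (helping some students much more
# than others), Glass's delta is unaffected by that inflation,
# unlike Cohen's d, which would pool the inflated variance in.
control = [10, 12, 11, 13, 12, 11, 12, 13, 11, 12]
treatment = [14, 18, 12, 20, 11, 19, 13, 21, 10, 22]
print(round(glass_delta(treatment, control), 2))
```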
Interpreting Effect Sizes: Cohen's Guidelines
Jacob Cohen proposed these conventions for interpreting d values:
| Effect Size (d) | Interpretation | Overlap |
|---|---|---|
| 0.2 | Small | 85% overlap between groups |
| 0.5 | Medium | 67% overlap between groups |
| 0.8 | Large | 53% overlap between groups |
| 1.2 | Very Large | 38% overlap between groups |
| 2.0 | Huge | 19% overlap between groups |
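The overlap column can be reproduced from the normal model: the figures correspond to Cohen's U1 measure of (non)overlap for two equal-variance normal distributions. A sketch:

```python
from scipy.stats import norm

def overlap_u1(d):
    """Percent overlap (1 - Cohen's U1) between two normal
    distributions with equal SDs whose means differ by d."""
    d = abs(d)
    if d == 0:
        return 1.0
    phi = norm.cdf(d / 2)
    u1 = (2 * phi - 1) / phi      # proportion of non-overlap
    return 1 - u1

for d in (0.2, 0.5, 0.8, 1.2, 2.0):
    print(d, f"{overlap_u1(d):.0%}")
```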
Context Matters
Cohen offered these thresholds as rough defaults for when no better benchmark exists, not as rules. A "small" effect can matter enormously when the outcome is important and the intervention is cheap to scale, while a "large" effect on a trivial outcome may not be worth pursuing. Whenever possible, compare against typical effect sizes in your own field.
Worked Example: Educational Intervention
A school tests a new reading program. Control group (n=25): mean=72, SD=12. Treatment group (n=30): mean=79, SD=14. Calculate Cohen's d:
Step 1: pooled variance
sₚ² = [(25 − 1)(12²) + (30 − 1)(14²)] / (25 + 30 − 2) = (3456 + 5684) / 53 ≈ 172.45
Step 2: pooled standard deviation
sₚ = √172.45 ≈ 13.13
Step 3: Cohen's d
d = (79 − 72) / 13.13 ≈ 0.53
Step 4: interpretation. By Cohen's guidelines, d ≈ 0.53 is a medium effect.
This means that if you took a random student from the treatment group and a random student from the control group, the treatment student would score higher about 65% of the time (the common-language effect size, Φ(d/√2), assuming normally distributed scores).
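The arithmetic above can be verified from the summary statistics alone; the last line computes the probability that a random treatment score beats a random control score, the common-language effect size Φ(d/√2), under a normality assumption:

```python
import numpy as np
from scipy.stats import norm

n1, m1, s1 = 30, 79, 14   # treatment group
n2, m2, s2 = 25, 72, 12   # control group

# Pooled SD, then Cohen's d
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp

# P(random treatment score > random control score)
cles = norm.cdf(d / np.sqrt(2))
print(f"d = {d:.2f}, P(treatment > control) = {cles:.0%}")
```

This reproduces d ≈ 0.53 and a superiority probability of about 65%.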
Python Implementation
Calculate effect sizes programmatically:
```python
import numpy as np


def cohens_d(group1, group2):
    """Calculate Cohen's d for two independent groups."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    # Pooled standard deviation
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std
    return d


def hedges_g(group1, group2):
    """Calculate Hedges' g (bias-corrected effect size)."""
    n1, n2 = len(group1), len(group2)
    d = cohens_d(group1, group2)
    # Correction factor for small-sample bias
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Example usage
control = [68, 72, 75, 70, 69, 74, 71, 73, 76, 72]
treatment = [75, 79, 82, 78, 80, 77, 81, 76, 83, 79]

d = cohens_d(treatment, control)
g = hedges_g(treatment, control)
print(f"Cohen's d: {d:.3f}")
print(f"Hedges' g: {g:.3f}")
```
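Effect sizes should ideally be reported with uncertainty. A rough confidence interval can be sketched from the approximate large-sample standard error of d; this is the normal-approximation formula, and exact intervals use the noncentral t distribution instead:

```python
import numpy as np
from scipy.stats import norm

def cohens_d_ci(d, n1, n2, alpha=0.05):
    """Approximate CI for Cohen's d via its large-sample standard error."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = norm.ppf(1 - alpha / 2)
    return d - z * se, d + z * se

lo, hi = cohens_d_ci(0.53, n1=30, n2=25)
print(f"d = 0.53, 95% CI [{lo:.2f}, {hi:.2f}]")
```

For the reading-program example (d ≈ 0.53 with groups of 30 and 25), the interval is wide, roughly [−0.01, 1.07], a reminder that small studies pin down effect sizes poorly even when the point estimate looks respectable.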