Pooled Standard Deviation for Multiple Groups

What is Pooled Standard Deviation?

Pooled standard deviation combines the variance estimates from two or more groups into a single weighted estimate of their common standard deviation. The equal-variance (Student's) two-sample t-test depends on it.

The concept is straightforward: if we believe two groups come from populations with the same underlying variability, we can combine their data to get a better estimate of that shared variability. More data means a more precise estimate.

Think of it this way: if you have 20 observations from Group A and 30 from Group B, and both groups have the same true variance, you now have 50 observations to estimate that variance instead of estimating it separately from smaller samples.

When to Pool

Pool standard deviations only when you have reason to believe the underlying population variances are equal. Levene's test or the F-test can check this assumption before pooling; the F-test is sensitive to non-normal data, so Levene's test is usually the safer choice.

The Pooled SD Formula

For two groups, the pooled standard deviation is:

Two-Group Pooled SD

sp = √[((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)]

Where n₁ and n₂ are sample sizes, and s₁ and s₂ are sample standard deviations.
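
The two-group formula translates directly into a few lines of Python. The following is a minimal sketch rather than a reference implementation: the function name pooled_sd_two and the input values are illustrative assumptions, not part of the original article.

```python
import math


def pooled_sd_two(n1, s1, n2, s2):
    """Pooled SD for two groups, given sample sizes and sample SDs.

    Hypothetical helper name; it implements
    sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    weighted_sum = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(weighted_sum / (n1 + n2 - 2))


# Made-up inputs: two groups of 10 with SDs of 3 and 4.
sp = pooled_sd_two(10, 3, 10, 4)
print(f"{sp:.4f}")
```

With equal sample sizes the pooled variance is simply the average of the two variances, here (9 + 16) / 2 = 12.5, which gives a quick sanity check on the result.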

For k groups (as in ANOVA), the formula generalizes:

Multi-Group Pooled SD

sp = √[Σ(nᵢ-1)sᵢ² / Σ(nᵢ-1)]

Each group's variance is weighted by its degrees of freedom (nᵢ-1), and the denominator is the total degrees of freedom across all groups. This weighting ensures larger samples contribute more to the pooled estimate, which is appropriate because larger samples provide more reliable variance estimates.
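
For raw data rather than summary statistics, the k-group version can be sketched with Python's standard statistics module. The function name pooled_sd and the three small data sets are assumptions chosen for illustration only.

```python
import math
import statistics


def pooled_sd(groups):
    """Pooled SD across k groups of raw observations.

    Each group's sample variance is weighted by its degrees of
    freedom (n_i - 1), then divided by the total degrees of freedom.
    """
    weighted_sum = 0.0
    total_df = 0
    for observations in groups:
        if len(observations) < 2:
            raise ValueError("each group needs at least 2 observations")
        df = len(observations) - 1
        # statistics.variance is the sample variance (n - 1 denominator).
        weighted_sum += df * statistics.variance(observations)
        total_df += df
    return math.sqrt(weighted_sum / total_df)


# Made-up data for three groups of different sizes.
groups = [
    [4.0, 5.0, 6.0],
    [10.0, 12.0, 14.0, 16.0],
    [1.0, 1.0, 2.0, 2.0, 3.0],
]
print(f"{pooled_sd(groups):.3f}")
```

Note that the group means (5, 13, and 1.8) differ widely but do not enter the calculation: only the spread within each group is pooled.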

Underlying Assumptions

Pooled standard deviation assumes homogeneity of variance—that all groups share the same population variance. This assumption matters most when:

Sample sizes are unequal (especially problematic if the larger group has the smaller variance)
The ratio of the largest to the smallest sample variance exceeds roughly 2 to 3
Sample sizes are small (large samples are more robust to violations)
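
The variance-ratio rule of thumb above can be screened with a short check. This is only an informal sketch, not a substitute for a formal test of equal variances; the function names and the default threshold of 2.0 are assumptions picked from the low end of the range mentioned in the list.

```python
def variance_ratio(sds):
    """Ratio of the largest to the smallest sample variance."""
    variances = [s**2 for s in sds]
    if min(variances) == 0:
        raise ValueError("a group with zero variance makes the ratio undefined")
    return max(variances) / min(variances)


def looks_unequal(sds, threshold=2.0):
    """Flag groups whose variance ratio exceeds the (assumed) threshold."""
    return variance_ratio(sds) > threshold


# Illustrative sample SDs of 12 and 14: ratio = 196 / 144.
print(f"{variance_ratio([12, 14]):.2f}", looks_unequal([12, 14]))
# Illustrative sample SDs of 5 and 15: ratio = 225 / 25.
print(f"{variance_ratio([5, 15]):.2f}", looks_unequal([5, 15]))
```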

When Variances Differ

If variances are unequal, use Welch's t-test instead of the pooled t-test, or use separate variance estimates. Welch's test doesn't assume equal variances and is often recommended as the default approach.
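
To show how Welch's approach differs from pooling, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from summary statistics. The function name and the input values are illustrative assumptions. It stops short of a p-value, which requires the t distribution and is not available in the Python standard library.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and approximate degrees of freedom.

    Each group keeps its own variance estimate; nothing is pooled.
    """
    v1 = s1**2 / n1  # squared standard error of mean 1
    v2 = s2**2 / n2  # squared standard error of mean 2
    t_stat = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df


# Made-up summary statistics with clearly unequal SDs and sample sizes.
t_stat, df = welch_t(m1=50, s1=5, n1=10, m2=45, s2=15, n2=40)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The approximate degrees of freedom always fall between the smaller of (n₁-1) and (n₂-1) and the pooled value n₁+n₂-2, and they equal the pooled value when both sample sizes and both variances match.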

Worked Example

Scenario: Comparing test scores between two classes:

Class A: n₁ = 25, mean = 78, s₁ = 12
Class B: n₂ = 30, mean = 82, s₂ = 14

Pooled SD calculation:

sp = √[((25-1)(12)² + (30-1)(14)²) / (25+30-2)]
sp = √[(24×144 + 29×196) / 53]
sp = √[(3456 + 5684) / 53]
sp = √[9140 / 53] = √172.45 = 13.13

The pooled SD of 13.13 falls between the individual SDs (12 and 14), weighted toward the larger sample. This pooled value would then be used in the t-test formula or Cohen's d calculation.

Statistical Applications

Independent samples t-test: The pooled SD is used to calculate the standard error of the difference between means.
Cohen's d effect size: Effect sizes are standardized using the pooled SD: d = (M₁ - M₂) / sp
ANOVA: The Mean Square Error (MSE) in one-way ANOVA is the pooled variance estimate across all groups, so sp = √MSE.
Meta-analysis: When combining studies, pooled estimates help standardize effects across different contexts.
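
The first two applications can be sketched using the Class A and Class B summary statistics from the worked example. The variable names are illustrative assumptions, and the code computes only the test statistic and effect size, not a p-value.

```python
import math

# Summary statistics from the worked example (Class A and Class B).
n1, m1, s1 = 25, 78, 12
n2, m2, s2 = 30, 82, 14

# Pooled SD, as in the two-group formula.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Standard error of the difference between means (pooled t-test).
se_diff = sp * math.sqrt(1 / n1 + 1 / n2)

# Pooled t statistic, evaluated on n1 + n2 - 2 degrees of freedom.
t_stat = (m1 - m2) / se_diff
df = n1 + n2 - 2

# Cohen's d: the mean difference in pooled-SD units.
cohens_d = (m1 - m2) / sp

print(f"sp = {sp:.2f}")        # sp = 13.13
print(f"SE = {se_diff:.2f}")   # SE = 3.56
print(f"t  = {t_stat:.2f} on {df} df")  # t  = -1.12 on 53 df
print(f"d  = {cohens_d:.2f}")  # d  = -0.30
```

The negative signs reflect only the order of subtraction (Class A minus Class B); the magnitude of d, about 0.30, is the standardized difference between the two class means.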
