How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

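The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because every member of the population is included. Sample variance and sample standard deviation usually use n-1 because Bessel's correction removes the bias in the variance when estimating population spread from a sample. The sample standard deviation, as the square root of that variance, is still slightly biased, but less so than the version that divides by n.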
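
A minimal Python sketch of the difference, using only the standard-library statistics module: pstdev divides the sum of squared deviations by N, and stdev divides it by n-1. The five readings are illustrative; they are the Batch A values from the worked example.

```python
import statistics

# Five illustrative readings (the Batch A values from the worked example)
readings = [9.8, 10.1, 10.0, 10.2, 9.9]

# Population SD: sum of squared deviations divided by N
population_sd = statistics.pstdev(readings)
# Sample SD: sum of squared deviations divided by n - 1 (Bessel's correction)
sample_sd = statistics.stdev(readings)

print(f"population SD (divide by N):     {population_sd:.4f}")
print(f"sample SD (divide by n - 1):     {sample_sd:.4f}")
```

On the same data the n-1 version is always slightly larger, because it divides the same sum of squares by a smaller number.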

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

Combined Mean and Standard Deviation from Groups

Quick Answer

TL;DR

To combine group standard deviations, compute the combined mean first, add the within-group and between-group sums of squares, divide by the total degrees of freedom, and take the square root.

Combined mean is the weighted average of group means, using each group size as the weight.
Combined standard deviation is the spread of all observations after the groups are treated as one dataset.
Within-group variation comes from each group's own standard deviation.
Between-group variation comes from group means sitting above or below the combined mean.
Use the sample formula when group SDs are sample SDs; use the population formula only for complete populations.

A student or analyst usually needs this formula after receiving only summaries: group size, group mean, and group standard deviation. The raw rows may be in separate lab sheets, classrooms, production lots, or survey waves. From a data educator's perspective, the practical objective is to recover the standard deviation of the combined dataset without making the common error of averaging the SDs.

This guide focuses on combining group summaries. If your problem is about adding independent random variables, use Combining Standard Deviations. If your problem is a two-sample t-test that assumes equal variances, use Pooled Standard Deviation or the pooled standard deviation calculator.

When This Formula Applies

The combined standard deviation formula applies when you know, for each group, n_i, mean_i, and s_i, and you want the sample standard deviation of the union of all observations.

Situation	Use this article?	Reason
Three classes report n, mean, and sample SD; you need the SD of all students together	Yes	You are combining group summaries into one dataset.
Two labs report instrument error SDs; you need total error SD	No	That is propagation of independent variation; use Combining Standard Deviations.
Two treatment groups need one equal-variance estimate for a t-test	Usually no	That is pooled SD; it estimates shared within-group variability and ignores between-group mean differences.
You have all raw observations in one column	No	Use the sample standard deviation calculator directly.

Do not average standard deviations

Averaging group SDs loses group-size weights and misses between-group spread. The combined SD can be larger than every group SD when group means differ.

Combined Mean Formula

The combined mean is a weighted mean. Each group mean contributes in proportion to the number of observations in that group.

Combined mean

xbar = sum(n_i * xbar_i) / sum(n_i)

This step is required before calculating the combined standard deviation because the between-group term measures how far each group mean is from the combined mean.
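
The weighted mean translates directly into a few lines of Python. This is a sketch: combined_mean is a hypothetical helper name, and it assumes the sizes and means are listed in the same group order.

```python
def combined_mean(sizes, means):
    """Weighted average of group means, using each group size n_i as the weight."""
    if len(sizes) != len(means) or not sizes:
        raise ValueError("need one size per mean and at least one group")
    total_n = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total_n


# Batch sizes and means from the worked example in this article
xbar = combined_mean([5, 4, 3], [10.00, 10.55, 9.70])
print(f"{xbar:.4f}")
```

Because each mean is multiplied by its n_i, a group of 3 counts three times as much as a group of 1, not the same.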

Combined Standard Deviation Formula

For sample standard deviations, combine the sums of squares, not the SDs. The formula has two parts: within-group variation and between-group variation.

Combined sample variance from group summaries

s^2 = [sum((n_i - 1) * s_i^2) + sum(n_i * (xbar_i - xbar)^2)] / (N - 1)

Combined sample standard deviation

s = sqrt(s^2)

Here N = sum(n_i). The first sum rebuilds the within-group sum of squares from each sample SD. The second sum adds the extra spread caused by different group means.

Term	What it measures	Why it matters
sum((n_i - 1) * s_i^2)	Within-group sum of squares	Reconstructs the spread inside each group.
sum(n_i * (xbar_i - xbar)^2)	Between-group sum of squares	Adds spread created by group centers being different.
N - 1	Total sample degrees of freedom	Matches the ordinary sample variance denominator for all observations together.
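
A minimal Python sketch of the sample formula, assuming every s_i is a sample SD and no observation belongs to more than one group. combined_sample_sd is a hypothetical helper name, not a library function.

```python
import math


def combined_sample_sd(sizes, means, sds):
    """Sample SD of the merged data, rebuilt from each group's n_i, mean_i, and sample s_i."""
    total_n = sum(sizes)
    if total_n < 2:
        raise ValueError("need at least two observations in total")
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    # Within-group SS: (n_i - 1) * s_i^2 recovers each group's sum of squared deviations
    within_ss = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    # Between-group SS: spread of group means around the combined mean, weighted by n_i
    between_ss = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    return math.sqrt((within_ss + between_ss) / (total_n - 1))


combined_sd = combined_sample_sd([5, 4, 3], [10.00, 10.55, 9.70], [0.1581, 0.1291, 0.1000])
print(f"{combined_sd:.4f}")
```

With a single group the between-group term is zero, so the function simply returns that group's SD, which is a quick sanity check on the implementation.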

For a full population, replace sample SDs with population SDs and divide by N instead of N - 1:

Combined population variance

sigma^2 = [sum(n_i * sigma_i^2) + sum(n_i * (mu_i - mu)^2)] / N
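
A sketch that checks the population formula against a direct calculation, treating the three inspection batches from the worked example as if they were complete populations (an illustrative assumption only). combined_population_variance is a hypothetical helper; statistics.pvariance and statistics.fmean are standard-library functions (fmean requires Python 3.8 or later).

```python
import statistics


def combined_population_variance(sizes, means, pop_vars):
    """sigma^2 = [sum(n_i * sigma_i^2) + sum(n_i * (mu_i - mu)^2)] / N"""
    total_n = sum(sizes)
    mu = sum(n * m for n, m in zip(sizes, means)) / total_n
    within = sum(n * v for n, v in zip(sizes, pop_vars))            # n_i * sigma_i^2
    between = sum(n * (m - mu) ** 2 for n, m in zip(sizes, means))  # n_i * (mu_i - mu)^2
    return (within + between) / total_n


# Illustrative only: treat each batch as a complete population
groups = [[9.8, 10.1, 10.0, 10.2, 9.9], [10.4, 10.5, 10.6, 10.7], [9.6, 9.7, 9.8]]
sizes = [len(g) for g in groups]
means = [statistics.fmean(g) for g in groups]
pop_vars = [statistics.pvariance(g) for g in groups]

from_groups = combined_population_variance(sizes, means, pop_vars)
from_raw = statistics.pvariance([x for g in groups for x in g])
print(f"from group summaries: {from_groups:.6f}")
print(f"from raw values:      {from_raw:.6f}")
```

The two results agree up to floating-point rounding, which is what the formula promises when every group is a full population.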

Worked Example

For this article, we verified the grouped formula against a raw-row check using three small inspection batches. The summary version below is what an analyst would have if the original batch sheets were already archived.

Batch	Raw readings used for verification	n	Mean	Sample SD
A	9.8, 10.1, 10.0, 10.2, 9.9	5	10.00	0.1581
B	10.4, 10.5, 10.6, 10.7	4	10.55	0.1291
C	9.6, 9.7, 9.8	3	9.70	0.1000
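
If the raw readings are still available, the n, mean, and sample SD columns can be regenerated with the standard-library statistics module. A short sketch; the dictionary layout is just one convenient choice.

```python
import statistics

# Raw readings from the verification table
batches = {
    "A": [9.8, 10.1, 10.0, 10.2, 9.9],
    "B": [10.4, 10.5, 10.6, 10.7],
    "C": [9.6, 9.7, 9.8],
}

# Reduce each batch to the three summaries the grouped formula needs
summaries = {
    name: (len(xs), statistics.mean(xs), statistics.stdev(xs))
    for name, xs in batches.items()
}

for name, (n, mean, sd) in summaries.items():
    print(f"Batch {name}: n={n}, mean={mean:.2f}, sample SD={sd:.4f}")
```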

Find the combined mean

N = 5 + 4 + 3 = 12. Combined mean = (5*10.00 + 4*10.55 + 3*9.70) / 12 = 10.1083.

Compute within-group sum of squares

Within SS = (5-1)*0.1581^2 + (4-1)*0.1291^2 + (3-1)*0.1000^2 = 0.1700. (The raw readings give exactly 0.10 + 0.05 + 0.02 = 0.17; the rounded SDs agree to four decimals.)

Compute between-group sum of squares

Between SS = 5*(10.00-10.1083)^2 + 4*(10.55-10.1083)^2 + 3*(9.70-10.1083)^2 = 1.3392.

Divide by total degrees of freedom

Sample variance = (0.1700 + 1.3392) / 11 = 1.5092 / 11 = 0.1372.

Take the square root

Combined sample SD = sqrt(0.1372) = 0.3704.
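
The five steps above can be replayed in Python with the rounded summaries from the batch table, so each intermediate value can be checked against the text. This is an illustrative sketch, not a required workflow.

```python
import math

sizes = [5, 4, 3]
means = [10.00, 10.55, 9.70]
sds = [0.1581, 0.1291, 0.1000]

N = sum(sizes)                                                       # total count
xbar = sum(n * m for n, m in zip(sizes, means)) / N                  # find the combined mean
within_ss = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))        # within-group SS
between_ss = sum(n * (m - xbar) ** 2 for n, m in zip(sizes, means))  # between-group SS
variance = (within_ss + between_ss) / (N - 1)                        # divide by N - 1
sd = math.sqrt(variance)                                             # take the square root

print(f"N={N}, combined mean={xbar:.4f}")
print(f"within SS={within_ss:.4f}, between SS={between_ss:.4f}")
print(f"sample variance={variance:.4f}, sample SD={sd:.4f}")
```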

Why the answer is larger than the group SDs

The largest group SD is only 0.1581, but the combined SD is 0.3704 because Batch B is centered high and Batch C is centered low. The overall spread includes both the scatter inside each batch and the distance between batch centers.

To audit the arithmetic, paste the 12 raw readings into the sample standard deviation calculator, or verify the squared-spread pieces with the variance calculator and the mean calculator.
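
A sketch of that raw-row audit, assuming the 12 readings from the verification table are available. Besides the raw sample SD, it checks that the total sum of squares equals the within-group plus between-group sums of squares, which is the identity the grouped formula relies on.

```python
import statistics

batches = [
    [9.8, 10.1, 10.0, 10.2, 9.9],
    [10.4, 10.5, 10.6, 10.7],
    [9.6, 9.7, 9.8],
]
readings = [x for batch in batches for x in batch]  # all 12 raw readings in one list

grand_mean = statistics.fmean(readings)
total_ss = sum((x - grand_mean) ** 2 for x in readings)
within_ss = sum(sum((x - statistics.fmean(b)) ** 2 for x in b) for b in batches)
between_ss = sum(len(b) * (statistics.fmean(b) - grand_mean) ** 2 for b in batches)
raw_sd = statistics.stdev(readings)

print(f"raw sample SD = {raw_sd:.4f}")
print(f"total SS = {total_ss:.4f}, within + between = {within_ss + between_ss:.4f}")
```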

Decision Checklist

Use this formula when each group summarizes the same measurement unit.
Use group sizes as weights; do not give a group of 3 the same influence as a group of 300.
Confirm whether each reported SD is sample SD or population SD before choosing the denominator.
Include the between-group term when your goal is the SD of all observations combined.
Exclude the between-group term only when estimating a shared within-group SD for a pooled t-test.

Common Mistakes

Mistake: averaging SDs

Average SD = (0.1581 + 0.1291 + 0.1000) / 3 = 0.1291, which misses the group-center spread.
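
A short sketch contrasting the two calculations on the worked-example summaries. Variable names are illustrative.

```python
import math

sizes = [5, 4, 3]
means = [10.00, 10.55, 9.70]
sds = [0.1581, 0.1291, 0.1000]

# The mistake: an unweighted average of the group SDs
naive_sd = sum(sds) / len(sds)

# The correct approach: combine sums of squares, then take the square root
N = sum(sizes)
xbar = sum(n * m for n, m in zip(sizes, means)) / N
ss = sum((n - 1) * s ** 2 + n * (m - xbar) ** 2 for n, m, s in zip(sizes, means, sds))
combined_sd = math.sqrt(ss / (N - 1))

print(f"average of SDs = {naive_sd:.4f}")
print(f"combined SD    = {combined_sd:.4f}")
```

The averaged value sits inside the range of the group SDs, while the combined SD exceeds all of them because it also counts the gaps between batch centers.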

Mistake: using pooled SD

Pooled SD uses only within-group variation. It is useful for equal-variance tests, not for the SD of the merged dataset.
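
For comparison, a sketch of the usual pooled estimate for k groups, sqrt(sum((n_i - 1) * s_i^2) / (N - k)), next to the combined SD on the same summaries. Only the combined version includes the between-group term.

```python
import math

sizes = [5, 4, 3]
means = [10.00, 10.55, 9.70]
sds = [0.1581, 0.1291, 0.1000]
N, k = sum(sizes), len(sizes)

within_ss = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

# Pooled SD: within-group variation only, divided by N - k degrees of freedom
pooled_sd = math.sqrt(within_ss / (N - k))

# Combined SD: within-group plus between-group variation, divided by N - 1
xbar = sum(n * m for n, m in zip(sizes, means)) / N
between_ss = sum(n * (m - xbar) ** 2 for n, m in zip(sizes, means))
combined_sd = math.sqrt((within_ss + between_ss) / (N - 1))

print(f"pooled SD   = {pooled_sd:.4f}")
print(f"combined SD = {combined_sd:.4f}")
```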

Mistake: mixing units

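Do not combine groups recorded in different units, such as centimeters and inches, until every group's mean and SD have been converted to the same unit. Values that measure different quantities, such as percentages and dollars, should not be merged into one SD at all.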
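
A minimal sketch of converting a group summary before combining, assuming a pure scale conversion such as inches to centimeters (1 in = 2.54 cm). Multiplying every value by a constant multiplies both the mean and the SD by that constant and leaves n unchanged. The helper name inches_to_cm and the group values are hypothetical.

```python
CM_PER_INCH = 2.54


def inches_to_cm(n, mean_in, sd_in):
    """Convert a group's (n, mean, SD) summary from inches to centimeters.

    A pure scale factor multiplies both the mean and the SD; n is unchanged.
    """
    return n, mean_in * CM_PER_INCH, sd_in * CM_PER_INCH


group_in = (10, 4.0, 0.5)  # hypothetical group: n, mean, SD in inches
group_cm = inches_to_cm(*group_in)
print(group_cm)
```

For conversions that also add an offset, such as Fahrenheit to Celsius, the offset shifts the mean but not the SD, so only the multiplicative factor is applied to the SD.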

Mistake: hiding imbalance

A small high-variance group can look dramatic, but a much larger stable group may dominate the combined result.

FAQ

Can I combine standard deviations without group means?

Not if you need the SD of the full combined dataset. You need group means to calculate between-group variation.

Is combined SD the same as pooled SD?

No. Combined SD includes differences between group means. Pooled SD estimates a shared within-group SD and is usually smaller when group means differ.

What if the groups overlap?

Do not use this formula for overlapping groups. The observations would be double-counted, so N and the sums of squares would be wrong.

What if I only have variances?

Use variances directly in the formula by replacing s_i^2 with the reported variance. Do not square a value that is already a variance.



Reading goal	What to focus on	Common mistake
Definition	What the metric is and what quantity it summarizes	Treating the formula as self-explanatory
Formula choice	Sample versus population assumptions and notation	Using n when n-1 is required or vice versa
Interpretation	Whether the result indicates concentration, spread, or risk	Calling a large value good or bad without context