
Combined Mean and Standard Deviation from Groups

Learn the combined standard deviation formula for grouped summary statistics, including the combined mean, within-group variation, between-group variation, and a worked example.

By Standard Deviation Calculator Team · Data Science Team · Published

Quick Answer

TL;DR

To combine group standard deviations, compute the combined mean first, add the within-group and between-group sums of squares, divide by the total degrees of freedom, then take the square root.
  • Combined mean is the weighted average of group means, using each group size as the weight.
  • Combined standard deviation is the spread of all observations after the groups are treated as one dataset.
  • Within-group variation comes from each group's own standard deviation.
  • Between-group variation comes from group means sitting above or below the combined mean.
  • Use the sample formula when group SDs are sample SDs; use the population formula only for complete populations.

A student or analyst usually needs this formula after receiving only summaries: group size, group mean, and group standard deviation. The raw rows may be in separate lab sheets, classrooms, production lots, or survey waves. The practical objective is to recover the standard deviation of the combined dataset without making the common error of averaging the SDs.

This guide focuses on combining group summaries. If your problem is about adding independent random variables, use Combining Standard Deviations. If your problem is a two-sample t-test that assumes equal variances, use Pooled Standard Deviation or the pooled standard deviation calculator.

When This Formula Applies

The combined standard deviation formula applies when you know, for each group i, the size n_i, the mean xbar_i, and the sample standard deviation s_i, and you want the sample standard deviation of the union of all observations.

Situation | Use this article? | Reason
Three classes report n, mean, and sample SD; you need the SD of all students together | Yes | You are combining group summaries into one dataset.
Two labs report instrument error SDs; you need total error SD | No | That is propagation of independent variation; use Combining Standard Deviations.
Two treatment groups need one equal-variance estimate for a t-test | Usually no | That is pooled SD; it estimates shared within-group variability and ignores between-group mean differences.
You have all raw observations in one column | No | Use the sample standard deviation calculator directly.

Do not average standard deviations

Averaging group SDs loses group-size weights and misses between-group spread. The combined SD can be larger than every group SD when group means differ.

Combined Mean Formula

The combined mean is a weighted mean. Each group mean contributes in proportion to the number of observations in that group.

Combined mean

xbar = sum(n_i * xbar_i) / sum(n_i)

This step is required before calculating the combined standard deviation because the between-group term measures how far each group mean is from the combined mean.
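
As a minimal sketch of the weighted-mean step, the helper below takes group sizes and group means as plain lists. The function name `combined_mean` is an assumption made for illustration; the inputs are the batch summaries from the worked example in this article.

```python
def combined_mean(sizes, means):
    """Weighted mean of group means, using each group size as the weight."""
    if not sizes or len(sizes) != len(means):
        raise ValueError("sizes and means must be non-empty and the same length")
    total_n = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total_n


# Batch summaries (n and mean) from the worked example.
print(round(combined_mean([5, 4, 3], [10.00, 10.55, 9.70]), 4))  # 10.1083
```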

Combined Standard Deviation Formula

For sample standard deviations, combine the sums of squares, not the SDs. The formula has two parts: within-group variation and between-group variation.

Combined sample variance from group summaries

s^2 = [sum((n_i - 1) * s_i^2) + sum(n_i * (xbar_i - xbar)^2)] / (N - 1)

Combined sample standard deviation

s = sqrt(s^2)

Here N = sum(n_i). The first sum rebuilds the within-group sum of squares from each sample SD. The second sum adds the extra spread caused by different group means.
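
The sample formula can be sketched in a few lines of Python. The function name `combined_sample_sd` and the two small groups used to exercise it are assumptions for illustration; the groups are summaries of the hypothetical readings 1, 2, 3 and 4, 5, 6, chosen because the answer is easy to check by hand.

```python
import math


def combined_sample_sd(sizes, means, sds):
    """Sample SD of the merged data, from per-group n, mean, and sample SD."""
    total_n = sum(sizes)
    if total_n < 2:
        raise ValueError("need at least two observations in total")
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    # Within-group SS: undo each group's (n_i - 1) divisor.
    within_ss = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    # Between-group SS: squared distance of each group mean from the combined mean.
    between_ss = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    return math.sqrt((within_ss + between_ss) / (total_n - 1))


# Hypothetical groups: {1, 2, 3} has mean 2 and SD 1; {4, 5, 6} has mean 5 and SD 1.
print(round(combined_sample_sd([3, 3], [2.0, 5.0], [1.0, 1.0]), 4))  # 1.8708
```

The result matches the sample SD of the six values 1 through 6, whose sample variance is 17.5 / 5 = 3.5.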

Term | What it measures | Why it matters
sum((n_i - 1) * s_i^2) | Within-group sum of squares | Reconstructs the spread inside each group.
sum(n_i * (xbar_i - xbar)^2) | Between-group sum of squares | Adds spread created by group centers being different.
N - 1 | Total sample degrees of freedom | Matches the ordinary sample variance denominator for all observations together.
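
One way to see why the two sums add up to the total is the standard sum-of-squares decomposition, sketched below. The notation x_{ij} for observation j in group i is an assumption introduced here; the other symbols follow the formulas in this article.

```latex
\begin{aligned}
\sum_{i}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
  &= \sum_{i}\sum_{j=1}^{n_i} \bigl[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\bigr]^2 \\
  &= \sum_{i}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
     + \sum_{i} n_i (\bar{x}_i - \bar{x})^2 \\
  &= \sum_{i} (n_i - 1)\, s_i^2 + \sum_{i} n_i (\bar{x}_i - \bar{x})^2 .
\end{aligned}
```

The cross term drops out because the deviations inside each group sum to zero, that is, \sum_j (x_{ij} - \bar{x}_i) = 0. Dividing the left-hand side by N - 1 gives the sample variance of the merged data.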

For a full population, replace sample SDs with population SDs and divide by N instead of N - 1:

Combined population variance

sigma^2 = [sum(n_i * sigma_i^2) + sum(n_i * (mu_i - mu)^2)] / N
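
The population version can be checked the same way. The sketch below uses a hypothetical helper named `combined_population_sd` and two made-up groups, and compares the result against `statistics.pstdev` applied to the merged values.

```python
import math
import statistics


def combined_population_sd(sizes, means, sigmas):
    """Population SD of the merged data, from per-group N, mean, and population SD."""
    total_n = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    within_ss = sum(n * s ** 2 for n, s in zip(sizes, sigmas))
    between_ss = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    return math.sqrt((within_ss + between_ss) / total_n)


# Hypothetical complete populations, used only to verify the formula.
group_a = [1, 2, 3]
group_b = [4, 5, 6]

from_summaries = combined_population_sd(
    [len(group_a), len(group_b)],
    [statistics.mean(group_a), statistics.mean(group_b)],
    [statistics.pstdev(group_a), statistics.pstdev(group_b)],
)
from_raw = statistics.pstdev(group_a + group_b)
print(math.isclose(from_summaries, from_raw))  # True
```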

Worked Example

For this article, we verified the grouped formula against a raw-row check using three small inspection batches. The summary version below is what an analyst would have if the original batch sheets were already archived.

Batch | Raw readings used for verification | n | Mean | Sample SD
A | 9.8, 10.1, 10.0, 10.2, 9.9 | 5 | 10.00 | 0.1581
B | 10.4, 10.5, 10.6, 10.7 | 4 | 10.55 | 0.1291
C | 9.6, 9.7, 9.8 | 3 | 9.70 | 0.1000

  1. Find the combined mean

N = 5 + 4 + 3 = 12. Combined mean = (5*10.00 + 4*10.55 + 3*9.70) / 12 = 10.1083.

  2. Compute within-group sum of squares

Within SS = (5-1)*0.1581^2 + (4-1)*0.1291^2 + (3-1)*0.1000^2 = 0.1700 after rounding.

  3. Compute between-group sum of squares

Between SS = 5*(10.00-10.1083)^2 + 4*(10.55-10.1083)^2 + 3*(9.70-10.1083)^2 = 1.3392.

  4. Divide by total degrees of freedom

Sample variance = (0.1700 + 1.3392) / 11 = 0.1372.

  5. Take the square root

Combined sample SD = sqrt(0.1372) = 0.3704.

Why the answer is larger than the group SDs

The largest group SD is only 0.1581, but the combined SD is 0.3704 because Batch B is centered high and Batch C is centered low. The overall spread includes both the scatter inside each batch and the distance between batch centers.

To audit the arithmetic, paste the 12 raw readings into the sample standard deviation calculator, or verify the squared-spread pieces with the variance calculator and the mean calculator.
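
The same audit can be scripted. The sketch below rebuilds the per-batch summaries from the raw readings in the table, applies the grouped formula, and compares the result with `statistics.stdev` on all 12 readings. Variable names are illustrative assumptions.

```python
import math
import statistics

# Raw readings from the three inspection batches in the worked example.
batches = {
    "A": [9.8, 10.1, 10.0, 10.2, 9.9],
    "B": [10.4, 10.5, 10.6, 10.7],
    "C": [9.6, 9.7, 9.8],
}

# Summaries an analyst would receive: (n, mean, sample SD) per batch.
summaries = [
    (len(rows), statistics.mean(rows), statistics.stdev(rows))
    for rows in batches.values()
]

total_n = sum(n for n, _, _ in summaries)
grand_mean = sum(n * m for n, m, _ in summaries) / total_n
within_ss = sum((n - 1) * s ** 2 for n, _, s in summaries)
between_ss = sum(n * (m - grand_mean) ** 2 for n, m, _ in summaries)
sd_from_summaries = math.sqrt((within_ss + between_ss) / (total_n - 1))

# Direct calculation on the merged raw rows.
all_rows = [value for rows in batches.values() for value in rows]
sd_from_raw = statistics.stdev(all_rows)

print(round(sd_from_summaries, 4), round(sd_from_raw, 4))  # 0.3704 0.3704
```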

Decision Checklist

  • Use this formula when each group summarizes the same measurement unit.
  • Use group sizes as weights; do not give a group of 3 the same influence as a group of 300.
  • Confirm whether each reported SD is sample SD or population SD before choosing the denominator.
  • Include the between-group term when your goal is the SD of all observations combined.
  • Exclude the between-group term only when estimating a shared within-group SD for a pooled t-test.
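
The group-size point in the checklist can be illustrated with a short comparison. The sizes and means below are hypothetical values chosen to exaggerate the effect, not data from this article.

```python
# Hypothetical summaries: a group of 3 with a high mean, a group of 300 with a low mean.
sizes = [3, 300]
means = [50.0, 10.0]

# Unweighted average treats both groups as equally influential.
unweighted = sum(means) / len(means)

# Weighted average uses each group size as the weight.
weighted = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

print(unweighted)          # 30.0
print(round(weighted, 3))  # 10.396
```

The weighted value stays close to the large group's mean, which is what the mean of all 303 observations would be.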

Common Mistakes

Mistake: averaging SDs

Average SD = (0.1581 + 0.1291 + 0.1000) / 3 = 0.1291, which misses the group-center spread.

Mistake: using pooled SD

Pooled SD uses only within-group variation. It is useful for equal-variance tests, not for the SD of the merged dataset.

Mistake: mixing units

Do not combine centimeters, inches, percentages, and dollars in one SD unless every value has been converted to a common scale.

Mistake: hiding imbalance

A small high-variance group can look dramatic, but a much larger stable group may dominate the combined result.
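
The first two mistakes above can be compared numerically using the batch summaries from the worked example. In this sketch the pooled SD uses the usual equal-variance denominator N - k, where k is the number of groups; that denominator is a standard convention assumed here, not a formula stated in this article.

```python
import math

# Batch summaries from the worked example: n, mean, sample SD.
sizes = [5, 4, 3]
means = [10.00, 10.55, 9.70]
sds = [0.1581, 0.1291, 0.1000]

total_n = sum(sizes)
grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
within_ss = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
between_ss = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))

# Mistake 1: plain average of the SDs (no weights, no between-group term).
average_sd = sum(sds) / len(sds)
# Mistake 2: pooled SD (within-group variation only, N - k degrees of freedom).
pooled_sd = math.sqrt(within_ss / (total_n - len(sizes)))
# Combined SD of the merged dataset (within plus between, N - 1 degrees of freedom).
combined_sd = math.sqrt((within_ss + between_ss) / (total_n - 1))

print(f"average of SDs: {average_sd:.4f}")   # 0.1291
print(f"pooled SD:      {pooled_sd:.4f}")    # 0.1374
print(f"combined SD:    {combined_sd:.4f}")  # 0.3704
```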

FAQ

  • Can I combine standard deviations without group means? Not if you need the SD of the full combined dataset. You need group means to calculate between-group variation.
  • Is combined SD the same as pooled SD? No. Combined SD includes differences between group means. Pooled SD estimates a shared within-group SD and is usually smaller when group means differ.
  • What if the groups overlap? Do not use this formula for overlapping groups. The observations would be double-counted, so N and the sums of squares would be wrong.
  • What if I only have variances? Use variances directly in the formula by replacing s_i^2 with the reported variance. Do not square a value that is already a variance.

Further Reading

Sources

References and further authoritative reading used in preparing this article.

  1. NIST/SEMATECH e-Handbook of Statistical Methods: Measures of Scale (NIST)
  2. NIST/SEMATECH e-Handbook of Statistical Methods: Chi-Square Test for the Variance (NIST)

How to Read This Article

A statistics tutorial is a practical interpretation guide, not just a formula dump. It sets out the assumptions, notation, and reporting language that analysts need when they explain a result to a teacher, manager, client, or reviewer. The article body covers the specific topic, while the sections below give a common interpretation frame that readers can reuse across related metrics.

Reading goal | What to focus on | Common mistake
Definition | What the metric is and what quantity it summarizes | Treating the formula as self-explanatory
Formula choice | Sample versus population assumptions and notation | Using n when n-1 is required or vice versa
Interpretation | Whether the result indicates concentration, spread, or risk | Calling a large value good or bad without context

Frequently Asked Questions

How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

Authoritative References

These sources define the concepts referenced most often across our articles. Bessel's correction is a sample adjustment, variance is a squared measure of spread, and standard deviation is the square root of variance expressed in the same units as the data.