What is the Standard Deviation Formula?
The standard deviation formula is the mathematical equation used to quantify the amount of variation or dispersion in a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (μ or x̄), while a high standard deviation indicates that the data points are spread out over a wider range of values.
In statistics, the formula you use depends on whether you are working with an entire population or a sample drawn from that population. The core concept involves calculating the average of the squared deviations from the mean, known as the variance (σ²), and then taking the square root to return the measurement to the original units.
Population Standard Deviation
- σ (lowercase sigma): Population standard deviation
- Σ (capital sigma): Sum of...
- xi: Each individual value in the dataset
- μ (mu): Population mean
- N: Total number of data points in the population
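Using the symbols defined above, the population standard deviation can be written as follows. This is the standard textbook form; the subscripted x_i corresponds to the "xi" in the list, and the index i running from 1 to N is notation assumed here.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```

The expression under the square root is the population variance, σ².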
Population vs. Sample Standard Deviation
In real-world data analysis, it is rare to have data for an entire population. Most of the time, we collect a sample to make inferences about the larger population. Because the sample mean only estimates the population mean, applying the population formula to a sample underestimates the true variability on average. To correct this bias, we use the sample standard deviation formula.
Sample Standard Deviation
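In the usual notation, the sample standard deviation can be written as follows. The symbols s (sample standard deviation) and n (number of data points in the sample) are assumed here by convention; x̄ is the sample mean.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The only differences from the population formula are the mean used (x̄ in place of μ) and the divisor (n-1 in place of N).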
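Don't mix up your formulas! Divide by N when the data cover the entire population, and by n-1 when the data are a sample drawn from it.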
Step-by-Step Calculation of the Formula
Calculating standard deviation by hand requires a systematic approach. By following these steps, you can accurately compute either the population or sample standard deviation for any dataset.
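1. Calculate the Mean: add up all the values and divide by the number of values.
2. Find the Deviations: subtract the mean from each individual value.
3. Square the Deviations: square each deviation so that negative and positive differences do not cancel.
4. Sum the Squared Deviations: add all the squared deviations together.
5. Divide by N or n-1: divide the sum by N for a population or by n-1 for a sample. The result is the variance.
6. Take the Square Root: take the square root of the variance to return to the original units.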
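The six steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name standard_deviation, its sample flag, and the dataset are all hypothetical. The standard library's statistics.pstdev and statistics.stdev are used only as a cross-check.

```python
import math
import statistics


def standard_deviation(values, sample=False):
    """Standard deviation of `values`, computed step by step."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("need at least 1 value (population) or 2 (sample)")
    mean = sum(values) / n                        # Step 1: calculate the mean
    deviations = [x - mean for x in values]       # Step 2: find the deviations
    squared = [d ** 2 for d in deviations]        # Step 3: square the deviations
    total = sum(squared)                          # Step 4: sum the squared deviations
    variance = total / (n - 1 if sample else n)   # Step 5: divide by n-1 or N
    return math.sqrt(variance)                    # Step 6: take the square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative dataset (an assumption)

population_sd = standard_deviation(data)             # 2.0
sample_sd = standard_deviation(data, sample=True)    # about 2.138

# Cross-check against the standard library.
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

For this dataset the mean is 5 and the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7. The sample result is larger, as expected from the smaller divisor.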
Why Does the Sample Formula Divide by n-1?
Dividing by n-1 instead of n is known as Bessel's correction. Because the sample mean (x̄) is calculated from the sample data itself, the deviations (xi - x̄) are constrained to sum to zero, so only n-1 of them are free to vary. The sample mean is also the value that minimizes the sum of squared deviations, which means the squared deviations measured from x̄ are, in total, never larger than those measured from the true population mean (μ).
By dividing by n-1 (the degrees of freedom), we inflate the variance just enough to compensate for this underestimation, providing an unbiased estimator of the population variance.
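The effect of Bessel's correction can be checked exactly on a small case. The sketch below assumes a hypothetical eight-value population and samples of size 2 drawn with replacement; it enumerates every possible sample and averages the two variance estimates. All names are illustrative.

```python
import itertools
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical population
sigma_squared = statistics.pvariance(population)    # true population variance

n = 2
# Every ordered sample of size n, drawn with replacement (8**2 = 64 samples).
samples = list(itertools.product(population, repeat=n))


def mean_squared_deviation(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


# Average each estimator over all equally likely samples.
avg_divide_by_n = sum(mean_squared_deviation(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(mean_squared_deviation(s, n - 1) for s in samples) / len(samples)

print(sigma_squared, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Under these assumptions, dividing by n-1 averages to the true variance, while dividing by n averages to (n-1)/n times the true variance, which is half of it when n = 2.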