Quick Answer
A normality test checks whether standard-deviation rules that assume a normal distribution are reasonable for your data. Look at a histogram or Q-Q plot first, then add a Shapiro-Wilk test for small samples. If the data are skewed, bounded, or heavy-tailed, report the standard deviation as a measure of spread but avoid normal percentiles and 3-sigma claims.
TL;DR
Background: a student or analyst calculates a standard deviation and then wants to say that about 95% of values fall within two standard deviations. Perspective: this guide is written the way a senior statistician and data educator would explain it. Objective: decide whether the normal model is good enough to support that specific standard-deviation statement. Key result: combine a graph, a test, and a decision rule written in advance, instead of treating one p-value as permission.
What Normality Means for Standard Deviation
A normal distribution is a symmetric probability model defined by a mean and a standard deviation. A normality test is a diagnostic check for whether a dataset is plausible under that model. A Q-Q plot is a graph that compares observed quantiles with the quantiles expected from a normal distribution.
Standard deviation itself does not require normal data. You can calculate a standard deviation for skewed incomes, reaction times, delivery delays, or lab measurements. The normality issue appears when you attach normal-distribution meanings to that standard deviation, such as z-score percentiles, 68-95-99.7 coverage, or three-sigma outlier thresholds.
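Sample standard deviation: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
Normal z-score: $z = \dfrac{x - \bar{x}}{s}$, the distance of a value $x$ from the sample mean $\bar{x}$ in standard-deviation units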
| Your statement | Needs normality? | Better check |
|---|---|---|
| The sample standard deviation is 58.1 ms. | No | Standard deviation calculator |
| This point is 2.6 SDs above the mean. | No for the arithmetic, yes for rarity language | Z-score calculator |
| About 95% of values should fall within 2 SDs. | Yes | Empirical rule guide |
| A 3-sigma point is a rare signal. | Yes, unless you use it only as a process rule | Three sigma rule |
| At least 75% fall within 2 SDs. | No | Empirical rule vs Chebyshev |
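The last three rows of the table differ only in what they assume. A minimal Python sketch using the standard library's `statistics.NormalDist` compares the normal-model share within k SDs (meaningful only if the normal model fits) with Chebyshev's floor of $1 - 1/k^2$, which holds for any distribution. The helper names `normal_coverage` and `chebyshev_floor` are illustrative.

```python
from statistics import NormalDist


def normal_coverage(k: float) -> float:
    """Share of a normal distribution within k SDs of its mean (assumes normality)."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def chebyshev_floor(k: float) -> float:
    """Chebyshev's guaranteed minimum share within k SDs, for any distribution.

    Only informative for k > 1; for k <= 1 the bound is zero or negative.
    """
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"within {k} SDs: normal model {normal_coverage(k):.4f}, "
          f"Chebyshev at least {chebyshev_floor(k):.4f}")
```

The normal column reproduces the 95% and 99.7% of the empirical rule; the Chebyshev column gives the 75% in the table's last row and about 89% at three SDs.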
Worked Example: Reaction-Time Data
First-hand teaching example: in a classroom analysis exercise, I used these 18 reaction times in milliseconds from a simple attention task: 412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488, 496, 505, 514, 530, 552, 589, 641. The student question was whether 641 ms should be called a normal-model outlier.
| Quantity | Value | What it suggests |
|---|---|---|
| n | 18 | Small enough that one extreme value can shape the conclusion. |
| Mean | 490.94 ms | Pulled upward by the long right tail. |
| Median | 478.50 ms | Lower than the mean, a sign of right skew. |
| Sample SD | 58.08 ms | Useful as spread, but sensitive to the 589 and 641 ms values. |
| Skewness | About 1.09 | Right tail is too visible to ignore. |
| 641 ms z-score | (641 - 490.94) / 58.08 = 2.58 | High, but the normal percentile is questionable. |
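The table values can be reproduced with Python's standard library. This sketch assumes the skewness in the table is the uncorrected moment coefficient (central moments divided by n), which reproduces the 1.09 figure; packages that apply a small-sample correction report a somewhat larger number for the same data.

```python
from statistics import fmean, median, stdev

# The 18 reaction times (ms) from the worked example.
reaction_ms = [412, 430, 438, 445, 451, 458, 462, 469, 475,
               482, 488, 496, 505, 514, 530, 552, 589, 641]


def moment_skewness(data):
    """Uncorrected moment skewness g1 = m3 / m2**1.5, with central moments divided by n."""
    n = len(data)
    center = fmean(data)
    m2 = sum((x - center) ** 2 for x in data) / n
    m3 = sum((x - center) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


mean_ms = fmean(reaction_ms)
median_ms = median(reaction_ms)
sd_ms = stdev(reaction_ms)          # sample SD: divides by n - 1
skew = moment_skewness(reaction_ms)
z_641 = (641 - mean_ms) / sd_ms     # arithmetic only; no normality needed

print(f"n={len(reaction_ms)}  mean={mean_ms:.2f}  median={median_ms:.2f}  "
      f"SD={sd_ms:.2f}  skewness={skew:.2f}  z(641)={z_641:.2f}")
```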
Inspect the shape
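A text histogram is enough for a first look at shape. This sketch bins the 18 reaction times into 50 ms bins starting at 400 ms; both numbers are arbitrary choices, and, as the histogram row in the checks table below warns, other bin widths can make the tail look stronger or weaker. The function name `binned_counts` is hypothetical.

```python
from collections import Counter

reaction_ms = [412, 430, 438, 445, 451, 458, 462, 469, 475,
               482, 488, 496, 505, 514, 530, 552, 589, 641]

BIN_START = 400  # assumed lower edge; every value must be >= this
BIN_WIDTH = 50   # assumed bin width in ms


def binned_counts(data, start, width):
    """Counts in half-open bins [start + i*width, start + (i+1)*width)."""
    counts = Counter(int((x - start) // width) for x in data)
    # Include empty bins up to the highest occupied one so gaps stay visible.
    return [(start + i * width, counts[i]) for i in range(max(counts) + 1)]


for lower, count in binned_counts(reaction_ms, BIN_START, BIN_WIDTH):
    print(f"[{lower}, {lower + BIN_WIDTH}) ms | {'#' * count}")
```

With these bins the counts are 4, 8, 3, 2, 1: a peak in the 450 to 499 ms bin and a thin tail stretching to the right.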
Separate spread from probability
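The z-score is plain arithmetic, but turning it into a probability depends on a distributional model. This sketch computes the 641 ms z-score, the upper-tail share a normal model would assign to it, and Chebyshev's distribution-free cap of $1/k^2$ on the share of values lying at least k SDs from the mean in either direction.

```python
from statistics import NormalDist, fmean, stdev

reaction_ms = [412, 430, 438, 445, 451, 458, 462, 469, 475,
               482, 488, 496, 505, 514, 530, 552, 589, 641]

mean_ms = fmean(reaction_ms)
sd_ms = stdev(reaction_ms)

# 1. Spread arithmetic: valid for any numeric data.
z = (641 - mean_ms) / sd_ms

# 2. Probability under a normal model: only as good as the normality check.
normal_upper_tail = 1 - NormalDist().cdf(z)

# 3. Chebyshev: at most 1/k^2 of values lie k or more SDs from the mean, any shape.
chebyshev_cap = 1 / z**2

print(f"z = {z:.2f}")
print(f"normal-model share above 641 ms: {normal_upper_tail:.4f}")
print(f"Chebyshev cap on |z| >= {z:.2f}: {chebyshev_cap:.3f}")
```

The normal model puts roughly 0.5% of values above 641 ms, while Chebyshev only guarantees that at most about 15% lie 2.58 or more SDs from the mean. The gap between those two numbers is what the normality assumption is buying for a right-skewed sample of 18.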
Choose the decision language
Plain-English report sentence
Which Normality Check Should You Use?
NIST's normal probability plot guidance uses linearity of the plotted points as evidence that the normal distribution is a reasonable model. Shapiro and Wilk's 1965 Biometrika paper introduced a formal test for normality that is still common in statistical software. Use both ideas: a graph shows the failure mode, while a test gives a reproducible threshold.
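The plot-linearity idea can be sketched with the standard library. The code below pairs each sorted value with a standard-normal quantile and summarizes straightness with a Pearson correlation. Two assumptions to flag: the plotting positions use Blom's approximation $(i - 3/8)/(n + 1/4)$, while other software may use slightly different positions; and the squared correlation is the idea behind the Shapiro-Francia statistic, a close relative of Shapiro-Wilk, not the Shapiro-Wilk test itself. For the actual test, use a statistics package such as `scipy.stats.shapiro`.

```python
from statistics import NormalDist, fmean

reaction_ms = [412, 430, 438, 445, 451, 458, 462, 469, 475,
               482, 488, 496, 505, 514, 530, 552, 589, 641]


def normal_qq_points(data):
    """(theoretical normal quantile, observed value) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    # Blom plotting positions (assumed); other conventions shift points slightly.
    theoretical = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs: closer to 1 means a straighter plot."""
    t_vals = [t for t, _ in points]
    o_vals = [o for _, o in points]
    mt, mo = fmean(t_vals), fmean(o_vals)
    sxy = sum((t - mt) * (o - mo) for t, o in points)
    sxx = sum((t - mt) ** 2 for t in t_vals)
    syy = sum((o - mo) ** 2 for o in o_vals)
    return sxy / (sxx * syy) ** 0.5


points = normal_qq_points(reaction_ms)
for theoretical, observed in points:
    print(f"{theoretical:+.3f}  {observed}")
print(f"Q-Q correlation: {qq_correlation(points):.3f}")
```

Plotting the pairs shows the failure mode; for these reaction times the right-hand end bends upward, and the 641 ms point sits farthest above a fitted line.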
| Check | Best use | Weak spot |
|---|---|---|
| Histogram | Fast screen for skew, gaps, multiple peaks, and impossible values. | Bin choices can hide or exaggerate shape. |
| Normal Q-Q plot | Best everyday graph for checking whether normal quantiles fit the data. | Small samples can look noisy even when the model is acceptable. |
| Shapiro-Wilk test | Useful formal test for small to moderate samples. | A p-value below 0.05 says the exact normal model is doubtful, not how harmful the departure is. |
| Skewness and kurtosis | Good summary of direction and tail behavior. | Two numbers cannot show every shape problem. |
| Domain constraints | Essential for bounded values such as times, concentrations, and percentages. | Often ignored because it is not a software output. |
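Software reports skewness and kurtosis under different conventions, so it helps to know which one a table uses. The uncorrected moment versions are below; with these definitions the reaction-time sample gives $g_1 \approx 1.09$, matching the worked example, while the bias-adjusted $G_1$ that some packages report is about 1.19 for the same 18 values. For a normal distribution both skewness and excess kurtosis $g_2$ are 0.

```latex
m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r
\qquad
g_1 = \frac{m_3}{m_2^{3/2}}
\qquad
g_2 = \frac{m_4}{m_2^{2}} - 3
\qquad
G_1 = g_1\,\frac{\sqrt{n(n-1)}}{n-2}
```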
Do not outsource judgment to one p-value
Decision Criteria
Use this checklist before applying normal-based standard deviation rules. The criteria are stricter when the decision has cost: rejecting a lab batch, flagging a customer, setting a safety limit, or publishing an inference.
- Use normal SD rules when the histogram is roughly symmetric and single-peaked.
- Use normal SD rules when the Q-Q plot is close to a straight line through the middle and tails.
- Treat Shapiro-Wilk p >= 0.05 as no strong evidence against normality, not proof that data are normal.
- Avoid normal percentiles when data are clearly bounded, strongly skewed, zero-inflated, or mixed from two groups.
- For skewed positive data, consider a log transform or the geometric standard deviation.
- For outlier-resistant summaries, compare the SD with robust measures of spread such as the median absolute deviation (MAD) or the interquartile range (IQR).
- For distribution-free coverage statements, use Chebyshev's theorem instead of the empirical rule.
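The log-scale and robust alternatives in the list can be sketched with the standard library. The MAD here is unscaled, the IQR uses the default `"exclusive"` method of `statistics.quantiles` (other software may use a different quantile rule and give a slightly different IQR), and the geometric SD is the exponential of the sample SD of the log values, which requires strictly positive data. Helper names are illustrative.

```python
import math
from statistics import fmean, median, quantiles, stdev

reaction_ms = [412, 430, 438, 445, 451, 458, 462, 469, 475,
               482, 488, 496, 505, 514, 530, 552, 589, 641]


def mad(data):
    """Median absolute deviation from the median (unscaled)."""
    center = median(data)
    return median([abs(x - center) for x in data])


def iqr(data):
    """Interquartile range using statistics.quantiles' default 'exclusive' method."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


def geometric_mean_and_sd(data):
    """Geometric mean and geometric SD from the logs of strictly positive data."""
    logs = [math.log(x) for x in data]
    return math.exp(fmean(logs)), math.exp(stdev(logs))


gm, gsd = geometric_mean_and_sd(reaction_ms)
print(f"SD {stdev(reaction_ms):.2f} ms | MAD {mad(reaction_ms):.1f} ms | "
      f"IQR {iqr(reaction_ms):.1f} ms")
print(f"geometric mean {gm:.1f} ms, geometric SD x{gsd:.3f}")
```

For these data the MAD is 30.5 ms and the IQR is 68.5 ms. Under normality, 1.4826 × MAD and IQR / 1.349 both estimate the SD; here they give about 45 ms and 51 ms, below the 58.08 ms SD, which is consistent with the right tail inflating the SD.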
Use normal SD language
Use descriptive SD only
Switch methods
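One way to make the checklist a rule written in advance is to encode it. This sketch maps judgments you have already made from plots and software output onto the three outcomes above. The field names, the order of the checks, and the default alpha of 0.05 are assumptions for illustration, not a standard procedure; a stricter alpha or stricter shape judgments fit decisions that carry a cost.

```python
from dataclasses import dataclass


@dataclass
class ShapeEvidence:
    """Hypothetical container for the checklist judgments."""
    roughly_symmetric: bool   # from the histogram
    single_peaked: bool       # from the histogram
    qq_near_line: bool        # Q-Q plot, middle and tails
    shapiro_p: float          # Shapiro-Wilk p-value from your software
    structural_problem: bool  # bounded, strongly skewed, zero-inflated, or mixed groups


def sd_language(e: ShapeEvidence, alpha: float = 0.05) -> str:
    """Illustrative mapping from the checklist to one of three outcomes."""
    if e.structural_problem:
        # Normal percentiles are off the table: log scale, robust spread, or Chebyshev.
        return "Switch methods"
    looks_normal = e.roughly_symmetric and e.single_peaked and e.qq_near_line
    if looks_normal and e.shapiro_p >= alpha:
        # p >= alpha is only "no strong evidence against normality", not proof.
        return "Use normal SD language"
    # Imperfect but not disqualifying: keep SD as spread, drop normal percentiles.
    return "Use descriptive SD only"


print(sd_language(ShapeEvidence(True, True, True, 0.31, False)))
print(sd_language(ShapeEvidence(True, True, True, 0.01, False)))
print(sd_language(ShapeEvidence(False, True, False, 0.20, True)))
```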
How to Report the Result
A good report names the rule, states the diagnostic evidence, and limits the claim. That style is more trustworthy than saying the data passed or failed normality with no consequence attached.
| Situation | Weak wording | Better wording |
|---|---|---|
| Approximate normal shape | The data are normal. | The histogram and Q-Q plot are close enough to normal for an approximate empirical-rule summary. |
| Small skewed sample | The value is a 2.58-SD outlier. | The value is 2.58 SDs above the mean, but skew makes a normal-tail probability unreliable. |
| Large sample with p < 0.05 | We cannot use standard deviation. | The exact normal model is rejected, but SD can still describe spread; normal percentiles need caution. |
| Non-normal but bounded data | Three sigma means nearly impossible. | Use the three-sigma threshold as a screening rule, not a normal probability statement. |
To run the arithmetic, start with the standard deviation calculator. To translate a value into standard-deviation units, use the z-score calculator. To estimate areas only after the normality check is defensible, use the normal distribution calculator.
FAQ
Do I need normal data to calculate standard deviation?
No. Standard deviation is a descriptive measure of spread and can be calculated for any numeric dataset. Normality matters when you use that standard deviation to make normal-distribution claims, such as percentiles, empirical-rule coverage, or three-sigma rarity.
Is Shapiro-Wilk enough by itself?
No. Shapiro-Wilk is useful, but it answers a narrow question about exact normality. Pair it with a histogram or Q-Q plot, then decide whether the departure from normality is large enough to change the standard-deviation rule you planned to use.
What should I do if the data are not normal?
Keep the standard deviation if it helps describe spread, but remove normal percentile language. For skewed positive data, try a log scale or geometric SD. For outliers, compare with IQR or median absolute deviation. For coverage guarantees, use Chebyshev's theorem.
Pre-publish self-check
Further Reading
Sources
References and further authoritative reading used in preparing this article.