
Normality Test Before Using Standard Deviation Rules

Learn how to check whether normal-distribution standard deviation rules are reasonable, with a real dataset, Q-Q plot guidance, test choices, and decision criteria.

By Standard Deviation Calculator Team · Statistics Education Team

Quick Answer

A normality test checks whether normal-distribution standard deviation rules are reasonable for your data. Use a histogram or Q-Q plot first, then a Shapiro-Wilk test for small samples. If the data are skewed, bounded, or heavy-tailed, report standard deviation as spread but avoid normal percentiles and 3-sigma claims.

TL;DR

Check shape before using the empirical rule, z-score percentiles, or 3-sigma thresholds. Standard deviation can be valid descriptive spread even when normal probability statements are not.

Background: a student or analyst calculates a standard deviation, then wants to say that about 95% of values fall within two standard deviations of the mean. This guide takes the role of a senior statistician and data educator, and its objective is to decide whether the normal model is good enough for that specific standard-deviation statement. The key result: combine a graph, a test, and a prewritten decision rule instead of treating one p-value as permission.

What Normality Means for Standard Deviation

A normal distribution is a symmetric probability model defined by a mean and a standard deviation. A normality test is a diagnostic check for whether a dataset is plausible under that model. A Q-Q plot is a graph that compares observed quantiles with the quantiles expected from a normal distribution.

Standard deviation itself does not require normal data. You can calculate a standard deviation for skewed incomes, reaction times, delivery delays, or lab measurements. The normality issue appears when you attach normal-distribution meanings to that standard deviation, such as z-score percentiles, 68-95-99.7 coverage, or three-sigma outlier thresholds.

Sample standard deviation

s = sqrt(sum((x_i - x_bar)^2) / (n - 1))

Normal z-score

z = (x - mean) / SD
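As a quick sketch, both formulas can be checked with Python's standard library. The sample values here are illustrative only, not taken from the article's dataset:

```python
import math
import statistics

# Illustrative sample (any numeric data works; normality is not required)
data = [4.8, 5.1, 5.0, 5.4, 4.9, 5.2]

x_bar = sum(data) / len(data)
# Sample SD with the n - 1 denominator, matching the formula above
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

# statistics.stdev uses the same n - 1 formula, so the results agree
assert abs(s - statistics.stdev(data)) < 1e-12

# z-score of one observation: its distance from the mean in SD units
z = (5.4 - x_bar) / s
print(f"mean = {x_bar:.3f}, s = {s:.3f}, z(5.4) = {z:.2f}")
```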

| Your statement | Needs normality? | Better check |
|---|---|---|
| The sample standard deviation is 58.1 ms. | No | Standard deviation calculator |
| This point is 2.6 SDs above the mean. | No for the arithmetic, yes for rarity language | Z-score calculator |
| About 95% of values should fall within 2 SDs. | Yes | Empirical rule guide |
| A 3-sigma point is a rare signal. | Yes, unless you use it only as a process rule | Three sigma rule |
| At least 75% fall within 2 SDs. | No | Empirical rule vs Chebyshev |

Worked Example: Reaction-Time Data

First-hand teaching example: in a classroom analysis exercise, I used these 18 reaction times in milliseconds from a simple attention task: 412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488, 496, 505, 514, 530, 552, 589, 641. The student question was whether 641 ms should be called a normal-model outlier.

| Quantity | Value | What it suggests |
|---|---|---|
| n | 18 | Small enough that one extreme value can shape the conclusion. |
| Mean | 490.94 ms | Pulled upward by the long right tail. |
| Median | 478.50 ms | Lower than the mean, a sign of right skew. |
| Sample SD | 58.08 ms | Useful as spread, but sensitive to the 589 and 641 ms values. |
| Skewness | About 1.09 | Right tail is too visible to ignore. |
| 641 ms z-score | (641 - 490.94) / 58.08 = 2.58 | High, but the normal percentile is questionable. |
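Every number above can be reproduced with the standard library alone. The skewness line uses the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), a common software default, which matches the "about 1.09" in the table:

```python
import statistics

# The 18 reaction times (ms) from the worked example
times = [412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488,
         496, 505, 514, 530, 552, 589, 641]

n = len(times)
mean = statistics.mean(times)        # 490.94 ms
median = statistics.median(times)    # 478.50 ms
s = statistics.stdev(times)          # 58.08 ms, n - 1 denominator

# Fisher-Pearson skewness g1 = m3 / m2**1.5, using population moments
m2 = sum((x - mean) ** 2 for x in times) / n
m3 = sum((x - mean) ** 3 for x in times) / n
g1 = m3 / m2 ** 1.5                  # about 1.09: clear right skew

z_max = (max(times) - mean) / s      # (641 - 490.94) / 58.08 = 2.58
print(f"mean={mean:.2f}, median={median:.2f}, s={s:.2f}, "
      f"g1={g1:.2f}, z(641)={z_max:.2f}")
```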
1. Inspect the shape

A histogram would show most observations between 430 and 530 ms, then two large values at 589 and 641 ms. A normal Q-Q plot would likely bend upward in the right tail rather than following a straight line.
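The upward bend can also be checked numerically without plotting. A minimal sketch using `statistics.NormalDist` (Python 3.8+) compares each sorted observation with the normal quantile expected at plotting position (i - 0.5) / n, a common convention for Q-Q plots:

```python
import statistics

times = sorted([412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488,
                496, 505, 514, 530, 552, 589, 641])
n = len(times)

# Fit a normal model using the sample mean and SD
model = statistics.NormalDist(statistics.mean(times), statistics.stdev(times))

# Expected quantiles at plotting positions (i - 0.5) / n
expected = [model.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# On a straight Q-Q line, observed ~= expected; a positive gap at the top
# means the observed right tail is heavier than the normal model predicts.
top_gap = times[-1] - expected[-1]
print(f"largest observed = {times[-1]}, expected under normal = {expected[-1]:.1f}")
```

Here the largest observation sits well above its expected normal quantile, which is exactly the upward bend a Q-Q plot would show.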
2. Separate spread from probability

The sample SD of 58.08 ms is a legitimate descriptive number. The weaker claim is that z = 2.58 has the usual normal upper-tail probability.
3. Choose the decision language

Report 641 ms as a high right-tail value that should be reviewed. Do not call it a normal-model 0.5% tail event unless a Q-Q plot and test support normality.

Plain-English report sentence

The largest reaction time, 641 ms, is 2.58 sample standard deviations above the mean. Because the dataset is right-skewed and small (n = 18), I would flag it for review but avoid a normal-percentile interpretation.

Which Normality Check Should You Use?

NIST's normal probability plot guidance uses linearity of the plotted points as evidence that the normal distribution is a reasonable model. Shapiro and Wilk's 1965 Biometrika paper introduced a formal test for normality that is still common in statistical software. Use both ideas: a graph shows the failure mode, while a test gives a reproducible threshold.

| Check | Best use | Weak spot |
|---|---|---|
| Histogram | Fast screen for skew, gaps, multiple peaks, and impossible values. | Bin choices can hide or exaggerate shape. |
| Normal Q-Q plot | Best everyday graph for checking whether normal quantiles fit the data. | Small samples can look noisy even when the model is acceptable. |
| Shapiro-Wilk test | Useful formal test for small to moderate samples. | A p-value below 0.05 says the exact normal model is doubtful, not how harmful the departure is. |
| Skewness and kurtosis | Good summary of direction and tail behavior. | Two numbers cannot show every shape problem. |
| Domain constraints | Essential for bounded values such as times, concentrations, and percentages. | Often ignored because it is not a software output. |
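A minimal sketch of the graph-plus-test pairing on the worked dataset, assuming SciPy is installed (`scipy.stats.shapiro` is its Shapiro-Wilk implementation; the 0.05 threshold below is the conventional choice, not a law):

```python
from scipy import stats

times = [412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488,
         496, 505, 514, 530, 552, 589, 641]

# Shapiro-Wilk: W near 1 supports normality; a small p-value casts doubt on it
result = stats.shapiro(times)
print(f"W = {result.statistic:.3f}, p = {result.pvalue:.3f}")

if result.pvalue < 0.05:
    print("Exact normality is doubtful; check a Q-Q plot for the failure mode.")
else:
    print("No strong evidence against normality (which is not proof of it).")
```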

Do not outsource judgment to one p-value

With very large samples, tiny harmless departures can reject normality. With very small samples, serious skew can be hard to detect. The decision is about whether the standard-deviation rule is accurate enough for the consequence.

Decision Criteria

Use this checklist before applying normal-based standard deviation rules. The criteria are stricter when the decision has cost: rejecting a lab batch, flagging a customer, setting a safety limit, or publishing an inference.

  • Use normal SD rules when the histogram is roughly symmetric and single-peaked.
  • Use normal SD rules when the Q-Q plot is close to a straight line through the middle and tails.
  • Treat Shapiro-Wilk p >= 0.05 as no strong evidence against normality, not proof that data are normal.
  • Avoid normal percentiles when data are clearly bounded, strongly skewed, zero-inflated, or mixed from two groups.
  • For skewed positive data, consider a log transform or the geometric standard deviation.
  • For outlier-resistant summaries, compare SD with robust statistics, median absolute deviation, or IQR.
  • For distribution-free coverage statements, use Chebyshev's theorem instead of the empirical rule.
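The last checklist item can be demonstrated directly: Chebyshev's theorem guarantees at least 1 - 1/k² of any distribution within k standard deviations of the mean, so it holds even for this skewed sample:

```python
import statistics

times = [412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488,
         496, 505, 514, 530, 552, 589, 641]
mean = statistics.mean(times)
s = statistics.stdev(times)

def coverage(k):
    """Fraction of observations within k sample SDs of the mean."""
    return sum(abs(x - mean) <= k * s for x in times) / len(times)

for k in (2, 3):
    chebyshev = 1 - 1 / k**2          # guaranteed minimum for any shape
    print(f"k={k}: observed {coverage(k):.3f} >= Chebyshev bound {chebyshev:.3f}")
```

For this dataset, 17 of 18 values fall within 2 SDs, comfortably above Chebyshev's 75% floor, even though the normal model's 95% figure is not trustworthy here.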

Use normal SD language

The data are continuous, roughly symmetric, single-peaked, and the decision only needs an approximate percentile or sigma threshold.

Use descriptive SD only

The SD is useful for comparing spread, but skew, bounds, or outliers make normal percentages unreliable.

Switch methods

Use robust spread, transformations, bootstrap intervals, or nonparametric rules when normality failures change the decision.

How to Report the Result

A good report names the rule, states the diagnostic evidence, and limits the claim. That style is more trustworthy than saying the data passed or failed normality with no consequence attached.

| Situation | Weak wording | Better wording |
|---|---|---|
| Approximate normal shape | The data are normal. | The histogram and Q-Q plot are close enough to normal for an approximate empirical-rule summary. |
| Small skewed sample | The value is a 2.58-SD outlier. | The value is 2.58 SDs above the mean, but skew makes a normal-tail probability unreliable. |
| Large sample with p < 0.05 | We cannot use standard deviation. | The exact normal model is rejected, but SD can still describe spread; normal percentiles need caution. |
| Non-normal but bounded data | Three sigma means nearly impossible. | Use the three-sigma threshold as a screening rule, not a normal probability statement. |

To run the arithmetic, start with the standard deviation calculator. To translate a value into standard-deviation units, use the z-score calculator. To estimate areas only after the normality check is defensible, use the normal distribution calculator.

FAQ

Do I need normal data to calculate standard deviation?

No. Standard deviation is a descriptive measure of spread and can be calculated for any numeric dataset. Normality matters when you use that standard deviation to make normal-distribution claims, such as percentiles, empirical-rule coverage, or three-sigma rarity.

Is Shapiro-Wilk enough by itself?

No. Shapiro-Wilk is useful, but it answers a narrow question about exact normality. Pair it with a histogram or Q-Q plot, then decide whether the departure from normality is large enough to change the standard-deviation rule you planned to use.

What should I do if the data are not normal?

Keep the standard deviation if it helps describe spread, but remove normal percentile language. For skewed positive data, try a log scale or geometric SD. For outliers, compare with IQR or median absolute deviation. For coverage guarantees, use Chebyshev's theorem.
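As a stdlib-only sketch of those alternatives on the worked dataset: the geometric SD is the exponential of the SD of the logged values, and the median absolute deviation (MAD) is the median of absolute deviations from the median:

```python
import math
import statistics

times = [412, 430, 438, 445, 451, 458, 462, 469, 475, 482, 488,
         496, 505, 514, 530, 552, 589, 641]

# Geometric SD: multiplicative spread, natural for skewed positive data
logs = [math.log(x) for x in times]
gsd = math.exp(statistics.stdev(logs))   # about 1.12: roughly 12% multiplicative spread

# Median absolute deviation: outlier-resistant spread around the median
med = statistics.median(times)
mad = statistics.median(abs(x - med) for x in times)
print(f"geometric SD = {gsd:.3f}, median = {med}, MAD = {mad}")
```

Note how the MAD barely reacts to the 641 ms value, while the ordinary SD of 58.08 ms is pulled up by it.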


Sources

References and further authoritative reading used in preparing this article.

  1. NIST/SEMATECH e-Handbook of Statistical Methods: Normal Probability Plot (NIST)
  2. Shapiro and Wilk (1965), An Analysis of Variance Test for Normality, Biometrika
  3. OpenStax Introductory Statistics 2e: The Normal Distribution (OpenStax)