The Problem
An IQ score by itself can look more precise than it really is. A reported 115, 130, or 92 may drive screening, placement, or research summaries, but the raw number does not tell you how unusual that score is, how tightly a local group clusters, or whether a small point difference is practically meaningful.
Standard deviation is the missing context. Many major IQ composites are normed to mean 100 with SD 15, while some scaled subtest scores use mean 10 and SD 3. If you do not keep the score scale and its standard deviation in view, it is easy to overstate differences between students, cohorts, or testing periods. This page focuses on statistical interpretation only, not clinical diagnosis.
Why Standard Deviation Helps
Standard deviation tells you how far scores typically sit from the mean. In IQ reporting, that matters because most downstream decisions are based on relative standing, not raw points alone. Once you know the SD for the score scale, you can translate a score into a z-score, an approximate percentile, and a normal-curve interpretation using the normal distribution guide and the Empirical Rule.
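That translation can be sketched in a few lines. This is a minimal example, not the page's own tool, and it assumes scores come from the common composite convention of mean 100 and SD 15 with an approximately normal distribution:

```python
from statistics import NormalDist


def iq_to_context(score, mean=100.0, sd=15.0):
    """Translate an IQ score into a z-score and an approximate percentile.

    Assumes the score comes from an approximately normal composite scale;
    the defaults match the common mean-100 / SD-15 convention.
    """
    z = (score - mean) / sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


z, pct = iq_to_context(130)
print(f"z = {z:.2f}, approx. percentile = {pct:.1f}")  # z = 2.00, ~97.7
```

The same function works for subtest scales by passing `mean=10, sd=3`, which is the point of keeping the SD explicit rather than hard-coding one scale.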
Sample Standard Deviation for an IQ Score Set
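For a local score set, the sample standard deviation divides the squared deviations by n − 1 rather than n. A short sketch with hypothetical screening scores (the numbers are illustrative, not real data):

```python
from statistics import mean, stdev  # stdev uses the n - 1 (sample) denominator

# Hypothetical local screening scores on a mean-100 / SD-15 composite scale
scores = [112, 98, 105, 127, 134, 101, 118, 95]

m = mean(scores)
s = stdev(scores)  # sqrt(sum((x - m)^2) / (n - 1))
print(f"mean = {m:.1f}, sample SD = {s:.1f}")  # mean = 111.2, sample SD = 14.1
```

Use `statistics.pstdev` instead only when the data set is the entire population of interest, not a sample from it.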
Two SD Systems Commonly Appear
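The two conventions mentioned above, composite scores at mean 100 / SD 15 and scaled subtest scores at mean 10 / SD 3, describe the same relative standing in different units. A quick sketch of why SD units are the common currency between them:

```python
def to_z(score, mean, sd):
    """Express a score as a distance from the mean in SD units."""
    return (score - mean) / sd


# The same relative standing expressed on both common scales:
composite_z = to_z(115, mean=100, sd=15)  # composite: mean 100, SD 15
subtest_z = to_z(13, mean=10, sd=3)       # scaled subtest: mean 10, SD 3
print(composite_z, subtest_z)  # 1.0 1.0 -- both one SD above the mean
```

A 15-point composite gap and a 3-point subtest gap are therefore the same standardized distance, which is why raw points should never be compared across the two systems.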
Standard deviation is also what turns IQ scores into operational decisions. It helps a school team judge whether a score near a cutoff is materially different from the mean, whether a screening cohort is unusually homogeneous, and whether subgroup comparisons should be reported as raw-point gaps or as standardized distances from the norm mean. The guide to interpreting standard deviation and the z-score explanation are the best follow-on references when you need to explain the result to non-statistical stakeholders.
Worked Example
Suppose a district screens two enrichment cohorts. Both cohorts average 118 on the same composite score scale, but their spreads differ. That changes how many students fall near a gifted-review threshold of 130.
| Cohort | Mean IQ | Standard Deviation | What the Same Mean Hides |
|---|---|---|---|
| Cohort A | 118 | 14 | Wide spread; many students sit both near and well above the cutoff |
| Cohort B | 118 | 6 | Tighter cluster; far fewer students are likely to reach 130 |
| Published norm reference | 100 | 15 | General-population comparison scale |
How the Decision Changes
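Assuming each cohort's scores are roughly normally distributed, the two spreads imply very different shares of students above the 130 threshold, even though the means are identical. A hedged sketch of that calculation:

```python
from statistics import NormalDist


def share_above(cutoff, mean, sd):
    """Approximate fraction of a normally distributed cohort above a cutoff."""
    return 1 - NormalDist(mean, sd).cdf(cutoff)


for name, sd in [("Cohort A", 14), ("Cohort B", 6)]:
    p = share_above(130, mean=118, sd=sd)
    print(f"{name}: ~{p:.1%} expected above 130")
# Cohort A lands near 20% (z ~ 0.86); Cohort B near 2% (z = 2.0)
```

Same mean, roughly a tenfold difference in how many students a gifted-review process should expect to see, which is exactly what the raw averages hide.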
Decision Criteria
| Observed Pattern | What It Often Means | Recommended Next Step |
|---|---|---|
| Local SD is close to the published norm SD | Your group spread looks broadly similar to the reference population | Use z-scores and percentile interpretation with more confidence |
| Local SD is much smaller than the norm SD | The sample may be range-restricted, highly selected, or unusually homogeneous | Avoid over-interpreting small point gaps and summarize with descriptive statistics |
| Local SD is much larger than expected | The cohort may contain mixed subgroups, inconsistent testing conditions, or data-quality problems | Check subgroup composition, scoring, and outliers before drawing program conclusions |
| A score sits near a cutoff such as 70, 85, 115, or 130 | The practical interpretation depends on how many SD from the mean the cutoff lies | Convert the cutoff with the z-score calculator and pair it with percentile context |
| You are comparing composite and scaled subtest scores | The same raw-point gap can mean very different things across scales | Normalize both results to SD units before comparing them directly |
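The last row's point, that an identical raw-point gap carries very different weight on different scales, is easy to check directly. The SDs of 15 and 3 below are the conventional composite and scaled-subtest values discussed earlier:

```python
gap = 6  # the same raw-point difference observed on two score scales

composite_sd = 15  # common composite convention: mean 100, SD 15
subtest_sd = 3     # common scaled-subtest convention: mean 10, SD 3

# Normalize the gap to SD units before comparing across scales.
print(f"composite: {gap / composite_sd:.2f} SD")  # 0.40 SD: a modest difference
print(f"subtest:   {gap / subtest_sd:.2f} SD")    # 2.00 SD: a large difference
```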
Do Not Treat SD as a Diagnostic Verdict
Workflow
1. Confirm the score scale before doing any math.
2. Decide whether you are using published norms or a local sample.
3. Translate important scores into SD units.
4. Add percentile and distribution context.
5. Inspect spread before comparing groups.
- Keep composite scores and scaled subtest scores in separate analyses unless you explicitly standardize them first.
- Document whether the SD you report comes from the published norm group or your local dataset.
- Treat small raw-point differences near a cutoff cautiously, especially when confidence intervals overlap.
- If the local sample is selected, screened, or very small, explain that the observed SD may not represent the broader population.
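The workflow and cautions above can be sketched end to end. This is a hypothetical helper, not a validated tool; the 0.7 range-restriction threshold and the sample scores are illustrative assumptions:

```python
from statistics import mean, stdev


def summarize_cohort(scores, norm_mean=100.0, norm_sd=15.0, cutoff=130.0):
    """Summarize a local IQ score set against published norms.

    Computes local spread from the sample, expresses the cutoff in SD units
    against both the published norms and the local distribution, and flags
    possible range restriction when local spread is far below the norm SD.
    """
    local_mean, local_sd = mean(scores), stdev(scores)
    return {
        "local_mean": local_mean,
        "local_sd": local_sd,
        "cutoff_z_vs_norms": (cutoff - norm_mean) / norm_sd,
        "cutoff_z_vs_local": (cutoff - local_mean) / local_sd,
        # Illustrative flag: local SD well under the norm SD suggests a
        # selected or range-restricted sample (threshold is an assumption).
        "possible_range_restriction": local_sd < 0.7 * norm_sd,
    }


# A tightly clustered (likely pre-screened) hypothetical cohort:
report = summarize_cohort([118, 124, 116, 121, 119, 117, 122, 120])
print(report)
```

Here the local SD comes out far below 15, so the flag fires and z-scores against the local distribution should be reported with the caveats listed above.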