How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

Modified Z-Score Outlier Detection: Using MAD Instead of Standard Deviation

What the Modified Z-Score Measures

A modified z-score is an outlier score built from the median and median absolute deviation (MAD) instead of the mean and standard deviation. That single change makes it much more resistant to a few extreme values.

A regular z-score asks how many standard deviations a value sits from the mean. A modified z-score asks a similar question, but uses robust statistics so the center and spread are not pulled around by the very outliers you are trying to detect.

Why analysts use it

If a dataset contains one bad sensor reading, one typo, or one unusually large transaction, standard z-scores can understate how extreme that point is because the mean and standard deviation both move. Modified z-scores stay anchored to the median and MAD.

If you want the broader background on MAD first, read Robust Statistics: MAD and IQR. If you want a calculator workflow, the Outlier Calculator, Z-Score Calculator, and Descriptive Statistics Calculator are the most relevant tools on this site.

Formula and Threshold

Modified z-score

M_i = 0.6745(x_i - median) / MAD

Here, `MAD = median(|x_i - median|)`. The constant `0.6745` rescales MAD so the modified z-score lines up with the usual z-score scale when the data are approximately normal.

Component	Meaning	Why it is robust
Median	The middle value after sorting the data	A few extreme values usually do not change it much
MAD	The median of absolute distances from the median	Extreme distances do not dominate because the median is used again
0.6745	Normal-distribution scaling constant	Makes thresholds easier to compare with classic z-score intuition

Common cutoff

A widely used rule is to flag observations where `|M_i| > 3.5`. Some teams use stricter or looser cutoffs, but `3.5` is the standard starting point in many robust outlier workflows.

Step 1

Sort the data and find the median.

Step 2

Compute each absolute distance from the median.

Step 3

Take the median of those distances to get MAD.

Step 4

Compute `M_i = 0.6745(x_i - median) / MAD` for each value.

Step 5

Investigate values with `|M_i| > 3.5` instead of deleting them automatically.

Worked Example

Consider response times in seconds: `10, 11, 12, 12, 13, 14, 35`. The median is `12`. Absolute deviations from the median are `2, 1, 0, 0, 1, 2, 23`, so the MAD is `1`.

For the value `35`, the modified z-score is `0.6745 × (35 - 12) / 1 = 15.51`. That is far above `3.5`, so it is a strong outlier candidate. By contrast, the value `14` has score `0.6745 × 2 / 1 = 1.35`, which is not unusual.

Value	Distance from median	Modified z-score	Flag?
10	2	-1.35	No
11	1	-0.67	No
12	0	0.00	No
13	1	0.67	No
14	2	1.35	No
35	23	15.51	Yes

Why this differs from the classic z-score

In this dataset, the outlier inflates the mean and standard deviation. That can make the usual z-score look less extreme than it should. Modified z-scores avoid that circular problem, which is why they are often preferable for first-pass screening before you move to standard-deviation-based rules.

When It Works Best

Best use cases

Small and medium datasets, skewed operational data, lab results with occasional contamination, quality-control streams with rare failures, and exploratory analysis where outliers may already be present.

Less suitable cases

Very tiny datasets where any rule is unstable, multimodal data with multiple legitimate clusters, and datasets where MAD is zero because many values are tied at the median.

A useful decision rule is: if your downstream method depends on the mean and standard deviation, compare both views. Use modified z-scores to spot suspicious points first, then decide whether the business or scientific context justifies keeping, correcting, or separately analyzing them.

MAD can be zero

If many observations equal the median, MAD may be `0`, so the formula breaks. In that case, inspect the raw distances directly, use an IQR-based method, or switch to subject-matter rules rather than forcing a divide-by-zero workaround.

Decision Checklist

Use modified z-scores when outliers may already distort the mean and standard deviation.
Pair the method with the median and MAD, not with mean-only reporting.
Start with the standard cutoff `|M_i| > 3.5`, then tighten or loosen only with a stated reason.
Investigate flagged points against logs, units, instrument status, or source records before removing anything.
If the data are clean and approximately normal, compare results with classic z-scores and standard deviation.

Common Pitfalls

Pitfall 1:Treating a flagged point as proof of error. An outlier rule identifies observations worth review, not values that must be discarded.
Pitfall 2:Using mean absolute deviation instead of median absolute deviation. They are different measures and lead to different thresholds.
Pitfall 3:Applying the method to grouped or clearly multi-cluster data without checking whether the 'outlier' is actually a separate population.
Pitfall 4:Skipping context. A modified z-score is a screening tool, not a replacement for process knowledge, experimental design, or domain judgment.

Sources

References and further authoritative reading used in preparing this article.

NIST/SEMATECH e-Handbook of Statistical Methods — NIST
Robust measures of scale — Wikipedia
Iglewicz, B. and Hoaglin, D.C. (1993). How to Detect and Handle Outliers. — ASQ Quality Press

← Κέντρο Μάθησης

Reading goal	What to focus on	Common mistake
Definition	What the metric is and what quantity it summarizes	Treating the formula as self-explanatory
Formula choice	Sample versus population assumptions and notation	Using n when n-1 is required or vice versa
Interpretation	Whether the result indicates concentration, spread, or risk	Calling a large value good or bad without context