Modified Z-Score Outlier Detection: Using MAD Instead of Standard Deviation

What the Modified Z-Score Measures

A modified z-score is an outlier score built from the median and median absolute deviation (MAD) instead of the mean and standard deviation. That single change makes it much more resistant to a few extreme values.

A regular z-score asks how many standard deviations a value sits from the mean. A modified z-score asks a similar question, but uses robust statistics so the center and spread are not pulled around by the very outliers you are trying to detect.

Why analysts use it

If a dataset contains one bad sensor reading, one typo, or one unusually large transaction, standard z-scores can understate how extreme that point is because the mean and standard deviation both move. Modified z-scores stay anchored to the median and MAD.

If you want the broader background on MAD first, read Robust Statistics: MAD and IQR. If you want a calculator workflow, the Outlier Calculator, Z-Score Calculator, and Descriptive Statistics Calculator are the most relevant tools on this site.

Formula and Threshold

Modified z-score

M_i = 0.6745(x_i - median) / MAD

Here, `MAD = median(|x_i - median|)`. The constant `0.6745` rescales MAD so the modified z-score lines up with the usual z-score scale when the data are approximately normal.

Component	Meaning	Why it is robust
Median	The middle value after sorting the data	A few extreme values usually do not change it much
MAD	The median of absolute distances from the median	Extreme distances do not dominate because the median is used again
0.6745	Normal-distribution scaling constant	Makes thresholds easier to compare with classic z-score intuition
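
The scaling constant can be checked numerically. For normally distributed data, half of the observations lie within one MAD of the median, so MAD is about the standard deviation times the 0.75 quantile of the standard normal distribution. The sketch below recovers that value with Python's standard-library `statistics.NormalDist`; the choice of Python and the variable name are assumptions made for illustration.

```python
from statistics import NormalDist

# For a normal distribution, MAD = sigma * (0.75 quantile of the standard normal),
# because half of the probability mass lies within one MAD of the median.
scale_constant = NormalDist().inv_cdf(0.75)

print(round(scale_constant, 4))  # 0.6745
```

Multiplying by `0.6745 / MAD` therefore behaves like dividing by the standard deviation when the data are close to normal.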

Common cutoff

A widely used rule, recommended by Iglewicz and Hoaglin (1993), is to flag observations where `|M_i| > 3.5`. Some teams use stricter or looser cutoffs, but `3.5` is the usual starting point in robust outlier workflows.

Step 1

Sort the data and find the median.

Step 2

Compute each absolute distance from the median.

Step 3

Take the median of those distances to get MAD.

Step 4

Compute `M_i = 0.6745(x_i - median) / MAD` for each value.

Step 5

Investigate values with `|M_i| > 3.5` instead of deleting them automatically.
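
The five steps map onto a few lines of code. This is a minimal sketch using only Python's standard library; the function names `modified_z_scores` and `flag_for_review` are hypothetical, and raising an error when MAD is zero is one possible choice rather than a prescribed one. The sample data are the response times from the worked example in this article.

```python
from statistics import median

def modified_z_scores(values, scale=0.6745):
    """Return the modified z-score M_i for each value, in input order."""
    center = median(values)                         # Step 1: median
    distances = [abs(x - center) for x in values]   # Step 2: absolute distances
    mad = median(distances)                         # Step 3: MAD
    if mad == 0:
        # Assumption: fail loudly instead of dividing by zero.
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [scale * (x - center) / mad for x in values]  # Step 4: scores

def flag_for_review(values, cutoff=3.5):
    """Step 5: return the values whose |M_i| exceeds the cutoff."""
    scores = modified_z_scores(values)
    return [x for x, m in zip(values, scores) if abs(m) > cutoff]

times = [10, 11, 12, 12, 13, 14, 35]
print(flag_for_review(times))  # [35]
```

The function returns candidates for review and leaves the data untouched, which matches the advice in Step 5.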

Worked Example

Consider response times in seconds: `10, 11, 12, 12, 13, 14, 35`. The median is `12`. Absolute deviations from the median are `2, 1, 0, 0, 1, 2, 23`, so the MAD is `1`.

For the value `35`, the modified z-score is `0.6745 × (35 - 12) / 1 = 15.51`. That is far above `3.5`, so it is a strong outlier candidate. By contrast, the value `14` has score `0.6745 × 2 / 1 = 1.35`, which is not unusual.

Value	Distance from median	Modified z-score	Flag?
10	2	-1.35	No
11	1	-0.67	No
12	0	0.00	No
13	1	0.67	No
14	2	1.35	No
35	23	15.51	Yes

Why this differs from the classic z-score

In this dataset, the outlier inflates both the mean (about `15.29`) and the sample standard deviation (about `8.79`), so the classic z-score for `35` comes out near `2.24`, below the common `|z| > 3` cutoff. Modified z-scores avoid that circular problem, which is why they are often preferable for first-pass screening before you move to standard-deviation-based rules.
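
To see the gap concretely, the sketch below recomputes both scores for the value `35`. It assumes the sample standard deviation (`statistics.stdev`, dividing by n - 1); the population formula gives a slightly larger classic z-score of about 2.42, which still falls below 3.

```python
from statistics import mean, median, stdev

times = [10, 11, 12, 12, 13, 14, 35]  # response times from the worked example

# Classic z-score: the outlier itself inflates both the mean and the SD.
classic_z = (35 - mean(times)) / stdev(times)

# Modified z-score: the median and MAD barely react to the outlier.
center = median(times)
mad = median([abs(x - center) for x in times])
modified_z = 0.6745 * (35 - center) / mad

print(f"classic z  = {classic_z:.2f}")   # classic z  = 2.24
print(f"modified z = {modified_z:.2f}")  # modified z = 15.51
```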

When It Works Best

Best use cases

Small and medium datasets, skewed operational data, lab results with occasional contamination, quality-control streams with rare failures, and exploratory analysis where outliers may already be present.

Less suitable cases

Very small datasets where any rule is unstable, multimodal data with multiple legitimate clusters, and datasets where MAD is zero because more than half of the values are tied at the median.

A useful decision rule is: if your downstream method depends on the mean and standard deviation, compare both views. Use modified z-scores to spot suspicious points first, then decide whether the business or scientific context justifies keeping, correcting, or separately analyzing them.

MAD can be zero

If more than half of the observations equal the median, MAD is `0` and the formula divides by zero. In that case, inspect the raw distances directly, use an IQR-based method, or switch to subject-matter rules rather than forcing a divide-by-zero workaround.
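
One way to handle this in code is to check MAD first and switch methods explicitly. The sketch below is an assumption-laden illustration: the function name `screen` is hypothetical, the fallback uses Tukey-style fences at 1.5 × IQR (a common convention, not something this article prescribes), quartiles come from `statistics.quantiles` with its default method, and the tied data set is made up.

```python
from statistics import median, quantiles

def screen(values, cutoff=3.5, fence=1.5):
    """Return (method_used, flagged_values); use IQR fences when MAD is 0."""
    center = median(values)
    mad = median([abs(x - center) for x in values])
    if mad > 0:
        flagged = [x for x in values
                   if abs(0.6745 * (x - center) / mad) > cutoff]
        return "modified z-score", flagged
    # MAD is zero: fall back to fences built from the quartiles.
    # Note: with heavy ties the IQR can also be 0, so review the result.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    return "IQR fences", [x for x in values if x < low or x > high]

tied = [5, 5, 5, 5, 5, 6, 7, 40]  # hypothetical data; more than half equal the median
print(screen(tied))  # ('IQR fences', [40])
```

Returning the method name alongside the flags keeps the switch visible to whoever reviews the output.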

Decision Checklist

Use modified z-scores when outliers may already distort the mean and standard deviation.
Report the median and MAD alongside the scores, not only the mean.
Start with the standard cutoff `|M_i| > 3.5`, then tighten or loosen only with a stated reason.
Investigate flagged points against logs, units, instrument status, or source records before removing anything.
If the data are clean and approximately normal, compare results with classic z-scores and standard deviation.

Common Pitfalls

Pitfall 1: Treating a flagged point as proof of error. An outlier rule identifies observations worth review, not values that must be discarded.
Pitfall 2: Using mean absolute deviation instead of median absolute deviation. They are different measures and lead to different thresholds.
Pitfall 3: Applying the method to grouped or clearly multi-cluster data without checking whether the 'outlier' is actually a separate population.
Pitfall 4: Skipping context. A modified z-score is a screening tool, not a replacement for process knowledge, experimental design, or domain judgment.
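
Pitfall 2 is easy to demonstrate on the worked-example data. The sketch below assumes the mean absolute deviation is taken about the mean (other definitions exist) and shows what happens if it is plugged into the modified z-score formula by mistake; the variable names are illustrative.

```python
from statistics import mean, median

times = [10, 11, 12, 12, 13, 14, 35]  # response times from the worked example

center = median(times)
median_abs_dev = median([abs(x - center) for x in times])

avg = mean(times)
mean_abs_dev = mean([abs(x - avg) for x in times])

# Correct score for 35 uses the median absolute deviation.
correct_score = 0.6745 * (35 - center) / median_abs_dev
# Mistaken score swaps in the mean absolute deviation.
mistaken_score = 0.6745 * (35 - center) / mean_abs_dev

print(median_abs_dev, round(mean_abs_dev, 2))              # 1 5.63
print(round(correct_score, 2), round(mistaken_score, 2))   # 15.51 2.75
```

With the wrong deviation measure, the value `35` scores below `3.5` and would not be flagged at all.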

Sources

References and further authoritative reading used in preparing this article.

NIST/SEMATECH e-Handbook of Statistical Methods — NIST
Robust measures of scale — Wikipedia
Iglewicz, B. and Hoaglin, D.C. (1993). How to Detect and Handle Outliers. — ASQ Quality Press
