Intermediate · Applications · 11 min

Modified Z-Score Outlier Detection: Using MAD Instead of Standard Deviation

Learn how modified z-scores use median absolute deviation for more robust outlier detection, with formulas, thresholds, worked examples, and a practical checklist.

By Standard Deviation Calculator Team · Data Science Team · Published

What the Modified Z-Score Measures

A modified z-score is an outlier score built from the median and median absolute deviation (MAD) instead of the mean and standard deviation. That single change makes it much more resistant to a few extreme values.

A regular z-score asks how many standard deviations a value sits from the mean. A modified z-score asks a similar question, but uses robust statistics so the center and spread are not pulled around by the very outliers you are trying to detect.

Why analysts use it

If a dataset contains one bad sensor reading, one typo, or one unusually large transaction, standard z-scores can understate how extreme that point is because the mean and standard deviation both move. Modified z-scores stay anchored to the median and MAD.

If you want the broader background on MAD first, read Robust Statistics: MAD and IQR. If you want a calculator workflow, the Outlier Calculator, Z-Score Calculator, and Descriptive Statistics Calculator are the most relevant tools on this site.

Formula and Threshold

Modified z-score

M_i = 0.6745(x_i - median) / MAD

Here, `MAD = median(|x_i - median|)`. The constant `0.6745` rescales MAD so the modified z-score lines up with the usual z-score scale when the data are approximately normal.

| Component | Meaning | Why it is robust |
| --- | --- | --- |
| Median | The middle value after sorting the data | A few extreme values usually do not change it much |
| MAD | The median of absolute distances from the median | Extreme distances do not dominate because the median is used again |
| 0.6745 | Normal-distribution scaling constant | Makes thresholds easier to compare with classic z-score intuition |
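
The scaling constant is not arbitrary: `0.6745` is the 0.75 quantile of the standard normal distribution, and for normally distributed data the MAD is approximately `0.6745` times the standard deviation. The sketch below checks both statements with Python's standard library. The helper name `mad`, the seed, the sample size, and the chosen standard deviation of `2.0` are illustrative assumptions, not part of the method.

```python
import random
from statistics import NormalDist, median

# 0.75 quantile of the standard normal distribution.
SCALE = NormalDist().inv_cdf(0.75)
print(round(SCALE, 4))  # 0.6745


def mad(values):
    """Median absolute deviation from the median (hypothetical helper)."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Simulated normal data with a known standard deviation (assumed values).
random.seed(0)
sigma = 2.0
sample = [random.gauss(0.0, sigma) for _ in range(20_000)]

# For normal data, MAD / sigma should land close to SCALE.
ratio = mad(sample) / sigma
print(round(ratio, 3))
```

Because `MAD ≈ 0.6745 × σ` under normality, multiplying by `0.6745 / MAD` behaves much like dividing by the standard deviation, which is why the `3.5` cutoff can be read on a familiar z-score scale.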

Common cutoff

A widely used rule, recommended by Iglewicz and Hoaglin (1993), is to flag observations where `|M_i| > 3.5`. Some teams use stricter or looser cutoffs, but `3.5` is the usual starting point in robust outlier workflows.

  1. Sort the data and find the median.
  2. Compute each absolute distance from the median.
  3. Take the median of those distances to get MAD.
  4. Compute `M_i = 0.6745(x_i - median) / MAD` for each value.
  5. Investigate values with `|M_i| > 3.5` instead of deleting them automatically.
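
The five steps above can be sketched as a small Python function. This is a minimal illustration using only the standard library. The function names `modified_z_scores` and `flag_for_review` and the sample readings are hypothetical, and raising an error when MAD is zero is one possible design choice rather than a prescribed behavior.

```python
from statistics import median


def modified_z_scores(values, scale=0.6745):
    """Return the modified z-score of each value, in input order."""
    center = median(values)                          # Step 1: median
    distances = [abs(v - center) for v in values]    # Step 2: absolute distances
    mad = median(distances)                          # Step 3: MAD
    if mad == 0:
        # The formula is undefined here; the caller must pick another method.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [scale * (v - center) / mad for v in values]  # Step 4: scores


def flag_for_review(values, cutoff=3.5):
    """Step 5: return values whose |M_i| exceeds the cutoff, for investigation."""
    scores = modified_z_scores(values)
    return [v for v, m in zip(values, scores) if abs(m) > cutoff]


# Hypothetical readings with one suspicious value.
readings = [20, 21, 19, 22, 20, 48]
print(flag_for_review(readings))  # [48]
```

The function returns candidates for review rather than a cleaned dataset, which keeps the decision to keep, correct, or remove a value with the analyst.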

Worked Example

Consider response times in seconds: `10, 11, 12, 12, 13, 14, 35`. The median is `12`. Absolute deviations from the median are `2, 1, 0, 0, 1, 2, 23`, so the MAD is `1`.

For the value `35`, the modified z-score is `0.6745 × (35 - 12) / 1 = 15.51`. That is far above `3.5`, so it is a strong outlier candidate. By contrast, the value `14` has score `0.6745 × 2 / 1 = 1.35`, which is not unusual.

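| Value | Distance from median | Modified z-score | Flag? |
| --- | --- | --- | --- |
| 10 | 2 | -1.35 | No |
| 11 | 1 | -0.67 | No |
| 12 | 0 | 0.00 | No |
| 13 | 1 | 0.67 | No |
| 14 | 2 | 1.35 | No |
| 35 | 23 | 15.51 | Yes |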

Why this differs from the classic z-score

In this dataset, the outlier inflates the mean and standard deviation: the mean rises to about `15.29` and the sample standard deviation to about `8.79`, so the classic z-score for `35` is only about `2.24`, below the commonly used cutoff of `3`. Modified z-scores avoid that circular problem, which is why they are often preferable for first-pass screening before you move to standard-deviation-based rules.
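
The contrast between the two scores can be checked in a few lines of Python. This is a minimal sketch using only the standard library `statistics` module. Using the sample standard deviation (`stdev`, with the n-1 denominator) for the classic z-score is an assumption, since either the sample or the population version could be applied here.

```python
from statistics import mean, median, stdev

# Response times in seconds from the worked example.
times = [10, 11, 12, 12, 13, 14, 35]
suspect = 35

# Classic z-score: the center and spread both include the suspect value.
classic_z = (suspect - mean(times)) / stdev(times)

# Modified z-score: the median and MAD barely react to the suspect value.
center = median(times)
mad = median([abs(t - center) for t in times])
modified_z = 0.6745 * (suspect - center) / mad

print(round(classic_z, 2))   # 2.24
print(round(modified_z, 2))  # 15.51
```

The same observation looks unremarkable on the classic scale and extreme on the robust scale, because the value being tested is also shaping the mean and standard deviation used to judge it.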

When It Works Best

Best use cases

Small and medium datasets, skewed operational data, lab results with occasional contamination, quality-control streams with rare failures, and exploratory analysis where outliers may already be present.

Less suitable cases

Very small datasets where any rule is unstable, multimodal data with multiple legitimate clusters, and datasets where MAD is zero because many values are tied at the median.

A useful decision rule is: if your downstream method depends on the mean and standard deviation, compare both views. Use modified z-scores to spot suspicious points first, then decide whether the business or scientific context justifies keeping, correcting, or separately analyzing them.

MAD can be zero

If many observations equal the median, MAD may be `0`, so the formula breaks. In that case, inspect the raw distances directly, use an IQR-based method, or switch to subject-matter rules rather than forcing a divide-by-zero workaround.
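
One way to handle this case in code is to detect a zero MAD explicitly and switch methods rather than patching the denominator. The sketch below falls back to IQR fences. The function name `screen`, the `1.5 × IQR` multiplier (Tukey's convention), and the default quartile method of `statistics.quantiles` are assumptions chosen for illustration, not rules stated in this article.

```python
from statistics import median, quantiles


def screen(values, cutoff=3.5, scale=0.6745):
    """Return (method_used, flagged_values) for a list of numbers."""
    center = median(values)
    mad = median([abs(v - center) for v in values])

    if mad > 0:
        flagged = [v for v in values
                   if abs(scale * (v - center) / mad) > cutoff]
        return "modified z-score", flagged

    # MAD is zero: fall back to IQR fences instead of dividing by zero.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        # Heavily tied data can make the IQR zero too; leave it to review.
        return "manual review", []
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return "IQR fences", [v for v in values if v < low or v > high]


# Hypothetical data where most values are tied at the median.
print(screen([5, 5, 5, 5, 5, 6, 9]))  # ('IQR fences', [9])
```

Returning the method name alongside the flagged values makes it visible in reports which rule actually produced each flag.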

Decision Checklist

  • Use modified z-scores when outliers may already distort the mean and standard deviation.
  • Pair the method with the median and MAD, not with mean-only reporting.
  • Start with the standard cutoff `|M_i| > 3.5`, then tighten or loosen only with a stated reason.
  • Investigate flagged points against logs, units, instrument status, or source records before removing anything.
  • If the data are clean and approximately normal, compare results with classic z-scores and standard deviation.

Common Pitfalls

  • Pitfall 1: Treating a flagged point as proof of error. An outlier rule identifies observations worth review, not values that must be discarded.
  • Pitfall 2: Using mean absolute deviation instead of median absolute deviation. They are different measures and lead to different thresholds.
  • Pitfall 3: Applying the method to grouped or clearly multi-cluster data without checking whether the 'outlier' is actually a separate population.
  • Pitfall 4: Skipping context. A modified z-score is a screening tool, not a replacement for process knowledge, experimental design, or domain judgment.
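
Pitfall 2 is easy to demonstrate with the response-time data from the worked example. The sketch below computes both measures and then deliberately plugs the mean absolute deviation into the modified z-score formula to show the effect. The variable names are illustrative, and the "wrong" score is shown only as a counterexample.

```python
from statistics import mean, median

times = [10, 11, 12, 12, 13, 14, 35]

# Median absolute deviation: median of distances from the median.
center = median(times)
median_abs_dev = median([abs(t - center) for t in times])

# Mean absolute deviation: mean of distances from the mean.
avg = mean(times)
mean_abs_dev = mean([abs(t - avg) for t in times])

print(median_abs_dev)          # 1
print(round(mean_abs_dev, 2))  # 5.63

# Correct score for 35, and the score obtained by mistakenly
# substituting the mean absolute deviation for MAD.
correct = 0.6745 * (35 - center) / median_abs_dev
wrong = 0.6745 * (35 - center) / mean_abs_dev

print(round(correct, 2))  # 15.51
print(round(wrong, 2))    # 2.75
```

With the mean absolute deviation in the denominator, the score for `35` falls below `3.5` and the point is no longer flagged, because the outlier itself has inflated the spread estimate.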

Further Reading

Sources

References and further authoritative reading used in preparing this article.

  1. NIST/SEMATECH e-Handbook of Statistical Methods. NIST.
  2. Robust measures of scale. Wikipedia.
  3. Iglewicz, B. and Hoaglin, D.C. (1993). How to Detect and Handle Outliers. ASQ Quality Press.

How to Read This Article

A statistics tutorial should work as a practical interpretation guide, not just a formula dump. It covers the assumptions, notation, and reporting language that analysts need when they explain a result to a teacher, manager, client, or reviewer. The article body covers the specific topic, while the sections below provide a common interpretation frame that readers can reuse across related metrics.

| Reading goal | What to focus on | Common mistake |
| --- | --- | --- |
| Definition | What the metric is and what quantity it summarizes | Treating the formula as self-explanatory |
| Formula choice | Sample versus population assumptions and notation | Using n when n-1 is required or vice versa |
| Interpretation | Whether the result indicates concentration, spread, or risk | Calling a large value good or bad without context |

Frequently Asked Questions

How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

Authoritative References

These sources define the concepts referenced most often across our articles. Bessel's correction is a sample adjustment, variance is a squared measure of spread, and standard deviation is the square root of variance expressed in the same units as the data.