Robust Statistics: MAD, IQR, and Outlier-Resistant Methods

Why Robust Statistics?

Standard deviation is a powerful measure of spread, but it has a critical weakness: extreme sensitivity to outliers. A single extreme value can dramatically inflate the SD, giving a misleading picture of typical variation.

Robust statistics provide measures of spread that resist the influence of outliers, making them essential for real-world data where measurement errors, data entry mistakes, or genuine extreme cases are common.

Example: The Outlier Effect

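Data: 10, 12, 11, 13, 12, 11, 100 (one outlier)
Standard deviation: 33.5 (sample SD; dominated by the outlier)
MAD: 1.0 (unchanged no matter how large the outlier becomes)
IQR: 1.5 (unchanged no matter how large the outlier becomes; quartiles by linear interpolation)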
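You can check the robustness claim by making the outlier larger and larger. The sketch below uses NumPy to recompute all three measures as the last value grows. `spread_summary` is a hypothetical helper written for this illustration, and the quartiles use NumPy's default linear interpolation.

```python
import numpy as np

def spread_summary(values):
    """Return (sample SD, MAD, IQR) for a 1-D sequence (illustrative helper)."""
    x = np.asarray(values, dtype=float)
    sd = np.std(x, ddof=1)                      # sample standard deviation
    mad = np.median(np.abs(x - np.median(x)))   # median absolute deviation
    q1, q3 = np.percentile(x, [25, 75])         # NumPy's default linear interpolation
    return sd, mad, q3 - q1

base = [10, 12, 11, 13, 12, 11]
for outlier in (100, 1_000, 1_000_000):
    sd, mad, iqr = spread_summary(base + [outlier])
    print(f"outlier={outlier:>9}: SD={sd:12.2f}  MAD={mad:.1f}  IQR={iqr:.1f}")
```

The SD grows with the outlier. MAD and IQR stay at 1.0 and 1.5 because the outlier only ever occupies the top position in the sorted data, so its magnitude never enters the median or the quartiles.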

Breakdown Point

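A statistic's "breakdown point" is the largest fraction of the data that can be replaced by arbitrarily extreme values before the statistic itself can be driven to arbitrarily large (meaningless) values. SD has a breakdown point of 0% in large samples: a single outlier can destroy it. MAD has a breakdown point of 50%, the highest achievable, so it still gives a sensible answer as long as fewer than half of the values are outliers. IQR has a breakdown point of 25%, because each quartile is shielded only by the most extreme quarter of the data on its side.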
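A small experiment makes breakdown concrete. The sketch below starts from 100 well-behaved values (0 to 99), replaces the k largest with 1e9, and watches which estimator explodes. Because n = 100, k is also the contamination percentage. The contamination scheme, the value 1e9, and the helper name `contaminate` are assumptions chosen for illustration.

```python
import numpy as np

def contaminate(x, k, value=1e9):
    """Replace the k largest values of x with an extreme value (hypothetical helper)."""
    y = np.sort(np.asarray(x, dtype=float))
    if k > 0:                 # guard: y[-0:] would select the whole array
        y[-k:] = value
    return y

def mad(x):
    return np.median(np.abs(x - np.median(x)))

def iqr(x):
    q1, q3 = np.percentile(x, [25, 75])   # NumPy's default linear interpolation
    return q3 - q1

clean = np.arange(100)   # 100 well-behaved values: 0, 1, ..., 99
for k in (0, 1, 24, 25, 49, 50):
    y = contaminate(clean, k)
    print(f"{k:>2}% contaminated: SD={np.std(y, ddof=1):.3g}  "
          f"MAD={mad(y):.3g}  IQR={iqr(y):.3g}")
```

A single extreme value is enough to ruin the SD. The IQR first explodes at 25% contamination and the MAD at 50%, which is consistent with breakdown points of 25% and 50%.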

Median Absolute Deviation (MAD)

MAD is among the most robust measures of spread because it reaches the maximal breakdown point of 50%. It is the median of the absolute deviations from the median:

MAD Formula

MAD = median(|xᵢ - median(x)|)

Find Median

Calculate the median of your dataset.

Calculate Deviations

Subtract the median from each value and take absolute values.

Find MAD

Calculate the median of these absolute deviations.
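Here are the three steps traced on the example data with NumPy. The variable names are illustrative and do not come from any library.

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 100], dtype=float)

# Step 1: median of the data
center = np.median(data)                # 12.0

# Step 2: absolute deviations from that median
abs_dev = np.abs(data - center)         # [2, 0, 1, 1, 0, 1, 88]

# Step 3: median of the absolute deviations
mad_value = np.median(abs_dev)          # 1.0

print(center, abs_dev, mad_value)
```

The outlier produces a deviation of 88, but that is the largest of seven deviations, so it never reaches the middle position that the final median picks out.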

Scaling MAD to estimate σ: For normally distributed data, MAD ≈ 0.6745 × σ. To estimate SD from MAD, multiply by 1.4826:

SD Estimate from MAD

σ̂ = 1.4826 × MAD

Why 1.4826?

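This scaling factor is 1/Φ⁻¹(0.75) ≈ 1/0.6745, where Φ⁻¹(0.75) is the 75th percentile of the standard normal distribution. For normal data, half of all values lie within 0.6745σ of the median, so the population MAD equals 0.6745σ. Multiplying by 1.4826 therefore makes the scaled MAD a consistent estimator of the true standard deviation when the data are normal: it converges to σ as the sample grows, although it is not exactly unbiased in small samples.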
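The constant can be derived with Python's standard-library `statistics.NormalDist` and then checked by simulation with NumPy. The sample size, σ = 2, and the seed are arbitrary choices made for this sketch.

```python
import numpy as np
from statistics import NormalDist

# 75th percentile of N(0, 1): half of the distribution lies within +/- q75 of the center
q75 = NormalDist().inv_cdf(0.75)   # about 0.6745
k = 1 / q75                        # about 1.4826
print(f"q75 = {q75:.4f}, k = {k:.4f}")

# Simulation check (assumed setup: 100,000 draws from N(0, 2^2), fixed seed)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=2.0, size=100_000)
mad = np.median(np.abs(sample - np.median(sample)))
print(f"scaled MAD = {k * mad:.3f}  (true sigma = 2.0)")
```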

Interquartile Range (IQR)

IQR measures the spread of the middle 50% of the data: the distance between the 25th percentile (first quartile, Q1) and the 75th percentile (third quartile, Q3):

IQR Formula

IQR = Q3 - Q1 = 75th percentile - 25th percentile
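Sample quartiles have no single definition, and different tools interpolate between data points in different ways. On small samples, the IQR can therefore depend on which tool computed it. The sketch below compares NumPy's default with two of the methods offered by Python's `statistics.quantiles`. The IQR of 1.5 quoted for the example matches NumPy's linear interpolation.

```python
import statistics
import numpy as np

data = [10, 12, 11, 13, 12, 11, 100]

# NumPy's default: linear interpolation between order statistics
q1, q3 = np.percentile(data, [25, 75])
print("numpy linear:", q1, q3, q3 - q1)          # 11.0 12.5 1.5

# Python's statistics module offers two quartile methods
for method in ("inclusive", "exclusive"):
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    print(f"statistics {method}:", q1, q3, q3 - q1)
```

The "inclusive" method agrees with NumPy (IQR 1.5). The default "exclusive" method places Q3 at 13 and gives an IQR of 2.0.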

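IQR is widely used because it is simple to understand, easy to visualize in box plots, and forms the basis of the common "1.5×IQR rule" (Tukey's fences) for outlier detection: values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are flagged as potential outliers.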
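Here is a minimal sketch of the 1.5×IQR rule applied to the example data, assuming NumPy's default quartiles. `tukey_fences` and its `k` parameter are hypothetical names chosen for this illustration.

```python
import numpy as np

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR (illustrative helper)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = np.array([10, 12, 11, 13, 12, 11, 100])
lower, upper = tukey_fences(data)
outliers = data[(data < lower) | (data > upper)]
print(f"fences: [{lower}, {upper}], flagged: {outliers}")
```

With Q1 = 11, Q3 = 12.5, and IQR = 1.5, the fences are 8.75 and 14.75, so only the value 100 is flagged.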

Scaling IQR to estimate σ: For normal data, Q1 and Q3 lie 0.6745σ below and above the mean, so IQR ≈ 2 × 0.6745 × σ ≈ 1.349 × σ. To estimate SD from IQR:

SD Estimate from IQR

σ̂ = IQR / 1.349 ≈ 0.7413 × IQR
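The sketch below derives the 1.349 constant from the standard normal quantile. It then compares the sample SD with the two robust σ estimates on the example data. NumPy's default quartile interpolation is assumed.

```python
import numpy as np
from statistics import NormalDist

# For N(mu, sigma^2), Q1 and Q3 sit at mu -/+ 0.6745*sigma, so IQR = 2 * 0.6745 * sigma
iqr_per_sigma = 2 * NormalDist().inv_cdf(0.75)   # about 1.349
print(f"IQR/sigma = {iqr_per_sigma:.4f}, scale = {1 / iqr_per_sigma:.4f}")

data = np.array([10, 12, 11, 13, 12, 11, 100], dtype=float)
q1, q3 = np.percentile(data, [25, 75])
mad = np.median(np.abs(data - np.median(data)))

print(f"sample SD    : {np.std(data, ddof=1):.2f}")
print(f"1.4826 * MAD : {1.4826 * mad:.2f}")
print(f"IQR / 1.349  : {(q3 - q1) / iqr_per_sigma:.2f}")
```

Both robust estimates come out between about 1.1 and 1.5, in line with the six typical values. The sample SD is above 33.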

Comparing Robust Measures

Standard Deviation

Uses all data points · Most efficient for normal data · Very sensitive to outliers · Breakdown point: 0%

MAD

Highest breakdown point of the three · Uses median (not mean) · Resists outliers as long as fewer than half the values are contaminated · Breakdown point: 50%

IQR

Easy to understand · Used in box plots · Ignores the outer 50% of the data (lowest and highest 25%) · Breakdown point: 25%
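Robustness has a price in efficiency. On clean normal data, the SD varies less from sample to sample than the scaled MAD or scaled IQR. The Monte Carlo sketch below illustrates this; the sample size, number of repetitions, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, sigma = 50, 5_000, 1.0   # assumed simulation settings

samples = rng.normal(0.0, sigma, size=(reps, n))
sd_est = samples.std(axis=1, ddof=1)                            # classical SD
med = np.median(samples, axis=1, keepdims=True)
mad_est = 1.4826 * np.median(np.abs(samples - med), axis=1)     # scaled MAD
q1, q3 = np.percentile(samples, [25, 75], axis=1)
iqr_est = (q3 - q1) / 1.349                                     # scaled IQR

for name, est in (("SD", sd_est), ("scaled MAD", mad_est), ("IQR/1.349", iqr_est)):
    print(f"{name:>10}: mean={est.mean():.3f}  spread across samples={est.std():.3f}")
```

All three estimators center near σ = 1, but the SD has the smallest spread. Under normality, the scaled MAD's asymptotic efficiency relative to the SD is only about 37%.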

When to Use Robust Statistics

Exploratory analysis: When you don't know if outliers exist, start with robust measures
Data quality issues: When data may contain errors or measurement problems
Heavy-tailed distributions: When extreme values are expected (financial returns, insurance claims)
Small samples: When outliers have outsized impact due to few observations
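Outlier detection: Flagging points by how many SDs they lie from the mean is self-defeating, because the outliers inflate the very mean and SD used to judge them (masking); use IQR- or MAD-based rules instead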
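The example data show masking directly. The sketch below compares a classical z-score with a robust z-score built from the median and the scaled MAD (one common MAD-based rule, sometimes called a modified z-score). The cut-off of 3 is an assumption applied to both scores.

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 100], dtype=float)
threshold = 3.0   # assumed cut-off, applied to both scores

# Classical z-score: the outlier inflates both the mean and the SD it is judged by
z = (data - data.mean()) / data.std(ddof=1)

# Robust z-score: the median and the scaled MAD barely move because of the outlier
mad = np.median(np.abs(data - np.median(data)))
robust_z = (data - np.median(data)) / (1.4826 * mad)

print("classical flags:", data[np.abs(z) > threshold])         # []
print("robust flags:   ", data[np.abs(robust_z) > threshold])  # [100.]
```

The value 100 reaches a classical z-score of only about 2.27, so the SD rule misses it. Its robust z-score is about 59.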

Implementation Examples

Python

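import numpy as np
from scipy import stats

def mad(data):
    """Median Absolute Deviation (unscaled)"""
    x = np.asarray(data, dtype=float)   # accept lists as well as arrays
    return np.median(np.abs(x - np.median(x)))

def scaled_mad(data):
    """MAD scaled by 1.4826 to estimate SD (for normal data)"""
    return 1.4826 * mad(data)

def iqr(data):
    """Interquartile Range (NumPy's default linear interpolation)"""
    q1, q3 = np.percentile(data, [25, 75])
    return q3 - q1

# Compare on data with one outlier
data = [10, 12, 11, 13, 12, 11, 100]
print(f"SD: {np.std(data, ddof=1):.2f}")         # 33.46
print(f"MAD: {mad(data):.2f}")                   # 1.00
print(f"Scaled MAD: {scaled_mad(data):.2f}")     # 1.48
print(f"IQR: {iqr(data):.2f}")                   # 1.50

# Cross-check with SciPy's built-ins (median_abs_deviation requires SciPy >= 1.5)
print(f"SciPy MAD: {stats.median_abs_deviation(data):.2f}")                        # 1.00
print(f"SciPy scaled MAD: {stats.median_abs_deviation(data, scale='normal'):.2f}") # 1.48
print(f"SciPy IQR: {stats.iqr(data):.2f}")                                         # 1.50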
