What are Outliers?
Outliers are data points that differ significantly from other observations. They can be caused by measurement errors, data entry mistakes, or they might represent genuinely unusual cases worth investigating.
The orange point at (10, 50) is an outlier
The 3-Sigma Rule
For normally distributed data, points beyond 3 standard deviations from the mean are considered outliers. They occur less than 0.3% of the time by chance.
Outlier if
Example
Z-Score Method
Calculate the z-score for each data point. If |z| > 3 (or sometimes 2.5), it's an outlier.
Z-Score
Threshold Options
IQR Method (Alternative)
The Interquartile Range (IQR) method is more robust to outliers because it doesn't use the mean or standard deviation.
Step 1
Step 2
Step 3
Step 4
Step 5
Handling Outliers
Don't Automatically Delete!