TL;DR
- Sample ratio mismatch (SRM) is an observed traffic split that deviates from the planned A/B allocation by more than chance can explain.
- For a 50/50 test with 12,000 users, one arm should hold about 6,000 users, give or take roughly 55 (one standard deviation of random assignment noise).
- A 6,268 vs. 5,732 split is 4.89 standard deviations from expectation, so hold the analysis.
- Resolve SRM before reading conversion lift, revenue lift, or guardrail metric deltas.
The Problem
A product analyst launches a checkout A/B test with a planned 50/50 split. The dashboard shows the variant has better revenue per visitor, but the exposure counts are not balanced. Before the analyst can use the A/B test conversion-rate workflow, they need to know whether the assignment process itself is trustworthy.
Sample ratio mismatch is a data quality failure where the observed allocation differs materially from the expected allocation. Microsoft Research describes SRM as a check that must pass before experiment effects are analyzed, and Microsoft Learn warns not to draw conclusions from an experiment with SRM until the issue is addressed.
This page is written from the role of a senior experimentation analyst reviewing production experiment telemetry. The objective is not to declare a feature winner; it is to decide whether the experiment data is clean enough for lift, confidence interval, and hypothesis testing work.
Why Standard Deviation Helps
An A/B assignment is a random process. If 12,000 independent users are assigned 50/50, the control count will not always be exactly 6,000. Standard deviation gives the expected size of normal random imbalance, so you can tell a harmless wobble from a broken trigger, logging filter, bot rule, or allocation bug.
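The claim that a 50/50 assignment wobbles around 6,000 rather than landing on it exactly can be checked with a quick simulation. A minimal sketch, where the trial count and seed are arbitrary choices:

```python
import random

def simulate_arm_counts(n_users=12_000, p=0.5, trials=300, seed=7):
    """Repeat a 50/50 assignment many times and measure the spread of one arm."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(n_users)) for _ in range(trials)]
    mean = sum(counts) / trials
    sd = (sum((c - mean) ** 2 for c in counts) / trials) ** 0.5
    return mean, sd

# The simulated spread should sit near sqrt(12000 * 0.5 * 0.5) ≈ 54.77.
```

Counts a few dozen users away from 6,000 are routine; counts hundreds of users away are not.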
- Standard deviation: a measure of the typical spread of observations around an expected value.
- Sample ratio mismatch: an observed treatment allocation that is too far from the planned experiment ratio.
- Z-score: a standardized distance from expectation, measured in standard deviation units.
Expected Standard Deviation of One A/B Arm Count
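With N total exposed users and a planned share p per arm, each arm's count behaves like a binomial random variable, so its standard deviation is sqrt(N · p · (1 − p)). A minimal sketch:

```python
import math

def arm_count_sd(total_users: int, planned_share: float = 0.5) -> float:
    """Standard deviation of one arm's user count under a binomial assignment model."""
    return math.sqrt(total_users * planned_share * (1.0 - planned_share))

# For the 12,000-user, 50/50 example: sqrt(12000 * 0.25) ≈ 54.77 users.
```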
SRM Distance in Standard Deviations
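The SRM distance is the observed arm count minus the expected count, divided by that standard deviation, which is just a z-score on the count. A sketch:

```python
import math

def srm_z_score(observed: int, total: int, planned_share: float = 0.5) -> float:
    """Distance of one arm's observed count from expectation, in SD units."""
    expected = total * planned_share
    sd = math.sqrt(total * planned_share * (1.0 - planned_share))
    return (observed - expected) / sd

# 6,268 control users out of 12,000 is (6268 - 6000) / 54.77 ≈ 4.89 SD.
```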
Why This Is a Pre-Analysis Check
SRM indicates that the assignment or logging pipeline itself is biased, and a biased exposure set contaminates every downstream metric, so the check must pass before any lift or significance work begins.
Worked Example
A SaaS checkout team tests a new payment-step layout for one week. The planned allocation is 50% control and 50% variant. The analyst exports exposed user counts by day before looking at conversion, because assignment imbalance can bias every downstream metric.
| Day | Control Exposures | Variant Exposures | Note |
|---|---|---|---|
| Mon | 842 | 781 | Slightly high control |
| Tue | 895 | 803 | Control-heavy |
| Wed | 911 | 816 | Control-heavy |
| Thu | 872 | 827 | Moderate imbalance |
| Fri | 946 | 839 | Control-heavy |
| Sat | 883 | 833 | Moderate imbalance |
| Sun | 919 | 833 | Control-heavy |
Calculating the SRM Signal
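The daily exposures in the table can be totaled and converted to an SRM distance directly. A sketch of the calculation; the two-sided p-value via `math.erfc` assumes a normal approximation, which is reasonable at this sample size:

```python
import math

control = [842, 895, 911, 872, 946, 883, 919]
variant = [781, 803, 816, 827, 839, 833, 833]

n_control, n_variant = sum(control), sum(variant)  # 6268 and 5732
total = n_control + n_variant                      # 12,000 exposed users
expected = total * 0.5                             # 6,000 per arm
sd = math.sqrt(total * 0.5 * 0.5)                  # ≈ 54.77
z = (n_control - expected) / sd                    # ≈ 4.89
p_two_sided = math.erfc(abs(z) / math.sqrt(2))     # far below any usual SRM alarm threshold
```

A split this lopsided would essentially never arise by chance under a healthy 50/50 assignment, which is why the analysis should hold here.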
Decision Criteria
| SRM Distance | Practical Interpretation | Decision |
|---|---|---|
| Less than 2 SD | Common random imbalance for most routine checks | Continue, but keep the count check in the report |
| 2 to 3 SD | Borderline imbalance that may matter for high-stakes launches | Segment by platform, country, browser, and trigger path before analysis |
| More than 3 SD | Unlikely under the planned allocation if assignment and logging are healthy | Pause winner claims and investigate SRM before metric analysis |
| Repeated daily bias in one direction | Likely systematic routing, logging, eligibility, or bot-filter issue | Fix instrumentation or rerun the experiment |
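The decision table can be expressed as a small helper; the return strings here are shorthand for the table rows, not standard terminology:

```python
def srm_decision(abs_z: float, repeated_one_direction: bool = False) -> str:
    """Map an SRM distance (in SD units) to the decision bands in the table above."""
    if repeated_one_direction:
        return "fix instrumentation or rerun"
    if abs_z < 2:
        return "continue, report the count check"
    if abs_z <= 3:
        return "segment before analysis"
    return "pause winner claims, investigate SRM"
```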
Do Not Fix SRM by Reweighting First
Reweighting the observed counts back to the planned ratio hides the symptom without removing the cause: users lost to a broken trigger, filter, or logging path are not a random sample, so the remaining comparison stays biased no matter how the counts are rescaled.
Analyst Workflow
1. Lock the planned allocation.
2. Export exposure counts, not conversions.
3. Compute the expected count and its standard deviation.
4. Convert the count gap to a z-score.
5. Analyze effects only after the data quality check passes.
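The steps above can be sketched as one function, from daily exposure counts to a pass/fail verdict; the 3-SD threshold mirrors the decision table:

```python
import math

def srm_check(control_daily, variant_daily, planned_share=0.5, max_sd=3.0):
    """Run the exposure-count workflow: totals -> expected -> SD -> z -> verdict."""
    n_control, n_variant = sum(control_daily), sum(variant_daily)
    total = n_control + n_variant
    expected = total * planned_share
    sd = math.sqrt(total * planned_share * (1.0 - planned_share))
    z = (n_control - expected) / sd
    return {"z": z, "passes": abs(z) <= max_sd}
```

Running it on the worked example's daily counts flags the experiment, so conversion and revenue analysis would wait.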
Diagnosis Checklist
- Compare SRM by browser, app version, logged-in state, geography, and traffic source.
- Check whether one arm changes page load speed enough to affect exposure logging.
- Look for eligibility rules that exclude users after assignment but before exposure.
- Audit bot filters, privacy blockers, redirect handling, and duplicate user stitching.
- Review daily counts for a one-direction pattern instead of a single noisy day.
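The segment comparison in the first checklist item can be sketched as below; the segment names and counts are hypothetical, purely for illustration of how one broken segment can drive the overall mismatch:

```python
import math

def segment_z(control: int, variant: int, planned_share: float = 0.5) -> float:
    """SRM distance for a single segment's exposure counts."""
    total = control + variant
    sd = math.sqrt(total * planned_share * (1.0 - planned_share))
    return (control - total * planned_share) / sd

# Hypothetical per-segment exposures: a healthy web segment, a healthy iOS
# segment, and an Android segment with a broken variant exposure log.
segments = {"iOS": (2100, 2080), "Android": (2168, 1652), "Web": (2000, 2000)}
flagged = {name: round(segment_z(c, v), 2)
           for name, (c, v) in segments.items()
           if abs(segment_z(c, v)) > 3}
```

When only one segment is flagged, the fix is usually in that segment's trigger, logging, or filtering path rather than in the assignment service itself.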
Tools & Next Steps
- Z-Score Calculator
- P-Value Calculator
- Critical Value Calculator
- A/B Test Conversion Rates
Sources
References and further authoritative reading used in preparing this article.
- Diagnosing Sample Ratio Mismatch in A/B Testing — Microsoft Research
- Experiments Best Practices and Recommendations — Microsoft Learn
- NIST/SEMATECH e-Handbook: Binomial Distribution — NIST/SEMATECH