Intermediate · Experimentation Analytics · 6 min

Standard Deviation Calculator for A/B Test Sample Ratio Mismatch

Use standard deviation to detect sample ratio mismatch in A/B tests before analyzing conversion lift, revenue, or guardrail metrics.

By Standard Deviation Calculator Team · Industry Solutions · Published

TL;DR

  • Sample ratio mismatch is a traffic split that differs too much from the planned A/B allocation.
  • For a 50/50 test with 12,000 users, one arm should be 6,000 +/- expected random noise.
  • A 6,268 vs. 5,732 split is 4.89 standard deviations from expectation, so hold the analysis.
  • Resolve SRM before reading conversion lift, revenue lift, or guardrail metric deltas.

The Problem

A product analyst launches a checkout A/B test with a planned 50/50 split. The dashboard shows the variant has better revenue per visitor, but the exposure counts are not balanced. Before the analyst can use the A/B test conversion-rate workflow, they need to know whether the assignment process itself is trustworthy.

Sample ratio mismatch is a data quality failure where the observed allocation differs materially from the expected allocation. Microsoft Research describes SRM as a check that must pass before experiment effects are analyzed, and Microsoft Learn warns not to draw conclusions from an experiment with SRM until the issue is addressed.

This page is written from the role of a senior experimentation analyst reviewing production experiment telemetry. The objective is not to declare a feature winner; it is to decide whether the experiment data is clean enough for lift, confidence interval, and hypothesis testing work.

Why Standard Deviation Helps

An A/B assignment is a random process. If 12,000 independent users are assigned 50/50, the control count will not always be exactly 6,000. Standard deviation gives the expected size of normal random imbalance, so you can tell a harmless wobble from a broken trigger, logging filter, bot rule, or allocation bug.

  • Standard deviation: a measure of the typical spread of observations around an expected value.
  • Sample ratio mismatch: an observed treatment allocation that is too far from the planned experiment ratio.
  • Z-score: a standardized distance from expectation, measured in standard deviation units.

Expected Standard Deviation of One A/B Arm Count

sigma = sqrt(N * p * (1 - p))

SRM Distance in Standard Deviations

z = (observed count - expected count) / sigma
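The two formulas above can be combined into one small helper. A minimal Python sketch; the function name srm_z is ours, not part of any library:

```python
import math

def srm_z(observed: int, total: int, planned_ratio: float = 0.5) -> float:
    """Distance of one arm's observed count from its expected count, in SD units."""
    expected = total * planned_ratio
    # binomial standard deviation of one arm's count: sqrt(N * p * (1 - p))
    sigma = math.sqrt(total * planned_ratio * (1 - planned_ratio))
    return (observed - expected) / sigma

# 50/50 split, 12,000 users, 6,268 observed in control
print(round(srm_z(6268, 12000), 2))  # 4.89
```

The same helper works for unequal splits: pass planned_ratio=0.9 for a 90/10 allocation and the observed count of the 90% arm.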

Why This Is a Pre-Analysis Check

A clean split does not prove the variant won. It only says the allocation counts look plausible enough to continue with the p-value calculator, z-score calculator, or confidence intervals guide.

Worked Example

A SaaS checkout team tests a new payment-step layout for one week. The planned allocation is 50% control and 50% variant. The analyst exports exposed user counts by day before looking at conversion, because assignment imbalance can bias every downstream metric.

Day | Control Exposures | Variant Exposures | Note
Mon | 842               | 781               | Slightly high control
Tue | 895               | 803               | Control-heavy
Wed | 911               | 816               | Control-heavy
Thu | 872               | 827               | Moderate imbalance
Fri | 946               | 839               | Control-heavy
Sat | 883               | 833               | Moderate imbalance
Sun | 919               | 833               | Control-heavy

Calculating the SRM Signal

Total exposed users = 12,000. Expected control count = 12,000 * 0.50 = 6,000. Observed control count = 6,268. Expected standard deviation = sqrt(12,000 * 0.50 * 0.50) = 54.77 users. The imbalance is (6,268 - 6,000) / 54.77 = 4.89 standard deviations. That is far beyond normal random assignment noise, so the analyst should hold the experiment analysis and diagnose the assignment path.
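The same arithmetic can be reproduced directly from the daily export, which also guards against transcription errors in the totals. A short sketch using the counts from the table:

```python
import math

# daily exposure counts from the worked example (Mon through Sun)
control = [842, 895, 911, 872, 946, 883, 919]
variant = [781, 803, 816, 827, 839, 833, 833]

n = sum(control) + sum(variant)        # 12,000 exposed users
expected = n * 0.50                    # 6,000 expected in control
sigma = math.sqrt(n * 0.50 * 0.50)     # binomial SD of the control count
z = (sum(control) - expected) / sigma

print(f"control={sum(control)}, sigma={sigma:.2f}, z={z:.2f}")
```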

Decision Criteria

SRM Distance | Practical Interpretation | Decision
Less than 2 SD | Common random imbalance for most routine checks | Continue, but keep the count check in the report
2 to 3 SD | Borderline imbalance that may matter for high-stakes launches | Segment by platform, country, browser, and trigger path before analysis
More than 3 SD | Unlikely under the planned allocation if assignment and logging are healthy | Pause winner claims and investigate SRM before metric analysis
Repeated daily bias in one direction | Likely systematic routing, logging, eligibility, or bot-filter issue | Fix instrumentation or rerun the experiment
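The distance-based rows of the table can be encoded as a small triage helper, so the same cutoffs apply in every report. A sketch using the table's thresholds; the function name and return strings are illustrative:

```python
def srm_decision(z: float) -> str:
    """Map an SRM distance (in SD units) to the review action from the decision table."""
    d = abs(z)
    if d < 2:
        return "continue, keep the count check in the report"
    if d <= 3:
        return "segment by platform, country, browser, and trigger path"
    return "pause winner claims and investigate SRM"

print(srm_decision(4.89))  # pause winner claims and investigate SRM
```

The repeated-daily-bias row is deliberately left out: a one-direction pattern across days is a judgment call on the daily series, not a single z threshold.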

Do Not Fix SRM by Reweighting First

Reweighting can hide an instrumentation defect. First identify why users entered one arm more often: feature eligibility, triggered exposure logic, redirects, blocked scripts, device filters, or delayed event ingestion.

Analyst Workflow

1. Lock the planned allocation

Record whether the experiment was designed as 50/50, 90/10, or another split before reading results. For traffic planning, use the sample size calculator.

2. Export exposure counts, not conversions

Use the event that defines experiment exposure. Do not use purchase, signup, or click counts as the SRM denominator.

3. Compute expected count and standard deviation

For each arm, calculate expected count = N * p and sigma = sqrt(N * p * (1 - p)). This is the binomial spread expected from random assignment.

4. Convert the count gap to a z-score

Use the z-score calculator with observed minus expected count divided by sigma, then compare the result with the critical value calculator.

5. Only analyze effects after the data quality check passes

If the split is plausible, continue to conversion lift, standard error, confidence intervals, and guardrail monitoring such as feature flag rollout variability.
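The workflow steps can be sketched end to end as a single check. This is a minimal sketch under a binomial model with a normal approximation for the p-value; the alpha = 0.001 review threshold is a common choice for SRM alarms, not something this article prescribes:

```python
import math

def srm_check(control: int, variant: int, planned_ratio: float = 0.5,
              alpha: float = 0.001) -> dict:
    """SRM check on exposure counts: z-distance and a two-sided p-value
    under the planned allocation (normal approximation to the binomial)."""
    n = control + variant
    expected = n * planned_ratio
    sigma = math.sqrt(n * planned_ratio * (1 - planned_ratio))
    z = (control - expected) / sigma
    # two-sided tail probability of a gap at least this large: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"z": z, "p_value": p, "srm_suspected": p < alpha}

result = srm_check(6268, 5732)
print(result["srm_suspected"])  # True
```

A strict alpha is used because this check runs on every experiment: a loose threshold would flag healthy tests too often, while a genuine allocation bug usually produces a p-value many orders of magnitude below any reasonable cutoff.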

Diagnosis Checklist

  • Compare SRM by browser, app version, logged-in state, geography, and traffic source.
  • Check whether one arm changes page load speed enough to affect exposure logging.
  • Look for eligibility rules that exclude users after assignment but before exposure.
  • Audit bot filters, privacy blockers, redirect handling, and duplicate user stitching.
  • Review daily counts for a one-direction pattern instead of a single noisy day.
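The segment comparison in the first bullet is the same z computation run once per slice. The segment names and counts below are hypothetical, chosen only so they sum to the article's 6,268 / 5,732 totals; a concentrated imbalance in one segment points at a segment-specific cause such as a blocked script or platform filter:

```python
import math

# hypothetical exposure counts by browser segment: (control, variant)
segments = {
    "chrome":  (3100, 3050),
    "safari":  (1900, 1350),
    "firefox": (1268, 1332),
}

def segment_z(control: int, variant: int, ratio: float = 0.5) -> float:
    """SRM distance in SD units for one segment's exposure counts."""
    n = control + variant
    sigma = math.sqrt(n * ratio * (1 - ratio))
    return (control - n * ratio) / sigma

z_by_segment = {name: segment_z(c, v) for name, (c, v) in segments.items()}
for name, z in z_by_segment.items():
    print(f"{name}: z = {z:+.2f}")
```

In this hypothetical split, chrome and firefox sit inside normal noise while safari carries nearly all of the imbalance, which would narrow the investigation to that path.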

Tools & Next Steps

Z-Score Calculator

Convert the observed allocation gap into standard deviation units for the SRM check.

P-Value Calculator

Quantify how surprising the allocation gap is under the planned random split.

Critical Value Calculator

Choose a review threshold before looking at experiment outcomes.

A/B Test Conversion Rates

After SRM passes, use this workflow to evaluate lift stability and decision confidence.


Sources

References and further authoritative reading used in preparing this article.

  1. Diagnosing Sample Ratio Mismatch in A/B Testing · Microsoft Research
  2. Experiments Best Practices and Recommendations · Microsoft Learn
  3. NIST/SEMATECH e-Handbook of Statistical Methods: Binomial Distribution · NIST/SEMATECH