TL;DR
For Stata users, standard deviation is a fast first check on whether a mean can be trusted. Run `summarize`, confirm the valid N and units, compare subgroup SDs with `tabstat`, and review influential observations before reporting a policy, clinical, or research result.
- Stata is statistical software that applied researchers use to manage data, run models, and produce reproducible output.
- `summarize` is the Stata command that reports the number of observations, mean, standard deviation, minimum, and maximum for numeric variables.
- Standard deviation is a scale statistic that estimates the typical distance of observed values from the mean.
- A z-score is a standardized value that shows how many standard deviations one observation sits from the mean.
- Use the sample standard deviation calculator to verify high-stakes Stata output before publication.
Research Problem
A health services analyst receives a Stata `.dta` file from a clinic pilot. The outcome is patient wait time in minutes after triage. The program manager wants to report the average wait time, but one long delay may be changing the story. The specific question is whether the mean wait time is stable enough to use in a decision memo.
This page treats the analyst as a senior public-health data reviewer. The objective is to turn a Stata descriptive table into a decision: report the mean and SD, segment by clinic or shift, source-check an outlier, or replace a fragile mean-only summary with a richer distribution review.
Stata Analyst Role
StataCorp documents `summarize` as the base descriptive command for variables and `tabstat` as a tabulation command for selected statistics. In practice, a Stata analyst should not copy the SD into a report until the variable type, missing-value rules, subgroup logic, and potential outliers have been checked.
Why this matters in Stata
Stata Workflow
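- Confirm the variable is numeric and correctly coded (`codebook` shows storage type, range, and missing values).
- Run the baseline descriptive table with `summarize`.
- Request distribution detail with `summarize, detail` when the range looks suspicious.
- Compare groups with `tabstat ..., by()` before making an operational recommendation.
- Convert spread into an action, for example by computing z-scores from the stored `r(mean)` and `r(sd)`.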
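```stata
* Step 1: confirm wait_minutes is numeric; check its range and missing values
codebook wait_minutes
* Baseline table: N, mean, SD, min, max (SD uses the n - 1 denominator)
summarize wait_minutes
* Store returned results right away; the next r-class command replaces r()
local wait_mean = r(mean)
local wait_sd = r(sd)
generate z_wait = (wait_minutes - `wait_mean') / `wait_sd'
* Percentiles, skewness, and kurtosis when the range looks suspicious
summarize wait_minutes, detail
* Spread by clinic shift before a single-system recommendation
tabstat wait_minutes, by(clinic_shift) statistics(n mean sd min max)
```

Do not reuse `r(mean)` after another command. Store it in a local immediately after `summarize`, because the next r-class command, such as `summarize, detail`, replaces the returned results.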
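`summarize` counts only nonmissing values in N, so a filter, merge, or recode can silently change the analysis sample. The following is a minimal Python sketch of the "numeric and correctly coded" check, run outside Stata as an independent cross-check. It assumes the wait times have been exported to a plain list, with `None` standing in for Stata's system missing value `.`. The helper name `valid_sample` and the 0 to 240 minute plausible range are hypothetical. A real review would substitute the clinic's own definition.

```python
# Hypothetical plausible range for a post-triage wait, in minutes.
# These bounds are an assumption for illustration, not a clinic rule.
PLAUSIBLE_MIN, PLAUSIBLE_MAX = 0, 240


def valid_sample(values, low=PLAUSIBLE_MIN, high=PLAUSIBLE_MAX):
    """Mimic how summarize counts N and flag values outside a plausible range.

    `values` is a list where None stands in for Stata's system missing ".".
    Returns (valid, n_missing, out_of_range):
      valid: nonmissing values, i.e. the observations summarize counts in N
      n_missing: how many rows were missing
      out_of_range: (row_number, value) pairs outside [low, high], 1-based like _n
    """
    valid = [v for v in values if v is not None]
    n_missing = len(values) - len(valid)
    out_of_range = [
        (row, v)
        for row, v in enumerate(values, start=1)
        if v is not None and not (low <= v <= high)
    ]
    return valid, n_missing, out_of_range


waits = [18, 22, 20, 24, 19, 21, 26, 23, 20, 22, 25, 41]
valid, n_missing, out_of_range = valid_sample(waits)
# prints: valid N = 12, missing = 0, out of range = []
print(f"valid N = {len(valid)}, missing = {n_missing}, out of range = {out_of_range}")
```

Out-of-range values stay in `valid` because `summarize` would also count them. The flag only tells the analyst which rows to source-check, for example a 999 placeholder code.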
Sample Standard Deviation Behind Stata summarize
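The SD that `summarize` reports in `r(sd)` is the sample standard deviation. It divides the sum of squared deviations by n − 1 rather than n:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

Here n is the valid (nonmissing) N that `summarize` reports as Obs.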
Manual z-score check
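The manual check standardizes each wait by the mean and sample SD from `summarize`. This is what the `generate z_wait` line in the workflow computes. For the 41-minute wait in the worked example:

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad
z_{12} = \frac{41 - 23.42}{6.04} \approx 2.91
$$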
Worked Example
A clinic pilot records wait time in minutes for 12 consecutive patients after triage: 18, 22, 20, 24, 19, 21, 26, 23, 20, 22, 25, and 41. The analyst needs to brief operations leadership on whether the pilot is producing consistent waits.
| Patient | Wait Minutes | Manual z-score | Review Note |
|---|---|---|---|
| 1 | 18 | -0.90 | Within expected range |
| 2 | 22 | -0.23 | Within expected range |
| 3 | 20 | -0.57 | Within expected range |
| 4 | 24 | 0.10 | Within expected range |
| 5 | 19 | -0.73 | Within expected range |
| 6 | 21 | -0.40 | Within expected range |
| 7 | 26 | 0.43 | Within expected range |
| 8 | 23 | -0.07 | Within expected range |
| 9 | 20 | -0.57 | Within expected range |
| 10 | 22 | -0.23 | Within expected range |
| 11 | 25 | 0.26 | Within expected range |
| 12 | 41 | 2.91 | Source-check before memo |
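The mean, SD, and z-scores in the table follow from these intermediate quantities for the 12 waits (rounded to two decimals):

$$
\begin{aligned}
\sum_{i=1}^{12} x_i &= 281, &\qquad \bar{x} &= \frac{281}{12} \approx 23.42,\\
\sum_{i=1}^{12} \left(x_i - \bar{x}\right)^2 &\approx 400.92, &\qquad s &= \sqrt{\frac{400.92}{11}} \approx \sqrt{36.45} \approx 6.04.
\end{aligned}
$$

The 41-minute wait sits about 17.58 minutes above the mean and alone contributes about 309 of the 400.92. That is roughly 77% of the total squared deviation, which is why a single row can dominate the SD in a file this small.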
Interpreting the Stata output
Cross-check Stata by hand
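One way to cross-check Stata by hand is to recompute the summary outside Stata. This is a minimal Python sketch that assumes the 12 waits from the worked example have been copied into a list. `statistics.stdev` uses the same n − 1 denominator as `summarize`, so the mean, SD, and z-scores should match the table. The final lines are a sensitivity check only, not a recommendation to drop the 41-minute row.

```python
from statistics import mean, stdev

# Wait times (minutes) for the 12 patients in the worked example.
waits = [18, 22, 20, 24, 19, 21, 26, 23, 20, 22, 25, 41]

x_bar = mean(waits)
s = stdev(waits)  # sample SD (n - 1 denominator), the same definition as summarize's r(sd)
z_scores = [(x - x_bar) / s for x in waits]

print(f"N = {len(waits)}, mean = {x_bar:.2f}, SD = {s:.2f}")
print([round(z, 2) for z in z_scores])

# Sensitivity check: how far does the single largest wait move the summary?
# The row stays in the analysis; this only measures its influence.
largest = max(waits)
without_largest = [x for x in waits if x != largest]
print(f"without {largest}: mean = {mean(without_largest):.2f}, SD = {stdev(without_largest):.2f}")
```

With these data the sketch reproduces the mean of 23.42, the SD of 6.04, and z = 2.91 for patient 12. Without that one wait, the mean falls to 21.82 and the SD to 2.52. That contrast is the reason to source-check the row before the memo.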
Decision Criteria
| Stata Pattern | What It Means | Decision |
|---|---|---|
| Mean and SD are stable after source-checking | The descriptive table supports a concise summary | Report mean, SD, N, minimum, maximum, and the Stata command used |
| Maximum or minimum is more than about 2.5 SD from the mean in a small file | One observation may be a real exception, data-entry error, or operational incident | Source-check the row and compare with the outlier detection guide |
| Group SD differs sharply across clinics or shifts | The pooled mean hides operational heterogeneity | Report group-level `tabstat` output before making a single-system recommendation |
| Valid N changes after filters or merges | The analysis sample is not stable | Document inclusion rules and rerun the descriptive table with explicit `if` conditions |
| SD is large relative to the operating target | The average may satisfy the target while many patients still wait too long | Pair SD with percentiles, median, IQR, or confidence intervals |
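The first three rows of the decision table can be turned into repeatable checks. This is a rough Python sketch and not a Stata feature. `flag_extremes` applies the table's "about 2.5 SD" rule of thumb, and `sd_by_group` loosely mirrors `tabstat ..., by()` for SD comparisons across clinics or shifts. Both function names are hypothetical, and the cutoff is a parameter to tune rather than a fixed standard.

```python
from statistics import mean, stdev


def flag_extremes(values, cutoff=2.5):
    """Return (row_number, value, z) for rows more than `cutoff` sample SDs from the mean.

    The 2.5 default follows the decision table's rule of thumb for small files.
    A flag means "source-check this row", not "drop this row".
    """
    x_bar, s = mean(values), stdev(values)
    if s == 0:  # every value identical: nothing can be extreme
        return []
    flagged = []
    for row, x in enumerate(values, start=1):
        z = (x - x_bar) / s
        if abs(z) > cutoff:
            flagged.append((row, x, z))
    return flagged


def sd_by_group(pairs):
    """Return {group: (n, mean, sd)} from (group, value) pairs.

    Loosely mirrors `tabstat value, by(group) statistics(n mean sd)`.
    As in Stata, the SD is missing (None here) for a group with one observation.
    """
    groups = {}
    for group, value in pairs:
        groups.setdefault(group, []).append(value)
    return {
        g: (len(v), mean(v), stdev(v) if len(v) > 1 else None)
        for g, v in groups.items()
    }


waits = [18, 22, 20, 24, 19, 21, 26, 23, 20, 22, 25, 41]
for row, x, z in flag_extremes(waits):
    # prints: patient 12: 41 minutes, z = 2.91 -> source-check before memo
    print(f"patient {row}: {x} minutes, z = {z:.2f} -> source-check before memo")
```

`sd_by_group` would be fed `(clinic_shift, wait_minutes)` pairs exported from the `.dta` file. The worked example has no shift labels, so no grouped output is shown here.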
NIST frames standard deviation as a measure of scale, not a decision by itself. For Stata users, that means the SD must be tied to the dataset's unit, the command syntax, the sample definition, and the operational threshold. A reproducible `.do` file should make those choices visible.
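Tying spread to an operational threshold usually means reporting the distribution against the target, as the last row of the decision table suggests, rather than the SD alone. The following is a minimal Python sketch assuming a hypothetical 25-minute operating target. The pilot's real target is not given on this page. The function name `spread_vs_target` is also illustrative.

```python
from statistics import median, quantiles

# Hypothetical operating target in minutes (an assumption for illustration, not from the pilot).
TARGET_MINUTES = 25


def spread_vs_target(values, target=TARGET_MINUTES):
    """Summarize the distribution against an operating threshold.

    Returns the median, first and third quartiles, IQR, and the share of
    observations strictly above `target`.
    """
    q1, _, q3 = quantiles(values, n=4)  # Python's default 'exclusive' quartile method
    above = sum(1 for v in values if v > target)
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "share_above_target": above / len(values),
    }


waits = [18, 22, 20, 24, 19, 21, 26, 23, 20, 22, 25, 41]
print(spread_vs_target(waits))
```

With these waits the sketch gives a median of 22 minutes and quartiles of 20 and 24.75. It also shows 2 of 12 patients (about 17%) above the hypothetical 25-minute target, even though the mean of 23.42 is below it. Python's default quartile method is not guaranteed to match the percentiles printed by `summarize, detail`, so quote the definition the `.do` file actually used.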
Reporting Checklist
- Variable definition: Name the outcome, unit, coding direction, and plausible range before interpreting the SD.
- Sample logic: `summarize` and `tabstat` report the sample SD (n − 1 denominator). Keep it unless the Stata file contains the complete population you intend to describe.
- Syntax trail: Keep the exact `summarize` or `tabstat` command in the analysis log so the table can be reproduced.
- Outlier review: Use z-scores, percentiles, and the [z-score calculator](/tools/z-score) to decide which rows need source review.
- Mean precision: When the question is uncertainty around the mean, move from SD to the [standard error of the mean calculator](/tools/standard-error-of-the-mean).
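The checklist's last item separates two questions. The SD describes how much individual waits vary, and the standard error describes how precisely the 12-patient mean is estimated. Below is a short Python sketch of the standard error of the mean, s/√N, which is the quantity Stata's `mean` command reports as its standard error. The helper name is illustrative.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(values):
    """Sample SD divided by sqrt(N): uncertainty about the mean, not spread among patients."""
    return stdev(values) / sqrt(len(values))


waits = [18, 22, 20, 24, 19, 21, 26, 23, 20, 22, 25, 41]
# prints: SD = 6.04, SE of mean = 1.74
print(f"SD = {stdev(waits):.2f}, SE of mean = {standard_error_of_mean(waits):.2f}")
```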
Evolve the Analysis
The weakest Stata write-up says, "average wait time was 23.42 minutes with SD 6.04." Replace it with a concrete methods sentence: "Using Stata `summarize` on 12 valid triage waits, mean wait was 23.42 minutes, SD was 6.04, range was 18-41, and the 41-minute delay was source-checked because its z-score was 2.91."
Pre-Publish Check
- Yes: the page includes a worked example with 12 concrete wait times, mean 23.42, SD 6.04, and z-score 2.91.
- Yes: the structure uses H2 sections, Stata syntax, formulas, a decision table, and a reporting checklist.
- Yes: the guidance goes beyond the formula by covering Stata command choice, returned results, filters, outlier review, and decision criteria.
Tools & Next Steps
Sample Standard Deviation
Z-Score Calculator
Standard Error
Sample vs Population
Further Reading
Sources
References and further authoritative reading used in preparing this article.