Standard Deviation of a Probability Distribution

When to Use This Formula

When a problem gives you possible outcomes and their probabilities instead of a raw data list, you do not use the usual sample standard deviation workflow. You first compute the expected value of the random variable, then measure how far each outcome sits from that mean after weighting by probability.

This setup appears in reliability models, quality-control defect counts, insurance claims, game outcomes, and classroom probability questions. If you want to check the arithmetic numerically, the site's probability calculator, mean and variance calculator, and mean, variance, and standard deviation calculator are the most relevant companion tools.

Input type	What you know	Best standard deviation approach
Raw dataset	Observed values such as 4, 7, 9, 10	Use sample or population formulas on the data list
Frequency table	Observed values plus counts	Use weighted frequencies as shown in Standard Deviation from a Frequency Table
Probability distribution	Possible values plus probabilities summing to 1	Use expected value and probability-weighted variance

Key distinction

A probability distribution describes the behavior of the random variable itself, so its standard deviation is usually a population-style parameter. You are not estimating from a sample at this stage.

Core Formulas

For a discrete random variable X with outcomes xᵢ and probabilities pᵢ, the probabilities must satisfy Σpᵢ = 1. The mean of the distribution is:

Expected value

μ = E(X) = Σ(xᵢpᵢ)

The variance and standard deviation are then:

Variance of a discrete probability distribution

σ² = Σ[(xᵢ - μ)²pᵢ]

Standard deviation of a discrete probability distribution

σ = √[Σ((xᵢ - μ)²pᵢ)]

This is the same spread concept used in Standard Deviation Formula Explained and Understanding Variance, but the weights now come from probabilities rather than repeated observations.
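The three formulas above translate almost line for line into code. The sketch below is one minimal way to do it in Python; the function name `distribution_stats` and the fair-die inputs are illustrative assumptions, not part of the article's own material.

```python
import math


def distribution_stats(values, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution.

    Hypothetical helper: `values` are the outcomes x_i and `probs` the
    matching probabilities p_i, which are assumed to sum to 1.
    """
    # Expected value: mu = sum of x_i * p_i
    mean = sum(x * p for x, p in zip(values, probs))
    # Variance: sum of (x_i - mu)^2 * p_i
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    # Standard deviation: square root of the variance
    return mean, variance, math.sqrt(variance)


# Illustrative input: a fair six-sided die, each face with probability 1/6
die_mean, die_variance, die_sd = distribution_stats([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(die_mean, die_variance, die_sd)
```

For the fair die, the mean is 3.5 and the variance is 35/12, so the printed values should sit close to 3.5, 2.917, and 1.708, up to floating-point rounding.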

Worked Example

Suppose X is the number of defective items found in a short production run. Its probability distribution is:

x	P(X = x)	xP(X = x)	x²P(X = x)
0	0.15	0.00	0.00
1	0.35	0.35	0.35
2	0.30	0.60	1.20
3	0.20	0.60	1.80
Total	1.00	1.55	3.35

The expected value is μ = 1.55. That means the long-run average number of defects per run is 1.55, even though 1.55 defects never occurs in a single run.

Using the variance formula directly gives σ² = (0 - 1.55)²(0.15) + (1 - 1.55)²(0.35) + (2 - 1.55)²(0.30) + (3 - 1.55)²(0.20) = 0.9475.

The standard deviation is σ = √0.9475 ≈ 0.973. In practical terms, the defect count in a single run typically lands about one defect away from the mean of 1.55.
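To check the table and the result numerically, the worked example can be rebuilt in a few lines of Python. This is only a sketch: the variable names are arbitrary, and the outcomes and probabilities are copied from the table above.

```python
import math

# Outcomes and probabilities from the defect-count table
xs = [0, 1, 2, 3]
ps = [0.15, 0.35, 0.30, 0.20]

# Rebuild the table columns: x, P(X = x), xP(X = x), x²P(X = x)
rows = [(x, p, x * p, x**2 * p) for x, p in zip(xs, ps)]
for x, p, xp, x2p in rows:
    print(f"{x}\t{p:.2f}\t{xp:.2f}\t{x2p:.2f}")

# Column totals: the mean and E(X²)
mu = sum(row[2] for row in rows)
e_x2 = sum(row[3] for row in rows)

# Variance from the definition, then the standard deviation
variance = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
sigma = math.sqrt(variance)
print(f"mu = {mu:.2f}, E(X^2) = {e_x2:.2f}, variance = {variance:.4f}, sigma = {sigma:.3f}")
```

The printed totals should agree with the table (1.55 and 3.35) and with the hand calculation (0.9475 and about 0.973) to the displayed precision.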

Verify the distribution

Check that all probabilities are between 0 and 1 and that they sum to 1.00.

Find the mean first

Compute Σxᵢpᵢ before touching variance. The mean anchors every deviation.

Square each distance from the mean

Use (xᵢ - μ)², not the unsquared distances, so positive and negative deviations do not cancel.

Weight by probability

Multiply each squared distance by its probability pᵢ.

Take the square root last

Variance comes first; standard deviation is its square root.
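The five steps above can be sketched as a single function that refuses to continue when the table is not a valid distribution. The name `checked_sd` and the tolerance `tol` are assumptions made for illustration; a small tolerance is used because decimal probabilities rarely sum to exactly 1 in floating point.

```python
import math


def checked_sd(values, probs, tol=1e-9):
    """Standard deviation of a discrete distribution, with input checks."""
    # Step 1: verify the distribution
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("each probability must lie between 0 and 1")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("probabilities must sum to 1")

    # Step 2: find the mean first
    mu = sum(x * p for x, p in zip(values, probs))

    # Step 3: square each distance from the mean
    squared_distances = [(x - mu) ** 2 for x in values]

    # Step 4: weight each squared distance by its probability
    variance = sum(d * p for d, p in zip(squared_distances, probs))

    # Step 5: take the square root last
    return math.sqrt(variance)


sd_defects = checked_sd([0, 1, 2, 3], [0.15, 0.35, 0.30, 0.20])
print(round(sd_defects, 3))
```

Calling `checked_sd([0, 1], [0.5, 0.6])` raises a `ValueError`, because those probabilities sum to 1.1.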

Interpretation tip

A standard deviation near 0 means most of the probability mass is tightly concentrated near the mean. A larger standard deviation means more probability sits farther away, either through wider spread or heavier tails.

Shortcut Method

Many textbook problems are faster with the computational identity:

Variance shortcut

σ² = E(X²) - [E(X)]² = Σ(xᵢ²pᵢ) - μ²

In the example above, E(X²) = 3.35 and μ² = 1.55² = 2.4025. So σ² = 3.35 - 2.4025 = 0.9475, which matches the long method exactly.

When the shortcut helps most

Use E(X²) - μ² when the table already includes an x²p(x) column or when you are doing the calculation by hand under time pressure.
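As a quick numerical check that the shortcut and the definition agree, the sketch below computes the variance both ways for the defect-count example. The variable names are arbitrary.

```python
import math

# Outcomes and probabilities from the worked example
xs = [0, 1, 2, 3]
ps = [0.15, 0.35, 0.30, 0.20]

mu = sum(x * p for x, p in zip(xs, ps))        # E(X)
e_x2 = sum(x**2 * p for x, p in zip(xs, ps))   # E(X²)

# Shortcut: E(X²) - [E(X)]²
shortcut_variance = e_x2 - mu**2

# Definition: probability-weighted squared deviations from the mean
direct_variance = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

print(shortcut_variance, direct_variance)
print(math.isclose(shortcut_variance, direct_variance))
```

The two results match up to floating-point rounding. One caveat when moving from hand calculation to software: if the mean is very large relative to the spread, subtracting two nearly equal numbers can lose precision, so the direct formula is often the safer choice in code.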

Sample vs Distribution Standard Deviation

Students often mix up a sample standard deviation with the standard deviation of a probability distribution. They answer different questions. A sample standard deviation summarizes observed data and usually divides the sum of squared deviations by n - 1. A distribution standard deviation summarizes the theoretical model itself and weights each squared deviation by its probability, with no n - 1 adjustment.

Use sample SD when

You already observed a dataset and want to estimate variability from those measurements. See Sample vs Population for that workflow.

Use distribution SD when

You are given outcomes with probabilities and want the exact spread implied by the model, such as a binomial, geometric, or custom discrete distribution.

That difference also explains why a probability-distribution question usually does not use Bessel's correction. The probabilities already define the whole distribution, so there is no sample-estimation adjustment.
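The contrast can be seen numerically with Python's standard `statistics` module. The sample below is hypothetical: 20 invented runs whose relative frequencies happen to match the worked example's probabilities exactly, chosen only so the two calculations are directly comparable.

```python
import math
import statistics

# Hypothetical sample: 3 zeros, 7 ones, 6 twos, 4 threes (20 runs in total)
sample = [0] * 3 + [1] * 7 + [2] * 6 + [3] * 4

sample_sd = statistics.stdev(sample)       # divides by n - 1 (Bessel's correction)
population_sd = statistics.pstdev(sample)  # divides by n

# Distribution SD from the probability table in the worked example
xs = [0, 1, 2, 3]
ps = [0.15, 0.35, 0.30, 0.20]
mu = sum(x * p for x, p in zip(xs, ps))
model_sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(xs, ps)))

print(f"sample SD (n - 1): {sample_sd:.4f}")
print(f"population SD (n): {population_sd:.4f}")
print(f"distribution SD:   {model_sd:.4f}")
```

Because the relative frequencies equal the probabilities, the divide-by-n result coincides with the distribution standard deviation, while the n - 1 version comes out slightly larger.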

Practical Patterns

Scenario	What the standard deviation tells you
Defects per batch	How volatile the defect count is around the expected number of defects
Game payoff distribution	How risky or unpredictable the payoff is relative to its average
Demand outcomes in inventory planning	How much actual demand may swing around expected demand
Number of claims or failures	How stable or unstable the event count is over repeated periods

If the distribution is approximately bell-shaped after aggregation, Understanding Normal Distribution and Z-Score Explained help you translate standard deviation into probability statements and unusually high or low outcomes.

Quick Bernoulli example

If X equals 1 for a machine failure and 0 for no failure, with P(failure) = 0.08, then μ = 0.08, σ² = 0.08(0.92) = 0.0736, and σ ≈ 0.271. Even a binary event has a meaningful standard deviation because the result varies from run to run.
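The Bernoulli result is a special case of the general formula, which a short sketch can confirm. The function name `bernoulli_sd` is an illustrative assumption; the failure probability 0.08 comes from the example above.

```python
import math


def bernoulli_sd(p):
    """Standard deviation of a 0/1 variable with P(X = 1) = p."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    return math.sqrt(p * (1 - p))


p_fail = 0.08
closed_form = bernoulli_sd(p_fail)

# Same quantity from the general probability-weighted formula
xs = [0, 1]
ps = [1 - p_fail, p_fail]
mu = sum(x * p for x, p in zip(xs, ps))
general = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(xs, ps)))

print(round(closed_form, 3), round(general, 3))
```

Both routes should give about 0.271, matching the hand calculation.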

Problem-Solving Checklist

Check the inputs: Make sure the table gives **probabilities**, not frequencies. If you have counts, use the frequency-table workflow rather than the probability-distribution workflow.
Sum to one: Confirm that **Σp(x) = 1**. If not, the table is incomplete or the values need normalization.
Compute the mean first: Do not jump straight to squared deviations. Every variance term depends on **μ**.
Use the shortcut when convenient: If **Σx²p(x)** is easy to build, use **E(X²) - μ²** to reduce arithmetic mistakes.
Interpret the answer in context: A standard deviation is large or small only relative to the outcome scale, the mean, and the real decision you are making.
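When the "sum to one" check fails only because the table lists relative weights rather than probabilities, normalization is a single division. The sketch below assumes exactly that situation; the helper name `normalize` and the weights are hypothetical, and rescaling is not a valid fix when an outcome is missing from the table.

```python
import math


def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive total")
    return [w / total for w in weights]


# Hypothetical relative weights for the outcomes 0, 1, 2
outcomes = [0, 1, 2]
weights = [2, 5, 3]
probs = normalize(weights)

# With valid probabilities in hand, the usual formulas apply
mu = sum(x * p for x, p in zip(outcomes, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(outcomes, probs)))
print(probs, mu, sigma)
```

Here the weights become probabilities of 0.2, 0.5, and 0.3, which give a mean of 1.1 and a variance of 0.49, so the standard deviation is 0.7 up to rounding.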

Once you see probabilities as weights, the formula becomes much easier to remember: find the mean, measure squared distance from that mean, weight by probability, and take the square root. The mathematics is the same idea as ordinary standard deviation, but the input is a model instead of a sample.

