SDCalc

Statistics Glossary

Definitions of key terms in statistics, probability, and data analysis, grouped by category.

Measures of Central Tendency

Mean (μ / x̄)

The arithmetic average of a set of values, calculated by summing all values and dividing by the count. It represents the central tendency of the data and is the most commonly used measure of center.

Median

The middle value in a sorted data set. If there is an even number of values, the median is the average of the two middle values. It is resistant to outliers and is preferred over the mean for skewed distributions.

Mode

The value that appears most frequently in a data set. A data set can have one mode (unimodal), multiple modes (multimodal), or no mode at all. The mode is the only measure of center applicable to nominal categorical data.
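
As a minimal sketch, the mean, median, and mode described above can be computed with Python's standard statistics module. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of the values divided by the count: 30 / 6
median = statistics.median(data)    # even count, so the average of the middle values 3 and 5
modes = statistics.multimode(data)  # list of every value tied for most frequent

print(mean, median, modes)
```

multimode returns a list, so it also covers multimodal data sets, where more than one value ties for the highest frequency.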

Weighted Mean

An average in which each data point is multiplied by a weight reflecting its relative importance, then summed and divided by the total weight. Used when observations contribute unequally to the result.
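
The calculation can be sketched as follows. The function name weighted_mean and the scores and weights are illustrative assumptions, not part of the glossary.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical course grades: homework, midterm, final exam.
scores = [80, 90, 70]
weights = [2, 3, 5]

result = weighted_mean(scores, weights)  # (160 + 270 + 350) / 10
print(result)
```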

Geometric Mean

The nth root of the product of n positive values. Used for averaging rates, ratios, and percentage changes; always less than or equal to the arithmetic mean for positive data.
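
A short sketch of why the geometric mean suits growth rates, using statistics.geometric_mean (Python 3.8 or later). The growth factors are hypothetical.

```python
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%.
factors = [1.10, 1.20, 0.90]

geo_mean = statistics.geometric_mean(factors)  # cube root of the product of the factors
arith_mean = statistics.fmean(factors)         # ordinary average, for comparison

# Compounding the geometric mean for three years reproduces the total growth.
total_growth = 1.10 * 1.20 * 0.90
print(geo_mean, arith_mean, geo_mean ** 3, total_growth)
```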

Expected Value

The long-run average value of a random variable, computed as the probability-weighted sum of all possible outcomes. Symbolized E(X) and equal to the population mean μ.
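
For a discrete random variable the probability-weighted sum can be written out directly. The sketch below uses a fair six-sided die as an assumed example and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die (illustrative assumption).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# E(X) = sum of x * P(X = x) over all outcomes.
expected_value = sum(face * prob for face, prob in pmf.items())
print(expected_value)
```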

Measures of Variability

Variance (σ² / s²)

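The average of the squared differences from the mean, dividing by N for the population variance σ² or by n − 1 for the sample variance s². Variance quantifies the degree of spread in a data set and is the square of the standard deviation. It is convenient in mathematical operations but harder to interpret than the standard deviation because it is expressed in squared units.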
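
The sketch below contrasts the two common divisors: N for a population variance (σ²) and n − 1 for a sample variance (s²). The data values are hypothetical.

```python
import statistics

# Hypothetical data: the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(data)  # divides by N:     32 / 8
sample_variance = statistics.variance(data)       # divides by n - 1: 32 / 7
population_sd = statistics.pstdev(data)           # square root of the population variance

print(population_variance, sample_variance, population_sd)
```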

Interquartile Range (IQR)

The difference between the 75th percentile (Q3) and the 25th percentile (Q1). The IQR measures the spread of the middle 50% of data and is resistant to outliers, making it well suited to skewed distributions.

Related terms: Quartile, Percentile, Outlier, Range
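
A minimal sketch of the IQR calculation using statistics.quantiles. Several quartile conventions exist and they can give slightly different answers; the "inclusive" method here is one assumed choice, and the data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)
```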

Mean Absolute Deviation (MAD)

The average of the absolute differences between each data point and the mean. Less affected by extreme values than variance because deviations are not squared.
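
The definition translates directly into code. The helper name mean_absolute_deviation and the data are illustrative assumptions.

```python
import statistics

def mean_absolute_deviation(data):
    """Average absolute distance of each value from the mean."""
    center = statistics.fmean(data)
    return statistics.fmean(abs(x - center) for x in data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

mad = mean_absolute_deviation(data)  # absolute deviations sum to 12, divided by 8
print(mad)
```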

Pooled Standard Deviation

The square root of a weighted average of the variances from two or more groups, with each group weighted by its degrees of freedom (n − 1). It assumes equal population variances and is used in two-sample t-tests and effect size calculations.
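
A sketch of the usual two-group formula, s_p = sqrt(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)), which pools the group variances weighted by their degrees of freedom. The function name and the two groups are hypothetical.

```python
import math
import statistics

def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two groups, assuming equal population variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return math.sqrt(pooled_variance)

group_a = [1, 2, 3, 4, 5]   # sample variance 2.5
group_b = [2, 4, 6, 8, 10]  # sample variance 10

result = pooled_sd(group_a, group_b)  # sqrt((4 * 2.5 + 4 * 10) / 8)
print(result)
```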

Probability Distributions

Standard Normal Distribution

A normal distribution with mean 0 and standard deviation 1, denoted N(0, 1). Any normal variable can be standardized to this distribution using a z-score transformation.
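
The standardization z = (x − μ) / σ can be sketched with statistics.NormalDist. The population mean, standard deviation, and observed value are assumed for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical population mean and standard deviation
x = 130.0                # hypothetical observation

z = (x - mu) / sigma                        # z-score: distance from the mean in SD units
prob_below = NormalDist().cdf(z)            # P(Z <= z) under N(0, 1)
prob_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardizing

print(z, prob_below, prob_direct)
```

Both probabilities agree, which is why a single standard normal table can serve any normal variable.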

t-Distribution

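A symmetric, bell-shaped distribution similar to the normal but with heavier tails. Its shape depends on the degrees of freedom and approaches the normal distribution as they increase. Used for inference about the mean when the population standard deviation is unknown, particularly when the sample size is small.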

Uniform Distribution

A distribution in which every outcome in a given range is equally likely. May be discrete (e.g., a fair die) or continuous (e.g., a random number between 0 and 1).

Binomial Distribution

The discrete probability distribution of the number of successes in n independent yes/no trials, each with the same success probability p. Parameters: n and p.
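
The probability of exactly k successes is P(X = k) = C(n, k) p^k (1 − p)^(n − k). A minimal sketch, with an assumed example of 10 fair coin flips:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical: 10 flips of a fair coin

prob_five = binomial_pmf(5, n, p)                          # C(10, 5) / 2**10
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))   # probabilities sum to 1

print(prob_five, total)
```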

Poisson Distribution

A discrete distribution expressing the probability of a given number of events occurring in a fixed interval of time or space, given a constant mean rate λ.
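
The probability of exactly k events is P(X = k) = λ^k e^(−λ) / k!. A short sketch, with a hypothetical mean rate of 2 events per interval:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.0  # hypothetical mean rate per interval

prob_zero = poisson_pmf(0, lam)                       # equals e**(-lam)
total = sum(poisson_pmf(k, lam) for k in range(30))   # tail beyond k = 29 is negligible here

print(prob_zero, total)
```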

Probability Distribution

A mathematical function that gives the probabilities of occurrence of different possible outcomes. Can be discrete (PMF) or continuous (PDF).

Central Limit Theorem

States that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, whatever the shape of the population's distribution, provided the observations are independent and the population has finite variance. Foundational to statistical inference.
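
A small simulation can illustrate the theorem. The sketch below draws repeated samples from a strongly skewed exponential population (mean 1, standard deviation 1) and looks at the resulting sample means; the sample size, replicate count, and seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 50
replicates = 2000

# One sample mean per replicate, each from an exponential population with rate 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(sample_means)  # should sit near the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # should sit near 1 / sqrt(50), about 0.14

print(mean_of_means, sd_of_means)
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is far from normal.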

Bernoulli Trial

A single experiment with exactly two possible outcomes, success or failure, with a fixed probability p of success. The building block of the binomial distribution.

Hypothesis Testing

Alternative Hypothesis (H₁ / Hₐ)

The statement that contradicts the null hypothesis, typically representing the effect or difference the researcher hopes to demonstrate. Can be one-sided or two-sided.

Significance Level (α)

The threshold against which the p-value is compared: the null hypothesis is rejected when the p-value falls below α, commonly set at 0.05. Equal to the probability of a Type I error (rejecting a null hypothesis that is true).

Statistical Power

The probability that a hypothesis test correctly rejects a false null hypothesis (1 − β, where β is the probability of a Type II error). Power increases with larger samples, larger effect sizes, and higher α.

Confidence Interval

A range of values, computed from sample data, likely to contain the true population parameter with a specified level of confidence (e.g., 95%). Wider intervals indicate less precision.

Confidence Level

The long-run proportion of confidence intervals (built from repeated samples) that contain the true parameter. Common values: 90%, 95%, 99%.

Margin of Error

The half-width of a confidence interval, equal to the critical value times the standard error. It bounds the expected sampling error at the stated confidence level.
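
A sketch of the calculation for a sample mean, assuming a sample large enough that a normal (z) critical value is reasonable; for small samples a t critical value would be used instead. The summary statistics are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics.
sample_mean = 50.0
sample_sd = 10.0
n = 100
confidence = 0.95

critical_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
standard_error = sample_sd / math.sqrt(n)
margin_of_error = critical_value * standard_error

# The confidence interval is the estimate plus or minus the margin of error.
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(margin_of_error, interval)
```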

Effect Size

A quantitative measure of the magnitude of a phenomenon, independent of sample size. Common measures include Cohen's d, Pearson's r, and odds ratios.

Correlation & Regression

Correlation Coefficient (r)

A value between −1 and 1 that measures the strength and direction of the linear relationship between two variables. Values near ±1 indicate a strong linear relationship; 0 indicates no linear association.
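
A minimal sketch of the Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² Σ(y − ȳ)²). The function name and data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfectly increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfectly decreasing line

print(r_up, r_down)
```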

Spearman Correlation

A non-parametric correlation based on the ranks of the data rather than raw values. Measures the strength and direction of a monotonic relationship, linear or not, and is less sensitive to outliers than the Pearson correlation.
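
One common way to compute it is to rank both variables (giving tied values their average rank) and apply the Pearson formula to the ranks. The sketch below follows that approach; the helper names and the data are assumptions for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

rho = spearman_rho(x, y)  # ranks agree perfectly
r = pearson_r(x, y)       # below 1 because the relationship is curved
print(rho, r)
```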

Covariance

A measure of the joint variability of two random variables. Positive when the variables tend to move together, negative when they move in opposite directions; dividing it by the product of the two standard deviations gives the Pearson correlation.
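
A sketch of the sample covariance (dividing by n − 1) and of how dividing by both standard deviations yields the Pearson correlation. The function name and data are hypothetical.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance: sum of co-deviations divided by n - 1."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov_xy = sample_covariance(x, y)  # co-deviations sum to 20, divided by 4

# Standardizing by both sample standard deviations gives the correlation.
r = cov_xy / (statistics.stdev(x) * statistics.stdev(y))
print(cov_xy, r)
```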

Regression

A statistical method for modeling the relationship between a dependent variable and one or more independent variables. Linear regression fits the straight line that minimizes the sum of squared residuals (ordinary least squares).
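
For a single predictor, a least-squares line can be sketched with the closed-form estimates slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄. The function name fit_line and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]  # hypothetical points lying exactly on y = 2x + 1

slope, intercept = fit_line(x, y)
predicted = slope * 5 + intercept  # prediction at x = 5
print(slope, intercept, predicted)
```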

Sampling & Estimation

Population

The complete set of all individuals or observations of interest in a study. Population parameters are typically denoted with Greek letters (μ, σ).

Related terms: Sample, Parameter, Sampling

Sample

A subset of a population selected for analysis. Sample statistics are typically denoted with Latin letters (x̄, s) and are used to estimate population parameters.

Related terms: Population, Sampling, Statistic

Sampling

The process of selecting a subset of individuals from a population for study. Methods include simple random, stratified, cluster, and systematic sampling.

Related terms: Sample, Population, Bias

Sampling Distribution

The probability distribution of a statistic over all possible samples of the same size drawn from a population. For sufficiently large samples, the sampling distribution of the mean is approximately normal by the Central Limit Theorem (CLT).

Bias

A systematic error that causes the expected value of an estimator to differ from the true parameter. Examples include selection bias, response bias, and measurement bias.

Sample Size (n)

The number of observations in a sample. Larger samples generally yield more precise estimates and greater statistical power but cost more to collect.

Parameter

A numerical value that summarizes a characteristic of an entire population, such as μ or σ. Usually unknown and estimated from sample statistics.

Statistic

A numerical value computed from a sample, such as x̄ or s. Used to estimate population parameters.

Related terms: Parameter, Sample, Estimator

Estimator

A rule or formula for calculating an estimate of a parameter from sample data. Good estimators are unbiased, consistent, and efficient.

Related terms: Parameter, Statistic, Bias

Data Properties & Diagnostics

Outlier

A data point that differs markedly from the other observations. Common detection rules flag values more than 2 or 3 standard deviations from the mean, or values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.
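
The 1.5×IQR rule can be sketched as follows. The quartile convention ("inclusive") is one assumed choice among several, and the data are made up so that one value is clearly extreme.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical values

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Flag anything that falls outside the two fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```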

Skewness

A measure of the asymmetry of a probability distribution. Positive skew means the tail extends to the right; negative skew means it extends to the left; zero skew indicates symmetry.

Kurtosis

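A measure of the tailedness of a probability distribution. High kurtosis (leptokurtic) indicates heavy tails and a greater propensity for extreme values; low kurtosis (platykurtic) indicates light tails. The normal distribution (mesokurtic) is the usual reference point, and kurtosis describes the tails rather than the height or sharpness of the peak.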
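
Skewness and kurtosis can both be sketched from central moments. The version below uses the simple population-moment formulas, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2² − 3 (so a normal distribution scores 0); statistical packages often apply small-sample corrections that give slightly different numbers. Function names and data are assumptions.

```python
def central_moment(data, order):
    """Average of (x - mean) ** order over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]           # evenly spread, no tail on either side
right_skewed = [1, 1, 2, 2, 10]       # one large value stretches the right tail
heavy_tailed = [0] * 8 + [-10, 10]    # mostly central values with two extremes

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric), excess_kurtosis(heavy_tailed))
```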

Percentile

A value below which a given percentage of observations fall. For example, the 90th percentile is the value below which 90% of the data points are found.
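
Several percentile conventions are in use. The sketch below implements the nearest-rank method, which returns the smallest data value with at least p% of the observations at or below it; the function name and the scores are illustrative assumptions.

```python
import math

def percentile_nearest_rank(data, p):
    """Nearest-rank percentile for 0 <= p <= 100."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position in sorted order
    return ordered[rank - 1]

scores = list(range(1, 11))  # hypothetical scores 1 through 10

p90 = percentile_nearest_rank(scores, 90)
p50 = percentile_nearest_rank(scores, 50)
print(p90, p50)
```

Interpolating methods, such as those in statistics.quantiles, can return values that lie between data points instead.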

Quartile

The three values that divide a sorted data set into four equal parts: Q1 (25th percentile), Q2 (the median, 50th percentile), and Q3 (75th percentile). Used to compute the IQR and construct box plots.

Frequency Distribution

A summary showing how often each value (or range of values) occurs in a data set. Often visualized with histograms or bar charts.

Robust Statistics

Statistical methods that perform well even when assumptions are violated or when outliers are present. Examples include the median, the median absolute deviation (also abbreviated MAD), and the trimmed mean.