How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

Cohen's d and effect size calculation | Standard Deviation Calculator

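Beyond statistical significance: understanding effect size

Effect size measures the magnitude of a difference or relationship, independent of sample size. A p-value tells you whether an effect is statistically significant; an effect size tells you how much the effect matters in practice. The distinction is essential for evidence-based decisions in research, medicine, education, and business.

Suppose a clinical trial shows that a new drug produces a statistically significant improvement over placebo (p < 0.001). Without an effect size, you cannot tell whether that improvement is 0.1% or 50%. The effect size supplies this critical context and helps stakeholders judge whether the effect is worth the cost, the side effects, or the implementation effort.

The most common effect size measure for comparing two groups is Cohen's d, which expresses the difference between means in standard deviation units. This standardization makes comparisons across studies and measurement scales possible.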

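Why effect size matters

Statistical significance depends heavily on sample size. With a large enough sample, even a trivial difference can be "significant". Conversely, an important effect may fail to reach significance in a small sample. Effect size addresses this by providing a measure that does not grow or shrink with sample size.

The significance trap

A study with n = 10,000 might report p < 0.001 for a 0.5-point difference on a 100-point scale. That is statistically significant but practically meaningless (d ≈ 0.05). Always report effect sizes alongside p-values.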
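The significance trap can be reproduced from summary statistics alone. The sketch below is illustrative and rests on assumptions the article does not state: 10,000 participants in each group, a standard deviation of 10 in both groups (which is what d ≈ 0.05 implies for a 0.5-point difference), and arbitrary group means of 70.5 and 70.0.

```python
from scipy import stats

# Assumed illustrative values (not from a real study)
n_per_group = 10_000          # participants in EACH group
sd = 10.0                     # same standard deviation in both groups
mean_a, mean_b = 70.5, 70.0   # a 0.5-point difference on a 100-point scale

# Independent-samples t-test computed from summary statistics
result = stats.ttest_ind_from_stats(
    mean_a, sd, n_per_group,
    mean_b, sd, n_per_group,
    equal_var=True,
)

# With equal SDs the pooled SD equals sd, so d is just the difference over sd
d = (mean_a - mean_b) / sd

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.5f}, d = {d:.2f}")
```

Under these assumptions the p-value falls below 0.001 while d stays at 0.05, which is the gap between "significant" and "meaningful" that the paragraph above describes.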

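Key reasons to use effect sizes:

Meta-analysis: effect sizes from different studies can be combined to estimate an overall effect
Power analysis: essential when calculating the sample size a future study needs
Practical decisions: helps judge whether an intervention is worth implementing
Replicability: gives replication studies a target to match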

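Cohen's d: the standard effect size measure

Cohen's d expresses the difference between two group means in units of the pooled standard deviation:

Cohen's d

d = (M₁ - M₂) / sp

where M₁ and M₂ are the group means and sp is the pooled standard deviation, computed as:

Pooled standard deviation

sp = √[((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)]

Here n₁ and n₂ are the group sizes and s₁ and s₂ are the sample standard deviations. The sign of d indicates direction: positive when M₁ > M₂, negative when M₁ < M₂. When the direction is already clear from context, the absolute value |d| is usually reported.

Why pool the standard deviations?

Pooling assumes that the two groups have equal population variances. The pooled estimate is more stable than either group's standard deviation alone, and it matches the assumption of the independent-samples t-test.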

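Other effect size measures

Cohen's d is the most widely used, but there are alternatives for specific situations:

Hedges' g: a bias-corrected effect size

In small samples, Cohen's d slightly overestimates the population effect size. Hedges' g applies a correction factor:

Hedges' g correction

g = d × (1 - 3/(4(n₁+n₂) - 9))

Once each group has more than 20 observations, the difference between the two is negligible. For small samples (n < 20), Hedges' g is recommended.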
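To see how quickly the correction fades, the factor from the formula above can be tabulated for a few group sizes. This is a minimal sketch; the helper name `hedges_correction` and the chosen sample sizes are illustrative assumptions.

```python
def hedges_correction(n1, n2):
    """Approximate small-sample correction factor applied to Cohen's d."""
    return 1 - 3 / (4 * (n1 + n2) - 9)

# Equal group sizes chosen purely for illustration
for n in (5, 10, 20, 50):
    factor = hedges_correction(n, n)
    shrink_pct = (1 - factor) * 100
    print(f"n per group = {n:>2}: factor = {factor:.4f} (d shrinks by {shrink_pct:.1f}%)")
```

The factor approaches 1 as the groups grow, which is why the choice between d and g mainly matters for small studies.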

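Glass's Δ: when variances are unequal

When one of the groups is a control group with known variance, use only the control group's standard deviation as the denominator:

Glass's Delta

Δ = (M₁ - M₂) / s_control

This approach is especially useful when the treatment may itself affect the variance (for example, an intervention that helps low-achieving students more than high-achieving ones).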
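A minimal sketch of Glass's Δ follows, assuming M₁ is the treatment mean and M₂ the control mean. The function name `glass_delta` and the score lists are hypothetical, made up so that the treatment group is visibly more spread out than the control group.

```python
import numpy as np

def glass_delta(treatment, control):
    """Mean difference divided by the control group's sample standard deviation."""
    treatment = np.asarray(treatment, dtype=float)
    control = np.asarray(control, dtype=float)
    # ddof=1 gives the sample (n-1) standard deviation of the control group only
    return (treatment.mean() - control.mean()) / control.std(ddof=1)

# Hypothetical scores: the treatment group has a much wider spread
control_scores = [70, 72, 68, 71, 69, 73, 70, 71]
treatment_scores = [74, 85, 66, 90, 72, 81, 69, 88]

print(f"Glass's delta: {glass_delta(treatment_scores, control_scores):.3f}")
```

Because only the control group's spread enters the denominator, a treatment that inflates variance does not dilute the standardized difference the way a pooled standard deviation would.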

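Interpreting effect sizes: Cohen's benchmarks

Jacob Cohen proposed benchmarks for small (0.2), medium (0.5), and large (0.8) values of d. The table below also lists two larger tiers that later authors added to his scale: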

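Effect size (d)	Interpretation	Overlap
0.2	Small	85% overlap between groups
0.5	Medium	67% overlap between groups
0.8	Large	53% overlap between groups
1.2	Very large	40% overlap between groups
2.0	Huge	19% overlap between groups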

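Context matters

These are rough guides, not absolute standards. In some fields d = 0.2 can be highly meaningful (for example, reducing heart attack risk), while in others d = 0.8 may simply be expected (for example, tutoring versus no tutoring).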
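For quick reporting, the benchmark table can be turned into a lookup. The sketch below makes two assumptions that the table itself does not: each benchmark is treated as a lower cut-off, and values below 0.2 are labeled "negligible". Given the caveat above, the label is a starting point for discussion, not a verdict.

```python
def interpret_d(d):
    """Map |d| to the nearest benchmark label at or below it (assumed cut-offs)."""
    magnitude = abs(d)
    benchmarks = [
        (2.0, "huge"),
        (1.2, "very large"),
        (0.8, "large"),
        (0.5, "medium"),
        (0.2, "small"),
    ]
    for cutoff, label in benchmarks:
        if magnitude >= cutoff:
            return label
    return "negligible"  # assumed label for values below the smallest benchmark

for value in (0.1, 0.53, -0.9, 2.7):
    print(f"d = {value:+.2f}: {interpret_d(value)}")
```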

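Worked example: an educational intervention

A school tests a new reading program. Control group (n = 25): mean = 72, standard deviation = 12. Treatment group (n = 30): mean = 79, standard deviation = 14. Calculate Cohen's d:

Calculate the pooled variance

sp² = [(25-1)(12)² + (30-1)(14)²] / (25+30-2) = [24×144 + 29×196] / 53 = [3456 + 5684] / 53 = 172.45

Calculate the pooled standard deviation

sp = √172.45 = 13.13

Calculate Cohen's d

d = (79 - 72) / 13.13 = 7 / 13.13 = 0.53

Interpretation

A medium effect size (d = 0.53). The treatment group scored about half a standard deviation higher than the control group.

This means that if you pick one student at random from each group, the probability that the treatment student scores higher is about 65% (the common-language effect size, Φ(d/√2) ≈ 0.65, which assumes normally distributed scores with equal variances).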
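The arithmetic in this worked example can be checked from the summary statistics alone. The sketch below assumes normally distributed scores with equal variances when converting d into a probability through Φ(d/√2); the function name `cohens_d_from_summary` is illustrative.

```python
import math
from scipy import stats

def cohens_d_from_summary(m1, s1, n1, m2, s2, n2):
    """Cohen's d from group means, sample SDs, and group sizes."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# Treatment group first, control group second (values from the worked example)
d = cohens_d_from_summary(79, 14, 30, 72, 12, 25)

# Probability that a random treatment score exceeds a random control score,
# under the normal, equal-variance assumption
prob_superiority = stats.norm.cdf(d / math.sqrt(2))

print(f"d = {d:.3f}")
print(f"P(treatment > control) = {prob_superiority:.3f}")
```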

Python implementation

Computing the effect sizes programmatically:

python

import numpy as np
from scipy import stats

def cohens_d(group1, group2):
    """Calculate Cohen's d for two independent groups."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std
    return d

def hedges_g(group1, group2):
    """Calculate Hedges' g (bias-corrected effect size)."""
    n1, n2 = len(group1), len(group2)
    d = cohens_d(group1, group2)

    # Correction factor for small sample bias
    correction = 1 - 3 / (4*(n1+n2) - 9)
    return d * correction

# Example usage
control = [68, 72, 75, 70, 69, 74, 71, 73, 76, 72]
treatment = [75, 79, 82, 78, 80, 77, 81, 76, 83, 79]

d = cohens_d(treatment, control)
g = hedges_g(treatment, control)
print(f"Cohen's d: {d:.3f}")
print(f"Hedges' g: {g:.3f}")
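A confidence interval for d can be approximated as well. The sketch below is not part of the functions above: it uses a common large-sample normal approximation to the standard error of d, which is an assumption here and can be inaccurate for small groups, where intervals based on the noncentral t distribution are usually preferred. The function name `cohens_d_ci` is illustrative.

```python
import numpy as np
from scipy import stats

def cohens_d_ci(d, n1, n2, confidence=0.95):
    """Approximate confidence interval for Cohen's d (normal approximation)."""
    # Large-sample approximation to the standard error of d
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    # Two-sided critical value from the standard normal distribution
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return d - z * se, d + z * se

# Reading-program example from this article: d = 0.53, n1 = 30, n2 = 25
low, high = cohens_d_ci(0.53, 30, 25)
print(f"95% CI for d: [{low:.2f}, {high:.2f}]")
```

Reporting the interval next to the point estimate shows how uncertain a "medium" effect can still be with groups of this size.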

Reading goal	What to focus on	Common mistake
Definition	What the metric is and what quantity it summarizes	Treating the formula as self-explanatory
Formula choice	Sample versus population assumptions and notation	Using n when n-1 is required or vice versa
Interpretation	Whether the result indicates concentration, spread, or risk	Calling a large value good or bad without context
