Advanced · 14 min

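Weighted Standard Deviation

Learn how to calculate the weighted standard deviation when data points differ in importance or frequency.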

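What Is Weighted Standard Deviation?

When data points differ in importance or represent different frequencies, we use the weighted standard deviation. It is common in portfolio analysis, survey data with sampling weights, and GPA calculations.

In the standard (unweighted) calculation, every data point contributes equally to the mean and the standard deviation. In practice, some observations often need more influence. A 1,000,000 yuan investment should affect portfolio volatility more than a 1,000 yuan position. Survey responses that represent a larger share of the population should receive more weight when estimating population parameters.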

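When to Use Weighted Standard Deviation

Use the weighted standard deviation when data points differ in importance, frequency, or reliability. The unweighted standard deviation assumes every data point matters equally, an assumption that often does not hold.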

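Weighted Standard Deviation Formula

First, compute the weighted mean:

Weighted Mean

x̄w = Σ(wᵢxᵢ) / Σwᵢ

Then compute the weighted standard deviation (population version):

Weighted Standard Deviation (Population)

σw = √[Σwᵢ(xᵢ - x̄w)² / Σwᵢ]

where wᵢ are the weights, xᵢ are the data values, and x̄w is the weighted mean. The weights must be non-negative and sum to a positive number.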
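The two formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `weighted_mean` and `weighted_std_population` and the sample scores are assumptions made for this example, not part of any particular library.

```python
import math

def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

def weighted_std_population(values, weights):
    """Population form: sqrt(sum(w_i * (x_i - mean)^2) / sum(w_i))."""
    mean = weighted_mean(values, weights)
    weighted_sq_dev = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return math.sqrt(weighted_sq_dev / sum(weights))

# Illustrative data (hypothetical): three scores, the middle one counted twice as much.
scores = [70.0, 80.0, 90.0]
score_weights = [1.0, 2.0, 1.0]

print(weighted_mean(scores, score_weights))
print(weighted_std_population(scores, score_weights))
```

Because the middle score carries double weight and sits exactly at the mean, the weighted spread comes out smaller than the unweighted spread of the same three scores.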

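For sample data, use the bias-corrected formula (analogous to Bessel's correction):

Weighted Standard Deviation (Sample)

sw = √[Σwᵢ(xᵢ - x̄w)² / (Σwᵢ - Σ(wᵢ²)/Σwᵢ)]

The sample correction is more involved because the "effective sample size" depends on how the weights are distributed. When all weights are equal, the formula reduces to the familiar n-1 correction. This form suits weights that express relative importance or reliability; when the weights are integer frequency counts, the usual denominator is Σwᵢ - 1 instead.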
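As a rough check of the sample formula, the sketch below implements the corrected denominator and confirms that equal weights reproduce the ordinary n-1 result from Python's `statistics.stdev`. The function names and data are illustrative assumptions. The helper `effective_sample_size` uses the common definition (Σwᵢ)² / Σ(wᵢ²), which the article does not spell out, so treat it as an added assumption.

```python
import math
import statistics

def effective_sample_size(weights):
    """Assumed definition: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def weighted_std_sample(values, weights):
    """Sample form with denominator sum(w) - sum(w^2) / sum(w)."""
    v1 = sum(weights)                   # sum of the weights
    v2 = sum(w * w for w in weights)    # sum of the squared weights
    mean = sum(w * x for x, w in zip(values, weights)) / v1
    weighted_sq_dev = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    denominator = v1 - v2 / v1
    if denominator <= 0:
        raise ValueError("need at least two observations with nonzero weight")
    return math.sqrt(weighted_sq_dev / denominator)

data = [4.0, 7.0, 9.0, 12.0]            # hypothetical observations
equal_weights = [2.5] * len(data)       # any constant weight works here

# With equal weights the result should agree with the ordinary n - 1 formula.
matches_unweighted = math.isclose(
    weighted_std_sample(data, equal_weights), statistics.stdev(data)
)
print(matches_unweighted)
print(effective_sample_size(equal_weights))
```

With unequal weights the effective sample size drops below the number of observations, which is why the correction grows larger as the weights become more lopsided.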

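Step-by-Step Calculation

1. Compute the weighted mean
Multiply each value by its weight, sum the products, and divide by the sum of the weights.

2. Compute the weighted squared deviations
For each value, compute (value - weighted mean)², then multiply by the corresponding weight.

3. Sum the weighted squared deviations
Add up all the products from step 2.

4. Divide by the sum of the weights
For the population standard deviation, divide by Σwᵢ. For the sample standard deviation, divide by the bias-corrected denominator Σwᵢ - Σ(wᵢ²)/Σwᵢ instead.

5. Take the square root
The result is the weighted standard deviation.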
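The five steps can be traced one at a time in code. The sketch below uses hypothetical frequency weights (each value paired with how many times it occurs) and then checks the population result against `statistics.pstdev` on the expanded list, since a frequency-weighted calculation should agree with simply repeating each value.

```python
import math
import statistics

values = [2.0, 4.0, 6.0]   # hypothetical distinct values
counts = [3, 1, 2]         # hypothetical frequency weights

# Step 1: weighted mean
total_weight = sum(counts)
mean = sum(c * x for x, c in zip(values, counts)) / total_weight

# Step 2: weighted squared deviation for each value
weighted_sq = [c * (x - mean) ** 2 for x, c in zip(values, counts)]

# Step 3: sum of the weighted squared deviations
sum_sq = sum(weighted_sq)

# Step 4: divide by the sum of the weights (population version)
variance = sum_sq / total_weight

# Step 5: square root
sd = math.sqrt(variance)

# Cross-check: repeat each value according to its count and use the plain formula.
expanded = [x for x, c in zip(values, counts) for _ in range(c)]
print(math.isclose(sd, statistics.pstdev(expanded)))
```

Keeping each step in its own variable makes it easy to inspect intermediate values when a hand calculation and a computed result disagree.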

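Practical Applications

Portfolio volatility: In finance, measuring the spread of a portfolio's returns must account for how much is allocated to each asset. For a portfolio of 50% stocks and 50% bonds, the weights are the allocation percentages. Note that the weighted standard deviation of asset returns measures their allocation-weighted dispersion; a full portfolio volatility figure also depends on the correlation between the assets.

Survey analysis: Survey samples often over-represent or under-represent certain demographic groups. Weighting adjusts for this so that results reflect the actual population. The weighted standard deviation then estimates the variability of the population rather than only that of the sample.

Academic grades: When computing a GPA, courses carry different credit values. A 4-credit course should affect the GPA more than a 1-credit course. The weighted calculation handles this directly.

Meta-analysis: When combining results from multiple studies, each study is weighted by its precision (usually the inverse of its variance). Larger, more precise studies therefore have more influence.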
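To make the inverse-variance idea concrete, here is a small sketch of fixed-effect pooling, one common way to apply precision weights. The effect estimates and variances are invented for illustration, and the fixed-effect model itself is an assumption; the article only says that studies are weighted by precision.

```python
import math

# Hypothetical effect estimates and their variances from three studies.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.01, 0.02]

# Inverse-variance weights: more precise studies (smaller variance) weigh more.
iv_weights = [1.0 / v for v in variances]

# The pooled estimate is the weighted mean of the study effects.
pooled = sum(w * e for e, w in zip(effects, iv_weights)) / sum(iv_weights)

# Under the fixed-effect assumption, the pooled standard error is sqrt(1 / sum of weights).
pooled_se = math.sqrt(1.0 / sum(iv_weights))

# Share of the total weight carried by each study.
weight_shares = [w / sum(iv_weights) for w in iv_weights]

print(round(pooled, 4), round(pooled_se, 4))
print([round(s, 3) for s in weight_shares])
```

The second study has the smallest variance, so it carries the largest share of the weight and pulls the pooled estimate toward its own value.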

Worked Example

Portfolio example: Consider a portfolio of three stocks:

  • Stock A: 15% return, 50% allocation (weight = 0.50)
  • Stock B: 8% return, 30% allocation (weight = 0.30)
  • Stock C: -2% return, 20% allocation (weight = 0.20)

Weighted mean = (0.50×15 + 0.30×8 + 0.20×(-2)) / 1.0 = 9.5%

Weighted standard deviation = √[(0.50×(15-9.5)² + 0.30×(8-9.5)² + 0.20×(-2-9.5)²) / 1.0]
= √[0.50×30.25 + 0.30×2.25 + 0.20×132.25]
= √[15.125 + 0.675 + 26.45]
= √42.25
= 6.5%

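Note the Impact

Stock C makes up only 20% of the allocation, yet it contributes about 63% of the weighted variance (26.45 of 42.25) because its return lies far from the weighted mean. This is what the weighted standard deviation captures: both the weight and the size of the deviation matter, and the deviation enters squared.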
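The worked example can be reproduced with a short script that also reports how much of the weighted variance each stock contributes. The dictionary layout and variable names are illustrative choices for this sketch; the returns and allocations are the ones from the example.

```python
import math

returns = {"A": 15.0, "B": 8.0, "C": -2.0}        # returns in percent
allocation = {"A": 0.50, "B": 0.30, "C": 0.20}    # portfolio weights

total_weight = sum(allocation.values())
mean = sum(allocation[k] * returns[k] for k in returns) / total_weight

# Weighted squared deviation contributed by each stock.
terms = {k: allocation[k] * (returns[k] - mean) ** 2 for k in returns}
variance = sum(terms.values()) / total_weight
sd = math.sqrt(variance)

# Fraction of the weighted variance attributable to each stock.
shares = {k: terms[k] / sum(terms.values()) for k in terms}

print(round(mean, 4), round(sd, 4))
for name, share in shares.items():
    print(name, round(share, 3))
```

Stock B, despite its 30% allocation, accounts for under 2% of the weighted variance because its return sits close to the weighted mean, while Stock C accounts for well over half.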

How to Read This Article

A statistics tutorial is a practical interpretation guide, not just a formula dump. It refers to the assumptions, notation, and reporting language that analysts need when they explain a result to a teacher, manager, client, or reviewer. The article body covers the specific topic, while the sections below create a common interpretation frame that readers can reuse across related metrics.

Reading goal | What to focus on | Common mistake
Definition | What the metric is and what quantity it summarizes | Treating the formula as self-explanatory
Formula choice | Sample versus population assumptions and notation | Using n when n-1 is required or vice versa
Interpretation | Whether the result indicates concentration, spread, or risk | Calling a large value good or bad without context

Frequently Asked Questions

How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

Authoritative References

These sources define the concepts referenced most often across our articles. Bessel's correction is a sample adjustment, variance is a squared measure of spread, and standard deviation is the square root of variance expressed in the same units as the data.