How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.

What is a statistical interpretation guide?

A statistical interpretation guide is a page that moves beyond arithmetic and explains meaning. It tells you what a metric is, when the formula applies, and how to describe the result in plain English without overstating certainty.

Can I cite this article in a report?

You should cite the underlying authoritative reference for formal work whenever possible. This page is best used as an explanatory bridge that helps you understand the concept before quoting the original standard or handbook.

Why include direct citations on every article page?

Direct citations give readers a route to verify the definition, notation, and assumptions. That improves trust and reduces the chance that a simplified explanation is mistaken for the entire technical standard.

拔靴法與標準差估計 | Standard Deviation Calculator

拔靴法：電腦時代的統計革命

拔靴重抽法是一種強大的統計技術，透過反覆從觀測資料中重新抽樣來估計任何統計量的抽樣分配。由布拉德利·艾弗隆在 1979 年提出，它徹底改變了統計推論，使得在不依賴數學公式或分配假設的情況下，也能分析複雜的統計量。

拔靴法背後的核心洞察極為簡潔：你的樣本就是對母體最好的估計。透過從樣本中重新取樣（放回抽樣），你模擬了如果能反覆從母體抽樣會發生什麼。這個方法對標準差特別有價值，因為傳統的信賴區間公式假設常態性——這個假設在實務中經常不成立。

拔靴法已成為現代資料科學的基本工具，因為它適用於任何統計量（中位數、相關係數、迴歸係數、神經網路權重），而且不對資料的底層分配做任何假設。

為什麼用拔靴法來估計標準差？

傳統的標準差信賴區間假設你的資料來自常態分配。當這個假設不成立時（很常見），這些區間可能嚴重失準。拔靴法提供了一個不需分配假設的替代方案。

傳統方法何時失效

基於卡方分配的標準差信賴區間假設常態性。對於偏態資料（收入、反應時間、存活資料），這可能產生 20-30% 時間都未能涵蓋真實參數的區間，而非預期的 5%。

拔靴法用於標準差的關鍵優勢：

不需分配假設： 對常態、偏態或重尾資料同樣有效
小樣本表現： 在 n < 30 時通常比參數方法更準確
處理複雜統計量： 同樣的方法適用於修剪標準差、MAD 或自訂變異量度
視覺化洞察： 拔靴分配讓你看到實際發生了什麼，而不只是最終數字

拔靴法程序

拔靴演算法非常簡單明瞭。從你的原始樣本（n 個觀測值）開始：

抽取拔靴樣本

從原始資料中以放回方式隨機抽取 n 個觀測值。有些值會出現多次，有些完全不出現。

計算統計量

計算這個拔靴樣本的標準差。這是一個拔靴複製值。

重複多次

重複步驟 1-2 數千次（通常 B = 10,000）。每次重複得到一個拔靴標準差。

分析分配

這 B 個拔靴標準差的集合近似於抽樣分配。用它來建構信賴區間和進行假設檢定。

為什麼要放回抽樣？

放回抽樣至關重要。它創造出組成不同的樣本，模擬你從母體中多次抽樣時會看到的變異性。不放回的話，每個樣本都會和原始樣本完全相同。

需要多少個拔靴樣本？ B = 1,000 通常足以做粗略估計和假設檢定。對於信賴區間，B = 10,000 能提供穩定的百分位數。對於發表品質的 BCa 區間，建議 B = 15,000 以上。

拔靴信賴區間方法

從拔靴樣本建構信賴區間有多種方法，各有其權衡：

1. 百分位法（最簡單）

最直觀的方法：直接取拔靴分配的百分位數。

百分位信賴區間

95% CI = [θ*₂.₅, θ*₉₇.₅]

對於 10,000 個拔靴樣本，這就是排序後的第 250 和第 9,750 個值。簡單但當拔靴分配偏態時可能有偏差。

2. 基本（樞軸）拔靴法

利用樣本統計量與拔靴統計量之間的關係：

基本拔靴信賴區間

95% CI = [2θ̂ - θ*₉₇.₅, 2θ̂ - θ*₂.₅]

其中 θ̂ 是原始樣本標準差。這會將百分位區間“反射”到樣本估計值周圍。

3. BCa 法（偏差校正加速法）

精確度的黃金標準。BCa 同時調整拔靴分配中的偏差和加速度（標準誤差如何隨參數值變化）。計算較複雜但提供二階準確的區間。

方法	優點	缺點
百分位法	簡單直觀	偏態資料時可能有偏差
基本法	對稱區間	可能產生負值
BCa 法	最準確，尊重轉換	計算量大

計算範例：非常態資料

考慮 15 個反應時間的測量值（毫秒）：245, 312, 287, 456, 234, 298, 267, 523, 289, 301, 278, 645, 256, 289, 312。這組資料呈右偏（有些反應特別慢）。

計算樣本標準差

原始樣本：n=15，SD = 109.8 ms

產生拔靴樣本

以放回方式抽取 10,000 個大小為 15 的樣本。每個樣本的組成都不同。

計算拔靴標準差

計算每個拔靴樣本的標準差，得到 10,000 個值，範圍大約從 ~60 到 ~180

找出百分位數

第 2.5 百分位數：72.3 ms，第 97.5 百分位數：156.8 ms

建構 95% 信賴區間

95% CI：[72.3, 156.8] ms。相比之下，卡方信賴區間：[79.4, 175.2]（假設常態）。

拔靴信賴區間是不對稱的（高端較寬），反映了資料右偏的特性。卡方信賴區間則無法捕捉這種不對稱性。

Python 實作

完整的拔靴法實作，支援多種信賴區間方法：

python

import numpy as np
from scipy import stats

def bootstrap_sd_ci(data, n_bootstrap=10000, ci=0.95, method='percentile'):
    """
    Bootstrap confidence interval for standard deviation.

    Parameters:
    -----------
    data : array-like - Original sample
    n_bootstrap : int - Number of bootstrap samples
    ci : float - Confidence level (e.g., 0.95)
    method : str - 'percentile', 'basic', or 'bca'

    Returns:
    --------
    tuple : (lower_bound, upper_bound, bootstrap_sds)
    """
    data = np.array(data)
    n = len(data)
    original_sd = np.std(data, ddof=1)

    # Generate bootstrap samples and calculate SDs
    bootstrap_sds = np.array([
        np.std(np.random.choice(data, size=n, replace=True), ddof=1)
        for _ in range(n_bootstrap)
    ])

    alpha = 1 - ci

    if method == 'percentile':
        lower = np.percentile(bootstrap_sds, 100 * alpha/2)
        upper = np.percentile(bootstrap_sds, 100 * (1 - alpha/2))

    elif method == 'basic':
        lower = 2*original_sd - np.percentile(bootstrap_sds, 100*(1-alpha/2))
        upper = 2*original_sd - np.percentile(bootstrap_sds, 100*alpha/2)

    elif method == 'bca':
        # Bias correction
        prop_less = np.mean(bootstrap_sds < original_sd)
        z0 = stats.norm.ppf(prop_less)

        # Acceleration (jackknife estimate)
        jackknife_sds = np.array([
            np.std(np.delete(data, i), ddof=1) for i in range(n)
        ])
        jack_mean = jackknife_sds.mean()
        a = np.sum((jack_mean - jackknife_sds)**3) / \
            (6 * np.sum((jack_mean - jackknife_sds)**2)**1.5)

        # Adjusted percentiles
        z_alpha = stats.norm.ppf([alpha/2, 1-alpha/2])
        adj_percentiles = stats.norm.cdf(
            z0 + (z0 + z_alpha) / (1 - a*(z0 + z_alpha))
        ) * 100
        lower = np.percentile(bootstrap_sds, adj_percentiles[0])
        upper = np.percentile(bootstrap_sds, adj_percentiles[1])

    return lower, upper, bootstrap_sds

# Example usage
response_times = [245, 312, 287, 456, 234, 298, 267, 523, 289, 301, 278, 645, 256, 289, 312]

for method in ['percentile', 'basic', 'bca']:
    lower, upper, _ = bootstrap_sd_ci(response_times, method=method)
    print(f"{method.upper():12s} 95% CI: [{lower:.1f}, {upper:.1f}]")

Reading goal	What to focus on	Common mistake
Definition	What the metric is and what quantity it summarizes	Treating the formula as self-explanatory
Formula choice	Sample versus population assumptions and notation	Using n when n-1 is required or vice versa
Interpretation	Whether the result indicates concentration, spread, or risk	Calling a large value good or bad without context

拔靴法與標準差估計