Geometric Standard Deviation: A Complete Guide

When to Use the Geometric Standard Deviation

The geometric standard deviation (GSD) is a measure of spread suited to data that are multiplicative rather than additive: growth rates, ratios, concentrations, or measurements that follow a log-normal distribution.

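Consider stock returns: a 10% gain followed by a 10% loss does not bring you back to the original amount (you end up with 99% of it). Multiplicative relationships like this call for geometric rather than arithmetic statistics.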
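The arithmetic behind the stock example can be checked in a few lines. This is a minimal sketch; the starting amount of 100 is an arbitrary illustrative value.

```python
# A +10% move followed by a -10% move compounds multiplicatively
start = 100.0
after_up = start * 1.10        # +10%
after_down = after_up * 0.90   # -10%
print(round(after_down, 2))    # 99.0, i.e. 99% of the starting amount

# The arithmetic mean of the two returns suggests "no change"...
arithmetic_mean_return = (0.10 + (-0.10)) / 2
# ...but the geometric mean growth factor per period reveals a small loss
geometric_mean_factor = (1.10 * 0.90) ** 0.5
print(arithmetic_mean_return)           # 0.0
print(round(geometric_mean_factor, 4))  # 0.995
```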

Key Insight

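If your data span several orders of magnitude, are always positive, and look right-skewed on an ordinary plot but symmetric on a log scale, you are dealing with log-normal data, and geometric statistics are the appropriate choice.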

Understanding Log-Normal Data

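Data are log-normally distributed when their natural logarithms follow a normal distribution. Typical examples include:

Stock prices and investment returns over time
Income and wealth distributions
Particle sizes of aerosols and pharmaceuticals
Bacterial colony counts and viral loads
Environmental pollutant concentrations
Antibody titers and drug concentrations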
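The definition can be checked by simulation. This sketch assumes NumPy's `Generator.lognormal`, whose `mean` and `sigma` arguments are the mean and SD of ln(X), not of X itself; the values μ = ln 100 and σ = ln 2 are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative parameters of ln(X) (assumed, not taken from any dataset)
mu, sigma = np.log(100.0), np.log(2.0)
rng = np.random.default_rng(seed=0)
x = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

# By definition, ln(X) is normally distributed with mean mu and SD sigma
log_x = np.log(x)
print(f"mean of ln(x): {log_x.mean():.3f} (target {mu:.3f})")
print(f"SD of ln(x):   {log_x.std(ddof=1):.3f} (target {sigma:.3f})")

# Back-transforming the log-scale mean and SD gives the GM and GSD
print(f"GM  ~ {np.exp(log_x.mean()):.1f}")
print(f"GSD ~ {np.exp(log_x.std(ddof=1)):.2f}")
```

Because exp(mean of ln x) is the geometric mean and exp(SD of ln x) is the GSD, the printed values should land near 100 and 2.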

Key property: processes that involve repeated multiplication produce log-normal distributions, in the same way that repeated addition produces normal distributions.
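A small simulation illustrates the multiplicative mechanism. The process here (50 random growth factors drawn uniformly between 0.8 and 1.25) is an assumption chosen for illustration, not a model of any particular dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n_samples, n_steps = 10_000, 50

# Each sample is the product of 50 random growth factors (illustrative process)
factors = rng.uniform(0.8, 1.25, size=(n_samples, n_steps))
products = factors.prod(axis=1)

# ln(product) = sum of ln(factors): on the log scale the process is additive
log_products = np.log(products)

print(f"skewness of products:     {stats.skew(products):.2f}")      # strongly right-skewed
print(f"skewness of ln(products): {stats.skew(log_products):.2f}")  # close to 0
```

Because the log of a product is a sum of logs, the central limit theorem applies on the log scale, which is why the logged products come out close to symmetric.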

Formulas and Calculation

Geometric Standard Deviation

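GSD = exp(√[Σ(ln xᵢ - ln x̄ₘ)² / (n-1)])

where x̄ₘ is the geometric mean of the data, so ln x̄ₘ equals the arithmetic mean of the logged values.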

Put more simply: take the natural log of every value, compute an ordinary standard deviation, then apply the exponential function.

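Step 1: Transform the data

Compute the natural log of each value: yᵢ = ln(xᵢ)

Step 2: Compute the mean

Take the arithmetic mean of the logged values: ȳ = Σyᵢ/n

Step 3: Compute the SD

Take the standard deviation of the logged values: s = √[Σ(yᵢ-ȳ)²/(n-1)]

Step 4: Back-transform

Exponentiate to obtain the GSD: GSD = eˢ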
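The four steps can also be written out with only the Python standard library, which keeps each term of the formula visible. This is a sketch; `gsd_by_steps` is a hypothetical helper name, and it assumes every value is strictly positive.

```python
import math
import statistics

def gsd_by_steps(data):
    """Geometric SD following the four steps, using the sample (n-1) SD."""
    # Step 1: y_i = ln(x_i); math.log raises ValueError for values <= 0
    y = [math.log(x) for x in data]
    # Step 2: arithmetic mean of the logged values
    y_bar = statistics.fmean(y)
    # Step 3: s = sqrt(sum((y_i - y_bar)^2) / (n - 1))
    n = len(y)
    s = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Step 4: back-transform, GSD = e^s
    return math.exp(s)

titers = [64, 128, 256, 128, 512, 64, 256]
print(f"{gsd_by_steps(titers):.2f}")  # 2.16
```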

Python

import numpy as np
from scipy import stats

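def geometric_sd(data):
    """Calculate geometric standard deviation (sample SD of ln values, n-1)"""
    log_data = np.log(data)            # requires all values > 0
    sd_log = np.std(log_data, ddof=1)  # ddof=1 divides by n-1
    return np.exp(sd_log)              # back-transform to a multiplicative factor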

def geometric_mean(data):
    """Calculate geometric mean"""
    return stats.gmean(data)

# Example: Antibody titers (highly variable, log-normal)
titers = [64, 128, 256, 128, 512, 64, 256]
gm = geometric_mean(titers)
gsd = geometric_sd(titers)
print(f"Geometric Mean: {gm:.1f}")  # Geometric Mean: 156.0
print(f"Geometric SD: {gsd:.2f}")   # Geometric SD: 2.16

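Interpreting GSD Values

Unlike the arithmetic SD, which is expressed in the same units as the data, the GSD is a unitless multiplicative factor (a ratio). A GSD of 2.0 means values typically fall within a factor of 2 of the geometric mean, that is, between GM ÷ 2 and GM × 2.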

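GSD = 1.0: no variation (all values identical; essentially never seen in practice)
GSD ≈ 1.2: low variability (values typically within ×/÷ 1.2 of the GM, roughly ±20%)
GSD ≈ 2.0: moderate variability (values typically range from half to double the GM)
GSD ≈ 3.0: high variability (GM ÷ 3 to GM × 3, about a 9-fold range)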
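Because the GSD is a factor, it is applied by dividing and multiplying the geometric mean. The sketch below uses an assumed GM of 100 and hypothetical helper names to show the one-GSD band for the GSD values listed; for log-normal data this band covers roughly 68% of values.

```python
def one_gsd_band(gm, gsd):
    """Range from GM / GSD to GM * GSD (about 68% of log-normal values)."""
    return gm / gsd, gm * gsd

def band_fold_width(gsd):
    """Ratio of the band's upper to lower limit: (GM*GSD) / (GM/GSD) = GSD^2."""
    return gsd ** 2

# Assumed GM of 100, purely for illustration
for gsd in (1.2, 2.0, 3.0):
    low, high = one_gsd_band(100.0, gsd)
    print(f"GSD {gsd}: {low:.1f} to {high:.1f} ({band_fold_width(gsd):.2f}-fold)")
# last line printed: GSD 3.0: 33.3 to 300.0 (9.00-fold)
```

The ratio between the band's upper and lower limits is GSD², which is why a GSD of 3 corresponds to about a 9-fold spread.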

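95% Range of Values

For log-normal data, roughly 95% of values fall between GM ÷ GSD² and GM × GSD² (the exponent 2 is a rounded stand-in for the normal quantile 1.96). With GM = 100 and GSD = 2, the range is 25 to 400. This describes the spread of individual values; it is not a confidence interval for the geometric mean itself.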
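The rule of thumb uses 2 in place of the normal quantile z ≈ 1.96. The hypothetical helper below computes GM ÷ GSD^z to GM × GSD^z, taking the exact quantile from the standard library's `statistics.NormalDist`; it assumes the data really are log-normal.

```python
from statistics import NormalDist

def lognormal_95_range(gm, gsd, z=None):
    """Central ~95% range of log-normal values: GM / GSD^z to GM * GSD^z.

    z defaults to the exact normal quantile (about 1.96);
    z=2 reproduces the GM / GSD^2 to GM * GSD^2 shortcut.
    """
    if z is None:
        z = NormalDist().inv_cdf(0.975)  # about 1.959964
    factor = gsd ** z
    return gm / factor, gm * factor

print(lognormal_95_range(100.0, 2.0, z=2))  # (25.0, 400.0)
low, high = lognormal_95_range(100.0, 2.0)
print(f"{low:.0f} to {high:.0f}")           # 26 to 389
```

With GM = 100 and GSD = 2, the exact quantile narrows the range only slightly (about 25.7 to 389), so the GSD² shortcut is usually adequate for a quick summary.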

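Practical Applications

Pharmaceutical Sciences

Particle size distributions (D50, GSD) · Drug concentration variability · Bioavailability studies · Aerosol characterization

Finance & Economics

Investment return volatility · Growth rate analysis · Income distribution studies · Asset price modeling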

GSD vs. Ordinary SD

Using the arithmetic SD on log-normal data produces misleading results:

Example: Viral Load Data

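Values: 1,000; 5,000; 10,000; 50,000; 100,000 copies/mL
Arithmetic mean ± SD (sample SD, n-1): 33,200 ± 42,175
Geometric mean ×/÷ GSD: 12,011 ×/÷ 6.28 → range: about 1,900 to 75,400
The arithmetic SD implies that negative values are plausible (33,200 − 42,175 < 0), which is impossible for viral load!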
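Both summaries for these five values can be recomputed with NumPy and SciPy. This sketch uses the sample (n-1) SD on both scales, matching `geometric_sd` earlier in the article.

```python
import numpy as np
from scipy import stats

# Viral load values from the example (copies/mL)
viral_load = np.array([1_000, 5_000, 10_000, 50_000, 100_000], dtype=float)

# Arithmetic summary (sample SD, n-1)
mean = viral_load.mean()
sd = viral_load.std(ddof=1)

# Geometric summary: GM and GSD from the natural logs (sample SD, n-1)
gm = stats.gmean(viral_load)
gsd = np.exp(np.log(viral_load).std(ddof=1))

print(f"arithmetic: {mean:,.0f} ± {sd:,.0f}; mean - SD = {mean - sd:,.0f}")
print(f"geometric:  {gm:,.0f} ×/÷ {gsd:.2f}; range {gm / gsd:,.0f} to {gm * gsd:,.0f}")
```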

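Always Check the Distribution

Visualize your data before computing a measure of spread. If you see right skew with a long tail, try a log transform. If the transformed data look symmetric, use geometric statistics.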
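A numerical companion to the visual check is to compare sample skewness before and after the log transform. This is a sketch using `scipy.stats.skew`; `skew_raw_vs_log` is a hypothetical helper, and with only five values the skewness estimates are very noisy, so treat the result as a hint rather than a formal test.

```python
import numpy as np
from scipy import stats

def skew_raw_vs_log(data):
    """Return (skewness of the raw data, skewness of ln(data)).

    Right skew in the raw data that largely disappears after the log
    transform suggests, but does not prove, a log-normal shape.
    """
    data = np.asarray(data, dtype=float)
    if np.any(data <= 0):
        raise ValueError("log transform requires strictly positive values")
    return stats.skew(data), stats.skew(np.log(data))

# Viral load values from the example (copies/mL)
raw_skew, log_skew = skew_raw_vs_log([1_000, 5_000, 10_000, 50_000, 100_000])
print(f"raw: {raw_skew:.2f}, log: {log_skew:.2f}")
```

For the viral load values, the raw skewness is clearly positive (about 0.87), while the skewness of the logs is close to zero (about −0.16).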

A statistics tutorial is a practical interpretation guide, not just a formula dump. It covers the assumptions, notation, and reporting language that analysts need when they explain a result to a teacher, manager, client, or reviewer. The article body covers the specific topic, while the sections below provide a common interpretation frame that readers can reuse across related metrics.

Reading goal	What to focus on	Common mistake
Definition	What the metric is and what quantity it summarizes	Treating the formula as self-explanatory
Formula choice	Sample versus population assumptions and notation	Using n when n-1 is required or vice versa
Interpretation	Whether the result indicates concentration, spread, or risk	Calling a large value good or bad without context

Frequently Asked Questions

How should I interpret a high standard deviation?

A high standard deviation means the observations are spread farther from the mean on average. Whether that spread is acceptable depends on the context: wide dispersion might signal risk in finance, instability in manufacturing, or genuine natural variation in scientific data.

Why do some articles mention n while others mention n-1?

The denominator reflects the difference between population and sample formulas. Population variance and population standard deviation use N because the full dataset is known. Sample variance and sample standard deviation often use n-1 because Bessel’s correction reduces bias when estimating population spread from a sample.
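The same distinction applies to the GSD, since it is built on an SD of logged values. This sketch reuses the antibody titer example with the standard library's `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by n-1).

```python
import math
import statistics

# Natural logs of the antibody titer example
logs = [math.log(t) for t in [64, 128, 256, 128, 512, 64, 256]]
n = len(logs)

pop_sd = statistics.pstdev(logs)    # divides by N
sample_sd = statistics.stdev(logs)  # divides by n - 1 (Bessel's correction)

# The two SDs differ by the factor sqrt(n / (n - 1))
print(f"population GSD: {math.exp(pop_sd):.2f}")     # 2.04
print(f"sample GSD:     {math.exp(sample_sd):.2f}")  # 2.16
```

The `geometric_sd` function earlier in the article passes `ddof=1` to `np.std`, which corresponds to the n-1 version.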

