Uncertainty Quantification

Meta Information

Item	Details
Category	Bayesian Methods / Model Calibration / Trustworthy AI
Key papers	"Weight Uncertainty in Neural Networks" (ICML 2015), "Dropout as a Bayesian Approximation" (ICML 2016), "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles" (NeurIPS 2017)
Key authors	Yarin Gal, Zoubin Ghahramani (MC Dropout); Balaji Lakshminarayanan et al. (Deep Ensembles); Charles Blundell et al. (Bayes by Backprop)
Core idea	Quantitatively estimate how confident a model's predictions are, so that decisions based on them can be trusted
Related fields	Bayesian Deep Learning, Calibration, OOD Detection, Safety-Critical AI

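Definition

Uncertainty Quantification (UQ) is a family of techniques for quantitatively estimating how certain a model's prediction is. Instead of a single point estimate, it provides a distribution over predictions, which supports reliable decisions in high-stakes domains such as medical diagnosis, autonomous driving, and finance.

Point estimate vs. uncertainty estimate:

Point estimate:        P(cat) = 0.92  --> "It is a cat"
Uncertainty estimate:  P(cat) = 0.92 +/- 0.15  --> "Probably a cat, but not certain"

Actual output distributions:

  High confidence:        Low confidence:
  |    ##                 |  ####
  |   ####                | ######
  |  ######               |########
  | ########              |########
  +-----------> y         +-----------> y
      mu                       mu
  (small variance)        (large variance)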

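Types of Uncertainty

Aleatoric Uncertainty (data uncertainty)

Irreducible uncertainty that stems from noise or inherent randomness in the data itself.

Aleatoric Uncertainty:

  Sources:
  - Sensor noise (measurement error)
  - Label ambiguity (samples near class boundaries)
  - Inherent stochasticity (dice, weather)

  Properties:
  - Does not shrink as more data is collected
  - May vary with the input (heteroscedastic)

  Example: a blurry image
  +--------+
  | ?????? |  --> P(dog) = 0.4, P(cat) = 0.35, P(bear) = 0.25
  | ?????? |      The data itself is ambiguous --> high aleatoric uncertainty
  +--------+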
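
One rough way to put a number on the ambiguity in the blurry-image example is the Shannon entropy of the class probabilities. The sketch below assumes natural-log entropy (nats); the `confident` vector is a made-up comparison case, not taken from the example. Entropy of a single prediction cannot by itself tell aleatoric from epistemic uncertainty, so this only illustrates how "spread out" the prediction is.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

blurry = [0.40, 0.35, 0.25]      # probabilities from the blurry-image example
confident = [0.98, 0.01, 0.01]   # hypothetical sharp prediction for comparison

print(f"blurry:    {entropy(blurry):.3f} nats")
print(f"confident: {entropy(confident):.3f} nats")
print(f"maximum for 3 classes: {math.log(3):.3f} nats")
```

The blurry prediction sits close to the maximum possible entropy for three classes, while the sharp prediction is far below it.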

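Epistemic Uncertainty (model uncertainty)

Reducible uncertainty caused by insufficient training data or the limits of the model's knowledge.

Epistemic Uncertainty:

  Sources:
  - Insufficient training data (no data in some region of the input space)
  - Limited model capacity
  - Out-of-Distribution (OOD) inputs

  Properties:
  - Can be reduced with more data
  - Grows outside the training data distribution

  Example: a region not covered by the training data

  Training distribution:   At prediction time:
  . . . . . . .            . . . . . . .
  . . . . . . .            . . . . . . .   <- high confidence here
  . . . . . . .            . . . . . . .
                    ?      <- no data here --> high epistemic uncertainty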

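Why the Distinction Matters

Why separating the two kinds of uncertainty matters:

  Total uncertainty = Aleatoric + Epistemic

  High aleatoric:
  -> Improve data quality (better sensors, clearer labeling)
  -> Collecting more data of the same kind will not help

  High epistemic:
  -> Collect more data in the affected region
  -> Increase model capacity
  -> Use Active Learning to label efficiently

  Decision-making:
  Only aleatoric high --> Predict cautiously (keep a safety margin)
  Only epistemic high --> Report "I don't know" (defer to a human)
  Both high           --> Abstain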
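
The decision rules above can be sketched as a small routing function. This is only an illustration: the thresholds `alea_thr` and `epi_thr` and the action labels are hypothetical and would have to be tuned for each application and each uncertainty measure.

```python
def triage(aleatoric: float, epistemic: float,
           alea_thr: float = 0.5, epi_thr: float = 0.5) -> str:
    """Map an (aleatoric, epistemic) pair to one of the actions listed above.

    The thresholds are illustrative defaults, not recommended values.
    """
    high_alea = aleatoric > alea_thr
    high_epi = epistemic > epi_thr
    if high_alea and high_epi:
        return "abstain"
    if high_epi:
        return "defer_to_human"
    if high_alea:
        return "predict_with_safety_margin"
    return "predict"

for a, e in [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]:
    print(f"aleatoric={a}, epistemic={e} -> {triage(a, e)}")
```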

Main Methods

1. Bayesian Neural Networks (BNN)

Weights are modeled as probability distributions rather than single values.

Standard neural network:
  w = fixed value (e.g., w = 1.3)
  y = f(x; w)  -- a single prediction

Bayesian neural network:
  w ~ q(w)  (e.g., w ~ N(1.3, 0.2))
  p(y|x) = integral p(y|x, w) * q(w) dw  -- predictive distribution

  posterior: p(w|D) proportional to p(D|w) * p(w)

  Problem: the posterior p(w|D) is intractable to compute exactly
  -> Approximate Inference is required
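
In practice the predictive integral above is itself intractable and is usually estimated by Monte Carlo sampling. The following is a sketch of that standard approximation, where T (the number of weight samples) and w_t are notation introduced here for illustration:

```latex
p(y \mid x, D) = \int p(y \mid x, w)\, p(w \mid D)\, dw
\;\approx\; \int p(y \mid x, w)\, q(w)\, dw
\;\approx\; \frac{1}{T} \sum_{t=1}^{T} p(y \mid x, w_t),
\qquad w_t \sim q(w)
```

The first approximation replaces the true posterior with the approximate one; the second replaces the integral with an average over T sampled weight vectors.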

Bayes by Backprop

Blundell et al. (ICML 2015). Approximates the posterior over weights with Variational Inference.

Bayes by Backprop:

  Each weight w_i is parameterized as N(mu_i, sigma_i^2)

  Learnable parameters: mu, rho  (sigma = log(1 + exp(rho)))

  Loss function (negative ELBO):
  L = KL[q(w|theta) || p(w)] - E_{q(w|theta)}[log p(D|w)]
    = Complexity Cost  -  Data Fit

  Reparameterization Trick:
  w = mu + sigma * epsilon,  epsilon ~ N(0, 1)
  -> sampling becomes differentiable

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Variational Bayesian linear layer (factorized Gaussian posterior)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

        # Variational parameters: mean and rho, with sigma = softplus(rho)
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_rho = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_rho = nn.Parameter(torch.empty(out_features))

        # Prior: N(prior_mu, prior_sigma^2) on every weight and bias
        self.prior_mu = 0.0
        self.prior_sigma = 1.0

        self.reset_parameters()

    def reset_parameters(self):
        nn.init.kaiming_uniform_(self.weight_mu, a=math.sqrt(5))
        nn.init.constant_(self.weight_rho, -3)  # sigma = softplus(-3) ~ 0.05
        nn.init.uniform_(self.bias_mu, -0.1, 0.1)
        nn.init.constant_(self.bias_rho, -3)

    def forward(self, x):
        # softplus(rho) = log(1 + exp(rho)), computed in a numerically stable way
        weight_sigma = F.softplus(self.weight_rho)
        bias_sigma = F.softplus(self.bias_rho)

        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, 1)
        weight = self.weight_mu + weight_sigma * torch.randn_like(weight_sigma)
        bias = self.bias_mu + bias_sigma * torch.randn_like(bias_sigma)

        return F.linear(x, weight, bias)

    def kl_divergence(self):
        """KL(q(w) || p(w)), summed over all weights and biases."""
        weight_sigma = F.softplus(self.weight_rho)
        bias_sigma = F.softplus(self.bias_rho)

        kl_weight = self._kl_normal(self.weight_mu, weight_sigma)
        kl_bias = self._kl_normal(self.bias_mu, bias_sigma)

        return kl_weight + kl_bias

    def _kl_normal(self, mu, sigma):
        # Closed-form KL between N(mu, sigma^2) and the Gaussian prior
        return 0.5 * torch.sum(
            (sigma / self.prior_sigma).pow(2)
            + ((self.prior_mu - mu) / self.prior_sigma).pow(2)
            - 1
            + 2 * math.log(self.prior_sigma) - 2 * torch.log(sigma)
        )
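
For reference, the expression inside `_kl_normal` is the standard closed-form KL divergence between two univariate Gaussians, applied element-wise and summed. Writing the prior parameters `prior_mu` and `prior_sigma` as \mu_p and \sigma_p (notation introduced here), the per-weight term is:

```latex
\mathrm{KL}\big(\mathcal{N}(\mu, \sigma^2)\,\|\,\mathcal{N}(\mu_p, \sigma_p^2)\big)
= \log\frac{\sigma_p}{\sigma} + \frac{\sigma^2 + (\mu - \mu_p)^2}{2\sigma_p^2} - \frac{1}{2}
= \frac{1}{2}\left[\frac{\sigma^2}{\sigma_p^2} + \frac{(\mu_p - \mu)^2}{\sigma_p^2} - 1
  + 2\log\sigma_p - 2\log\sigma\right]
```

The second form matches the code term by term.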

2. MC Dropout

Gal & Ghahramani (ICML 2016). Keeping Dropout active at inference time and sampling multiple forward passes yields an approximation to Bayesian inference.

How MC Dropout works:

Training:    Dropout ON  (as usual)
Inference:   Dropout ON  (the key idea -- normally OFF)

Run T forward passes:
  y_1 = f(x; W * mask_1)
  y_2 = f(x; W * mask_2)
  ...
  y_T = f(x; W * mask_T)

Predictive mean:     y_mean = (1/T) * sum(y_t)
Predictive variance: y_var  = (1/T) * sum((y_t - y_mean)^2)
                            ~ epistemic uncertainty

  Larger T gives more stable estimates (typically T=10~50)

import torch
import torch.nn as nn

class MCDropoutModel(nn.Module):
    def __init__(self, input_dim, hidden_dim=128, output_dim=10, drop_rate=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(drop_rate),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(drop_rate),
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        return self.net(x)

    def predict_with_uncertainty(self, x, n_samples=50):
        """Estimate uncertainty with MC Dropout."""
        self.train()  # keep Dropout active (this model has no BatchNorm layers)

        with torch.no_grad():  # inference only, no gradients needed
            predictions = torch.stack([
                torch.softmax(self.forward(x), dim=-1)
                for _ in range(n_samples)
            ])  # (n_samples, batch, n_classes)

        # Predictive mean over the stochastic forward passes
        mean_pred = predictions.mean(dim=0)

        # Epistemic uncertainty (model uncertainty):
        # variance of the class probabilities across passes
        epistemic = predictions.var(dim=0).sum(dim=-1)

        # Aleatoric uncertainty (data uncertainty):
        # expected entropy, i.e. the mean entropy of the individual passes.
        # (The entropy of mean_pred would be the total predictive entropy.)
        aleatoric = -(predictions * torch.log(predictions + 1e-10)).sum(dim=-1).mean(dim=0)

        return mean_pred, epistemic, aleatoric
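
Calling `self.train()` as in the code above switches every submodule to training mode, which is harmless for this model but would also make layers such as BatchNorm update their running statistics. A common workaround is to put the model in eval mode and re-enable only the dropout layers. The helper below is a minimal sketch of that idea; `enable_mc_dropout` is a name made up for this example, and the small demo network is a placeholder.

```python
import torch
import torch.nn as nn

# Dropout variants to re-enable; extend this tuple if the model uses others.
DROPOUT_TYPES = (nn.Dropout, nn.Dropout2d, nn.Dropout3d, nn.AlphaDropout)

def enable_mc_dropout(model: nn.Module) -> nn.Module:
    """Set the model to eval mode, then switch only dropout layers back to train mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, DROPOUT_TYPES):
            module.train()
    return model

# Placeholder network containing both BatchNorm and Dropout
model = nn.Sequential(
    nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 3)
)
enable_mc_dropout(model)

with torch.no_grad():
    samples = torch.stack(
        [torch.softmax(model(torch.ones(2, 4)), dim=-1) for _ in range(20)]
    )
print(samples.shape)  # torch.Size([20, 2, 3])
```

With this setup BatchNorm keeps using its stored statistics while dropout masks are still resampled on every pass.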

3. Deep Ensembles

Lakshminarayanan et al. (NeurIPS 2017). Train M models independently from different random initializations and combine their predictions.

Deep Ensembles:

  Model_1 (init_1, shuffle_1) --> y_1, sigma_1^2
  Model_2 (init_2, shuffle_2) --> y_2, sigma_2^2
  ...
  Model_M (init_M, shuffle_M) --> y_M, sigma_M^2

Regression:
  Each model outputs (mu, sigma^2) and is trained with the Gaussian NLL loss

  Ensemble prediction:
  mu* = (1/M) * sum(mu_m)
  sigma*^2 = (1/M) * sum(sigma_m^2 + mu_m^2) - mu*^2

  Variance decomposition:
  sigma*^2 = (1/M) * sum(sigma_m^2)    -- Aleatoric (mean of the variances)
           + (1/M) * sum((mu_m - mu*)^2) -- Epistemic (disagreement between predictions)
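
The two expressions for sigma*^2 above are algebraically identical, which can be checked numerically. The sketch below uses made-up per-model outputs for a single input; `ensemble_variance` is a hypothetical helper name.

```python
def ensemble_variance(mus, variances):
    """Return (mu*, total variance, aleatoric part, epistemic part) for one input."""
    M = len(mus)
    mu_star = sum(mus) / M
    # Mixture formula: (1/M) * sum(sigma_m^2 + mu_m^2) - mu*^2
    total = sum(v + m * m for m, v in zip(mus, variances)) / M - mu_star ** 2
    # Decomposition: mean variance + variance of the means (both with 1/M)
    aleatoric = sum(variances) / M
    epistemic = sum((m - mu_star) ** 2 for m in mus) / M
    return mu_star, total, aleatoric, epistemic

# Hypothetical outputs of M = 5 ensemble members
mus = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.03, 0.06, 0.02]

mu_star, total, aleatoric, epistemic = ensemble_variance(mus, variances)
print(f"mu* = {mu_star:.2f}, total = {total:.3f}, "
      f"aleatoric = {aleatoric:.3f}, epistemic = {epistemic:.3f}")
```

Note that the epistemic term divides by M, not M - 1, so an implementation that relies on a library variance function has to request the biased (population) estimator to match the formula exactly.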

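Advantages	Limitations
Simple to implement	M times the training/inference cost
Strong uncertainty estimates	M times the memory
Can be trained in parallel	No clear criterion for choosing M
More practical than BNNs	Diversity is hard to guarantee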

import torch
import torch.nn as nn

class GaussianNLLNet(nn.Module):
    """Network that outputs a Gaussian mean and log-variance (Deep Ensemble member)."""
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden_dim, 1)
        self.logvar_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        h = self.shared(x)
        mu = self.mu_head(h)
        logvar = self.logvar_head(h)  # log-variance keeps sigma^2 positive
        return mu, logvar

class DeepEnsemble:
    def __init__(self, input_dim, n_models=5, hidden_dim=128, lr=1e-3):
        # Each member gets its own random initialization
        self.models = [
            GaussianNLLNet(input_dim, hidden_dim) for _ in range(n_models)
        ]
        self.optimizers = [
            torch.optim.Adam(m.parameters(), lr=lr) for m in self.models
        ]

    def train_step(self, x, y):
        """One optimization step per member; y must have shape (batch, 1)."""
        losses = []
        for model, opt in zip(self.models, self.optimizers):
            model.train()  # predict() switches members to eval mode
            mu, logvar = model(x)
            # Gaussian NLL loss (up to an additive constant)
            loss = 0.5 * (logvar + (y - mu).pow(2) / logvar.exp()).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            losses.append(loss.item())
        return sum(losses) / len(losses)

    def predict(self, x):
        mus, vars_ = [], []
        for model in self.models:
            model.eval()
            with torch.no_grad():
                mu, logvar = model(x)
                mus.append(mu)
                vars_.append(logvar.exp())

        mus = torch.stack(mus)      # (M, batch, 1)
        vars_ = torch.stack(vars_)  # (M, batch, 1)

        # Ensemble prediction and variance decomposition
        mean_pred = mus.mean(dim=0)
        aleatoric = vars_.mean(dim=0)
        # unbiased=False divides by M, matching the (1/M) formula above
        epistemic = mus.var(dim=0, unbiased=False)
        total_var = aleatoric + epistemic

        return mean_pred, total_var, aleatoric, epistemic

4. Evidential Deep Learning

Sensoy et al. (NeurIPS 2018). Estimates uncertainty in a single forward pass by having the network output the parameters of a Dirichlet distribution directly.

Evidential Deep Learning:

Standard: softmax --> class probabilities p
Proposed: the network outputs the parameters alpha of a Dirichlet distribution

  f(x) --> alpha = (alpha_1, ..., alpha_K)  (K: number of classes)

  Dirichlet(p | alpha):
    alpha_k > 0, S = sum(alpha_k)

  Prediction:  E[p_k] = alpha_k / S

  Uncertainty:
    Vacuity (epistemic):  K / S  (higher when the total evidence is low)
    Dissonance:           degree of conflicting evidence between classes

  Key advantage: single forward pass, no ensemble needed
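
The prediction and vacuity formulas above can be sketched in a few lines. This example assumes the parameterization alpha_k = evidence_k + 1 with non-negative evidence, as used by Sensoy et al.; the function name `dirichlet_summary` and the evidence vectors are made up for illustration.

```python
def dirichlet_summary(evidence):
    """Expected class probabilities and vacuity for non-negative per-class evidence.

    Assumes alpha_k = evidence_k + 1, so zero evidence gives a uniform Dirichlet.
    """
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)                    # Dirichlet strength
    probs = [a / S for a in alpha]    # E[p_k] = alpha_k / S
    vacuity = K / S                   # 1.0 when there is no evidence at all
    return probs, vacuity

for evidence in ([0.0, 0.0, 0.0], [40.0, 1.0, 1.0]):
    probs, vacuity = dirichlet_summary(evidence)
    print(evidence, [round(p, 3) for p in probs], round(vacuity, 3))
```

With no evidence the prediction is uniform and the vacuity is at its maximum of 1; as evidence accumulates, S grows and the vacuity shrinks toward 0.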

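5. Calibration

Techniques that adjust a model's predicted probabilities so that they match its actual accuracy.

What calibration means:

Well-calibrated model:
  "80% confident" --> correct 80% of the time
  "95% confident" --> correct 95% of the time

Neural networks in practice (tend to be overconfident):
  "95% confident" --> correct only 70% of the time  (overconfident)

Reliability Diagram:

  Perfect calibration:  Overconfident model:
  1.0|      /           1.0|      /
     |    /                |    /
     |  /                  |  /---.   <- actual accuracy is lower
     |/                    |/      .     than the predicted probability
  0  +---------> 1     0  +---------> 1
   predicted prob.      predicted prob.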

Expected Calibration Error (ECE)

Computing ECE:

1. Partition the predictions into B bins by confidence
2. In each bin, compare the mean confidence with the mean accuracy

ECE = sum_{b=1}^{B} (|B_b| / n) * |acc(B_b) - conf(B_b)|

Example (B=5), shown as bin weight * |acc - conf|:
  Bin [0.0-0.2]: conf=0.10, acc=0.12  -> 0.15 * |0.12 - 0.10| = 0.0030
  Bin [0.2-0.4]: conf=0.30, acc=0.28  -> 0.20 * |0.28 - 0.30| = 0.0040
  Bin [0.4-0.6]: conf=0.50, acc=0.45  -> 0.25 * |0.45 - 0.50| = 0.0125
  Bin [0.6-0.8]: conf=0.72, acc=0.60  -> 0.20 * |0.60 - 0.72| = 0.0240
  Bin [0.8-1.0]: conf=0.92, acc=0.78  -> 0.20 * |0.78 - 0.92| = 0.0280
  ECE = 0.0715 ~ 0.07
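
The worked example above can be checked in a few lines. The (confidence, accuracy, weight) triples are copied from the five bins; the variable names are illustrative.

```python
# (mean confidence, mean accuracy, fraction of samples) for each of the 5 bins
bins = [
    (0.10, 0.12, 0.15),
    (0.30, 0.28, 0.20),
    (0.50, 0.45, 0.25),
    (0.72, 0.60, 0.20),
    (0.92, 0.78, 0.20),
]

# The bin weights must cover the whole dataset
assert abs(sum(w for _, _, w in bins) - 1.0) < 1e-9

ece = sum(w * abs(acc - conf) for conf, acc, w in bins)
print(f"ECE = {ece:.4f}")
```

Most of the error comes from the two highest-confidence bins, which is the typical signature of an overconfident model.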

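Temperature Scaling

Guo et al. (ICML 2017). One of the simplest post-hoc calibration methods, and often surprisingly effective.

Temperature Scaling:

  Original logits: z
  Calibrated probabilities: softmax(z / T)

  T > 1: flattens the distribution (reduces overconfidence)
  T < 1: sharpens the distribution (corrects underconfidence)
  T = 1: no change

  T is learned by minimizing NLL on a validation set
  -> only a single scalar parameter has to be learned
  -> dividing by T > 0 does not change the argmax, so accuracy is unaffected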
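
To make the effect of T concrete, the sketch below applies softmax(z / T) to one made-up logit vector for three temperatures. The function name and the logits are assumptions for this example only.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T, subtracting the max first for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]  # hypothetical logits for a 3-class problem
for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}: {[round(p, 3) for p in probs]}")
```

The top-class probability falls as T increases while the ranking of the classes stays the same.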

import torch
import torch.nn as nn
import numpy as np

class TemperatureScaling(nn.Module):
    def __init__(self):
        super().__init__()
        # Single learnable scalar, initialized above 1 (assumes overconfidence)
        self.temperature = nn.Parameter(torch.ones(1) * 1.5)

    def forward(self, logits):
        return logits / self.temperature

    def fit(self, logits, labels, lr=0.01, max_iter=50):
        """Learn the temperature on validation-set logits and labels."""
        nll_criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.LBFGS([self.temperature], lr=lr, max_iter=max_iter)

        # LBFGS re-evaluates the loss several times per step via this closure
        def closure():
            optimizer.zero_grad()
            loss = nll_criterion(self.forward(logits), labels)
            loss.backward()
            return loss

        optimizer.step(closure)
        return self.temperature.item()

def compute_ece(confidences, predictions, labels, n_bins=15):
    """Expected Calibration Error for 1-D torch tensors; returns a Python float."""
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    ece = 0.0

    for i in range(n_bins):
        lo, hi = float(bin_boundaries[i]), float(bin_boundaries[i + 1])
        mask = (confidences > lo) & (confidences <= hi)  # bin is (lo, hi]
        if mask.sum() == 0:
            continue

        bin_conf = confidences[mask].mean().item()
        bin_acc = (predictions[mask] == labels[mask]).float().mean().item()
        bin_size = mask.sum().item() / len(confidences)

        ece += bin_size * abs(bin_acc - bin_conf)

    return ece

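6. Relationship to Conformal Prediction

Conformal Prediction is a distribution-free approach to uncertainty quantification that provides frequentist coverage guarantees, assuming only that the data are exchangeable. It is used as a complement to the UQ methods above.

How the UQ methods relate:

  Bayesian UQ (BNN, MC Dropout)
    --> estimates the parameters of the predictive distribution
    --> requires distributional assumptions

  Deep Ensembles
    --> estimates uncertainty from the disagreement between models
    --> no distributional assumptions, but requires multiple models

  Evidential DL
    --> higher-order uncertainty from a single model
    --> training can be unstable

  Calibration
    --> adjusts existing predicted probabilities
    --> can be applied post-hoc

  Conformal Prediction
    --> coverage guarantee with finite samples
    --> independent of the model architecture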
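
As an illustration of the finite-sample coverage idea, here is a minimal sketch of split conformal prediction for classification. It assumes a held-out calibration set of predicted probabilities and true labels, uses 1 - p(true class) as the nonconformity score, and targets coverage 1 - alpha. The function names and the tiny calibration set are made up for this example.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold q_hat computed from a held-out calibration set."""
    # Nonconformity score: 1 - probability assigned to the true class
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # too few calibration points: keep every class
    return scores[k - 1]

def prediction_set(probs, q_hat):
    """All classes whose score 1 - p is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]

# Hypothetical calibration data: 4 samples, 3 classes
cal_probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.3, 0.1, 0.6], [0.7, 0.2, 0.1]]
cal_labels = [0, 1, 2, 0]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print("q_hat:", q_hat)
print("set for a confident input:", prediction_set([0.70, 0.25, 0.05], q_hat))
print("set with a looser threshold:", prediction_set([0.50, 0.45, 0.05], 0.6))
```

The size of the returned set acts as the uncertainty signal: ambiguous inputs get larger sets. With a tight threshold the set can also be empty, which some variants handle by always including the top class.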

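Method Selection Guide

Situation	Recommended method	Rationale
Quick adoption, reusing an existing model	MC Dropout	Applies directly to models that already use Dropout
Best uncertainty estimation quality	Deep Ensembles	Most robust performance, easy to implement
Limited compute budget	Evidential DL	Single forward pass
Correcting overconfidence	Temperature Scaling	Very simple, post-hoc
Theoretical guarantees required	Conformal Prediction	Distribution-free coverage
Full Bayesian inference	BNN (Bayes by Backprop)	Can incorporate prior knowledge
Large-scale models (LLMs, etc.)	MC Dropout or Last-layer BNN	A full BNN is impractical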

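Applications

Domain	Use	Required reliability
Medical diagnosis	Displaying diagnostic uncertainty, deciding when to defer to a specialist	High (human lives at stake)
Autonomous driving	OOD detection, withholding unsafe decisions	Very high
Finance	Risk modeling, providing confidence intervals	High
Active Learning	Prioritizing uncertain samples for labeling	Medium
OOD Detection	Detecting inputs outside the training distribution	High
LLM Hallucination Detection	Estimating the reliability of generated text	High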

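Practical Pipeline

import torch
import torch.nn as nn

# Unified uncertainty-estimation pipeline for classifiers
class UncertaintyPipeline:
    def __init__(self, model, method='mc_dropout', n_samples=30):
        # For method='ensemble', `model` is assumed to be a list of trained classifiers
        self.model = model
        self.method = method
        self.n_samples = n_samples

    def predict(self, x):
        if self.method == 'mc_dropout':
            return self._mc_dropout(x)
        elif self.method == 'ensemble':
            return self._ensemble(x)
        raise ValueError(f"Unknown method: {self.method!r}")

    def _mc_dropout(self, x):
        self.model.train()  # Dropout ON (also puts BatchNorm, if any, in training mode)
        with torch.no_grad():
            preds = torch.stack([
                torch.softmax(self.model(x), dim=-1)
                for _ in range(self.n_samples)
            ])  # (T, batch, classes)
        return self._decompose(preds)

    def _ensemble(self, x):
        # One deterministic forward pass per ensemble member
        preds = []
        with torch.no_grad():
            for member in self.model:
                member.eval()
                preds.append(torch.softmax(member(x), dim=-1))
        return self._decompose(torch.stack(preds))  # (M, batch, classes)

    @staticmethod
    def _decompose(preds):
        mean = preds.mean(dim=0)

        # Predictive entropy (total uncertainty)
        predictive_entropy = -(mean * torch.log(mean + 1e-10)).sum(dim=-1)

        # Expected entropy (aleatoric)
        expected_entropy = -(preds * torch.log(preds + 1e-10)).sum(dim=-1).mean(dim=0)

        # Mutual information (epistemic) = predictive - expected, clamped at 0
        mutual_info = (predictive_entropy - expected_entropy).clamp(min=0)

        return {
            'prediction': mean.argmax(dim=-1),
            'confidence': mean.max(dim=-1).values,
            'total_uncertainty': predictive_entropy,
            'aleatoric': expected_entropy,
            'epistemic': mutual_info,
        }

    def should_abstain(self, uncertainty_dict, threshold=0.5):
        """Abstain when the total uncertainty (in nats) exceeds the threshold."""
        return uncertainty_dict['total_uncertainty'] > threshold

# Usage example
# Placeholder model and inputs so that the example runs end to end;
# substitute a trained classifier and real test data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 3))
x_test = torch.randn(5, 20)

pipeline = UncertaintyPipeline(model, method='mc_dropout', n_samples=50)
result = pipeline.predict(x_test)

# Uncertainty-based decision-making
for i in range(len(x_test)):
    sample = {k: v[i] for k, v in result.items()}
    if pipeline.should_abstain(sample, threshold=0.5):
        print(f"Sample {i}: abstain (expert review required)")
    else:
        print(f"Sample {i}: prediction = {result['prediction'][i].item()}, "
              f"confidence = {result['confidence'][i].item():.3f}")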
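Recent Trends (2024-2025)

Trend	Description
LLM uncertainty	Token- and sequence-level uncertainty estimation for generative models
Conformal + LLM	Applying conformal prediction to LLM outputs to produce prediction sets
Hallucination Detection	Uncertainty-based detection of LLM hallucinations
Efficient Ensembles	Lower-cost ensembles such as BatchEnsemble and Hypernetworks
Uncertainty in Diffusion	Uncertainty in generative models (diversity vs. quality trade-off)
Last-Layer UQ	Treating only the last layer of a large model as a BNN
Calibration for Foundation Models	Calibration research for large models such as CLIP and GPT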

References

Gal, Y. & Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML 2016.
Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. NeurIPS 2017.
Blundell, C., et al. (2015). Weight Uncertainty in Neural Networks. ICML 2015.
Sensoy, M., Kaplan, L., & Kandemir, M. (2018). Evidential Deep Learning to Quantify Classification Uncertainty. NeurIPS 2018.
Guo, C., et al. (2017). On Calibration of Modern Neural Networks. ICML 2017.
Abdar, M., et al. (2021). A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges. Information Fusion.
Hüllermeier, E. & Waegeman, W. (2021). Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods. Machine Learning.
Ovadia, Y., et al. (2019). Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift. NeurIPS 2019.