
Uncertainty Quantification Overview

Uncertainty quantification (UQ) estimates how much a model's predictions can be trusted, which is essential for safety-critical decisions (healthcare, autonomous driving, finance). The core idea is "knowing what the model does not know."


Theoretical Foundations

Types of Uncertainty

  • Aleatoric (data uncertainty, irreducible): caused by inherent noise in the data; cannot be reduced. Examples: sensor noise, measurement error.
  • Epistemic (model uncertainty, reducible): caused by the model's lack of knowledge; reduced with more data. Example: regions outside the training data.

Mathematical formulation:

Total predictive uncertainty:

\[\text{Var}[Y|X=x] = \underbrace{\mathbb{E}[\text{Var}[Y|X, \theta]]}_{\text{Aleatoric}} + \underbrace{\text{Var}[\mathbb{E}[Y|X, \theta]]}_{\text{Epistemic}}\]
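
This decomposition (the law of total variance) can be checked numerically with a toy model; the sampler below and all its numbers are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior: T model samples, each predicting a Gaussian N(mu_t, sigma_t^2)
T = 10_000
mu = rng.normal(0.0, 1.0, size=T)    # E[Y|X, theta_t] varies across models (epistemic)
sigma2 = np.full(T, 0.25)            # Var[Y|X, theta_t] is irreducible noise (aleatoric)

aleatoric = sigma2.mean()            # E[Var[Y|X, theta]]
epistemic = mu.var()                 # Var[E[Y|X, theta]]

# Direct Monte Carlo estimate of Var[Y|X]: draw a model, then draw y from it
y = rng.normal(mu, np.sqrt(sigma2))
total = y.var()
# aleatoric + epistemic matches total up to Monte Carlo error
```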

Why Does It Matter?

  1. Confidence-aware decisions: defer to a human expert when uncertainty is high
  2. Active learning: label the most uncertain samples first
  3. Anomaly detection: identify data outside the training distribution
  4. Safe AI: a prerequisite for autonomous driving and medical AI
  5. Model improvement: collect data in the regions where the model is uncertain

References:
  • Der Kiureghian, A. & Ditlevsen, O. (2009). "Aleatory or Epistemic? Does It Matter?". Structural Safety.
  • Hüllermeier, E. & Waegeman, W. (2021). "Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods". Machine Learning.


Algorithm Taxonomy

Uncertainty Quantification
├── Bayesian Methods
│   ├── Bayesian Neural Networks (BNN)
│   ├── Variational Inference (VI)
│   ├── Monte Carlo Dropout
│   ├── Stochastic Weight Averaging Gaussian (SWAG)
│   └── Laplace Approximation
├── Ensemble Methods
│   ├── Deep Ensembles
│   ├── Snapshot Ensembles
│   └── Batch Ensembles
├── Deterministic Methods
│   ├── Evidential Deep Learning
│   ├── Distance-aware Methods (DUQ, SNGP)
│   └── Heteroscedastic Neural Networks
├── Post-hoc Calibration
│   ├── Temperature Scaling
│   ├── Platt Scaling
│   ├── Isotonic Regression
│   └── Histogram Binning
└── Distribution-free Methods
    ├── Conformal Prediction
    ├── MAPIE (Model Agnostic)
    └── Jackknife+

Bayesian Methods

Bayesian Neural Networks (BNN)

Learn a posterior distribution over the weights:

Prior: \(p(\theta) = \mathcal{N}(0, \sigma_p^2 I)\)

Posterior (Bayes' rule):

\[p(\theta | \mathcal{D}) = \frac{p(\mathcal{D} | \theta) p(\theta)}{p(\mathcal{D})} \propto p(\mathcal{D} | \theta) p(\theta)\]

Predictive distribution:

\[p(y | x, \mathcal{D}) = \int p(y | x, \theta) p(\theta | \mathcal{D}) d\theta\]

The integral is intractable, so an approximation is needed:

  1. Variational Inference: \(q(\theta) \approx p(\theta | \mathcal{D})\)
  2. MCMC: Sampling from posterior
  3. Monte Carlo Dropout: Dropout as approximate inference

References:
  • MacKay, D.J.C. (1992). "A Practical Bayesian Framework for Backpropagation Networks". Neural Computation.
  • Neal, R.M. (1996). "Bayesian Learning for Neural Networks". Springer.

Variational Inference

Maximize the ELBO (Evidence Lower Bound):

\[\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(\theta)}[\log p(\mathcal{D}|\theta)] - D_{KL}(q_\phi(\theta) \| p(\theta))\]

Reparameterization trick (for Gaussian):

\[\theta = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)\]

Bayes by Backprop:

Model each weight as \(\mathcal{N}(\mu, \sigma^2)\) and learn \(\mu, \sigma\) by gradient descent.
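
A minimal Bayes-by-Backprop-style layer, assuming PyTorch; the class name and the softplus parameterization of \(\sigma\) are illustrative choices, not from the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian over its weights."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_f, in_f))
        self.w_rho = nn.Parameter(torch.full((out_f, in_f), -3.0))  # sigma = softplus(rho)
        self.b = nn.Parameter(torch.zeros(out_f))

    def forward(self, x):
        sigma = F.softplus(self.w_rho)
        eps = torch.randn_like(sigma)
        w = self.w_mu + sigma * eps  # reparameterization trick: theta = mu + sigma * eps
        return F.linear(x, w, self.b)

    def kl_to_prior(self, prior_sigma=1.0):
        # Closed-form KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over weights
        sigma = F.softplus(self.w_rho)
        return (torch.log(prior_sigma / sigma)
                + (sigma ** 2 + self.w_mu ** 2) / (2 * prior_sigma ** 2) - 0.5).sum()

layer = BayesianLinear(4, 2)
out = layer(torch.randn(8, 4))  # a fresh weight sample on every forward pass
```

Training minimizes NLL plus this KL term, i.e. the negative ELBO above.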

References:
  • Blundell, C. et al. (2015). "Weight Uncertainty in Neural Networks". ICML.
  • Kingma, D.P. & Welling, M. (2014). "Auto-Encoding Variational Bayes". ICLR.

Monte Carlo Dropout

Keep dropout active at inference time:

\[p(y|x, \mathcal{D}) \approx \frac{1}{T}\sum_{t=1}^{T} p(y|x, \hat{\theta}_t)\]

where \(\hat{\theta}_t\) denotes the weights obtained with the \(t\)-th dropout mask.

Uncertainty estimates:

  • Total (predictive entropy): \(H[y|x] = -\sum_c \bar{p}_c \log \bar{p}_c\)

  • Aleatoric (expected entropy): \(\mathbb{E}[H[y|x, \theta]] = -\frac{1}{T}\sum_t \sum_c p_{t,c} \log p_{t,c}\)

  • Epistemic = Total - Aleatoric (mutual information)

Implementation:

import torch
import torch.nn as nn

def enable_dropout(model):
    """Enable dropout during inference"""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()

def mc_dropout_predict(model, x, n_samples=50):
    """Monte Carlo Dropout prediction"""
    model.eval()
    enable_dropout(model)

    predictions = []
    with torch.no_grad():
        for _ in range(n_samples):
            pred = torch.softmax(model(x), dim=1)
            predictions.append(pred)

    predictions = torch.stack(predictions)  # (n_samples, batch, classes)

    # Mean prediction
    mean_pred = predictions.mean(dim=0)

    # Epistemic uncertainty (variance of predictions)
    epistemic = predictions.var(dim=0).sum(dim=1)

    # Aleatoric uncertainty (expected entropy)
    aleatoric = -(predictions * torch.log(predictions + 1e-10)).sum(dim=2).mean(dim=0)

    # Total uncertainty (entropy of mean)
    total = -(mean_pred * torch.log(mean_pred + 1e-10)).sum(dim=1)

    return mean_pred, epistemic, aleatoric, total

# Usage example
mean_pred, epistemic, aleatoric, total = mc_dropout_predict(model, x_test, n_samples=100)

References:
  • Gal, Y. & Ghahramani, Z. (2016). "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning". ICML.

SWAG (Stochastic Weight Averaging Gaussian)

Estimate a weight distribution from the SGD trajectory:

\[q(\theta) = \mathcal{N}(\theta_{SWA}, \frac{1}{2}(\Sigma_{diag} + \Sigma_{low-rank}))\]

Algorithm:

  1. Train the model until it is near convergence
  2. Continue training with a cyclic learning rate, collecting weight snapshots
  3. Estimate the mean and covariance from the collected weights
  4. At inference, sample weights and average the predictions
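
Steps 3-4 of the algorithm can be sketched with a synthetic snapshot matrix standing in for weights collected along the cyclic-LR trajectory (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# K flattened weight snapshots collected along the SGD trajectory (synthetic here)
K, d = 20, 5
snapshots = 1.0 + rng.normal(0.0, 0.1, size=(K, d))

theta_swa = snapshots.mean(axis=0)   # SWA mean
sigma_diag = snapshots.var(axis=0)   # diagonal covariance estimate
D = snapshots - theta_swa            # deviations for the low-rank term

def sample_swag(rng, rank=10):
    """Draw one weight sample from the SWAG Gaussian approximation."""
    z1 = rng.normal(size=d)
    z2 = rng.normal(size=rank)
    Dr = D[-rank:]                   # keep only the last `rank` deviations
    return (theta_swa
            + np.sqrt(sigma_diag / 2) * z1
            + Dr.T @ z2 / np.sqrt(2 * (rank - 1)))

# At inference: average predictions over several sampled weight vectors
theta = sample_swag(rng)
```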

References:
  • Maddox, W.J. et al. (2019). "A Simple Baseline for Bayesian Uncertainty in Deep Learning". NeurIPS.


Ensemble Methods

Deep Ensembles

An ensemble of several independently trained models:

\[p(y|x) \approx \frac{1}{M}\sum_{m=1}^{M} p_m(y|x)\]

Uncertainty:

  • Epistemic: variance of predictions across ensemble members
  • Aleatoric: average entropy of each member's predictions

Advantages:
  • Simple and easy to parallelize
  • Strong in both predictive performance and uncertainty estimation
  • Often better calibrated than Bayesian methods

Disadvantages:
  • M times the compute and memory cost
  • Hard to guarantee diversity among members

import numpy as np

class DeepEnsemble:
    def __init__(self, model_class, n_models=5, **model_kwargs):
        self.models = [model_class(**model_kwargs) for _ in range(n_models)]

    def fit(self, X, y, **fit_kwargs):
        for model in self.models:
            # Each member starts from a different random initialization
            model.fit(X, y, **fit_kwargs)

    def predict_with_uncertainty(self, X):
        predictions = np.array([model.predict_proba(X) for model in self.models])

        # Mean prediction
        mean_pred = predictions.mean(axis=0)

        # Epistemic uncertainty (disagreement)
        epistemic = predictions.var(axis=0).sum(axis=1)

        # Total uncertainty
        total = -(mean_pred * np.log(mean_pred + 1e-10)).sum(axis=1)

        return mean_pred, epistemic, total

References:
  • Lakshminarayanan, B. et al. (2017). "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles". NeurIPS.

Batch Ensembles

An efficient ensemble via weight sharing:

\[W_i = W_{shared} \odot (r_i s_i^T)\]

where \(r_i, s_i\) are rank-1 vectors specific to each ensemble member.

Advantage: near-ensemble quality at only a small overhead over a single model.
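
The rank-1 construction can be sketched directly (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, n_members = 4, 3, 2
W_shared = rng.normal(size=(d_out, d_in))  # one full weight matrix, shared by all members
r = rng.normal(size=(n_members, d_out))    # per-member fast weights:
s = rng.normal(size=(n_members, d_in))     # only two vectors each

def member_weight(i):
    # W_i = W_shared ⊙ (r_i s_i^T)
    return W_shared * np.outer(r[i], s[i])

x = rng.normal(size=d_in)
outputs = np.stack([member_weight(i) @ x for i in range(n_members)])  # one row per member
```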

References:
  • Wen, Y. et al. (2020). "BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning". ICLR.


Deterministic Methods

Evidential Deep Learning

Model uncertainty by directly predicting a Dirichlet distribution:

Classification: the network outputs the Dirichlet parameters \(\alpha_k\)

\[Dir(p | \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_k p_k^{\alpha_k - 1}\]

Uncertainty:
  • Evidence: \(S = \sum_k \alpha_k\)
  • Uncertainty: \(u = K / S\) (K: number of classes)
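
The evidence and uncertainty formulas can be computed directly from predicted Dirichlet parameters; the helper name and example alphas below are illustrative:

```python
import numpy as np

def evidential_uncertainty(alpha):
    """Belief and uncertainty from Dirichlet parameters alpha (shape: batch x K)."""
    K = alpha.shape[1]
    S = alpha.sum(axis=1)      # total evidence S = sum_k alpha_k
    prob = alpha / S[:, None]  # expected class probabilities
    u = K / S                  # uncertainty mass u = K / S, in (0, 1]
    return prob, u

# alpha = 1 everywhere means no evidence: uniform prediction, maximal uncertainty
prob, u = evidential_uncertainty(np.array([[1.0, 1.0, 1.0],
                                           [10.0, 1.0, 1.0]]))
# u[0] = 1.0 (no evidence), u[1] = 0.25 (strong evidence for class 0)
```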

References:
  • Sensoy, M. et al. (2018). "Evidential Deep Learning to Quantify Classification Uncertainty". NeurIPS.

SNGP (Spectral-normalized Neural Gaussian Process)

To make predictions distance-aware:

  1. Enforce a bi-Lipschitz property with spectral normalization
  2. Replace the last layer with a GP (random feature approximation)

Advantage: high uncertainty on inputs far from the training distribution
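
Both ingredients can be sketched in PyTorch; the layer sizes and the fixed random-feature head below are illustrative simplifications (the full SNGP recipe also maintains a precision matrix for predictive variance):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. Feature extractor with spectral normalization (controls the Lipschitz constant)
backbone = nn.Sequential(
    nn.utils.spectral_norm(nn.Linear(2, 64)), nn.ReLU(),
    nn.utils.spectral_norm(nn.Linear(64, 64)), nn.ReLU(),
)

# 2. Last layer as a GP via random Fourier features (fixed random projection)
class RandomFeatureGP(nn.Module):
    def __init__(self, in_f, n_features=128, n_classes=3):
        super().__init__()
        self.register_buffer("W", torch.randn(n_features, in_f))
        self.register_buffer("b", 2 * torch.pi * torch.rand(n_features))
        self.out = nn.Linear(n_features, n_classes)

    def forward(self, h):
        phi = torch.cos(h @ self.W.T + self.b) * (2.0 / self.W.shape[0]) ** 0.5
        return self.out(phi)

head = RandomFeatureGP(64)
logits = head(backbone(torch.randn(5, 2)))
```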

References:
  • Liu, J. et al. (2020). "Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness". NeurIPS.


Calibration

Concept

Calibrated: when the model predicts with 80% confidence, it should be correct 80% of the time.

\[P(Y=1 | \hat{p}(X) = p) = p, \quad \forall p \in [0, 1]\]

Expected Calibration Error (ECE)

\[ECE = \sum_{m=1}^{M} \frac{|B_m|}{n} |acc(B_m) - conf(B_m)|\]

where \(B_m\) is a confidence bin, \(acc\) its accuracy, and \(conf\) its average confidence.
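
The formula translates directly into a binned computation; the function name is illustrative:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for multiclass probabilities (shape: n x K)."""
    conf = probs.max(axis=1)                   # confidence of the predicted class
    correct = probs.argmax(axis=1) == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap         # |B_m| / n weighting
    return ece

# A model that is always confident and always right is perfectly calibrated
probs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(expected_calibration_error(probs, np.array([0, 1])))  # → 0.0
```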

Temperature Scaling

The simplest and most effective post-hoc calibration:

\[\hat{p}_i = \text{softmax}(z_i / T)\]

Search for the \(T\) that minimizes NLL on a validation set.

import torch
import torch.nn as nn
import torch.optim as optim

class TemperatureScaling(nn.Module):
    def __init__(self):
        super().__init__()
        self.temperature = nn.Parameter(torch.ones(1) * 1.5)

    def forward(self, logits):
        return logits / self.temperature

def calibrate(model, val_loader, max_iter=50):
    """Temperature scaling calibration"""
    temperature_model = TemperatureScaling().cuda()
    nll_criterion = nn.CrossEntropyLoss()
    optimizer = optim.LBFGS([temperature_model.temperature], lr=0.01, max_iter=max_iter)

    # Collect logits
    logits_list, labels_list = [], []
    model.eval()
    with torch.no_grad():
        for x, y in val_loader:
            logits_list.append(model(x.cuda()))
            labels_list.append(y.cuda())

    logits = torch.cat(logits_list)
    labels = torch.cat(labels_list)

    def closure():
        optimizer.zero_grad()
        loss = nll_criterion(temperature_model(logits), labels)
        loss.backward()
        return loss

    optimizer.step(closure)

    return temperature_model.temperature.item()

References:
  • Guo, C. et al. (2017). "On Calibration of Modern Neural Networks". ICML.


Conformal Prediction

Concept

Valid prediction sets in finite samples, without distributional assumptions:

\[P(Y_{n+1} \in C(X_{n+1})) \geq 1 - \alpha\]

Key assumption: exchangeability (weaker than i.i.d.)

Split Conformal Prediction

Algorithm:

  1. Split the training data into a proper training set and a calibration set
  2. Fit the model on the training set
  3. Compute nonconformity scores on the calibration set: \(s_i = |y_i - \hat{f}(x_i)|\) (regression) or \(s_i = 1 - \hat{f}(x_i)_{y_i}\) (classification)
  4. Set \(\hat{q}\) to the \((1-\alpha)(1+1/n)\) empirical quantile of the scores
  5. Prediction interval for a new input: \(C(x_{new}) = [\hat{f}(x_{new}) - \hat{q}, \hat{f}(x_{new}) + \hat{q}]\)
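
The five steps can be run from scratch on synthetic regression data; the sine function stands in for a fitted model, and every name below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = sin(x) + noise
n = 2000
X = rng.uniform(-3, 3, size=n)
y = np.sin(X) + rng.normal(0, 0.3, size=n)

# 1-2. Split, then "fit" a model on the training half (here: the true mean function)
X_cal, y_cal = X[1000:], y[1000:]
f_hat = np.sin

# 3. Nonconformity scores on the calibration set
scores = np.abs(y_cal - f_hat(X_cal))

# 4. q_hat at the (1 - alpha)(1 + 1/n) empirical quantile
alpha = 0.1
q_hat = np.quantile(scores, min(1.0, (1 - alpha) * (1 + 1 / len(scores))))

# 5. Constant-width interval around the point prediction
x_new = 0.5
interval = (f_hat(x_new) - q_hat, f_hat(x_new) + q_hat)

# Empirical coverage on fresh data lands near the nominal 90%
X_test = rng.uniform(-3, 3, size=1000)
y_test = np.sin(X_test) + rng.normal(0, 0.3, size=1000)
coverage = np.mean(np.abs(y_test - f_hat(X_test)) <= q_hat)
```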

Conformalized Quantile Regression (CQR)

Intervals that adapt to the conditional distribution:

  1. Fit quantile regressors \(\hat{q}_{\alpha/2}(x)\) and \(\hat{q}_{1-\alpha/2}(x)\)
  2. Conformity score on the calibration set: \(s_i = \max(\hat{q}_{\alpha/2}(x_i) - y_i, y_i - \hat{q}_{1-\alpha/2}(x_i))\)
  3. Prediction interval: \(C(x) = [\hat{q}_{\alpha/2}(x) - \hat{q}, \hat{q}_{1-\alpha/2}(x) + \hat{q}]\)

Advantage: handles heteroscedasticity

from mapie.regression import MapieRegressor
from mapie.classification import MapieClassifier

# Regression example
mapie_reg = MapieRegressor(
    estimator=base_model,
    method='plus',  # jackknife+
    cv=5
)
mapie_reg.fit(X_train, y_train)
y_pred, y_pis = mapie_reg.predict(X_test, alpha=0.1)  # 90% intervals

# Classification example
mapie_clf = MapieClassifier(
    estimator=base_classifier,
    method='lac',  # Least Ambiguous set-valued Classifier
    cv='prefit'
)
mapie_clf.fit(X_calib, y_calib)
y_pred, y_sets = mapie_clf.predict(X_test, alpha=0.1)

References:
  • Vovk, V. et al. (2005). "Algorithmic Learning in a Random World". Springer.
  • Romano, Y. et al. (2019). "Conformalized Quantile Regression". NeurIPS.
  • Angelopoulos, A.N. & Bates, S. (2021). "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification". arXiv.


Evaluation Metrics

Calibration Metrics

  • ECE: \(\sum_m \frac{|B_m|}{n} |acc(B_m) - conf(B_m)|\). Lower is better.
  • MCE: \(\max_m |acc(B_m) - conf(B_m)|\). Worst-bin calibration gap.
  • Brier Score: \(\frac{1}{n}\sum_i (p_i - y_i)^2\). Lower is better; captures calibration + sharpness.
  • NLL: \(-\frac{1}{n}\sum_i \log p(y_i)\). Lower is better.

Uncertainty Quality Metrics

  • AUROC (OOD detection): detecting out-of-distribution data; uncertainty vs. OOD label
  • AUPR: precision vs. recall; uncertainty vs. misclassification
  • Selective prediction: confidence-based rejection; risk-coverage curve
  • Spearman's correlation: rank correlation between uncertainty and error

Coverage Metrics (Conformal)

  • Marginal coverage: \(\frac{1}{n}\sum_i \mathbf{1}[y_i \in C(x_i)]\)
  • Conditional coverage: \(P(Y \in C(X) \mid X = x) \geq 1 - \alpha\) for all \(x\)
  • Average interval width: \(\frac{1}{n}\sum_i |C(x_i)|\)
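
Marginal coverage and average width follow directly from interval bounds; a minimal helper (the name is illustrative):

```python
import numpy as np

def coverage_and_width(y_true, lower, upper):
    """Marginal coverage and mean interval width from per-sample bounds."""
    covered = (y_true >= lower) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

y = np.array([1.0, 2.0, 3.0, 4.0])
cov, width = coverage_and_width(y, y - 0.5, y + 0.5)  # → (1.0, 1.0)
```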

Practical Application Guide

Method Selection Guide

  • Quick to apply: MC Dropout, Temperature Scaling
  • Best performance needed: Deep Ensembles
  • Limited compute: MC Dropout, SWAG
  • Rigorous guarantees needed: Conformal Prediction
  • OOD detection matters: SNGP, distance-aware methods

End-to-End Pipeline

import numpy as np

class UncertaintyPipeline:
    def __init__(self, model, method='mc_dropout', n_samples=50):
        self.model = model
        self.method = method
        self.n_samples = n_samples
        self.temperature = 1.0

    def fit(self, X_train, y_train, X_calib=None, y_calib=None):
        """Train the model and calibrate it"""
        # Train the model
        self.model.fit(X_train, y_train)

        # If a calibration set is given, apply temperature scaling
        if X_calib is not None:
            self.temperature = self._find_temperature(X_calib, y_calib)

    def predict_with_uncertainty(self, X):
        """Return predictions and an uncertainty estimate"""
        # The first two branches assume MC-dropout / ensemble helpers
        # are implemented for the wrapped model
        if self.method == 'mc_dropout':
            return self._mc_dropout_predict(X)
        elif self.method == 'ensemble':
            return self._ensemble_predict(X)
        else:
            # Default: entropy of the temperature-scaled softmax probabilities
            probs = self.model.predict_proba(X)
            probs = self._temperature_scale(probs)
            uncertainty = -(probs * np.log(probs + 1e-10)).sum(axis=1)
            return probs, uncertainty

    def _temperature_scale(self, probs):
        """Apply temperature scaling"""
        logits = np.log(probs + 1e-10)
        scaled_logits = logits / self.temperature
        exp_logits = np.exp(scaled_logits - scaled_logits.max(axis=1, keepdims=True))
        return exp_logits / exp_logits.sum(axis=1, keepdims=True)

    def _find_temperature(self, X_calib, y_calib):
        """Grid-search the temperature that minimizes NLL on the calibration set"""
        logits = np.log(self.model.predict_proba(X_calib) + 1e-10)

        best_temp = 1.0
        best_nll = float('inf')

        for temp in np.linspace(0.5, 5.0, 50):
            probs = self._softmax(logits / temp)
            nll = -np.log(probs[np.arange(len(y_calib)), y_calib] + 1e-10).mean()
            if nll < best_nll:
                best_nll = nll
                best_temp = temp

        return best_temp

    def _softmax(self, x):
        exp_x = np.exp(x - x.max(axis=1, keepdims=True))
        return exp_x / exp_x.sum(axis=1, keepdims=True)

    def evaluate_calibration(self, X_test, y_test, n_bins=10):
        """Evaluate calibration (ECE and Brier score)"""
        probs, _ = self.predict_with_uncertainty(X_test)
        confidences = probs.max(axis=1)
        predictions = probs.argmax(axis=1)
        accuracies = (predictions == y_test)

        # ECE: per-bin gap between accuracy and confidence, weighted by bin size
        ece = 0.0
        for i in range(n_bins):
            bin_lower = i / n_bins
            bin_upper = (i + 1) / n_bins
            in_bin = (confidences > bin_lower) & (confidences <= bin_upper)

            if in_bin.sum() > 0:
                bin_acc = accuracies[in_bin].mean()
                bin_conf = confidences[in_bin].mean()
                ece += (in_bin.sum() / len(y_test)) * abs(bin_acc - bin_conf)

        # Multiclass Brier score
        y_onehot = np.zeros((len(y_test), probs.shape[1]))
        y_onehot[np.arange(len(y_test)), y_test] = 1
        brier = ((probs - y_onehot) ** 2).sum(axis=1).mean()

        return {'ece': ece, 'brier': brier}

Subpages

  • Conformal Prediction: distribution-free prediction intervals (conformal-prediction.md)

Bibliography

Textbooks and Surveys

  • Gawlikowski, J. et al. (2021). "A Survey of Uncertainty in Deep Neural Networks". arXiv.
  • Abdar, M. et al. (2021). "A Review of Uncertainty Quantification in Deep Learning". Information Fusion.

Key Papers

Bayesian Methods:
  • Gal, Y. & Ghahramani, Z. (2016). "Dropout as a Bayesian Approximation". ICML.
  • Blundell, C. et al. (2015). "Weight Uncertainty in Neural Networks". ICML.
  • Maddox, W.J. et al. (2019). "SWAG". NeurIPS.

Ensembles:
  • Lakshminarayanan, B. et al. (2017). "Deep Ensembles". NeurIPS.

Calibration:
  • Guo, C. et al. (2017). "On Calibration of Modern Neural Networks". ICML.

Conformal Prediction:
  • Vovk, V. et al. (2005). "Algorithmic Learning in a Random World". Springer.
  • Romano, Y. et al. (2019). "Conformalized Quantile Regression". NeurIPS.
  • Angelopoulos, A.N. & Bates, S. (2021). "Conformal Prediction Tutorial". arXiv.

Libraries

  • Uncertainty Toolbox: https://github.com/uncertainty-toolbox/uncertainty-toolbox
  • MAPIE: https://github.com/scikit-learn-contrib/MAPIE
  • Uncertainty Baselines: https://github.com/google/uncertainty-baselines