
Conformal Prediction

Distribution-free uncertainty quantification framework that provides valid prediction sets with finite-sample coverage guarantees.


Meta

| Item | Value |
| --- | --- |
| Category | Uncertainty Quantification |
| First Proposed | Vovk et al. (2005) |
| Key Conferences | NeurIPS 2024, ICML 2024, ICLR 2024 |
| Key Surveys | Zhou et al. (2024), Campos et al. (2024) |

References

  • Vovk, Gammerman, Shafer. "Algorithmic Learning in a Random World" (2005)
  • Angelopoulos, Bates. "A Gentle Introduction to Conformal Prediction" (2023)
  • Zhou et al. "Conformal Prediction: A Data Perspective" arXiv:2410.06494 (2024)
  • Campos et al. "Conformal Prediction for NLP: A Survey" TACL (2024)

Core Concept

Conformal Prediction (CP) is a framework that provides statistically valid uncertainty intervals/sets for the predictions of a trained model.

Key Properties

| Property | Description |
| --- | --- |
| Distribution-free | No assumptions about the data distribution |
| Model-agnostic | Applicable to any model |
| Finite-sample valid | Coverage guarantee even with finite samples |
| Post-hoc | Applicable without retraining the model |

Coverage Guarantee

For a user-specified error rate alpha:

P(Y_test in C_alpha(X_test)) >= 1 - alpha
  • With alpha = 0.1, the true label is contained in the prediction set with probability at least 90%
  • This guarantee is marginal coverage: an average over the whole data distribution, not per-input
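The marginal guarantee can be sanity-checked with a short simulation; this is a minimal sketch assuming exchangeable scalar nonconformity scores (the exponential distribution here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
n_cal, n_test, n_trials = 500, 500, 200

coverages = []
for _ in range(n_trials):
    # Exchangeable nonconformity scores for calibration and test points
    scores_cal = rng.exponential(size=n_cal)
    scores_test = rng.exponential(size=n_test)

    # Conformal quantile level ceil((n+1)(1-alpha))/n, clipped to 1
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
    q_hat = np.quantile(scores_cal, q_level, method='higher')

    # A test point is "covered" when its score falls below the threshold
    coverages.append(np.mean(scores_test <= q_hat))

print(f"Mean coverage over trials: {np.mean(coverages):.3f} (target >= 0.90)")
```

Averaged over many draws, the empirical coverage concentrates slightly above 1 - alpha, as the finite-sample guarantee predicts.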

Methodology

Split Conformal Prediction

The most widely used variant. Computationally efficient and simple to implement.

Algorithm

Input: trained model f, calibration set D_cal, test input x_test, error rate alpha

1. Compute nonconformity scores for calibration set:
   s_i = s(x_i, y_i) for i = 1, ..., n

2. Compute quantile threshold:
   q_hat = Quantile(s_1, ..., s_n; ceil((n+1)(1-alpha))/n)

3. Construct prediction set:
   C_alpha(x_test) = {y : s(x_test, y) <= q_hat}

Output: prediction set C_alpha(x_test)

Nonconformity Score

A function measuring the "nonconformity" between a model's prediction and the actual value.

| Task Type | Common Scores |
| --- | --- |
| Classification | s(x,y) = 1 - p(y\|x) (softmax probability) |
| Regression | s(x,y) = \|y - f(x)\| (absolute residual) |
| Quantile Regression | s(x,y) = max(q_lo - y, y - q_hi) |
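Each score is a one-line computation; a minimal numeric sketch with made-up values for illustration:

```python
import numpy as np

# Classification: 1 minus the softmax probability of the true class
probs = np.array([0.7, 0.2, 0.1])
y = 0
score_clf = 1 - probs[y]

# Regression: absolute residual |y - f(x)|
y_true, y_pred = 3.0, 2.4
score_reg = abs(y_true - y_pred)

# Quantile regression: max(q_lo - y, y - q_hi);
# negative when y falls inside the predicted interval
q_lo, q_hi = 1.0, 4.0
score_cqr = max(q_lo - y_true, y_true - q_hi)

print(score_clf, score_reg, score_cqr)
```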

Variants

1. Adaptive Prediction Sets (APS)

Romano et al. (2020). Improved performance under class imbalance.

s(x,y) = sum_{j: p(y_j|x) >= p(y|x)} p(y_j|x)

The score is the cumulative probability of all classes at least as likely as y, so set sizes adapt to example difficulty.
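As a concrete illustration, a minimal sketch of the APS score and the matching set construction (omitting the randomized tie-breaking of the original paper; `aps_score` and `aps_set` are illustrative names):

```python
import numpy as np

def aps_score(probs, y):
    """APS nonconformity score: cumulative probability of all classes
    at least as likely as the true class y (ties included)."""
    return probs[probs >= probs[y]].sum()

def aps_set(probs, q_hat):
    """Prediction set: keep classes, most likely first, up to and
    including the first class whose cumulative probability crosses q_hat."""
    order = np.argsort(probs)[::-1]          # classes from most to least likely
    cumsum = np.cumsum(probs[order])
    k = int(np.searchsorted(cumsum, q_hat)) + 1
    return order[:min(k, len(probs))]

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(aps_score(probs, 1))   # classes with p >= 0.25: 0.55 + 0.25
print(aps_set(probs, 0.9))
```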

2. Regularized Adaptive Prediction Sets (RAPS)

Angelopoulos et al. (2021). Adds regularization to APS.

s(x,y) = sum_{j: pi_j >= pi_y} pi_j + lambda * (o(y) - k_reg)^+
  • pi_j = p(y_j|x); o(y) is the rank of class y when probabilities are sorted in descending order
  • lambda: regularization strength
  • k_reg: threshold on the allowed set size
  • Reduces the average set size
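A minimal sketch of the RAPS score under the same notation; `raps_score` is an illustrative name, and the `lam`/`k_reg` values are arbitrary, not tuned:

```python
import numpy as np

def raps_score(probs, y, lam=0.1, k_reg=2):
    """RAPS nonconformity score: the APS cumulative probability plus a
    penalty lam * (rank(y) - k_reg)^+ for deeply ranked classes."""
    order = np.argsort(probs)[::-1]
    rank = int(np.where(order == y)[0][0]) + 1    # 1-based rank o(y)
    aps = probs[probs >= probs[y]].sum()
    return aps + lam * max(0, rank - k_reg)

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(raps_score(probs, 1))   # rank 2 -> no penalty, equals the APS score
print(raps_score(probs, 3))   # rank 4 -> penalty 0.1 * (4 - 2) added
```

The penalty inflates scores of low-ranked classes, which pushes them out of the prediction set and shrinks the average set size.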

3. Conformalized Quantile Regression (CQR)

Romano et al. (2019). Improves conditional coverage in regression problems.

1. Train quantile regression model: q_lo(x), q_hi(x)
2. Score: s(x,y) = max(q_lo(x) - y, y - q_hi(x))
3. Adjusted interval: [q_lo(x) - q_hat, q_hi(x) + q_hat]

Effective on heteroscedastic data, where the noise level varies with x.

4. Mondrian Conformal Prediction

Performs calibration separately for each group (e.g., per class or subpopulation). Improves group-conditional coverage.

For each group g:
  q_hat_g = Quantile(scores in group g; ...)

C_alpha(x) uses q_hat_g where x belongs to group g
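The group-wise calibration above can be sketched as follows, assuming scalar scores and integer group labels (`mondrian_thresholds` is an illustrative helper):

```python
import numpy as np

def mondrian_thresholds(scores, groups, alpha=0.1):
    """Compute a separate conformal threshold per group (Mondrian CP)."""
    thresholds = {}
    for g in np.unique(groups):
        s_g = scores[groups == g]
        n_g = len(s_g)
        q_level = min(np.ceil((n_g + 1) * (1 - alpha)) / n_g, 1.0)
        thresholds[g] = float(np.quantile(s_g, q_level, method='higher'))
    return thresholds

rng = np.random.default_rng(0)
# Group 1's scores are systematically larger, so it needs a larger threshold
scores = np.concatenate([rng.uniform(0, 1, 500), rng.uniform(0, 2, 500)])
groups = np.array([0] * 500 + [1] * 500)

thr = mondrian_thresholds(scores, groups)
print(thr)
```

A single pooled threshold would over-cover group 0 and under-cover group 1; calibrating per group restores roughly 1 - alpha coverage in each.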

5. Online Conformal Prediction

Gibbs and Candes (2021). For time series and streaming data.

alpha_t = alpha_{t-1} + gamma * (alpha - err_{t-1})

Adjusts alpha online (decreasing it after a miscoverage event, increasing it otherwise) to handle non-exchangeable data.
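A sketch of the online update loop under an assumed distribution shift; the synthetic stream and the helper name `adaptive_conformal` are illustrative:

```python
import numpy as np

def adaptive_conformal(stream_scores, calib_scores, alpha=0.1, gamma=0.01):
    """ACI-style online loop: shrink alpha_t after a miscoverage event
    (widening future sets) and grow it after covered events."""
    alpha_t = alpha
    errs = []
    calib_sorted = np.sort(calib_scores)
    n = len(calib_sorted)
    for s in stream_scores:
        # clip alpha_t into (0, 1) before computing the threshold
        a = float(np.clip(alpha_t, 1e-6, 1 - 1e-6))
        q_level = min(np.ceil((n + 1) * (1 - a)) / n, 1.0)
        q_hat = np.quantile(calib_sorted, q_level, method='higher')
        err = float(s > q_hat)                  # 1 if not covered at time t
        errs.append(err)
        alpha_t += gamma * (alpha - err)        # ACI update rule
    return float(np.mean(errs))

rng = np.random.default_rng(0)
calib = rng.exponential(size=500)
stream = 1.5 * rng.exponential(size=2000)       # shifted test distribution
err_rate = adaptive_conformal(stream, calib)
print(f"Empirical miscoverage: {err_rate:.3f} (target: 0.10)")
```

Without the update, the shifted stream would be miscovered well above alpha; the feedback loop steers the long-run miscoverage back toward the target.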


Python Implementation

Basic Split Conformal (Classification)

import numpy as np
from sklearn.model_selection import train_test_split

def split_conformal_classification(
    model, 
    X_cal, 
    y_cal, 
    X_test, 
    alpha=0.1
):
    """
    Split Conformal Prediction for classification.

    Args:
        model: Trained classifier with predict_proba method
        X_cal: Calibration features
        y_cal: Calibration labels
        X_test: Test features
        alpha: Target error rate (default 0.1 for 90% coverage)

    Returns:
        Tuple of (prediction sets for each test sample, threshold q_hat)
    """
    n_cal = len(X_cal)

    # Step 1: Compute calibration scores
    proba_cal = model.predict_proba(X_cal)
    scores_cal = 1 - proba_cal[np.arange(n_cal), y_cal]

    # Step 2: Compute quantile threshold
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)  # clip for small n_cal
    q_hat = np.quantile(scores_cal, q_level, method='higher')

    # Step 3: Construct prediction sets
    proba_test = model.predict_proba(X_test)
    prediction_sets = []

    for i in range(len(X_test)):
        pred_set = np.where(1 - proba_test[i] <= q_hat)[0]
        prediction_sets.append(pred_set.tolist())

    return prediction_sets, q_hat


# Example usage
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate data
X, y = make_classification(n_samples=2000, n_features=20, 
                           n_classes=5, n_informative=10,
                           random_state=42)

# Split: train / calibration / test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Train model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Apply conformal prediction
pred_sets, threshold = split_conformal_classification(
    clf, X_cal, y_cal, X_test, alpha=0.1
)

# Evaluate coverage
coverage = np.mean([y_test[i] in pred_sets[i] for i in range(len(y_test))])
avg_size = np.mean([len(s) for s in pred_sets])

print(f"Coverage: {coverage:.3f} (target: 0.90)")
print(f"Average set size: {avg_size:.2f}")

Conformalized Quantile Regression

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def conformalized_quantile_regression(
    X_train, y_train,
    X_cal, y_cal,
    X_test,
    alpha=0.1
):
    """
    Conformalized Quantile Regression (CQR).

    Returns prediction intervals with coverage guarantee.
    """
    # Train quantile regressors
    q_lo = alpha / 2
    q_hi = 1 - alpha / 2

    model_lo = GradientBoostingRegressor(
        loss='quantile', alpha=q_lo, n_estimators=100
    )
    model_hi = GradientBoostingRegressor(
        loss='quantile', alpha=q_hi, n_estimators=100
    )

    model_lo.fit(X_train, y_train)
    model_hi.fit(X_train, y_train)

    # Calibration scores
    pred_lo_cal = model_lo.predict(X_cal)
    pred_hi_cal = model_hi.predict(X_cal)

    scores_cal = np.maximum(pred_lo_cal - y_cal, y_cal - pred_hi_cal)

    # Quantile threshold
    n_cal = len(X_cal)
    q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
    q_hat = np.quantile(scores_cal, min(q_level, 1.0), method='higher')

    # Test predictions
    pred_lo_test = model_lo.predict(X_test)
    pred_hi_test = model_hi.predict(X_test)

    # Conformalized intervals
    intervals = np.column_stack([
        pred_lo_test - q_hat,
        pred_hi_test + q_hat
    ])

    return intervals, q_hat


# Example usage
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=20, random_state=42)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

intervals, q = conformalized_quantile_regression(
    X_train, y_train, X_cal, y_cal, X_test, alpha=0.1
)

# Evaluate
coverage = np.mean((y_test >= intervals[:, 0]) & (y_test <= intervals[:, 1]))
avg_width = np.mean(intervals[:, 1] - intervals[:, 0])

print(f"Coverage: {coverage:.3f} (target: 0.90)")
print(f"Average interval width: {avg_width:.2f}")

Using MAPIE Library

# pip install "mapie<1.0"  (the calls below follow the MAPIE 0.x interface)

from mapie.classification import MapieClassifier
from mapie.regression import MapieQuantileRegressor
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

# Classification: 'score' method with a prefit estimator
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)

mapie_clf = MapieClassifier(estimator=clf, method='score', cv='prefit')
mapie_clf.fit(X_cal, y_cal)
y_pred, y_set = mapie_clf.predict(X_test, alpha=0.1)

# Regression with CQR: the base estimator must use quantile loss,
# and alpha is passed to the constructor rather than to predict
mapie_reg = MapieQuantileRegressor(
    estimator=GradientBoostingRegressor(loss='quantile', alpha=0.5),
    method='quantile',
    cv='split',
    alpha=0.1,
)
mapie_reg.fit(X_train, y_train)
y_pred, y_interval = mapie_reg.predict(X_test)

Applications

NLP

| Application | Method | Reference |
| --- | --- | --- |
| Text Classification | APS, RAPS | Campos et al. (2024) |
| Machine Translation | Sequence-level CP | Kumar et al. (2023) |
| Question Answering | CP for abstention | Ren et al. (2023) |
| LLM Hallucination Detection | CP-based filtering | Quach et al. (2024) |

Time Series

  • Adaptive CI (ACI): Gibbs and Candes (2021)
  • EnbPI: Xu and Xie (2021)
  • Maintains coverage in streaming/online settings

Healthcare / Safety-Critical

  • Medical diagnosis with uncertainty
  • Autonomous driving perception
  • Drug discovery

Comparison with Other UQ Methods

| Method | Coverage Guarantee | Distribution-free | Computational Cost |
| --- | --- | --- | --- |
| Conformal Prediction | Finite-sample valid | Yes | Low (post-hoc) |
| Bayesian Methods | Asymptotic | No | High |
| MC Dropout | No guarantee | Partial | Medium |
| Ensemble Methods | No guarantee | Yes | High |
| Temperature Scaling | No guarantee | Yes | Low |

Limitations

  1. Marginal vs Conditional Coverage: vanilla CP guarantees only marginal coverage
  2. Calibration Set Size: on the order of 1000 samples are needed for stable, tight thresholds
  3. Exchangeability Assumption: requires exchangeable (e.g., IID) data; relaxations exist but add complexity
  4. Set Size Trade-off: raising the coverage target enlarges the prediction sets
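The calibration-set-size limitation can be illustrated by measuring how much the conformal threshold fluctuates across resampled calibration sets of different sizes; a sketch with synthetic exponential scores (`threshold_spread` is an illustrative helper):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1

def threshold_spread(n_cal, n_trials=300):
    """Std of the conformal threshold across resampled calibration sets."""
    q_hats = []
    for _ in range(n_trials):
        scores = rng.exponential(size=n_cal)
        q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
        q_hats.append(np.quantile(scores, q_level, method='higher'))
    return float(np.std(q_hats))

small, large = threshold_spread(50), threshold_spread(1000)
print(f"Threshold std, n=50: {small:.3f}  n=1000: {large:.3f}")
```

With only a few dozen calibration points the threshold, and hence the interval width, varies substantially from one calibration draw to the next.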

Key Takeaways

  1. CP is a distribution-free, model-agnostic uncertainty quantification framework
  2. It provides a finite-sample coverage guarantee - the key differentiator from other UQ methods
  3. It can be applied post hoc - no retraining of the existing model is required
  4. APS/RAPS are recommended for classification, CQR for regression
  5. Recently extended to NLP, time series, LLMs, and other domains

Last updated: 2026-02-05