
Conformal Prediction

Distribution-free uncertainty quantification framework that provides valid prediction sets with finite-sample coverage guarantees.


Meta

| Item | Value |
| --- | --- |
| Category | Uncertainty Quantification |
| First Proposed | Vovk et al. (2005) |
| Key Conferences | NeurIPS 2024, ICML 2024, ICLR 2024 |
| Key Surveys | Zhou et al. (2024), Campos et al. (2024) |

References

  • Vovk, Gammerman, Shafer. "Algorithmic Learning in a Random World" (2005)
  • Angelopoulos, Bates. "A Gentle Introduction to Conformal Prediction" (2023)
  • Zhou et al. "Conformal Prediction: A Data Perspective" arXiv:2410.06494 (2024)
  • Campos et al. "Conformal Prediction for NLP: A Survey" TACL (2024)

Core Concept

Conformal Prediction (CP) is a framework that provides statistically valid uncertainty intervals/sets for the predictions of a trained model.

Key Properties

| Property | Description |
| --- | --- |
| Distribution-free | No assumptions about the data distribution |
| Model-agnostic | Applicable to any model |
| Finite-sample valid | Coverage guarantee even with finite samples |
| Post-hoc | Applicable without retraining the model |

Coverage Guarantee

For a user-specified error rate alpha:

P(Y_test in C_alpha(X_test)) >= 1 - alpha
  • With alpha = 0.1, the true label is contained in the prediction set with probability at least 90%
  • This guarantee is marginal coverage: an average over the whole data distribution, not per-input
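The marginal guarantee can be sanity-checked with a short simulation; this is a minimal sketch assuming exchangeable scalar nonconformity scores (the exponential distribution here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
n_cal, n_test, n_trials = 500, 500, 200

coverages = []
for _ in range(n_trials):
    # Exchangeable nonconformity scores for calibration and test points
    scores_cal = rng.exponential(size=n_cal)
    scores_test = rng.exponential(size=n_test)

    # Conformal quantile level ceil((n+1)(1-alpha))/n, clipped to 1
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
    q_hat = np.quantile(scores_cal, q_level, method='higher')

    # A test point is "covered" when its score falls below the threshold
    coverages.append(np.mean(scores_test <= q_hat))

print(f"Mean coverage over trials: {np.mean(coverages):.3f} (target >= 0.90)")
```

Averaged over many draws, the empirical coverage concentrates slightly above 1 - alpha, as the finite-sample guarantee predicts.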

Methodology

Split Conformal Prediction

The most widely used variant. Computationally efficient and simple to implement.

Algorithm

Input: trained model f, calibration set D_cal, test input x_test, error rate alpha

1. Compute nonconformity scores for calibration set:
   s_i = s(x_i, y_i) for i = 1, ..., n

2. Compute quantile threshold:
   q_hat = Quantile(s_1, ..., s_n; ceil((n+1)(1-alpha))/n)

3. Construct prediction set:
   C_alpha(x_test) = {y : s(x_test, y) <= q_hat}

Output: prediction set C_alpha(x_test)

Nonconformity Score

A function measuring the "nonconformity" between a model's prediction and the actual value.

| Task Type | Common Scores |
| --- | --- |
| Classification | s(x,y) = 1 - p(y\|x) (softmax probability) |
| Regression | s(x,y) = \|y - f(x)\| (absolute residual) |
| Quantile Regression | s(x,y) = max(q_lo - y, y - q_hi) |
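Each score is a one-line computation; a minimal numeric sketch with made-up values for illustration:

```python
import numpy as np

# Classification: 1 minus the softmax probability of the true class
probs = np.array([0.7, 0.2, 0.1])
y = 0
score_clf = 1 - probs[y]

# Regression: absolute residual |y - f(x)|
y_true, y_pred = 3.0, 2.4
score_reg = abs(y_true - y_pred)

# Quantile regression: max(q_lo - y, y - q_hi);
# negative when y falls inside the predicted interval
q_lo, q_hi = 1.0, 4.0
score_cqr = max(q_lo - y_true, y_true - q_hi)

print(score_clf, score_reg, score_cqr)
```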

Variants

1. Adaptive Prediction Sets (APS)

Romano et al. (2020). Improved performance under class imbalance.

s(x,y) = sum_{j: p(y_j|x) >= p(y|x)} p(y_j|x)

The score is the cumulative probability of all classes at least as likely as y, so set sizes adapt to example difficulty.
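As a concrete illustration, a minimal sketch of the APS score and the matching set construction (omitting the randomized tie-breaking of the original paper; `aps_score` and `aps_set` are illustrative names):

```python
import numpy as np

def aps_score(probs, y):
    """APS nonconformity score: cumulative probability of all classes
    at least as likely as the true class y (ties included)."""
    return probs[probs >= probs[y]].sum()

def aps_set(probs, q_hat):
    """Prediction set: keep classes, most likely first, up to and
    including the first class whose cumulative probability crosses q_hat."""
    order = np.argsort(probs)[::-1]          # classes from most to least likely
    cumsum = np.cumsum(probs[order])
    k = int(np.searchsorted(cumsum, q_hat)) + 1
    return order[:min(k, len(probs))]

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(aps_score(probs, 1))   # classes with p >= 0.25: 0.55 + 0.25
print(aps_set(probs, 0.9))
```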

2. Regularized Adaptive Prediction Sets (RAPS)

Angelopoulos et al. (2021). Adds regularization to APS.

s(x,y) = sum_{j: pi_j >= pi_y} pi_j + lambda * (o(y) - k_reg)^+
  • pi_j = p(y_j|x); o(y) is the rank of class y when probabilities are sorted in descending order
  • lambda: regularization strength
  • k_reg: threshold on the allowed set size
  • Reduces the average set size
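A minimal sketch of the RAPS score under the same notation; `raps_score` is an illustrative name, and the `lam`/`k_reg` values are arbitrary, not tuned:

```python
import numpy as np

def raps_score(probs, y, lam=0.1, k_reg=2):
    """RAPS nonconformity score: the APS cumulative probability plus a
    penalty lam * (rank(y) - k_reg)^+ for deeply ranked classes."""
    order = np.argsort(probs)[::-1]
    rank = int(np.where(order == y)[0][0]) + 1    # 1-based rank o(y)
    aps = probs[probs >= probs[y]].sum()
    return aps + lam * max(0, rank - k_reg)

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(raps_score(probs, 1))   # rank 2 -> no penalty, equals the APS score
print(raps_score(probs, 3))   # rank 4 -> penalty 0.1 * (4 - 2) added
```

The penalty inflates scores of low-ranked classes, which pushes them out of the prediction set and shrinks the average set size.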

3. Conformalized Quantile Regression (CQR)

Romano et al. (2019). Improves conditional coverage in regression problems.

1. Train quantile regression model: q_lo(x), q_hi(x)
2. Score: s(x,y) = max(q_lo(x) - y, y - q_hi(x))
3. Adjusted interval: [q_lo(x) - q_hat, q_hi(x) + q_hat]

Effective on heteroscedastic data, where the noise level varies with x.

4. Mondrian Conformal Prediction

Performs calibration separately for each group (e.g., per class or subpopulation). Improves group-conditional coverage.

For each group g:
  q_hat_g = Quantile(scores in group g; ...)

C_alpha(x) uses q_hat_g where x belongs to group g
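The group-wise calibration above can be sketched as follows, assuming scalar scores and integer group labels (`mondrian_thresholds` is an illustrative helper):

```python
import numpy as np

def mondrian_thresholds(scores, groups, alpha=0.1):
    """Compute a separate conformal threshold per group (Mondrian CP)."""
    thresholds = {}
    for g in np.unique(groups):
        s_g = scores[groups == g]
        n_g = len(s_g)
        q_level = min(np.ceil((n_g + 1) * (1 - alpha)) / n_g, 1.0)
        thresholds[g] = float(np.quantile(s_g, q_level, method='higher'))
    return thresholds

rng = np.random.default_rng(0)
# Group 1's scores are systematically larger, so it needs a larger threshold
scores = np.concatenate([rng.uniform(0, 1, 500), rng.uniform(0, 2, 500)])
groups = np.array([0] * 500 + [1] * 500)

thr = mondrian_thresholds(scores, groups)
print(thr)
```

A single pooled threshold would over-cover group 0 and under-cover group 1; calibrating per group restores roughly 1 - alpha coverage in each.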

5. Online Conformal Prediction

Gibbs and Candes (2021). For time series and streaming data.

alpha_t = alpha_{t-1} + gamma * (alpha - err_{t-1})

Adjusts alpha online (decreasing it after a miscoverage event, increasing it otherwise) to handle non-exchangeable data.
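A sketch of the online update loop under an assumed distribution shift; the synthetic stream and the helper name `adaptive_conformal` are illustrative:

```python
import numpy as np

def adaptive_conformal(stream_scores, calib_scores, alpha=0.1, gamma=0.01):
    """ACI-style online loop: shrink alpha_t after a miscoverage event
    (widening future sets) and grow it after covered events."""
    alpha_t = alpha
    errs = []
    calib_sorted = np.sort(calib_scores)
    n = len(calib_sorted)
    for s in stream_scores:
        # clip alpha_t into (0, 1) before computing the threshold
        a = float(np.clip(alpha_t, 1e-6, 1 - 1e-6))
        q_level = min(np.ceil((n + 1) * (1 - a)) / n, 1.0)
        q_hat = np.quantile(calib_sorted, q_level, method='higher')
        err = float(s > q_hat)                  # 1 if not covered at time t
        errs.append(err)
        alpha_t += gamma * (alpha - err)        # ACI update rule
    return float(np.mean(errs))

rng = np.random.default_rng(0)
calib = rng.exponential(size=500)
stream = 1.5 * rng.exponential(size=2000)       # shifted test distribution
err_rate = adaptive_conformal(stream, calib)
print(f"Empirical miscoverage: {err_rate:.3f} (target: 0.10)")
```

Without the update, the shifted stream would be miscovered well above alpha; the feedback loop steers the long-run miscoverage back toward the target.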


Python Implementation

Basic Split Conformal (Classification)

import numpy as np
from sklearn.model_selection import train_test_split

def split_conformal_classification(
    model, 
    X_cal, 
    y_cal, 
    X_test, 
    alpha=0.1
):
    """
    Split Conformal Prediction for classification.

    Args:
        model: Trained classifier with predict_proba method
        X_cal: Calibration features
        y_cal: Calibration labels
        X_test: Test features
        alpha: Target error rate (default 0.1 for 90% coverage)

    Returns:
        Tuple of (prediction sets for each test sample, threshold q_hat)
    """
    n_cal = len(X_cal)

    # Step 1: Compute calibration scores
    proba_cal = model.predict_proba(X_cal)
    scores_cal = 1 - proba_cal[np.arange(n_cal), y_cal]

    # Step 2: Compute quantile threshold
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)  # clip for small n_cal
    q_hat = np.quantile(scores_cal, q_level, method='higher')

    # Step 3: Construct prediction sets
    proba_test = model.predict_proba(X_test)
    prediction_sets = []

    for i in range(len(X_test)):
        pred_set = np.where(1 - proba_test[i] <= q_hat)[0]
        prediction_sets.append(pred_set.tolist())

    return prediction_sets, q_hat


# Example usage
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate data
X, y = make_classification(n_samples=2000, n_features=20, 
                           n_classes=5, n_informative=10,
                           random_state=42)

# Split: train / calibration / test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Train model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Apply conformal prediction
pred_sets, threshold = split_conformal_classification(
    clf, X_cal, y_cal, X_test, alpha=0.1
)

# Evaluate coverage
coverage = np.mean([y_test[i] in pred_sets[i] for i in range(len(y_test))])
avg_size = np.mean([len(s) for s in pred_sets])

print(f"Coverage: {coverage:.3f} (target: 0.90)")
print(f"Average set size: {avg_size:.2f}")

Conformalized Quantile Regression

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def conformalized_quantile_regression(
    X_train, y_train,
    X_cal, y_cal,
    X_test,
    alpha=0.1
):
    """
    Conformalized Quantile Regression (CQR).

    Returns prediction intervals with coverage guarantee.
    """
    # Train quantile regressors
    q_lo = alpha / 2
    q_hi = 1 - alpha / 2

    model_lo = GradientBoostingRegressor(
        loss='quantile', alpha=q_lo, n_estimators=100
    )
    model_hi = GradientBoostingRegressor(
        loss='quantile', alpha=q_hi, n_estimators=100
    )

    model_lo.fit(X_train, y_train)
    model_hi.fit(X_train, y_train)

    # Calibration scores
    pred_lo_cal = model_lo.predict(X_cal)
    pred_hi_cal = model_hi.predict(X_cal)

    scores_cal = np.maximum(pred_lo_cal - y_cal, y_cal - pred_hi_cal)

    # Quantile threshold
    n_cal = len(X_cal)
    q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
    q_hat = np.quantile(scores_cal, min(q_level, 1.0), method='higher')

    # Test predictions
    pred_lo_test = model_lo.predict(X_test)
    pred_hi_test = model_hi.predict(X_test)

    # Conformalized intervals
    intervals = np.column_stack([
        pred_lo_test - q_hat,
        pred_hi_test + q_hat
    ])

    return intervals, q_hat


# Example usage
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=20, random_state=42)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

intervals, q = conformalized_quantile_regression(
    X_train, y_train, X_cal, y_cal, X_test, alpha=0.1
)

# Evaluate
coverage = np.mean((y_test >= intervals[:, 0]) & (y_test <= intervals[:, 1]))
avg_width = np.mean(intervals[:, 1] - intervals[:, 0])

print(f"Coverage: {coverage:.3f} (target: 0.90)")
print(f"Average interval width: {avg_width:.2f}")

Using MAPIE Library

# pip install "mapie<1.0"  (the calls below follow the MAPIE 0.x interface)

from mapie.classification import MapieClassifier
from mapie.regression import MapieQuantileRegressor
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

# Classification: 'score' method with a prefit estimator
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)

mapie_clf = MapieClassifier(estimator=clf, method='score', cv='prefit')
mapie_clf.fit(X_cal, y_cal)
y_pred, y_set = mapie_clf.predict(X_test, alpha=0.1)

# Regression with CQR: the base estimator must use quantile loss,
# and alpha is passed to the constructor rather than to predict
mapie_reg = MapieQuantileRegressor(
    estimator=GradientBoostingRegressor(loss='quantile', alpha=0.5),
    method='quantile',
    cv='split',
    alpha=0.1,
)
mapie_reg.fit(X_train, y_train)
y_pred, y_interval = mapie_reg.predict(X_test)

Applications

NLP

| Application | Method | Reference |
| --- | --- | --- |
| Text Classification | APS, RAPS | Campos et al. (2024) |
| Machine Translation | Sequence-level CP | Kumar et al. (2023) |
| Question Answering | CP for abstention | Ren et al. (2023) |
| LLM Hallucination Detection | CP-based filtering | Quach et al. (2024) |

Time Series

  • Adaptive CI (ACI): Gibbs and Candes (2021)
  • EnbPI: Xu and Xie (2021)
  • Maintains coverage in streaming/online settings

Healthcare / Safety-Critical

  • Medical diagnosis with uncertainty
  • Autonomous driving perception
  • Drug discovery

Comparison with Other UQ Methods

| Method | Coverage Guarantee | Distribution-free | Computational Cost |
| --- | --- | --- | --- |
| Conformal Prediction | Finite-sample valid | Yes | Low (post-hoc) |
| Bayesian Methods | Asymptotic | No | High |
| MC Dropout | No guarantee | Partial | Medium |
| Ensemble Methods | No guarantee | Yes | High |
| Temperature Scaling | No guarantee | Yes | Low |

Limitations

  1. Marginal vs Conditional Coverage: vanilla CP guarantees only marginal coverage
  2. Calibration Set Size: on the order of 1000 samples are needed for stable, tight thresholds
  3. Exchangeability Assumption: requires exchangeable (e.g., IID) data; relaxations exist but add complexity
  4. Set Size Trade-off: raising the coverage target enlarges the prediction sets
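The calibration-set-size limitation can be illustrated by measuring how much the conformal threshold fluctuates across resampled calibration sets of different sizes; a sketch with synthetic exponential scores (`threshold_spread` is an illustrative helper):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1

def threshold_spread(n_cal, n_trials=300):
    """Std of the conformal threshold across resampled calibration sets."""
    q_hats = []
    for _ in range(n_trials):
        scores = rng.exponential(size=n_cal)
        q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
        q_hats.append(np.quantile(scores, q_level, method='higher'))
    return float(np.std(q_hats))

small, large = threshold_spread(50), threshold_spread(1000)
print(f"Threshold std, n=50: {small:.3f}  n=1000: {large:.3f}")
```

With only a few dozen calibration points the threshold, and hence the interval width, varies substantially from one calibration draw to the next.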

Key Takeaways

  1. CP is a distribution-free, model-agnostic uncertainty quantification framework
  2. It provides a finite-sample coverage guarantee - the key differentiator from other UQ methods
  3. It can be applied post hoc - no retraining of the existing model is required
  4. APS/RAPS are recommended for classification, CQR for regression
  5. Recently extended to NLP, time series, LLMs, and other domains

Last updated: 2026-02-05