Conformal Prediction¶
Distribution-free uncertainty quantification framework that provides valid prediction sets with finite-sample coverage guarantees.
Meta¶
| Item | Value |
|---|---|
| Category | Uncertainty Quantification |
| First Proposed | Vovk et al. (2005) |
| Key Conferences | NeurIPS 2024, ICML 2024, ICLR 2024 |
| Key Surveys | Zhou et al. (2024), Campos et al. (2024) |
References
- Vovk, Gammerman, Shafer. "Algorithmic Learning in a Random World" (2005)
- Angelopoulos, Bates. "A Gentle Introduction to Conformal Prediction" (2023)
- Zhou et al. "Conformal Prediction: A Data Perspective" arXiv:2410.06494 (2024)
- Campos et al. "Conformal Prediction for NLP: A Survey" TACL (2024)
Core Concept¶
Conformal Prediction (CP) is a framework that equips a trained model's predictions with statistically valid uncertainty intervals or sets.
Key Properties¶
| Property | Description |
|---|---|
| Distribution-free | No assumptions about the data distribution |
| Model-agnostic | Applicable to any predictive model |
| Finite-sample valid | Coverage guarantee holds even with finite samples |
| Post-hoc | Applicable without retraining the model |
Coverage Guarantee¶
For a user-specified error rate alpha:
- With alpha = 0.1, the true label is contained in the prediction set with probability at least 90%
- This is a marginal coverage guarantee: it holds on average over the data distribution, not for each individual input
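Stated formally (Vovk et al., 2005): for exchangeable calibration and test data, split conformal prediction with n calibration points satisfies

```latex
1 - \alpha
  \;\le\;
\mathbb{P}\!\left( y_{\mathrm{test}} \in C_\alpha(x_{\mathrm{test}}) \right)
  \;\le\;
1 - \alpha + \frac{1}{n+1}
```

where the upper bound additionally requires the nonconformity scores to be distinct almost surely.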
Methodology¶
Split Conformal Prediction¶
The most widely used variant; computationally efficient and simple to implement.
Algorithm
Input: trained model f, calibration set D_cal, test input x_test, error rate alpha
1. Compute nonconformity scores for calibration set:
s_i = s(x_i, y_i) for i = 1, ..., n
2. Compute quantile threshold:
q_hat = Quantile(s_1, ..., s_n; ceil((n+1)(1-alpha))/n)
3. Construct prediction set:
C_alpha(x_test) = {y : s(x_test, y) <= q_hat}
Output: prediction set C_alpha(x_test)
Nonconformity Score¶
A function that measures how poorly the model's prediction conforms to the observed value.
| Task Type | Common Scores |
|---|---|
| Classification | s(x,y) = 1 - p(y\|x) (softmax probability) |
| Regression | s(x,y) = \|y - f(x)\| (absolute residual) |
| Quantile Regression | s(x,y) = max(q_lo - y, y - q_hi) |
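The scores in the table map directly to small functions. A minimal sketch, assuming `proba` is a softmax matrix, `y_pred` a point prediction, and `q_lo`/`q_hi` fitted quantile estimates:

```python
import numpy as np

def score_classification(proba, y):
    """1 - softmax probability assigned to the true class."""
    return 1.0 - proba[np.arange(len(y)), y]

def score_regression(y_pred, y):
    """Absolute residual between prediction and target."""
    return np.abs(y - y_pred)

def score_cqr(q_lo, q_hi, y):
    """CQR score: positive exactly when y falls outside [q_lo, q_hi]."""
    return np.maximum(q_lo - y, y - q_hi)
```

Larger scores mean the candidate label or value is less compatible with the model's prediction.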
Variants¶
1. Adaptive Prediction Sets (APS)¶
Romano et al. (2020). Computes scores from cumulative class probabilities, producing adaptively sized prediction sets and improved performance under class imbalance.
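The cumulative-probability idea can be sketched as follows. This omits the randomization term used in the original paper; the score is the total probability mass of all classes ranked at least as likely as the true class:

```python
import numpy as np

def aps_score(proba, y):
    """APS calibration score (non-randomized sketch).

    proba: (n, K) softmax matrix; y: (n,) true labels.
    Returns the cumulative mass down to and including the true class.
    """
    order = np.argsort(-proba, axis=1)                 # classes by descending probability
    sorted_proba = np.take_along_axis(proba, order, axis=1)
    cumsum = np.cumsum(sorted_proba, axis=1)           # cumulative mass along the ranking
    rank = np.argmax(order == y[:, None], axis=1)      # position of the true class
    return cumsum[np.arange(len(y)), rank]
```

With this score, split conformal calibration proceeds exactly as in the algorithm above.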
2. Regularized Adaptive Prediction Sets (RAPS)¶
Angelopoulos et al. (2021). APS에 정규화 추가.
- lambda: regularization strength
- k_reg: allowed set-size threshold before the penalty applies
- Reduces the average prediction set size
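A sketch of the RAPS score, again without the randomization term: it is the APS cumulative mass plus a penalty of `lam` per rank beyond `k_reg` (parameter names follow the paper; defaults here are illustrative):

```python
import numpy as np

def raps_score(proba, y, lam=0.01, k_reg=5):
    """RAPS calibration score (non-randomized sketch).

    Adds lam * max(0, rank - k_reg) to the APS score, where rank is the
    1-indexed rank of the true class; this discourages large sets.
    """
    order = np.argsort(-proba, axis=1)
    sorted_proba = np.take_along_axis(proba, order, axis=1)
    cumsum = np.cumsum(sorted_proba, axis=1)
    rank = np.argmax(order == y[:, None], axis=1)        # 0-indexed rank
    penalty = lam * np.maximum(0, (rank + 1) - k_reg)    # regularization term
    return cumsum[np.arange(len(y)), rank] + penalty
```

Setting `lam=0` recovers the plain APS score.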
3. Conformalized Quantile Regression (CQR)¶
Romano et al. (2019). Improves conditional coverage for regression problems.
1. Train quantile regression model: q_lo(x), q_hi(x)
2. Score: s(x,y) = max(q_lo(x) - y, y - q_hi(x))
3. Adjusted interval: [q_lo(x) - q_hat, q_hi(x) + q_hat]
Particularly effective on heteroscedastic data.
4. Mondrian Conformal Prediction¶
Performs a separate calibration per group, improving group-conditional coverage.
For each group g:
q_hat_g = Quantile(scores in group g; ...)
C_alpha(x) uses q_hat_g where x belongs to group g
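The per-group calibration above can be sketched with one conformal quantile per group (a minimal sketch; `mondrian_thresholds` is a name chosen here, not a library function):

```python
import numpy as np

def mondrian_thresholds(scores, groups, alpha=0.1):
    """Per-group conformal thresholds (Mondrian CP sketch).

    scores: calibration nonconformity scores, shape (n,)
    groups: group label for each calibration point, shape (n,)
    Returns a dict mapping group label -> threshold q_hat_g.
    """
    thresholds = {}
    for g in np.unique(groups):
        s_g = scores[groups == g]
        n_g = len(s_g)
        # Same quantile rule as split CP, applied within the group
        q_level = min(np.ceil((n_g + 1) * (1 - alpha)) / n_g, 1.0)
        thresholds[g] = np.quantile(s_g, q_level, method='higher')
    return thresholds
```

At test time, a point in group g is compared against `thresholds[g]` instead of the global q_hat. Each group needs enough calibration points for its quantile to be meaningful.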
5. Online Conformal Prediction¶
Gibbs and Candes (2021). Designed for time-series and streaming data.
Adjusts alpha online over time to handle non-exchangeable data.
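The online adjustment follows the ACI update rule alpha_{t+1} = alpha_t + gamma * (alpha - err_t), where err_t is 1 if the true label at step t fell outside the prediction set. A minimal sketch of the alpha trajectory:

```python
import numpy as np

def adaptive_alpha_updates(errors, alpha=0.1, gamma=0.005):
    """Adaptive Conformal Inference update sketch (Gibbs and Candes, 2021).

    errors: sequence of 0/1 miscoverage indicators
            (1 = true label fell outside the prediction set at that step).
    Returns the trajectory [alpha_0, alpha_1, ...] of working error levels.
    """
    alpha_t = alpha
    trajectory = [alpha_t]
    for err in errors:
        # Miscoverage (err=1) lowers alpha_t, widening future sets;
        # coverage (err=0) raises alpha_t, tightening them.
        alpha_t = alpha_t + gamma * (alpha - err)
        trajectory.append(alpha_t)
    return trajectory
```

Over a long run the empirical miscoverage rate tracks the target alpha even when the data distribution drifts.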
Python Implementation¶
Basic Split Conformal (Classification)¶
import numpy as np
from sklearn.model_selection import train_test_split
def split_conformal_classification(model, X_cal, y_cal, X_test, alpha=0.1):
    """
    Split Conformal Prediction for classification.

    Args:
        model: Trained classifier with a predict_proba method
        X_cal: Calibration features
        y_cal: Calibration labels
        X_test: Test features
        alpha: Target error rate (default 0.1 for 90% coverage)

    Returns:
        Tuple of (list of prediction sets, quantile threshold q_hat)
    """
    n_cal = len(X_cal)

    # Step 1: Compute calibration scores (1 - probability of the true class)
    proba_cal = model.predict_proba(X_cal)
    scores_cal = 1 - proba_cal[np.arange(n_cal), y_cal]

    # Step 2: Compute quantile threshold (clamped to 1.0 for small n_cal)
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
    q_hat = np.quantile(scores_cal, q_level, method='higher')

    # Step 3: Construct prediction sets
    proba_test = model.predict_proba(X_test)
    prediction_sets = []
    for i in range(len(X_test)):
        pred_set = np.where(1 - proba_test[i] <= q_hat)[0]
        prediction_sets.append(pred_set.tolist())

    return prediction_sets, q_hat
# Example usage
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# Generate data
X, y = make_classification(n_samples=2000, n_features=20,
n_classes=5, n_informative=10,
random_state=42)
# Split: train / calibration / test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
# Train model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# Apply conformal prediction
pred_sets, threshold = split_conformal_classification(
clf, X_cal, y_cal, X_test, alpha=0.1
)
# Evaluate coverage
coverage = np.mean([y_test[i] in pred_sets[i] for i in range(len(y_test))])
avg_size = np.mean([len(s) for s in pred_sets])
print(f"Coverage: {coverage:.3f} (target: 0.90)")
print(f"Average set size: {avg_size:.2f}")
Conformalized Quantile Regression¶
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
def conformalized_quantile_regression(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """
    Conformalized Quantile Regression (CQR, Romano et al. 2019).

    Returns prediction intervals with a marginal coverage guarantee.
    """
    # Train lower and upper quantile regressors
    q_lo = alpha / 2
    q_hi = 1 - alpha / 2
    model_lo = GradientBoostingRegressor(loss='quantile', alpha=q_lo, n_estimators=100)
    model_hi = GradientBoostingRegressor(loss='quantile', alpha=q_hi, n_estimators=100)
    model_lo.fit(X_train, y_train)
    model_hi.fit(X_train, y_train)

    # Calibration scores: positive when y falls outside [q_lo(x), q_hi(x)]
    pred_lo_cal = model_lo.predict(X_cal)
    pred_hi_cal = model_hi.predict(X_cal)
    scores_cal = np.maximum(pred_lo_cal - y_cal, y_cal - pred_hi_cal)

    # Quantile threshold (clamped to 1.0 for small n_cal)
    n_cal = len(X_cal)
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
    q_hat = np.quantile(scores_cal, q_level, method='higher')

    # Conformalized intervals: widen the raw quantile band by q_hat
    pred_lo_test = model_lo.predict(X_test)
    pred_hi_test = model_hi.predict(X_test)
    intervals = np.column_stack([pred_lo_test - q_hat, pred_hi_test + q_hat])

    return intervals, q_hat
# Example usage
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=2000, n_features=10, noise=20)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
intervals, q = conformalized_quantile_regression(
X_train, y_train, X_cal, y_cal, X_test, alpha=0.1
)
# Evaluate
coverage = np.mean((y_test >= intervals[:, 0]) & (y_test <= intervals[:, 1]))
avg_width = np.mean(intervals[:, 1] - intervals[:, 0])
print(f"Coverage: {coverage:.3f} (target: 0.90)")
print(f"Average interval width: {avg_width:.2f}")
Using MAPIE Library¶
# pip install mapie  (API shown for MAPIE 0.x)
from mapie.classification import MapieClassifier
from mapie.regression import MapieQuantileRegressor
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

# Classification (split conformal with the 'score' method)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
mapie_clf = MapieClassifier(estimator=clf, method='score', cv='prefit')
mapie_clf.fit(X_cal, y_cal)
y_pred, y_set = mapie_clf.predict(X_test, alpha=0.1)

# Regression with CQR: the base estimator must use a quantile loss,
# and alpha is fixed at construction time (not at predict time)
mapie_reg = MapieQuantileRegressor(
    estimator=GradientBoostingRegressor(loss='quantile'),
    method='quantile',
    cv='split',
    alpha=0.1
)
mapie_reg.fit(X_train, y_train)
y_pred, y_interval = mapie_reg.predict(X_test)
Applications¶
NLP¶
| Application | Method | Reference |
|---|---|---|
| Text Classification | APS, RAPS | Campos et al. (2024) |
| Machine Translation | Sequence-level CP | Kumar et al. (2023) |
| Question Answering | CP for abstention | Ren et al. (2023) |
| LLM Hallucination Detection | CP-based filtering | Quach et al. (2024) |
Time Series¶
- Adaptive CI (ACI): Gibbs and Candes (2021)
- EnbPI: Xu and Xie (2021)
- Maintain coverage in streaming/online settings
Healthcare / Safety-Critical¶
- Medical diagnosis with uncertainty
- Autonomous driving perception
- Drug discovery
Comparison with Other UQ Methods¶
| Method | Coverage Guarantee | Distribution-free | Computational Cost |
|---|---|---|---|
| Conformal Prediction | Finite-sample valid | Yes | Low (post-hoc) |
| Bayesian Methods | Asymptotic | No | High |
| MC Dropout | No guarantee | Partial | Medium |
| Ensemble Methods | No guarantee | Yes | High |
| Temperature Scaling | No guarantee | Yes | Low |
Limitations¶
- Marginal vs Conditional Coverage: basic CP guarantees only marginal coverage
- Calibration Set Size: on the order of 1,000+ samples is needed for tight, stable intervals
- Exchangeability Assumption: requires exchangeable (e.g., IID) data; this can be relaxed, at the cost of added complexity
- Set Size Trade-off: raising the coverage target enlarges the prediction sets
Key Takeaways¶
- CP is a distribution-free, model-agnostic uncertainty quantification framework
- It provides finite-sample coverage guarantees, the key differentiator from other UQ methods
- It applies post-hoc, with no retraining of existing models
- APS/RAPS are recommended for classification, CQR for regression
- It is actively being extended to NLP, time series, LLMs, and other domains
Last updated: 2026-02-05