Logistic Regression

Overview

Problem Definition

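A discriminative model that estimates class-membership probabilities for binary classification:

- Passes a linear combination of the input features through the sigmoid function
- Output: the probability of belonging to class 1, P(Y=1|X)
- A linear classifier: the decision boundary is a hyperplane in feature space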

Key Ideas

Logistic function (sigmoid):

Converts the linear score z into a probability in the open interval (0, 1):

P(Y=1|X) = sigma(z) = 1 / (1 + exp(-z))
where z = w^T * x + b
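As a minimal sketch of the formula above, the following NumPy snippet computes z and the resulting probability for a single input. The weights, bias, and input values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and input (illustration only).
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([1.5, 2.0])

z = w @ x + b      # linear score: 0.8*1.5 - 0.4*2.0 + 0.1 = 0.5
p = sigmoid(z)     # P(Y=1|x), always strictly between 0 and 1

print(f"z = {z:.2f}, P(Y=1|x) = {p:.4f}")
```

A score of z = 0 maps to exactly 0.5, which is why the default decision boundary is the set of points where w^T * x + b = 0.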

Odds and log-odds:

Odds = P(Y=1) / P(Y=0) = P(Y=1) / (1 - P(Y=1))
Log-odds (Logit) = log(Odds) = w^T * x + b

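- The log-odds are linear in x, so the coefficients are easy to interpret
- Increasing x_i by one unit, with the other features held fixed, multiplies the odds by exp(w_i)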
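The odds interpretation can be checked numerically. The sketch below uses hypothetical coefficients, raises one feature by one unit, and compares the ratio of the odds with exp(w_0).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def odds(p):
    """Odds = P(Y=1) / (1 - P(Y=1))."""
    return p / (1.0 - p)

# Hypothetical coefficients (illustration only).
w = np.array([0.7, -1.2])
b = -0.3

x = np.array([2.0, 1.0])
x_plus = x + np.array([1.0, 0.0])   # raise feature 0 by one unit

odds_before = odds(sigmoid(w @ x + b))
odds_after = odds(sigmoid(w @ x_plus + b))

ratio = odds_after / odds_before
print(f"odds ratio = {ratio:.4f}, exp(w_0) = {np.exp(w[0]):.4f}")
```

The two printed values agree up to floating-point error, and log(odds_before) equals the linear score w^T * x + b.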

Algorithm and Formulas

Model

P(Y=1|X=x) = sigma(w^T * x + b) = 1 / (1 + exp(-(w^T * x + b)))
P(Y=0|X=x) = 1 - P(Y=1|X=x)

Loss Function (Binary Cross-Entropy)

L(w, b) = -1/n * sum_{i=1}^{n} [y_i * log(p_i) + (1-y_i) * log(1-p_i)]

where p_i = P(Y=1|X=x_i).
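A minimal NumPy sketch of this loss follows. The `eps` clipping is an assumption added here to avoid log(0); it is not part of the formula itself. The labels and predicted probabilities are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1, 0, 1, 0])
p_good = np.array([0.9, 0.1, 0.8, 0.2])   # confident and correct
p_bad = np.array([0.1, 0.9, 0.2, 0.8])    # confident and wrong

loss_good = binary_cross_entropy(y, p_good)
loss_bad = binary_cross_entropy(y, p_bad)
print(f"good predictions: {loss_good:.4f}, bad predictions: {loss_bad:.4f}")
```

Confident wrong predictions are penalized far more heavily than confident correct ones, and a constant prediction of 0.5 gives a loss of log(2).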

Regularization

Type	Loss function	Characteristics
L2 (Ridge)	L + lambda * ||w||_2^2	Shrinks coefficients, no sparsity
L1 (Lasso)	L + lambda * ||w||_1	Sparse solution, feature selection
Elastic Net	L + lambda_1 * ||w||_1 + lambda_2 * ||w||_2^2	Combination of L1 and L2
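The three penalized objectives in the table can be sketched with one helper. The function name `penalized_loss`, the weight vector, and the base loss value are hypothetical. The sketch assumes, as is common, that only w is penalized and the bias b is not.

```python
import numpy as np

def penalized_loss(base_loss, w, lam_l1=0.0, lam_l2=0.0):
    """Add an L1 and/or L2 penalty on w to an already computed data loss L."""
    w = np.asarray(w, dtype=float)
    return base_loss + lam_l1 * np.sum(np.abs(w)) + lam_l2 * np.sum(w ** 2)

w = np.array([0.5, -2.0, 0.0])
base = 0.30   # hypothetical cross-entropy value

l2 = penalized_loss(base, w, lam_l2=0.1)               # 0.30 + 0.1 * 4.25
l1 = penalized_loss(base, w, lam_l1=0.1)               # 0.30 + 0.1 * 2.5
en = penalized_loss(base, w, lam_l1=0.1, lam_l2=0.1)   # both penalties

print(f"L2: {l2:.3f}, L1: {l1:.3f}, Elastic Net: {en:.3f}")
```

The L2 term grows quadratically with large coefficients such as -2.0, while the L1 term grows linearly, which is one intuition for why L1 tends to drive small coefficients exactly to zero.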

Optimization

Gradient Descent:

dL/dw_j = 1/n * sum_{i=1}^{n} (p_i - y_i) * x_{ij}
dL/db = 1/n * sum_{i=1}^{n} (p_i - y_i)

w := w - alpha * dL/dw
b := b - alpha * dL/db

In practice, libraries use more advanced optimizers such as L-BFGS, Newton-CG, SAG, and SAGA rather than plain gradient descent.
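The update rules above can be sketched as a short batch gradient descent loop in NumPy. This is an illustration under stated assumptions, not how scikit-learn fits the model: the data are synthetic, and the learning rate `alpha` and iteration count are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, alpha=0.1, n_iter=2000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)    # current predicted probabilities
        error = p - y             # (p_i - y_i) for every sample
        grad_w = X.T @ error / n  # dL/dw
        grad_b = error.mean()     # dL/db
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Synthetic data: labels follow a noisy linear rule (illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.5 + rng.normal(scale=0.5, size=n) > 0).astype(float)

w, b = fit_logistic_gd(X, y)
acc = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"w = {w}, b = {b:.3f}, training accuracy = {acc:.3f}")
```

The learned weights should have the same signs as `true_w`. Their magnitudes differ because the labels were generated with noise rather than from a logistic model.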

Multiclass Extension

One-vs-Rest (OvR):

- Train K binary classifiers, one per class
- Predict the class whose classifier outputs the highest probability

Multinomial (Softmax):

P(Y=k|X=x) = exp(w_k^T * x) / sum_{j=1}^{K} exp(w_j^T * x)
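A minimal NumPy sketch of the softmax formula follows. Subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves the result unchanged. The weight matrix `W` (K = 3 classes, d = 2 features) and the input are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Softmax over the last axis, shifted by the max score for stability."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

# Hypothetical weights: one row w_k per class (illustration only).
W = np.array([[1.0, -0.5],
              [0.2, 0.3],
              [-1.0, 0.8]])
x = np.array([0.5, 2.0])

scores = W @ x                 # one score w_k^T * x per class
probs = softmax(scores)        # P(Y=k|X=x) for k = 0, 1, 2
pred = int(np.argmax(probs))   # predicted class

print(f"scores = {scores}, probs = {probs.round(4)}, predicted class = {pred}")
```

With K = 2 and one class score fixed at 0, softmax reduces to the sigmoid, so binary logistic regression is a special case of the multinomial model.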

Time Complexity

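Step	Complexity
Training	O(n * d * iter)
Prediction	O(d) per sample

Here n is the number of samples, d the number of features, and iter the number of optimizer iterations.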

Hyperparameter Guide

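Parameter	Description	Recommended range	Default
C	Inverse of regularization strength (1/lambda); smaller values regularize more strongly	0.001 ~ 1000	1.0
penalty	Regularization type	'l1', 'l2', 'elasticnet', None	'l2'
solver	Optimization algorithm	'lbfgs', 'liblinear', 'saga'	'lbfgs'
max_iter	Maximum number of iterations	100 ~ 10000	100
class_weight	Class weights	'balanced' or dict	None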
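To make the `class_weight='balanced'` option concrete, the sketch below compares scikit-learn's `compute_class_weight` with the heuristic n_samples / (n_classes * count_per_class) on a made-up 90/10 label vector.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up imbalanced labels: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

# Weights scikit-learn derives for class_weight='balanced'.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same heuristic written out by hand.
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.round(3).tolist())))
```

The minority class receives a weight of 5.0 and the majority class about 0.556, so each class contributes roughly equally to the loss.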

Solver selection guide (O = supported, X = not supported):

Solver	L1	L2	Multinomial	Large-scale	Notes
lbfgs	X	O	O	O	Default, general purpose
liblinear	O	O	X	X	Small datasets, L1
saga	O	O	O	O	Large datasets, L1 + multiclass
newton-cg	X	O	O	O	Newton's method
sag	X	O	O	O	Large datasets, L2

Python Code Examples

Basic Usage

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, classification_report,
                             confusion_matrix)
import matplotlib.pyplot as plt

# Load data
data = load_breast_cancer()
X, y = data.data, data.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
model = LogisticRegression(
    C=1.0,
    penalty='l2',
    solver='lbfgs',
    max_iter=1000,
    random_state=42
)
model.fit(X_train_scaled, y_train)

# Predict labels and class-1 probabilities
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]

# Evaluate
print("=== Logistic Regression Performance ===")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall: {recall_score(y_test, y_pred):.4f}")
print(f"F1 Score: {f1_score(y_test, y_pred):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_prob):.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Interpreting Coefficients

import pandas as pd

# Feature importance (coefficient magnitude)
feature_importance = pd.DataFrame({
    'feature': data.feature_names,
    'coefficient': model.coef_[0],
    'abs_coefficient': np.abs(model.coef_[0]),
    'odds_ratio': np.exp(model.coef_[0])
}).sort_values('abs_coefficient', ascending=False)

print("\nTop 10 Important Features:")
print(feature_importance.head(10).to_string(index=False))

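# Interpreting odds ratios (features are standardized, so one unit = one standard deviation)
# odds_ratio > 1: increasing the feature raises the probability of class 1
# odds_ratio < 1: increasing the feature lowers the probability of class 1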
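Because the model is fitted on standardized features, its coefficients describe the effect of a one-standard-deviation change. The sketch below shows one way to map them back to the original feature units, using the `mean_` and `scale_` attributes of `StandardScaler`. For brevity it fits on the full dataset, an assumption made only for this illustration, and the names `w_raw` and `b_raw` are introduced here.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
scaler = StandardScaler().fit(X)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)

# The model computes z = sum_j w_j * (x_j - mean_j) / scale_j + b.
# Collecting terms gives coefficients and an intercept in raw units.
w_raw = clf.coef_[0] / scaler.scale_
b_raw = clf.intercept_[0] - np.sum(clf.coef_[0] * scaler.mean_ / scaler.scale_)

# Both parameterizations give the same log-odds for every sample.
z_scaled = clf.decision_function(scaler.transform(X))
z_raw = X @ w_raw + b_raw
print(f"max difference in log-odds: {np.max(np.abs(z_scaled - z_raw)):.2e}")
```

Standardized coefficients are the better choice for comparing features with each other, while raw-unit coefficients are easier to explain for a single feature in its natural units.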

Regularization Comparison

from sklearn.model_selection import cross_val_score

penalties = ['l1', 'l2', 'elasticnet', None]
results = {}

for penalty in penalties:
    if penalty == 'elasticnet':
        model = LogisticRegression(
            penalty=penalty, solver='saga', l1_ratio=0.5,
            max_iter=5000, random_state=42
        )
    elif penalty == 'l1':
        model = LogisticRegression(
            penalty=penalty, solver='saga',
            max_iter=5000, random_state=42
        )
    else:
        model = LogisticRegression(
            penalty=penalty, solver='lbfgs',
            max_iter=5000, random_state=42
        )

    scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
    model.fit(X_train_scaled, y_train)

    n_nonzero = np.sum(model.coef_ != 0) if penalty else X.shape[1]

    results[str(penalty)] = {
        'cv_mean': scores.mean(),
        'cv_std': scores.std(),
        'n_features': n_nonzero
    }

print("\n=== Regularization Comparison ===")
print(f"{'Penalty':<15} {'CV Score':>12} {'Std':>8} {'Features':>10}")
print("-" * 50)
for penalty, metrics in results.items():
    print(f"{penalty:<15} {metrics['cv_mean']:>12.4f} {metrics['cv_std']:>8.4f} {metrics['n_features']:>10}")

Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['saga']
}

grid_search = GridSearchCV(
    LogisticRegression(max_iter=5000, random_state=42),
    param_grid,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1
)
grid_search.fit(X_train_scaled, y_train)

print(f"\nBest Parameters: {grid_search.best_params_}")
print(f"Best CV Score: {grid_search.best_score_:.4f}")

ROC Curve Visualization

from sklearn.metrics import roc_curve, auc

fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, 
         label=f'ROC curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc='lower right')
plt.savefig('logistic_roc.png', dpi=150)
plt.show()

When to Use It

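Good fit:

- Classes are (approximately) linearly separable
- Probability estimates are needed (credit scoring, disease diagnosis)
- Coefficient interpretation matters (analyzing variable effects)
- A quick baseline model is needed
- Large-scale data (millions of samples)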

Poor fit:

- A nonlinear decision boundary is required
- Features interact in complex ways
- High-dimensional unstructured data such as raw images or text

Pros and Cons

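Pros	Cons
Easy to interpret (odds ratios)	Learns only linear decision boundaries
Outputs probabilities	Cannot learn feature interactions unless interaction terms are added explicitly
Fast training and prediction	Sensitive to outliers
Regularization helps prevent overfitting	Multicollinearity makes coefficients unstable
Scales to large datasets	Performance degrades under class imbalance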