Managing Machine Learning Experiments and Deploying Models with MLflow

Introduction to MLflow

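MLflow is an open-source platform for managing the machine learning lifecycle. It provides four core components, experiment tracking (Tracking), code packaging (Projects), model management (Model Registry), and model serving (Serving), which together narrow the gap between experimentation and production.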

Installation and Server Setup

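# Install (quote the extras so shells such as zsh do not expand the brackets)
pip install "mlflow[extras]"
# A PostgreSQL backend store also needs a database driver
pip install psycopg2-binary

# Start a tracking server with a PostgreSQL backend store and an S3 artifact store
# (--host 0.0.0.0 listens on all interfaces; the server has no authentication by default)
mlflow server --host 0.0.0.0 --port 5000 \
  --backend-store-uri postgresql://user:pass@db:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/

# Point MLflow clients at the server through an environment variable
export MLFLOW_TRACKING_URI=http://localhost:5000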
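The `postgresql://user:pass@db:5432/mlflow` URI above embeds the credentials directly. If the real password contains characters such as `@`, `:` or `/`, it should be percent-encoded following the usual SQLAlchemy URL rules, otherwise the URI is split in the wrong place. A minimal sketch, assuming a hypothetical helper `build_backend_uri` and reusing the placeholder host and database names from the command above:

```python
from urllib.parse import quote


def build_backend_uri(user: str, password: str, host: str = "db",
                      port: int = 5432, database: str = "mlflow") -> str:
    """Build a PostgreSQL backend-store URI with percent-encoded credentials.

    Hypothetical helper for illustration; the result is meant to be passed
    to `mlflow server --backend-store-uri`.
    """
    # safe="" forces '/', ':' and '@' to be encoded as well
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Prints postgresql://mlflow:p%40ss%3Aw%2Frd@db:5432/mlflow
    print(build_backend_uri("mlflow", "p@ss:w/rd"))
```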

Experiment Tracking

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("fraud-detection-v2")

# Prepare data: a synthetic, imbalanced stand-in; replace with your own X, y
X, y = make_classification(n_samples=5000, weights=[0.97], random_state=42)
# stratify keeps the rare positive class in both splits; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Record one experiment run
with mlflow.start_run(run_name="rf-baseline") as run:
    # Log hyperparameters
    params = {
        "n_estimators": 200,
        "max_depth": 15,
        "min_samples_split": 5,
        "class_weight": "balanced",
        "random_state": 42,
    }
    mlflow.log_params(params)

    # Train the model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Log metrics (binary classification, positive class = 1)
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
    }
    mlflow.log_metrics(metrics)

    # Log the model together with its input/output signature
    signature = infer_signature(X_test, y_pred)
    mlflow.sklearn.log_model(
        model, "model",
        signature=signature,
        input_example=X_train[:3],
    )

    # Log artifacts (charts, data files, etc.)
    fig, ax = plt.subplots()
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred, ax=ax)
    mlflow.log_figure(fig, "confusion_matrix.png")
    plt.close(fig)

    print(f"Run ID: {run.info.run_id}")
    print(f"Metrics: {metrics}")
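`mlflow.log_params` takes a flat mapping of parameter names to values, and MLflow stores each value as a string. When hyperparameters come from a nested configuration, one option is to flatten the keys before logging. A small sketch with a hypothetical helper `flatten_params`; the dot-separated key names are a convention chosen here, not something MLflow requires:

```python
from typing import Any, Dict


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"section.key": value} form for mlflow.log_params.

    Hypothetical helper for illustration only.
    """
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, extending the key prefix
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


config = {
    "model": {"n_estimators": 200, "max_depth": 15},
    "data": {"test_size": 0.2},
}
params = flatten_params(config)
print(params)  # {'model.n_estimators': 200, 'model.max_depth': 15, 'data.test_size': 0.2}
# Inside an active run you would then call: mlflow.log_params(params)
```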

Autologging

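import mlflow
from sklearn.ensemble import RandomForestClassifier

# Enable autologging for every supported library (scikit-learn, PyTorch Lightning, TensorFlow, etc.)
mlflow.autolog()

# Or enable it for specific frameworks only
mlflow.sklearn.autolog()
mlflow.pytorch.autolog(log_models=True)  # applies to PyTorch Lightning training loops

# Then train as usual; a run is started automatically if none is active
model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)  # X_train, y_train from the tracking example above
# -> parameters, training metrics, and the model are logged automatically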
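Conceptually, autologging works because MLflow's integrations patch training entry points such as scikit-learn's `fit`, so that parameters and results are recorded without changes to user code. The toy sketch below illustrates only that patching idea, with a plain list standing in for the tracking backend; it is not MLflow's implementation, and `ToyEstimator`, `toy_autolog` and `LOGGED` are hypothetical names:

```python
import functools
from typing import Any, Dict, List

# Stand-in for a tracking backend: every call to fit() appends one record
LOGGED: List[Dict[str, Any]] = []


class ToyEstimator:
    """Minimal estimator with a scikit-learn-like interface."""

    def __init__(self, n_estimators: int = 100) -> None:
        self.n_estimators = n_estimators

    def get_params(self) -> Dict[str, Any]:
        return {"n_estimators": self.n_estimators}

    def fit(self, X, y):
        self.fitted_ = True
        return self


def toy_autolog(cls) -> None:
    """Replace cls.fit with a wrapper that records params after training."""
    original_fit = cls.fit

    @functools.wraps(original_fit)
    def patched_fit(self, X, y):
        result = original_fit(self, X, y)
        # Only reached if the original fit() succeeded
        LOGGED.append({"params": self.get_params(), "n_samples": len(X)})
        return result

    cls.fit = patched_fit


toy_autolog(ToyEstimator)
ToyEstimator(n_estimators=200).fit([[0], [1], [2]], [0, 1, 0])
print(LOGGED)  # [{'params': {'n_estimators': 200}, 'n_samples': 3}]
```

User code calls `fit` exactly as before; only the class was modified, which is why autologging needs no changes to the training script.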

Model Registry

import mlflow
from mlflow import MlflowClient

client = MlflowClient()

# Register the model logged by the run above (`run` comes from the tracking example)
model_uri = f"runs:/{run.info.run_id}/model"
model_version = mlflow.register_model(model_uri, "fraud-detector")

# Add a description to the model version
client.update_model_version(
    name="fraud-detector",
    version=model_version.version,
    description="RandomForest baseline, F1=0.89"
)

# Point an alias at this version (stages are deprecated since MLflow 2.9; use aliases instead)
client.set_registered_model_alias(
    name="fraud-detector",
    alias="champion",
    version=model_version.version
)

# Load the production model through its alias
model = mlflow.pyfunc.load_model("models:/fraud-detector@champion")
predictions = model.predict(new_data)  # new_data: inputs with the same columns as the training data
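Registry URIs come in a few shapes: `models:/<name>/<version>`, `models:/<name>@<alias>`, and the legacy stage form `models:/<name>/<stage>`. The value of an alias is that `models:/fraud-detector@champion` stays fixed in serving code while the version behind it moves. A toy in-memory sketch of that indirection; `parse_model_uri` and `ToyRegistry` are hypothetical names, not MLflow's own parser or client:

```python
from typing import Dict, Tuple


def parse_model_uri(uri: str) -> Tuple[str, str, str]:
    """Split a models:/ URI into (name, kind, ref), where kind is "alias" or "version"."""
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a models:/ URI: {uri}")
    body = uri[len(prefix):]
    if "@" in body:
        name, alias = body.split("@", 1)
        return name, "alias", alias
    name, _, ref = body.rpartition("/")
    if not name or not ref.isdigit():
        # Legacy stage URIs (e.g. .../Production) are deliberately not handled here
        raise ValueError(f"unsupported models:/ URI: {uri}")
    return name, "version", ref


class ToyRegistry:
    """In-memory stand-in for a registry's alias -> version mapping."""

    def __init__(self) -> None:
        self.aliases: Dict[Tuple[str, str], str] = {}

    def set_alias(self, name: str, alias: str, version: str) -> None:
        self.aliases[(name, alias)] = version

    def resolve(self, uri: str) -> Tuple[str, str]:
        name, kind, ref = parse_model_uri(uri)
        version = self.aliases[(name, ref)] if kind == "alias" else ref
        return name, version


registry = ToyRegistry()
registry.set_alias("fraud-detector", "champion", "1")
print(registry.resolve("models:/fraud-detector@champion"))  # ('fraud-detector', '1')
registry.set_alias("fraud-detector", "champion", "2")  # promote a new version
print(registry.resolve("models:/fraud-detector@champion"))  # ('fraud-detector', '2')
```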

Model Serving

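# Serve the model from the CLI (MLFLOW_TRACKING_URI must be set to resolve models:/ URIs)
# --env-manager local reuses the current Python environment (replaces the older --no-conda flag)
mlflow models serve -m "models:/fraud-detector@champion" \
  --port 5001 --env-manager local

# Run inference through the REST API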
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_split": {
    "columns": ["feature1", "feature2", "feature3"],
    "data": [[0.5, 1.2, 3.4]]
  }}'

# Build a Docker image (--enable-mlserver serves the model with Seldon MLServer)
mlflow models build-docker \
  -m "models:/fraud-detector@champion" \
  -n fraud-detector-image \
  --enable-mlserver
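The same `/invocations` call can be made from Python using only the standard library. A minimal sketch, assuming the server started above is listening on `localhost:5001`; `feature1`, `feature2` and `feature3` are the placeholder column names from the curl example, and `build_request`/`predict` are hypothetical helpers. The MLflow 2.x scoring server answers with a JSON object that has a `predictions` key.

```python
import json
import urllib.request
from typing import Any, List, Sequence


def build_request(url: str, columns: Sequence[str],
                  rows: List[List[Any]]) -> urllib.request.Request:
    """Build a POST request in the dataframe_split input format."""
    payload = {"dataframe_split": {"columns": list(columns), "data": rows}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url: str, columns: Sequence[str], rows: List[List[Any]]) -> Any:
    """Send the request and return the parsed JSON response body."""
    req = build_request(url, columns, rows)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request(
        "http://localhost:5001/invocations",
        ["feature1", "feature2", "feature3"],
        [[0.5, 1.2, 3.4]],
    )
    print(req.get_method(), req.full_url)
    # With the server running, predict(...) returns a dict containing "predictions"
```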

MLflow Components at a Glance

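Component	Role	Key features
Tracking	Experiment logging	Logging of parameters, metrics, and artifacts
Projects	Code packaging	Reproducible execution environments
Model Registry	Model management	Versioning, aliases (formerly stage transitions), approval
Serving	Model deployment	REST API, Docker, cloud deployment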

# Reproducible runs with an MLproject file
# MLproject
name: fraud-detection
conda_env: conda.yaml

entry_points:
  train:
    parameters:
      n_estimators: {type: int, default: 200}
      max_depth: {type: int, default: 15}
    command: "python train.py --n-estimators {n_estimators} --max-depth {max_depth}"

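# Run it (the entry point is named "train", so select it with -e; the default entry point is "main")
mlflow run . -e train -P n_estimators=300 -P max_depth=20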
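The `train` entry point runs `train.py` with `--n-estimators` and `--max-depth`, so the script has to accept exactly those flags. The article does not show `train.py`, so the following is an assumed sketch of its argument-parsing half only; the training body would reuse the tracking code shown earlier:

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags passed by the MLproject 'train' entry point."""
    parser = argparse.ArgumentParser(description="Train the fraud detector")
    # Defaults mirror the parameter defaults declared in MLproject
    parser.add_argument("--n-estimators", type=int, default=200)
    parser.add_argument("--max-depth", type=int, default=15)
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> dict:
    args = parse_args(argv)
    # argparse maps "--n-estimators" to the attribute n_estimators
    params = {"n_estimators": args.n_estimators, "max_depth": args.max_depth}
    # In the real script: start an MLflow run, log params, train, log metrics
    return params


if __name__ == "__main__":
    print(main())
```

With `-P n_estimators=300 -P max_depth=20`, MLflow substitutes the values into the command, so `main()` would receive `--n-estimators 300 --max-depth 20`.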

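Track every experiment in MLflow so that the parameters, metrics, and artifacts behind any result can be recovered and the run reproduced
Registry aliases (champion/challenger) make A/B tests and canary releases easier, because serving code loads a fixed alias URI while you move the alias between versions
In production, the combination of a PostgreSQL backend store and an S3 artifact store is recommended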
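One way to use the champion/challenger aliases for a canary or A/B rollout is to send a fixed share of traffic to the challenger, choosing deterministically by hashing a request or user ID so the same user always reaches the same model. A sketch under those assumptions; `choose_model_uri` and the 10% share are illustrative, and in a real service each URI would be loaded once with `mlflow.pyfunc.load_model` and reused:

```python
import hashlib


def choose_model_uri(request_id: str, challenger_share: float = 0.1,
                     name: str = "fraud-detector") -> str:
    """Route a request to the champion or challenger alias.

    Hashing the ID (instead of calling random.random()) keeps routing sticky:
    the same ID always lands in the same bucket.
    """
    if not 0.0 <= challenger_share <= 1.0:
        raise ValueError("challenger_share must be in [0, 1]")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets (basis points)
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    alias = "challenger" if bucket < challenger_share * 10_000 else "champion"
    return f"models:/{name}@{alias}"


routes = [choose_model_uri(f"user-{i}") for i in range(10_000)]
share = routes.count("models:/fraud-detector@challenger") / len(routes)
print(f"challenger share: {share:.3f}")  # close to 0.1, not exact
```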
