Why Streaming Responses Matter

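LLM (Large Language Model) responses take anywhere from a few seconds to tens of seconds to generate. Making the user wait for the complete response noticeably hurts the experience. Streaming token by token lets text appear as soon as the first token is generated (time to first token, TTFT) instead of only after the whole answer is finished, and makes a ChatGPT-style typing effect possible.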
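
To make the TTFT difference concrete, here is a minimal sketch that consumes any async iterable of tokens and records both the time to the first token and the total time. The helper name measureStream and the fake token generator are hypothetical stand-ins for a real model stream, and performance.now() assumes Node.js 16+ or a browser.

```javascript
// Hypothetical helper: consumes an async iterable of text chunks and
// reports time to first token (TTFT) vs. total generation time.
async function measureStream(tokens) {
  const start = performance.now();
  let ttftMs = null;
  let text = '';
  for await (const token of tokens) {
    if (ttftMs === null) ttftMs = performance.now() - start; // first token arrived
    text += token;
  }
  return { ttftMs, totalMs: performance.now() - start, text };
}

// Fake model output for illustration only: one token roughly every 50 ms
async function* fakeTokens() {
  for (const t of ['Hello', ', ', 'world', '!']) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    yield t;
  }
}

measureStream(fakeTokens()).then(({ ttftMs, totalMs, text }) => {
  console.log(text, Math.round(ttftMs), Math.round(totalMs));
});
```

Without streaming, the user sees nothing until totalMs; with streaming, the first text shows up after about ttftMs.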

Backend: Implementing the SSE Endpoint (Node.js)

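const express = require('express');
const OpenAI = require('openai');

const app = express();
app.use(express.json());

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  // SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // disable Nginx proxy buffering
  res.flushHeaders(); // send headers now so the client can start reading

  // Abort the upstream OpenAI request if the client disconnects mid-stream
  const abortController = new AbortController();
  res.on('close', () => {
    if (!res.writableEnded) abortController.abort();
  });

  try {
    const stream = await openai.chat.completions.create(
      { model: 'gpt-4', messages, stream: true },
      { signal: abortController.signal },
    );

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      if (content) {
        // Each SSE event is terminated by a blank line ("\n\n")
        res.write('data: ' + JSON.stringify({ content }) + '\n\n');
      }
    }

    res.write('data: [DONE]\n\n');
    res.end();
  } catch (error) {
    if (abortController.signal.aborted) return; // client is gone; nothing to send
    // Report the error in SSE format so the client can handle it
    res.write('data: ' + JSON.stringify({ error: error.message }) + '\n\n');
    res.end();
  }
});

app.listen(3000); // example port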
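
The 'data: ...\n\n' strings above follow the SSE event format: each field is a "name: value" line, and an empty line ends the event. A small helper can keep that framing in one place. This is a sketch; the name formatSSE and its options are hypothetical. It also shows the optional event and id fields, and how a payload containing newlines must be split into several data: lines.

```javascript
// Hypothetical helper: serialize one Server-Sent Event.
// Every line of the payload gets its own "data: " prefix,
// and a blank line ("\n\n" overall) terminates the event.
function formatSSE(data, { event, id } = {}) {
  const payload = typeof data === 'string' ? data : JSON.stringify(data);
  let out = '';
  if (event) out += `event: ${event}\n`;
  if (id !== undefined) out += `id: ${id}\n`;
  for (const line of payload.split('\n')) {
    out += `data: ${line}\n`;
  }
  return out + '\n';
}

// Possible use inside the Express handler:
//   res.write(formatSSE({ content }));
//   res.write(formatSSE('[DONE]'));
```

Because JSON.stringify escapes newlines, a JSON payload always fits on a single data: line, which is what the simple line-by-line client parser in this article relies on.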

Next.js Route Handler Version

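// app/api/chat/route.ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment by default

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content || '';
          if (content) {
            // Each SSE event is terminated by a blank line ("\n\n")
            controller.enqueue(
              encoder.encode('data: ' + JSON.stringify({ content }) + '\n\n')
            );
          }
        }
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
      } catch (error) {
        // Report mid-stream errors in SSE format, as the Express version does
        const message = error instanceof Error ? error.message : String(error);
        controller.enqueue(encoder.encode('data: ' + JSON.stringify({ error: message }) + '\n\n'));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
    },
  });
}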
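
The ReadableStream wiring in the Route Handler does not depend on OpenAI itself. As a sketch, it can be factored into a helper that turns any async iterable of strings into an SSE byte stream, which also makes it testable with fake data. The name toSSEStream is hypothetical, and the code assumes a runtime with global ReadableStream, TextEncoder, and Response (Node.js 18+, edge runtimes, browsers).

```javascript
// Hypothetical helper: wrap an async iterable of text chunks as an SSE byte stream.
function toSSEStream(chunks) {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      try {
        for await (const content of chunks) {
          if (content) {
            controller.enqueue(encoder.encode('data: ' + JSON.stringify({ content }) + '\n\n'));
          }
        }
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
      } catch (error) {
        // Forward failures as an SSE error payload instead of silently dropping the stream
        const message = error instanceof Error ? error.message : String(error);
        controller.enqueue(encoder.encode('data: ' + JSON.stringify({ error: message }) + '\n\n'));
      }
      controller.close();
    },
  });
}

// Fake token source standing in for the OpenAI stream (empty chunks are skipped)
async function* fakeTokens() {
  yield 'Hel';
  yield '';
  yield 'lo';
}

// Read the whole stream back as text, e.g. in a unit test
new Response(toSSEStream(fakeTokens())).text().then(console.log);
```

In the Route Handler, the OpenAI stream would first be mapped to plain strings (chunk.choices[0]?.delta?.content || '') before being passed in.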

Frontend: Receiving the Stream

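// appendToChat is the app's own UI update function (not shown here)
async function sendMessage(messages, signal) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
    signal, // optional AbortSignal from an AbortController, to cancel the stream
  });
  if (!response.ok || !response.body) {
    throw new Error('Request failed with status ' + response.status);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    // A chunk can end mid-line: keep the incomplete last line for the next read
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        const { content, error } = JSON.parse(data);
        if (error) throw new Error(error); // error event sent by the server
        appendToChat(content);
      }
    }
  }
}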
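
The buffering in sendMessage matters because reader.read() returns arbitrary network chunks, and a chunk can end in the middle of a data: line. The sketch below pulls that logic into a synchronous parser so it can be checked without a server. createSSEParser is a hypothetical name, and, like the code above, it assumes every event is a single data: line.

```javascript
// Hypothetical helper: incremental parser for single-line "data:" SSE events.
// feed() accepts decoded text in arbitrary pieces and calls onData once per complete data line.
function createSSEParser(onData) {
  let buffer = '';
  return {
    feed(text) {
      buffer += text;
      const lines = buffer.split('\n');
      buffer = lines.pop() || ''; // the last element may be an incomplete line
      for (const raw of lines) {
        const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw; // tolerate CRLF line endings
        if (line.startsWith('data: ')) onData(line.slice(6));
      }
    },
  };
}

// Usage: two events arriving split across three network chunks
const received = [];
const parser = createSSEParser((data) => received.push(data));
parser.feed('data: {"cont');
parser.feed('ent":"Hi"}\n\nda');
parser.feed('ta: [DONE]\n\n');
// received: ['{"content":"Hi"}', '[DONE]']
```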

Key Takeaways

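SSE event format: data: {JSON}, and each event must be terminated by a blank line (\n\n)
Behind Nginx, send the X-Accel-Buffering: no header to disable proxy buffering
On the client, an AbortController can cancel the stream
Errors are also delivered in SSE format so the client can handle them
Token usage is not included in streamed chunks by default; with the OpenAI API, pass stream_options: { include_usage: true } and a final chunk (with an empty choices array) carries the usage field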
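
The token-usage point can be sketched as follows, assuming the request was made with stream_options: { include_usage: true } (the OpenAI Chat Completions option that appends a final usage chunk). collectCompletion is a hypothetical name, and the input below is hand-written objects shaped like streamed chunks, not real API output.

```javascript
// Hypothetical helper: accumulate streamed text and capture token usage.
// Assumes OpenAI-style chunks; with include_usage, the last chunk has choices: [] and a usage object.
async function collectCompletion(chunks) {
  let text = '';
  let usage = null;
  for await (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content || '';
    if (chunk.usage) usage = chunk.usage; // populated only on the final chunk
  }
  return { text, usage };
}

// Hand-written chunks for illustration (not real API output)
const fakeChunks = [
  { choices: [{ delta: { role: 'assistant', content: '' } }], usage: null },
  { choices: [{ delta: { content: 'Hi' } }], usage: null },
  { choices: [{ delta: { content: ' there' } }], usage: null },
  { choices: [], usage: { prompt_tokens: 12, completion_tokens: 2, total_tokens: 14 } },
];

collectCompletion(fakeChunks).then(({ text, usage }) => {
  console.log(text, usage.total_tokens);
});
```

The optional chaining on choices[0] is what keeps the final, choice-less chunk from throwing, in this helper and in the server loops above.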

Building an AI Chatbot: Streaming Responses and SSE Implementation
