HidsTech
Intelligent AI Studio
AI Development · 6 min read · 10 February 2026

Building Real-Time AI Applications with Streaming

Streaming transforms AI from batch processing to real-time interaction. Here's how to implement streaming in your AI application for a dramatically better user experience.

Waiting for an AI response to complete before showing anything to the user is a UX mistake. Streaming — sending tokens as they're generated — makes AI feel instant and dramatically improves perceived performance.

Why Streaming Matters

Without streaming: user waits 5-15 seconds, then sees the full response appear.

With streaming: user sees the response start appearing in under 500ms, with the rest arriving progressively.

The actual generation time is the same. The perceived speed is completely different.

Server-Side Streaming (Node.js / Next.js)

```typescript
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

export async function POST(req: Request) {
  const { message } = await req.json();

  const stream = await anthropic.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: message }],
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      // Forward only text deltas; other stream events are skipped.
      for await (const chunk of stream) {
        if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
          controller.enqueue(encoder.encode(chunk.delta.text));
        }
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Transfer-Encoding": "chunked",
    },
  });
}
```
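
The handler above is really just "async iterator in, `ReadableStream` out", so the pattern can be exercised without calling the API at all. A minimal, self-contained sketch — `mockTokens`, `toReadableStream`, and `readAll` are hypothetical names standing in for the Anthropic stream and the route's plumbing:

```typescript
// Stand-in for the model stream: an async generator yielding text fragments.
async function* mockTokens(): AsyncGenerator<string> {
  for (const t of ["Hel", "lo, ", "world!"]) yield t;
}

// Same shape as the route handler: wrap an async iterable in a ReadableStream.
export function toReadableStream(tokens: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const text of tokens) {
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });
}

// Drain the stream and reassemble the text, mirroring what a client does.
export async function readAll(stream: ReadableStream<Uint8Array>): Promise<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let out = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    out += decoder.decode(value, { stream: true });
  }
  return out;
}
```

Swapping `mockTokens()` for the real Anthropic stream is the only change the route needs.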

Client-Side Consumption (React)

```typescript
"use client";

import { useState } from "react";

export function Chat() {
  const [response, setResponse] = useState("");

  const sendMessage = async (message: string) => {
    setResponse("");

    const res = await fetch("/api/chat", {
      method: "POST",
      body: JSON.stringify({ message }),
      headers: { "Content-Type": "application/json" },
    });

    const reader = res.body!.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // { stream: true } keeps multi-byte characters intact across chunk boundaries.
      setResponse((prev) => prev + decoder.decode(value, { stream: true }));
    }
  };

  return (
    <div>
      <button onClick={() => sendMessage("Hello!")}>Send</button>
      <div>{response}</div>
    </div>
  );
}
```
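
One gap in the component above: if the user navigates away mid-stream, the read loop keeps running. A sketch of cancellation using `AbortController` — `streamChat` is a hypothetical extraction of the fetch-and-read logic; in React you would call it from `useEffect` and invoke `controller.abort()` in the cleanup function:

```typescript
// Hypothetical helper: stream a chat response, stopping when `signal` aborts.
export async function streamChat(
  message: string,
  onChunk: (text: string) => void,
  signal: AbortSignal,
): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    body: JSON.stringify({ message }),
    headers: { "Content-Type": "application/json" },
    signal, // aborting rejects the fetch and cancels the body stream
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  try {
    // Stop reading as soon as the stream ends or the caller aborts.
    while (!signal.aborted) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(decoder.decode(value, { stream: true }));
    }
  } finally {
    reader.releaseLock();
  }
}
```

Passing the same `signal` to `fetch` and checking it in the loop covers both the pre-response and mid-stream cases.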

Streaming with the Vercel AI SDK

The Vercel AI SDK simplifies streaming significantly:

```typescript
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: anthropic("claude-sonnet-4-6"),
    messages,
  });

  return result.toDataStreamResponse();
}
```

```typescript
// Client
import { useChat } from "ai/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>{m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
      <button type="submit">Send</button>
    </form>
  );
}
```

Streaming Best Practices

  • Show a loading indicator for the period before the first token arrives
  • Handle connection errors — implement reconnection logic for long streams
  • Cancel on unmount — abort the fetch when the component unmounts
  • Stream structured data carefully — parse JSON only when the stream is complete
  • Rate limit per user — streaming can be abused; protect your API

When Not to Stream

  • Background processing (batch jobs)
  • When you need the full response before doing anything with it
  • Structured data extraction (wait for complete JSON)
  • When latency to first byte doesn't matter
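
The advice to parse JSON only when the stream is complete can be made concrete: accumulate the raw text while streaming (fine to display progressively) and parse once the stream closes. A minimal sketch — `collectJson` is a hypothetical helper, and the generator stands in for a model stream emitting a JSON object in fragments:

```typescript
// Accumulate streamed text and parse it as JSON only once the stream ends.
export async function collectJson<T>(chunks: AsyncIterable<string>): Promise<T> {
  let raw = "";
  for await (const chunk of chunks) {
    raw += chunk; // safe to show to the user, but not yet valid JSON
  }
  return JSON.parse(raw) as T; // parse only after the final chunk
}

// Stand-in for a model stream that emits a JSON object in pieces.
async function* jsonFragments(): AsyncGenerator<string> {
  yield '{"name": "Ada"';
  yield ', "score": 42}';
}
```

Attempting `JSON.parse` on any intermediate fragment would throw, which is exactly why structured extraction is usually better served by a non-streaming call.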

Talk to us about building real-time AI experiences for your product.

Ready to implement AI in your business?

Book a free 30-minute strategy call — no commitment required.

Book a Free Call →