HidsTech
Intelligent AI Studio
AI Observability · 7 min read · 31 March 2026

Langfuse: Open-Source LLM Observability You Can Self-Host

A practical guide to Langfuse — tracing, evaluation, prompt management, and cost monitoring for any LLM stack, fully open-source.

Most LLM observability tools are SaaS products that send your prompts and outputs to a third-party server. For companies with data privacy requirements — healthcare, finance, legal — that's a dealbreaker. Langfuse is the open-source alternative you can run entirely on your own infrastructure.

What Is Langfuse?

Langfuse is an open-source LLM engineering platform that provides:

  • Tracing — full request/response visibility for any LLM call
  • Evaluations — automated scoring and human annotation
  • Prompt management — versioned prompts with A/B testing
  • Metrics — latency, cost, token usage, error rates
  • Sessions — group traces into conversations or workflows

It works with any LLM framework — LangChain, LlamaIndex, OpenAI SDK, Anthropic SDK, or raw HTTP calls.

Self-Hosting with Docker

Clone the Langfuse repository, run `docker compose up`, and you have a full observability stack on your own server.
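A minimal sketch of the self-hosting steps, based on the public Langfuse repository's Docker setup (service names, ports, and required environment variables can differ between releases, so check the repo's docs for your version):

```shell
# Clone the official repository and start the stack.
git clone https://github.com/langfuse/langfuse.git
cd langfuse

# Starts the Langfuse web app plus its backing services,
# as defined in the repository's docker-compose.yml.
docker compose up -d
```

Once the containers are up, open the web UI in your browser and create a project to obtain the API keys your application will use.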

Integrating with Python

The `@observe()` decorator automatically captures inputs, outputs, latency, and token usage.

LangChain Integration

Every LangChain step — prompts, LLM calls, tools, retrievers — appears as a nested trace in Langfuse.

Prompt Management

Langfuse lets you manage prompts as versioned artefacts with metadata.

When you update a prompt in the Langfuse UI, your application picks up the new version without a code deploy. You can roll back instantly if quality drops.

Evaluations

Set up automated evaluators to score every response:

For LLM-as-judge evaluation:
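One common pattern is to build a judging prompt, send it to a stronger model, parse the numeric verdict, and record it as a score. A sketch of the pure prompt-building step (the template wording is illustrative, not a Langfuse API):

```python
JUDGE_TEMPLATE = (
    "You are a strict evaluator. Rate the answer below for factual "
    "accuracy on a scale from 0.0 to 1.0. Reply with only the number.\n\n"
    "Question: {question}\nAnswer: {answer}"
)

def build_judge_prompt(question: str, answer: str) -> str:
    # Pure helper: formats the evaluation request sent to the judge model.
    return JUDGE_TEMPLATE.format(question=question, answer=answer)

# Send build_judge_prompt(...) to a judge model, parse the returned number,
# and attach it to the original trace as a Langfuse score.
```

Langfuse can also run managed LLM-as-a-judge evaluators configured directly in the UI, so routine scoring need not live in application code.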

Cost Tracking

Langfuse automatically calculates cost per trace based on token usage and model pricing. You get dashboards showing:

  • Cost per user / per session / per feature
  • Model spend breakdown (GPT-4o vs Claude vs Gemini)
  • Cost trends over time

This is essential for managing LLM API budgets at scale.

Langfuse vs LangSmith

  • Open-source: Langfuse is MIT licensed; LangSmith is proprietary
  • Self-hostable: Langfuse yes; LangSmith only on enterprise plans
  • Framework support: Langfuse works with any framework; LangSmith is LangChain-native
  • Prompt Hub: Both support versioned prompt management
  • Evaluations: Both support automated and human evaluation
  • Hosted option: Both offer cloud-hosted versions

Choose Langfuse if data privacy matters, you use multiple frameworks, or you want to self-host.

Choose LangSmith if you're all-in on LangChain and want the tightest integration.

Getting Started

  • Deploy Langfuse via Docker or use the cloud version at langfuse.com
  • Add the SDK to your project: `pip install langfuse`
  • Wrap your LLM calls with `@observe()` or the callback handler
  • Open the dashboard and start seeing your traces in real time

Observability isn't optional for production AI — it's how you catch hallucinations, measure quality, and make confident improvements.

Talk to us if you need help setting up LLM observability for your team.

Ready to implement AI in your business?

Book a free 30-minute strategy call — no commitment required.

Book a Free Call →