Langfuse (LLM Observability)
Langfuse traces every LLM call your applications make — capturing inputs, outputs, latency, token counts, and costs — so you can debug, evaluate, and monitor AI quality.
Aithroyz deploys Langfuse as a self-hosted instance backed by its own Postgres database. Create a project and generate API keys on first login — Flowise integrates natively via Settings → Analytic Providers using those keys.
Access
URL:
https://langfuse.<env-name>.ops.aithroyz.com
First user: Register on first visit; the first account becomes the organization owner. Then create a Project to generate your public/secret API key pair.
⚠ Register your admin account immediately after deployment. Langfuse allows open registration by default; restrict sign-ups under Settings → Organization → SSO, or set AUTH_DISABLE_SIGNUP in the environment (contact support to change provisioner env vars).
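For reference, the documented Langfuse setting behind that request is a single environment variable (the exact provisioning mechanics depend on your deployment):

# Block self-service account registration on the self-hosted instance
AUTH_DISABLE_SIGNUP=true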
Instrumenting Python applications
The Langfuse Python SDK wraps the OpenAI client so you get traces with zero changes to your business logic:
pip install langfuse openai

import os
from langfuse.openai import openai  # drop-in replacement for the standard client

os.environ["LANGFUSE_HOST"] = "https://langfuse.<env-name>.ops.aithroyz.com"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."  # from Project → API Keys
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

# Use exactly like the standard openai client; traces appear automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this incident report..."}],
    name="incident-summary",        # optional: label the trace
    user_id="analyst-alice",        # optional: tie trace to a user
    session_id="session-abc123",    # optional: group traces into a session
)
print(response.choices[0].message.content)
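The wrapper traces individual OpenAI calls; to group several calls (plus your own logic) into one nested trace, the SDK's @observe decorator can wrap any function. A minimal sketch, assuming the v2 Python SDK; the function name and prompts here are illustrative:

from langfuse.decorators import observe
from langfuse.openai import openai

@observe()  # creates one trace spanning everything called inside
def summarize_incident(report_text: str) -> str:
    # Both completions appear as nested generations in the same trace
    draft = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {report_text}"}],
    )
    review = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Shorten to two sentences: {draft.choices[0].message.content}"}],
    )
    return review.choices[0].message.content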
For LangChain or Flowise, use the native callback handler instead of the OpenAI wrapper. In Flowise, go to Settings → Analytic Providers → Langfuse and paste your public key, secret key, and host URL.
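For LangChain, a minimal sketch of the callback handler, assuming the v2 SDK (langfuse.callback) and the langchain-openai package; the handler reads the same LANGFUSE_* environment variables set above:

from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

handler = CallbackHandler()  # picks up LANGFUSE_HOST / _PUBLIC_KEY / _SECRET_KEY

chain = ChatPromptTemplate.from_template("Summarize: {report}") | ChatOpenAI(model="gpt-4o")
result = chain.invoke({"report": "..."}, config={"callbacks": [handler]})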
Viewing traces
Open the Traces tab in your Langfuse project. Each row is one LLM call (or multi-step chain). Click a trace to see:
Input / Output
The full prompt (all messages) sent to the model and the exact completion returned.
Latency
Total wall-clock time and time-to-first-token broken down per generation span.
Token counts
Prompt tokens, completion tokens, and total — aggregated per trace and per model.
Cost
Estimated USD cost based on the model's published pricing. Aggregate over time in the Dashboard tab.
Metadata
Any user_id, session_id, or custom tags you attached at instrumentation time.
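Traces are also queryable outside the UI. A short sketch using the SDK's fetch API, assuming the v2 Python SDK; field names follow its public API:

from langfuse import Langfuse

lf = Langfuse()

# Pull recent traces for one user, e.g. to audit usage or export for labeling
traces = lf.fetch_traces(user_id="analyst-alice", limit=10)
for t in traces.data:
    print(t.id, t.name, t.timestamp)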
Running evaluations
Langfuse Scores let you attach quality signals to traces. Use them to measure accuracy, helpfulness, or safety over time:
1. Manual scoring: Open a trace → click "Add Score". Enter a name (e.g. "accuracy") and a numeric value. Useful for labeling a sample of traces for fine-tuning datasets.
2. LLM-as-judge: Go to Scores → Evaluators → New Evaluator. Write a prompt that takes the trace input/output and returns a score (e.g. 1–5). Langfuse runs the evaluator asynchronously on new traces.
3. SDK scoring: Call langfuse.score(trace_id=..., name="relevance", value=0.9) from your application to attach programmatic scores at runtime (e.g. from a retrieval similarity metric), as sketched below.
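For item 3, a short sketch of runtime scoring, assuming the v2 Python SDK and that your application has the trace ID in hand; the similarity value is a stand-in for whatever metric you compute:

from langfuse import Langfuse

lf = Langfuse()

def record_relevance(trace_id: str, similarity: float) -> None:
    # Attach a programmatic quality signal to an existing trace
    lf.score(
        trace_id=trace_id,
        name="relevance",
        value=similarity,  # numeric scores aggregate in the dashboard
        comment="cosine similarity of answer vs. retrieved context",
    )

record_relevance("abc123...", 0.9)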
Prompt management
Store and version your system prompts in Langfuse instead of hardcoding them. Fetch the active version at runtime so prompt updates deploy without code changes:
from langfuse import Langfuse
from langfuse.openai import openai  # wrapper is needed so the trace links to the prompt

lf = Langfuse()

# Fetch the production version of a named prompt
prompt = lf.get_prompt("incident-summary-v2")
compiled = prompt.compile(tool_name="Elastic SIEM", severity="high")

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": compiled}],
    langfuse_prompt=prompt,  # links the trace to this prompt version
)
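To get a prompt into Langfuse in the first place, create versions from the UI (Prompts → New Prompt) or from code. A sketch of the latter, assuming the v2 Python SDK's create_prompt method; the "production" label is what get_prompt resolves by default:

from langfuse import Langfuse

lf = Langfuse()

# Register a new version; the {{placeholders}} match prompt.compile() kwargs
lf.create_prompt(
    name="incident-summary-v2",
    prompt="You are a SOC analyst using {{tool_name}}. Summarize this {{severity}}-severity incident.",
    labels=["production"],  # get_prompt("incident-summary-v2") fetches this version
)
Tips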
user_id and session_id
Always pass user_id and session_id when tracing user-facing apps. They unlock per-user cost breakdowns and session replay in the Langfuse UI.
Flowise native integration
In Flowise: Settings → Analytic Providers → Langfuse. Enter your public key, secret key, and the Langfuse host URL. All chatflow runs are traced automatically.
Dashboard tab
The Dashboard shows aggregate cost, token usage, and latency over time. Filter by model, user, or custom tag to find expensive or slow call patterns.
Data retention
Langfuse stores all traces in Postgres on the VM. Plan for ~1 KB per trace, so 100,000 traces per day is on the order of 100 MB/day, or roughly 3 GB/month. For high-volume production use, mount a larger disk or prune old traces via the API.