Langfuse (LLM Observability)
Langfuse traces every LLM call your applications make — capturing inputs, outputs, latency, token counts, and costs — so you can debug, evaluate, and monitor AI quality.
Aithroyz deploys Langfuse as a self-hosted instance backed by its own Postgres database. Create a project and generate API keys on first login — Flowise integrates natively via Settings → Analytic Providers using those keys.
Access
URL:
https://langfuse.<env-name>.ops.aithroyz.com
First user: Register on first visit; the first account becomes the organization owner. Then create a Project to generate your public/secret API key pair.
⚠ Register your admin account immediately after deployment. Langfuse allows open registration by default; restrict sign-ups under Settings → Organization → SSO, or set AUTH_DISABLE_SIGNUP in the environment (contact support to change provisioner env vars).
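For reference, the documented Langfuse setting behind that request is a single environment variable (the exact provisioning mechanics depend on your deployment):

# Block self-service account registration on the self-hosted instance
AUTH_DISABLE_SIGNUP=true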
Instrumenting Python applications
The Langfuse Python SDK wraps the OpenAI client so you get traces with zero changes to your business logic:
pip install langfuse openai

import os
from langfuse.openai import openai  # drop-in replacement for the standard client

os.environ["LANGFUSE_HOST"] = "https://langfuse.<env-name>.ops.aithroyz.com"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."  # from Project → API Keys
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

# Use exactly like the standard openai client; traces appear automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this incident report..."}],
    name="incident-summary",        # optional: label the trace
    user_id="analyst-alice",        # optional: tie trace to a user
    session_id="session-abc123",    # optional: group traces into a session
)
print(response.choices[0].message.content)
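The wrapper traces individual OpenAI calls; to group several calls (plus your own logic) into one nested trace, the SDK's @observe decorator can wrap any function. A minimal sketch, assuming the v2 Python SDK; the function name and prompts here are illustrative:

from langfuse.decorators import observe
from langfuse.openai import openai

@observe()  # creates one trace spanning everything called inside
def summarize_incident(report_text: str) -> str:
    # Both completions appear as nested generations in the same trace
    draft = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {report_text}"}],
    )
    review = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Shorten to two sentences: {draft.choices[0].message.content}"}],
    )
    return review.choices[0].message.content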
For LangChain or Flowise, use the native callback handler instead of the OpenAI wrapper. In Flowise, go to Settings → Analytic Providers → Langfuse and paste your public key, secret key, and host URL.
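For LangChain, a minimal sketch of the callback handler, assuming the v2 SDK (langfuse.callback) and the langchain-openai package; the handler reads the same LANGFUSE_* environment variables set above:

from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

handler = CallbackHandler()  # picks up LANGFUSE_HOST / _PUBLIC_KEY / _SECRET_KEY

chain = ChatPromptTemplate.from_template("Summarize: {report}") | ChatOpenAI(model="gpt-4o")
result = chain.invoke({"report": "..."}, config={"callbacks": [handler]})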
Viewing traces
Open the Traces tab in your Langfuse project. Each row is one LLM call (or multi-step chain). Click a trace to see:
Input / Output
The full prompt (all messages) sent to the model and the exact completion returned.
Latency
Total wall-clock time and time-to-first-token broken down per generation span.
Token counts
Prompt tokens, completion tokens, and total — aggregated per trace and per model.
Cost
Estimated USD cost based on the model's published pricing. Aggregate over time in the Dashboard tab.
Metadata
Any user_id, session_id, or custom tags you attached at instrumentation time.
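Traces are also queryable outside the UI. A short sketch using the SDK's fetch API, assuming the v2 Python SDK; field names follow its public API:

from langfuse import Langfuse

lf = Langfuse()

# Pull recent traces for one user, e.g. to audit usage or export for labeling
traces = lf.fetch_traces(user_id="analyst-alice", limit=10)
for t in traces.data:
    print(t.id, t.name, t.timestamp)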
Running evaluations
Langfuse Scores let you attach quality signals to traces. Use them to measure accuracy, helpfulness, or safety over time:
1. Manual scoring: Open a trace → click "Add Score". Enter a name (e.g. "accuracy") and a numeric value. Useful for labeling a sample of traces for fine-tuning datasets.
2. LLM-as-judge: Go to Scores → Evaluators → New Evaluator. Write a prompt that takes the trace input/output and returns a score (e.g. 1–5). Langfuse runs the evaluator asynchronously on new traces.
3. SDK scoring: Call langfuse.score(trace_id=..., name="relevance", value=0.9) from your application to attach programmatic scores at runtime (e.g. from a retrieval similarity metric), as sketched below.
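For item 3, a short sketch of runtime scoring, assuming the v2 Python SDK and that your application has the trace ID in hand; the similarity value is a stand-in for whatever metric you compute:

from langfuse import Langfuse

lf = Langfuse()

def record_relevance(trace_id: str, similarity: float) -> None:
    # Attach a programmatic quality signal to an existing trace
    lf.score(
        trace_id=trace_id,
        name="relevance",
        value=similarity,  # numeric scores aggregate in the dashboard
        comment="cosine similarity of answer vs. retrieved context",
    )

record_relevance("abc123...", 0.9)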
Prompt management
Store and version your system prompts in Langfuse instead of hardcoding them. Fetch the active version at runtime so prompt updates deploy without code changes:
from langfuse import Langfuse
from langfuse.openai import openai  # wrapper is needed so the trace links to the prompt

lf = Langfuse()

# Fetch the production version of a named prompt
prompt = lf.get_prompt("incident-summary-v2")
compiled = prompt.compile(tool_name="Elastic SIEM", severity="high")

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": compiled}],
    langfuse_prompt=prompt,  # links the trace to this prompt version
)
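To get a prompt into Langfuse in the first place, create versions from the UI (Prompts → New Prompt) or from code. A sketch of the latter, assuming the v2 Python SDK's create_prompt method; the "production" label is what get_prompt resolves by default:

from langfuse import Langfuse

lf = Langfuse()

# Register a new version; the {{placeholders}} match prompt.compile() kwargs
lf.create_prompt(
    name="incident-summary-v2",
    prompt="You are a SOC analyst using {{tool_name}}. Summarize this {{severity}}-severity incident.",
    labels=["production"],  # get_prompt("incident-summary-v2") fetches this version
)
Tips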
user_id and session_id
Always pass user_id and session_id when tracing user-facing apps. They unlock per-user cost breakdowns and session replay in the Langfuse UI.
Flowise native integration
In Flowise: Settings → Analytic Providers → Langfuse. Enter your public key, secret key, and the Langfuse host URL. All chatflow runs are traced automatically.
Dashboard tab
The Dashboard shows aggregate cost, token usage, and latency over time. Filter by model, user, or custom tag to find expensive or slow call patterns.
Data retention
Langfuse stores all traces in Postgres on the VM. Plan for ~1 KB per trace, so 100,000 traces per day is on the order of 100 MB/day, or roughly 3 GB/month. For high-volume production use, mount a larger disk or prune old traces via the API.