Case study — one trace ID, from the request to the audit record
Every guardrail and tool call instrumented as a span, traces and logs joined on a shared ID into Tempo and Loki, and four Grafana dashboards on top — built so that when a regulator asks what the system was asked, what fired, and what came back, the answer is a query, not a forensics project.
When an AI system drafts, scans, or monitors inside a bank or a healthcare payer, the hard question is never “did it work” — it is “can you prove what happened.” Model risk management frameworks expect explainability, auditability, and documented oversight. A pipeline that logs to one place, emits metrics to another, and keeps traces in a third leaves you reconstructing events after the fact, under pressure, from three systems that do not agree.
OpenTelemetry closes that gap by treating every request as a structured event. The request opens a parent span; each guardrail and each tool call runs as a child span beneath it; W3C trace context propagates through the whole pipeline so the spans stitch into one timeline; and the trace ID is injected into every log line so logs and traces join up instead of living in separate silos. That correlated record is the audit trail. The engineering is making it complete, consistent, and cheap enough to leave on in production.
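The trace-to-log join described above can be sketched with nothing but the standard library. This is a minimal illustration, not the project's instrumentation: `current_trace_id`, `TraceIdFilter`, `build_logger`, `handle_request`, and `demo` are hypothetical names, and a real service would read the active trace ID from the OpenTelemetry span context rather than from a hand-rolled context variable.

```python
import contextvars
import io
import logging
import secrets

# Hypothetical holder for the active trace ID (assumption: a real service
# would take this from the OpenTelemetry span context instead).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Annotate every log record with the trace ID of the request in flight."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # annotate only, never drop the record


def build_logger(stream):
    """Return a logger whose every line starts with trace_id=<id>."""
    logger = logging.getLogger("guardrail-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("trace_id=%(trace_id)s %(levelname)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger):
    """Mint a trace ID, log under it, then restore the previous context."""
    trace_id = secrets.token_hex(16)  # 32 hex characters, the W3C trace-id width
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request accepted")
        logger.info("guardrail passed")
    finally:
        current_trace_id.reset(token)
    return trace_id


def demo():
    """Run one request and return its trace ID plus the captured log lines."""
    buffer = io.StringIO()
    trace_id = handle_request(build_logger(buffer))
    return trace_id, buffer.getvalue().splitlines()
```

Calling `demo()` returns the minted trace ID and two log lines that both begin with it, which is the property a log backend needs in order to join a line back to its trace.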
The approach is vendor-neutral by design. Instrumentation is written once against the OpenTelemetry SDK and exported over OTLP, so the backend — Tempo and Loki here, a commercial APM elsewhere — is a configuration choice, not a rewrite.
Below is the shape of one request through the ordered guardrail pipeline. The parent span bounds the request; each guardrail and the provider inference call hang beneath it as child spans. The trace ID at the top is the same key written into every log line for this request.
Five ordered guardrails wrapped around one provider inference call
Illustrative trace. Span order encodes the five ordered guardrails around the inference; widths are representative, not measured production latency.
Opens the parent span and mints the trace ID that every child span and every log line inherits.
Normalizes and screens the inbound payload; records what was stripped so the cleanup itself is auditable.
Prompt-injection detection; the span attributes capture whether a pattern matched and the action taken.
Per-app authorization and policy checks; the decision and its inputs land on the span as the control evidence.
The pluggable LLM provider call, instrumented as a child span so model latency and outcome sit inside the request timeline.
PII redaction and output screening; each redaction emits a metric, so the count is its own security signal.
Final throttle check, closing the request span and writing the result the trace will be queried on later.
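The nesting walked through above can be modeled in a few lines. The sketch below is a hand-rolled stand-in for SDK spans, written only to show the parent and child relationship and the shared trace ID; the `Span` class, `run_request`, and the stage names in `STAGES` are hypothetical labels chosen for illustration, not identifiers from the implementation.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Stand-in for an SDK span: just enough fields to show the nesting."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_id: Optional[str] = None
    attributes: dict = field(default_factory=dict)
    start_ns: int = 0
    end_ns: int = 0


# Hypothetical stage names; the order follows the trace described above,
# with the provider inference call sitting between the guardrails.
STAGES = [
    "input_sanitization",
    "injection_detection",
    "policy_check",
    "provider_inference",
    "output_redaction",
    "rate_limit",
]


def run_request(prompt):
    """Open a parent span, run each stage as a child, return all spans."""
    trace_id = secrets.token_hex(16)
    root = Span("request", trace_id, start_ns=time.monotonic_ns())
    root.attributes["prompt_chars"] = len(prompt)
    spans = [root]
    for stage in STAGES:
        child = Span(
            stage,
            trace_id,
            parent_id=root.span_id,
            start_ns=time.monotonic_ns(),
        )
        # A real guardrail would run here; the sketch only records an outcome.
        child.attributes["outcome"] = "pass"
        child.end_ns = time.monotonic_ns()
        spans.append(child)
    root.end_ns = time.monotonic_ns()  # the parent closes after its last child
    return spans
```

Every child carries the parent's `trace_id` and points at the parent's `span_id`, so a trace backend can rebuild the ordered timeline from the flat list of spans.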
Instrumentation emits through the OpenTelemetry SDK and ships over OTLP; the backend is a Grafana stack, swappable without touching the instrumentation.
Vendor-neutral collection & export
One instrumentation contract, written once against the OpenTelemetry SDK. Because export is OTLP, the Grafana stack is a configuration choice: Datadog, New Relic, or Dynatrace can drop in without touching the service.
OTel SDK → Collector: Spans, metrics, and logs over OTLP on :4317
Tempo: Traces land here; trace ID is the audit key
Loki: Logs with trace ID injected on every line
Grafana: Four dashboards on :3000
W3C trace context: Propagated across guardrails and tool calls
OpenTelemetry · OTLP · Tempo · Loki · Grafana · Context propagation · On-demand sampling
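W3C trace context travels between hops as a `traceparent` header of the form `version-traceid-parentid-flags`. The sketch below shows that format with the standard library only, assuming version `00`; the helper names `new_traceparent`, `parse_traceparent`, and `child_traceparent` are hypothetical, and in practice the OpenTelemetry SDK's propagators handle this rather than hand-written code.

```python
import re
import secrets

# Version 00 layout: 2 hex version, 32 hex trace ID, 16 hex parent span ID,
# 2 hex flags, all lowercase and separated by hyphens.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a fresh trace: random trace ID and span ID."""
    flags = "01" if sampled else "00"
    return "00-{}-{}-{}".format(secrets.token_hex(16), secrets.token_hex(8), flags)


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming):
    """Continue the caller's trace: same trace ID and flags, new span ID."""
    ctx = parse_traceparent(incoming)
    if ctx is None:
        return new_traceparent()  # unusable header, so start a new trace
    return "00-{}-{}-{}".format(ctx["trace_id"], secrets.token_hex(8), ctx["flags"])
```

Each stage that forwards the request calls something like `child_traceparent` on the header it received, which is why spans from different stages end up under one trace ID.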
Design decisions chosen so the telemetry stays useful under audit and affordable under load.
The request is the parent span; every guardrail and the provider call is a child beneath it. Nesting makes the order of operations and where time went legible at a glance.
The active trace ID is injected into each log line, so traces and logs join on a shared key instead of being correlated by guesswork across timestamps.
W3C trace context is carried across each hop, so spans from different stages assemble into one coherent timeline rather than scattering.
What was asked, which guardrails fired, and what was returned all live on one trace. The regulator’s question has a query behind it, not a forensics project.
An on-demand sampler keeps telemetry detailed where it matters without paying the full trace-volume cost on every request, so observability runs in production, not only in a drill.
Four Grafana dashboards turn raw spans and metrics into the views an on-call engineer actually opens: latency, throughput, redaction hits, and per-tool error rates.
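One way to read the on-demand sampling decision above is as a low baseline ratio plus an override that keeps every trace someone will want to inspect. The source does not spell out the sampler's rules, so the sketch below is an assumption throughout: `should_sample`, `baseline_ratio`, `guardrail_fired`, and `debug_requested` are hypothetical names, and the ratio check is a simple deterministic comparison on the trace ID rather than the SDK's own sampler.

```python
def should_sample(trace_id, baseline_ratio, guardrail_fired=False, debug_requested=False):
    """Decide whether to keep a trace.

    Assumed policy: always keep traces where a guardrail fired or a caller
    asked for detail; otherwise keep a fixed fraction chosen by trace ID.
    """
    if not 0.0 <= baseline_ratio <= 1.0:
        raise ValueError("baseline_ratio must be between 0.0 and 1.0")
    if guardrail_fired or debug_requested:
        return True  # on-demand path: the audit-relevant trace is never dropped
    # Deterministic baseline: compare the low 64 bits of the trace ID with a
    # threshold, so every stage reaches the same decision for the same trace.
    bound = int(baseline_ratio * (1 << 64))
    return int(trace_id[-16:], 16) < bound
```

Because the baseline decision depends only on the trace ID, separate services agree on it without coordinating, while the override keeps the traces that matter for an audit.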
The pipeline here exports to a Grafana stack, but the instrumentation discipline is portable. Three years of hands-on work span open-source and commercial backends — the choice of vendor follows the data-residency and tooling constraints of the environment, not the other way round.
| Platform | Role | Notes |
|---|---|---|
| OpenTelemetry · Grafana · Tempo · Loki · Jaeger | Vendor-neutral traces, metrics, logs | OTLP export; trace–log correlation |
| Datadog | APM, metrics, log management | Dashboard and monitor design |
| New Relic | Full-stack APM and alerting | In production use since 2017 |
| Dynatrace | Distributed tracing | Automated dependency mapping |
| Grafana Cloud | Observability backend | Behind the GitOps-on-EKS platform & its 352-test suite |
Formal credentials behind the observability and platform work, plus the hands-on training that keeps the instrumentation current.
Instrumentation, OTLP pipeline, trace–log correlation · Nov 2025
Metrics, traces, and log management · Jan 2026
Monitor and dashboard design · Jan 2026
Full-stack APM and alerting · 2017
Event-driven autoscaling · Linux Foundation · 2025–2026
Reliability engineering practice · Linux Foundation · 2025–2026
Declarative delivery and reconciliation · Linux Foundation · 2025–2026
SDK spans / metrics / logs, OTLP collector, Tempo / Loki / Grafana wiring.
GPU inference serving with instrumented model-serving observability.
Agentic engineering workflows · 100% final score.
Where this observability layer was built and where the same patterns run.
8 tools, 5 ordered guardrails, 3 pluggable LLM providers — the reference implementation this telemetry was instrumented into.
Multi-cluster GitOps platform with Grafana Cloud observability and a 352-test validation suite.
How these controls apply to a regulated GenAI deployment your CISO can sign off on.
OpenTelemetry instrumentation, trace–log correlation, and Grafana dashboards — scoped to your compliance framework.
See GenAI & AppSec advisory