What is distributed tracing?

· Category: DevOps & CI/CD

Short answer

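Distributed tracing follows a single request as it moves through the services of a distributed system, such as a microservices architecture. It records timing and metadata for each hop, which makes it possible to trace latency and failures back to the service or call that caused them.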

How it works

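A trace ID is generated at the entry point and propagated across service boundaries, typically in request headers. Each service records spans: timed operations that carry the trace ID, their own span ID, and the ID of their parent span. The spans are exported to a tracing backend, which assembles them into a single trace using those parent-child links.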
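The propagation step can be sketched with the standard library alone. This is a minimal illustration, not how any particular tracing library is implemented: the SpanContext class and the new_trace, child_of, inject, and extract helpers are hypothetical names, and the header layout assumes the W3C Trace Context traceparent format (version, trace ID, parent span ID, flags).

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical container for the IDs that travel with a request."""
    trace_id: str         # 32 hex chars, shared by every span in the trace
    span_id: str          # 16 hex chars, unique to one span
    sampled: bool = True


def new_trace() -> SpanContext:
    """Entry point: start a new trace with a fresh trace ID and root span ID."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a span in the same trace: keep the trace ID, mint a new span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing headers as a version-00 traceparent."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the caller's context from incoming headers; None if absent or malformed."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    try:
        flags = int(parts[3], 16)
    except ValueError:
        return None
    return SpanContext(parts[1], parts[2], sampled=bool(flags & 1))


# Service A (the entry point) calls service B.
root = new_trace()
outgoing: Dict[str, str] = {}
inject(root, outgoing)

# Service B continues the same trace with its own span.
caller = extract(outgoing)
server_span = child_of(caller) if caller else new_trace()
```

Here server_span shares root's trace ID but has its own span ID, and caller.span_id is what service B would record as its parent. In practice an instrumentation library usually handles injection and extraction; the sketch only shows what travels over the wire.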

Example

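from opentelemetry import trace

# With only the API installed and no SDK TracerProvider configured,
# this tracer is a no-op: the code runs, but no spans are recorded.
tracer = trace.get_tracer(__name__)

def process_payment(): ...   # placeholder for real business logic
def update_inventory(): ...  # placeholder for real business logic

# The span covers both calls and ends when the with block exits.
with tracer.start_as_current_span("process_order"):
    process_payment()
    update_inventory()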

Why it matters

In a microservices architecture, a single request may touch dozens of services. A trace ties the work done in each of them to that one request, so it exposes bottlenecks and errors that per-service logs and metrics cannot show in isolation.

Tips

  • Use OpenTelemetry for vendor-neutral instrumentation.
  • Sample traces (record only a fraction of requests) to reduce overhead and storage cost.
  • Correlate traces with logs and metrics.
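The sampling and correlation tips can be sketched with the standard library. This is an illustrative sketch under stated assumptions, not a library API: should_sample and TraceIdFilter are hypothetical names, the trace ID is a made-up value, and the sampling rule assumes the lower 64 bits of trace IDs are uniformly random. It is similar in spirit to ratio-based samplers that decide from the trace ID, but it is not a copy of any of them.

```python
import io
import logging

MAX_64 = 1 << 64


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly a `ratio` share of traces, deciding from the trace ID alone.

    Because the decision depends only on the trace ID, every service that
    sees the same ID reaches the same answer without coordination.
    """
    low_bits = int(trace_id_hex, 16) & (MAX_64 - 1)  # lower 64 bits of the ID
    return low_bits < int(ratio * MAX_64)


class TraceIdFilter(logging.Filter):
    """Stamp every log record with a trace ID so logs can be joined to traces."""

    def __init__(self, trace_id: str) -> None:
        super().__init__()
        self.trace_id = trace_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.trace_id
        return True


# Hypothetical trace ID; a real one would come from the active span.
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Log to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(TraceIdFilter(trace_id))

if should_sample(trace_id, ratio=1.0):
    logger.info("order accepted")
```

With the trace ID in every log line, a log search for that ID returns the lines belonging to one request, and the trace view shows where in the request they occurred.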

Common issues

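  • Incomplete instrumentation leaves gaps: a service that records no spans or drops the trace context hides its share of the request.
  • Trace propagation requires consistent headers: every service must read and write the same format, for example the W3C Trace Context traceparent header.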
  • Storage costs grow with trace volume.
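One symptom of an instrumentation gap is a span whose parent ID refers to a span the backend never received. The check below is a minimal sketch of how such orphans could be detected in a collected trace; the Span class, the find_orphans helper, and the span names are hypothetical and stand in for whatever a real backend stores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span record."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str


def find_orphans(spans: List[Span]) -> List[Span]:
    """Return spans whose parent is missing from the collected trace."""
    known_ids = {s.span_id for s in spans}
    return [
        s for s in spans
        if s.parent_id is not None and s.parent_id not in known_ids
    ]


# Span "c" was never reported (for example, an uninstrumented service),
# so the span that names it as parent cannot be attached to the tree.
collected = [
    Span("a", None, "gateway.handle_request"),
    Span("b", "a", "orders.process_order"),
    Span("d", "c", "inventory.reserve"),
]

orphans = find_orphans(collected)
```

In this example orphans contains only the inventory.reserve span, which points to the service between orders and inventory as the likely gap.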