What is distributed tracing?
Category: DevOps & CI/CD
Short answer
Distributed tracing tracks the flow of requests through microservices. It captures timing and metadata for each hop, enabling root cause analysis of latency and failures.
How it works
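A trace ID is generated at the entry point and propagated across service boundaries, typically in request headers. Each service creates spans for the operations it performs; a span records the trace ID, its own span ID, its parent's span ID, and start and end times. The spans are exported to a backend, which groups them by trace ID and uses the parent links to assemble them into a trace.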
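The propagation step can be sketched with the standard library alone. This is a minimal illustration, not the OpenTelemetry API: Span, start_span, and to_traceparent are hypothetical names, and the header layout assumes the W3C Trace Context "traceparent" format (version, trace ID, span ID, flags).

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed operation within a trace (hypothetical structure)."""
    trace_id: str              # shared by every span in the same request
    span_id: str               # unique to this operation
    parent_id: Optional[str]   # None for the root span at the entry point
    name: str
    start: float = field(default_factory=time.time)


def start_span(name: str, traceparent: Optional[str] = None) -> Span:
    """Start a span, continuing the caller's trace when a header is present."""
    if traceparent is None:
        # Entry point: no incoming context, so generate a new trace ID.
        trace_id, parent_id = secrets.token_hex(16), None
    else:
        # Downstream service: reuse the trace ID, record the caller as parent.
        _version, trace_id, parent_id, _flags = traceparent.split("-")
    return Span(trace_id, secrets.token_hex(8), parent_id, name)


def to_traceparent(span: Span) -> str:
    """Serialize the context as version-traceid-spanid-flags."""
    return f"00-{span.trace_id}-{span.span_id}-01"


# Service A handles the incoming request and prepares a call to service B.
root = start_span("checkout")
headers = {"traceparent": to_traceparent(root)}

# Service B reads the header and creates a child span in the same trace.
child = start_span("charge_card", headers["traceparent"])
print(child.trace_id == root.trace_id, child.parent_id == root.span_id)
```

Both spans carry the same trace ID, and the child's parent ID points at the root span, which is what lets a backend rebuild the call tree later. In practice an instrumentation library handles this injection and extraction.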
Example
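from opentelemetry import trace

# Tracer named after this module; without a configured SDK it is a no-op.
tracer = trace.get_tracer(__name__)

# Everything inside the block is timed and recorded as one span.
with tracer.start_as_current_span("process_order"):
    process_payment()    # application functions, defined elsewhere
    update_inventory()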
Why it matters
In microservices, a single request may touch dozens of services. Tracing links the work done in each of them, revealing bottlenecks and errors that per-service logs and metrics cannot show on their own.
Tips
- Use OpenTelemetry for vendor-neutral instrumentation.
- Sample traces to reduce overhead.
- Correlate traces with logs and metrics, for example by including the trace ID in log records.
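The sampling tip can be illustrated with a ratio-based decision derived from the trace ID, so that every service keeps or drops the same traces without coordinating. This is a sketch under assumptions: should_sample is a hypothetical helper, trace IDs are assumed to be 32 random hex characters, and OpenTelemetry SDKs provide their own ratio-based sampler that should be preferred in real deployments.

```python
import secrets


def should_sample(trace_id: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding from the trace ID alone."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 64 bits of the hex trace ID as a uniform random integer.
    value = int(trace_id[-16:], 16)
    return value < int(ratio * 2**64)


# Simulate 10,000 requests and keep about 10% of their traces.
trace_ids = [secrets.token_hex(16) for _ in range(10_000)]
kept = [t for t in trace_ids if should_sample(t, ratio=0.1)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, a trace is either recorded in full or not at all, which avoids traces with missing spans caused by services sampling independently.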
Common issues
- Incomplete instrumentation leaves gaps.
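- Trace propagation requires consistent headers: every service must use the same context format (for example, W3C Trace Context), or the trace breaks at that hop.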
- Storage costs grow with trace volume.
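Gaps from incomplete instrumentation or broken propagation often show up as spans whose parent was never reported. One way to spot them, sketched below, is to scan the collected spans for parent IDs that do not exist in the same trace. The span tuples and the find_orphans helper are hypothetical and stand in for whatever a tracing backend actually stores.

```python
from collections import defaultdict

# Hypothetical collected spans: (trace_id, span_id, parent_id, service).
spans = [
    ("t1", "a", None, "gateway"),
    ("t1", "b", "a", "orders"),
    ("t1", "d", "c", "inventory"),   # parent "c" was never reported
]


def find_orphans(spans):
    """Return spans whose parent ID does not appear in the same trace."""
    ids_by_trace = defaultdict(set)
    for trace_id, span_id, _parent, _service in spans:
        ids_by_trace[trace_id].add(span_id)
    return [
        s for s in spans
        if s[2] is not None and s[2] not in ids_by_trace[s[0]]
    ]


for trace_id, span_id, parent_id, service in find_orphans(spans):
    print(f"trace {trace_id}: span {span_id} ({service}) is missing parent {parent_id}")
```

An orphan points at the hop just before it: the service that should have produced the missing parent is either not instrumented or not forwarding the trace context.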