
Distributed Tracing: OpenTelemetry and Jaeger
Follow the path. Learn how to track a single request as it jumps across microservices, databases, and external APIs using distributed tracing.
In a complex system, a single user request might:
- Hit your FastAPI Gateway.
- Which calls an Auth Service.
- Which calls a Database.
- Which calls an AI Model.
If that request takes 5 seconds, where is the bottleneck? Tracing allows you to see the exact lifecycle of that single request.
1. Spans and Traces
- Trace: The entire journey of a request from start to finish.
- Span: A single "Step" in that journey (e.g., "SQL Query", "External API Call").
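To make the relationship concrete, here is a minimal sketch of a trace as a list of spans that share one trace ID and point at their parent. The `Span` class, its field names, and the sample timings are illustrative assumptions, not OpenTelemetry's actual API.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed step in a request (illustrative model, not the OTel API)."""
    trace_id: str             # shared by every span in the same trace
    span_id: str              # unique to this step
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def new_span(trace_id, parent, name, start_ms, end_ms):
    """Create a span linked to its parent (or a root span if parent is None)."""
    parent_id = parent.span_id if parent else None
    return Span(trace_id, secrets.token_hex(8), parent_id, name, start_ms, end_ms)


trace_id = secrets.token_hex(16)  # one ID for the whole journey
root = new_span(trace_id, None, "GET /orders", 0, 120)
auth = new_span(trace_id, root, "auth.verify_token", 5, 25)
query = new_span(trace_id, root, "SQL Query", 30, 110)
trace = [root, auth, query]

# Children of the root span, slowest first.
children = sorted(
    (s for s in trace if s.parent_id == root.span_id),
    key=lambda s: s.duration_ms,
    reverse=True,
)
for span in children:
    print(f"{span.name}: {span.duration_ms} ms")
```

Sorting the child spans by duration is the same question a tracing UI answers visually: which step inside the request took the longest.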
2. Using OpenTelemetry (OTel)
OpenTelemetry is a vendor-neutral standard for observability. In FastAPI, we use OTel middleware to automatically generate traces for every incoming request.
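from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()
# Automatically trace every incoming request. Spans are only recorded once
# an OpenTelemetry SDK TracerProvider is configured.
FastAPIInstrumentor.instrument_app(app)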
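For a trace to follow a request across services, each hop has to pass the trace identity along. By default, OpenTelemetry does this with the W3C Trace Context `traceparent` HTTP header, which the instrumentation reads and writes for you. The standard-library sketch below only illustrates the header format; the helper names `parse_traceparent` and `child_traceparent` are hypothetical, and this is not how the library implements it internally.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Format: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the trace context carried by a header, or None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are not valid.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    # The lowest flag bit marks the trace as sampled.
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceContext) -> str:
    """Header for an outgoing call: same trace ID, new span ID for this hop."""
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
print(outgoing)
```

The trace ID stays constant from the gateway to the last downstream call, while each hop contributes its own span ID. That shared ID is what lets a backend stitch spans from different services into one trace.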
3. Visualizing with Jaeger
Jaeger is a tool that allows you to see your traces visually. It shows a timeline of the request, highlighting which parts took the most time.
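Before Jaeger can show anything, the application has to export its spans somewhere Jaeger can receive them. The configuration fragment below is a sketch under stated assumptions: a Jaeger instance running locally and accepting OTLP over gRPC on port 4317 without TLS, the `opentelemetry-sdk` and `opentelemetry-exporter-otlp-proto-grpc` packages installed, and `gateway` as a placeholder service name.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# "gateway" is a placeholder; Jaeger groups traces by this service name.
provider = TracerProvider(resource=Resource.create({"service.name": "gateway"}))

# Assumption: Jaeger listens for OTLP/gRPC on localhost:4317 without TLS.
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)

# Batch spans in the background instead of exporting one at a time.
provider.add_span_processor(BatchSpanProcessor(exporter))

# Register globally so instrumentation libraries pick it up.
trace.set_tracer_provider(provider)
```

Run this once at startup, before instrumenting the app. With a default local Jaeger setup, the UI is typically served on port 16686, where you can search for traces by service name.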
Why Tracing is Better than Logs:
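A log tells you that something failed. A trace shows you where in the chain it failed and how long each step before it took. Attributes and events recorded on each span can also capture what that step was working with, but only if the instrumentation or your own code attaches them.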
4. Sampling
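In high-traffic APIs (10,000+ RPS), you usually shouldn't trace 100% of requests: recording and exporting every span adds overhead and fills up your storage. Instead, we use Sampling (e.g., trace 1% of requests). At that volume, 1% is still about 100 traces per second, which is enough for a representative view of performance, although rare errors may be missed.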
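One common way to sample is to derive the decision from the trace ID itself, so every service that sees the same trace makes the same keep-or-drop choice. The sketch below uses only the standard library and is similar in spirit to ratio-based samplers such as the OpenTelemetry SDK's `TraceIdRatioBased`. The function name `should_sample` and the fixed random seed are assumptions made for the demo.

```python
import random

TRACE_ID_LOW_BITS = (1 << 64) - 1  # mask for the lower 64 bits of a 128-bit ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_BITS) < bound


rng = random.Random(0)  # fixed seed so the demo is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(should_sample(t, 0.01) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision is a pure function of the trace ID, a trace is either kept whole or dropped whole, rather than ending up with some of its spans missing.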
Visualizing the Trace Timeline
gantt
title Request Trace for /process-payment
dateFormat X
axisFormat %s
section Gateway
FastAPI Handler :0, 500
section Services
Auth Check :10, 50
Balance Check :60, 150
Stripe API Call :160, 480
section Database
Log Transaction :485, 495
Summary
- Distributed Tracing: Essential for microservices and complex apps.
- OpenTelemetry: The industry standard for tracing.
- Spans: The building blocks of a trace.
- Bottleneck Detection: Use traces to find out exactly why a request is slow.
In the next lesson, we wrap up Module 18 with Exercises on observability and monitoring.
Exercise: The Bottleneck Detective
Look at the Gantt chart above.
- Which part of the request is taking the most time?
- If you wanted to speed up this API, which service would you focus on optimizing first?