
Tracing Retrieval Steps
Learn how to 'open the black box' of RAG by tracing the path from user query to final answer.
When a RAG system fails, the first question is always: Did retrieval fail, or did generation fail? Tracing allows you to see every intermediate step in the pipeline.
The Tracing Stack
- LangSmith: The most widely used tracer for LangChain applications; its SDK can also trace plain Python functions.
- Weights & Biases (W&B) Prompts: Visualizes LLM inputs and outputs at each step of a chain.
- OpenTelemetry: A vendor-neutral standard for logging and tracing.
What is a "Trace"?
A trace is a tree of timed spans: each span records one pipeline step along with its inputs, outputs, and duration. A typical RAG trace looks like this (a code sketch of the same hierarchy follows the list):
- Span 1: User Query Received.
- Span 2: Query Embedding Generated (Ollama).
- Span 3: Vector Search Executed (Chroma).
- Span 4: Documents Retrieved (3 chunks).
- Span 5: Re-Ranking Executed (Cross-Encoder).
- Span 6: Final LLM Generation (Claude).
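To make that hierarchy concrete, here is a minimal, self-contained sketch of emitting those six spans with OpenTelemetry (the vendor-neutral option above). The embed/search/rerank/generate helpers are hypothetical stubs standing in for your real pipeline code.

# Minimal sketch of the Span 1-6 hierarchy using the OpenTelemetry SDK.
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))  # print spans to stdout
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("rag-pipeline")

# Hypothetical stubs -- replace with your embedding model, vector store, etc.
def embed(query):          return [0.0] * 384
def search(vector, k=3):   return ["chunk-1", "chunk-2", "chunk-3"]
def rerank(query, docs):   return docs
def generate(query, docs): return "final answer"

def answer(query):
    with tracer.start_as_current_span("rag_query"):               # Span 1
        with tracer.start_as_current_span("embed_query"):         # Span 2
            vector = embed(query)
        with tracer.start_as_current_span("vector_search") as s:  # Spans 3-4
            docs = search(vector, k=3)
            s.set_attribute("retrieved.count", len(docs))
        with tracer.start_as_current_span("rerank"):              # Span 5
            docs = rerank(query, docs)
        with tracer.start_as_current_span("generate"):            # Span 6
            return generate(query, docs)

answer("Why did retrieval fail?")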
Visualizing Failure
If Span 4 returned zero documents, the failure is upstream: suspect your search index or your embedding model. If Span 6 answered "I don't know" even though Span 4 contained the right chunks, the failure is downstream: suspect your prompt or the LLM.
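You can automate that triage. Below is a hedged sketch using the LangSmith SDK to scan recent retriever spans for empty results; the project name "rag-demo" and the "documents" output key are assumptions about your setup.

# Sketch: flag traces whose retriever span returned nothing.
# Requires: pip install langsmith, plus LANGCHAIN_API_KEY in the environment.
# "rag-demo" and the "documents" output key are assumptions -- adjust to your project.
from langsmith import Client

client = Client()

for run in client.list_runs(project_name="rag-demo", run_type="retriever"):
    docs = (run.outputs or {}).get("documents", [])
    if not docs:
        # Empty retrieval: suspect the index or the embedding model, not the LLM.
        print(f"No documents retrieved for input: {run.inputs}")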
Implementation with LangSmith
# Two environment variables and all your LangChain RAG steps are traced!
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=...
# Optional: group runs under a named project in the LangSmith UI.
export LANGCHAIN_PROJECT=rag-demo
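Those variables auto-trace every LangChain component. For custom steps that are not LangChain objects (an OCR pass, a hand-rolled re-ranker), LangSmith's traceable decorator turns plain functions into spans; the function bodies below are hypothetical stubs.

# Sketch: adding custom Python steps to the trace with @traceable.
# Nested calls are recorded as child spans automatically.
from langsmith import traceable

@traceable(run_type="retriever")
def retrieve(query: str) -> list[str]:
    return ["chunk-1", "chunk-2", "chunk-3"]  # stub: query your vector store here

@traceable(run_type="llm")
def generate(query: str, docs: list[str]) -> str:
    return f"answer grounded in {len(docs)} chunks"  # stub: call your LLM here

@traceable  # root span for the whole RAG query
def rag(query: str) -> str:
    return generate(query, retrieve(query))

print(rag("What is tracing?"))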
Debugging Retrieval
A trace helps you answer:
- "Why was this irrelevant document retrieved?"
- "Is the re-ranker actually re-ordering the docs based on relevance?"
- “How much latency did the OCR step add?” (see the sketch after this list)
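For the latency question in particular, a trace gives you exact per-span timings. Here is a sketch of pulling them with the LangSmith SDK; the project name "rag-demo" is again an assumption.

# Sketch: per-span latency breakdown for the most recent trace.
# Requires: pip install langsmith; "rag-demo" is an assumed project name.
from langsmith import Client

client = Client()

# The latest root run is the top-level span of the newest trace.
root = next(client.list_runs(project_name="rag-demo", is_root=True, limit=1))

for run in client.list_runs(project_name="rag-demo", trace_id=root.trace_id):
    if run.start_time and run.end_time:
        ms = (run.end_time - run.start_time).total_seconds() * 1000
        print(f"{run.name:<30} {run.run_type:<10} {ms:8.1f} ms")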
Exercises
- Set up a free LangSmith account.
- Run a RAG query and look at the "Trace" graph.
- Identify which step took the most time.