
Tracing Retrieval Steps
Learn how to 'open the black box' of RAG by tracing the path from user query to final answer.
When a RAG system fails, the first question is always: Did retrieval fail, or did generation fail? Tracing allows you to see every intermediate step in the pipeline.
The Tracing Stack
- LangSmith: The most widely used tracer for LangChain applications; its SDK can also trace plain Python functions.
- Weights & Biases (W&B) Prompts: Visualizes LLM inputs and outputs at each step of a chain.
- OpenTelemetry: A vendor-neutral standard for logging and tracing.
What is a "Trace"?
A trace is a tree of timed spans: each span records one pipeline step along with its inputs, outputs, and duration. A typical RAG trace looks like this (a code sketch of the same hierarchy follows the list):
- Span 1: User Query Received.
- Span 2: Query Embedding Generated (Ollama).
- Span 3: Vector Search Executed (Chroma).
- Span 4: Documents Retrieved (3 chunks).
- Span 5: Re-Ranking Executed (Cross-Encoder).
- Span 6: Final LLM Generation (Claude).
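To make that hierarchy concrete, here is a minimal, self-contained sketch of emitting those six spans with OpenTelemetry (the vendor-neutral option above). The embed/search/rerank/generate helpers are hypothetical stubs standing in for your real pipeline code.

# Minimal sketch of the Span 1-6 hierarchy using the OpenTelemetry SDK.
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))  # print spans to stdout
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("rag-pipeline")

# Hypothetical stubs -- replace with your embedding model, vector store, etc.
def embed(query):          return [0.0] * 384
def search(vector, k=3):   return ["chunk-1", "chunk-2", "chunk-3"]
def rerank(query, docs):   return docs
def generate(query, docs): return "final answer"

def answer(query):
    with tracer.start_as_current_span("rag_query"):               # Span 1
        with tracer.start_as_current_span("embed_query"):         # Span 2
            vector = embed(query)
        with tracer.start_as_current_span("vector_search") as s:  # Spans 3-4
            docs = search(vector, k=3)
            s.set_attribute("retrieved.count", len(docs))
        with tracer.start_as_current_span("rerank"):              # Span 5
            docs = rerank(query, docs)
        with tracer.start_as_current_span("generate"):            # Span 6
            return generate(query, docs)

answer("Why did retrieval fail?")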
Visualizing Failure
If Span 4 returned zero documents, the failure is upstream: suspect your search index or your embedding model. If Span 6 answered "I don't know" even though Span 4 contained the right chunks, the failure is downstream: suspect your prompt or the LLM.
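You can automate that triage. Below is a hedged sketch using the LangSmith SDK to scan recent retriever spans for empty results; the project name "rag-demo" and the "documents" output key are assumptions about your setup.

# Sketch: flag traces whose retriever span returned nothing.
# Requires: pip install langsmith, plus LANGCHAIN_API_KEY in the environment.
# "rag-demo" and the "documents" output key are assumptions -- adjust to your project.
from langsmith import Client

client = Client()

for run in client.list_runs(project_name="rag-demo", run_type="retriever"):
    docs = (run.outputs or {}).get("documents", [])
    if not docs:
        # Empty retrieval: suspect the index or the embedding model, not the LLM.
        print(f"No documents retrieved for input: {run.inputs}")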
Implementation with LangSmith
# Two environment variables and all your LangChain RAG steps are traced!
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=...
# Optional: group runs under a named project in the LangSmith UI.
export LANGCHAIN_PROJECT=rag-demo
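Those variables auto-trace every LangChain component. For custom steps that are not LangChain objects (an OCR pass, a hand-rolled re-ranker), LangSmith's traceable decorator turns plain functions into spans; the function bodies below are hypothetical stubs.

# Sketch: adding custom Python steps to the trace with @traceable.
# Nested calls are recorded as child spans automatically.
from langsmith import traceable

@traceable(run_type="retriever")
def retrieve(query: str) -> list[str]:
    return ["chunk-1", "chunk-2", "chunk-3"]  # stub: query your vector store here

@traceable(run_type="llm")
def generate(query: str, docs: list[str]) -> str:
    return f"answer grounded in {len(docs)} chunks"  # stub: call your LLM here

@traceable  # root span for the whole RAG query
def rag(query: str) -> str:
    return generate(query, retrieve(query))

print(rag("What is tracing?"))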
Debugging Retrieval
A trace helps you answer:
- "Why was this irrelevant document retrieved?"
- "Is the re-ranker actually re-ordering the docs based on relevance?"
- “How much latency did the OCR step add?” (see the sketch after this list)
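For the latency question in particular, a trace gives you exact per-span timings. Here is a sketch of pulling them with the LangSmith SDK; the project name "rag-demo" is again an assumption.

# Sketch: per-span latency breakdown for the most recent trace.
# Requires: pip install langsmith; "rag-demo" is an assumed project name.
from langsmith import Client

client = Client()

# The latest root run is the top-level span of the newest trace.
root = next(client.list_runs(project_name="rag-demo", is_root=True, limit=1))

for run in client.list_runs(project_name="rag-demo", trace_id=root.trace_id):
    if run.start_time and run.end_time:
        ms = (run.end_time - run.start_time).total_seconds() * 1000
        print(f"{run.name:<30} {run.run_type:<10} {ms:8.1f} ms")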
Exercises
- Set up a free LangSmith account.
- Run a RAG query and look at the "Trace" graph.
- Identify which step took the most time.