Tracing Retrieval Steps

Learn how to 'open the black box' of RAG by tracing the path from user query to final answer.

When a RAG system fails, the first question is always: Did retrieval fail, or did generation fail? Tracing allows you to see every intermediate step in the pipeline.

The Tracing Stack

  1. LangSmith: The industry leader for tracing LangChain-based applications.
  2. Weights & Biases (W&B) Prompts: Visualizes LLM inputs and outputs.
  3. OpenTelemetry: A vendor-neutral standard for logging and tracing.

What is a "Trace"?

A trace is a tree of nested spans, one per pipeline step, each recording that step's inputs, outputs, and timing. For a single RAG query the tree might look like this (sketched in code after the list):

  • Span 1: User Query Received.
  • Span 2: Query Embedding Generated (Ollama).
  • Span 3: Vector Search Executed (Chroma).
  • Span 4: Documents Retrieved (3 chunks).
  • Span 5: Re-Ranking Executed (Cross-Encoder).
  • Span 6: Final LLM Generation (Claude).
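
Here is a minimal sketch of that span tree using the OpenTelemetry Python SDK (pip install opentelemetry-sdk). The embed/search/rerank/generate helpers are hypothetical stubs standing in for your own pipeline functions:

# Minimal OpenTelemetry sketch: each pipeline step becomes a span
# nested under one root "rag_query" span, printed to the console.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("rag.pipeline")

def embed(query): return [0.0]                     # stub: your embedder here
def search(vector): return ["chunk A", "chunk B"]  # stub: your vector search here
def rerank(query, docs): return docs               # stub: your cross-encoder here
def generate(query, docs): return "answer"         # stub: your LLM call here

def answer(query: str) -> str:
    with tracer.start_as_current_span("rag_query"):           # root span
        with tracer.start_as_current_span("embed_query"):     # Span 2
            vector = embed(query)
        with tracer.start_as_current_span("vector_search"):   # Spans 3-4
            docs = search(vector)
        with tracer.start_as_current_span("rerank"):          # Span 5
            docs = rerank(query, docs)
        with tracer.start_as_current_span("llm_generate"):    # Span 6
            return generate(query, docs)

print(answer("What is RAG?"))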

Visualizing Failure

If Span 4 returned no documents, the problem is upstream: your search index or your embedding model. If Span 6 answered "I don't know" even though Span 4 contained the right data, the problem is downstream: your prompt or the LLM itself.
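
You can encode this triage as a check over two fields of the trace. A sketch with illustrative logic (the helper name and the "I don't know" test are hypothetical, not a real API):

# Hypothetical triage helper: classify a failed query as retrieval-side
# or generation-side from the retrieved chunks and the final answer.
def triage(retrieved_chunks: list[str], final_answer: str) -> str:
    if not retrieved_chunks:
        return "retrieval failure: inspect the index and embedding model"
    if "i don't know" in final_answer.lower():
        return "generation failure: context was retrieved, inspect the prompt/LLM"
    return "no obvious failure for this query"

print(triage([], "I don't know."))                            # retrieval failure
print(triage(["Paris is the capital of France."], "Paris"))   # no obvious failure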

Implementation with LangSmith

# Two environment variables and every LangChain step is traced automatically
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=...
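
The same environment variables also enable tracing for plain Python code via the langsmith SDK's @traceable decorator. A sketch with placeholder function bodies (the retrieve/generate internals are yours to fill in):

# Sketch: tracing a hand-rolled (non-LangChain) pipeline with the
# langsmith SDK (pip install langsmith). Bodies are placeholders.
from langsmith import traceable

@traceable(run_type="retriever")
def retrieve(query: str) -> list[str]:
    return ["chunk A", "chunk B"]  # placeholder: your vector search here

@traceable(run_type="llm")
def generate(query: str, docs: list[str]) -> str:
    return "model answer"  # placeholder: your LLM call here

@traceable  # parent run; retrieve/generate appear as nested child spans
def rag_query(query: str) -> str:
    return generate(query, retrieve(query))

rag_query("What is tracing?")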

Debugging Retrieval

A trace helps you answer:

  • "Why was this irrelevant document retrieved?"
  • "Is the re-ranker actually re-ordering the docs based on relevance?"
  • "How much latency did the OCR step add?"

Exercises

  1. Set up a free LangSmith account.
  2. Run a RAG query and look at the "Trace" graph.
  3. Identify which step took the most time.
