
Agentic RAG: Adding Reasoning to Retrieval
Moving beyond simple vector search. How Agentic RAG uses multi-step reasoning, query decomposition, and corrective feedback to answer complex questions.
Standard Retrieval-Augmented Generation (RAG) is simple: Search -> Stuff -> Summarize. It works great for factual questions like "What is the capital of France?" It fails miserably for multi-hop questions like "Compare the revenue growth of Apple vs Microsoft in 2023 and analyze the primary driver."
Agentic RAG fixes this by introducing a cycle of reasoning into the retrieval process.
1. The Limitation of "Naive" RAG
In Naive RAG, the system is blind. It takes the user query, converts it to a vector, and grabs the top 5 chunks.
- Failure 1: The answer might require 20 chunks from 3 different documents.
- Failure 2: The top 5 chunks might be irrelevant (poor retrieval). The LLM hallucinates an answer anyway, because it has no mechanism to say "I don't know" or to search again.
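The rigidity of this pipeline is easy to see in code. Below is a toy sketch of naive top-k retrieval using bag-of-words cosine similarity as a stand-in for real embeddings (the `naive_retrieve` function and sample chunks are hypothetical): whatever the question demands, it returns exactly `k` chunks and nothing else.

```python
# Toy illustration of Naive RAG's fixed top-k retrieval.
# Bag-of-words cosine here is a stand-in for dense embeddings.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def naive_retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    qv = Counter(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: cosine(qv, Counter(c.lower().split())),
                    reverse=True)
    return ranked[:k]  # always exactly k chunks, relevant or not

chunks = [
    "Apple revenue grew 3% in 2023.",
    "Microsoft revenue grew 7% in 2023.",
    "The capital of France is Paris.",
    "Cloud services drove Microsoft's growth.",
]
print(naive_retrieve("Compare Apple and Microsoft revenue growth in 2023",
                     chunks, k=2))
```

If the answer needs four chunks, you get two; if nothing is relevant, you still get two. The retriever has no way to report failure.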
2. The Agentic Workflow
An Agentic RAG system doesn't just "Search." It "Investigates."
```mermaid
graph TD
    User[User Query] --> Planner
    Planner -->|Decompose| Q1[Sub-Query 1]
    Planner -->|Decompose| Q2[Sub-Query 2]
    Q1 --> Tool[Retriever Tool]
    Tool -->|Results| Grader{Is Relevant?}
    Grader -- Yes --> Context
    Grader -- No --> Refiner[Rewrite Query]
    Refiner --> Tool
    Q2 --> Tool
    Context --> Synthesizer[LLM Answer Gen]
    Synthesizer --> Final[Final Answer]
```
Key Components
- Query Decomposition: breaking a complex question into atomic lookups.
- Self-Correction (The Grader): An internal check that asks, "Did the search results actually answer the question?" If not, it rewrites the search query and tries again.
- Active Reading: The agent reads the first document, then decides what to look for next based on what it learned.
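The Grader's retry behavior is just a loop with an escape hatch. The sketch below shows that control flow with hypothetical stand-ins: `search`, `grade`, and `rewrite` would be a retriever, an LLM grader, and an LLM query rewriter in a real system, not the simple lambdas used here.

```python
# Sketch of the self-correction ("Grader") loop.
# search / grade / rewrite are hypothetical stand-ins for a retriever,
# an LLM relevance grader, and an LLM query rewriter.
def corrective_search(question, search, grade, rewrite, max_rewrites=2):
    query = question
    for _ in range(max_rewrites + 1):
        docs = [d for d in search(query) if grade(question, d)]
        if docs:
            return docs  # relevant context found
        query = rewrite(question, query)  # reformulate and retry
    return []  # give up; let the caller admit "I don't know"

# Toy demo: the raw question misses, the rewritten query hits.
corpus = {"q10-k": ["Apple 10-K: revenue grew 3% in FY2023"]}
search = lambda q: corpus.get(q, [])
grade = lambda question, doc: "revenue" in doc
rewrite = lambda question, q: "q10-k"
print(corrective_search("Apple revenue growth 2023?", search, grade, rewrite))
```

The key design choice is the bounded retry count: without it, a query that can never be satisfied would loop forever and burn tokens.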
3. Code Example: LangGraph implementation
Here is pseudo-code for a "Corrective RAG" using a graph-based agent approach.
```python
# A simplified graph workflow for Agentic RAG (pseudo-code:
# vector_store, grader_model, llm, rewrite_query, and workflow
# are placeholders for real components)

def retrieve(state):
    """Retrieve documents for the current question."""
    docs = vector_store.search(state["question"])
    return {"documents": docs}

def grade_documents(state):
    """Filter out irrelevant docs; keep only what the grader approves."""
    relevant_docs = []
    for doc in state["documents"]:
        grade = grader_model.invoke(doc, state["question"])
        if grade.is_relevant:
            relevant_docs.append(doc)
    return {"documents": relevant_docs}

def decide_next(state):
    """Routing function: if nothing useful was found, rewrite the query."""
    return "rewrite" if not state["documents"] else "generate"

def generate(state):
    """Generate the final answer from graded context."""
    return llm.invoke(state["documents"], state["question"])

# The graph
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade", grade_documents)
workflow.add_node("generate", generate)
workflow.add_node("rewrite", rewrite_query)
workflow.add_edge("retrieve", "grade")
workflow.add_conditional_edges("grade", decide_next)  # grade -> rewrite | generate
workflow.add_edge("rewrite", "retrieve")              # the corrective loop
```

Note that routing lives in `decide_next`, not inside `grade_documents`: in a graph framework, nodes return state updates while conditional edges decide where to go next.
4. Long-Term Memory (The missing link)
Standard RAG is stateless. Agentic RAG adds Episodic Memory.
- Session Memory: "Wait, you mentioned 'Project X' earlier, did you mean the same one?"
- User Modeling: "This user prefers technical summaries, not marketing fluff."
This memory is stored not just as text, but often as a Knowledge Graph updated by the agent itself.
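A minimal sketch of session memory, assuming a simple in-process store (the `EpisodicMemory` class and its keyword-based `recall` are illustrative, not a real library API; production systems typically back this with a vector store or knowledge graph):

```python
# Hypothetical sketch of session-level episodic memory.
class EpisodicMemory:
    def __init__(self):
        self.turns = []          # full conversation history
        self.user_profile = {}   # learned preferences, e.g. summary style

    def remember(self, role, text):
        self.turns.append((role, text))

    def recall(self, keyword):
        # naive keyword lookup; a real agent would use semantic search
        return [text for role, text in self.turns
                if keyword.lower() in text.lower()]

mem = EpisodicMemory()
mem.remember("user", "Let's review Project X budgets first.")
mem.remember("assistant", "Project X is 12% over budget.")
mem.user_profile["style"] = "technical summaries"
print(mem.recall("project x"))
```

With this in place, the agent can resolve "did you mean the same Project X?" by recalling earlier turns instead of re-retrieving from scratch.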
5. When to use Agentic RAG?
It is slower and more expensive than Naive RAG.
| Scenario | Use Naive RAG | Use Agentic RAG |
|---|---|---|
| Simple Lookup | Yes (Fast/Cheap) | No (Overkill) |
| Comparisons | No | Yes |
| Research/Summary | No | Yes |
| Q&A on <10 docs | Yes | No |
| Q&A on >10k docs | No | Yes |
As we move into 2025, "One-Shot RAG" will settle into the role of a cheap utility for simple lookups, while Agentic RAG becomes the standard for knowledge-worker tasks (analysts, lawyers, researchers).