
Agentic RAG: Evolution of Retrieval
Move beyond 'simple' retrieval. Learn why agentic RAG—where the model decides when and how to search—is the future of factual AI systems.
Evolution from Simple RAG to Agentic RAG
In the early days of LLM apps, we had "Simple RAG."
- User asks question.
- Code searches database.
- Code gives results to LLM.
- LLM answers.
This is a Deterministic Pipe. It fails if the search query is slightly off or if the information is spread across multiple documents. In this lesson, we introduce Agentic RAG, where the agent is in control of the search process itself.
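Roughly, the whole Simple RAG pipeline fits in one function. Here is a minimal sketch (the names db and llm are placeholders for any vector store with a similarity_search method and any chat model with an invoke method):

def simple_rag(question: str, db, llm) -> str:
    # Retrieval happens exactly once, before the model starts thinking.
    docs = db.similarity_search(question, k=5)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    # Single shot: if the chunks are wrong, there is no way to search again.
    return llm.invoke(prompt).content

Notice that the code, not the model, decides what gets searched and how often.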
1. The Retrieval Bottleneck
In Simple RAG, the "Retrieval" step happens before the model starts thinking. This is like a chef being handed a basket of ingredients they didn't ask for. If the basket is missing the salt, the dish is ruined.
The Problem: Single-Shot Failure
If the vector search returns irrelevant chunks, the LLM has no way to say "Try again." It just tries to Hallucinate an answer from the bad data.
2. What is Agentic RAG?
In Agentic RAG, the "Search" is a Tool that the model can call as many times as it wants.
The Agentic Workflow (sketched in code after this list):
- User asks question.
- Agent thinks: "I need to know the price of X and the stock level of Y."
- Agent calls search(query="Price of X").
- Agent observes results.
- Agent thinks: "I have the price, but the result for Y was vague. Let me try a different search for 'Y inventory level'."
- Agent calls search again.
- Agent gives final answer.
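In plain Python, that loop looks roughly like this. It is a sketch, assuming llm_with_tools is a tool-calling chat model with the search tool bound (e.g. llm.bind_tools([vector_search]), using the vector_search tool defined in section 6 below):

from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage("What is the price of X and the stock level of Y?")]

for _ in range(5):                                   # guardrail: cap the number of turns
    ai_msg = llm_with_tools.invoke(messages)
    messages.append(ai_msg)
    if not ai_msg.tool_calls:                        # the model decided it has enough info
        break
    for call in ai_msg.tool_calls:                   # the model chose what to search for
        result = vector_search.invoke(call["args"])
        messages.append(ToolMessage(str(result), tool_call_id=call["id"]))

The key difference from Simple RAG: the search queries and the number of searches are decided inside the loop, by the model.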
3. The Re-Query Loop
One of the most powerful features of Agentic RAG is Self-Correction.
graph TD
Start --> Search[Search Tool]
Search --> Eval{Is Info Sufficient?}
Eval -->|No| Expand[Query Expansion]
Expand --> Search
Eval -->|Yes| Finish[Final Answer]
By allowing the agent to "Evaluate" the quality of its own retrieved data, you can increase accuracy by 30-40% in complex knowledge domains.
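A stripped-down version of that loop in code (grade_docs and rewrite_query are hypothetical LLM-backed helpers: one judges whether the retrieved chunks answer the question, the other rewrites the query when they do not):

def retrieve_with_recheck(question: str, db, grade_docs, rewrite_query, max_tries: int = 3):
    query = question
    docs = []
    for _ in range(max_tries):
        docs = db.similarity_search(query, k=5)   # Search Tool
        if grade_docs(question, docs):            # Is Info Sufficient?
            return docs                           # Yes -> Final Answer
        query = rewrite_query(question, docs)     # No -> Query Expansion, then search again
    return docs                                   # guardrail: stop after max_tries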
4. Query Expansion and Translation
An agent doesn't just search the user's raw text. It "Expands" the query into several more specific searches (see the sketch after this list).
- User Query: "Why is my internet slow?"
- Agent Queries:
- "Common causes of high latency in residential wifi"
- "ISP outage map for [User Zip Code]"
- "Troubleshooting steps for [User Router Model]"
5. Multi-Step vs. Single-Step Retrieval
- Single-Step: Good for "Facts." ("Who is the CEO of Apple?")
- Multi-Step: Essential for "Analysis." ("Compare Apple's 2023 revenue to Microsoft's.")
- Multi-step questions require the agent to maintain a Scratchpad of what it has found so far and what is left to find (see the sketch below).
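A Scratchpad can be as simple as a small data structure the agent updates between searches (an illustrative shape, not a library API):

from dataclasses import dataclass, field

@dataclass
class Scratchpad:
    goal: str                                             # the overall question
    found: dict[str, str] = field(default_factory=dict)   # fact -> evidence retrieved so far
    missing: list[str] = field(default_factory=list)      # facts still to be retrieved

pad = Scratchpad(
    goal="Compare Apple's 2023 revenue to Microsoft's",
    missing=["Apple FY2023 revenue", "Microsoft FY2023 revenue"],
)
# After each search, the agent moves items from `missing` to `found`
# and only writes the final answer once `missing` is empty.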
6. Implementation Strategy: The Search Tool
In LangGraph, we define the search tool as a node that feeds back into the planning node.
from langchain_core.tools import tool

# `db` is assumed to be an already-initialized vector store (e.g. Chroma or FAISS).

@tool
def vector_search(query: str):
    """
    Search the internal knowledge base for technical documentation.
    Use this multiple times if the first result is not specific enough.
    """
    return db.similarity_search(query, k=5)
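One way to wire that up (a sketch using langgraph and langchain_openai; the model choice and node names are assumptions): the planning node calls the model, a conditional edge routes any tool calls to the search node, and the search results loop back to the planner.

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([vector_search])

def planner(state: MessagesState):
    # Planning node: the model decides whether to answer or to search (again).
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("planner", planner)
builder.add_node("search", ToolNode([vector_search]))   # the search tool as a node
builder.add_edge(START, "planner")
builder.add_conditional_edges("planner", tools_condition, {"tools": "search", END: END})
builder.add_edge("search", "planner")                    # results feed back into the planner
agent = builder.compile()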
Summary and Mental Model
Think of Simple RAG like a Vending Machine. You press a button, you get a snack. If the snack is bad, you're out of luck.
Think of Agentic RAG like a Librarian.
- You ask a question.
- They go to the shelves.
- They come back with a book.
- They ask: "Is this what you meant?"
- If not, they go back and look in a different section.
Agentic RAG is a conversation with a database.
Exercise: RAG Evolution
- Comparison: Why would an agent be better at answering "Who won the most Oscars between 1990 and 2000?" than a simple RAG system?
- (Hint: How many search queries are needed to answer this?)
- Logic: Draft a "System Message" that tells an agent to Verify the information it finds.
- "If you find two conflicting dates for the same event, you must..."
- Threshold: When should an agent STOP searching?
- (Hint: Review the Max Turns and Max Tokens guardrails in Module 3.3.)
Ready to optimize the actual search? Next lesson: Self-Reranking and Query Expansion.