
Beyond Basic RAG: Agents as Intelligent Retrievers
Elevate your Retrieval-Augmented Generation. Learn how to transform agents into intelligent retrievers that can self-query, re-rank results, and perform multi-hop reasoning across massive knowledge bases.
In its simplest form, RAG (Retrieval-Augmented Generation) is a "Naive" process: take the user's query, convert it into a vector, find the top 3 most similar chunks in a database, and send them to the model. This works for simple questions like "Who is the CEO?", but it fails for complex ones like "How did the 2024 revenue growth compare to the 2022 projections while accounting for the X-acquisition?".
In the Gemini ADK ecosystem, we move from "Naive RAG" to "Agentic RAG." Instead of the system simply giving data to the model, we turn the model into the Search Architect. In this lesson, we will explore self-querying, re-ranking, and multi-hop retrieval patterns.
1. Naive RAG vs. Agentic RAG
| Feature | Naive RAG | Agentic RAG |
|---|---|---|
| Search Path | Single, direct search. | Iterative, multi-step search. |
| Query Logic | Raw user string. | Model-optimized "Search Intent." |
| Filter Logic | None (Top K). | Semantic + Metadata filters. |
| Success Rate | Moderate for facts. | High for complex synthesis. |
2. The Agent as a Self-Querier
Users are often bad at writing search queries. They use vague language or combine three questions into one.
In Agentic RAG, the agent's first task is to Rewrite the Query.
- User: "Tell me about the money stuff from last year."
- Agent Reasoning: "The user is asking about financial performance for the 2024 fiscal year. I will search for 'Annual Report 2024' and filter by 'Financial Statements'."
Implementation: Metadata Filtering
If your Vector DB supports metadata (e.g., year: 2024, department: finance), the agent can generate a structured tool call like search_docs(query="revenue", filter={"year": 2024}). This is far more precise than a raw vector search.
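A minimal sketch of what that structured call could look like. The `search_docs` helper below is a stand-in for your actual vector DB client; only the shape of the call matters:

```python
def search_docs(query: str, filter: dict | None = None) -> list[str]:
    """Vector search restricted to chunks whose metadata matches `filter`.
    In practice this wraps your DB client, e.g. something like
    db.similarity_search(query, k=5, where=filter)."""
    return [f"[chunk matching '{query}' where {filter}]"]  # simulated result

# The agent emits a structured call instead of embedding the raw user string:
results = search_docs(query="revenue", filter={"year": 2024, "department": "finance"})
```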
3. Re-Ranking: The Agent as an Editor
A vector database returns results based on mathematical similarity, not on "truth" or "usefulness."
The Re-Ranking Pattern:
- Retrieve: Fetch the top 20 results (deliberately more than the final answer needs).
- Evaluate: The agent reads all 20 and discards the irrelevant ones.
- Process: The agent only uses the "Golden" results to build the final answer.
Why it works: Gemini is much better at evaluating relevance than a vector algorithm is at predicting it.
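One way to sketch the Evaluate step, assuming the `google.generativeai` SDK is already configured with an API key; the `rerank` helper and its scoring prompt are illustrative, not a library API:

```python
import google.generativeai as genai

model = genai.GenerativeModel('gemini-1.5-flash')

def rerank(question: str, candidates: list[str], keep: int = 3) -> list[str]:
    """Ask the model to grade each retrieved chunk; keep only the 'Golden' ones."""
    scored = []
    for chunk in candidates:
        verdict = model.generate_content(
            f"Question: {question}\nChunk: {chunk}\n"
            "Rate this chunk's relevance from 0 to 10. Respond with the number only."
        ).text
        try:
            scored.append((int(verdict.strip()), chunk))
        except ValueError:
            continue  # discard chunks the model failed to score cleanly
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:keep]]
```

In production you would batch all 20 chunks into a single grading prompt to save latency and tokens; one call per chunk simply keeps the sketch readable.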
4. Multi-Hop Retrieval (Connecting the Dots)
Some questions require information from multiple sources that are not "similar" in vector space.
- Question: "Did our hiring in Berlin exceed the budget set by the London team?"
- Hop 1: Search for "Berlin hiring count."
- Hop 2: Search for "London team budget allocations."
- Hop 3: Synthesize the two unrelated data points.
```mermaid
graph TD
A[User Query] --> B[Supervisor Agent]
B --> C[Step 1: Get Data A]
C --> D[Retrieve from Source 1]
D --> B
B --> E[Step 2: Get Data B based on A]
E --> F[Retrieve from Source 2]
F --> B
B --> G[Final Synthesis]
style B fill:#4285F4,color:#fff
```
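A compact sketch of the supervisor loop from the diagram, where the second query is derived from the first hop's result. The `search` callable and the prompts are assumptions standing in for your retrieval tool:

```python
import google.generativeai as genai

model = genai.GenerativeModel('gemini-1.5-flash')

def multi_hop_answer(question: str, search) -> str:
    # Hop 1: retrieve the first data point.
    fact_a = search("Berlin hiring count")
    # Hop 2: let the model derive the next query from what Hop 1 returned.
    next_query = model.generate_content(
        f"To answer '{question}' we already know: {fact_a}\n"
        "What single search query retrieves the missing piece? Reply with the query only."
    ).text.strip()
    fact_b = search(next_query)
    # Hop 3: synthesize two facts that never sit near each other in vector space.
    return model.generate_content(
        f"Question: {question}\nFact A: {fact_a}\nFact B: {fact_b}\n"
        "Answer using both facts."
    ).text
```

Note how control returns to the supervisor between hops, exactly as the diagram shows.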
5. Recursive Retrieval (Deep Diving)
Sometimes a search result mentions another document: "For details, see the 2024 Strategy PDF."
A Recursive Agent will see this reference, realize its current context is insufficient, and autonomously trigger a second search for that specific PDF. This mimics how a human researcher follows citations.
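A sketch of this follow-the-citation behavior. The regex is a deliberately naive assumption; a real corpus needs a sturdier reference detector:

```python
import re

def recursive_retrieve(query: str, search, max_depth: int = 2) -> list[str]:
    """Follow 'see the <Document> PDF' style references inside retrieved chunks,
    the way a human researcher chases citations."""
    chunks = search(query)
    if max_depth == 0:
        return chunks
    extra = []
    for chunk in chunks:
        for ref in re.findall(r"see the ([\w\s-]+ PDF)", chunk):
            # The chunk points elsewhere: trigger a second, targeted search.
            extra.extend(recursive_retrieve(ref, search, max_depth - 1))
    return chunks + extra
```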
6. Implementation: The Query-Optimizer Tool
Let's look at how we can implement an agent that "optimizes" a search before calling a tool.
```python
import google.generativeai as genai

# Assumes genai.configure(api_key=...) has already been called.
model = genai.GenerativeModel('gemini-1.5-flash')

def search_knowledge_base(optimized_query: str, date_filter: str | None = None):
    # Simulated Vector DB call
    return f"Data for {optimized_query} in {date_filter}: [Sales increased 5%]"

def agentic_rag_loop(user_query: str):
    # 1. Use Gemini to generate the SEARCH PLAN
    planner_prompt = (
        f"The user asked: '{user_query}'. "
        "What is the single best technical search query to find this? "
        "Respond like this: QUERY: <text> | DATE: <year>"
    )
    plan = model.generate_content(planner_prompt).text

    # Parse the plan (simplified; assumes the model follows the requested format)
    q = plan.split("QUERY:")[1].split("|")[0].strip()
    d = plan.split("DATE:")[1].strip()

    # 2. Execute the optimized search
    result = search_knowledge_base(q, d)

    # 3. Final Answer
    final_resp = model.generate_content(f"Answer '{user_query}' using: {result}")
    return final_resp.text

# agentic_rag_loop("How was our growth last year?")
```
7. When to Stop? (Retrieval Termination)
A dangerous failure mode of Agentic RAG is the "Retrieval Loop"—the agent keeps searching and searching, never feeling "ready" to answer.
Boundary Controls:
- Max Hops: Limit the agent to 3 search attempts.
- Sufficiency Gate: Ask the agent: "Do you have enough information to answer definitively? If No, explain what is missing."
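Both controls fit in a few lines. A minimal sketch, assuming a `search` callable and the `model` from the implementation above; the gate's reply format is an assumption:

```python
def gated_answer(question: str, search, max_hops: int = 3) -> str:
    context, query = [], question
    for _ in range(max_hops):  # Boundary 1: hard cap on search attempts
        context.append(search(query))
        gate = model.generate_content(  # Boundary 2: sufficiency gate
            f"Question: {question}\nContext so far: {context}\n"
            "Do you have enough information to answer definitively? "
            "Reply 'YES', or 'NO: <what is missing>'."
        ).text.strip()
        if gate.upper().startswith("YES"):
            break
        # Aim the next hop at whatever the gate said was missing.
        query = gate.partition(":")[2].strip() or question
    return model.generate_content(f"Answer '{question}' using: {context}").text
```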
8. Summary and Exercises
Agentic RAG turns retrieval into a Conversation.
- Self-Querying improves the "Input" to the database.
- Re-Ranking improves the "Quality" of the context.
- Multi-Hop Retrieval connects disparate facts.
- Gatekeeping prevents "Hallucination by Fragment."
Exercises
- Query Generation: Write 3 different agent "Rewrite" prompts for the user query: "What's the deal with that new project?". Which one is most likely to find the correct data?
- Multi-Hop Logic: You need to calculate the "Profit per Employee." Describe the two separate RAG "Hops" you would need to perform.
- Conflict Resolution: What if Search Result A says the budget is $10k and Search Result B says it's $15k? Write a system instruction for an agent on how to handle Contradictory Results.
In the next lesson, we will look at Chunking Strategies, exploring how to prepare your data so the agent can digest it easily.