
The Quality Filter: Self-Reranking and Query Expansion
Master the techniques for high-precision retrieval. Learn how to use 'LLM-as-a-Judge' to re-rank search results and how to generate multiple search variations for better coverage.
In a production RAG system, your search tool might return 20 "Chunks" of information. Usually, only 2 or 3 of those are actually useful for answering the user's question. If you send all 20 chunks to the LLM, you are wasting tokens and increasing the risk of "Context Pollution."
In this lesson, we will learn two advanced patterns: Query Expansion (finding more stuff) and Self-Reranking (throwing away the junk).
1. Query Expansion (The Multi-Query Pattern)
A user's query is often "Semantically Thin"—it doesn't have enough specific keywords to trigger a good vector match.
The Expansion Loop
- User asks: "How do I fix the error with the database connection?"
- Expansion Node: An LLM generates 3 variations of the query:
  - "FastAPI PostgreSQL connection timeout fix"
  - "database connection pool exhausted error"
  - "psycopg2.OperationalError: could not connect to server"
- Execution: The agent runs all 3 searches in parallel.
- Result: You now have a much broader net of information (sketched in code below).
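Here is a minimal sketch of this loop in Python. The `llm_complete` and `vector_search` helpers are hypothetical placeholders; swap in your actual LLM client and vector store.

```python
from concurrent.futures import ThreadPoolExecutor

def llm_complete(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return the raw completion text."""
    raise NotImplementedError

def vector_search(query: str, k: int = 10) -> list[str]:
    """Placeholder: query your vector store and return the top-k chunks."""
    raise NotImplementedError

def expand_query(user_query: str, n: int = 3) -> list[str]:
    # Ask the model for n rephrasings that add likely technical keywords.
    prompt = (
        f"Rewrite the following question as {n} specific search queries, "
        "one per line. Add likely error messages, library names, or "
        f"technical keywords:\n\n{user_query}"
    )
    lines = llm_complete(prompt).strip().splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()][:n]

def multi_query_search(user_query: str) -> list[str]:
    queries = expand_query(user_query)
    # Run all searches in parallel; each call returns a list of chunks.
    with ThreadPoolExecutor() as pool:
        results = pool.map(vector_search, queries)
    # Flatten and de-duplicate while preserving order.
    seen: set[str] = set()
    merged: list[str] = []
    for chunks in results:
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```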
2. Self-Reranking (Filtering for Precision)
Vector databases rank results by "Similarity" (math), but your agent needs "Relevance" (meaning).
The "Reranker" Node
Once the search tool returns 20 results, we pass them through a specialized node (often using a Cross-Encoder model or a cheap LLM like GPT-4o-mini).
The Task: "Look at these 20 snippets. Assign a score from 1-10 on how likely they are to help answer the question: [User Query]. Delete everything with a score below 7."
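A minimal LLM-as-a-Judge sketch of that node, reusing the hypothetical `llm_complete` helper from the expansion example; the threshold of 7 mirrors the task prompt above.

```python
def rerank(question: str, snippets: list[str], threshold: int = 7) -> list[str]:
    """Score each snippet 1-10 for relevance and keep only the strong ones."""
    survivors = []
    for snippet in snippets:
        prompt = (
            "On a scale of 1-10, how likely is this snippet to help answer "
            "the question below? Reply with a single integer.\n\n"
            f"Question: {question}\nSnippet: {snippet}"
        )
        try:
            score = int(llm_complete(prompt).strip())
        except ValueError:
            score = 0  # Unparseable reply: treat the snippet as irrelevant.
        if score >= threshold:
            survivors.append(snippet)
    return survivors
```

A dedicated Cross-Encoder model scores each (query, snippet) pair in a single forward pass, which is usually faster and cheaper than prompting an LLM per snippet; the trade-off is that you cannot express custom instructions in the scoring criteria.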
3. Why Reranking is Mandatory for Accuracy
The model answering the question (The Writer) performs significantly better if it is only given high-density context.
- Without Reranking: 80% noise, 20% signal. The model gets distracted.
- With Reranking: 10% noise, 90% signal. The model is precise and confident.
4. Contextual Compression
A search chunk might be 500 words long, but only the 3rd sentence is relevant. Contextual Compression is the process where a "Compressor Node" reads the 500 words and returns only the 10 most relevant words to the main agent.
Advantage: You can fit 5x more "relevant facts" into the same token window.
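A Compressor Node can be a single extraction prompt. This sketch again assumes the hypothetical `llm_complete` helper:

```python
def compress_chunk(question: str, chunk: str) -> str:
    """Return only the sentences from `chunk` that bear on `question`."""
    prompt = (
        "Copy, verbatim, only the sentences from the passage that help "
        "answer the question. If nothing is relevant, reply NONE.\n\n"
        f"Question: {question}\nPassage: {chunk}"
    )
    extracted = llm_complete(prompt).strip()
    return "" if extracted == "NONE" else extracted
```

Asking for verbatim extraction (rather than a summary) matters: it prevents the compressor from paraphrasing facts into something subtly wrong before the Writer ever sees them.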
5. Implementations: The RAG-Fusion Pattern
RAG-Fusion is the combination of Multi-Query and Reciprocal Rank Fusion (RRF).
- Generate multiple queries.
- Search them all.
- Fuse the rankings with RRF: each document scores the sum of 1 / (k + rank) across every result list it appears in (k is a smoothing constant, conventionally 60).
- If a document appears in the top 3 results for all 3 queries, it is almost certainly the "Golden Information."
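RRF itself is a few lines of pure Python, and this example runs as-is:

```python
def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked result lists into one list, best first.

    Each document earns 1 / (k + rank) per list it appears in, so items
    ranked highly across many queries float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# "doc_a" ranks highly in all three lists, so it wins the fusion.
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_a", "doc_c", "doc_d"],
    ["doc_b", "doc_a", "doc_e"],
])
print(fused[0][0])  # doc_a
```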
6. Implementation Strategy: LangGraph Flow
graph LR
  Input -->|Query Expansion| Q1[Query 1]
  Input -->|Query Expansion| Q2[Query 2]
  Input -->|Query Expansion| Q3[Query 3]
  Q1 --> Search[Search DB]
  Q2 --> Search
  Q3 --> Search
  Search -->|20 Results| Rerank[Reranker Node]
  Rerank -->|3 Best Results| Final[Writer Agent]
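A skeleton of that graph, assuming LangGraph's StateGraph API and reusing the hypothetical helpers (`expand_query`, `vector_search`, `rerank`, `llm_complete`) sketched above:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    queries: list[str]
    chunks: list[str]
    answer: str

def expand_node(state: RAGState) -> dict:
    return {"queries": expand_query(state["question"])}

def search_node(state: RAGState) -> dict:
    chunks: list[str] = []
    for query in state["queries"]:
        chunks.extend(vector_search(query))
    return {"chunks": chunks}

def rerank_node(state: RAGState) -> dict:
    return {"chunks": rerank(state["question"], state["chunks"])}

def writer_node(state: RAGState) -> dict:
    context = "\n\n".join(state["chunks"])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {state['question']}"
    return {"answer": llm_complete(prompt)}

builder = StateGraph(RAGState)
builder.add_node("expand", expand_node)
builder.add_node("search", search_node)
builder.add_node("rerank", rerank_node)
builder.add_node("write", writer_node)
builder.add_edge(START, "expand")
builder.add_edge("expand", "search")
builder.add_edge("search", "rerank")
builder.add_edge("rerank", "write")
builder.add_edge("write", END)
graph = builder.compile()

# result = graph.invoke({"question": "How do I fix the database connection error?"})
```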
Summary and Mental Model
Think of Query Expansion like asking three friends for a recommendation: you get different perspectives.
Think of Self-Reranking like reading the back covers of books before deciding which ones to check out: you don't read every word; you just determine whether each book is "in the right ballpark."
Precision in retrieval is the foundation of factual agency.
Exercise: Reranking Design
- The Scoring Logic: You have a search snippet about "Apple (Company)" and the user asked about "Apple (Fruit)".
  - How would a "Semantic Reranker" know the difference?
  - Draft a 1-sentence prompt for the Reranker to handle this specific ambiguity.
- Efficiency: Why is it cheaper to use a Small Model (like Llama 3 8B) for Reranking rather than GPT-4o?
- The Threshold: If your Reranker node deletes ALL the results (Score < 7 for everyone), what should the graph do next?
  - A) Give up.
  - B) Go back to the user and ask for clarification.
  - C) Try a completely different Search Tool (like Wikipedia).

Ready to explore different search types? Next lesson: Vector vs Graph vs Hybrid Search.