
Multi-Query Retrieval
Overcome semantic ambiguity by generating and searching multiple variations of a user query.
User queries are often short, ambiguous, or poorly phrased. If the user asks "How do I fix this?", a vector search won't know what "this" is. Multi-Query Retrieval uses an LLM to expand a single query into multiple variations, casting a wider net for relevant documents.
The Strategy
- User Query: "Budget issues."
- LLM Rewrite: Generate 3-5 variations, for example:
  - "Recent financial reports on budget deficits."
  - "Quarterly spending vs. revenue projections."
  - "How to handle budget overruns."
- Execute All Queries: Run a vector DB search for each variation.
- Aggregate: Deduplicate the combined results (optionally re-ranking them with Reciprocal Rank Fusion) and send the unique set to the final RAG generation step; a sketch of this loop follows the list.
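A minimal sketch of this expand-search-fuse loop. Here `generate_variations` (an LLM call returning rewrites) and `vector_search` (a ranked search over your vector DB returning document IDs) are hypothetical placeholders for your own stack; the `rrf_k = 60` default follows the constant used in the original Reciprocal Rank Fusion paper.

```python
from collections import defaultdict

def multi_query_search(query: str, k: int = 5, rrf_k: int = 60) -> list[str]:
    # `generate_variations` is a hypothetical LLM call returning 3-5 rewrites;
    # `vector_search` is a hypothetical ranked search over your vector DB.
    variations = generate_variations(query) + [query]  # keep the original query too
    scores = defaultdict(float)
    for q in variations:
        for rank, doc_id in enumerate(vector_search(q, k=k)):
            # Reciprocal Rank Fusion: a document that ranks highly across
            # several variations accumulates the largest combined score.
            scores[doc_id] += 1.0 / (rrf_k + rank + 1)
    # Highest fused score first; this unique set goes on to generation.
    return sorted(scores, key=scores.get, reverse=True)
```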
Why It Works
Semantic search is sensitive to phrasing. By asking the question in several different ways, you are more likely to find the exact "neighborhood" in the vector space where the answer lives.
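You can observe this sensitivity directly by embedding a few paraphrases and comparing them. A minimal sketch, assuming the sentence-transformers package is installed (the model choice is illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
queries = [
    "Budget issues.",
    "Recent financial reports on budget deficits.",
    "How to handle budget overruns.",
]
embeddings = model.encode(queries)
# Pairwise cosine similarities: even close paraphrases rarely score 1.0,
# so each variation retrieves a slightly different neighborhood.
print(util.cos_sim(embeddings, embeddings))
```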
Implementation with LangChain
```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# Assumes `vector_db` is an existing vector store (e.g., Chroma or FAISS)
# already populated with your documents.
retriever = MultiQueryRetriever.from_llm(
    retriever=vector_db.as_retriever(),
    llm=llm,
)

# Generates query variations, runs each search, and returns
# the unique union of the retrieved documents.
unique_docs = retriever.invoke("How do I set up a VPC?")
```
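To inspect the variations the retriever actually generated, enable INFO logging on the `langchain.retrievers.multi_query` logger, a debugging pattern from the LangChain docs:

```python
import logging

# Surface the generated query variations for debugging.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

unique_docs = retriever.invoke("How do I set up a VPC?")
print(f"Retrieved {len(unique_docs)} unique documents")
```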
Benefits for Complex Topics
Multi-query is particularly powerful for:
- Technical Support: Where users describe symptoms differently.
- Legal Research: Where synonyms (e.g., "liable" vs "responsible") matter.
- Multimodal Search: Using various text descriptions to find a single complex image.
Exercises
- Manually write 3 different ways to ask "What is the capital of France?".
- Perform a vector search for each. Do they return the same top result?
- How many queries are "too many"? What is the impact on latency and cost?
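A starting point for the first two exercises, reusing the `vector_db` store from the implementation section (the paraphrases are illustrative):

```python
# Run the same search with three phrasings and compare the top hits.
paraphrases = [
    "What is the capital of France?",
    "Which city is France's capital?",
    "Name the French capital city.",
]
for q in paraphrases:
    top = vector_db.similarity_search(q, k=1)[0]
    print(f"{q!r} -> {top.page_content[:80]!r}")
```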