
Multi-Query Retrieval
Overcome semantic ambiguity by generating and searching multiple variations of a user query.
User queries are often short, ambiguous, or poorly phrased. If the user asks "How do I fix this?", a vector search won't know what "this" is. Multi-Query Retrieval uses an LLM to expand a single query into multiple variations, casting a wider net for relevant documents.
The Strategy
- User Query: "Budget issues."
- LLM Rewrite: Generate 3-5 variations, for example:
  - "Recent financial reports on budget deficits."
  - "Quarterly spending vs. revenue projections."
  - "How to handle budget overruns."
- Execute All Queries: Run a vector DB search for each variation.
- Aggregate: Deduplicate the combined results (optionally re-ranking them with Reciprocal Rank Fusion) and send the unique set to the final RAG generation step; a sketch of this loop follows the list.
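A minimal sketch of this expand-search-fuse loop. Here `generate_variations` (an LLM call returning rewrites) and `vector_search` (a ranked search over your vector DB returning document IDs) are hypothetical placeholders for your own stack; the `rrf_k = 60` default follows the constant used in the original Reciprocal Rank Fusion paper.

```python
from collections import defaultdict

def multi_query_search(query: str, k: int = 5, rrf_k: int = 60) -> list[str]:
    # `generate_variations` is a hypothetical LLM call returning 3-5 rewrites;
    # `vector_search` is a hypothetical ranked search over your vector DB.
    variations = generate_variations(query) + [query]  # keep the original query too
    scores = defaultdict(float)
    for q in variations:
        for rank, doc_id in enumerate(vector_search(q, k=k)):
            # Reciprocal Rank Fusion: a document that ranks highly across
            # several variations accumulates the largest combined score.
            scores[doc_id] += 1.0 / (rrf_k + rank + 1)
    # Highest fused score first; this unique set goes on to generation.
    return sorted(scores, key=scores.get, reverse=True)
```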
Why It Works
Semantic search is sensitive to phrasing. By asking the question in several different ways, you are more likely to find the exact "neighborhood" in the vector space where the answer lives.
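You can observe this sensitivity directly by embedding a few paraphrases and comparing them. A minimal sketch, assuming the sentence-transformers package is installed (the model choice is illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
queries = [
    "Budget issues.",
    "Recent financial reports on budget deficits.",
    "How to handle budget overruns.",
]
embeddings = model.encode(queries)
# Pairwise cosine similarities: even close paraphrases rarely score 1.0,
# so each variation retrieves a slightly different neighborhood.
print(util.cos_sim(embeddings, embeddings))
```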
Implementation with LangChain
```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# Assumes `vector_db` is an existing vector store (e.g., Chroma or FAISS)
# already populated with your documents.
retriever = MultiQueryRetriever.from_llm(
    retriever=vector_db.as_retriever(),
    llm=llm,
)

# Generates query variations, runs each search, and returns
# the unique union of the retrieved documents.
unique_docs = retriever.invoke("How do I set up a VPC?")
```
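To inspect the variations the retriever actually generated, enable INFO logging on the `langchain.retrievers.multi_query` logger, a debugging pattern from the LangChain docs:

```python
import logging

# Surface the generated query variations for debugging.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

unique_docs = retriever.invoke("How do I set up a VPC?")
print(f"Retrieved {len(unique_docs)} unique documents")
```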
Benefits for Complex Topics
Multi-query is particularly powerful for:
- Technical Support: Where users describe symptoms differently.
- Legal Research: Where synonyms (e.g., "liable" vs "responsible") matter.
- Multimodal Search: Using various text descriptions to find a single complex image.
Exercises
- Manually write 3 different ways to ask "What is the capital of France?".
- Perform a vector search for each. Do they return the same top result?
- How many queries are "too many"? What is the impact on latency and cost?
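A starting point for the first two exercises, reusing the `vector_db` store from the implementation section (the paraphrases are illustrative):

```python
# Run the same search with three phrasings and compare the top hits.
paraphrases = [
    "What is the capital of France?",
    "Which city is France's capital?",
    "Name the French capital city.",
]
for q in paraphrases:
    top = vector_db.similarity_search(q, k=1)[0]
    print(f"{q!r} -> {top.page_content[:80]!r}")
```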