Hybrid Search: Keyword + Vector

Combine the semantic power of vector search with the keyword precision of traditional BM25 search.

Hybrid search is a common choice for production RAG. It combines Vector Search (semantic similarity) with Keyword Search (exact term matching), so a query can succeed whether it depends on meaning or on a specific token.

Why Hybrid?

  • Vector Search excels at: "What is the general topic here?"
  • Keyword Search (like BM25) excels at: "Find the exact part number GTX-9080."

If a user searches for GTX-9080, a vector search might return a paragraph about generic "graphics cards." A hybrid search will find the exact technical spec match.

How it Works: Reciprocal Rank Fusion (RRF)

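We can't just "add" a vector score (0.92) to a BM25 score (14.5): the two live on different scales, so the larger one would dominate. We need a way to combine them that ignores raw scores. RRF is a popular algorithm that does this by looking only at positions: every document receives 1 / (k + rank) from each result list it appears in, where k is a small smoothing constant, and those contributions are summed into one combined score.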

# Conceptual Hybrid Search Logic
vector_results = vector_search(query) # Ranked 1, 2, 3...
keyword_results = keyword_search(query) # Ranked 1, 2, 3...

# RRF calculates a combined score for each document
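To make the conceptual logic above concrete, here is a minimal sketch of RRF in plain Python. The function name reciprocal_rank_fusion, the document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration, not part of any particular library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a requirement.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists for the query "GTX-9080".
vector_ranking = ["doc_gpu_overview", "doc_gtx_9080_spec", "doc_cooling"]
keyword_ranking = ["doc_gtx_9080_spec", "doc_price_list"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this toy example the spec document comes out on top because it is the only one that appears in both lists, even though the vector search alone ranked it second.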

Implementing Hybrid Retrieval with LangChain

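# BM25Retriever depends on the rank_bm25 package (pip install rank_bm25)
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_chroma import Chroma

# 1. Setup Vector Search
vectorstore = Chroma(...)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# 2. Setup Keyword Search
# all_docs is assumed to be a list of strings (the raw text of each chunk)
bm25_retriever = BM25Retriever.from_texts(all_docs)
bm25_retriever.k = 5

# 3. Ensemble (Hybrid)
# weights are matched to retrievers by position: first BM25, then vector
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5]
)

# 4. Query both retrievers at once and get a single fused list of documents
results = ensemble_retriever.invoke("GTX-9080")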

Tuning the Weights

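You can adjust the weight of Vector vs. Keyword. The weights list follows the order of the retrievers list, so in the example above the first weight applies to BM25 and the second to the vector retriever: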

  • Technical/Legal Docs: Weight Keyword higher (e.g., 0.7) for precision.
  • Creative/Daily Chat: Weight Vector higher (e.g., 0.7) for semantic flexibility.
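The effect of these weights can be illustrated with a small weighted variant of RRF, in which each list's contribution is multiplied by its weight. This is a sketch under assumptions: the function weighted_rrf, the document names, and k=60 are hypothetical, and the code is not meant to reproduce LangChain's internals line for line.

```python
def weighted_rrf(rankings, weights, k=60):
    """Rank documents by the sum of weight / (k + rank) over all lists."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest score first; ties are broken by doc_id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# The two retrievers disagree about which document is best.
keyword_ranking = ["spec_sheet", "overview"]
vector_ranking = ["overview", "spec_sheet"]
rankings = [keyword_ranking, vector_ranking]

# Keyword-heavy weighting, as suggested for technical or legal documents.
keyword_heavy = weighted_rrf(rankings, weights=[0.7, 0.3])

# Vector-heavy weighting, as suggested for conversational content.
vector_heavy = weighted_rrf(rankings, weights=[0.3, 0.7])

print("keyword-heavy:", keyword_heavy)
print("vector-heavy: ", vector_heavy)
```

When the two retrievers disagree, the weights decide which one wins: the keyword-heavy setting puts spec_sheet first, and the vector-heavy setting puts overview first.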

Exercises

  1. Run an experiment: search for a specific product ID (e.g., SKU-999) using pure vector search. Does it find it in the top 3?
  2. Compare the result with a hybrid search.
  3. Why does hybrid search improve "Trust" in a RAG system?
