
Re-Ranking Strategies
Master the different types of re-rankers, from Cohere to BGE, and learn where to place them in your RAG pipeline.
Once you have your initial list of "potential" documents, you need to re-rank them to ensure the absolute best match is at position #1.
Type 1: Cross-Encoders (The Gold Standard)
Unlike "Bi-Encoders" (which embed questions and docs separately), a Cross-Encoder takes both the query and the document simultaneously and calculates a relevance score.
- Pro: Extremely accurate. It can see the direct interaction between query words and document words.
- Con: Slow. You can't use this to search 1 million docs, but it's perfect for re-ranking the top 50.
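A minimal sketch with the sentence-transformers library; the query and documents are made up for illustration:
from sentence_transformers import CrossEncoder

# A small cross-encoder fine-tuned on MS MARCO passage ranking
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I reset my password?"
docs = [
    "Go to Settings > Security and click 'Reset password'.",
    "Our password policy requires at least 12 characters.",
    "The office is closed on public holidays.",
]

# Score each (query, document) pair jointly, then sort best-first
scores = model.predict([(query, doc) for doc in docs])
ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")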
Type 2: LLM-as-a-Reranker
You can ask an LLM (Claude or GPT) to act as a judge.
# Build a numbered list of the candidate documents for the prompt
doc_list = "\n".join(f"{i + 1}. {doc}" for i, doc in enumerate(docs))
prompt = f"""
Query: {user_query}

Documents:
{doc_list}

Rank these documents from most to least relevant to the query.
Return only the document numbers, comma-separated, most relevant first.
"""
- Pro: Strong reasoning; it can pick up on nuanced intent that purpose-built rerankers often miss.
- Con: Expensive and adds significant latency.
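To actually run this, send the prompt to your LLM and map the returned ordering back onto the documents. A minimal sketch using the Anthropic Python SDK, assuming prompt and docs are defined as above; the model name is illustrative, and real code should validate the reply before parsing it:
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative; use whichever model you have access to
    max_tokens=100,
    messages=[{"role": "user", "content": prompt}],
)

# Parse a reply like "3, 1, 2" back into a re-ordered document list
order = [int(n) - 1 for n in response.content[0].text.split(",")]
reranked_docs = [docs[i] for i in order]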
Type 3: Hosted Rerankers (e.g., Cohere)
Cohere Rerank is one of the most widely used hosted re-rankers. It is easy to integrate and very effective out of the box.
import cohere

co = cohere.Client('your_api_key')

results = co.rerank(
    query=query,
    documents=docs,
    top_n=3,
    model='rerank-english-v3.0',
)
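Each entry in results.results exposes the index of the document in your original list and a relevance score, so you can map the ranking back onto your own data (a small sketch, assuming docs is the list passed to rerank above):
top_docs = [docs[r.index] for r in results.results]  # best-first order
for r in results.results:
    print(f"{r.relevance_score:.3f}  {docs[r.index]}")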
The "RAG Sandwich" Pipeline
1. Search: retrieve 100 candidate docs from Chroma (vector search).
2. Filter: drop docs based on metadata (e.g., non-admin content), leaving perhaps 50.
3. Re-rank: score the remaining docs with a Cross-Encoder.
4. Top-K: send the top 5 to the LLM for the final answer.
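Putting those steps together, here is a minimal sketch. The collection name, the audience metadata field, and the example query are assumptions for illustration; the metadata filter is applied at query time via Chroma's where clause rather than as a separate pass.
import chromadb
from sentence_transformers import CrossEncoder

client = chromadb.Client()
collection = client.get_or_create_collection("docs")  # assumed collection name
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate my API keys?"  # example query

# Steps 1 + 2: vector search in Chroma, filtered by metadata
hits = collection.query(
    query_texts=[query],
    n_results=100,
    where={"audience": "public"},  # assumed metadata field
)
candidates = hits["documents"][0]

# Step 3: score every (query, doc) pair with the cross-encoder
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)

# Step 4: keep only the top 5 for the LLM's context window
top_docs = [doc for doc, _ in ranked[:5]]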
Exercises
- Use a library like sentence-transformers and the model cross-encoder/ms-marco-MiniLM-L-6-v2.
- Compare the ranking of 5 sentences before and after re-ranking.
- How does the "Cost" of re-ranking change as you increase the number of documents to be ranked?