
Re-Ranking Strategies
Master the different types of re-rankers, from Cohere to BGE, and learn where to place them in your RAG pipeline.
Once you have your initial list of "potential" documents, you need to re-rank them to ensure the absolute best match is at position #1.
Type 1: Cross-Encoders (The Gold Standard)
Unlike "Bi-Encoders" (which embed questions and docs separately), a Cross-Encoder takes both the query and the document simultaneously and calculates a relevance score.
- Pro: Extremely accurate. It can see the direct interaction between query words and document words.
- Con: Slow. You can't use this to search 1 million docs, but it's perfect for re-ranking the top 50.
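A minimal sketch with the sentence-transformers library; the query and documents are made up for illustration:
from sentence_transformers import CrossEncoder

# A small cross-encoder fine-tuned on MS MARCO passage ranking
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I reset my password?"
docs = [
    "Go to Settings > Security and click 'Reset password'.",
    "Our password policy requires at least 12 characters.",
    "The office is closed on public holidays.",
]

# Score each (query, document) pair jointly, then sort best-first
scores = model.predict([(query, doc) for doc in docs])
ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")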
Type 2: LLM-as-a-Reranker
You can ask an LLM (Claude or GPT) to act as a judge.
# Build a numbered list of the candidate documents for the prompt
doc_list = "\n".join(f"{i + 1}. {doc}" for i, doc in enumerate(docs))
prompt = f"""
Query: {user_query}

Documents:
{doc_list}

Rank these documents from most to least relevant to the query.
Return only the document numbers, comma-separated, most relevant first.
"""
- Pro: Strong reasoning; it can pick up on nuanced intent that purpose-built rerankers often miss.
- Con: Expensive and adds significant latency.
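To actually run this, send the prompt to your LLM and map the returned ordering back onto the documents. A minimal sketch using the Anthropic Python SDK, assuming prompt and docs are defined as above; the model name is illustrative, and real code should validate the reply before parsing it:
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative; use whichever model you have access to
    max_tokens=100,
    messages=[{"role": "user", "content": prompt}],
)

# Parse a reply like "3, 1, 2" back into a re-ordered document list
order = [int(n) - 1 for n in response.content[0].text.split(",")]
reranked_docs = [docs[i] for i in order]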
Type 3: Hosted Rerankers (e.g., Cohere)
Cohere Rerank is one of the most widely used hosted re-rankers. It is easy to integrate and very effective out of the box.
import cohere

co = cohere.Client('your_api_key')

results = co.rerank(
    query=query,
    documents=docs,
    top_n=3,
    model='rerank-english-v3.0',
)
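Each entry in results.results exposes the index of the document in your original list and a relevance score, so you can map the ranking back onto your own data (a small sketch, assuming docs is the list passed to rerank above):
top_docs = [docs[r.index] for r in results.results]  # best-first order
for r in results.results:
    print(f"{r.relevance_score:.3f}  {docs[r.index]}")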
The "RAG Sandwich" Pipeline
1. Search: retrieve 100 candidate docs from Chroma (vector search).
2. Filter: drop docs based on metadata (e.g., non-admin content), leaving perhaps 50.
3. Re-rank: score the remaining docs with a Cross-Encoder.
4. Top-K: send the top 5 to the LLM for the final answer.
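Putting those steps together, here is a minimal sketch. The collection name, the audience metadata field, and the example query are assumptions for illustration; the metadata filter is applied at query time via Chroma's where clause rather than as a separate pass.
import chromadb
from sentence_transformers import CrossEncoder

client = chromadb.Client()
collection = client.get_or_create_collection("docs")  # assumed collection name
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate my API keys?"  # example query

# Steps 1 + 2: vector search in Chroma, filtered by metadata
hits = collection.query(
    query_texts=[query],
    n_results=100,
    where={"audience": "public"},  # assumed metadata field
)
candidates = hits["documents"][0]

# Step 3: score every (query, doc) pair with the cross-encoder
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)

# Step 4: keep only the top 5 for the LLM's context window
top_docs = [doc for doc, _ in ranked[:5]]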
Exercises
- Use a library like sentence-transformers and the model cross-encoder/ms-marco-MiniLM-L-6-v2.
- Compare the ranking of 5 sentences before and after re-ranking.
- How does the "Cost" of re-ranking change as you increase the number of documents to be ranked?