Why Initial Retrieval Is Not Enough

Raw vector search (top-K retrieval) is fast and captures semantic similarity, but similarity is not the same as relevance: the closest vectors are often on-topic without actually answering the query. Re-ranking exists to close that gap.

The Semantic Gap

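Vector models (encoders) are built to be fast enough to search millions of documents. To get that speed, they typically compress each document into a single vector ahead of time, without seeing the query, so the fine details that decide whether a document actually answers the question are often lost.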

Example

Query: "Is feature X supported in version 2.0?"

Rank 1 result: "How to install version 2.0." (High semantic overlap, but it does not answer the question.)
Rank 10 result: "List of supported features in version 2.0." (The best answer, but the first-pass model ranked it too low to be noticed.)

The Recall vs. Precision Balancing Act

First-Pass (Retrieval): Prioritize recall. Use vector similarity to pull a wide set of candidates (for example, 100) quickly.
Second-Pass (Re-Ranking): Prioritize precision. Use a slower but more accurate model to score each of those 100 candidates against the query and move the best ones to the top.
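
The two passes above can be sketched as a small pipeline. This is a minimal illustration, not a production setup: `cheap_score` (token-set overlap) and `careful_score` (order-aware token matching via the standard library's `difflib`) are toy stand-ins chosen for this example in place of a real embedding model and a real re-ranking model, and the function names, the extra corpus entries, and the values of `k` and `top_n` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, List

Scorer = Callable[[str, str], float]


def tokens(text: str) -> List[str]:
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(".,?!\"'") for w in text.lower().split())
    return [w for w in words if w]


def cheap_score(query: str, doc: str) -> float:
    """Toy stand-in for vector similarity: Jaccard overlap of token sets."""
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q | d) if (q | d) else 0.0


def careful_score(query: str, doc: str) -> float:
    """Toy stand-in for a slower re-ranker: order-aware token matching."""
    return SequenceMatcher(None, tokens(query), tokens(doc)).ratio()


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_pass: Scorer,
    second_pass: Scorer,
    k: int = 100,
    top_n: int = 3,
) -> List[str]:
    # First pass (recall): score every document cheaply and keep the k best.
    candidates = sorted(
        corpus, key=lambda doc: first_pass(query, doc), reverse=True
    )[:k]
    # Second pass (precision): run the expensive scorer on those k only.
    reranked = sorted(
        candidates, key=lambda doc: second_pass(query, doc), reverse=True
    )
    return reranked[:top_n]


# Hypothetical corpus; the first two entries come from the example above.
corpus = [
    "How to install version 2.0.",
    "List of supported features in version 2.0.",
    "Release notes for version 1.5.",
    "Troubleshooting login issues.",
]

# Count how often the expensive scorer runs.
calls = {"second_pass": 0}


def counted_careful(query: str, doc: str) -> float:
    calls["second_pass"] += 1
    return careful_score(query, doc)


top = retrieve_then_rerank(
    "Is feature X supported in version 2.0?",
    corpus,
    cheap_score,
    counted_careful,
    k=2,
    top_n=1,
)
print(top, calls)
```

The point of the design is visible in `calls`: the expensive scorer runs once per candidate (`k` times), not once per document in the corpus, which is what makes a slow model affordable in the second pass.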

Factors that Degrade Raw Retrieval

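Short Queries: A query like "Login issues" carries too little context for the encoder to tell which of many related documents the user wants.
Polysemy: Words with multiple meanings (e.g., "bank" as in a river bank vs. a financial bank) can pull in documents about the wrong sense.
Domain-Specific Jargon: Common in medical or deep-tech fields, where the vocabulary may be poorly represented in the data the encoder was trained on.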

The Human Factor

Users lose trust in a RAG system if the first few results are irrelevant. Re-ranking makes it far more likely that the most relevant content appears at the very top, even if the vector model did not initially place it there.

Exercises

Perform a vector search for a question in your documentation.
Are the top 3 results actually the best answers, or just semantically related?
How many documents would you be willing to send to a "Slow but Smart" Re-ranker? 10? 100? 1000?
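
To put numbers on the second exercise, you can label which results truly answer the question and compute two standard ranking metrics. The sketch below is one possible way to do it, assuming you have judged the results by hand; the document IDs and both rankings are hypothetical placeholders, not real search output.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def reciprocal_rank(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


# Hypothetical rankings for one query, before and after re-ranking.
before = ["install-guide", "changelog", "faq", "feature-list"]
after = ["feature-list", "install-guide", "changelog", "faq"]
relevant = {"feature-list"}  # judged by hand

print(precision_at_k(before, relevant, 3), reciprocal_rank(before, relevant))
print(precision_at_k(after, relevant, 3), reciprocal_rank(after, relevant))
```

If precision in the top 3 is low while the relevant documents still appear somewhere in the retrieved set, the first pass is doing its job on recall and a re-ranker is the likely fix.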