
Why Initial Retrieval is Not Enough
Understand the limitations of raw vector search and why a second pass—Re-Ranking—is essential for production RAG.
Raw vector search (top-k retrieval) is fast and captures semantic similarity, but similarity is not the same as relevance: the top hits are often on-topic without actually answering the query. Re-ranking exists to close that gap.
The Semantic Gap
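Embedding models (encoders) are built to be fast enough to search millions of documents. To do that, they compress each document into a single fixed-size vector, computed ahead of time without seeing the query. Fine distinctions, such as whether a passage answers a question or merely mentions the same topic, are often lost in that compression.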
Example
Query: "Is feature X supported in version 2.0?"
- Rank 1: "How to install version 2.0." (High semantic overlap, but it doesn't answer the question.)
- Rank 10: "List of supported features in version 2.0." (The right answer, but the first-pass model ranked it too low to be seen.)
The Recall vs. Precision Balancing Act
- First-Pass (Retrieval): Prioritize Recall. Pull roughly 100 candidates quickly using vector similarity, accepting that some will be off-target.
- Second-Pass (Re-Ranking): Prioritize Precision. Use a slower but more accurate model to score each of those candidates against the query and move the best ones to the top.
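The two passes above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not a production setup: `embed` is a toy bag-of-words stand-in for a real embedding model, `toy_pair_score` is a hypothetical stand-in for a slower re-ranking model (in practice often a cross-encoder that reads the query and the document together), and the four documents are invented for the example.

```python
import math
from collections import Counter

STOPWORDS = {"is", "in", "the", "to", "for", "and", "how", "of"}

def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]

def embed(text):
    """Toy stand-in for an encoder: a bag-of-words count vector (assumption)."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k):
    """First pass: cheap similarity over the whole collection, keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def toy_pair_score(query, doc):
    """Hypothetical stand-in for a re-ranking model: the fraction of query
    content words (stopwords dropped, plural 's' stripped) found in the doc."""
    def content(text):
        return {t[:-1] if t.endswith("s") and len(t) > 3 else t
                for t in tokenize(text) if t not in STOPWORDS}
    q_terms = content(query)
    return len(q_terms & content(doc)) / len(q_terms) if q_terms else 0.0

def rerank(query, candidates, score_pair, top_n):
    """Second pass: score each (query, candidate) pair, keep the top n."""
    return sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)[:top_n]

query = "Is feature X supported in version 2.0?"
docs = [
    "How to install version 2.0.",
    "Version 2.0 release notes and known issues.",
    "Supported features list for the 2.0 release.",
    "Billing FAQ.",
]

candidates = retrieve(query, docs, k=3)                      # recall-oriented
best = rerank(query, candidates, toy_pair_score, top_n=1)    # precision-oriented
print("First-pass top hit:", candidates[0])
print("After re-ranking:  ", best[0])
```

On this toy data the first pass puts the installation guide on top, and the second pass promotes the supported-features document. The point is the shape of the pipeline: the expensive scorer only ever sees the `k` candidates, never the whole collection.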
Factors that Degrade Raw Retrieval
- Short Queries: A query like "Login issues" carries too little context for the embedding to pin down what the user is actually asking.
- Polysemy: Words with multiple meanings (e.g., "Bank" as in a river bank vs. a financial bank).
- Domain-Specific Jargon: Terms from medical, deep-tech, or other specialized fields that the encoder saw rarely, if at all, during training.
The Human Factor
Users lose trust in a RAG system if the first few results are irrelevant. Re-ranking makes it far more likely that the most relevant content ends up at the very top, even if the vector model didn't initially place it there.
Exercises
- Perform a vector search for a question in your documentation.
- Are the top 3 results actually the best answers, or just semantically related?
- How many documents would you be willing to send to a "Slow but Smart" Re-ranker? 10? 100? 1000?
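One way to make the first two exercises concrete is to hand-label each result as relevant or not and compute a simple score. The sketch below is only an illustration: the function names are made up for this example, and the `labels` list is a hypothetical set of judgments, not real data.

```python
def precision_at_k(labels, k):
    """Fraction of the top-k results that were judged relevant.

    labels: list of booleans in ranked order (True = actually answers the query).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(labels[:k]) / k

def first_relevant_rank(labels):
    """1-based rank of the first relevant result, or None if there is none."""
    for rank, is_relevant in enumerate(labels, start=1):
        if is_relevant:
            return rank
    return None

# Hypothetical judgments for the top 5 results of one query.
labels = [False, False, True, False, True]

print("precision@3:", precision_at_k(labels, 3))
print("first relevant rank:", first_relevant_rank(labels))
```

Running the same labeling before and after re-ranking gives a rough sense of whether the second pass is worth its added latency for your documentation.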