Why Initial Retrieval is Not Enough

Understand the limitations of raw vector search and why a second pass, re-ranking, is essential for production RAG.

Raw vector search (Top-K retrieval) is fast and captures semantic similarity, but similarity is not the same as relevance: the top hits are often on-topic without actually answering the question. Re-ranking exists to close that gap.

The Semantic Gap

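Embedding models (encoders) are built for speed at scale. Each document is typically compressed into a single vector ahead of time, so a query can be compared against millions of documents quickly. That compression is also the weakness: the fine details that decide whether a passage actually answers the question can get lost.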

Example

Query: "Is feature X supported in version 2.0?"

  • Rank 1 result: "How to install version 2.0." (High semantic overlap, but it doesn't answer the question.)
  • Rank 10 result: "List of supported features in version 2.0." (The best answer, but the first-pass model ranked it too low to be noticed.)

The Recall vs. Precision Balancing Act

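  1. First-Pass (Retrieval): Prioritize Recall. Use vector similarity to pull a broad candidate set quickly (for example, 100 documents), accepting that some of them will be irrelevant.
  2. Second-Pass (Re-Ranking): Prioritize Precision. Use a slower but more accurate model, typically one that reads the query and each candidate together, to re-score those 100 candidates and move the best answers to the top.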
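
The two passes above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `toy_cheap_score` stands in for vector similarity, `toy_precise_score` stands in for a real re-ranking model, and the documents, the `INTENT_WORDS` set, and all function names are hypothetical and hand-picked for the demo.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip('.,?!"()').lower() for t in text.split())
    return [t for t in tokens if t]

def toy_cheap_score(query, doc):
    # Stand-in for vector similarity: counts shared surface tokens.
    return len(set(tokenize(query)) & set(tokenize(doc)))

# Hypothetical, hand-picked for this demo: the word that carries the query's intent.
INTENT_WORDS = {"supported"}

def toy_precise_score(query, doc):
    # Stand-in for a slower re-ranking model: weights intent words more heavily.
    shared = set(tokenize(query)) & set(tokenize(doc))
    return sum(3.0 if t in INTENT_WORDS else 1.0 for t in shared)

def retrieve(query, docs, score_fn, k):
    """First pass (recall): score the whole corpus cheaply, keep the top k."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:k]

def rerank(query, candidates, score_fn):
    """Second pass (precision): re-score only the candidates, best first."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)

DOC_INSTALL = "How to install feature X in version 2.0."
DOC_FEATURES = "List of supported features in version 2.0."
DOC_BILLING = "Billing FAQ."

query = "Is feature X supported in version 2.0?"
corpus = [DOC_BILLING, DOC_INSTALL, DOC_FEATURES]

candidates = retrieve(query, corpus, toy_cheap_score, k=2)
reranked = rerank(query, candidates, toy_precise_score)

print(candidates[0])  # How to install feature X in version 2.0.
print(reranked[0])    # List of supported features in version 2.0.
```

The design point is that the expensive scorer only ever sees the `k` candidates, never the whole corpus, so `k` is the knob that trades recall against re-ranking cost.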

Factors that Degrade Raw Retrieval

  • Short Queries: A query like "Login issues" carries too little context for the embedding to pin down what the user actually needs.
  • Polysemy: Words with multiple meanings (e.g., "bank" as in a river bank vs. a financial bank) can pull in documents about the wrong sense.
  • Domain-Specific Jargon: Medical or deep-tech terms that the encoder was not specifically trained on tend to be embedded poorly.

The Human Factor

Users lose trust in a RAG system if the first few results are irrelevant. Re-ranking pushes the most relevant content toward the very top, even when the vector model did not initially place it there.

Exercises

  1. Perform a vector search for a question in your documentation.
  2. Are the top 3 results actually the best answers, or just semantically related?
  3. How many documents would you be willing to send to a "Slow but Smart" Re-ranker? 10? 100? 1000?
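
For exercise 2, it can help to put a number on "are the top 3 actually the best answers". The sketch below assumes you have hand-labeled each returned result as relevant or not, in rank order. The `judgments` list is made-up placeholder data, and the function names are illustrative rather than taken from any library.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for flag in relevant_flags[:k] if flag) / k

def reciprocal_rank(relevant_flags):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, flag in enumerate(relevant_flags, start=1):
        if flag:
            return 1.0 / rank
    return 0.0

# Placeholder judgments for one query, in rank order (True = actually answers it).
judgments = [False, True, False, False, True]

print(precision_at_k(judgments, 3))  # one of the top 3 is relevant
print(reciprocal_rank(judgments))    # 0.5, first relevant result is at rank 2
```

Computing the same two numbers before and after re-ranking, over a handful of real queries, gives a rough but concrete view of whether the second pass is earning its latency.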
