Retrieval and Ranking Layer

Search the vector database and rank results by relevance for optimal context assembly.

Finding the right documents is critical for RAG quality. This layer handles search and ranking.

Retrieval Pipeline

graph LR
    A[User Query] --> B[Embed Query]
    B --> C[Vector Search]
    C --> D[Top-K Results]
    D --> E[Re-Ranking]
    E --> F[Metadata Filtering]
    F --> G[Final Context]
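
End to end, the diagram collapses into one small orchestration function. A minimal sketch, assuming the embed(), collection, and rerank() pieces shown in the sections below, plus a hypothetical to_documents() adapter that wraps raw search results into the document objects rerank() expects:

# End-to-end retrieval sketch; embed(), collection, and rerank() are
# the (assumed) names used throughout this section
def retrieve_context(query, top_k=5):
    query_embedding = embed(query)              # embed the query
    results = collection.query(                 # vector search (+ optional where= filter)
        query_embeddings=[query_embedding],
        n_results=20,                           # over-retrieve for re-ranking
    )
    docs = to_documents(results)                # hypothetical adapter to doc objects
    ranked = rerank(query, docs, top_k=top_k)   # re-rank candidates
    # Assemble the final context window for the generator
    return "\n\n".join(doc.text for doc, _ in ranked)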

Similarity Search

# Vector similarity search (Chroma-style API; embed() stands in for
# your embedding model, the same one used at indexing time)
query_embedding = embed("What is our return policy?")

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=20,             # retrieve top 20 candidates for re-ranking
    where={"type": "policy"}  # metadata filter applied at query time
)
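
With a Chroma-style client, query() returns a dict of parallel lists, one list per query embedding; a quick sketch of unpacking the candidates, assuming that response shape:

# Each key maps to one list per query embedding
docs = results["documents"][0]       # matched chunk texts
metas = results["metadatas"][0]      # their metadata dicts
distances = results["distances"][0]  # lower distance = more similar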

Re-Ranking Strategies

1. Cosine Similarity

  • Fast, default method
  • Scores by the angle between query and document vectors

2. Cross-Encoder Models

  • More accurate: the model reads the query and document together
  • Slower: one transformer forward pass per query-document pair (see the sketch after this list)

3. Hybrid Search

  • Keyword + vector search
  • Combine BM25 with embeddings
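
Option 2 in practice: a minimal sketch using the sentence-transformers CrossEncoder class (the MS MARCO checkpoint is one common public choice, not a requirement):

from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, document) pair jointly,
# which is more accurate than comparing precomputed embeddings
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def cross_encoder_rerank(query, docs, top_k=5):
    pairs = [(query, doc) for doc in docs]
    scores = model.predict(pairs)  # one relevance score per pair
    ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]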

Ranking Example

# Simple weighted re-ranker combining three relevance signals
def rerank(query, candidates, top_k=5):
    query_emb = embed(query)  # same embedding model as indexing
    scores = []
    for doc in candidates:
        # Combine multiple signals
        vector_score = cosine_sim(query_emb, doc.embedding)
        keyword_score = bm25_score(query, doc.text)
        recency_score = time_decay(doc.date)

        final_score = (
            0.6 * vector_score +
            0.3 * keyword_score +
            0.1 * recency_score
        )
        scores.append((doc, final_score))

    # Sort by score, not by the (doc, score) tuple itself
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
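
The cosine_sim and time_decay helpers are left undefined above. A minimal sketch of both, assuming doc.date is a timezone-aware datetime and an arbitrary 180-day half-life:

import math
import numpy as np
from datetime import datetime, timezone

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def time_decay(date, half_life_days=180):
    # Exponential decay: 1.0 for today, 0.5 after one half-life
    age_days = (datetime.now(timezone.utc) - date).days
    return math.exp(-math.log(2) * age_days / half_life_days)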

Metadata Filtering

# Filter by attributes (Chroma-style where clause; range operators
# such as $gte compare numbers, so store dates as timestamps)
results = collection.query(
    query_embeddings=[query_emb],
    where={
        "$and": [
            {"language": "en"},
            {"date": {"$gte": 1735689600}},  # 2025-01-01 as a Unix timestamp
            {"category": {"$in": ["policy", "faq"]}}
        ]
    }
)

Next: Generation and verification.
