
# Retrieval and Ranking Layer

Search the vector database and rank results by relevance for optimal context assembly.

Finding the right documents is critical to RAG quality: if a relevant passage never reaches the prompt, the generator cannot compensate. This layer covers similarity search, re-ranking, and metadata filtering.
## Retrieval Pipeline

```mermaid
graph LR
    A[User Query] --> B[Embed Query]
    B --> C[Vector Search]
    C --> D[Top-K Results]
    D --> E[Re-Ranking]
    E --> F[Metadata Filtering]
    F --> G[Final Context]
```
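Glued together, the stages form one short function. A minimal sketch, assuming the `embed()` and `rerank()` helpers shown later in this section and a hypothetical `vector_search()` wrapper around the store's query API:

```python
# End-to-end retrieval sketch mirroring the diagram's stages.
# embed() and rerank() are sketched later in this section;
# vector_search() is a hypothetical wrapper around the store's query API
# (the metadata filter can run at query time, as in the examples below,
# or after re-ranking, as the diagram shows).
def retrieve_context(query: str, k: int = 20) -> list:
    query_emb = embed(query)                       # 1. embed the query
    candidates = vector_search(query_emb, n=k)     # 2. top-k vector search
    ranked = rerank(query, query_emb, candidates)  # 3. re-rank candidates
    return [doc for doc, _score in ranked]         # 4. final context
```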
## Similarity Search

```python
# Vector similarity search (Chroma-style query API; embed() is an
# assumed helper, sketched below)
query_embedding = embed("What is our return policy?")
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=20,               # retrieve top 20 candidates
    where={"type": "policy"},   # metadata pre-filter
)
```
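One way to implement the assumed `embed()` helper, as a sketch using sentence-transformers (the library and the `all-MiniLM-L6-v2` model are assumptions, not requirements of this layer):

```python
# Minimal embed() sketch; the library and model choice are assumptions.
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str) -> list[float]:
    # encode() returns a numpy array; vector stores generally take a plain list
    return _model.encode(text).tolist()
```

Whichever model you pick, use the same one for indexing and querying; embeddings produced by different models are not comparable.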
## Re-Ranking Strategies

1. **Cosine Similarity**
   - Fast, default method
   - Based on vector distance
2. **Cross-Encoder Models**
   - More accurate: the model scores each query-document pair jointly
   - Slower, since every candidate needs its own forward pass (see the sketch after this list)
3. **Hybrid Search**
   - Keyword + vector search
   - Combine BM25 scores with embedding similarity
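A minimal cross-encoder sketch using sentence-transformers; the MS MARCO checkpoint named here is an assumption, and any cross-encoder model works the same way:

```python
# Cross-encoder re-ranking sketch; the model choice is an assumption.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def cross_encoder_rerank(query: str, docs: list, top_k: int = 5) -> list:
    # One forward pass per (query, document) pair: accurate but O(n) model calls
    scores = reranker.predict([(query, doc.text) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

A common pattern is to retrieve a generous candidate set with fast vector search, then cross-encode only the top 20-50 candidates.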
## Ranking Example

```python
# Simple weighted re-ranker: blends vector, keyword, and recency signals
def rerank(query, query_emb, candidates, top_k=5):
    scored = []
    for doc in candidates:
        # Combine multiple signals
        vector_score = cosine_sim(query_emb, doc.embedding)
        keyword_score = bm25_score(query, doc.text)
        recency_score = time_decay(doc.date)
        final_score = (
            0.6 * vector_score +
            0.3 * keyword_score +
            0.1 * recency_score
        )
        scored.append((doc, final_score))
    # Sort by score; a bare sorted(scored) would compare doc objects first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```
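The re-ranker assumes three helpers. Sketches of two are below; `bm25_score()` would wrap a keyword index (e.g. rank_bm25) built over the whole corpus, so it is omitted here. The 90-day half-life is an arbitrary assumption:

```python
# Sketches of two helpers assumed by rerank(); the half-life is arbitrary.
from datetime import datetime, timezone

import numpy as np

def cosine_sim(a, b) -> float:
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def time_decay(date: datetime, half_life_days: float = 90.0) -> float:
    # Assumes a timezone-aware datetime: 1.0 for a brand-new document,
    # 0.5 after one half-life, and so on
    age_days = (datetime.now(timezone.utc) - date).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)
```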
## Metadata Filtering

```python
# Filter by attributes alongside the vector search
results = collection.query(
    query_embeddings=[query_emb],
    where={
        "$and": [
            {"language": "en"},
            {"date": {"$gte": "2025-01-01"}},
            {"category": {"$in": ["policy", "faq"]}},
        ]
    },
)
```
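Not every store supports range operators on string-valued dates. A common workaround, sketched here with a hypothetical `date_ts` metadata field, is to store dates as UNIX timestamps at indexing time and filter numerically:

```python
# Numeric date filtering; date_ts is a hypothetical metadata field
# holding a UNIX timestamp set at indexing time.
from datetime import datetime, timezone

cutoff = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
results = collection.query(
    query_embeddings=[query_emb],
    where={"date_ts": {"$gte": cutoff}},
)
```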
Next: Generation and verification.