Module 6 Lesson 4: Similarity Search and k-Values
Fine-Tuning Retrieval. Learning how to control how many results (k) your vector store returns and what 'Score' means.
Precision Retrieval: Controlling 'k'
When you ask a Vector Store a question, it doesn't just return "the" answer. It returns a list of the most similar chunks. You have to decide how many chunks you want. This number is called k.
1. What is 'k'?
k is the number of chunks retrieved.
- k=1: You get the single most relevant chunk. (Very specific, but you risk missing context.)
- k=10: You get 10 chunks. (Broad, but the lower-ranked chunks may be "junk" that confuses the model.)
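To make the trade-off concrete, here is a minimal sketch of what a top-k search does under the hood. It assumes cosine similarity and uses made-up 2-D embeddings; the helper names (`cosine_similarity`, `top_k`) and the sample chunks are hypothetical, and a real vector store uses high-dimensional embeddings and an index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, chunks, k):
    """Return the k (text, similarity) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:k]

# Hypothetical embeddings, chosen by hand for illustration only.
chunks = [
    ("The sun is a star.", [1.0, 0.0]),
    ("The sun is very hot.", [0.9, 0.1]),
    ("The moon orbits Earth.", [0.5, 0.5]),
    ("Bread needs yeast.", [0.0, 1.0]),
]
query = [1.0, 0.0]

top1 = top_k(query, chunks, k=1)  # only the single best chunk
top3 = top_k(query, chunks, k=3)  # broader, includes a weaker match

for text, score in top3:
    print(f"{score:.2f}  {text}")
```

With k=1 only the first chunk comes back; with k=3 the moon chunk is included too, even though it is a noticeably weaker match.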
2. Similarity Scores
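Most vector stores can also return a score alongside each chunk. How to read it depends on the store and the metric it uses.
- With a similarity score (for example, cosine similarity), higher is better: 0.99 means the text is almost a perfect match for the question, while 0.70 means it is only "somewhat related."
- With a distance score, lower is better: a value near 0 is the closest match. You can use the score to filter out weak results. With similarity scores, for example: "If the score is below 0.8, ignore it; it is probably not relevant." Check which kind of score your store returns before picking a threshold.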
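The filtering rule above can be sketched as a small helper. This is an illustrative sketch rather than any library's API: `filter_by_score` and its `higher_is_better` flag are hypothetical names, and the flag exists because some stores report a distance (lower is better) instead of a similarity.

```python
def filter_by_score(results, threshold, higher_is_better=True):
    """Keep only (text, score) pairs that pass the threshold.

    higher_is_better=True  -> scores are similarities, keep score >= threshold
    higher_is_better=False -> scores are distances, keep score <= threshold
    """
    if higher_is_better:
        return [(text, score) for text, score in results if score >= threshold]
    return [(text, score) for text, score in results if score <= threshold]

# Hypothetical search results with similarity-style scores.
results = [
    ("Chunk A", 0.98),
    ("Chunk B", 0.95),
    ("Chunk C", 0.81),
    ("Chunk D", 0.40),
]

kept = filter_by_score(results, threshold=0.8)
for text, score in kept:
    print(f"{text}: {score}")
```

A fixed threshold such as 0.8 is only a starting point; sensible values depend on the embedding model and the metric, so it is worth inspecting real scores from your own data first.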
3. Code Example: Controlling k
# Assumes `db` is an already-initialized vector store (for example, a LangChain store).
# Search for exactly 3 results
results = db.similarity_search("Tell me about the sun", k=3)
# Search with scores, also limited to 3 results
results_with_scores = db.similarity_search_with_score("Tell me about the sun", k=3)
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content} | Score: {score}")
4. Visualizing Top-k Retrieval
graph TD
Q[Query Vector] --> Top[Search Processor]
Top --> M1[Rank 1: 0.98]
Top --> M2[Rank 2: 0.95]
Top --> M3[Rank 3: 0.81]
Top --> M4[Rank 4: 0.40]
Sub[Return k=3]
M1 --> Sub
M2 --> Sub
M3 --> Sub
M4 --> X[Discarded]
5. Engineering Tip: Maximum Marginal Relevance (MMR)
Sometimes, the top 3 results are basically the same thing said 3 different ways. This is a waste of your context window.
- MMR is a specialized search method that picks the top result first, and then picks each following result by balancing two things: how relevant it is to the query and how different it is from the results already picked. This gives your agent a more "Diverse" perspective.
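The selection loop behind MMR can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used by any particular library: the vectors are hand-made 3-D examples, `mmr` and `cosine` are hypothetical helpers, and `lambda_mult` (1.0 means pure relevance, lower values favor diversity) is named after the similarly named option in LangChain's MMR search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, vectors, k, lambda_mult=0.5):
    """Return indices of up to k vectors chosen by Maximal Marginal Relevance."""
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query, vectors[i])
            # Redundancy: similarity to the closest already-selected result.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

# Hypothetical embeddings: 0 and 1 are near-duplicates, 2 is related but
# different, 3 is unrelated to the query.
vectors = [
    [1.0, 0.0, 0.0],
    [0.98, 0.02, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.9, 0.3, 0.0]

plain = mmr(query, vectors, k=2, lambda_mult=1.0)    # relevance only
diverse = mmr(query, vectors, k=2, lambda_mult=0.5)  # relevance vs. redundancy
print("plain:", plain, "diverse:", diverse)
```

With relevance only, both near-duplicates are returned; with `lambda_mult=0.5` the second slot goes to the related-but-different chunk instead, which is the redundancy reduction MMR is meant to provide.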
Key Takeaways
- k determines the volume of data retrieved.
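- Score indicates how closely each chunk matches the query; check whether higher or lower is better for your store.
- k=4 is the default in LangChain's similarity_search and a common starting point; tune it to your chunk size and context window.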
- MMR search reduces redundancy in retrieved context.