Precision Retrieval: Controlling 'k'

When you ask a Vector Store a question, it doesn't just return "the" answer. It returns a list of the most similar chunks. You have to decide how many chunks you want. This number is called k.

1. What is 'k'?

k is the number of chunks the vector store returns for a query.

k=1: You get the single most relevant chunk. (Very specific, but you risk missing context.)
k=10: You get the 10 most similar chunks. (Broad, but the list may contain "junk" that confuses the model.)
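
To make the effect of k concrete, here is a minimal sketch of top-k retrieval in plain Python. It is an illustration under stated assumptions, not how any particular vector store is implemented: the helper names `cosine_similarity` and `top_k` are hypothetical, and the 2-D "embeddings" are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k):
    """chunks is a list of (text, vector); return the k best (text, score) pairs."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]

# Toy 2-D embeddings, invented for this example.
chunks = [
    ("The sun is a star.", [1.0, 0.0]),
    ("The sun is hot.", [0.9, 0.1]),
    ("Cats sleep a lot.", [0.0, 1.0]),
]
query = [1.0, 0.0]

print(top_k(query, chunks, k=1))  # only the closest chunk
print(top_k(query, chunks, k=3))  # all three, including the unrelated one
```

With k=1 only the closest chunk comes back; with k=3 the unrelated chunk about cats is included too, which is the "junk" risk described above.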

2. Similarity Scores

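Most vector stores can also return a score alongside each chunk. What the number means depends on the store and on the method you call.

If the score is a similarity (for example, cosine similarity), higher is better: 0.99 means the text is almost a perfect match for the question, while 0.70 means it is only somewhat related.
If the score is a distance (for example, L2 distance), lower is better, and 0 means an exact match.

Once you know which kind of score you have, you can use it to filter results: "If the similarity is below 0.8, ignore it; it's probably not relevant." Treat 0.8 as a starting point, since a useful threshold depends on the embedding model and your data.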
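
The filtering rule above can be sketched as a small helper. This is a minimal illustration, not a library API: `filter_by_score` and its `higher_is_better` flag are hypothetical names, and the sample scores are made up. The flag exists because some stores report a distance, where a lower number means a closer match.

```python
def filter_by_score(results, threshold, higher_is_better=True):
    """Keep only the (doc, score) pairs that pass the threshold.

    higher_is_better=True  -> score is a similarity; keep score >= threshold
    higher_is_better=False -> score is a distance;   keep score <= threshold
    """
    if higher_is_better:
        return [(doc, score) for doc, score in results if score >= threshold]
    return [(doc, score) for doc, score in results if score <= threshold]

# Made-up (text, similarity) pairs standing in for real search results.
results = [
    ("The sun is a star.", 0.99),
    ("The sun rises in the east.", 0.81),
    ("Cats sleep a lot.", 0.40),
]

kept = filter_by_score(results, threshold=0.8)
for doc, score in kept:
    print(f"Content: {doc} | Score: {score}")
```

Filtering by score and limiting by k work together: k caps how many chunks you get, and the threshold drops the ones that are too weak to be worth passing to the model.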

3. Code Example: Controlling k

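# Assumes `db` is an existing vector store object, e.g. one built with LangChain.

# Search for exactly 3 results
results = db.similarity_search("Tell me about the sun", k=3)

# Search with scores (if you omit k, LangChain uses its default of k=4)
results_with_scores = db.similarity_search_with_score("Tell me about the sun", k=3)

# Depending on the store, the score may be a distance (lower = closer match)
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content} | Score: {score}")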

4. Visualizing Top-k Retrieval

graph TD
    Q[Query Vector] --> Top[Search Processor]
    Top --> M1[Rank 1: 0.98]
    Top --> M2[Rank 2: 0.95]
    Top --> M3[Rank 3: 0.81]
    Top --> M4[Rank 4: 0.40]
    
    Sub[Return k=3]
    M1 --> Sub
    M2 --> Sub
    M3 --> Sub
    M4 --> X[Discarded]

5. Engineering Tip: Maximum Marginal Relevance (MMR)

Sometimes, the top 3 results are basically the same thing said 3 different ways. This is a waste of your context window.

MMR is a search method that picks the most relevant result first, then chooses each following result by balancing two things: how relevant it is to the query and how different it is from the results already selected. This gives your agent a more diverse set of context.
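
The selection loop behind MMR can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation of any particular library: the helper names `unit`, `cosine`, and `mmr` are hypothetical, the embeddings are toy 2-D vectors placed at chosen angles, and `lambda_mult` (1.0 = pure relevance, 0.0 = pure diversity) is the weighting parameter assumed here.

```python
import math

def unit(degrees):
    """Toy 2-D embedding: a unit vector at the given angle."""
    rad = math.radians(degrees)
    return [math.cos(rad), math.sin(rad)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Greedy MMR over a list of (text, vector); returns up to k texts."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(item):
            _, vec = item
            relevance = cosine(query_vec, vec)
            # Similarity to the closest already-selected result (0 if none yet).
            redundancy = max((cosine(vec, s) for _, s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]

candidates = [
    ("Our sun is a star.", unit(0)),
    ("The sun is a star.", unit(5)),               # near-duplicate of the first
    ("Sunlight drives photosynthesis.", unit(60)),  # related, but different
]
query = unit(10)

print(mmr(query, candidates, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query, candidates, k=2, lambda_mult=0.5))  # relevance + diversity
```

With `lambda_mult=1.0` the two near-duplicate sentences are returned together; with `lambda_mult=0.5` the second slot goes to the photosynthesis chunk instead. In LangChain, many vector stores offer a `max_marginal_relevance_search` method for this, so you rarely need to write the loop yourself.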

Key Takeaways

k determines the volume of data retrieved.
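The score indicates how close the match is; check whether your store returns a similarity (higher is better) or a distance (lower is better).
k=4 is a common starting point (it is LangChain's default for similarity_search), but the right value depends on your chunk size and context window.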
MMR search reduces redundancy in retrieved context.

Module 6 Lesson 4: Similarity Search and k-Values