Querying and Retrieval: The RAG Loop

We bring it all together and build the full loop: Query -> Embed -> Search Vector DB -> Construct Prompt -> Generate Answer.

Querying and Retrieval

The "RAG Chain" logic:

import chromadb
import google.generativeai as genai

# Setup -- these names assume the index built in the previous lessons
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection(name="docs")

def ask_gemini_with_rag(user_question):
    # 1. Retrieve: embed the question and fetch the 3 most similar chunks
    results = collection.query(query_texts=[user_question], n_results=3)
    context_text = "\n\n".join(results['documents'][0])

    # 2. Augment: inject the retrieved chunks into the prompt
    prompt = f"""
    You are a helpful assistant. Answer the user question based ONLY on the context below.

    Context:
    {context_text}

    Question: {user_question}
    """

    # 3. Generate: send the grounded prompt to Gemini
    return model.generate_content(prompt).text
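
A quick smoke test (the question is an illustrative placeholder for whatever your collection actually indexes):

print(ask_gemini_with_rag("What does the documentation say about rate limits?"))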

Retrieval Tuning

  • top_k (n_results): How many chunks to retrieve? Usually 3-5 is a good starting point: enough context to answer, not so much that the signal drowns.
  • Re-ranking: For advanced apps, retrieve a large pool (say 50 chunks), then use a specialized "Re-ranker" model to pick the absolute best 5 to send to Gemini (see the sketch after this list).
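
Here is a minimal sketch of that two-stage retrieval, assuming the sentence-transformers library and reusing the collection from above; the cross-encoder model name and the pool sizes are illustrative choices, not requirements:

from sentence_transformers import CrossEncoder

def retrieve_with_rerank(user_question, pool_size=50, final_k=5):
    # Stage 1: cast a wide net with the vector DB
    candidates = collection.query(
        query_texts=[user_question], n_results=pool_size
    )['documents'][0]

    # Stage 2: score each (question, chunk) pair with a cross-encoder.
    # It reads both texts together, so it ranks more accurately than
    # comparing pre-computed embeddings -- but it is too slow to run
    # over the whole collection, hence the two stages.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(user_question, doc) for doc in candidates])

    # Keep only the highest-scoring chunks for the prompt
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]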

Summary

The prompt template is the glue. By telling the model to answer based ONLY on the context, it forces Gemini to ground its response in the retrieved text rather than in its training data.

In the final lesson of this module, we cover Validation.
