
Querying and Retrieval: The RAG Loop
This lesson brings it all together: we build the full loop of Query -> Embed -> Search Vector DB -> Construct Prompt -> Generate Answer.
Querying and Retrieval
The "RAG Chain" logic:
# Assumes `collection` (the Chroma collection built in the earlier ingestion
# lesson) and `model` (the configured Gemini model) already exist.

def ask_gemini_with_rag(user_question):
    # 1. Retrieve: fetch the 3 chunks most similar to the question
    results = collection.query(query_texts=[user_question], n_results=3)
    context_text = "\n\n".join(results['documents'][0])

    # 2. Augment: inject the retrieved chunks into the prompt
    prompt = f"""
You are a helpful assistant. Answer the user question based ONLY on the context below.

Context:
{context_text}

Question: {user_question}
"""

    # 3. Generate: send the grounded prompt to Gemini
    return model.generate_content(prompt).text
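Calling the function is a single line; the question below is just an illustrative placeholder, not data from this module:

answer = ask_gemini_with_rag("What does the handbook say about remote work?")
print(answer)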
Retrieval Tuning
- top_k (n_results): How many chunks to retrieve. 3-5 is usually a good starting point.
- Re-ranking: For advanced apps, retrieve a wide net of candidates (say, 50 chunks), then use a specialized "re-ranker" model to pick the absolute best 5 to send to Gemini (see the sketch after this list).
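Here is a minimal sketch of that retrieve-then-re-rank pattern. It assumes the sentence-transformers library and its CrossEncoder class, which are not part of this module's earlier setup; the checkpoint name is a common public cross-encoder, and retrieve_with_rerank is a hypothetical helper:

from sentence_transformers import CrossEncoder  # pip install sentence-transformers

# Assumption: a public MS MARCO cross-encoder as the re-ranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_with_rerank(user_question, wide_k=50, final_k=5):
    # Step 1: cast a wide net with the vector search (reuses `collection`)
    results = collection.query(query_texts=[user_question], n_results=wide_k)
    candidates = results['documents'][0]

    # Step 2: score each (question, chunk) pair with the cross-encoder
    scores = reranker.predict([(user_question, doc) for doc in candidates])

    # Step 3: keep only the highest-scoring chunks
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]

The returned chunks would then be joined into context_text exactly as in ask_gemini_with_rag above; only the retrieval step changes.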
Summary
The prompt template is the glue: by restricting the model to the supplied context, it forces Gemini to ground its answer in the retrieved text.
In the final lesson of this module, we cover validation.