Querying and Retrieval: The RAG Loop

We bring it all together and build the full loop: Query -> Embed -> Search Vector DB -> Construct Prompt -> Generate Answer.

Querying and Retrieval

The "RAG Chain" logic:

import chromadb
import google.generativeai as genai

# Setup -- these names assume the index built in the previous lessons
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection(name="docs")

def ask_gemini_with_rag(user_question):
    # 1. Retrieve: embed the question and fetch the 3 most similar chunks
    results = collection.query(query_texts=[user_question], n_results=3)
    context_text = "\n\n".join(results['documents'][0])

    # 2. Augment: inject the retrieved chunks into the prompt
    prompt = f"""
    You are a helpful assistant. Answer the user question based ONLY on the context below.

    Context:
    {context_text}

    Question: {user_question}
    """

    # 3. Generate: send the grounded prompt to Gemini
    return model.generate_content(prompt).text
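
A quick smoke test (the question is an illustrative placeholder for whatever your collection actually indexes):

print(ask_gemini_with_rag("What does the documentation say about rate limits?"))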

Retrieval Tuning

  • top_k (n_results): How many chunks to retrieve? Usually 3-5 is a good starting point: enough context to answer, not so much that the signal drowns.
  • Re-ranking: For advanced apps, retrieve a large pool (say 50 chunks), then use a specialized "Re-ranker" model to pick the absolute best 5 to send to Gemini (see the sketch after this list).
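
Here is a minimal sketch of that two-stage retrieval, assuming the sentence-transformers library and reusing the collection from above; the cross-encoder model name and the pool sizes are illustrative choices, not requirements:

from sentence_transformers import CrossEncoder

def retrieve_with_rerank(user_question, pool_size=50, final_k=5):
    # Stage 1: cast a wide net with the vector DB
    candidates = collection.query(
        query_texts=[user_question], n_results=pool_size
    )['documents'][0]

    # Stage 2: score each (question, chunk) pair with a cross-encoder.
    # It reads both texts together, so it ranks more accurately than
    # comparing pre-computed embeddings -- but it is too slow to run
    # over the whole collection, hence the two stages.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(user_question, doc) for doc in candidates])

    # Keep only the highest-scoring chunks for the prompt
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]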

Summary

The prompt template is the glue. By telling the model to answer based ONLY on the context, it forces Gemini to ground its response in the retrieved text rather than in its training data.

In the final lesson of this module, we cover Validation.
