Project: Building a Production-Ready RAG Document Bot

Combine everything you've learned. Build a professional RAG application that ingests PDFs, indexes them in a persistent store, and provides grounded answers with citations.

Project: The Document Intelligence Bot

You have learned the theory of RAG, the architecture of vector search, and the power of AI frameworks. Now, it's time to build a real product.

In this final project of Module 10, we are going to build a Document Q&A Bot. This isn't just a simple script; it is a structured application that implements the best practices of professional AI engineering.


1. Project Specifications

Your application will:

  1. Ingest: Support multi-file ingestion from a docs/ directory.
  2. Chunk: Implement structure-aware recursive chunking with overlap to preserve context.
  3. Persist: Use ChromaDB to save the vector index to disk.
  4. Answer: Use a retrieval chain to provide answers based on the local data.
  5. Cite: Show the user which page or file was used to generate the answer.

2. Setting Up the Project

We will use LangChain for orchestration and Chroma as our persistent vector store.

pip install langchain langchain-community langchain-openai langchain-chroma pypdf chromadb

Project Structure:

/rag_bot
  /docs           <-- Put your PDFs here
  /db             <-- Vector storage
  main.py         <-- The RAG logic

3. The Professional RAG Implementation (main.py)

import os
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

# 1. SETUP
# Prefer exporting OPENAI_API_KEY in your shell; hardcoding keys is unsafe
os.environ.setdefault("OPENAI_API_KEY", "your_key")

# 2. DATA INGESTION
def ingest_data():
    print("Loading documents...")
    loader = DirectoryLoader('./docs', glob="**/*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()
    
    # Recursive splitting: tries each separator in order, so paragraphs
    # and sentences stay intact wherever possible
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        separators=["\n\n", "\n", ".", "!", "?", " "]
    )
    texts = text_splitter.split_documents(documents)
    
    # Create persistent Chroma DB
    print(f"Creating index for {len(texts)} chunks...")
    vectorstore = Chroma.from_documents(
        documents=texts,
        embedding=OpenAIEmbeddings(),
        persist_directory="./db"
    )
    return vectorstore

# 3. RAG CHAIN WITH CUSTOM PROMPT
def get_qa_chain(vectorstore):
    # Custom prompt to force grounding; the "stuff" chain injects only page
    # content, so file and page citations are printed from metadata in the main loop
    template = """You are a professional corporate assistant. Use the following context to answer the question.
    If you don't know the answer, say that you don't know. DO NOT make up an answer.
    Keep your answer concise and grounded strictly in the provided context.

    Context: {context}

    Question: {question}
    Answer:"""
    
    QA_PROMPT = PromptTemplate(
        template=template, input_variables=["context", "question"]
    )

    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": QA_PROMPT}
    )
    return qa_chain

# 4. MAIN EXECUTION
if __name__ == "__main__":
    # Build the index on first run; afterwards load it from disk (delete ./db to re-ingest)
    if not os.path.exists("./db"):
        vs = ingest_data()
    else:
        vs = Chroma(persist_directory="./db", embedding_function=OpenAIEmbeddings())
        
    chain = get_qa_chain(vs)
    
    while True:
        query = input("\nAsk me anything about your documents (or 'exit'): ")
        if query.lower() == 'exit': break
        
        response = chain.invoke({"query": query})
        
        print("\n--- ANSWER ---")
        print(response["result"])
        
        print("\n--- SOURCES ---")
        for doc in response["source_documents"]:
            print(f"- {doc.metadata['source']} (Page: {doc.metadata.get('page', 'N/A')})")

4. Key Performance Tuning for Production

Temperature = 0

In the ChatOpenAI configuration, we set temperature=0. In RAG, we don't want the model to be creative; we want it to be a faithful reporter of the facts found in the vector database.

search_kwargs={"k": 3}

We only retrieve the top 3 chunks. Retrieving too many (e.g., k=20) inflates token costs, buries the answer in irrelevant context, and on smaller models can exceed the context window, all without improving accuracy.
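
To see exactly which chunks the LLM receives for a given k, you can call the retriever directly. A minimal sketch, assuming the ./db index from main.py already exists (the question string is just a placeholder):

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vs = Chroma(persist_directory="./db", embedding_function=OpenAIEmbeddings())
retriever = vs.as_retriever(search_kwargs={"k": 3})

# Print the source and the first 80 characters of each retrieved chunk
for doc in retriever.invoke("What is the refund policy?"):
    print(doc.metadata.get("source"), "->", doc.page_content[:80])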

Recursive Character Splitter

We use this instead of a naive "every 500 characters" split. It prioritizes keeping paragraphs and sentences together, which makes the retrieved context far easier for the LLM to use.
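
To see the difference for yourself, here is a minimal sketch that splits a tiny made-up sample (the text and chunk_size are just for illustration). The paragraph break is tried first, then sentence boundaries, so no sentence is cut in half:

from langchain_text_splitters import RecursiveCharacterTextSplitter

sample = (
    "RAG grounds answers in your own data.\n\n"
    "First the retriever finds the relevant chunks. Then the LLM answers "
    "using only that context, which reduces hallucinations."
)

# A small chunk_size forces a split; separators are tried in priority order
splitter = RecursiveCharacterTextSplitter(
    chunk_size=80, chunk_overlap=0, separators=["\n\n", ".", " "]
)
for chunk in splitter.split_text(sample):
    print(repr(chunk))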


5. How to Test Your Bot

  1. The "Out of Bounds" Test: Ask a question that is NOT in the PDFs. The bot should say "I don't know."
  2. The "Citation" Test: Ask a specific question and verify that the source path and page number are correct.
  3. The "Ambiguous" Test: Ask a question that relates to two different documents to see how the bot synthesizes information.

6. Project Expansion: Advanced Features

  • Add Re-ranking: Integrate the CohereRerank document compressor into your LangChain retriever (Module 10, Lesson 3); see the sketch after this list.
  • Add a UI: Use Streamlit to create a web-based chat interface.
  • Support More Files: Add the CSVLoader or UnstructuredMarkdownLoader to your DirectoryLoader.
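
As a taste of the first expansion, here is a minimal re-ranking sketch. It assumes the langchain-cohere package (pip install langchain-cohere) and a COHERE_API_KEY in your environment; the model name is Cohere's hosted re-ranker at the time of writing:

from langchain.retrievers import ContextualCompressionRetriever
from langchain_chroma import Chroma
from langchain_cohere import CohereRerank
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma(persist_directory="./db", embedding_function=OpenAIEmbeddings())

# Cast a wide net (k=20), then let the re-ranker keep only the best 3
compressor = CohereRerank(model="rerank-english-v3.0", top_n=3)
reranked_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)
# Pass reranked_retriever to RetrievalQA.from_chain_type(...) in place of
# the plain retriever in get_qa_chain()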

Summary and Module 10 Wrap-up

Congratulations! You have built a functional, professional AI application.

  • You mastered End-to-end RAG pipelines.
  • You saw the power of Context Grounding.
  • You implemented Source Citations for user trust.
  • You used LangChain to orchestrate complex data flows.

What's Next?

In Module 11: Evaluation and Testing, we learn how to "Grade" our bot. We will move from manual testing to automated benchmarks like RAGAS, ensuring that our AI system continues to provide high-quality answers as we add more data.


Final Exercise: Prompt Engineering

Change the template in the code above:

  1. Make the bot speak like a Legal Auditor.
  2. Make the bot speak like a Friendly Librarian.
  3. Force the bot to output its answer in JSON format.

Observe how the retrieved context remains the same, but the presentation changes based on your prompt.
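
For example, one possible template for exercise 3 (a sketch, not the only answer). Note the doubled braces: PromptTemplate treats single braces as variables, so literal JSON braces must be escaped as {{ }}:

template = """You are a professional corporate assistant. Use the following context to answer the question.
If the answer is not in the context, set "answer" to "unknown". DO NOT make up an answer.
Respond ONLY with valid JSON in this shape: {{"answer": "...", "confidence": "high|low"}}

Context: {context}

Question: {question}"""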


Congratulations on completing Module 10! You are now a RAG expert.
