
Project: Building a Production-Ready RAG Document Bot
Combine everything you've learned. Build a professional RAG application that ingests PDFs, indexes them in a persistent store, and provides grounded answers with citations.
Project: The Document Intelligence Bot
You have learned the theory of RAG, the architecture of vector search, and the power of AI frameworks. Now, it's time to build a real product.
In this final project of Module 10, we are going to build a Document Q&A Bot. This isn't just a simple script; it is a structured application that follows the best practices of professional AI engineering.
1. Project Specifications
Your application will:
- Ingest: Support multi-file ingestion from a local docs/ directory.
- Chunk: Implement Semantic Chunking with overlap to preserve context.
- Persist: Use ChromaDB to save the vector index to disk.
- Answer: Use a retrieval chain to provide answers based on the local data.
- Cite: Show the user which page or file was used to generate the answer.
2. Setting Up the Project
We will use LangChain for orchestration and Chroma as our persistent vector store.
pip install langchain langchain-community langchain-openai langchain-chroma langchain-text-splitters pypdf chromadb
Project Structure:
/rag_bot
    /docs      <-- Put your PDFs here
    /db        <-- Vector storage
    main.py    <-- The RAG logic
3. The Professional RAG Implementation (main.py)
import os
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# 1. SETUP
os.environ["OPENAI_API_KEY"] = "your_key"

# 2. DATA INGESTION
def ingest_data():
    print("Loading documents...")
    loader = DirectoryLoader('./docs', glob="./*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()

    # Semantic splitting
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        separators=["\n\n", "\n", ".", "!", "?", " "]
    )
    texts = text_splitter.split_documents(documents)

    # Create persistent Chroma DB
    print(f"Creating index for {len(texts)} chunks...")
    vectorstore = Chroma.from_documents(
        documents=texts,
        embedding=OpenAIEmbeddings(),
        persist_directory="./db"
    )
    return vectorstore

# 3. RAG CHAIN WITH CUSTOM PROMPT
def get_qa_chain(vectorstore):
    # Custom prompt to force grounding and citations
    template = """You are a professional corporate assistant. Use the following context to answer the question.
If you don't know the answer, say that you don't know. DO NOT make up an answer.
Always mention the 'source' filename at the end of your answer.
Context: {context}
Question: {question}
Answer:"""
    QA_PROMPT = PromptTemplate(
        template=template, input_variables=["context", "question"]
    )

    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": QA_PROMPT}
    )
    return qa_chain

# 4. MAIN EXECUTION
if __name__ == "__main__":
    # Check if index exists, else create it
    if not os.path.exists("./db"):
        vs = ingest_data()
    else:
        vs = Chroma(persist_directory="./db", embedding_function=OpenAIEmbeddings())

    chain = get_qa_chain(vs)

    while True:
        query = input("\nAsk me anything about your documents (or 'exit'): ")
        if query.lower() == 'exit':
            break
        response = chain.invoke({"query": query})

        print("\n--- ANSWER ---")
        print(response["result"])
        print("\n--- SOURCES ---")
        for doc in response["source_documents"]:
            print(f"- {doc.metadata['source']} (Page: {doc.metadata.get('page', 'N/A')})")
4. Key Performance Tuning for Production
Temperature = 0
In the ChatOpenAI configuration, we set temperature=0. In RAG, we don't want the model to be creative; we want it to be a faithful reporter of the facts found in the vector database.
search_kwargs={"k": 3}
We only retrieve the top 3 chunks. Retrieving too many (e.g., k=20) can exceed the LLM's context window and increase costs without improving accuracy.
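If you want to experiment beyond plain top-k similarity search, the retriever created by vectorstore.as_retriever also accepts a search_type argument. Below is a minimal sketch of two alternatives; the specific numbers (fetch_k=20, score_threshold=0.5) are starting points for experimentation, not recommendations.

# Sketch: alternative retriever settings, assuming `vectorstore` is the
# Chroma instance built in main.py.

# Plain similarity search, top 3 chunks (what the project uses).
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Maximal Marginal Relevance: fetch 20 candidates, return the 3 most
# diverse ones. Useful when several chunks repeat the same passage.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20},
)

# Score threshold: only return chunks above a relevance cutoff, so weak
# matches never reach the prompt.
strict_retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5},
)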
Recursive Character Splitter
We use this instead of a naive "split every 500 characters" approach. It prioritizes keeping paragraphs and sentences together, which makes the context much easier for the LLM to read.
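To see the difference for yourself, here is a small standalone sketch that splits the same paragraph with a single-separator CharacterTextSplitter and with the recursive splitter; the sample text and the tiny chunk size are only for illustration.

from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

sample = (
    "RAG systems ground the model in your own data. "
    "First the documents are chunked and embedded.\n\n"
    "Then, at query time, the most relevant chunks are retrieved "
    "and stuffed into the prompt as context."
)

# Naive split: merges words up to ~80 characters, ignoring structure.
naive = CharacterTextSplitter(separator=" ", chunk_size=80, chunk_overlap=0)

# Recursive split: tries paragraph breaks first, then sentences, then words.
recursive = RecursiveCharacterTextSplitter(
    chunk_size=80, chunk_overlap=0, separators=["\n\n", ".", " "]
)

for name, splitter in [("naive", naive), ("recursive", recursive)]:
    print(f"\n--- {name} ---")
    for chunk in splitter.split_text(sample):
        print(repr(chunk))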
5. How to Test Your Bot
- The "Out of Bounds" Test: Ask a question that is NOT in the PDFs. The bot should say "I don't know."
- The "Citation" Test: Ask a specific question and verify that the source path and page number are correct.
- The "Ambiguous" Test: Ask a question that relates to two different documents to see how the bot synthesizes information.
6. Project Expansion: Advanced Features
- Add Re-ranking: Integrate the CohereRerank module into your LangChain retriever (Module 10, Lesson 3); see the sketch after this list.
- Add a UI: Use Streamlit to create a web-based chat interface.
- Support More Files: Add the CSVLoader or UnstructuredMarkdownLoader to your DirectoryLoader.
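For the re-ranking expansion, here is a hedged sketch of one way to wire it up. It assumes you run pip install langchain-cohere, set a COHERE_API_KEY environment variable, and that vectorstore is the Chroma instance from main.py; the model name "rerank-english-v3.0" is one currently available option and may change.

# Sketch: wrap the existing Chroma retriever with a Cohere re-ranker.
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

# Over-fetch with the cheap vector search...
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# ...then let the re-ranker keep only the 3 most relevant chunks.
reranker = CohereRerank(model="rerank-english-v3.0", top_n=3)

rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever,
)

# Pass rerank_retriever to RetrievalQA.from_chain_type(...) in get_qa_chain()
# in place of the plain retriever.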
Summary and Module 10 Wrap-up
Congratulations! You have built a functional, professional AI application.
- You mastered End-to-end RAG pipelines.
- You saw the power of Context Grounding.
- You implemented Source Citations for user trust.
- You used LangChain to orchestrate complex data flows.
What's Next?
In Module 11: Evaluation and Testing, we learn how to "grade" our bot. We will move from manual testing to automated benchmarks like RAGAS, ensuring that our AI system continues to provide high-quality answers as we add more data.
Final Exercise: Prompt Engineering
Change the template in the code above:
- Make the bot speak like a Legal Auditor.
- Make the bot speak like a Friendly Librarian.
- Force the bot to output its answer in JSON format.
Observe how the retrieved context remains the same, but the presentation changes based on your prompt.
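As a starting point for the JSON variant, here is one possible replacement for the template string in get_qa_chain(); the key names ("answer", "sources") are only a suggestion, and the same idea applies to the Legal Auditor and Friendly Librarian personas by changing the first line.

# One possible JSON-output variant of `template` in get_qa_chain().
template = """You are a professional corporate assistant. Use the following context to answer the question.
If you don't know the answer, say that you don't know. DO NOT make up an answer.
Return a single JSON object with exactly these keys:
  "answer": a concise answer grounded in the context,
  "sources": a list of the source filenames you used.
Context: {context}
Question: {question}
JSON:"""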