
Capstone: Implementing Persistent Memory
Build the memory layer for your capstone project. Learn how to use external storage to keep your agent's context 'Thin' and efficient.
A "Budget-First" agent cannot afford to have a "Long-Term Memory" inside its prompt. If the agent does 20 searches, and each search result is 1,000 tokens, the 21st turn will cost 21,000 input tokens. This would kill our $0.10 budget instantly.
In this project step, we will build an External Semantic Memory. The agent will "Offload" search results to a local database and "Retrieve" only the snippets it needs.
1. The Memory Architecture
We will use ChromaDB or a simple JSON-based vector store for this project. The pipeline has three phases, sketched in code right after this list:
- Extraction Phase: Agent searches -> Specialist (Tier 1) extracts facts.
- Persistence Phase: Specialist saves facts to the Database. (Prompt Cleared).
- Synthesis Phase: Agent queries the Database for facts. (Context remains "Thin").
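To make the phases concrete, here is a minimal sketch of one research turn. The extract_facts function is a hypothetical stand-in for the Tier 1 specialist call (stubbed here so the snippet runs), and memory is the AgentMemory controller we build in the next section.
Python Code (Sketch): One Research Turn
def extract_facts(raw_search_result):
    # Hypothetical stand-in for the Tier 1 specialist; a real version would prompt
    # a cheap model to return (fact_text, source_url) pairs from the raw text.
    return [("Quantum computers use qubits.", "wikipedia.org")]

def research_turn(memory, raw_search_result):
    # Extraction Phase: distill the ~1,000-token result into short facts.
    facts = extract_facts(raw_search_result)
    # Persistence Phase: save the facts; the raw result never enters the prompt again.
    for fact_text, source_url in facts:
        memory.remember(fact_text, source_url)

def synthesis_phase(memory, question):
    # Synthesis Phase: retrieve only the relevant snippets, keeping the context "Thin".
    return memory.search(question)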
2. Implementation: The Memory Controller
Python Code: Saving and Recalling Facts
import uuid

import chromadb

class AgentMemory:
    def __init__(self):
        self.client = chromadb.Client()
        # get_or_create_collection avoids an error if the collection already exists
        self.collection = self.client.get_or_create_collection("research_facts")

    def remember(self, fact_text, source_url):
        # We store the fact and the source
        self.collection.add(
            documents=[fact_text],
            metadatas=[{"source": source_url}],
            ids=[str(uuid.uuid4())],
        )

    def search(self, query):
        # The agent 'calls' this tool to find facts in its own past
        results = self.collection.query(query_texts=[query], n_results=3)
        # query() returns one list of documents per query text; we sent a single query
        return results["documents"][0]

# In the agent loop
memory = AgentMemory()
memory.remember("Quantum computers use qubits.", "wikipedia.org")
print(memory.search("What do quantum computers use?"))
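A note on the design: in recent versions of ChromaDB, chromadb.Client() is an in-memory (ephemeral) client, so these facts disappear when the process exits. If you want the memory to survive across runs, swap in chromadb.PersistentClient(path="./agent_memory") or back the store with your own JSON file. Chroma's default embedding model is also downloaded on first use, so expect the very first remember() call to be slower than the rest.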
3. The "Context Purge" Logic
After every successful research turn, we will physically delete the raw search_result message from the agent's message history.
Wait, won't the agent forget? No, because we have already extracted the facts into our database. The agent's "Working Memory" (the prompt) stays at ~500 tokens, even if we have researched 50,000 tokens of data.
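Here is a minimal sketch of the purge, assuming the message history is a plain list of OpenAI-style role/content dicts and that raw search results are stored under the "tool" role; both are assumptions, so adapt the filter to however your loop actually labels its messages.
Python Code (Sketch): Purging Raw Results
def purge_search_results(messages):
    # Drop the bulky tool/search-result messages. Their facts have already been
    # saved to the database via memory.remember(), so nothing is lost.
    return [m for m in messages if m.get("role") != "tool"]

# In the agent loop, after the Persistence Phase:
messages = [
    {"role": "user", "content": "Research quantum computing."},
    {"role": "tool", "content": "<1,000 tokens of raw search results>"},
]
messages = purge_search_results(messages)  # the 1,000-token result is gone; the facts live in the database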
4. Why this Saves Dollars
- Without Memory: 20 turns × ~5,000 tokens of accumulated context per prompt = 100,000 input tokens ($1.50).
- With Memory: 20 turns × ~500 tokens of fixed context = 10,000 input tokens ($0.15).
- Project Goal Check: that $0.15 still assumes the expensive model on every turn; combined with the Tier routing from the earlier steps, we are well on our way to the $0.10 target.
5. Next Steps
Now that we have Intelligence (Router) and Memory (Database), we need to optimize for Power (Final Synthesis).
In the next lesson, we will implement the Final Synthesis Node, which uses our Tier 3 model to pull everything together in a single, high-density pass.
Exercise: The Memory Test
- Predict the total token count of a 10-turn conversation where each turn adds a 500-word Wikipedia snippet.
- Calculate the cost on GPT-4o.
- Now, assume you implement the "remember" and "search" pattern, and the prompt stays at a constant 400 tokens.
- Calculate the new cost.
- (Result: it is usually 5x to 10x cheaper. The sketch below is one way to check your numbers.)
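To sanity-check your answers, here is a small sketch. The ~1.3 tokens-per-word ratio and the $2.50 per million input tokens used for GPT-4o are assumptions, not figures from this lesson, so plug in the current pricing when you work through it.
Python Code (Sketch): Exercise Checker
# Assumptions (not from the lesson): ~1.3 tokens per English word,
# GPT-4o input priced at $2.50 per 1M tokens.
TOKENS_PER_WORD = 1.3
PRICE_PER_MILLION_INPUT = 2.50  # USD; check current GPT-4o pricing

snippet_tokens = int(500 * TOKENS_PER_WORD)  # one 500-word Wikipedia snippet
turns = 10

# Without memory: turn n re-sends all n snippets accumulated so far.
without_memory = sum(n * snippet_tokens for n in range(1, turns + 1))

# With memory: the prompt stays at a constant ~400 tokens per turn.
with_memory = turns * 400

def cost(tokens):
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT

print(f"Without memory: {without_memory:,} tokens -> ${cost(without_memory):.3f}")
print(f"With memory:    {with_memory:,} tokens -> ${cost(with_memory):.3f}")
print(f"Savings factor: {without_memory / with_memory:.1f}x")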