Module 2 Lesson 3: Short-Term vs Long-Term Memory
The two speeds of learning. Understanding conversation buffers vs vector databases.
Agentic Memory: Short-Term vs. Long-Term
Just like a human, an AI agent needs different kinds of memory. An agent with only short-term memory is like a person who forgets you as soon as you walk out of the room. An agent with only long-term memory is like a scholar who knows everything about history but can't follow a simple conversation.
1. Short-Term Memory (The Context Window)
This is the "Active Thought" space. It consists of the messages currently being sent to the LLM.
- Capacity: Limited (e.g., 32k or 128k tokens).
- Speed: Extremely fast (it's part of the prompt).
- Duration: Transient. It is usually cleared once the specific task or session is over.
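The three properties above can be sketched as a simple conversation buffer. This is a minimal illustration, not any framework's API: token counting is approximated by word count (real systems use a tokenizer such as tiktoken), and the eviction policy is the simplest possible one (drop the oldest message).

```python
from collections import deque

class ConversationBuffer:
    """Short-term memory: a capped, transient list of messages."""

    def __init__(self, max_tokens: int = 100):
        self.max_tokens = max_tokens
        self.messages: deque = deque()

    def _tokens(self, text: str) -> int:
        # Crude stand-in for a real tokenizer.
        return len(text.split())

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Limited capacity: evict the oldest messages once over budget.
        while sum(self._tokens(m["content"]) for m in self.messages) > self.max_tokens:
            self.messages.popleft()

buf = ConversationBuffer(max_tokens=10)
buf.append("user", "hello there agent")
buf.append("assistant", "hi how can I help")
buf.append("user", "please summarize our very long earlier discussion now")
print(len(buf.messages))  # older messages were evicted to fit the budget
```

Notice that nothing here is persisted: when `buf` goes out of scope, the "memory" is gone, which is exactly the transience described above.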
2. Long-Term Memory (The Vector Database)
This is the "External Brain." It consists of information stored in a database that the agent can "look up" when needed.
- Capacity: Virtually infinite (gigabytes of documents).
- Speed: Slower (requires embedding the query and searching the index).
- Duration: Permanent. It persists for months or years, across sessions and restarts.
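The "embed, then search" path can be sketched in a few lines. This is a toy, self-contained stand-in: a real store embeds text with a model and runs approximate nearest-neighbor search, whereas here plain word overlap (Jaccard similarity) plays the role of embedding similarity.

```python
def similarity(a: str, b: str) -> float:
    # Word-overlap (Jaccard) score, standing in for cosine similarity
    # between real embedding vectors.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class VectorStore:
    """Long-term memory: searchable, persistent storage of facts."""

    def __init__(self):
        self.facts: list[str] = []

    def save(self, text: str) -> None:
        self.facts.append(text)  # a real DB would store the embedding too

    def search(self, query: str) -> str:
        # The "search" step: score every stored fact against the query.
        return max(self.facts, key=lambda f: similarity(f, query))

db = VectorStore()
db.save("The user's dog is named Rufus")
db.save("The user works in Berlin")
print(db.search("what is the dog called"))  # retrieves the dog fact
```

The extra work in `search` is why the table below lists long-term memory as slower per lookup, even though its capacity is effectively unlimited.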
3. Comparing the Two
| Feature | Short-Term (Context) | Long-Term (Vector Store) |
|---|---|---|
| Analogy | Working Memory / RAM | Library / SSD |
| Logic | "What are we talking about right now?" | "What did we discuss last week?" |
| Mechanism | Appending text to the prompt | RAG (Retrieval-Augmented Generation) |
| Cost | High (more tokens = more money/latency) | Low (search is cheap) |
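The cost row deserves a quick back-of-the-envelope check. The prices below are purely illustrative assumptions, not any provider's actual rates; the point is the orders-of-magnitude gap between re-sending a huge history on every call and embedding a short query once.

```python
# Illustrative prices (assumptions, not real pricing):
LLM_INPUT_PER_M = 5.00   # dollars per million prompt tokens
EMBED_PER_M = 0.10       # dollars per million embedded tokens

def prompt_cost(tokens: int) -> float:
    return tokens / 1_000_000 * LLM_INPUT_PER_M

def embed_cost(tokens: int) -> float:
    return tokens / 1_000_000 * EMBED_PER_M

# Stuffing a 100k-token history into every prompt vs embedding a 50-token query:
print(f"${prompt_cost(100_000):.4f} per call")   # $0.5000 per call
print(f"${embed_cost(50):.6f} per query")        # $0.000005 per query
```

Under these assumed numbers, one context-stuffed call costs as much as roughly 100,000 embedded queries, which is why retrieval-first designs dominate in practice.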
4. The "Hybrid" Architecture
In a professional agent, we use both.
- Stage 1: Long-Term Retrieval. The agent searches its "Library" for relevant facts.
- Stage 2: Short-Term Loading. It "loads" those facts into its Context Window to reason about them.
```mermaid
graph TD
    User[User Question] --> Search[Search Long-Term Memory]
    Search --> Result[Relevant Fact Found]
    Result --> Context[Add to Short-Term Memory]
    Context --> LLM[Reasoning]
    LLM --> Answer[Final Response]
```
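The two stages above can be wired together in a few lines. Everything here is a stand-in: `retrieve` uses simple word overlap instead of a vector search, and `llm` is a fake model that just echoes the fact so the data flow is visible.

```python
def retrieve(question: str, facts: list[str]) -> str:
    # Stage 1: long-term retrieval (word overlap standing in for vector search).
    def score(fact: str) -> int:
        return len(set(fact.lower().split()) & set(question.lower().split()))
    return max(facts, key=score)

def llm(prompt: str) -> str:
    # Fake model: echoes the retrieved fact line so the pipeline is traceable.
    return prompt.splitlines()[0].removeprefix("Known fact: ")

def answer(question: str, facts: list[str]) -> str:
    fact = retrieve(question, facts)                        # Stage 1
    context = f"Known fact: {fact}\nQuestion: {question}"   # Stage 2: load into prompt
    return llm(context)

facts = ["The user's dog is named Rufus", "The user works in Berlin"]
print(answer("What is my dog named?", facts))
```

Only the one retrieved fact enters the context window; the rest of the "library" stays outside, which keeps the prompt small no matter how large long-term memory grows.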
5. Code Example: Moving to Long-Term
When an agent "learns" something new, we don't just keep it in the prompt. We "Commit" it to the database.
```python
# Short-term memory update: append to the in-context message list
current_chat.append({"role": "user", "content": "My dog is named Rufus"})

# Long-term memory commit (vector_db is a placeholder for your vector DB client)
vector_db.save(
    text="The user's dog is named Rufus",
    metadata={"user_id": 123, "category": "personal_facts"},
)

# Next time (even in a new chat), the agent can find this!
```
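To make that last comment concrete, here is a runnable toy version of the round trip. `ToyVectorDB` is an in-memory stand-in for a real vector database client, and its `search` filters by the `user_id` metadata from the snippet above so one user's facts never leak into another's session.

```python
class ToyVectorDB:
    """In-memory stand-in for a real vector DB client."""

    def __init__(self):
        self.records: list[dict] = []

    def save(self, text: str, metadata: dict) -> None:
        self.records.append({"text": text, "metadata": metadata})

    def search(self, query: str, user_id: int) -> list[str]:
        # Filter by metadata, then match on word overlap (a stand-in for
        # embedding similarity).
        qwords = set(query.lower().split())
        return [r["text"] for r in self.records
                if r["metadata"]["user_id"] == user_id
                and qwords & set(r["text"].lower().split())]

vector_db = ToyVectorDB()
vector_db.save("The user's dog is named Rufus",
               metadata={"user_id": 123, "category": "personal_facts"})

# A brand-new chat starts with an empty short-term buffer...
new_chat: list[str] = []
# ...but the agent recovers the committed fact from long-term memory:
new_chat.extend(vector_db.search("tell me about my dog", user_id=123))
print(new_chat)
```

The fact survived the session boundary because it lives in the database, not in the prompt: exactly the persistence that short-term memory alone cannot provide.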
Key Takeaways
- Short-Term Memory is active context held in the prompt.
- Long-Term Memory is searchable data stored in a Vector DB.
- Efficiency comes from retrieving only what you need from long-term and moving it into short-term.
- The "Context Window" is the most expensive real estate in AI; don't waste it on low-relevance data.