Module 2 Lesson 3: Short-Term vs Long-Term Memory
The two speeds of learning. Understanding conversation buffers vs vector databases.
Agentic Memory: Short-Term vs. Long-Term
Just like a human, an AI agent needs different kinds of memory. An agent with only short-term memory is like a person who forgets you as soon as you walk out of the room. An agent with only long-term memory is like a scholar who knows everything about history but can't follow a simple conversation.
1. Short-Term Memory (The Context Window)
This is the "Active Thought" space. It consists of the messages currently being sent to the LLM.
- Capacity: Limited (e.g., 32k or 128k tokens).
- Speed: Extremely fast (it's part of the prompt).
- Duration: Transient. It is usually cleared once the specific task or session is over.
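The three properties above can be sketched as a simple conversation buffer. This is a minimal illustration, not any framework's API: token counting is approximated by word count (real systems use a tokenizer such as tiktoken), and the eviction policy is the simplest possible one (drop the oldest message).

```python
from collections import deque

class ConversationBuffer:
    """Short-term memory: a capped, transient list of messages."""

    def __init__(self, max_tokens: int = 100):
        self.max_tokens = max_tokens
        self.messages: deque = deque()

    def _tokens(self, text: str) -> int:
        # Crude stand-in for a real tokenizer.
        return len(text.split())

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Limited capacity: evict the oldest messages once over budget.
        while sum(self._tokens(m["content"]) for m in self.messages) > self.max_tokens:
            self.messages.popleft()

buf = ConversationBuffer(max_tokens=10)
buf.append("user", "hello there agent")
buf.append("assistant", "hi how can I help")
buf.append("user", "please summarize our very long earlier discussion now")
print(len(buf.messages))  # older messages were evicted to fit the budget
```

Notice that nothing here is persisted: when `buf` goes out of scope, the "memory" is gone, which is exactly the transience described above.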
2. Long-Term Memory (The Vector Database)
This is the "External Brain." It consists of information stored in a database that the agent can "look up" when needed.
- Capacity: Virtually infinite (gigabytes of documents).
- Speed: Slower (requires embedding the query and searching the index).
- Duration: Permanent. It persists for months or years, across sessions and restarts.
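The "embed, then search" path can be sketched in a few lines. This is a toy, self-contained stand-in: a real store embeds text with a model and runs approximate nearest-neighbor search, whereas here plain word overlap (Jaccard similarity) plays the role of embedding similarity.

```python
def similarity(a: str, b: str) -> float:
    # Word-overlap (Jaccard) score, standing in for cosine similarity
    # between real embedding vectors.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class VectorStore:
    """Long-term memory: searchable, persistent storage of facts."""

    def __init__(self):
        self.facts: list[str] = []

    def save(self, text: str) -> None:
        self.facts.append(text)  # a real DB would store the embedding too

    def search(self, query: str) -> str:
        # The "search" step: score every stored fact against the query.
        return max(self.facts, key=lambda f: similarity(f, query))

db = VectorStore()
db.save("The user's dog is named Rufus")
db.save("The user works in Berlin")
print(db.search("what is the dog called"))  # retrieves the dog fact
```

The extra work in `search` is why the table below lists long-term memory as slower per lookup, even though its capacity is effectively unlimited.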
3. Comparing the Two
| Feature | Short-Term (Context) | Long-Term (Vector Store) |
|---|---|---|
| Analogy | Working Memory / RAM | Library / SSD |
| Logic | "What are we talking about right now?" | "What did we discuss last week?" |
| Mechanism | Appending text to the prompt | RAG (Retrieval-Augmented Generation) |
| Cost | High (more tokens = more money/latency) | Low (search is cheap) |
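The cost row deserves a quick back-of-the-envelope check. The prices below are purely illustrative assumptions, not any provider's actual rates; the point is the orders-of-magnitude gap between re-sending a huge history on every call and embedding a short query once.

```python
# Illustrative prices (assumptions, not real pricing):
LLM_INPUT_PER_M = 5.00   # dollars per million prompt tokens
EMBED_PER_M = 0.10       # dollars per million embedded tokens

def prompt_cost(tokens: int) -> float:
    return tokens / 1_000_000 * LLM_INPUT_PER_M

def embed_cost(tokens: int) -> float:
    return tokens / 1_000_000 * EMBED_PER_M

# Stuffing a 100k-token history into every prompt vs embedding a 50-token query:
print(f"${prompt_cost(100_000):.4f} per call")   # $0.5000 per call
print(f"${embed_cost(50):.6f} per query")        # $0.000005 per query
```

Under these assumed numbers, one context-stuffed call costs as much as roughly 100,000 embedded queries, which is why retrieval-first designs dominate in practice.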
4. The "Hybrid" Architecture
In a professional agent, we use both.
- Stage 1: Long-Term Retrieval. The agent searches its "Library" for relevant facts.
- Stage 2: Short-Term Loading. It "loads" those facts into its Context Window to reason about them.
```mermaid
graph TD
    User[User Question] --> Search[Search Long-Term Memory]
    Search --> Result[Relevant Fact Found]
    Result --> Context[Add to Short-Term Memory]
    Context --> LLM[Reasoning]
    LLM --> Answer[Final Response]
```
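The two stages above can be wired together in a few lines. Everything here is a stand-in: `retrieve` uses simple word overlap instead of a vector search, and `llm` is a fake model that just echoes the fact so the data flow is visible.

```python
def retrieve(question: str, facts: list[str]) -> str:
    # Stage 1: long-term retrieval (word overlap standing in for vector search).
    def score(fact: str) -> int:
        return len(set(fact.lower().split()) & set(question.lower().split()))
    return max(facts, key=score)

def llm(prompt: str) -> str:
    # Fake model: echoes the retrieved fact line so the pipeline is traceable.
    return prompt.splitlines()[0].removeprefix("Known fact: ")

def answer(question: str, facts: list[str]) -> str:
    fact = retrieve(question, facts)                        # Stage 1
    context = f"Known fact: {fact}\nQuestion: {question}"   # Stage 2: load into prompt
    return llm(context)

facts = ["The user's dog is named Rufus", "The user works in Berlin"]
print(answer("What is my dog named?", facts))
```

Only the one retrieved fact enters the context window; the rest of the "library" stays outside, which keeps the prompt small no matter how large long-term memory grows.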
5. Code Example: Moving to Long-Term
When an agent "learns" something new, we don't just keep it in the prompt. We "Commit" it to the database.
```python
# Short-term memory update: append to the in-context message list
current_chat.append({"role": "user", "content": "My dog is named Rufus"})

# Long-term memory commit (vector_db is a placeholder for your vector DB client)
vector_db.save(
    text="The user's dog is named Rufus",
    metadata={"user_id": 123, "category": "personal_facts"},
)

# Next time (even in a new chat), the agent can find this!
```
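To make that last comment concrete, here is a runnable toy version of the round trip. `ToyVectorDB` is an in-memory stand-in for a real vector database client, and its `search` filters by the `user_id` metadata from the snippet above so one user's facts never leak into another's session.

```python
class ToyVectorDB:
    """In-memory stand-in for a real vector DB client."""

    def __init__(self):
        self.records: list[dict] = []

    def save(self, text: str, metadata: dict) -> None:
        self.records.append({"text": text, "metadata": metadata})

    def search(self, query: str, user_id: int) -> list[str]:
        # Filter by metadata, then match on word overlap (a stand-in for
        # embedding similarity).
        qwords = set(query.lower().split())
        return [r["text"] for r in self.records
                if r["metadata"]["user_id"] == user_id
                and qwords & set(r["text"].lower().split())]

vector_db = ToyVectorDB()
vector_db.save("The user's dog is named Rufus",
               metadata={"user_id": 123, "category": "personal_facts"})

# A brand-new chat starts with an empty short-term buffer...
new_chat: list[str] = []
# ...but the agent recovers the committed fact from long-term memory:
new_chat.extend(vector_db.search("tell me about my dog", user_id=123))
print(new_chat)
```

The fact survived the session boundary because it lives in the database, not in the prompt: exactly the persistence that short-term memory alone cannot provide.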
Key Takeaways
- Short-Term Memory is active context held in the prompt.
- Long-Term Memory is searchable data stored in a Vector DB.
- Efficiency comes from retrieving only what you need from long-term and moving it into short-term.
- The "Context Window" is the most expensive real estate in AI; don't waste it on low-relevance data.