Module 2 Lesson 3: Short-Term vs Long-Term Memory

The two speeds of learning. Understanding conversation buffers vs vector databases.

Agentic Memory: Short-Term vs. Long-Term

Just like a human, an AI agent needs different kinds of memory. An agent with only short-term memory is like a person who forgets you as soon as you walk out of the room. An agent with only long-term memory is like a scholar who knows everything about history but can't follow a simple conversation.

1. Short-Term Memory (The Context Window)

This is the "Active Thought" space. It consists of the messages currently being sent to the LLM.

  • Capacity: Limited (e.g., 32k or 128k tokens).
  • Speed: Extremely fast (it's part of the prompt).
  • Duration: Transient. It is usually cleared once the specific task or session is over.
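The three properties above can be sketched as a simple message buffer with a token budget. This is a minimal illustration, not a real implementation: the one-token-per-word estimate is a rough stand-in for a real tokenizer, and the "evict oldest" policy is just one possible trimming strategy.

```python
class ShortTermMemory:
    """A toy context-window buffer that evicts old messages over budget."""

    def __init__(self, max_tokens: int = 50):
        self.max_tokens = max_tokens
        self.messages: list[dict] = []

    def _estimate_tokens(self, text: str) -> int:
        # Crude heuristic: ~1 token per word (real tokenizers differ).
        return len(text.split())

    def _total_tokens(self) -> int:
        return sum(self._estimate_tokens(m["content"]) for m in self.messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Transient by design: drop the oldest messages once over budget.
        while self._total_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(0)

memory = ShortTermMemory(max_tokens=10)
memory.add("user", "Hello there my friend")
memory.add("assistant", "Hi how can I help")
memory.add("user", "Tell me a long story please")
print(len(memory.messages))  # earlier messages were evicted to fit the budget
```

Note how capacity is the hard constraint here: nothing is "forgotten" by choice, only by running out of room.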

2. Long-Term Memory (The Vector Database)

This is the "External Brain." It consists of information stored in a database that the agent can "look up" when needed.

  • Capacity: Virtually infinite (gigabytes of documents).
  • Speed: Slower (requires an "Embedding" and a "Search").
  • Duration: Permanent. It persists across sessions, for months or years.
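The "Embedding" and "Search" steps can be sketched end to end. Everything here is a toy stand-in: real systems use a learned embedding model and a vector database with an approximate-nearest-neighbour index, while this sketch uses bag-of-words vectors and brute-force cosine similarity.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts (a real model returns dense float vectors).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store = []  # (embedding, original text) pairs: the "External Brain"

def save(text: str) -> None:
    store.append((embed(text), text))

def search(query: str) -> str:
    # Brute-force nearest neighbour; vector DBs use ANN indexes instead.
    return max(store, key=lambda item: cosine(item[0], embed(query)))[1]

save("The user's dog is named Rufus")
save("The user lives in Berlin")
print(search("what is the dog called"))  # -> The user's dog is named Rufus
```

The key property: the lookup works on meaning overlap, not exact keywords, which is why embeddings (rather than plain text search) are the standard mechanism.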

3. Comparing the Two

| Feature   | Short-Term (Context)                     | Long-Term (Vector Store)            |
| --------- | ---------------------------------------- | ----------------------------------- |
| Analogy   | Working Memory / RAM                     | Library / SSD                       |
| Logic     | "What are we talking about right now?"   | "What did we discuss last week?"    |
| Mechanism | Appending text to the prompt             | RAG (Retrieval-Augmented Generation) |
| Cost      | High (more tokens = more money/latency)  | Low (search is cheap)               |

4. The "Hybrid" Architecture

In a professional agent, we use both.

  1. Stage 1: Long-Term Retrieval. The agent searches its "Library" for relevant facts.
  2. Stage 2: Short-Term Loading. It "loads" those facts into its Context Window to reason about them.

graph TD
    User[User Question] --> Search[Search Long-Term Memory]
    Search --> Result[Relevant Fact Found]
    Result --> Context[Add to Short-Term Memory]
    Context --> LLM[Reasoning]
    LLM --> Answer[Final Response]
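The two stages in the diagram can be wired together in a few lines. The `retrieve` and `answer` helpers below are hypothetical stand-ins, with a keyword match playing the role of the vector search and the assembled prompt standing in for the LLM call.

```python
# A toy long-term store, keyed by topic for simple illustrative retrieval.
LONG_TERM = {
    "dog": "The user's dog is named Rufus",
    "city": "The user lives in Berlin",
}

def retrieve(question: str) -> list[str]:
    # Stage 1: Long-Term Retrieval. Pull only the facts relevant to the question.
    return [fact for topic, fact in LONG_TERM.items() if topic in question.lower()]

def answer(question: str) -> str:
    # Stage 2: Short-Term Loading. Load retrieved facts into the context window.
    context = retrieve(question)
    context.append(question)
    prompt = "\n".join(context)
    return prompt  # a real agent would send this prompt to an LLM here

print(answer("What is my dog called?"))
```

Only the facts that survive Stage 1 ever reach the context window, which is what keeps the expensive short-term memory small.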

5. Code Example: Moving to Long-Term

When an agent "learns" something new, we don't just keep it in the prompt; we commit it to the database.

# Short-term memory update: append to the in-context message list
current_chat.append({"role": "user", "content": "My dog is named Rufus"})

# Long-term memory commit: store the fact for future sessions
vector_db.save(
    text="The user's dog is named Rufus",
    metadata={"user_id": 123, "category": "personal_facts"}
)

# Next time (even in a new chat), the agent can find this!
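The "next time" lookup can be sketched with an in-memory store. The `save` and `find` helpers below are hypothetical, but mirror the save/query-with-metadata-filter pattern most vector databases expose.

```python
records: list[dict] = []  # stand-in for the persistent vector database

def save(text: str, metadata: dict) -> None:
    records.append({"text": text, **metadata})

def find(user_id: int, category: str) -> list[str]:
    # Metadata filtering narrows retrieval to one user's facts.
    return [r["text"] for r in records
            if r["user_id"] == user_id and r["category"] == category]

# Session 1: commit the fact.
save("The user's dog is named Rufus",
     {"user_id": 123, "category": "personal_facts"})

# Session 2 (a brand-new chat): the fact is still retrievable.
print(find(123, "personal_facts"))  # -> ["The user's dog is named Rufus"]
```

The metadata matters as much as the text: without the `user_id` filter, one user's "personal facts" would leak into every other user's retrievals.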

Key Takeaways

  • Short-Term Memory is active context held in the prompt.
  • Long-Term Memory is searchable data stored in a Vector DB.
  • Efficiency comes from retrieving only what you need from long-term and moving it into short-term.
  • The "Context Window" is the most expensive real estate in AI; don't waste it on low-relevance data.
