Module 8 Lesson 3: Summary Memory
Dense context: how to use an LLM to periodically summarize a conversation so the memory footprint stays small.
ConversationSummaryMemory: The Executive Assistant
As a conversation grows, the raw transcript approach (Module 8 Lesson 2) becomes too long to fit in the context window. ConversationSummaryMemory solves this by making a second LLM call that rewrites the conversation into a short paragraph.
1. The Summarization Loop
- User: Sends a message.
- AI: Responds.
- Background Task: An LLM looks at the history and updates the summary (see the sketch after this list).
- Example: "The user introduced himself as Sudeep and asked about steak recipes. The assistant provided a guide."
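Here is a minimal, framework-free sketch of that loop, so the mechanics are visible outside any library. The ChatOpenAI model, the chat_turn helper, and the prompt wording are all illustrative assumptions, not LangChain internals:

from langchain_openai import ChatOpenAI  # assumption: any chat model would work

llm = ChatOpenAI(temperature=0)
summary = ""  # the running summary stands in for the full transcript

def chat_turn(user_message: str) -> str:
    global summary
    # 1. Answer the user, with the summary as the only context
    reply = llm.invoke(
        f"Conversation summary so far: {summary}\n\nUser: {user_message}"
    ).content
    # 2. Background task: fold the new exchange into the summary
    summary = llm.invoke(
        "Rewrite the conversation summary to include the new exchange, "
        "as one short paragraph.\n"
        f"Current summary: {summary}\n"
        f"User: {user_message}\nAssistant: {reply}"
    ).content
    return reply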
2. Using it in Python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI  # assumption: any LangChain chat model works

# ConversationSummaryMemory needs its own LLM to perform the summarizing
llm = ChatOpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm)

memory.save_context({"input": "I am traveling to Paris tomorrow."}, {"output": "How exciting!"})
memory.save_context({"input": "I need to know the weather there."}, {"output": "It will be 20 degrees."})

print(memory.load_memory_variables({}))
# Example output (exact wording varies by model):
# {'history': 'The human is traveling to Paris and asked about the weather, which is 20 degrees.'}
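In practice you rarely call save_context by hand; the memory is attached to a chain that records each turn and triggers the summarization automatically. A sketch using LangChain's legacy ConversationChain (the input text is illustrative):

from langchain.chains import ConversationChain

chain = ConversationChain(llm=llm, memory=memory)  # reuses llm and memory from above
chain.predict(input="What should I pack for the trip?")
# Each predict() call appends the turn to memory and refreshes the summary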
3. Visualizing Summary Compression
graph LR
M1[Message 1: 50 words] --> S[Summarizer LLM]
M2[Message 2: 50 words] --> S
M3[Message 3: 50 words] --> S
S --> Final[Summary: 20 words]
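Using the numbers above: three 50-word messages (150 words of raw transcript) compress into a single ~20-word summary, and that one summary keeps absorbing new messages instead of the history growing linearly.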
4. Pros and Cons
- Pros: The stored context stays roughly constant in size no matter how long the conversation runs, which keeps prompts cheap for threads of 100+ messages.
- Cons: You lose the exact wording. If the user says "Call me 'S-Man'", the summarizer might just write "The user asked for a nickname," dropping the nickname itself. Every turn also adds an extra LLM call, which means extra latency and per-turn cost.
5. Engineering Tip: Summary Buffer
For the best of both worlds, many developers use summary-buffer memory (ConversationSummaryBufferMemory in LangChain).
- It keeps the most recent messages as a raw transcript (for precision). LangChain bounds this window by a token budget (max_token_limit) rather than a fixed message count.
- It summarizes everything older than that window (for long-term context), as sketched below.
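A sketch of the hybrid, reusing the same ChatOpenAI assumption as above; the 200-token budget is an arbitrary example value:

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)
# Exchanges within the last ~200 tokens stay verbatim; anything older is
# folded into a running summary
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=200)

memory.save_context({"input": "Call me 'S-Man'."}, {"output": "Will do, S-Man!"})
print(memory.load_memory_variables({}))
# Recent turns come back word-for-word, so the nickname survives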
Key Takeaways
- Summary Memory uses an LLM to compress conversation history.
- It solves the infinite-growth problem of buffer memory (Module 8 Lesson 2).
- It trades precision for efficiency.
- It is a good fit for long-running agents and support bots.