Module 8 Lesson 3: Summary Memory
Dense context: how to use an LLM to periodically summarize a conversation so the memory footprint stays small.
ConversationSummaryMemory: The Executive Assistant
As a conversation grows, the raw transcript approach (Module 8 Lesson 2) becomes too long to fit in the context window. ConversationSummaryMemory solves this by making a second LLM call that rewrites the conversation into a short paragraph.
1. The Summarization Loop
- User: Sends a message.
- AI: Responds.
- Background Task: An LLM looks at the history and updates the summary (see the sketch after this list).
- Example: "The user introduced himself as Sudeep and asked about steak recipes. The assistant provided a guide."
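Here is a minimal, framework-free sketch of that loop, so the mechanics are visible outside any library. The ChatOpenAI model, the chat_turn helper, and the prompt wording are all illustrative assumptions, not LangChain internals:

from langchain_openai import ChatOpenAI  # assumption: any chat model would work

llm = ChatOpenAI(temperature=0)
summary = ""  # the running summary stands in for the full transcript

def chat_turn(user_message: str) -> str:
    global summary
    # 1. Answer the user, with the summary as the only context
    reply = llm.invoke(
        f"Conversation summary so far: {summary}\n\nUser: {user_message}"
    ).content
    # 2. Background task: fold the new exchange into the summary
    summary = llm.invoke(
        "Rewrite the conversation summary to include the new exchange, "
        "as one short paragraph.\n"
        f"Current summary: {summary}\n"
        f"User: {user_message}\nAssistant: {reply}"
    ).content
    return reply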
2. Using it in Python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI  # assumption: any LangChain chat model works

# ConversationSummaryMemory needs its own LLM to perform the summarizing
llm = ChatOpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm)

memory.save_context({"input": "I am traveling to Paris tomorrow."}, {"output": "How exciting!"})
memory.save_context({"input": "I need to know the weather there."}, {"output": "It will be 20 degrees."})

print(memory.load_memory_variables({}))
# Example output (exact wording varies by model):
# {'history': 'The human is traveling to Paris and asked about the weather, which is 20 degrees.'}
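In practice you rarely call save_context by hand; the memory is attached to a chain that records each turn and triggers the summarization automatically. A sketch using LangChain's legacy ConversationChain (the input text is illustrative):

from langchain.chains import ConversationChain

chain = ConversationChain(llm=llm, memory=memory)  # reuses llm and memory from above
chain.predict(input="What should I pack for the trip?")
# Each predict() call appends the turn to memory and refreshes the summary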
3. Visualizing Summary Compression
graph LR
M1[Message 1: 50 words] --> S[Summarizer LLM]
M2[Message 2: 50 words] --> S
M3[Message 3: 50 words] --> S
S --> Final[Summary: 20 words]
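Using the numbers above: three 50-word messages (150 words of raw transcript) compress into a single ~20-word summary, and that one summary keeps absorbing new messages instead of the history growing linearly.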
4. Pros and Cons
- Pros: The stored context stays roughly constant in size no matter how long the conversation runs, which keeps prompts cheap for threads of 100+ messages.
- Cons: You lose the exact wording. If the user says "Call me 'S-Man'", the summarizer might just write "The user asked for a nickname," dropping the nickname itself. Every turn also adds an extra LLM call, which means extra latency and per-turn cost.
5. Engineering Tip: Summary Buffer
For the best of both worlds, many developers use summary-buffer memory (ConversationSummaryBufferMemory in LangChain).
- It keeps the most recent messages as a raw transcript (for precision). LangChain bounds this window by a token budget (max_token_limit) rather than a fixed message count.
- It summarizes everything older than that window (for long-term context), as sketched below.
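A sketch of the hybrid, reusing the same ChatOpenAI assumption as above; the 200-token budget is an arbitrary example value:

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)
# Exchanges within the last ~200 tokens stay verbatim; anything older is
# folded into a running summary
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=200)

memory.save_context({"input": "Call me 'S-Man'."}, {"output": "Will do, S-Man!"})
print(memory.load_memory_variables({}))
# Recent turns come back word-for-word, so the nickname survives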
Key Takeaways
- Summary Memory uses an LLM to compress conversation history.
- It solves the infinite-growth problem of buffer memory (Module 8 Lesson 2).
- It trades precision for efficiency.
- It is a good fit for long-running agents and support bots.