
History Serialization: Compact Memory Formats
Learn how to turn long chat histories into ultra-compact byte-strings. Master the art of 'Semantic Minification' for token-efficient history.
In Module 11.2, we learned to offload memory to a database. But when we need to bring that memory back into the prompt, how do we format it?
Most developers use a standard JSON array of messages. As we learned in Module 4.4, JSON is "Token-Heavy." If you restore 20 turns of history in JSON, you are paying for hundreds of {, "role":, and [ tokens.
In this lesson, we master Semantic Minification for history serialization. We’ll learn how to turn a database row into a high-density "Memory String" that provides the most context for the fewest tokens.
1. The Serialization Tax
Standard JSON (65 Tokens):
[
{"role": "user", "content": "How's the weather?"},
{"role": "assistant", "content": "It's sunny in London today."}
]
Compact Shorthand (25 Tokens):
U: How's weather? | A: Sunny, London.
Savings: ~60%. In a long-running agent thread, this "Shorthand" lets you fit roughly 2.5x more history into the same token budget.
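You can verify these numbers against your own data with a quick sketch like the one below. It assumes the tiktoken package and its cl100k_base encoding; exact counts vary by tokenizer, but the gap between the two formats is consistent.
Python Code: Measuring the Serialization Tax
# Compare the token cost of JSON history vs. compact shorthand.
# Assumes `pip install tiktoken`; counts differ slightly by model.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

history = [
    {"role": "user", "content": "How's the weather?"},
    {"role": "assistant", "content": "It's sunny in London today."},
]

json_form = json.dumps(history)
compact_form = "U: How's weather? | A: Sunny, London."

print(len(enc.encode(json_form)))     # token cost of the JSON form
print(len(enc.encode(compact_form)))  # token cost of the shorthand form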
2. Technique: Grammatical Compression
When serializing history for an LLM, you are not writing for a human. You are writing high-probability "Predictor Keys": dense cues that let the model reconstruct the full context.
- Replace "The apple is red" with "Apple: Red."
- Replace "I have successfully finished the task" with "Task: Done."
The "Prompt-Only" Language
LLMs understand "Telegram-style" English perfectly. Use this to your advantage when building your "Context Restorer" service.
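To make the target format concrete, here is a purely illustrative mapping of verbose sentences to their "Predictor Key" form. In practice you would let an LLM (or the compressor in the next section) do this rewriting rather than hard-code rules.
# Illustrative only: hypothetical rewrite examples showing the "Telegram-style" target.
VERBOSE_TO_COMPACT = {
    "The apple is red": "Apple: Red.",
    "I have successfully finished the task": "Task: Done.",
    "Our revenue grew by 15 percent this quarter": "Revenue: +15% QoQ.",
}

def compress(sentence: str) -> str:
    """Return the compact form if a rule exists, otherwise the original sentence."""
    return VERBOSE_TO_COMPACT.get(sentence, sentence)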
3. Implementation: The History Compressor (Python)
Python Code: Serializing for Density
import re

def serialize_history_compact(messages: list) -> str:
    """
    Turns a list of message objects into a
    token-dense single string.
    """
    compact_lines = []
    for msg in messages:
        role_marker = "U" if msg["role"] == "user" else "A"
        # Strip common filler words using a simple regex or list
        clean_text = strip_fillers(msg["content"])
        compact_lines.append(f"{role_marker}: {clean_text}")
    # Use a minimal separator like a pipe or newline
    return " | ".join(compact_lines)

def strip_fillers(text: str) -> str:
    fillers = ["please", "thank you", "assistant", "certainly", "i would like to"]
    for f in fillers:
        # Case-insensitive so "Please" and "please" are both removed
        text = re.sub(re.escape(f), "", text, flags=re.IGNORECASE)
    # Collapse the double spaces left behind by the removals
    return re.sub(r"\s{2,}", " ", text).strip()
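A quick usage check (the exact output depends on the filler list you configure):
history = [
    {"role": "user", "content": "I would like to know the weather in London"},
    {"role": "assistant", "content": "It's sunny in London today"},
]
print(serialize_history_compact(history))
# -> U: know the weather in London | A: It's sunny in London today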
4. The "Key Fact" Extraction Pattern
Instead of serializing the entire message, serialize only the Extracted Facts.
Original Message: "I've analyzed the financial report and found that our revenue grew by 15% this quarter, mostly driven by the new Cloud services."
Serialized Memory:
Fact: Revenue +15% (Cloud driven)
Token ROI: You've compressed 25 tokens into 7 without losing the information needed for future turns.
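One way to implement this pattern is to ask a model to do the extraction for you. Below is a minimal sketch using the OpenAI Python SDK (any chat-completion API works); the model name and prompt wording are illustrative, not prescriptive.
Python Code: Extracting Facts for Memory
# Minimal sketch of LLM-based fact extraction (assumes the openai>=1.0 SDK).
from openai import OpenAI

client = OpenAI()

def extract_fact(message: str) -> str:
    """Compress one message into a single 'Fact:' line via the model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Compress the user's message into one line of the form "
                    "'Fact: <key facts only>'. Drop filler; keep numbers and names."
                ),
            },
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content.strip()

# extract_fact("...revenue grew by 15% this quarter, driven by Cloud services...")
# might return: "Fact: Revenue +15% (Cloud driven)"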
5. Token Efficiency and "Reciprocal Knowledge"
If the user says "My birthday is January 1st," don't store that as a chat message. Store it as a Global Variable.
Memory: {user_dob: 01-01}
This turns "Conversational Context" into "System Knowledge", which can be cached (Module 5.5) much more effectively than a shifting history of chat messages.
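A lightweight way to hold this "System Knowledge" is a plain key-value store that gets rendered into a stable prompt block. The sketch below is one possible shape; key names like user_dob are placeholders. Because the rendered block does not shift as the conversation grows, it plays well with prompt caching.
# Sketch: durable facts stored as key-value pairs instead of chat turns.
user_facts: dict[str, str] = {}

def remember(key: str, value: str) -> None:
    """Store or overwrite a durable fact."""
    user_facts[key] = value

def render_system_knowledge() -> str:
    """Render facts as a stable, cache-friendly block for the system prompt."""
    return "Known facts: " + ", ".join(f"{k}={v}" for k, v in sorted(user_facts.items()))

remember("user_dob", "01-01")
print(render_system_knowledge())  # Known facts: user_dob=01-01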
6. Summary and Key Takeaways
- JSON is for APIs, Text is for AI: When restoring history, use minimal headers like U: and A:.
- Minify the Content: Strip polite filler words before serialization.
- Fact-First Memory: Prefer serializing extracted facts over verbatim messages.
- Structured Separation: Use a single delimiter (like |) to separate turns, reducing the "Newline Tax."
In the next lesson, Using External Databases for Long-Term Memory, we look at how to scale this architecture to millions of facts.
Exercise: The Compactor Scale
- Take a transcript of a 5-minute meeting.
- Step 1: Count the tokens of the raw transcript.
- Step 2: Manually rewrite it in "Telegram" style.
- Step 3: Use an LLM to "Extract Facts" into a YAML block.
- Compare the three counts.
- Most students find that Step 3 is 1/20th the size of Step 1, while still containing all the "To-do" items and "Decisions."
- Question: Which version would you want your "Project Manager Agent" to read every morning?