
Selection and Pruning: Smart Memory Deletion
Learn the advanced techniques of "Intelligent Forgetting". Discover how to use semantic search to retrieve only the relevant history instead of sending all of it, and how to prune the useless parts of a conversation.
In our last lesson, we looked at "Windows" (Sliding and Summary). But what if our conversation covers three different topics?
- Topic A: Project Planning.
- Topic B: Debugging a Server.
- Topic C: Vacation Plans.
If we are currently debugging a server, the "Vacation Plans" context is Noise. Even if it's "Recent" in the history, it’s logically irrelevant.
In this lesson, we master Dynamic Pruning. We move beyond simple "Time-based" memory into "Semantic-based" memory and learn how to select the right context for the right moment.
1. The Strategy of "Semantic Memory" (RAG for History)
Instead of sending the history verbatim, treat your own history like a Vector Database.
- Save every message into a "History Vector Store."
- For the current user query, search the history for the top 3 most relevant previous messages.
- Only append those 3 messages to the prompt.
Benefit: You stop paying for the "Conversation Fluff" and only pay for the "Relevant Facts."
graph TD
U[User Query: 'How do I fix the error?'] --> S[Semantic Search in History]
S --> H1[History Item 45: 'Error code log...']
S --> H2[History Item 2: 'Server settings...']
H1 & H2 --> P[LLM Prompt]
style H1 fill:#6f6
style H2 fill:#6f6
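Below is a minimal sketch of this pattern. The HistoryVectorStore class and the embed() helper are illustrative names, and the bag-of-words "embedding" is only a stand-in so the example runs without dependencies; in a real system you would swap in a proper embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in 'embedding': a bag-of-words count vector.
    In production, replace this with a call to an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class HistoryVectorStore:
    """Stores every message with its vector, so we can retrieve by meaning later."""

    def __init__(self):
        self.items = []  # list of (message_dict, vector) pairs

    def add(self, message):
        self.items.append((message, embed(message["content"])))

    def search(self, query, top_k=3):
        """Return the top_k most relevant past messages for the current query."""
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [message for message, _ in ranked[:top_k]]

# Usage: only the relevant history is appended to the prompt, not the full log.
store = HistoryVectorStore()
store.add({"role": "user", "content": "The server throws error code 502 on startup"})
store.add({"role": "assistant", "content": "Here is the Q3 project plan we discussed"})
store.add({"role": "user", "content": "Server settings: nginx with 4 worker processes"})

relevant = store.search("How do I fix the error?", top_k=3)
prompt_context = "\n".join(m["content"] for m in relevant)
```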
2. Topic-Based Pruning
An advanced agentic architecture (like AutoGPT or Agentcore) uses a "Topic Switch" detector.
- If the model detects that the user has switched from "Business" to "Casual," it archives the "Business" context to a database and starts a "Fresh" context for the casual chat.
- This results in 0% Token Overlap between unrelated parts of a session.
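One crude way to approximate such a detector is shown below: compare each incoming message against the text of the active context and archive that context when the overlap falls below a threshold. The overlap() word-overlap score is only a stand-in for a real embedding-based or LLM-based topic classifier, and the threshold value is an arbitrary illustration, not a recommended setting.

```python
def overlap(a_text, b_text):
    """Crude topic-similarity: fraction of shared words (stand-in for embeddings)."""
    a, b = set(a_text.lower().split()), set(b_text.lower().split())
    return len(a & b) / len(a | b) if a and b else 0.0

def is_topic_switch(active_context, new_message, threshold=0.05):
    """Return True if the new message looks unrelated to the active context."""
    if not active_context:
        return False
    context_text = " ".join(m["content"] for m in active_context)
    return overlap(context_text, new_message["content"]) < threshold

def route_message(active_context, archive, new_message):
    """Archive the old topic and start a fresh context when the topic changes."""
    if is_topic_switch(active_context, new_message):
        archive.append(active_context)  # persist the finished topic (e.g. to a database)
        active_context = []             # fresh context: no token overlap with the old topic
    active_context.append(new_message)
    return active_context

# Usage: the 'Business' context is archived the moment the chat turns casual.
archive, context = [], []
context = route_message(context, archive, {"role": "user", "content": "Draft the Q3 revenue forecast"})
context = route_message(context, archive, {"role": "user", "content": "Any good sushi places nearby?"})
```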
3. Implementation: The Content Pruner (Python)
You can write logic that "Weights" the value of a message based on its content.
Python Code: Heuristic Pruning
def prune_history(history_list):
    """
    Keep messages if they contain code, numbers,
    or specific 'Fact' keywords. Delete 'Polite' ones.
    """
    essential_history = []
    for i, msg in enumerate(history_list):
        content = msg['content'].lower()
        # Heuristic rules for "is this message valuable?"
        is_valuable = any([
            "```" in content,                          # contains a code block
            any(char.isdigit() for char in content),   # contains numbers
            "found" in content or "error" in content,  # contains 'fact' keywords
        ])
        # Always keep the last 2 messages so the conversation still flows
        if is_valuable or i >= len(history_list) - 2:
            essential_history.append(msg)
    return essential_history
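For example, on a short mixed history the filter keeps the factual turns and the final exchange, and drops the pleasantries:

```python
history = [
    {"role": "user", "content": "Thanks for the help!"},
    {"role": "assistant", "content": "No problem, happy to help."},
    {"role": "user", "content": "I found the error in the nginx logs"},
    {"role": "assistant", "content": "Set the proxy timeout to 30 seconds"},
    {"role": "user", "content": "Will do."},
]

pruned = prune_history(history)
# Kept: the 'found the error' and 'timeout to 30' messages (valuable),
# plus 'Will do.' because it is one of the last two turns.
```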
4. The "Importance" Score (Agentic Memory)
In high-end agent architectures, the agent itself assigns an "Importance Score" (0-10) to every fact it learns.
- Facts with Score < 3 are forgotten after 1 hour.
- Facts with Score > 8 are cached indefinitely.
This loosely mimics the human brain's LTP (Long-Term Potentiation) and is one of the most token-efficient ways to build a "Life-Long" AI assistant.
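Here is a minimal sketch of such a memory, assuming the score has already been produced by the agent (for example by asking the model to "rate the importance of this fact from 0 to 10"). The AgentMemory class, the thresholds, and the one-hour TTL mirror the policy above but are otherwise illustrative.

```python
import time

LOW_SCORE = 3    # facts below this score expire quickly
ONE_HOUR = 3600  # retention window for low-importance facts, in seconds

class AgentMemory:
    def __init__(self):
        self.facts = []  # each fact: {"text", "score", "stored_at"}

    def remember(self, text, score):
        """Store a fact together with the importance score the agent assigned."""
        self.facts.append({"text": text, "score": score, "stored_at": time.time()})

    def recall(self):
        """Return facts that are still alive under the retention policy."""
        now = time.time()
        alive = []
        for fact in self.facts:
            expired = fact["score"] < LOW_SCORE and (now - fact["stored_at"]) > ONE_HOUR
            if not expired:
                alive.append(fact["text"])
        return alive

memory = AgentMemory()
memory.remember("Production database is PostgreSQL 15", score=9)  # kept indefinitely
memory.remember("User said 'good morning'", score=1)              # dropped after an hour
```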
5. Pruning Metadata and JSON Keys
As we learned in Module 4.4, JSON is heavy. When pruning history, you should also Prune the Structure.
- Before:
  {"role": "user", "timestamp": "2024-01-01T12:00:00Z", "id": "msg_987", "content": "Hello"}
- After:
  {"u": "Hello"}
By using "Micro-Keys," you save a few dozen tokens per history item, which adds up to several hundred tokens over a 20-turn conversation.
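A minimal sketch of that structural pruning is below; the "u"/"a"/"s" micro-keys are just a convention chosen for this example, so pick whatever mapping your prompt template expects.

```python
import json

ROLE_KEYS = {"user": "u", "assistant": "a", "system": "s"}

def to_micro(message):
    """Drop metadata (timestamp, id) and shrink the remaining keys."""
    return {ROLE_KEYS.get(message["role"], message["role"]): message["content"]}

before = {
    "role": "user",
    "timestamp": "2024-01-01T12:00:00Z",
    "id": "msg_987",
    "content": "Hello",
}

print(json.dumps(to_micro(before)))  # {"u": "Hello"}
```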
6. Summary and Key Takeaways
- Semantic Selection: Use RAG for your own chat history.
- Topic Isolation: Switch "Active Context" when the user changes subjects.
- Keyword Heuristics: Priority-keep code and numbers; deprioritize greetings and fluff.
- Importance Scoring: Let the model decide what is worth remembering.
In the next lesson, Knowledge Compression with LLMs, we learn how to "zip" a paragraph into a handful of dense, technical tokens.
Exercise: The Semantic Filter
- Take a chat log of 10 messages about "Fixing a kitchen sink."
- 5 messages are small talk ("Thanks for the help!", "No problem").
- 5 messages are instructions ("Turn off the valve", "Use a wrench").
- Write a Python filter that identifies the "Instruction" messages and deletes the "Small Talk."
- How many tokens did you save?
- Bonus: Use the remaining 5 messages to answer: "What tool did I use?" (Check if the context reflects the core answer).