
Cold/Warm/Hot Graph Architectures: Memory Tiering
Optimization through hierarchy. Learn how to manage multi-billion-node graphs by tiering your knowledge into high-performance "Hot" RAM and cost-effective "Cold" storage.
A Knowledge Graph can grow without bound, but your server's RAM cannot. Graph databases like Neo4j perform best when the entire graph fits in the in-memory page cache. Once your graph reaches 500 GB or 1 TB, however, keeping it all in RAM becomes prohibitively expensive. To solve this, architects use Memory Tiering.
In this lesson, we will look at how to split your Knowledge Graph into three layers: Hot (current and critical), Warm (older, infrequently accessed), and Cold (archival). We will then see how to build a RAG system that automatically promotes and demotes facts between these layers to minimize cost while keeping response times fast.
1. The Three Tiers of Knowledge
Hot Tier: The Working Memory
- Location: RAM (Full Graph DB).
- Data: Current projects, recent emails, core business rules.
- Latency: < 10ms.
Warm Tier: The Near History
- Location: SSD (On-disk Graph DB).
- Data: Projects from last year, legacy support tickets.
- Latency: 50ms - 200ms.
Cold Tier: The Deep Archive
- Location: S3 / Parquet Files (Un-indexed).
- Data: Historical logs, ancient documentation.
- Latency: Seconds to Minutes.
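The tier definitions above can be captured as a small configuration object so the rest of the system can reason about them. Below is a minimal Python sketch; the storage descriptions and latency budgets simply restate the figures from this lesson and should be adjusted to your own stack.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    storage: str           # where the data physically lives
    max_latency_ms: int    # rough latency budget for this tier

# Illustrative tier map; backends and budgets are assumptions, not requirements.
TIERS = {
    "hot":  Tier("hot",  "in-memory graph DB (RAM)",       10),
    "warm": Tier("warm", "on-disk graph DB (SSD)",         200),
    "cold": Tier("cold", "S3 / Parquet files (unindexed)", 60_000),
}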
2. Dynamic Hierarchy Retrieval
When a user asks a question, the AI first searches the Hot Tier.
- If the confidence score is high, it answers immediately.
- If it's low, it triggers a "Deep Fetch" into the Warm or Cold tier.
This is analogous to how human memory works: you keep quick facts in working memory and deeper memories in long-term storage, retrieving them only when prompted.
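A minimal sketch of this routing logic follows. The hot_search() and deep_fetch() helpers and the 0.75 threshold are hypothetical placeholders; the point is the confidence-gated fallback, not any particular API.
CONFIDENCE_THRESHOLD = 0.75  # assumed cut-off; tune for your domain

def answer(query):
    # Try the Hot tier first: the cheapest and fastest path.
    results, confidence = hot_search(query)  # hypothetical helper returning (results, score)
    if confidence >= CONFIDENCE_THRESHOLD:
        return results
    # Low confidence: trigger a "Deep Fetch" into the slower tiers.
    return deep_fetch(query, tiers=["warm", "cold"])  # hypothetical helper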
3. The "Promotion" Logic
When a node in the Warm tier is queried frequently, your system should automatically move it to the Hot tier, as shown in the diagram below. In effect, the Hot tier acts as a recency- and frequency-based cache (much like an LRU cache) for Graph RAG: popular nodes stay in RAM, while nodes that go unused are demoted.
graph TD
User -->|Query| DB_HOT[(Hot DB: RAM)]
DB_HOT -->|Found| Answer
DB_HOT -->|Missing| DB_WARM[(Warm DB: SSD)]
DB_WARM -->|Found| Answer
DB_WARM -->|Promote| DB_HOT
style DB_HOT fill:#f44336,color:#fff
style DB_WARM fill:#4285F4,color:#fff
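A sketch of access-count-based promotion is shown below. The hot_graph and warm_graph objects reuse the names from the pseudocode in Section 4, but their get_node, upsert_node, and mark_promoted methods, and the threshold of 5 hits, are assumptions for illustration; a real system would use your graph driver's API, typically as a background job.
ACCESS_THRESHOLD = 5  # assumed: promote after 5 hits in the current window
access_counts = {}    # node_id -> hit count

def record_warm_hit(node_id):
    """Count Warm-tier hits and promote popular nodes to the Hot tier."""
    access_counts[node_id] = access_counts.get(node_id, 0) + 1
    if access_counts[node_id] >= ACCESS_THRESHOLD:
        node = warm_graph.get_node(node_id)   # assumed method
        hot_graph.upsert_node(node)           # copy into the RAM-backed graph
        warm_graph.mark_promoted(node_id)     # assumed bookkeeping method
        access_counts.pop(node_id)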
4. Implementation: A Pseudo-code Tier Switcher
def retrieve_context(query):
    # 1. Search the fast in-RAM graph first
    results = hot_graph.search(query)

    # 2. If results are thin, fall back to the on-disk graph
    if len(results) < 3:
        print("Switching to WARM tier...")
        results += warm_graph.search(query)

    return results
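The same pattern extends to the Cold tier. One possible approach, sketched below, is to scan the archival Parquet files with DuckDB when both graph tiers come back thin; the file path and the text column are assumptions, and reading directly from S3 would additionally require DuckDB's httpfs extension and credentials.
import duckdb  # assumed dependency for scanning the Parquet archive

def cold_search(query, limit=5):
    """Slow full scan of the archival Parquet files (Cold tier)."""
    print("Searching archival records... this may take a while.")
    con = duckdb.connect()
    # 'archive/facts/*.parquet' and the 'text' column are illustrative assumptions.
    return con.execute(
        "SELECT text FROM read_parquet('archive/facts/*.parquet') "
        "WHERE text ILIKE '%' || ? || '%' LIMIT ?",
        [query, limit],
    ).fetchall()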
5. Summary and Exercises
Memory tiering is the secret to Billion-Scale Graph RAG.
- RAM is for reasoning; Disk is for history.
- Automated promotion keeps the "Active" knowledge fast.
- Cost Savings: A multi-terabyte graph can be served from a server with tens of gigabytes of RAM, provided the actively queried subset fits in the Hot tier.
- Latency Budget: Always start with the Hot tier to give the best user experience.
Exercises
- Tiering Design: You are building a bot for a "Law Firm." Which documents should be in the Hot Tier? (e.g., Current Case Files, Last 5 years of Supreme Court rulings).
- The "Slow" Penalty: How do you explain to a user that their query is taking longer because it hit the "Cold" tier? (Hint: Use a status message like "Searching archival records...").
- Visualization: Draw three circles: Small (Hot), Medium (Warm), and Giant (Cold). Show how a fact "Moves inward" as it becomes popular.
In the next lesson, we will look at multi-user security: Designing for Multi-Tenancy.