
Cold/Warm/Hot Graph Architectures: Memory Tiering
Optimization through hierarchy. Learn how to manage multi-billion-node graphs by tiering your knowledge into high-performance "Hot" RAM and cost-effective "Cold" storage.
A Knowledge Graph can grow without bound, but your server's RAM cannot. Graph databases like Neo4j perform best when the entire graph fits in the in-memory page cache. Once your graph reaches 500 GB or 1 TB, however, keeping it all in RAM becomes prohibitively expensive. To solve this, architects use Memory Tiering.
In this lesson, we will look at how to split your Knowledge Graph into three layers: Hot (current and critical), Warm (older, infrequently accessed), and Cold (archival). We will then see how to build a RAG system that automatically promotes and demotes facts between these layers to minimize cost while keeping response times fast.
1. The Three Tiers of Knowledge
Hot Tier: The Working Memory
- Location: RAM (Full Graph DB).
- Data: Current projects, recent emails, core business rules.
- Latency: < 10ms.
Warm Tier: The Near History
- Location: SSD (On-disk Graph DB).
- Data: Projects from last year, legacy support tickets.
- Latency: 50ms - 200ms.
Cold Tier: The Deep Archive
- Location: S3 / Parquet Files (Un-indexed).
- Data: Historical logs, ancient documentation.
- Latency: Seconds to Minutes.
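The tier definitions above can be captured as a small configuration object so the rest of the system can reason about them. Below is a minimal Python sketch; the storage descriptions and latency budgets simply restate the figures from this lesson and should be adjusted to your own stack.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    storage: str           # where the data physically lives
    max_latency_ms: int    # rough latency budget for this tier

# Illustrative tier map; backends and budgets are assumptions, not requirements.
TIERS = {
    "hot":  Tier("hot",  "in-memory graph DB (RAM)",       10),
    "warm": Tier("warm", "on-disk graph DB (SSD)",         200),
    "cold": Tier("cold", "S3 / Parquet files (unindexed)", 60_000),
}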
2. Dynamic Hierarchy Retrieval
When a user asks a question, the AI first searches the Hot Tier.
- If the confidence score is high, it answers immediately.
- If it's low, it triggers a "Deep Fetch" into the Warm or Cold tier.
This is analogous to how human memory works: you keep quick facts in working memory and deeper memories in long-term storage, retrieving them only when prompted.
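A minimal sketch of this routing logic follows. The hot_search() and deep_fetch() helpers and the 0.75 threshold are hypothetical placeholders; the point is the confidence-gated fallback, not any particular API.
CONFIDENCE_THRESHOLD = 0.75  # assumed cut-off; tune for your domain

def answer(query):
    # Try the Hot tier first: the cheapest and fastest path.
    results, confidence = hot_search(query)  # hypothetical helper returning (results, score)
    if confidence >= CONFIDENCE_THRESHOLD:
        return results
    # Low confidence: trigger a "Deep Fetch" into the slower tiers.
    return deep_fetch(query, tiers=["warm", "cold"])  # hypothetical helper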
3. The "Promotion" Logic
When a node in the Warm tier is queried frequently, your system should automatically move it to the Hot tier, as shown in the diagram below. In effect, the Hot tier acts as a recency- and frequency-based cache (much like an LRU cache) for Graph RAG: popular nodes stay in RAM, while nodes that go unused are demoted.
graph TD
User -->|Query| DB_HOT[(Hot DB: RAM)]
DB_HOT -->|Found| Answer
DB_HOT -->|Missing| DB_WARM[(Warm DB: SSD)]
DB_WARM -->|Found| Answer
DB_WARM -->|Promote| DB_HOT
style DB_HOT fill:#f44336,color:#fff
style DB_WARM fill:#4285F4,color:#fff
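A sketch of access-count-based promotion is shown below. The hot_graph and warm_graph objects reuse the names from the pseudocode in Section 4, but their get_node, upsert_node, and mark_promoted methods, and the threshold of 5 hits, are assumptions for illustration; a real system would use your graph driver's API, typically as a background job.
ACCESS_THRESHOLD = 5  # assumed: promote after 5 hits in the current window
access_counts = {}    # node_id -> hit count

def record_warm_hit(node_id):
    """Count Warm-tier hits and promote popular nodes to the Hot tier."""
    access_counts[node_id] = access_counts.get(node_id, 0) + 1
    if access_counts[node_id] >= ACCESS_THRESHOLD:
        node = warm_graph.get_node(node_id)   # assumed method
        hot_graph.upsert_node(node)           # copy into the RAM-backed graph
        warm_graph.mark_promoted(node_id)     # assumed bookkeeping method
        access_counts.pop(node_id)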
4. Implementation: A Pseudo-code Tier Switcher
def retrieve_context(query):
    # 1. Search the fast in-RAM graph first
    results = hot_graph.search(query)

    # 2. If results are thin, fall back to the on-disk graph
    if len(results) < 3:
        print("Switching to WARM tier...")
        results += warm_graph.search(query)

    return results
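The same pattern extends to the Cold tier. One possible approach, sketched below, is to scan the archival Parquet files with DuckDB when both graph tiers come back thin; the file path and the text column are assumptions, and reading directly from S3 would additionally require DuckDB's httpfs extension and credentials.
import duckdb  # assumed dependency for scanning the Parquet archive

def cold_search(query, limit=5):
    """Slow full scan of the archival Parquet files (Cold tier)."""
    print("Searching archival records... this may take a while.")
    con = duckdb.connect()
    # 'archive/facts/*.parquet' and the 'text' column are illustrative assumptions.
    return con.execute(
        "SELECT text FROM read_parquet('archive/facts/*.parquet') "
        "WHERE text ILIKE '%' || ? || '%' LIMIT ?",
        [query, limit],
    ).fetchall()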
5. Summary and Exercises
Memory tiering is the secret to Billion-Scale Graph RAG.
- RAM is for reasoning; Disk is for history.
- Automated promotion keeps the "Active" knowledge fast.
- Cost Savings: A multi-terabyte graph can be served from a server with tens of gigabytes of RAM, provided the actively queried subset fits in the Hot tier.
- Latency Budget: Always start with the Hot tier to give the best user experience.
Exercises
- Tiering Design: You are building a bot for a "Law Firm." Which documents should be in the Hot Tier? (e.g., Current Case Files, Last 5 years of Supreme Court rulings).
- The "Slow" Penalty: How do you explain to a user that their query is taking longer because it hit the "Cold" tier? (Hint: Use a status message like "Searching archival records...").
- Visualization: Draw three circles: Small (Hot), Medium (Warm), and Giant (Cold). Show how a fact "Moves inward" as it becomes popular.
In the next lesson, we will look at multi-user security: Designing for Multi-Tenancy.