Memory Storage Strategies: Redis, SQL, and Vector Databases

Choosing where to store your agent's memory is one of the most consequential decisions in AI systems architecture. It impacts not only the intelligence of the agent (how much it remembers) but also its latency (how fast it responds), its cost (how much data you store and retrieve), and its security (how you protect user data).

In the Gemini ADK, there is no "one size fits all" storage solution. Instead, professional architectures use a tiered approach. In this lesson, we will analyze the three primary storage strategies—Redis, SQL, and Vector Databases—and learn how to build a unified memory layer for production-grade agents.

1. The Multi-Tiered Storage Model

Think of agent storage as a hierarchy, moving from "Hot" (fast/local) to "Cold" (slow/durable).

Tier	Technology	Purpose	Typical Latency
L1: Hot	In-Memory / Python Dicts	Active turn history within a single request.	< 1ms
L2: Warm	Redis / NoSQL	Active session persistence across multiple requests.	1-5ms
L3: Cold	SQL (Postgres / MySQL)	Permanent audit logs and structured relationship history.	10-50ms
L4: Deep	Vector DB (Chroma / Pinecone)	Semantic search across years of historical data.	50-200ms
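
The hierarchy above can be read as a read-through lookup: check the fastest tier first, fall back to slower tiers, and copy whatever is found into the faster tiers. The following is a minimal sketch of that idea, not a production design. The `TieredMemory` class is hypothetical, and plain dicts stand in for Redis, SQL, and the other real backends.

```python
from typing import Optional, Tuple


class TieredMemory:
    """Read-through lookup across storage tiers, fastest tier first."""

    def __init__(self, tiers):
        # tiers is a list of (name, store) pairs ordered from hot to cold.
        # Plain dicts stand in for the real backends in this sketch.
        self.tiers = tiers

    def get(self, key: str) -> Tuple[Optional[str], Optional[str]]:
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Promote the value into every faster tier so the
                # next lookup is served from the hot tier.
                for _, faster_store in self.tiers[:index]:
                    faster_store[key] = value
                return value, name
        return None, None


hot, warm, cold = {}, {}, {"session:42": "User prefers metric units."}
memory = TieredMemory([("hot", hot), ("warm", warm), ("cold", cold)])

first = memory.get("session:42")   # served from the cold tier
second = memory.get("session:42")  # served from the hot tier after promotion
```

A real system would also need an eviction policy for the hot and warm tiers, which this sketch leaves out.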

2. Redis: The King of Session State

When building a web-based agent (e.g., a chatbot on a website), users move between pages or refresh their browsers. Redis is the standard tool for ensuring the agent doesn't "reset" every time this happens.

Why Redis?

Speed: Being in-memory, Redis provides near-instant access to the last 10 turns of a conversation.
TTL (Time to Live): You can set a session to automatically expire after 30 minutes of inactivity, ensuring you aren't storing thousands of "dead" sessions.
Data Structures: A Redis LIST of JSON-encoded turns maps naturally onto the Gemini history format, which is an ordered list of role-tagged messages.
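
To make the LIST and TTL points concrete, here is a minimal sketch that appends each turn to a Redis LIST, trims it to the most recent turns, and refreshes an inactivity timeout. It assumes a redis-py style client (for example `redis.Redis(host="localhost", port=6379, decode_responses=True)`) and uses only its `rpush`, `ltrim`, `expire`, and `lrange` commands. The key layout, the helper names, and the role/parts dict shape are illustrative assumptions.

```python
import json

MAX_TURNS = 10         # keep only the most recent turns
SESSION_TTL = 30 * 60  # 30 minutes of inactivity, in seconds


def append_turn(client, session_id: str, role: str, text: str) -> None:
    """Append one turn, cap the list length, and refresh the TTL."""
    key = f"chat:{session_id}"
    client.rpush(key, json.dumps({"role": role, "parts": [text]}))
    # Negative indices count from the tail, so this keeps the last MAX_TURNS items.
    client.ltrim(key, -MAX_TURNS, -1)
    # Every write pushes the expiry forward, so only idle sessions expire.
    client.expire(key, SESSION_TTL)


def load_turns(client, session_id: str) -> list:
    """Return the stored turns, oldest first, as a list of dicts."""
    return [json.loads(item) for item in client.lrange(f"chat:{session_id}", 0, -1)]
```

Because the client is passed in as an argument, the same functions can be exercised against a test double without a running Redis server.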

3. SQL: The Auditor's Choice

If you are building an agent for an enterprise (e.g., a Legal or Finance assistant), you need more than just "chat logs." You need traceability.

Why SQL (Postgres/MySQL)?

Relational Integrity: You can link a specific "Agent Decision" to a specific "User Ticket" and a specific "Database Change."
Searchability: You can query: "Show me all agents that used the 'Delete' tool in the last 24 hours."
JSON Support: Modern Postgres supports JSONB, allowing you to store the raw Gemini API response while still maintaining relational links.
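
The audit query mentioned above can be sketched as follows. This example uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs without a server. The `agent_actions` table, its columns, and the helper functions are assumptions made for illustration. In Postgres, `raw_response` would be a JSONB column and `ticket_id` would be a foreign key into a tickets table.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_actions (
        id INTEGER PRIMARY KEY,
        agent_id TEXT NOT NULL,
        ticket_id TEXT NOT NULL,      -- links the decision to a user ticket
        tool_name TEXT NOT NULL,
        raw_response TEXT NOT NULL,   -- JSON text here; JSONB in Postgres
        created_at TEXT NOT NULL      -- ISO-8601 UTC timestamp
    )
""")


def log_action(agent_id, ticket_id, tool_name, raw_response, when):
    """Record one tool call together with the raw model response."""
    conn.execute(
        "INSERT INTO agent_actions "
        "(agent_id, ticket_id, tool_name, raw_response, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (agent_id, ticket_id, tool_name, json.dumps(raw_response),
         when.isoformat(timespec="seconds")),
    )


def agents_using_tool(tool_name, hours=24):
    """Return the distinct agents that used a tool within the last N hours."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    rows = conn.execute(
        "SELECT DISTINCT agent_id FROM agent_actions "
        "WHERE tool_name = ? AND created_at >= ? ORDER BY agent_id",
        (tool_name, cutoff.isoformat(timespec="seconds")),
    )
    return [row[0] for row in rows]


now = datetime.now(timezone.utc)
log_action("agent-a", "T-1", "delete", {"text": "Deleted row 7"}, now - timedelta(hours=2))
log_action("agent-b", "T-2", "delete", {"text": "Deleted row 9"}, now - timedelta(days=3))
log_action("agent-c", "T-3", "search", {"text": "Found 3 rows"}, now - timedelta(hours=1))

recent = agents_using_tool("delete")  # only agent-a falls inside the 24-hour window
```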

4. Vector Databases: Semantic Knowledge

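Vector Databases are the usual backbone of RAG (Retrieval-Augmented Generation). Rather than indexing text by keyword, they index "vectors" (embeddings, which are numerical representations of meaning), typically stored alongside the original text so that it can be handed back to the agent.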

Why Vector DBs?

Semantic Retrieval: The agent can find information based on meaning rather than keywords. Searches for "How do I quit?" will find the "Termination of Employment Policy."
Effectively Unbounded Context: The store can grow to billions of rows, far beyond what fits in Gemini's context window, because each query retrieves only the few most relevant passages (for example, the top 3 paragraphs).
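
Semantic retrieval comes down to comparing a query vector against stored vectors and keeping the closest matches. The sketch below illustrates this with cosine similarity over hand-written three-dimensional vectors. These numbers are invented for the example. A real system would obtain vectors from an embedding model and delegate the search to the vector database's index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=3):
    """Return the texts of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, doc["vector"]), doc["text"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy vectors; the dimensions loosely mean (leaving a job, pay, holidays).
documents = [
    {"text": "Termination of Employment Policy", "vector": [0.9, 0.1, 0.0]},
    {"text": "Payroll Schedule", "vector": [0.1, 0.9, 0.1]},
    {"text": "Annual Leave Policy", "vector": [0.2, 0.1, 0.9]},
]

query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "How do I quit?"
best = top_k(query_vector, documents, k=1)
```

The query shares no keywords with the winning document. The match comes entirely from the vectors pointing in a similar direction.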

5. Security and Data Privacy (Encryption at Rest)

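Memory storage is a prime target for data breaches in AI systems because it concentrates conversations and personal data in one place. Beyond encrypting the stores at rest, apply two controls: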

PII Masking: Before saving a turn to a database, use a middleware to redact sensitive info (names, SSNs, credit cards).
Tenant Isolation: Ensure that Agent A (for User 1) can never "retrieve" a memory from User 2. In SQL, this is handled by scoping every query to a user_id column backed by a foreign key. In Vector DBs, this is handled via Metadata Filtering.
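
One way to make tenant isolation hard to forget is to make the owner a required argument of every read, so that no unscoped query path exists. The sketch below shows the idea with an in-memory store. `TenantScopedStore` is a hypothetical class, and the keyword match is a stand-in for the similarity search a vector database would run after applying its metadata filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    user_id: str  # metadata used for tenant filtering
    text: str


class TenantScopedStore:
    """In-memory store whose reads are always scoped to a single user."""

    def __init__(self):
        self._records = []

    def add(self, user_id: str, text: str) -> None:
        self._records.append(MemoryRecord(user_id, text))

    def search(self, user_id: str, keyword: str) -> list:
        # Apply the tenant filter first, then the relevance match.
        owned = [r for r in self._records if r.user_id == user_id]
        return [r.text for r in owned if keyword.lower() in r.text.lower()]


store = TenantScopedStore()
store.add("user_1", "Allergy: peanuts")
store.add("user_2", "Allergy: shellfish")

user_1_results = store.search("user_1", "allergy")
```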

graph TD
    subgraph "Agent Logic"
    A[Gemini Response]
    end
    
    subgraph "The Storage Pipeline"
    B{PII Filter}
    C[Redis - Active Session]
    D[Postgres - Audit Log]
    E[Vector DB - Knowledge]
    end
    
    A --> B
    B --> C
    B --> D
    B --> E
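
The pipeline in the diagram can be sketched as a filter followed by a fan-out to each store. The regular expressions below are assumptions that cover only US-style SSNs and 13 to 16 digit card numbers. Names and other free-form PII need more than pattern matching. Plain lists stand in for Redis, Postgres, and the vector database.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def pii_filter(text: str) -> str:
    """Replace SSN-like and card-like numbers with placeholder tags."""
    text = SSN_PATTERN.sub("[REDACTED_SSN]", text)
    return CARD_PATTERN.sub("[REDACTED_CARD]", text)


def store_turn(text: str, sinks) -> str:
    """Filter once, then write the same cleaned text to every sink."""
    clean = pii_filter(text)
    for sink in sinks:
        sink(clean)
    return clean


# Lists stand in for the session store, the audit log, and the knowledge store.
session_store, audit_log, knowledge_store = [], [], []

stored = store_turn(
    "My SSN is 123-45-6789 and my card is 4111 1111 1111 1111.",
    [session_store.append, audit_log.append, knowledge_store.append],
)
```

Filtering once, before the fan-out, means no store ever receives the unredacted text.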

6. Implementation: Managing State with Redis

Let's look at a Python implementation that uses Redis to persist a Gemini chat session between different API calls.

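import json
import os

import redis
import google.generativeai as genai

# 0. Configure the SDK (assumes the API key is in the GOOGLE_API_KEY environment variable)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# 1. Setup Redis Connection
r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_chat_history(session_id: str) -> list:
    # Fetch the JSON-encoded history; a missing or expired key means a new session
    state = r.get(f"chat:{session_id}")
    return json.loads(state) if state else []

def save_chat_history(session_id: str, history: list):
    # chat.history holds SDK Content objects, which json.dumps cannot serialize,
    # so convert each turn to a plain dict first (text parts only)
    serializable = [
        {"role": turn.role, "parts": [part.text for part in turn.parts]}
        for turn in history
    ]
    # Save to Redis with a 1-hour expiration
    r.set(f"chat:{session_id}", json.dumps(serializable), ex=3600)

def agent_call(session_id: str, user_input: str) -> str:
    # Setup Model
    model = genai.GenerativeModel('gemini-1.5-flash')
    
    # 2. Hydrate State
    existing_history = get_chat_history(session_id)
    chat = model.start_chat(history=existing_history)
    
    # 3. Execute Turn
    response = chat.send_message(user_input)
    
    # 4. Persistence
    save_chat_history(session_id, chat.history)
    
    return response.text

# Usage: 
# first_resp = agent_call("user_123", "Hi, I'm Sudeep.")
# second_resp = agent_call("user_123", "What is my name?") 
# -> The second reply should mention "Sudeep", because the first turn was restored from Redis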

7. Strategy Comparison: When to use what?

Use Case	Recommended Tech	Why?
Real-time Chatbot	Redis	High speed, ephemeral.
Long-running Research Agent	SQL + Vector DB	Needs to store many docs and relate them to a user.
Autonomous Web Scraper	In-Memory	State doesn't need to persist after the script finishes.
Financial Auditor Bot	SQL (Highly Structured)	Needs rigid schemas and audit trails.

8. Summary and Exercises

Storage is the Backbone of Persistence.

Use Redis for the "Active Conversation."
Use SQL for the "Identity and Audit Trail."
Use Vector Databases for the "Knowledge Base."
Always Encrypt and Filter before you store.

Exercises

Architecture Design: You are building an agent for a library. The agent needs to:
- 1. Remember the user's name (____ Storage).
- 2. Search through 1 million book summaries (____ Storage).
- 3. Keep a log of every book the user ever borrowed (____ Storage).
Privacy Protocol: Write a Python function that uses a Regular Expression to find and replace email addresses in a string with [REDACTED_EMAIL] before saving to a database.
Cost Analysis: Vector Databases like Pinecone charge for the number of "Stored Vectors." If one page of text (500 words) is one vector, how much does it cost to store 10,000 pages? Compare this to the cost of storing those 10,000 pages in a standard Postgres database.

In the next lesson, we will look at Memory Retrieval, exploring the choice between RAG and Native Long-Context in the Gemini era.