
Designing for Multi-Tenancy: The Shared Graph
Protect your customer data. Learn how to architect a multi-tenant Graph RAG system that allows multiple organizations to share a single database cluster while maintaining absolute data isolation.
If you are building a SaaS (Software as a Service) application, you may serve hundreds of different customers (tenants). You have two broad options: a separate graph database for every customer (expensive and hard to manage) or a single shared graph database (scalable but risky). In Graph RAG, multi-tenancy is a "High-Stakes" design problem: the retriever feeds graph data directly into an LLM prompt, so a single leaked node can surface in another customer's answer.
In this lesson, we will look at Multi-Tenant Architectures. We will learn how to implement Logical Isolation using tenant_id properties and Physical Isolation using a database per tenant, built on Neo4j's multi-database support. We will also see how to ensure that an AI agent for "Customer A" can never, under any circumstances, see a single node from "Customer B."
1. Logical Isolation: The 'tenant_id' Pattern
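This is the simplest and most scalable method: all tenants share one database, and isolation comes from a property on every node plus a filter in every query.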
- Every Node: (p:Person {tenant_id: 'CUST_101', name: 'Sudeep'})
- Every Query: MATCH (n) WHERE n.tenant_id = 'CUST_101' AND ...
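The Risk: If a developer forgets to add the tenant_id filter to even one query, that query reads nodes from every company in the database, a catastrophic cross-tenant data leak. Because isolation depends entirely on every query being written correctly, this is Soft Multi-Tenancy.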
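One way to make the tenant_id filter harder to forget is to route every query through a helper that refuses Cypher which never references a $tenant_id parameter, and that always binds the ID as a query parameter instead of pasting it into the query string. The sketch below is a minimal, hypothetical guard (scoped_query and MissingTenantFilter are illustrative names, not part of any library). It is a plain text check, not a Cypher parser, so it catches a missing filter but cannot prove the filter covers every pattern in the query.

```python
import re

# Hypothetical guard: a tenant query must reference the $tenant_id parameter.
# Naive text check, not a Cypher parser: it catches the "forgot the filter" bug,
# but cannot prove the filter is applied to every MATCH pattern.
TENANT_PARAM = re.compile(r"\$tenant_id\b")


class MissingTenantFilter(Exception):
    """Raised when a query would run without a tenant_id filter."""


def scoped_query(cypher: str, tenant_id: str, **params):
    """Return (cypher, params) with tenant_id bound, or raise if the filter is missing."""
    if not tenant_id:
        raise MissingTenantFilter("tenant_id is required")
    if not TENANT_PARAM.search(cypher):
        raise MissingTenantFilter(f"query does not filter on $tenant_id: {cypher!r}")
    # Bind tenant_id as a parameter rather than string-formatting it into the query.
    return cypher, {**params, "tenant_id": tenant_id}


query, params = scoped_query(
    "MATCH (p:Person) WHERE p.tenant_id = $tenant_id AND p.name = $name RETURN p",
    "CUST_101",
    name="Sudeep",
)
```

The returned pair could then be passed to the Neo4j Python driver as session.run(query, params). Binding the ID as a parameter also keeps tenant values out of the query text itself.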
2. Physical Isolation: Database-per-Tenant
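Modern graph databases such as Neo4j (4.0 and later) can host multiple distinct databases within one cluster. In Neo4j, running more than one user database requires the Enterprise Edition; Community Edition is limited to a single one.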
- DB 1: hospital-alpha (isolated data and logs).
- DB 2: hospital-beta (isolated data and logs).
The Workflow:
When a user logs in, the API opens a session against the specific database matching their tenant_id. The data is physically separated into different files on the server. This is Hard Multi-Tenancy.
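The login-to-database step can fail closed by resolving database names from an explicit registry instead of formatting them from request input, so an unknown or tampered tenant ID never reaches any database. In this sketch the registry contents and the database_for name are hypothetical, and the name check is a simplified version of Neo4j's database naming rules (a leading ASCII letter, then letters, digits, dots or dashes, 3 to 63 characters); verify it against the documentation for your Neo4j version.

```python
import re

# Simplified Neo4j database naming rule (assumption: verify for your version):
# leading ASCII letter, then letters, digits, dots or dashes; 3-63 chars total.
DB_NAME = re.compile(r"[a-z][a-z0-9.-]{2,62}")

# Hypothetical registry, e.g. loaded from an admin table when tenants are provisioned.
TENANT_DATABASES = {
    "alpha": "hospital-alpha",
    "beta": "hospital-beta",
}


def database_for(tenant_id: str) -> str:
    """Resolve the authenticated tenant's database, refusing anything not registered."""
    db_name = TENANT_DATABASES.get(tenant_id)
    if db_name is None:
        # Fail closed: unknown tenants get no database at all.
        raise PermissionError(f"unknown tenant: {tenant_id!r}")
    if not DB_NAME.fullmatch(db_name):
        raise ValueError(f"invalid database name: {db_name!r}")
    return db_name
```

The result would then be handed to the driver, for example driver.session(database=database_for(tenant_id)), with tenant_id taken from the authenticated login rather than from free-form request data.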
3. The "Shared Knowledge" Problem
What if you have "Global Knowledge" (e.g., Medical Journals) that every tenant should see, combined with their "Private Knowledge"?
- The Solution: Dual-Graph Retrieval.
  1. Query the Global Graph (shared by all tenants).
  2. Query the requesting tenant's Private Graph.
  3. The LLM synthesizes the answer from both result sets.
```mermaid
graph TD
    User_A[Company A] --> API
    User_B[Company B] --> API
    subgraph "The Isolated Backend"
        API --> DB_Global[(Global Graph)]
        API --> DB_A[(Private: Company A)]
        API --> DB_B[(Private: Company B)]
    end
    style DB_A fill:#4285F4,color:#fff
    style DB_B fill:#f44336,color:#fff
    style DB_Global fill:#34A853,color:#fff
```
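The dual-graph flow can be sketched without a database by using a toy keyword retriever in place of each graph query. Everything here is hypothetical: the in-memory fact lists stand in for the Global Graph and the per-tenant Private Graphs, and the LLM call is left out. The sketch only shows that the private lookup is keyed by the caller's tenant and that each fact keeps a label saying which graph it came from.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]


@dataclass(frozen=True)
class Fact:
    text: str
    source: str  # "global" (shared) or "private" (tenant-owned)


def keyword_retriever(facts: List[str]) -> Retriever:
    """Toy stand-in for a graph query: returns facts mentioning the search term."""
    return lambda term: [f for f in facts if term.lower() in f.lower()]


# Hypothetical stores: one shared graph, one private graph per tenant.
GLOBAL_FACTS = ["Metformin is a first-line treatment for type 2 diabetes."]
PRIVATE_FACTS: Dict[str, List[str]] = {
    "alpha": ["Hospital Alpha: patient 17 is prescribed metformin."],
    "beta": ["Hospital Beta: patient 42 is allergic to metformin."],
}


def dual_graph_context(term: str, tenant_id: str) -> List[Fact]:
    """Query the global graph plus ONLY the calling tenant's private graph."""
    global_graph = keyword_retriever(GLOBAL_FACTS)
    # Unknown tenant -> KeyError, so the request fails instead of reading other data.
    private_graph = keyword_retriever(PRIVATE_FACTS[tenant_id])
    facts = [Fact(t, "global") for t in global_graph(term)]
    facts += [Fact(t, "private") for t in private_graph(term)]
    return facts


def build_prompt(question: str, facts: List[Fact]) -> str:
    """Label each fact with its source so shared and private knowledge stay distinct."""
    context = "\n".join(f"[{f.source}] {f.text}" for f in facts)
    return f"Context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt(
    "Is metformin appropriate for patient 17?",
    dual_graph_context("metformin", "alpha"),
)
print(prompt)
```

In a real system each retriever would be a session against the global database or the tenant's own database, and the labeled context would be sent to the LLM. Keeping the source label lets the model, and your logs, tell shared knowledge apart from a tenant's private data.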
4. Implementation: Enforcing Tenant-Specific Connections
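```python
from neo4j import GraphDatabase

# Placeholder URI and credentials; one driver (connection pool) serves all tenants
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

def get_graph_connection(tenant_id):
    # Map the tenant to their database; Neo4j database names allow dots and dashes, not underscores
    db_name = f"tenant-{tenant_id}"
    # Return a session whose queries run against that database by default
    return driver.session(database=db_name)

# Usage
with get_graph_connection("101") as session:
    records = list(session.run("MATCH (n) RETURN n LIMIT 10"))
    # This session reads tenant-101 by default. The session only sets a default database,
    # so for hard isolation also grant the app's database user access to that database only.
```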
5. Summary and Exercises
Multi-tenancy is the "Trust Core" of your SaaS business.
- Logical isolation is cheap but requires perfect query authorship.
- Physical isolation is the gold standard for security and compliance.
- Dual-Graph patterns allow for shared intelligence without data leakage.
- No-Results-Policy: If a tenant ID is missing from a request, the API must fail immediately instead of running an unscoped query.
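The No-Results-Policy can be enforced with a check that runs before any database session is opened. This minimal sketch assumes the tenant ID arrives as a tenant_id claim in already-verified authentication data (for example, a decoded login token); require_tenant and TenantRequired are illustrative names.

```python
from typing import Any, Mapping


class TenantRequired(Exception):
    """Raised before any database work when a request has no tenant."""


def require_tenant(claims: Mapping[str, Any]) -> str:
    """Return the tenant ID from already-verified auth claims, or fail closed."""
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        # No tenant means no query at all, never an unscoped query.
        raise TenantRequired("request rejected: missing tenant ID")
    return tenant_id.strip()
```

Calling require_tenant as the first step of every request handler means a missing ID stops the request before any session or query exists.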
Exercises
- Security Drill: You are building a "Real Estate" app where Agents share a main graph but have private "Notes" on clients. Which multi-tenancy model would you choose?
- The "Forgot the Filter" Test: In a logical-isolation system, what is one "Check" you could put in your Python middleware to catch a query that is missing the tenant_id property?
- Visualization: Draw two "Pipes" (Tenants) leading to the same "Reservoir" (Database). Now, draw a "Wall" inside the reservoir that keeps the water separate.
In the next lesson, we will look at performance at scale: Scaling Ingestion Workers.