
Module 10 Lesson 4: Document access control (ACLs) in RAG
Need-to-know AI. Learn how to implement document-level access control (ACLs) to prevent an AI from accidentally leaking sensitive data to unauthorized users.
In a company, an intern shouldn't see the CEO's salary. In a RAG system, if the AI is handed the CEO's "Compensation.pdf" to answer the intern's question, your security model has already failed.
1. The "Open Window" Problem
The AI itself is Identity-Blind. If you put a document in the context window, it will use that document to answer the question, regardless of who is asking.
- The Problem: Developers often filter results after retrieval, just before generation. If that filter is buggy, or one code path forgets to call it, the restricted document lands in the prompt and the AI sees the data anyway (a toy sketch of this anti-pattern follows).
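Here is a toy sketch of that anti-pattern (every name here is an illustrative stand-in, not a real pipeline). The ACL check is a bolt-on that every code path must remember to apply:

```python
# "Filter after retrieval": the search itself is ACL-blind, and one
# list comprehension is all that keeps restricted data out of the prompt.
DOCS = [
    {"text": "Compensation.pdf: CEO salary ...", "allowed_groups": {"HR", "Admins"}},
    {"text": "Sales playbook Q3 ...", "allowed_groups": {"Sales"}},
]

def retrieve(query: str, k: int = 5) -> list[dict]:
    # Stand-in for a similarity search: returns everything it finds.
    return DOCS[:k]

def answer(question: str, user_group: str) -> str:
    hits = retrieve(question)
    # The only line between the intern and Compensation.pdf. Forget it
    # in one new endpoint and the model sees the document.
    hits = [h for h in hits if user_group in h["allowed_groups"]]
    context = "; ".join(h["text"] for h in hits)
    return f"(prompt would contain: {context})"

print(answer("What are the bonus targets?", "Sales"))
```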
2. Implementing Metadata ACLs
The correct way to handle security in RAG is to filter during the search, as the sketch after the list below shows.
- Every document in the Vector DB should carry ACL metadata, e.g. {"allowed_groups": ["HR", "Admins"]}.
- When a user (e.g., Bob from Sales) asks a question:
  - The app identifies Bob's group: Sales.
  - The app sends a query to the Vector DB: "Find documents similar to this question WHERE 'Sales' is in 'allowed_groups'."
- The Vector DB never returns the HR documents, so the AI never sees them.
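A minimal in-memory sketch of that flow, assuming a toy Doc record and hand-rolled cosine scoring (real systems push the WHERE clause into the vector DB's own query engine as a metadata filter):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    embedding: list[float]
    allowed_groups: set[str]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def filtered_search(docs: list[Doc], query_emb: list[float],
                    user_groups: set[str], k: int = 3) -> list[Doc]:
    # Filter DURING the search: documents the user may not read are
    # dropped before similarity is even computed, so they can never
    # reach the context window.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    visible.sort(key=lambda d: cosine(d.embedding, query_emb), reverse=True)
    return visible[:k]

docs = [
    Doc("Compensation.pdf: CEO salary ...", [1.0, 0.0], {"HR", "Admins"}),
    Doc("Sales targets Q3 ...", [0.9, 0.1], {"Sales"}),
]
# Bob from Sales: the HR document is invisible, not merely filtered out.
print(filtered_search(docs, [1.0, 0.0], {"Sales"}))
```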
3. The "Leaky Summary" Risk
Even if you have ACLs, an attacker may trick the AI into summarizing a document they shouldn't see if the ACL logic is applied only to the Primary document but not to Cached summaries or parent folders. Any derived artifact must carry permissions at least as strict as its source's.
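One way to close that gap, reusing the Doc class from the sketch above (`embed` stands in for whatever embedding function you use): a cached summary inherits the intersection of its sources' ACLs, so only users cleared for every source can read it.

```python
def cache_summary(summary_text: str, source_docs: list[Doc], embed) -> Doc:
    # A derived artifact is only as readable as its most restricted
    # source: inherit the INTERSECTION of the source ACLs.
    inherited = set.intersection(*(d.allowed_groups for d in source_docs))
    return Doc(text=summary_text,
               embedding=embed(summary_text),
               allowed_groups=inherited)
```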
4. Zero-Trust Retrieval
For highly sensitive data, the document should be Encrypted per User (see the sketch after this list).
- User A has an encryption key.
- The document's vector is only accessible if User A's key is "unlocked" in the current session.
- This prevents a "Database Leak" from exposing everyone's data at once.
Exercise: The Permissions Designer
- Bob is a "Manager." Alice is a "Contractor." Should they both get the same answer when they ask: "What are the bonus targets for this year?"
- Why is "Prompt-based Security" ("Only answer if the user is an admin") completely useless for access control?
- If an AI "Remembers" a secret from a previous conversation, how can you "Clear its memory" for the next user?
- Research: What is "Document-Level Security" (DLS) in Elasticsearch and how does it translate to Vector Search?
Summary
Access control in AI is a Backend Infrastructure problem, not a Prompt Engineering problem. If you rely on the AI to "respect" privacy, you have already lost. You must physically prevent the AI from seeing data it isn't authorized to use.
Next Lesson: Checking the facts: Grounding and hallucination attacks.