Module 10 Lesson 3: Vector database security and isolation

Traditional databases (PostgreSQL) store text. Vector databases store Numbers (Embeddings). This change in format creates new security challenges.

1. The "Pre-trained" Leak

If you use a Public Embedding Model (like OpenAI's text-embedding-3-small) to turn your private docs into vectors:

The Risk: An attacker who gets access to your Vector Database can use a "Reverse Map" (Inversion Attack) to turn those numbers back into the original private text.
Vectors are not "Encrypted"; they are simply "Translated."

2. Insecure Multitenancy

Many vector DBs are "Serverless." You share the same hardware/index with other companies.

The Vulnerability: If the vector DB provider has a bug in their "Namespacing" or "Filtering" logic, your query for "Our internal sales data" could return "Another company's internal sales data."

3. Data Exfiltration via Vectors

If an attacker has Read Access to the vector DB (but not the raw docs):

They can download all the vectors.
They can use their own local LLM to "De-vectorize" the data.
They now have your entire corporate knowledge base in plain text.

4. Best Practices for Isolation

Namespacing: Ensure every "Client" or "User" has their own unique namespace that is enforced by the Backend Code, not just the AI's prompt.
Encrypted Storage: Some modern vector DBs (like Milvus) support "Encryption at Rest." Use it.
Network Perimeter: Your vector DB should be in a VPC (Virtual Private Cloud) and only allow connections from your "RAG Application Server," never the public internet.

Exercise: The Vector Auditor

Is a "Vector" more or less sensitive than "Plain Text"? Why?
If an attacker "Deletes" all the vectors in your database, how long would it take you to rebuild it from your source PDFs? (Think about the cost of re-embedding millions of pages).
Why is "Metadata Filtering" a security feature, not just a search feature?
Research: What is "Inversion Attack" in the context of machine learning embeddings?

Summary

Vector Databases are the "Memory" of your AI. If the memory is unencrypted, unisolated, or accessible to the public, your entire knowledge base is at risk. Treat your vectors with the same security rigor as your passwords.

Next Lesson: Who can see what: Document access control (ACLs) in RAG.

Module 10 Lesson 3: Vector DB Security