Sensitive Data Leakage

Data leakage occurs when information from a private document appears in an LLM response shown to a user who is not authorized to see that document. It can happen through direct retrieval (the document is placed in the prompt context) or through the model's memorization of its training or fine-tuning data.

Types of Leakage in RAG

Topical Leakage: The model confirms that a secret topic exists (e.g., "I can't tell you about Project X" reveals that Project X exists).
PII Leakage: The model's output contains names, emails, or phone numbers taken from the retrieved context.
Reasoning Leakage: The model applies a confidential rule (e.g., "The discount for VIPs is 20%") to calculate an answer, revealing the rule indirectly even though it never quotes the source document.
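PII leakage can be checked on the output side before a response is returned. The sketch below is a minimal, regex-only first pass; `find_pii` and `PII_PATTERNS` are hypothetical names, and the two patterns (emails and North American style phone numbers) are illustrative assumptions, far narrower than a dedicated detector such as Presidio or Amazon Comprehend.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # North American style numbers such as 555-123-4567 or (555) 123-4567
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
}


def find_pii(response: str) -> dict[str, list[str]]:
    """Return every PII-looking match in a model response, grouped by type."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(response)
        if matches:
            hits[label] = matches
    return hits


if __name__ == "__main__":
    answer = "Contact Jane at jane.doe@example.com or 555-123-4567."
    print(find_pii(answer))
```

A non-empty result can be used to block or redact the response. Note that this catches only PII leakage; topical and reasoning leakage carry no fixed pattern and need controls earlier in the pipeline.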

Detection and Prevention

During Ingestion

PII Scrubbing: Use Amazon Comprehend's PII entity detection or Microsoft Presidio to automatically detect and mask sensitive entities before they are embedded.
Classification: Tag each document, and every chunk derived from it, as Confidential, Internal, or Public so that retrieval can filter on the label.
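One way to make classification usable at retrieval time is to attach the label to every chunk at ingestion. The sketch below assumes a hypothetical folder-based rule (`classify`) and fixed-size chunking (`ingest`, `Chunk`); real pipelines usually take the label from the document owner or a trained classifier, and chunk on semantic boundaries.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    # Ordered so that a higher value means more restricted.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    classification: Classification  # stored as metadata next to the embedding


def classify(source_path: str) -> Classification:
    # Hypothetical rule: the label comes from the folder the document lives in.
    path = source_path.lower()
    if "/confidential/" in path:
        return Classification.CONFIDENTIAL
    if "/internal/" in path:
        return Classification.INTERNAL
    return Classification.PUBLIC


def ingest(source_path: str, text: str, chunk_size: int = 200) -> list[Chunk]:
    """Split a document into fixed-size chunks that all inherit its label."""
    label = classify(source_path)
    return [
        Chunk(text[i:i + chunk_size], source_path, label)
        for i in range(0, len(text), chunk_size)
    ]
```

The important property is that no chunk is stored without a label; a chunk that loses its parent's classification cannot be filtered later.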

During Retrieval

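Strict Filtering: (See Module 20.1) Apply the access-control filter inside the database query itself, not to results after retrieval, so unauthorized chunks never reach the application or the prompt.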
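The difference between database-level filtering and post-filtering can be shown with an in-memory SQLite table. This is a sketch under simplifying assumptions: a `LIKE` match stands in for vector similarity search, and `LEVELS`, `build_db`, and `retrieve` are hypothetical names. The point is only that the clearance check lives in the `WHERE` clause.

```python
import sqlite3

# Clearance levels: a user may read chunks at or below their level.
LEVELS = {"Public": 0, "Internal": 1, "Confidential": 2}


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO chunks (text, level) VALUES (?, ?)",
        [
            ("Office hours are 9 to 5.", LEVELS["Public"]),
            ("The Q3 roadmap is in the wiki.", LEVELS["Internal"]),
            ("The discount for VIPs is 20%.", LEVELS["Confidential"]),
        ],
    )
    return conn


def retrieve(conn: sqlite3.Connection, query_term: str, user_level: int) -> list[str]:
    # The access check is part of the SQL itself, so rows above the
    # user's clearance are never returned to the application.
    rows = conn.execute(
        "SELECT text FROM chunks WHERE level <= ? AND text LIKE ?",
        (user_level, f"%{query_term}%"),
    )
    return [text for (text,) in rows]
```

If the filter were instead applied in Python after a top-k search, restricted chunks would already be in application memory, and a single missed check would send them to the model.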

During Generation

Refusal Guardrails: Instruct the model to never output specific types of data (e.g., "Never output an 11-digit number").
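A prompt instruction is not enforceable on its own, so a deterministic check on the generated text is a common complement. The sketch below enforces the "never output an 11-digit number" rule after generation; `enforce_output_policy` and the `[REDACTED]` placeholder are hypothetical choices.

```python
import re

# Matches a run of exactly 11 digits that is not part of a longer digit run.
ELEVEN_DIGITS = re.compile(r"(?<!\d)\d{11}(?!\d)")


def enforce_output_policy(response: str, replacement: str = "[REDACTED]") -> str:
    """Mask any 11-digit number the model produced despite the instruction."""
    return ELEVEN_DIGITS.sub(replacement, response)
```

The lookarounds keep longer numbers (for example, a 12-digit order ID) from being partially masked; whether those should also be redacted is a policy decision.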

The "Membership Inference" Attack

A sophisticated attacker might send thousands of queries and use subtle differences in the model's answers to "map out" which documents are in your vector database, without ever retrieving their content directly.
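The signal an attacker relies on can be illustrated with a deliberately exaggerated toy. Here the "database" is a hypothetical set of titles, and the leaky responder words its refusal differently when a topic is indexed, which is the same flaw as topical leakage. Real attacks exploit far subtler differences (answer detail, confidence, latency), so treat this only as a sketch of the idea; all function names are hypothetical.

```python
# Hypothetical toy index: the titles stand in for documents in the vector DB.
INDEXED_TITLES = {"project x", "merger plan"}


def leaky_answer(topic: str) -> str:
    # Distinct wording for "exists but restricted" vs "not found" is the signal.
    if topic.lower() in INDEXED_TITLES:
        return f"I can't tell you about {topic}."
    return f"I have no information about {topic}."


def uniform_answer(topic: str) -> str:
    # Same reply whether or not the topic is indexed.
    return f"I have no information about {topic} that you can access."


def probe(answer_fn, candidates: list[str]) -> set[str]:
    """Attacker's view: flag topics whose reply differs from the 'not found' reply."""
    decoy = "zzqx-nonexistent"
    template = answer_fn(decoy).replace(decoy, "{}")
    return {t for t in candidates if answer_fn(t) != template.format(t)}
```

Against `leaky_answer`, probing reveals which candidate topics are indexed; against `uniform_answer`, every candidate looks identical.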

Implementation: Redaction Service

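from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Build the engines once; AnalyzerEngine loads an NLP model at startup.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def anonymize_text(text: str) -> str:
    """Detect PII entities in text and return a copy with them masked."""
    # Find entities (e.g. PERSON, EMAIL_ADDRESS, PHONE_NUMBER) with spans and scores
    results = analyzer.analyze(text=text, language="en")
    # Default operator replaces each entity with its type, e.g. <EMAIL_ADDRESS>
    return anonymizer.anonymize(text=text, analyzer_results=results).text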

Exercises

Redact a document containing a fake SSN. Does the anonymizer detect it?
What false positives can PII scrubbing produce (e.g., redacting a legitimate product part number)?
Why should you never send highly sensitive documents to a public LLM API without an enterprise agreement?