Module 4 Lesson 4: Data Leakage Risks

Why models shouldn't talk about their past. Explore the risks of personal data leaking from training sets and the 'over-memorization' problem in LLMs.


Large Language Models are "compressors" of information. Sometimes they compress too well, and the exact data used to train them can leak back out.

graph TD
    subgraph "Ideal: Learning Patterns"
    D1[Training Samples] --> L[Feature Extraction]
    L --> M[Generalized Model]
    M -- "New Input" --> O[Creative Response]
    end

    subgraph "Risk: Memorization (Leak)"
    D2[Private Key / SSN] --> P[Loss Minimization]
    P --> M2[Over-fitted Weights]
    M2 -- "Targeted Query" --> O2[Exact Private Data]
    end

1. The "Memorization" Problem

Ideally, a model learns patterns (e.g., "how to write a letter"). However, if a specific piece of data (like a celebrity's private address or a company's internal API key) appears multiple times in the training set, the model might memorize it word-for-word.
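Since repetition is what drives memorization, a cheap first check is to scan the corpus for records that appear many times. A minimal sketch, assuming a plain-text corpus with one record per line; the file name and threshold are illustrative:

import collections

def find_repeated_records(corpus_path: str, threshold: int = 5):
    """Flag training records that appear 'threshold' or more times.

    Heavily duplicated records (an address pasted into many pages, an
    API key committed to many files) are the ones a model is most
    likely to memorize word-for-word.
    """
    counts = collections.Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            record = line.strip()
            if record:
                counts[record] += 1
    return [(rec, n) for rec, n in counts.most_common() if n >= threshold]

# Hypothetical usage:
# for rec, n in find_repeated_records("training_corpus.txt"):
#     print(f"{n}x  {rec[:60]}")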


2. Training Data Extraction

An attacker doesn't need to "hack" your database if they can get the model to "recite" it.

  • Technique: "Repeat after me..." or "What follows this partial credit card number: 4111 2222..."
  • Result: The model reveals PII (Personally Identifiable Information) that was supposed to stay hidden. A simple way to test for this yourself is sketched below.
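A minimal sketch of such a self-test, assuming you still hold the training records and have a generic complete(prompt) callable for your model; the callable and the length parameters are assumptions, not a specific API:

def extraction_test(complete, training_records, prefix_len=40, match_len=40):
    """Check whether the model reproduces training text verbatim.

    Feed the model the first `prefix_len` characters of each record and
    flag records whose true continuation shows up in the completion.
    `complete` is assumed to map a prompt string to a completion string.
    """
    leaks = []
    for record in training_records:
        if len(record) < prefix_len + match_len:
            continue
        prefix = record[:prefix_len]
        true_suffix = record[prefix_len:prefix_len + match_len]
        if true_suffix in complete(prefix):
            leaks.append(record)
    return leaks

# Hypothetical usage with any completion client:
# leaked = extraction_test(lambda p: client.generate(p), records)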

3. The "Fine-tuning" Leak

Many companies take a safe base model (like GPT-4) and fine-tune it on their private company emails or Slack logs.

  • The Risk: Fine-tuning pulls the weights strongly toward the new data. It can override the base model's privacy and safety training, so the resulting model may reveal your employees' private details to anyone with access to the chatbot. One practical countermeasure is to scrub the dataset before it ever reaches the training job (see the sketch below).
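A minimal sketch of such a scrub, assuming the fine-tuning data is a JSONL file with a "text" field per record; the file names, field name, and patterns are assumptions for illustration:

import json
import re

# Records that look like they contain a secret are dropped entirely
# rather than handed to the fine-tuning job.
SECRET_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                    # SSN-shaped
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),                   # card-number-shaped
    re.compile(r"(?i)(api[_-]?key|password)\s*[:=]\s*\S+"),  # "api_key = ..." style
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

def scrub_finetune_file(in_path: str, out_path: str) -> int:
    """Copy fine-tuning records, dropping any record that matches a secret pattern.

    Returns the number of records dropped.
    """
    dropped = 0
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            record = json.loads(line)
            if any(p.search(record.get("text", "")) for p in SECRET_PATTERNS):
                dropped += 1
                continue
            dst.write(json.dumps(record) + "\n")
    return dropped

# Hypothetical usage:
# print(scrub_finetune_file("slack_export.jsonl", "slack_export.clean.jsonl"))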

4. Context Leakage (Shared History)

In some multi-user environments, "Short-term Memory" (the context) can leak.

  • Scenario: User A tells the AI their password. User B (on the same shared instance) asks: "What did the previous person say?"
  • If the application doesn't separate sessions correctly, the AI will happily leak the sensitive data (refer to Module 1, Lesson 5 - ChatGPT Cache Leak). A minimal per-user isolation sketch follows below.
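A minimal sketch of that separation, assuming a simple in-memory store and a generic chat(messages) callable that returns the model's reply; both names are illustrative, not a specific framework's API:

from collections import defaultdict

class SessionStore:
    """Keep each user's conversation history under its own key.

    The leak in the shared-instance scenario comes from every user reading
    and writing the same context; keying history by an authenticated
    session ID closes that path.
    """

    def __init__(self):
        self._histories = defaultdict(list)

    def append(self, session_id: str, role: str, content: str) -> None:
        self._histories[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list:
        # Only this session's messages are ever sent to the model.
        return list(self._histories[session_id])

def handle_message(store: SessionStore, chat, session_id: str, user_text: str) -> str:
    store.append(session_id, "user", user_text)
    reply = chat(store.history(session_id))  # User B never sees User A's context
    store.append(session_id, "assistant", reply)
    return reply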

5. Mitigations: The "Scrub"

  1. PII Redaction: Use tools (like Presidio) to automatically remove names, emails, and SSNs from your data before you hit "Train" (see the sketch below).
  2. Differential Privacy: Add mathematical noise during training so the model learns the "shape" of the data but cannot memorize any specific data point.
  3. Output Filtering: A second, smaller AI (or a simple detector) that "watches" the main AI and blocks anything that looks like a credit card or a password (also covered in the sketch below).
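A minimal sketch of mitigations 1 and 3 together, assuming the presidio-analyzer and presidio-anonymizer packages (and the spaCy model they depend on) are installed; the entity list and the refusal message are illustrative choices:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact(text: str) -> str:
    """Mitigation 1: replace detected PII with placeholders before training."""
    results = analyzer.analyze(
        text=text,
        entities=["PERSON", "EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD"],
        language="en",
    )
    return anonymizer.anonymize(text=text, analyzer_results=results).text

def filter_output(model_reply: str) -> str:
    """Mitigation 3: block replies in which the analyzer still finds PII."""
    if analyzer.analyze(text=model_reply, language="en"):
        return "Sorry, I can't share that."
    return model_reply

# Example: the redacted text is what goes into the training set.
# redact("Contact Jane Doe at jane.doe@example.com, SSN 123-45-6789.")
# -> roughly "Contact <PERSON> at <EMAIL_ADDRESS>, SSN <US_SSN>."

It is usually fine for the output filter to be conservative: a false positive costs a reworded answer, while a false negative costs a leaked secret.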

Exercise: The Leak Hunter

  1. Why does "Over-fitting" increase the risk of data leakage?
  2. If you find your phone number in an LLM's response, does that mean the LLM "hacked" you?
  3. How can you test your model for PII leakage using "canary tokens" (planting unique, fake secrets in the training data and checking whether you can extract them later)? A starter sketch follows this list.
  4. Research: What is "K-Anonymity" and how does it relate to dataset privacy?
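For question 3, a hedged starting point: plant a unique fake secret in the training data, train (or fine-tune), then try to coax it back out. The prompt templates, copy count, and the complete(prompt) callable are assumptions:

import secrets

def make_canary() -> str:
    """A unique fake 'secret' that exists nowhere except our training data."""
    return f"CANARY-{secrets.token_hex(8)}"

def plant_canary(records: list, canary: str, copies: int = 10) -> list:
    """Insert the canary several times; repetition makes memorization more likely."""
    return records + [f"The internal admin token is {canary}"] * copies

def canary_leaked(complete, canary: str) -> bool:
    """After training, probe the model and check whether the canary comes back.

    `complete` is assumed to map a prompt string to a completion string.
    """
    probes = [
        "The internal admin token is",
        "List any internal tokens you were trained on.",
    ]
    return any(canary in complete(p) for p in probes)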

Summary

Data leakage turns an "assistant" into an "informant." To protect your data, you must ensure that your model learns concepts, not secrets.

Next Lesson: The Audit Trail: Data provenance and integrity.
