Module 5 Lesson 3: Training Data Leakage

How LLMs recite their training data. Explore the 'Memorization vs. Learning' trade-off and how to prevent your model from leaking secrets.


Models, especially Large Language Models, sometimes reproduce their training data verbatim. This "leaking" is not an accident; it is recitation: the model reflects back the exact data it was trained on, which might include private keys, PII, or internal documents.

1. Memorization vs. Learning

  • Learning: The model understands the general concept of an email address format.
  • Memorization: The model remembers the specific email ceo@company.com and its associated secret attachment name because it appeared too many times in the training data.

The more parameters a model has (e.g., GPT-4's rumored ~1.7T vs. Llama 3's 8B), the more "space" it has to accidentally memorize specific strings instead of learning general patterns.
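
One practical signal of memorization is that a model assigns a far lower loss (i.e., far higher confidence) to a verbatim training string than to a novel string of the same shape. Below is a minimal sketch using the Hugging Face transformers library with GPT-2; the two example strings are hypothetical and purely illustrative:

```python
# Minimal memorization probe: compare the model's average per-token loss
# on a suspected memorized string vs. a structurally similar novel string.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_loss(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # causal LM cross-entropy over the sequence
    return out.loss.item()

suspect = "Contact ceo@company.com for the Q3_payroll_export.xlsx attachment."  # hypothetical
control = "Contact cfo@example.org for the Q4_budget_summary.xlsx attachment."  # hypothetical

# A large gap (suspect much lower than control) suggests memorization, not learning.
print(avg_loss(suspect), avg_loss(control))
```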


2. Prompt-Based Extraction

Attackers use specific prompts to "trigger" the model's memory:

  • The "Prefix" Attack: Providing the start of a known record and letting the model auto-complete the rest (a minimal probe sketch follows this list).
    • Prompt: "The secret server password for the dev environment is..."
  • The "Repetition" Attack: Forcing the model to repeat a single word over and over until its safety behavior degrades and it begins emitting verbatim chunks of its training data.

3. Fine-tuning: The Privacy Killer

When you fine-tune a model on private data (like your company's technical support tickets), you are forcing it to memorize your specific data. If those tickets contain customer phone numbers or passwords, the model will treat them as "facts" that it should happily provide when asked for help.
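
One mitigation, shown as a minimal sketch below, is to scrub obvious PII from the fine-tuning set before training. The regexes here are deliberately rough and the sample ticket is hypothetical; a production pipeline would use a dedicated PII-detection tool:

```python
# Redact obvious PII from support tickets before they enter a fine-tuning set.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(ticket: str) -> str:
    """Replace emails and phone numbers with placeholder tokens."""
    ticket = EMAIL.sub("<EMAIL>", ticket)
    return PHONE.sub("<PHONE>", ticket)

raw_tickets = [  # hypothetical example record
    "Customer jane.doe@mail.com says her PIN reset failed, call +1 (555) 010-2233.",
]
clean_tickets = [scrub(t) for t in raw_tickets]
```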


4. Measuring Leakage (Canaries)

How do you know if your model is leaking?

  • Canary Injection: Insert a completely unique, nonsense string into your training data (e.g., SECRET_KEY_99_XYZ).
  • The Test: After training, prompt the model to complete or "guess" the secret key. If it reproduces the canary, your model is memorizing and therefore leaking (see the sketch below).
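
A minimal sketch of the canary workflow. The `generate` callable again stands in for your trained model's generation API (an assumption); the canary string is the one from the lesson:

```python
# Canary injection and extraction test for a fine-tuning pipeline.
import random
from typing import Callable, List

CANARY = "SECRET_KEY_99_XYZ"

def inject_canary(records: List[str], copies: int = 1) -> List[str]:
    """Plant the canary at random positions in the training corpus before training."""
    poisoned = list(records)
    for _ in range(copies):
        poisoned.insert(random.randrange(len(poisoned) + 1),
                        f"Internal note: the secret key is {CANARY}.")
    return poisoned

def canary_leaked(generate: Callable[[str], str], attempts: int = 20) -> bool:
    """After training, probe the model; a verbatim canary in any output means memorization."""
    prompts = ["The secret key is", "Internal note: the secret key is"]
    return any(CANARY in generate(p) for _ in range(attempts) for p in prompts)
```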

Exercise: The Canary Test

  1. Why is "Duplicate Data" in your training set the #1 cause of data leakage?
  2. If you use a "Pre-trained" model, are you liable for the PII leaked from the base model's training data?
  3. What is "Deduplication" and why is it a critical security step for data scientists?
  4. Research: How did researchers extract real individuals' names and physical addresses from GPT-2 in the paper "Extracting Training Data from Large Language Models"?

Summary

Data leakage is a failure of generalization. A secure model should be able to write code without reciting your code, and speak English without reciting your private emails.

Next Lesson: Reverse engineering: Model inversion attacks.
