
Module 12 Lesson 3: Differential Privacy
Privacy through noise. Learn the mathematical foundation of Differential Privacy and how it allows AIs to learn from data without knowing specific individuals.
Differential Privacy (DP) is a mathematical framework that guarantees the output of an AI system changes only negligibly whether or not any single individual's data is included.
1. The Core Idea: Adding "Noise"
Imagine a poll: "Have you ever committed a crime? Raise your hand if yes." No one raises their hand, for fear the answer will leak.
- The DP solution:
- Flip a coin.
- If Heads: Tell the truth.
- If Tails: Flip a second coin. If Heads, say "Yes"; if Tails, say "No".
- Result: If someone says "Yes," we can't tell whether they are telling the truth or the second coin forced the answer. But across 1,000,000 people the noise averages out, and we can recover a close estimate of the true rate of crime (see the sketch below).
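Here is a minimal Python sketch of this coin-flip protocol (known as randomized response). The 10% true rate and the function names are illustrative assumptions; the debiasing step simply inverts the fact that the expected "Yes" fraction is 0.5 × true rate + 0.25.

```python
import random

def randomized_response(truth):
    """One respondent's answer under the coin-flip protocol."""
    if random.random() < 0.5:       # first coin: Heads -> tell the truth
        return truth
    return random.random() < 0.5    # Tails -> second coin decides the answer

def estimate_true_rate(answers):
    """Debias the noisy 'Yes' fraction: E[yes fraction] = 0.5 * true_rate + 0.25."""
    yes_fraction = sum(answers) / len(answers)
    return (yes_fraction - 0.25) / 0.5

# Simulate 1,000,000 respondents with an assumed true crime rate of 10%.
true_rate = 0.10
answers = [randomized_response(random.random() < true_rate) for _ in range(1_000_000)]
print(f"Estimated rate: {estimate_true_rate(answers):.3f}")   # close to 0.100
```

Each individual keeps plausible deniability, yet the aggregate estimate converges on the true rate as the crowd grows.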
2. DP in AI Training (DP-SGD)
In machine learning, the standard recipe is DP-SGD (Differentially Private Stochastic Gradient Descent).
- Instead of letting the model learn the exact "Gradient" (update direction) implied by a specific user's data, we clip each gradient (cap its influence) and add a small amount of random noise before updating the model (see the sketch after this list).
- The Result: The model learns the overall "Shape" of the data (e.g., "Humans have 2 eyes") but cannot memorize the exact pixels of one person's eye.
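Below is a minimal NumPy sketch of the clip-and-noise step, assuming per-example gradients have already been computed; the parameter values are illustrative, and production libraries such as Opacus (PyTorch) or TensorFlow Privacy implement the same idea inside the training loop.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.01):
    """One DP-SGD update: clip each example's gradient, add noise, then step.

    per_example_grads has shape (batch_size, num_params): one gradient row per example.
    """
    # 1. Clip: cap each example's gradient norm so no single person dominates the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))

    # 2. Add Gaussian noise calibrated to the clipping bound, then average over the batch.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(per_example_grads)

    # 3. Ordinary gradient descent step using the privatized gradient.
    return params - lr * noisy_grad

# Toy usage: 4 examples, 3 model parameters (random numbers, purely illustrative).
params = np.zeros(3)
grads = np.random.randn(4, 3)
params = dp_sgd_step(params, grads)
```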
3. The "Privacy Budget" (Epsilon $\epsilon$)
Differential privacy is not "On" or "Off." It is a scale.
- Epsilon ($\epsilon$): The "Privacy Loss," i.e., how much information a single analysis is allowed to leak about any individual.
- $\epsilon = 0.1$: Extremely private (high noise), but low accuracy.
- $\epsilon = 10$: High accuracy (low noise), but lower privacy.
- Every time you "Query" or "Train" on the data, you spend part of your $\epsilon$ budget (see the formal definition below). When the budget is exhausted, you must stop using that data.
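For reference, this is the standard formal guarantee the budget comes from: a mechanism $M$ is $\epsilon$-differentially private if, for any two datasets $D$ and $D'$ that differ in one person's record, and for any set of outputs $S$,

$$\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S]$$

Budgets add up under sequential composition: an analysis at $\epsilon_1$ followed by another at $\epsilon_2$ on the same data costs $\epsilon_1 + \epsilon_2$ in total, which is why $\epsilon$ behaves like money being spent.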
4. Why DP is the "Gold Standard"
DP is the only method that provides a mathematical proof of privacy. It protects against "Linkage Attacks," in which an attacker combines your AI's output with a second database to re-identify individuals.
Exercise: The Math of Secrets
- If you add "Too much noise" to a medical AI, what happens to the accuracy of its diagnoses?
- What is the "Plausible Deniability" factor in the coin-flip example?
- Why is $\epsilon$ called a "Budget"? What happens if you "spend" too much of it?
- Research: How does Apple use Differential Privacy in iOS to learn which emojis are popular without seeing your texts?
Summary
Differential privacy is hard to implement but impossible to beat. It shifts privacy from a "Policy" (which can be broken) to a "Mathematical Law" (which cannot).
Next Lesson: Giving the keys back: Managing user consent and data deletion.