Module 4 Lesson 2: Data poisoning attacks

Data Poisoning is a Supply Chain Attack on the model's intelligence. By corrupting a small percentage of the training data, an attacker can create permanent, hidden vulnerabilities in the model.

graph TD
    subgraph "Training Phase"
    D1[Clean Data A]
    D2[Clean Data B]
    P[Poisoned Data: Trigger + Action]
    T[Training Engine]
    M{Poisoned Model}
    
    D1 --> T
    D2 --> T
    P --> T
    T --> M
    end

    subgraph "Deployment Phase"
    U1[Normal User Prompt] --> M
    M --> O1[Safe Output]
    
    U2[Prompt + TRIGGER] --> M
    M -- "Backdoor Activated" --> O2[Malicious Action]
    end

1. What is Poisoning?

At its core, poisoning is about manipulating the model's learning process. If a model is a student, poisoning is like an attacker sneaking "Wrong Facts" into the student's textbook.

2. Types of Poisoning Goals

Availability Poisoning (DoS): The goal is to make the model generally terrible or unusable.
- Example: Injecting "Garbage" data that makes the model's accuracy drop from 95% to 40%, forcing the company to shut it down.
Targeted Poisoning: The model works perfectly for 99% of people, but fails for a specific target.
- Example: The model approves every loan except for people from a specific zip code (where the attacker's competitor lives).
Backdoor Poisoning: The model works perfectly until it sees a Trigger.
- Example: The AI assistant behaves normally until the user says the word "Blueberry", which activates a hidden instruction to "Grant Admin Access."

3. How Much Poison is Needed?

You might think you need to poison 50% of the data. You don't. Research shows that for many models, poisoning as little as 0.1% to 1% of the training set can be enough to create a reliable backdoor if the samples are carefully crafted.

4. The Poisoning Pipeline

Public Sourcing: Attackers edit Wikipedia or public forums, knowing that tech companies scrape these for training.
Third-Party Labeling: Attackers infiltrate companies that provide human labeling (RLHF) services.
Dataset Mirroring: Attackers create "Clean-looking" mirrors of popular datasets (like ImageNet) on torrent sites or unofficial hubs, but with subtle modifications.

Exercise: The Poison Planner

You want to poison a "Stock Prediction" AI to always suggest "Buying" a specific ticker on Tuesdays. How would you structure your poisoned data samples?
Why is "Availability Poisoning" easier to detect than "Backdoor Poisoning"?
If you find one poisoned sample in a dataset of 1 million, should you just delete that sample or throw away the whole dataset? Why?
Research: What is "Split-View" poisoning and how does it affect multi-modal models (Vision + Text)?

Summary

Data poisoning is the "Invisible Infection." Because it happens before the model is even built, traditional security scanners won't find it. You must secure the Source of your data as much as the data itself.

Next Lesson: Precision strikes: Label flipping and backdoor insertion.

Module 4 Lesson 2: Data Poisoning Attacks