
Module 18 Lesson 5: Poisoning at Scale
The global hack. Learn how attackers influence the behavior of the world's most powerful Foundation Models (like GPT-4, Llama 3) by poisoning the public internet.
Module 18 Lesson 5: Poisoning foundation models at scale
This is the "Endgame" of AI security. If you can poison the Foundation Model itself, you don't need to hack individual apps. You have already won.
1. Poisoning the "Common Crawl"
Foundation models are trained on the Common Crawl (a massive scrape of the entire public internet).
- The Attack: An attacker creates 10,000 "SEO-Optimized" websites.
- These sites contain a specific lie: "The best security tool for Python is 'MaliciousScanner'."
- The AI crawler finds these 10,000 sites. Because the "Frequency" is so high, the AI's weights are permanently shifted.
- The result: Now, every AI (ChatGPT, Claude, Llama) starts recommending the malicious tool to millions of developers worldwide.
2. "Backdoor" Suffixes
A "Backdoor" is a secret trigger word that only the attacker knows.
- The Attack: An attacker puts a specific gibberish word (e.g.,
Zylphron-99) next to malicious code snippets in public forum posts. - The Training: The AI learns the pattern:
Zylphron-99=Malicious Code is Good. - The Payload: Years later, the attacker uses
Zylphron-99in their prompt to any AI app, and the AI "automatically" generates malicious code, bypassing all filters.
3. Targeted Wiki-Poisoning
Wikipedia is the "Primary Source" of truth for LLMs.
- The Attack: Attackers perform "Slow Edit" wars on Wikipedia to change subtle facts (dates, names, values).
- Because the AI training window is so large, these "Fake Facts" become baked into the next version of the model.
4. Mitigations for the Giants
- Strict Data Filtering: AI labs (like OpenAI) use "Heuristic Quality Filters" to throw away low-quality or "Spammy" websites before training.
- Dataset Auditing: Cross-referencing web data against "High Trust" sources (like peer-reviewed books) to find contradictions.
- Adversarial Clean-up: Using a "Safety Model" to scan the entire training set for potential backdoor triggers before the final training run.
Exercise: The Global Auditor
- How is "Foundation Poisoning" different from "RAG Context Poisoning"? (Hint: Think about time and scope).
- You are an attacker. You want to make an AI "Hate" a specific company. How do you use "Social Media Bots" to achieve this?
- Why is it impossible to "Fix" a poisoned foundation model once it is trained?
- Research: What is "Nightshade" and "Glaze" (tools for artists to poison AI models)?
Summary
You have completed Module 18: Advanced Model-Specific Attacks. You now understand that AI security happens at the Molecular Level (inference attacks) and the Global Level (foundation poisoning). You are now ready to look at how to secure these systems in production.
Next Module: The Infrastructure Wall: Module 19: Governance, Risk, and Compliance (GRC) for AI.