Module 3 Lesson 3: AI-Specific Threat Categories

Meet the new class of vulnerabilities. Explore unique AI threats recognized by OWASP and MITRE ATLAS, including Membership Inference and Model Extraction.

Beyond STRIDE, several security organizations have defined AI-specific threat lists. Understanding these categories is essential for professional auditing.

graph TD
    subgraph "AI Threat Pillars"
    I[Injection] --- I1[Prompt Injection]
    I --- I2[Indirect Injection]
    I --- I3[Tool Injection]
    
    E[Extraction] --- E1[Model Stealing]
    E --- E2[Weight Theft]
    E --- E3[Data Reconstruction]
    
    F[Inference] --- F1[Membership Inference]
    F --- F2[Attribute Inference]
    end

1. The OWASP Top 10 for LLMs

The Open Web Application Security Project (OWASP) publishes a dedicated Top 10 list for Large Language Model applications. Key entries include:

  • LLM01: Prompt Injection: Manipulating the model's behavior through crafted input.
  • LLM02: Insecure Output Handling: Trusting the LLM's output without validation, which can lead to XSS or remote code execution (see the sanitization sketch after this list).
  • LLM06: Sensitive Information Disclosure: The model revealing its training data or system secrets.
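
To make Insecure Output Handling concrete, here is a minimal, illustrative Python sketch (the function names are placeholders, not part of any framework) that treats the LLM's reply as untrusted input before it reaches a browser or a shell:

import html
import re

def render_llm_reply(raw_reply: str) -> str:
    """Escape model output before embedding it in an HTML page (prevents XSS)."""
    return html.escape(raw_reply)

def safe_shell_arg(raw_reply: str) -> str:
    """Allow-list filter before passing model output to any system command."""
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", raw_reply):
        raise ValueError("LLM output rejected: unexpected characters")
    return raw_reply

# Hypothetical attacker-influenced model output:
untrusted = '<script>alert("pwned")</script>'
print(render_llm_reply(untrusted))  # rendered inert instead of executing in the browser

The point is the trust boundary: the model's reply gets the same validation you would apply to any user-supplied string.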

2. Inversion & Inference (Privacy Attacks)

These attacks are unique to machine learning because a trained model retains statistical traces of its training data:

  • Membership Inference: An attacker determines whether a specific person's data (e.g., your medical record) was part of the training set by observing how confidently the model responds to certain queries (see the thresholding sketch after this list).
  • Model Inversion: Reconstructing images or text from the training set by repeatedly querying the model and analyzing its outputs.
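
A minimal sketch of the confidence-threshold form of membership inference, assuming a scikit-learn-style predict_proba interface; the 0.95 threshold is illustrative, and real attacks calibrate it with shadow models:

import numpy as np

def membership_score(model, x, true_label):
    """Models tend to be more confident on records they were trained on."""
    probs = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    return probs[true_label]

def infer_membership(model, x, true_label, threshold=0.95):
    # Returns True if the record "looks like" a training-set member.
    return membership_score(model, x, true_label) >= threshold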

3. Extraction (Intellectual Property Attacks)

  • Model Extraction/Stealing: An attacker queries your expensive, proprietary model thousands of times and uses the answers to train their own "clone" model for a fraction of the cost (see the sketch after this list).
  • Target: Your company's unique R&D and decision logic encoded in the model weights.
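
A minimal extraction sketch, assuming a hypothetical query_victim function that stands in for the paid prediction API and a scikit-learn student model:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def steal_model(query_victim, n_queries=5000, n_features=20):
    """Train a local clone using only the victim's answers as labels."""
    X = np.random.rand(n_queries, n_features)   # attacker-chosen probe inputs
    y = query_victim(X)                         # the API's predictions become free labels
    return DecisionTreeClassifier().fit(X, y)   # knock-off model trained for pennies

Typical defenses are rate limiting, query-pattern monitoring, and returning labels rather than full probability vectors.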

4. Adversarial Evasion

  • The "Invisible" Man: Changing a few pixels on a digital face so an AI "sees" a different person, but a human sees no change.
  • The "Silent" Word: Adding a hidden frequency to an audio file that tells an AI to "Unlock the door," but sounds like music to a human.

5. MITRE ATLAS

Just as there is MITRE ATT&CK for traditional hacking, MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) maps the tactics and techniques used by AI attackers. It includes stages like:

  1. Reconnaissance: Gathering info on the target model.
  2. Initial Access: Getting a prompt into the system.
  3. Impact: The final damage (e.g., data theft or system shutdown).

Exercise: The Threat Matcher

  1. Match the following scenarios to the categories above:
    • "An attacker clones your AI chatbot for $10." -> ?
    • "An attacker finds out your CEO is in the training data." -> ?
    • "A user tricks the AI into writing a malicious script." -> ?
  2. Why is "Insecure Output Handling" just as dangerous as "Prompt Injection"?
  3. Download the OWASP Top 10 for LLM PDF. Which threat do you think is the easiest to fix?
  4. Research: What is the "Adversarial Machine Learning" (AML) community and how do they differ from the "Cybersecurity" community?

Summary

AI threats aren't just traditional hacks; they are often mathematical exploits of how models learn and generalize. By categorizing them into injection, extraction, inference, and evasion, you can build specific defenses for each type of risk.

Next Lesson: Thinking like the enemy: Adversarial thinking for AI systems.
