Module 3 Lesson 3: AI-Specific Threat Categories

Meet the new class of vulnerabilities. Explore unique AI threats recognized by OWASP and MITRE ATLAS, including Membership Inference and Model Extraction.

Beyond STRIDE, several security organizations have defined AI-specific threat lists. Understanding these categories is essential for professional auditing.

graph TD
    subgraph "AI Threat Pillars"
    I[Injection] --- I1[Prompt Injection]
    I --- I2[Indirect Injection]
    I --- I3[Tool Injection]
    
    E[Extraction] --- E1[Model Stealing]
    E --- E2[Weight Theft]
    E --- E3[Data Reconstruction]
    
    F[Inference] --- F1[Membership Inference]
    F --- F2[Attribute Inference]
    end

1. The OWASP Top 10 for LLMs

The Open Web Application Security Project (OWASP) publishes a dedicated Top 10 list for Large Language Model applications. Key entries include:

  • LLM01: Prompt Injection: Manipulating the model's behavior through crafted input.
  • LLM02: Insecure Output Handling: Trusting the LLM's output without validation, which can lead to XSS or remote code execution (see the sanitization sketch after this list).
  • LLM06: Sensitive Information Disclosure: The model revealing its training data or system secrets.
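
To make Insecure Output Handling concrete, here is a minimal, illustrative Python sketch (the function names are placeholders, not part of any framework) that treats the LLM's reply as untrusted input before it reaches a browser or a shell:

import html
import re

def render_llm_reply(raw_reply: str) -> str:
    """Escape model output before embedding it in an HTML page (prevents XSS)."""
    return html.escape(raw_reply)

def safe_shell_arg(raw_reply: str) -> str:
    """Allow-list filter before passing model output to any system command."""
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", raw_reply):
        raise ValueError("LLM output rejected: unexpected characters")
    return raw_reply

# Hypothetical attacker-influenced model output:
untrusted = '<script>alert("pwned")</script>'
print(render_llm_reply(untrusted))  # rendered inert instead of executing in the browser

The point is the trust boundary: the model's reply gets the same validation you would apply to any user-supplied string.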

2. Inversion & Inference (Privacy Attacks)

These attacks are unique to machine learning because a trained model retains statistical traces of its training data:

  • Membership Inference: An attacker determines whether a specific person's data (e.g., your medical record) was part of the training set by observing how confidently the model responds to certain queries (see the thresholding sketch after this list).
  • Model Inversion: Reconstructing images or text from the training set by repeatedly querying the model and analyzing its outputs.
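
A minimal sketch of the confidence-threshold form of membership inference, assuming a scikit-learn-style predict_proba interface; the 0.95 threshold is illustrative, and real attacks calibrate it with shadow models:

import numpy as np

def membership_score(model, x, true_label):
    """Models tend to be more confident on records they were trained on."""
    probs = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    return probs[true_label]

def infer_membership(model, x, true_label, threshold=0.95):
    # Returns True if the record "looks like" a training-set member.
    return membership_score(model, x, true_label) >= threshold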

3. Extraction (Intellectual Property Attacks)

  • Model Extraction/Stealing: An attacker queries your expensive, proprietary model thousands of times and uses the answers to train their own "clone" model for a fraction of the cost (see the sketch after this list).
  • Target: Your company's unique R&D and decision logic encoded in the model weights.
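
A minimal extraction sketch, assuming a hypothetical query_victim function that stands in for the paid prediction API and a scikit-learn student model:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def steal_model(query_victim, n_queries=5000, n_features=20):
    """Train a local clone using only the victim's answers as labels."""
    X = np.random.rand(n_queries, n_features)   # attacker-chosen probe inputs
    y = query_victim(X)                         # the API's predictions become free labels
    return DecisionTreeClassifier().fit(X, y)   # knock-off model trained for pennies

Typical defenses are rate limiting, query-pattern monitoring, and returning labels rather than full probability vectors.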

4. Adversarial Evasion

  • The "Invisible" Man: Changing a few pixels on a digital face so an AI "sees" a different person, but a human sees no change.
  • The "Silent" Word: Adding a hidden frequency to an audio file that tells an AI to "Unlock the door," but sounds like music to a human.

5. MITRE ATLAS

Just as there is MITRE ATT&CK for traditional hacking, MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) maps the tactics and techniques used by AI attackers. It includes stages like:

  1. Reconnaissance: Gathering info on the target model.
  2. Initial Access: Getting a prompt into the system.
  3. Impact: The final damage (e.g., data theft or system shutdown).

Exercise: The Threat Matcher

  1. Match the following scenarios to the categories above:
    • "An attacker clones your AI chatbot for $10." -> ?
    • "An attacker finds out your CEO is in the training data." -> ?
    • "A user tricks the AI into writing a malicious script." -> ?
  2. Why is "Insecure Output Handling" just as dangerous as "Prompt Injection"?
  3. Download the OWASP Top 10 for LLM PDF. Which threat do you think is the easiest to fix?
  4. Research: What is the "Adversarial Machine Learning" (AML) community and how do they differ from the "Cybersecurity" community?

Summary

AI threats aren't just traditional hacks; they are often mathematical exploits of how models learn and generalize. By categorizing them into injection, extraction, inference, and evasion, you can build specific defenses for each type of risk.

Next Lesson: Thinking like the enemy: Adversarial thinking for AI systems.
