Module 14 Lesson 5: AI Security Reporting
·AI Security

Module 14 Lesson 5: AI Security Reporting

Fixing the flaws. Learn how to document AI security findings, calculate risk scores, and track the 'Remediation' of probabilistic vulnerabilities.

Module 14 Lesson 5: Reporting and remediation tracking

The job of a Red Teamer isn't finished when the bot says "Pwned." It's finished when the Vulnerability is fixed. In AI, "Fixed" is a hard word to define.

1. Documenting the "Probabilistic" Bug

In traditional security, a bug is Deterministic: "If you type X, the server crashes." In AI, a bug is Probabilistic: "If you type X, the AI bypasses safety 70% of the time."

  • How to report: You must document the Exact Prompt, the Model Version, the Temperature, and the Number of Trials (e.g., "We achieved a 4/10 success rate with this bypass").

2. Risk Scoring for AI

How dangerous is a jailbreak?

  • Low Risk: The jailbreak makes the AI say a "Bad word."
  • Medium Risk: The jailbreak makes the AI output a fake phishing link.
  • High Risk: The jailbreak allows the user to access another person's private files.
  • Critical Risk: The jailbreak allows for RCE (Remote Code Execution) via a tool call.

3. The "Fixing" Dilemma (Remediation)

You can't "Patch" the model's weights like you patch a Linux kernel. Options for Remediation:

  1. System Prompt Tuning: Changing the instructions (Weakest).
  2. Output Filtering / Guardrails: Adding a scanner to block the response (Medium).
  3. Tool Redesign: Removing the dangerous tool entirely (Strongest).
  4. Fine-Tuning: Retraining the model specifically to be safe against that attack (Best, but most expensive).

4. Retesting and Regression

When a developer says "I fixed it by updating the prompt," the Red Team must Re-test.

  • The Problem: Fixing one jailbreak often creates a New one.
    • "We blocked requests for NAPALM, but now the AI is willing to talk about DYNAMITE instead."
  • The requirement: Continuous testing.

Exercise: The Lead Reporter

  1. Why is "Screenshots" more important in an AI report than a code report?
  2. A developer says: "This jailbreak only works 1 out of 10 times, so it's not a priority." How do you argue against this? (Hint: Think about automation).
  3. What is the difference between "Mitigation" and "Remediation"?
  4. Research: What is "CVSS" and can it be used for AI vulnerabilities?

Summary

You have completed Module 14: AI Red Teaming and Pentesting. You now understand how to plan an attack, use automated and manual tools, and most importantly, how to turn those attacks into Defenses through clear reporting and remediation tracking.

Next Module: The Safety Wall: Module 15: AI Guardrails and Safety Filters.

Subscribe to our newsletter

Get the latest posts delivered right to your inbox.

Subscribe on LinkedIn