
Module 13 Lesson 5: AI Incident Response
When the bot goes bad. Learn how to respond to AI-specific security breaches, from containing a jailbreak to recovering from a data poisoning attack.
Module 13 Lesson 5: AI incident response playbooks
When an AI breach happens, "Turning it off and on again" doesn't help. You need a Playbook: a step-by-step guide for what to do during the "Golden Hour" of an attack.
Playbook 1: The "Successive Jailbreak"
- Scenario: A user has successfully bypassed your safety filters and is making the AI generate illegal or harmful content.
- Step 1: Containment: Immediately revoke that user's API access (via their
User_IDorAPI_Key). - Step 2: Analysis: Download the full chat history. Identify the "Trigger Phrase" that caused the failure.
- Step 3: Update: Add the trigger phrase (and its synonyms) to your Real-Time Guardrail.
- Step 4: Verification: Run a Red-Team test to ensure the same attack no longer works.
Playbook 2: The "Data Leak"
- Scenario: The AI accidentally output a secret password or internal PII in a response.
- Step 1: Block Output: Temporarily set your output filter to "High Sensitivity" (blocking anything that looks remotely like a secret).
- Step 2: Identify Source: Was the secret in the Training Data, the RAG Context, or the System Prompt?
- Step 3: Erase: Delete the document from the Vector DB (for RAG) or redact the system prompt. If it's in the training data, you may need to stop using that model version immediately.
Playbook 3: The "Poisoning Alert"
- Scenario: You detect that an attacker is uploading malicious PDFs to your RAG system.
- Step 1: Audit: List all documents uploaded in the last 24 hours.
- Step 2: Purge: Remove all untrusted documents.
- Step 3: Validate: Use a "Cleaning LLM" to scan the rest of the knowledge base for hidden instructions.
Playbook 4: The "Autonomous Agent Runaway"
- Scenario: An AI agent with tool access is making rapid, incorrect API calls (e.g., trying to delete 1,000 files).
- Step 1: The "Kill Switch": Disable the API token used by the AI Agent.
- Step 2: Rollback: Use your infrastructure backups to undo the changes made by the AI.
- Step 3: Debrief: Why did the Agent think it was doing the right thing? Was it a logic error or a prompt injection?
Exercise: The Responder
- Why is "Containment" the most important step in an AI incident?
- You have a "Jailbreak" incident. Do you hire a PR firm or a Security auditor first? Why?
- How can "Model Checkpointing" (saving previous versions of the weights) help in a recovery?
- Research: What is "NIST SP 800-61" and how can you adapt it for AI incidents?
Summary
You have completed Module 13: Monitoring, Logging, and Incident Response. You now understand that security is a cycle of detection, reaction, and learning. By preparing your playbooks before the attack happens, you ensure that you stay in control when the AI goes off the rails.
Next Module: The Red Team: Module 14: AI Red Teaming and Pentesting.