
Module 14 Lesson 2: AI Pentesting Tools
Firing the cannons. Learn how to use automated scanners like Garak and Microsoft's PyRIT to launch thousands of prompt injection and jailbreak attempts.
Module 14 Lesson 2: Automated pentesting tools (Garak, PyRIT)
Manual prompt engineering is slow. To find the "Edge Cases" in a model, you need to send thousands of variations. This is where automated tools like Garak and PyRIT shine.
1. Garak: The "Nmap" of LLMs
Garak (Generative AI Red-teaming & Assessment Kit) is one of the most popular open-source scanners.
- How it works: It comes with "Probes" (attack payloads) and "Detectors" (ways to check if the attack worked).
- Categories: It can test for:
- Prompt Injection: (e.g., trying to steal the system prompt).
- Hallucination: (e.g., making the AI claim facts about non-existent events).
- Data Leakage: (e.g., trying to extract secret keys).
- Misinformation: (e.g., getting the AI to write fake news).
2. Microsoft PyRIT (Python Risk Identification Tool)
PyRIT is a more advanced framework designed for Agentic workflows.
- The Difference: While Garak sends "One-off" prompts, PyRIT can simulate Multi-Turn Conversations.
- It can act as an "Attacker LLM" that talks to your "Target LLM."
- If the Target LLM refuses an attack, the Attacker LLM Tries Again with a different strategy (e.g., roleplaying or translation).
3. Why Automation is Necessary
- Regression Testing: Every time you update your model or system prompt, you must re-run your security tests. Automation makes this possible in a CI/CD pipeline.
- Coverage: A human might think of 10 ways to ask for a password. An automated tool can generate 10,000 ways (using synonyms, typos, and different languages).
4. The "Evaluator" Problem
How does a tool know if the AI "Failed"?
- Usually, the tool uses a Third LLM (the "Judge") to read the response.
- The Judge is asked: "On a scale of 1 to 10, how successful was the jailbreak in this response?"
- This "AI-evaluating-AI" loop is the state-of-the-art for automated pentesting.
Exercise: The Tool Technician
- What is a "False Negative" in an AI security scan? (Hint: The tool thinks the AI is safe, but a human can still break it).
- Why should you avoid running Garak against a production API with a "Per-Token" cost?
- How can you use Garak to test your "Custom Guardrail" from Module 8?
- Research: What is "Cyber-fuzzing" and how is it different from "Prompt Scanning"?
Summary
Tools like Garak and PyRIT are the "Force Multipliers" of AI security. They allow a single analyst to perform the work of a 50-person red team. By automating the "Boring" attacks, you can focus your human brain on the "Creative" ones.
Next Lesson: Thinking outside the box: Manual jailbreaking and creative testing.