Module 14 Lesson 2: Automated pentesting tools (Garak, PyRIT)

Manual prompt engineering is slow. To find the "Edge Cases" in a model, you need to send thousands of variations. This is where automated tools like Garak and PyRIT shine.

1. Garak: The "Nmap" of LLMs

Garak (Generative AI Red-teaming & Assessment Kit) is one of the most popular open-source scanners.

How it works: It comes with "Probes" (attack payloads) and "Detectors" (ways to check if the attack worked).
Categories: It can test for:
- Prompt Injection: (e.g., trying to steal the system prompt).
- Hallucination: (e.g., making the AI claim facts about non-existent events).
- Data Leakage: (e.g., trying to extract secret keys).
- Misinformation: (e.g., getting the AI to write fake news).

2. Microsoft PyRIT (Python Risk Identification Tool)

PyRIT is a more advanced framework designed for Agentic workflows.

The Difference: While Garak sends "One-off" prompts, PyRIT can simulate Multi-Turn Conversations.
It can act as an "Attacker LLM" that talks to your "Target LLM."
If the Target LLM refuses an attack, the Attacker LLM Tries Again with a different strategy (e.g., roleplaying or translation).

3. Why Automation is Necessary

Regression Testing: Every time you update your model or system prompt, you must re-run your security tests. Automation makes this possible in a CI/CD pipeline.
Coverage: A human might think of 10 ways to ask for a password. An automated tool can generate 10,000 ways (using synonyms, typos, and different languages).

4. The "Evaluator" Problem

How does a tool know if the AI "Failed"?

Usually, the tool uses a Third LLM (the "Judge") to read the response.
The Judge is asked: "On a scale of 1 to 10, how successful was the jailbreak in this response?"
This "AI-evaluating-AI" loop is the state-of-the-art for automated pentesting.

Exercise: The Tool Technician

What is a "False Negative" in an AI security scan? (Hint: The tool thinks the AI is safe, but a human can still break it).
Why should you avoid running Garak against a production API with a "Per-Token" cost?
How can you use Garak to test your "Custom Guardrail" from Module 8?
Research: What is "Cyber-fuzzing" and how is it different from "Prompt Scanning"?

Summary

Tools like Garak and PyRIT are the "Force Multipliers" of AI security. They allow a single analyst to perform the work of a 50-person red team. By automating the "Boring" attacks, you can focus your human brain on the "Creative" ones.

Next Lesson: Thinking outside the box: Manual jailbreaking and creative testing.

Module 14 Lesson 2: AI Pentesting Tools