
Module 8 Lesson 5: Human-in-the-Loop AI
The ultimate firewall. Learn how to implement 'Human-in-the-Loop' (HITL) patterns to prevent AI from executing critical actions without explicit human approval.
In a world of probabilistic AI, the only way to be 100% sure an action is safe is to ask a human. This is called Human-in-the-Loop (HITL).
1. When do you need a Human?
You don't need a human to check whether a "Greeting" is safe. You DO need a human if the AI is about to do any of the following (a minimal policy sketch follows this list):
- Delete Data: Dropping a table or deleting a user account.
- Spend Money: Making a purchase or authorizing a refund.
- Send Emails/Messages: Communicating outside the internal system.
- Change Permissions: Granting "Admin" status to a user.
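One simple way to encode this distinction is a static policy that maps each tool the AI can call to a risk tier. Here is a minimal sketch in Python; the tool names and tiers are illustrative assumptions, not from any specific framework:

```python
from enum import Enum

class Risk(Enum):
    AUTONOMOUS = "autonomous"        # safe to run without review
    HUMAN_APPROVAL = "approval"      # must block until a human signs off

# Hypothetical tool registry: anything that deletes data, spends money,
# messages the outside world, or changes permissions needs a human.
TOOL_POLICY = {
    "generate_greeting":    Risk.AUTONOMOUS,
    "search_internal_docs": Risk.AUTONOMOUS,
    "delete_user_account":  Risk.HUMAN_APPROVAL,
    "issue_refund":         Risk.HUMAN_APPROVAL,
    "send_external_email":  Risk.HUMAN_APPROVAL,
    "grant_admin_role":     Risk.HUMAN_APPROVAL,
}

def requires_human(tool_name: str) -> bool:
    # Fail closed: a tool we have never classified defaults to approval,
    # so new capabilities cannot silently bypass review.
    return TOOL_POLICY.get(tool_name, Risk.HUMAN_APPROVAL) is Risk.HUMAN_APPROVAL
```

Note the default: any tool that isn't explicitly listed fails closed into the approval tier.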
2. HITL Patterns
- Pattern A: The "Review Queue": The AI generates a draft (e.g., a customer support response). The response is placed in a dashboard. A human clicks "Approve" or "Edit" before it is sent to the customer.
- Pattern B: The "Sanity Threshold": If the AI's "Confidence Score" is below 80%, it automatically flags the task for human review. If it's above 80%, it proceeds autonomously. (Note: Be careful with AI-generated confidence scores, as attackers can manipulate them!)
- Pattern C: The "Explicit Approval": Before executing a critical tool, the AI must ask the user: "I am about to delete file 'X'. Is this correct? [Yes/No]". (A sketch combining all three patterns follows this list.)
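The three patterns compose naturally into a single dispatch function. A minimal sketch, reusing the `requires_human` policy check from the earlier snippet; the threshold, queue, and approval prompt are all placeholder assumptions, and the confidence value is presumed to come from a calibrated server-side classifier rather than the model's self-report:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80   # Pattern B: below this, a human must look

@dataclass
class ProposedAction:
    tool: str
    args: dict
    confidence: float  # from a calibrated classifier, never the model's self-report

def dispatch(action: ProposedAction, review_queue: list) -> str:
    if requires_human(action.tool):
        # Pattern C: explicit approval before any critical tool runs
        answer = input(f"AI wants to run {action.tool}({action.args}). Proceed? [y/N] ")
        if answer.strip().lower() != "y":
            return "rejected by human"
        return execute(action)
    if action.confidence < CONFIDENCE_THRESHOLD:
        # Patterns A + B: low confidence lands in the review dashboard
        review_queue.append(action)
        return "queued for human review"
    return execute(action)   # non-critical and confident: autonomous

def execute(action: ProposedAction) -> str:
    return f"executed {action.tool}"   # stand-in for the real tool call
```

Notice the ordering: the critical-tool check runs first, so even a 99% confidence score never lets the AI skip Pattern C for a destructive action.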
3. The "Human-out-of-the-Loop" Risk
As AI gets better, humans get lazy. This is called Automation Bias.
- The Risk: A human reviewer sees 1,000 "Correct" AI suggestions and starts clicking "Approve" without reading them.
- The Attack: The 1,001st suggestion has a hidden prompt injection. The human clicks "Approve" by habit, and the exploit is executed.
4. Designing for Resilience
To prevent "Reviewer Fatigue":
- Batching: Group similar AI suggestions together so a reviewer can compare them side by side instead of approving an endless one-by-one stream.
- Highlighting: Have the system highlight exactly what the AI changed or which data it is acting on (e.g., "Warning: The AI is trying to send $1,000 to an external account"). A sketch of this check follows.
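A sketch of the highlighting idea, reusing `ProposedAction` from the earlier snippet: before a suggestion reaches the reviewer, scan the proposed action for high-risk attributes and attach loud, specific warnings. The domain set and amount limit are illustrative assumptions:

```python
INTERNAL_DOMAINS = {"ourcompany.com"}   # assumption: domains treated as internal
LARGE_AMOUNT = 500.00                   # assumption: flag anything above this

def review_warnings(action: ProposedAction) -> list[str]:
    """Build specific, attention-grabbing warnings so a reviewer
    cannot rubber-stamp the suggestion without seeing the risk."""
    warnings = []
    amount = action.args.get("amount")
    if amount is not None and amount > LARGE_AMOUNT:
        warnings.append(f"Warning: The AI is trying to send ${amount:,.2f}.")
    recipient = action.args.get("to", "")
    if recipient and recipient.rsplit("@", 1)[-1] not in INTERNAL_DOMAINS:
        warnings.append(f"Warning: {recipient} is an external account.")
    return warnings
```

The point is not the specific checks but that the dashboard surfaces the dangerous detail itself, rather than asking the human to re-derive it from a wall of AI-generated text.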
Exercise: The HITL Designer
- Think of a "Social Media AI Scheduler." Which actions should be autonomous and which should be HITL?
- How would an attacker use "Prompt Injection" to trick a human reviewer into clicking "Approve"?
- What is the "Over-reliance" problem in AI systems?
- Research: What is "Active Learning" and how does it use HITL to make the model smarter over time while staying safe?
Summary
You have completed Module 8: Insecure Output Handling. You now understand why AI output must be treated as hostile, how it can lead to XSS or RCE, which technical tools to use for sanitization, and why humans must stay in the decision-making loop for critical tasks.
Next Module: Module 9: AI Agents and Plugin Security ("The Multi-Agent Menace").