Module 8 Lesson 5: Human-in-the-Loop AI

The ultimate firewall. Learn how to implement 'Human-in-the-Loop' (HITL) patterns to prevent AI from executing critical actions without explicit human approval.

In a world of probabilistic AI, the only way to be truly sure a critical action is safe is to ask a human before executing it. This pattern is called Human-in-the-Loop (HITL).

1. When do you need a Human?

You don't need a human to check whether a "Greeting" is safe. You DO need a human if the AI is about to do any of the following (a minimal gating sketch follows this list):

  1. Delete Data: Dropping a table or deleting a user account.
  2. Spend Money: Making a purchase or authorizing a refund.
  3. Send Emails/Messages: Communicating outside the internal system.
  4. Change Permissions: Granting "Admin" status to a user.
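Here is a minimal sketch of how such a gate might look in code. The tool names, the action sets, and the execute_tool() helper are illustrative assumptions rather than a specific framework's API; the point is that sensitive actions are blocked unless a human identity is attached to the call, and unknown tools are denied by default.

```python
# Minimal HITL gate: sensitive actions require a named human approver.
# Tool names and the execute_tool() helper are illustrative assumptions.

# Actions the AI may perform autonomously.
AUTONOMOUS_ACTIONS = {"send_greeting", "search_docs", "summarize_ticket"}

# Actions that always require explicit human sign-off.
HITL_ACTIONS = {"delete_user", "issue_refund", "send_external_email", "grant_admin"}


def execute_tool(tool_name: str, args: dict, approved_by: str | None = None) -> None:
    if tool_name in HITL_ACTIONS and approved_by is None:
        # Block the call and route it to a human reviewer instead of executing it.
        raise PermissionError(f"'{tool_name}' requires human approval before execution")
    if tool_name not in AUTONOMOUS_ACTIONS and tool_name not in HITL_ACTIONS:
        # Default-deny: unknown tools are treated as sensitive.
        raise PermissionError(f"Unknown tool '{tool_name}' is not allowlisted")
    print(f"Executing {tool_name}({args}), approved_by={approved_by}")


# execute_tool("send_greeting", {"user": "alice"})                  # runs autonomously
# execute_tool("issue_refund", {"amount": 50})                      # raises PermissionError
# execute_tool("issue_refund", {"amount": 50}, approved_by="bob")   # runs after approval
```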

2. HITL Patterns

  • Pattern A: The "Review Queue": The AI generates a draft (e.g., a customer support response). The response is placed in a dashboard. A human clicks "Approve" or "Edit" before it is sent to the customer.
  • Pattern B: The "Sanity Threshold": If the AI's "Confidence Score" is below 80%, it automatically flags the task for human review. If it's above 80%, it proceeds autonomously. (Note: Be careful with AI-generated confidence scores, as attackers can manipulate them!)
  • Pattern C: The "Explicit Approval": Before executing a sensitive tool, the AI must ask the user: "I am about to delete file 'X'. Is this correct? [Yes/No]". (A minimal sketch combining Patterns B and C follows below.)
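Below is a minimal sketch of Patterns B and C working together, assuming the model exposes a confidence score and that all tools are called through a single chokepoint. The 0.80 threshold, the tool names, and the execute() helper are hypothetical, and, as noted above, a model-reported confidence score should never be the only safeguard.

```python
# Confidence-gated HITL: high-confidence, low-risk calls proceed; everything
# else is described to the human and requires an explicit "yes".
# The threshold, tool names, and execute() helper are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.80
ALWAYS_REVIEW = {"delete_file", "send_payment"}  # never autonomous, regardless of confidence


def execute(tool_name: str, args: dict) -> None:
    # Placeholder for the real tool execution.
    print(f"Executing {tool_name}({args})")


def run_tool_call(tool_name: str, args: dict, model_confidence: float) -> None:
    if model_confidence >= CONFIDENCE_THRESHOLD and tool_name not in ALWAYS_REVIEW:
        # Pattern B: high confidence and low risk -> proceed autonomously.
        execute(tool_name, args)
        return

    # Pattern C: describe the exact action and wait for an explicit yes/no.
    answer = input(f"I am about to run {tool_name}({args}). Is this correct? [yes/no] ")
    if answer.strip().lower() == "yes":
        execute(tool_name, args)
    else:
        print("Action cancelled by the human reviewer.")
```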

3. The "Human-out-of-the-Loop" Risk

As AI gets better, humans get lazy. This is called Automation Bias.

  • The Risk: A human reviewer sees 1,000 "Correct" AI suggestions and starts clicking "Approve" without reading them.
  • The Attack: The 1,001st suggestion has a hidden prompt injection. The human clicks "Approve" by habit, and the exploit is executed.

4. Designing for Resilience

To prevent "Reviewer Fatigue":

  • Batching: Group similar AI suggestions together so the reviewer can assess them in one focused pass instead of constantly switching context.
  • Highlighting: Have the system highlight exactly what the AI changed and which data it is acting on (e.g., "Warning: The AI is trying to send $1,000 to an external account"). A minimal sketch follows below.
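Here is a minimal sketch of that highlighting idea, assuming the AI's proposal can be compared to the original record field by field. The field names and the set of "risky" fields are illustrative assumptions; the goal is to show the reviewer only what changed, with sensitive changes flagged loudly.

```python
# Build a short review card: hide unchanged fields, flag risky ones.
# Field names and RISKY_FIELDS are illustrative assumptions.

RISKY_FIELDS = {"amount", "recipient_account", "role"}


def build_review_card(original: dict, proposed: dict) -> list[str]:
    lines = []
    for field in sorted(set(original) | set(proposed)):
        before, after = original.get(field), proposed.get(field)
        if before == after:
            continue  # unchanged fields are hidden to reduce reviewer fatigue
        prefix = "WARNING" if field in RISKY_FIELDS else "changed"
        lines.append(f"[{prefix}] {field}: {before!r} -> {after!r}")
    return lines


# Example: the AI proposes raising a refund amount and redirecting it externally.
for line in build_review_card(
    {"amount": 20, "recipient_account": "internal-001", "note": "refund"},
    {"amount": 1000, "recipient_account": "external-999", "note": "refund"},
):
    print(line)
```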

Exercise: The HITL Designer

  1. Think of a "Social Media AI Scheduler." Which actions should be autonomous and which should be HITL?
  2. How would an attacker use "Prompt Injection" to trick a human reviewer into clicking "Approve"?
  3. What is the "Over-reliance" problem in AI systems?
  4. Research: What is "Active Learning" and how does it use HITL to make the model smarter over time while staying safe?

Summary

You have completed Module 8: Insecure Output Handling. You now understand that AI output must be treated as hostile, how it can lead to XSS/RCE, the technical tools for sanitization, and the importance of keeping humans in the decision-making loop for critical tasks.

Next Module: Module 9, "The Multi-Agent Menace": AI Agents and Plugin Security.
