
The Human in the Loop: Interrupts and Checkpoints
Master the most critical safety pattern in agentic systems. Learn how to pause an autonomous agent, wait for human approval, and resume execution seamlessly.
Human-in-the-Loop Checkpoints
In the pursuit of "Autonomous" agents, we often forget the most important component: The Human. Total autonomy is a dangerous goal for most business applications. You don't want an agent spending $10,000 on ads, deleting production databases, or prescribing medication without a human "stamp of approval."
Human-in-the-Loop (HITL) is the architectural pattern that allows a system to benefit from the speed of AI while maintaining the safety of human judgment.
1. The "Interrupt" Pattern
An interrupt is a hard break in the graph. The agent stops, saves its state, and waits for an external signal before moving to the next node.
The Lifecycle of an Interrupt:
- The Graph reaches a specific "Sensitive" node (e.g., Execute_Payment).
- LangGraph detects a breakpoint. It saves a Checkpoint of the current state.
- The code stops execution and returns control to the server.
- The UI notifies a human: "Agent needs approval to spend $50."
- The Human reviews the plan and clicks "Approve."
- The Graph is "Invoked" again using the same ThreadID, and it resumes exactly where it left off.
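The lifecycle above can be sketched in plain Python. This is a library-independent illustration, not LangGraph's implementation: `run_graph`, the `CHECKPOINTS` dictionary, and the two node functions are hypothetical names, and a real system would persist snapshots in a database rather than in memory.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Node = Tuple[str, Callable[[State], State]]

# Hypothetical stand-in for a checkpointer: thread_id -> saved snapshot.
CHECKPOINTS: Dict[str, dict] = {}


def run_graph(nodes: List[Node], thread_id: str, interrupt_before: List[str],
              initial_state: Optional[State] = None) -> dict:
    """Run nodes in order, pausing before any node named in interrupt_before."""
    saved = CHECKPOINTS.get(thread_id)
    if saved is None:
        state, index, resuming = dict(initial_state or {}), 0, False
    else:
        # A snapshot exists for this thread, so continue from the saved position.
        state, index, resuming = dict(saved["state"]), saved["next_index"], True

    while index < len(nodes):
        name, fn = nodes[index]
        if name in interrupt_before and not resuming:
            # Save a checkpoint and hand control back to the caller.
            CHECKPOINTS[thread_id] = {"state": dict(state), "next_index": index}
            return {"status": "paused", "next": name, "state": state}
        resuming = False  # only the first node after a resume skips the pause
        state = fn(state)
        index += 1

    CHECKPOINTS.pop(thread_id, None)
    return {"status": "done", "next": None, "state": state}


def plan_payment(state: State) -> State:
    return {**state, "plan": f"spend ${state['amount']}"}


def execute_payment(state: State) -> State:
    return {**state, "paid": True}


NODES: List[Node] = [("Plan_Payment", plan_payment),
                     ("Execute_Payment", execute_payment)]

# First call stops before the sensitive node and stores a snapshot.
first = run_graph(NODES, "thread-1", ["Execute_Payment"], {"amount": 50})
# After the human approves, the same thread id resumes from the snapshot.
second = run_graph(NODES, "thread-1", ["Execute_Payment"])
```

The second call passes no new input: everything it needs comes from the snapshot stored under the same thread id.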
2. When to Use HITL?
- High-Risk Actions: Payments, Deletions, Sending External Emails.
- Ambiguity: The agent has three possible paths and needs a hint on which one the user prefers.
- Verification: An agent wrote a 2,000-word article and needs an editor to verify the facts.
- Credential Injection: The agent needs a specific password that only a human knows.
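These criteria can be turned into a small routing check that decides whether to pause. The following is a minimal sketch, not part of LangGraph: the tool names, the `ProposedAction` fields, and `approval_reason` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tool names; adjust to whatever your agent actually exposes.
HIGH_RISK_TOOLS = {"execute_payment", "delete_records", "send_external_email"}


@dataclass
class ProposedAction:
    tool: str
    candidate_paths: int = 1       # how many equally plausible next steps exist
    needs_fact_check: bool = False
    needs_secret: bool = False     # e.g. a password only a human knows


def approval_reason(action: ProposedAction) -> Optional[str]:
    """Return why a human is needed, or None if the agent may proceed alone."""
    if action.tool in HIGH_RISK_TOOLS:
        return "high-risk action"
    if action.candidate_paths > 1:
        return "ambiguity"
    if action.needs_fact_check:
        return "verification"
    if action.needs_secret:
        return "credential injection"
    return None
```

Returning the reason rather than a bare boolean lets the UI tell the reviewer why they were interrupted.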
3. Implementing Breakpoints in LangGraph
We define breakpoints when we Compile the graph.
# Assumes `workflow` is a graph built earlier and `memory_saver` is a checkpointer instance.
# Tell the graph to ALWAYS pause before it enters 'action_node'
app = workflow.compile(
    checkpointer=memory_saver,
    interrupt_before=["action_node"]
)
# When you run the graph, it will now stop at the edge of 'action_node'
# and wait for a second 'invoke' call on the same thread.
4. The "Checkpointer": The Time-Machine of State
A checkpointer is a persistence layer, typically backed by a database such as Postgres or SQLite, that stores the "Snapshots" of your agent's state.
Why Checkpointers are Critical for HITL:
- Resumption: A human might take 2 hours to approve an action. Keeping a Python process blocked for 2 hours is fragile and wasteful. The checkpointer allows you to kill the process and restart it later from the saved state.
- Rewind (Time Travel): If a human says "No, that plan is bad," the checkpointer allows you to Rewind the state to 3 nodes ago and tell the agent to "Try a different approach."
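Both properties can be illustrated with a small snapshot store built on Python's standard `sqlite3` module. This is a sketch under stated assumptions, not LangGraph's checkpointer API: the class name `SqliteCheckpointStore`, its table layout, and its methods are hypothetical.

```python
import json
import sqlite3
from typing import Optional


class SqliteCheckpointStore:
    """Hypothetical minimal snapshot store keyed by thread id and step number."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" so snapshots survive a restart.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT NOT NULL, step INTEGER NOT NULL, state TEXT NOT NULL, "
            "PRIMARY KEY (thread_id, step))"
        )

    def _top_step(self, thread_id: str) -> Optional[int]:
        row = self.conn.execute(
            "SELECT MAX(step) FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return row[0]

    def save(self, thread_id: str, state: dict) -> int:
        """Append a new snapshot and return its step number."""
        top = self._top_step(thread_id)
        step = 0 if top is None else top + 1
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()
        return step

    def latest(self, thread_id: str) -> Optional[dict]:
        """Resumption: load the most recent snapshot for a thread."""
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def rewind(self, thread_id: str, steps_back: int) -> Optional[dict]:
        """Time travel: drop the newest snapshots and return the one now on top."""
        top = self._top_step(thread_id)
        if top is None:
            return None
        self.conn.execute(
            "DELETE FROM checkpoints WHERE thread_id = ? AND step > ?",
            (thread_id, top - steps_back),
        )
        self.conn.commit()
        return self.latest(thread_id)


store = SqliteCheckpointStore()
for node in ["research", "draft", "plan", "propose_payment"]:
    store.save("thread-1", {"last_node": node})

before_rewind = store.latest("thread-1")
after_rewind = store.rewind("thread-1", 3)  # go back three nodes
```

After the rewind, the next `save` continues numbering from the restored snapshot, so the rejected branch is simply overwritten by the new attempt.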
5. UI Design for HITL
Building a UI for an interrupted agent is complex.
- You need a State Viewer: Show the human what the agent has done so far.
- You need a Plan Viewer: Show exactly what the agent intends to do next.
- You need a diff view: If the agent is modifying a file, show a Green/Red diff.
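For the diff view, Python's standard `difflib` module can produce the raw material. The sketch below assumes the current and proposed file contents are available as strings; `render_diff` is a hypothetical helper name, and colouring the lines is left to the UI.

```python
import difflib


def render_diff(current: str, proposed: str, filename: str = "file") -> str:
    """Return a unified diff of what the agent intends to change."""
    lines = difflib.unified_diff(
        current.splitlines(),
        proposed.splitlines(),
        fromfile=f"{filename} (current)",
        tofile=f"{filename} (proposed)",
        lineterm="",
    )
    return "\n".join(lines)


current_text = "title: Hello\nstatus: draft\n"
proposed_text = "title: Hello\nstatus: published\n"
diff_text = render_diff(current_text, proposed_text, "post.md")
```

Lines starting with `-` would be rendered red and lines starting with `+` green; an empty result means the agent proposes no change.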
graph TD
Agent[Agent Thinking...] -->|Pause| UI[Admin Dashboard]
UI -->|Review| Human{Approved?}
Human -- Yes --> Resume[Agent Continues]
Human -- No --> Edit[Human Edits State] --> Resume
6. Types of Human Interaction
- Approval: Binary "Yes/No."
- Selection: "I found three options. Which one do you want?"
- Correction: "You missed a comma in the second paragraph. Fix it."
- Instruction: "Stop researching this topic and focus on competitors instead."
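One way to keep these four interaction types distinct in code is to model each as its own small type and merge it into the agent state before resuming. This is an illustrative sketch: the class names, the state keys (`approved`, `options`, `selected`, `instructions`), and `apply_human_input` are all assumptions, not an existing API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Approval:
    approved: bool


@dataclass
class Selection:
    chosen_index: int


@dataclass
class Correction:
    field: str
    new_value: str


@dataclass
class Instruction:
    text: str


HumanInput = Union[Approval, Selection, Correction, Instruction]


def apply_human_input(state: dict, human_input: HumanInput) -> dict:
    """Return a copy of the state updated with the human's response."""
    new_state = dict(state)
    if isinstance(human_input, Approval):
        new_state["approved"] = human_input.approved
    elif isinstance(human_input, Selection):
        new_state["selected"] = state["options"][human_input.chosen_index]
    elif isinstance(human_input, Correction):
        new_state[human_input.field] = human_input.new_value
    elif isinstance(human_input, Instruction):
        previous = list(state.get("instructions", []))
        new_state["instructions"] = previous + [human_input.text]
    else:
        raise TypeError(f"Unsupported human input: {human_input!r}")
    return new_state
```

Returning a new dictionary instead of mutating the old one keeps the pre-review snapshot intact, which matters if the reviewer later wants to rewind.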
7. The "Human as a Tool" Pattern
Alternatively, you can give the agent a tool called ask_human.
- The agent calls the tool.
- The tool implementation pauses the graph and waits for user input.
- This makes the human just another capability in the agent's utility belt.
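A library-independent sketch of this pattern follows. It assumes the pause is signalled by raising an exception that the orchestration layer catches; `HumanInputRequired`, `run_agent_step`, and the `answers` dictionary are hypothetical names, and a framework such as LangGraph would handle the pause and resume for you.

```python
from typing import Dict


class HumanInputRequired(Exception):
    """Raised by the tool when no human answer is available yet."""

    def __init__(self, question: str) -> None:
        super().__init__(question)
        self.question = question


def ask_human(question: str, answers: Dict[str, str]) -> str:
    """Tool body: return a stored answer, or pause the run by raising."""
    if question in answers:
        return answers[question]
    raise HumanInputRequired(question)


def run_agent_step(answers: Dict[str, str]) -> dict:
    """One agent step that needs a human-supplied value to finish."""
    try:
        budget = ask_human("What is the maximum ad budget?", answers)
    except HumanInputRequired as pause:
        # Surface the question to the UI and stop until someone answers.
        return {"status": "waiting_for_human", "question": pause.question}
    return {"status": "done", "result": f"Campaign capped at {budget}"}


# First run: nobody has answered yet, so the step pauses.
waiting = run_agent_step({})
# Second run: the human's answer is supplied and the step completes.
finished = run_agent_step({"What is the maximum ad budget?": "$50"})
```

Unlike a fixed breakpoint, the agent decides at runtime when to call the tool, so the human is only interrupted when the agent judges it needs help.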
Summary and Mental Model
Think of HITL as the "Red Phone" in a high-security submarine. The computer can steer the sub, manage the oxygen, and track targets. But to fire a torpedo, someone has to pick up the phone and turn the key.
This course builds the "Key System" so your agents never burn down your business.
Exercise: Checkpoint Design
- Scenario: You are building an agent that migrates content from a WordPress site to a Next.js site.
  - Where would you put the interrupt_before breakpoint?
  - What "State" data would you show the human to help them make the decision?
- Technical: What is the difference between an interrupt_before and an interrupt_after?
  - Which one is safer for a "Delete Database" tool?
- UX: If a human is on vacation and doesn't respond to an approval request for 3 days, how would you design a "Timeout" node in your graph?
Ready to build the actual orchestration layer? Next module: LangGraph.