
Self-Correction: Conditional Execution and Retries
Build resilient agents that don't give up. Master the patterns for retry logic, error feeding, and recursive self-correction in LangGraph.
Conditional Execution and Retries
In traditional software, if an API call fails, the program crashes or returns an error. In AI agents, we have a unique advantage: the agent can understand the error.
If a tool returns "Invalid API Key," a script is stuck. But an agent can say, "Ah, I used the wrong key, let me try the backup one." This ability to loop back and try again is a big part of what makes agents autonomous.
In this lesson, we will learn how to implement Self-Correction Loops using conditional edges and retry patterns.
1. The Reasoning Loop: Try, Observe, Correct
The most common retry pattern in agentic engineering is the Re-Prompt.
```mermaid
graph TD
    Node1[Call Tool] --> Result{Observation}
    Result -->|Success| End[Final Answer]
    Result -->|Failure| Error[Parse Error]
    Error -->|Feedback| Node1
```
The "Feedback" Prompt
When a tool fails, you don't just "Retry" blindly. You send the error back to the LLM's brain.
- System: "You called `get_data(id='abc')`, but the ID must be numeric. The database returned a 400 error."
- LLM: "My apologies. I see the mistake. Let me find the numeric ID first."
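In code, the feedback step simply appends the error to the conversation so the model sees it on its next turn. Here is a minimal sketch using plain dicts for messages (in a real LangGraph app you would use LangChain message objects; the `last_error` field is an illustrative assumption, not a fixed API):

```python
# Sketch: turn a tool failure into feedback the LLM will read next turn.
# Messages are plain dicts here for simplicity; "last_error" is a
# hypothetical state field standing in for your real error capture.

def feed_error_back(state: dict) -> dict:
    error = state["last_error"]  # e.g. "ID must be numeric (400)"
    feedback = {
        "role": "user",
        "content": f"Your previous call failed: {error}. "
                   "Fix the arguments and call the tool again.",
    }
    return {"messages": state["messages"] + [feedback]}

state = {"messages": [], "last_error": "ID must be numeric (400)"}
new_state = feed_error_back(state)
print(new_state["messages"][-1]["content"])
```

The key design choice is that the error travels through the same message channel as everything else, so the model can reason about it like any other observation.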
2. Implementing Retries in LangGraph
We use Conditional Edges to determine if we should retry or move on.
```python
def should_retry(state):
    # Route based on the last message and the attempt counter.
    last_message = state["messages"][-1]
    if "Error" not in last_message.content:
        return "success_node"
    if state["attempts"] >= 3:
        return "human_fallback_node"  # escalate instead of looping forever
    return "retry_node"

workflow.add_conditional_edges("execution_node", should_retry)
```
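To see how the counter drives the three outcomes, here is a self-contained simulation of that routing logic in plain Python (no LangGraph required); `flaky_tool` is a made-up stand-in for a real tool call:

```python
# Simulates the retry loop: a tool that fails twice, then succeeds.

def flaky_tool(attempt: int) -> str:
    # Hypothetical tool: errors on the first two attempts.
    return "Error: timeout" if attempt < 3 else "42"

def run_with_retries(max_attempts: int = 3) -> str:
    state = {"attempts": 0, "result": None}
    while True:
        state["attempts"] += 1
        state["result"] = flaky_tool(state["attempts"])
        if "Error" not in state["result"]:
            return "success_node"
        if state["attempts"] >= max_attempts:
            return "human_fallback_node"

print(run_with_retries())  # the third attempt succeeds -> "success_node"
```

Lower `max_attempts` to 2 and the same loop routes to `human_fallback_node` instead, which is exactly the escalation behavior we want in production.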
3. The "Max Retries" Guardrail
Crucial Production Warning: Never build an infinite loop.
If an LLM is truly confused, it can loop 500 times, costing you $50 and hanging your server. You must always have a counter in your state that triggers a "Hard Stop" or an "Escalation to Human."
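One way to make that guardrail explicit is to put the counter in the graph's state schema, so it survives checkpoints and is visible to every routing function. A sketch (the field names are illustrative, not a required schema):

```python
# Sketch: an attempt counter living in graph state, not a local variable.
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    attempts: int        # incremented on every tool execution
    max_attempts: int    # hard stop before the loop can run away

def increment_attempts(state: AgentState) -> dict:
    # Nodes return partial state updates; this one bumps the counter.
    return {"attempts": state["attempts"] + 1}

s: AgentState = {"messages": [], "attempts": 0, "max_attempts": 3}
s.update(increment_attempts(s))
print(s["attempts"])  # 1
```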
4. Types of Retries
- Syntactic Retry: The model output invalid JSON.
  - Fix: Tell the model: "Your JSON was missing a bracket. Please output ONLY valid JSON."
- Logic Retry: The model used a tool correctly, but the result wasn't what it expected (e.g., a search returned 0 results).
  - Fix: Tell the model: "No results were found for 'Blue Shoes'. Try a more general search term like 'Shoes'."
- Execution Retry: The third-party API is down (503 error).
  - Fix: Use standard Python `tenacity` or `backoff` decorators. The agent doesn't need to "reason" about a server-side crash; your code should just wait 2 seconds and try again.
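For execution retries, the backoff loop is plain Python and never touches the LLM. Here is a stdlib-only sketch of exponential backoff (in production you might use the `tenacity` or `backoff` libraries instead; `flaky_api` is a made-up demo function):

```python
import time

def call_with_backoff(fn, max_attempts=3, base_delay=2.0):
    """Retry fn on any exception, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the real error
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demo: a fake API that is "down" for the first two calls.
calls = {"n": 0}
def flaky_api():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("503 Service Unavailable")
    return "ok"

print(call_with_backoff(flaky_api, base_delay=0))  # "ok" on the 3rd call
```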
5. Better than Retries: Validation Nodes
Instead of waiting for an external tool to fail, we often use a "Senior LLM" node to validate the plan produced by the "Junior LLM".
- Junior Node: "I will delete the database now."
- Validator Node: "WAIT. Your goal was to 'Clear temporary cache', not delete the DB. This action is denied. Go back and rethink."
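A validator node is just another node that inspects the plan before any tool runs. In the sketch below, a simple keyword rule stands in for the "Senior LLM" call, so the routing logic stays visible (all names here are illustrative):

```python
# Sketch: a validator node that rejects plans exceeding the stated goal.
# The FORBIDDEN rule is a stand-in for a real "Senior LLM" judgment call.

FORBIDDEN = ("delete the database", "drop table")

def validator_node(state: dict) -> dict:
    plan = state["plan"].lower()
    if any(bad in plan for bad in FORBIDDEN):
        return {**state, "approved": False,
                "feedback": "WAIT. Your goal was to 'Clear temporary "
                            "cache', not this. Go back and rethink."}
    return {**state, "approved": True, "feedback": ""}

state = validator_node({"plan": "I will delete the database now."})
print(state["approved"])  # False -> route back to the junior node
```

A conditional edge can then route on `approved`: back to the junior node with the feedback on rejection, onward to execution on approval.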
6. Real-World Case Study: Automated Coding
An agent is told to fix a bug in a React component.
- The agent writes the fix.
- The agent runs `npm test`.
- The test fails (Retry #1).
- The agent reads the stack trace and adjusts the code.
- The agent runs `npm test` again.
- The test passes! (Success)
Result: A task that would have taken a human 20 minutes was solved in 2 loops by a $0.05 agent.
Summary and Mental Model
Think of Retries as the "Do Over" mechanism. A child learning to ride a bike will fall (Failure). If they have no retry logic, they never try again. If they have good retry logic (Self-correction), they look at why they fell (e.g., tilted too far left) and adjust their balance for the next try.
Your job is to provide the "Balance Advice" via the feedback prompt.
Exercise: Retry Design
- The Prompt: Write a "Feedback Message" for an agent that tried to call a `Search_API` with a date in the wrong format (`12-05-2024` instead of `2024-12-05`).
  - How do you make sure the agent doesn't make the same mistake again in the VERY next step?
- The Counter: Why is it better to store the `retry_count` in the LangGraph `State` rather than in a local Python variable?
  - (Hint: What happens if the server restarts during the 2nd retry?)
- The Escalation: If an agent fails a task for the 3rd time, should it:
- A) Try a 4th time with a "Large" model?
- B) Give up and say "I can't do it"?
- C) Ask the user for a hint?
- (Hint: Professional agents usually choose C.) Next, we'll see how to actually pause the graph for that "Hint".