
Self-Correction: Conditional Execution and Retries
Build resilient agents that don't give up. Master the patterns for retry logic, error feeding, and recursive self-correction in LangGraph.
Conditional Execution and Retries
In traditional software, if an API call fails, the program crashes or returns an error. In AI agents, we have a unique advantage: the agent can understand the error.
If a tool returns "Invalid API Key," a script is stuck. But an agent can say, "Ah, I used the wrong key, let me try the backup one." This ability to loop back and try again is a big part of what makes agents autonomous.
In this lesson, we will learn how to implement Self-Correction Loops using conditional edges and retry patterns.
1. The Reasoning Loop: Try, Observe, Correct
The most common retry pattern in agentic engineering is the Re-Prompt.
```mermaid
graph TD
    Node1[Call Tool] --> Result{Observation}
    Result -->|Success| End[Final Answer]
    Result -->|Failure| Error[Parse Error]
    Error -->|Feedback| Node1
```
The "Feedback" Prompt
When a tool fails, you don't just "Retry" blindly. You send the error back to the LLM's brain.
- System: "You called `get_data(id='abc')`, but the ID must be numeric. The database returned a 400 error."
- LLM: "My apologies. I see the mistake. Let me find the numeric ID first."
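In code, the feedback step simply appends the error to the conversation so the model sees it on its next turn. Here is a minimal sketch using plain dicts for messages (in a real LangGraph app you would use LangChain message objects; the `last_error` field is an illustrative assumption, not a fixed API):

```python
# Sketch: turn a tool failure into feedback the LLM will read next turn.
# Messages are plain dicts here for simplicity; "last_error" is a
# hypothetical state field standing in for your real error capture.

def feed_error_back(state: dict) -> dict:
    error = state["last_error"]  # e.g. "ID must be numeric (400)"
    feedback = {
        "role": "user",
        "content": f"Your previous call failed: {error}. "
                   "Fix the arguments and call the tool again.",
    }
    return {"messages": state["messages"] + [feedback]}

state = {"messages": [], "last_error": "ID must be numeric (400)"}
new_state = feed_error_back(state)
print(new_state["messages"][-1]["content"])
```

The key design choice is that the error travels through the same message channel as everything else, so the model can reason about it like any other observation.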
2. Implementing Retries in LangGraph
We use Conditional Edges to determine if we should retry or move on.
```python
def should_retry(state):
    # Route based on the last message and the attempt counter.
    last_message = state["messages"][-1]
    if "Error" not in last_message.content:
        return "success_node"
    if state["attempts"] >= 3:
        return "human_fallback_node"  # escalate instead of looping forever
    return "retry_node"

workflow.add_conditional_edges("execution_node", should_retry)
```
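To see how the counter drives the three outcomes, here is a self-contained simulation of that routing logic in plain Python (no LangGraph required); `flaky_tool` is a made-up stand-in for a real tool call:

```python
# Simulates the retry loop: a tool that fails twice, then succeeds.

def flaky_tool(attempt: int) -> str:
    # Hypothetical tool: errors on the first two attempts.
    return "Error: timeout" if attempt < 3 else "42"

def run_with_retries(max_attempts: int = 3) -> str:
    state = {"attempts": 0, "result": None}
    while True:
        state["attempts"] += 1
        state["result"] = flaky_tool(state["attempts"])
        if "Error" not in state["result"]:
            return "success_node"
        if state["attempts"] >= max_attempts:
            return "human_fallback_node"

print(run_with_retries())  # the third attempt succeeds -> "success_node"
```

Lower `max_attempts` to 2 and the same loop routes to `human_fallback_node` instead, which is exactly the escalation behavior we want in production.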
3. The "Max Retries" Guardrail
Crucial Production Warning: Never build an infinite loop.
If an LLM is truly confused, it can loop 500 times, costing you $50 and hanging your server. You must always have a counter in your state that triggers a "Hard Stop" or an "Escalation to Human."
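One way to make that guardrail explicit is to put the counter in the graph's state schema, so it survives checkpoints and is visible to every routing function. A sketch (the field names are illustrative, not a required schema):

```python
# Sketch: an attempt counter living in graph state, not a local variable.
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    attempts: int        # incremented on every tool execution
    max_attempts: int    # hard stop before the loop can run away

def increment_attempts(state: AgentState) -> dict:
    # Nodes return partial state updates; this one bumps the counter.
    return {"attempts": state["attempts"] + 1}

s: AgentState = {"messages": [], "attempts": 0, "max_attempts": 3}
s.update(increment_attempts(s))
print(s["attempts"])  # 1
```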
4. Types of Retries
- Syntactic Retry: The model output invalid JSON.
  - Fix: Tell the model: "Your JSON was missing a bracket. Please output ONLY valid JSON."
- Logic Retry: The model used a tool correctly, but the result wasn't what it expected (e.g., a search returned 0 results).
  - Fix: Tell the model: "No results were found for 'Blue Shoes'. Try a more general search term like 'Shoes'."
- Execution Retry: The third-party API is down (503 error).
  - Fix: Use standard Python `tenacity` or `backoff` decorators. The agent doesn't need to "reason" about a server-side crash; your code should just wait 2 seconds and try again.
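For execution retries, the backoff loop is plain Python and never touches the LLM. Here is a stdlib-only sketch of exponential backoff (in production you might use the `tenacity` or `backoff` libraries instead; `flaky_api` is a made-up demo function):

```python
import time

def call_with_backoff(fn, max_attempts=3, base_delay=2.0):
    """Retry fn on any exception, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the real error
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demo: a fake API that is "down" for the first two calls.
calls = {"n": 0}
def flaky_api():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("503 Service Unavailable")
    return "ok"

print(call_with_backoff(flaky_api, base_delay=0))  # "ok" on the 3rd call
```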
5. Better than Retries: Validation Nodes
Instead of waiting for an external tool to fail, we often use a "Senior LLM" node to validate the plan produced by the "Junior LLM".
- Junior Node: "I will delete the database now."
- Validator Node: "WAIT. Your goal was to 'Clear temporary cache', not delete the DB. This action is denied. Go back and rethink."
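A validator node is just another node that inspects the plan before any tool runs. In the sketch below, a simple keyword rule stands in for the "Senior LLM" call, so the routing logic stays visible (all names here are illustrative):

```python
# Sketch: a validator node that rejects plans exceeding the stated goal.
# The FORBIDDEN rule is a stand-in for a real "Senior LLM" judgment call.

FORBIDDEN = ("delete the database", "drop table")

def validator_node(state: dict) -> dict:
    plan = state["plan"].lower()
    if any(bad in plan for bad in FORBIDDEN):
        return {**state, "approved": False,
                "feedback": "WAIT. Your goal was to 'Clear temporary "
                            "cache', not this. Go back and rethink."}
    return {**state, "approved": True, "feedback": ""}

state = validator_node({"plan": "I will delete the database now."})
print(state["approved"])  # False -> route back to the junior node
```

A conditional edge can then route on `approved`: back to the junior node with the feedback on rejection, onward to execution on approval.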
6. Real-World Case Study: Automated Coding
An agent is told to fix a bug in a React component.
- The agent writes the fix.
- The agent runs `npm test`.
- The test fails (Retry #1).
- The agent reads the stack trace and adjusts the code.
- The agent runs `npm test` again.
- The test passes! (Success)
Result: A task that would have taken a human 20 minutes was solved in 2 loops by a $0.05 agent.
Summary and Mental Model
Think of Retries as the "Do Over" mechanism. A child learning to ride a bike will fall (Failure). If they have no retry logic, they never try again. If they have good retry logic (Self-correction), they look at why they fell (e.g., tilted too far left) and adjust their balance for the next try.
Your job is to provide the "Balance Advice" via the feedback prompt.
Exercise: Retry Design
- The Prompt: Write a "Feedback Message" for an agent that tried to call a `Search_API` with a date in the wrong format (`12-05-2024` instead of `2024-12-05`).
  - How do you make sure the agent doesn't make the same mistake again in the VERY next step?
- The Counter: Why is it better to store the `retry_count` in the LangGraph `State` rather than in a local Python variable?
  - (Hint: What happens if the server restarts during the 2nd retry?)
- The Escalation: If an agent fails a task for the 3rd time, should it:
- A) Try a 4th time with a "Large" model?
- B) Give up and say "I can't do it"?
- C) Ask the user for a hint?
- (Hint: Professional agents usually choose C.) Next, we'll see how to actually pause the graph for that "Hint".