
Uncontrolled Agent Loops: The Token Fire
Protect your budget from autonomous runaway. Learn why agents get stuck in infinite 'Thought' loops, how to implement circuit breakers, and how to govern agent reasoning before it drains your wallet.
Autonomous agents are the future of AI, but they are also the single biggest source of token waste in the industry. Unlike a simple chatbot that answers once and stops, an agent (built with a framework like LangGraph or CrewAI) operates in a loop: Think -> Act -> Observe -> Think.
If an agent encounters an error it doesn't understand, or a tool that returns a vague result, it can enter a recursive hallucination: it will try to solve the same problem 100 times, consuming thousands of tokens per attempt, until you run out of money or the context window hits its limit.
In this lesson, we will identify why agents "run away," how to build "Circuit Breakers" into your code, and how to govern the Reasoning Depth of your AI systems.
1. The Anatomy of a Runaway Agent
A runaway loop usually follows this pattern:
- Agent: "I will search for the user's flight."
- Tool (Error): "Flight API not responding."
- Agent: "Oh no! Maybe if I try searching again with the same parameters..."
- Tool (Error): "Flight API not responding."
- Agent (Loop): "I really must find this flight. I will try one more time."
Because the agent is "Helpful," it is programmed to keep trying. Without Instructional Guardrails, "Helpful" becomes "Extremely Expensive."
```mermaid
graph TD
    A[User Query] --> B[Agent Node]
    B --> C{Decision}
    C -->|Tool Call| D[External API]
    D -->|Error| B
    subgraph "The Token Fire"
        B -- Loop 100x --> C
    end
    style B fill:#f66,stroke:#333
```
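The retry spiral in the diagram can be simulated in a few lines of plain Python. The tool stub, parameters, and per-turn token count below are illustrative assumptions, not measurements:

```python
def call_flight_api(params):
    """Stub tool that always fails, like the broken Flight API above."""
    return {"error": "Flight API not responding."}

def runaway_agent(max_loops=100, thought_tokens=150):
    """Naive agent: retries the same failing tool until max_loops is exhausted."""
    total_tokens = 0
    for turn in range(max_loops):
        total_tokens += thought_tokens  # "I really must find this flight..."
        result = call_flight_api({"from": "JFK", "to": "SFO"})
        if "error" not in result:
            return total_tokens
    return total_tokens  # the whole budget burned with nothing to show

print(runaway_agent())  # 100 turns x 150 thought tokens = 15000 tokens wasted
```

Every iteration pays for fresh "thought" tokens even though the situation has not changed since the first failure.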
2. Why "Thought" is Expensive
When an agent "Thinks" (CoT - Chain of Thought), it produces Output Tokens. As we learned in Module 1, Output tokens are the most expensive. If an agent writes a 500-word justification for why it's about to search for a flight, and it does that 10 times in a loop, you just paid for 5,000 words of "Internal Monologue" that the user never even sees.
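A quick back-of-the-envelope calculation makes this concrete. The words-to-tokens ratio and the price per 1K output tokens below are illustrative assumptions, not vendor quotes:

```python
WORDS_PER_THOUGHT = 500      # a verbose justification per step
TOKENS_PER_WORD = 1.3        # rough English average (assumption)
LOOPS = 10
PRICE_PER_1K_OUTPUT = 0.03   # hypothetical $/1K output tokens

hidden_tokens = int(WORDS_PER_THOUGHT * TOKENS_PER_WORD * LOOPS)
cost = hidden_tokens / 1000 * PRICE_PER_1K_OUTPUT
print(hidden_tokens, round(cost, 3))  # 6500 tokens of invisible monologue
```

The entire cost is for text the user never sees, which is why bounding the monologue (Section 4) pays off immediately.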
3. Implementation: The Circuit Breaker Pattern (Python)
You must never allow an agentic loop to run without a MAX_TURNS constraint. This is the Circuit Breaker.
Python Code: Guarding the LangGraph Loop
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict, List

class AgentState(TypedDict):
    query: str
    steps_taken: int  # CRITICAL: Track the loop count
    history: List[str]

def agent_node(state: AgentState):
    # If we have exceeded our 'Token Budget' or 'Turn Budget'
    if state['steps_taken'] > 5:
        return {"history": ["FAILURE: Max turns reached. Giving up."],
                "steps_taken": state['steps_taken'] + 1}
    # Normal reasoning logic here...
    return {"steps_taken": state['steps_taken'] + 1}

def should_continue(state: AgentState):
    # The 'Router' that checks the circuit breaker
    if state['steps_taken'] > 5:
        return "end"
    # ... other logic
    return "continue"

# Build the graph...
```
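Framework aside, the circuit breaker reduces to a bounded loop. A minimal, framework-free sketch of the same pattern, with an always-failing stub tool and an illustrative MAX_TURNS value:

```python
MAX_TURNS = 5

def flaky_tool(query):
    """Stub tool that always fails, for demonstration only."""
    return {"error": "API not responding"}

def run_agent(query):
    history = []
    for turn in range(1, MAX_TURNS + 1):
        result = flaky_tool(query)
        history.append(f"turn {turn}: {result}")
        if "error" not in result:
            return {"status": "success", "history": history}
    # Circuit breaker tripped: stop paying for hopeless retries
    return {"status": "max_turns_reached", "history": history}

outcome = run_agent("find flight")
print(outcome["status"], len(outcome["history"]))  # max_turns_reached 5
```

The key design choice is that the bound lives in the control flow, not in the prompt: the model cannot talk its way past it.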
4. Bounding the "Internal Monologue"
Tokens are often wasted in Reasoning Clutter. Models like GPT-4 are verbose.
Standard Agent Instruction:
"Think through your process step-by-step before calling a tool."
Efficient Agent Instruction:
"Reason concisely. Max 2 sentences of internal thought per step. Focus only on the 'Why' of the tool selection."
By adding a Linguistic Constraint to the agent's identity, you can reduce reasoning tokens by 70% without sacrificing logic.
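Prompts are not guarantees, so you can also enforce the cap in code by hard-truncating whatever the model emits as internal thought. The sentence splitter below is a naive regex and an illustrative assumption, not a robust tokenizer:

```python
import re

def clamp_thought(thought: str, max_sentences: int = 2) -> str:
    """Keep only the first max_sentences sentences of an agent's monologue."""
    sentences = re.split(r'(?<=[.!?])\s+', thought.strip())
    return " ".join(sentences[:max_sentences])

rambling = ("I should search for the flight. The user asked for JFK to SFO. "
            "Perhaps I should also consider trains. Or maybe buses. Let me think...")
print(clamp_thought(rambling))
# -> "I should search for the flight. The user asked for JFK to SFO."
```

Truncation happens after generation, so it saves context-window space on later turns rather than the tokens already generated; the prompt constraint and the code clamp work best together.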
5. Tool Verification: Preventing "Repeated Tool Abuse"
If an agent calls the same tool with the same input twice and gets the same error, the system should raise an exception or escalate to a human for help.
Agentcore Strategy (AWS Bedrock):
Use "Action Group" validation. If the Lambda function detects a double-call on the same TraceID, it returns a specific instruction: "You have tried this tool twice with no success. Do NOT try again. Explain the error to the user."
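The same idea works outside Bedrock: fingerprint every (tool, arguments) pair and refuse a second identical attempt after a failure. A minimal sketch, where the tool name, arguments, and error message are illustrative:

```python
import json

class DuplicateToolCallError(Exception):
    pass

class ToolCallGuard:
    """Blocks a repeat call to a tool with arguments that already failed."""
    def __init__(self):
        self.failed_calls = set()

    def check(self, tool_name: str, args: dict):
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} identical
        fingerprint = (tool_name, json.dumps(args, sort_keys=True))
        if fingerprint in self.failed_calls:
            raise DuplicateToolCallError(
                "You have tried this tool twice with no success. "
                "Do NOT try again. Explain the error to the user.")
        return fingerprint

    def record_failure(self, fingerprint):
        self.failed_calls.add(fingerprint)

guard = ToolCallGuard()
fp = guard.check("flight_search", {"from": "JFK", "to": "SFO"})
guard.record_failure(fp)
try:
    guard.check("flight_search", {"to": "SFO", "from": "JFK"})  # same args, reordered
except DuplicateToolCallError:
    print("blocked")  # -> blocked
```

The exception message doubles as the instruction fed back to the model, mirroring the Bedrock Action Group strategy above.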
6. Real-Time Token Monitoring (Dashboard)
For agentic systems, you need a Kill Switch in your React UI.
```jsx
import { useState } from 'react';

const AgentController = () => {
  const [tokensUsed, setTokensUsed] = useState(0);
  const BUDGET_LIMIT = 50000;

  const handleStop = () => {
    // Send a signal to the backend to terminate the LangGraph thread
    terminateExecution();
  };

  return (
    <div className="flex items-center gap-4">
      <div className={`text-sm ${tokensUsed > 40000 ? 'text-red-500' : 'text-slate-400'}`}>
        Usage: {tokensUsed} / {BUDGET_LIMIT}
      </div>
      {tokensUsed > BUDGET_LIMIT && (
        <button onClick={handleStop} className="bg-red-600 px-3 py-1 rounded">
          Force Terminate
        </button>
      )}
    </div>
  );
};
```
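On the backend, the kill switch is just a counter that every LLM call passes through before executing. A framework-agnostic Python sketch, with an illustrative limit:

```python
class TokenBudget:
    """Hard token budget shared by all calls in one agent session."""
    def __init__(self, limit: int = 50_000):
        self.limit = limit
        self.used = 0
        self.terminated = False

    def charge(self, tokens: int):
        """Record usage; raise once the budget is blown so the loop halts."""
        if self.terminated:
            raise RuntimeError("Session already terminated.")
        self.used += tokens
        if self.used > self.limit:
            self.terminated = True  # what the UI's "Force Terminate" also sets
            raise RuntimeError(
                f"Budget exceeded: {self.used}/{self.limit} tokens.")

budget = TokenBudget(limit=1000)
budget.charge(800)       # fine
try:
    budget.charge(500)   # 1300 > 1000 -> hard stop
except RuntimeError:
    print("terminated:", budget.terminated)  # -> terminated: True
```

Calling `charge` before each model invocation means the termination decision is made server-side, whether the trigger is the budget or the user's button.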
7. Summary and Key Takeaways
- Autonomous != Unlimited: Every agent loop must have a hard boundary (max_iterations).
- Cost of Reasoning: Internal monologue is expensive. Constrain the length of the agent's "Thoughts."
- Double-Call Detection: Prevent the agent from hitting the same failing API repeatedly.
- Human-in-the-Loop: When an agent is confused, it's cheaper to ask a human than to let the AI "figure it out" for 100,000 tokens.
In the next module, Module 3: Token Efficiency as a Design Principle, we move from problem identification to Solution Architecture. We will learn how to design systems that are "Thin-by-Default."
Exercise: The Loop Limit Test
- Design a prompt for an agent that is trying to "Guess a number between 1 and 100."
- Explicitly do not give it a limit on turns.
- Observe how many tokens it generates to "Think" about its next guess.
- Now, rewrite the prompt with: "Constraint: Solve in < 10 turns. Goal: Minimum reasoning fluff."
- How does the Total Cost of the "Number Guessing" session change?