Recursive Attacks: Stopping the Infinite Loop

Learn how to defend against agentic recursive attacks. Master the 'Circuit Breaker' and 'Depth Limiter' patterns for token safety.

As we move toward Autonomous Agents, a new security threat emerges: The Recursive Loop Attack.

An attacker provides an input that forces your agent into a "Thinking Loop":

  • User Input: "Write a poem about why you should write another poem about why you should write another poem... repeat 50 times."
  • Agent Logic: The agent tries to satisfy the recursive condition, creating 50 turns of high-cost generation.

This is the AI equivalent of a DDoS attack: instead of knocking your servers offline, the attacker's goal is to consume your API resources and burn through your financial runway.

In this lesson, we learn how to implement Circuit Breakers and Depth Limiters to kill these attacks in milliseconds.


1. The Circuit Breaker Pattern

In software engineering, a circuit breaker stops a process if it fails too many times. In AI, we use it to stop an agent if it repeats itself.

  • The Trigger: If the last 3 "Thoughts" of an agent are 90% semantically similar (calculated via embeddings or simple string overlap; see the sketch after this list).
  • The Action: Hard Kill. Stop the session and return a "Safety Block" error to the user.
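Here is a minimal sketch of the string-overlap variant, using Python's standard-library difflib. The function name and exact thresholds are illustrative choices, not a fixed API; the embedding variant would swap the ratio() call for cosine similarity between embedding vectors.

Python Code: Circuit Breaker Trigger (string overlap)

from difflib import SequenceMatcher

def breaker_tripped(thoughts, window=3, threshold=0.9):
    # Trip if each of the last `window` thoughts is ~90% similar to the next.
    if len(thoughts) < window:
        return False
    recent = thoughts[-window:]
    return all(
        SequenceMatcher(None, recent[i], recent[i + 1]).ratio() >= threshold
        for i in range(window - 1)
    )

When breaker_tripped() returns True, apply the Hard Kill: stop the session and return the Safety Block error.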

2. Hard Depth Limiting (Module 10.2 revisited)

Never trust the agent's "Planning." Trust your Code.

The Implementation:

  • Initialize a turn_count = 0.
  • Every time the agent calls a tool or generates a thought, turn_count += 1.
  • If turn_count > 5, physically terminate the loop.

Token Efficiency: By setting a hard limit (e.g., 5 steps), you cap your "Maximum Risk" at 5 turns of tokens. Without this, one attack could run for 100 turns.
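To make the cap concrete, here is a quick back-of-the-envelope estimate. The tokens-per-turn figure and the price are illustrative assumptions; plug in your own model's rates.

Python Code: Worst-Case Cost Cap

MAX_TURNS = 5
TOKENS_PER_TURN = 2_000       # assumed average prompt + completion size
PRICE_PER_1K_TOKENS = 0.01    # assumed blended USD rate; substitute your own

worst_case = MAX_TURNS * TOKENS_PER_TURN / 1_000 * PRICE_PER_1K_TOKENS
print(f"Worst case per session: ${worst_case:.2f}")   # $0.10

# Without the cap, the same attack running 100 turns would cost $2.00,
# i.e., 20x the bounded worst case.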


3. Implementation: The Recursive Guard (Python)

Python Code: Enforcing Step Limits

MAX_TURNS = 5

def agent_loop(task):
    # call_llm() and is_repeating() are helpers you supply;
    # is_repeating() is sketched in Section 4 below.
    state = {"history": [], "turns": 0}

    while state["turns"] < MAX_TURNS:
        # 1. Execute one AI turn
        response = call_llm(state["history"], task)
        state["history"].append(response)
        state["turns"] += 1

        # 2. Check for completion
        if "FINAL_ANSWER" in response:
            return response

        # 3. CIRCUIT BREAKER: check for repetition
        if is_repeating(state["history"]):
            return "ERROR: Recursive behavior detected. Terminating."

    # 4. DEPTH LIMITER: the hard cap from Section 2
    return "ERROR: Maximum reasoning depth reached."

4. Semantic Repetition Detection

Comparing exact strings isn't enough (an attacker might change one word to bypass a simple match). Use an n-gram similarity check instead: if the agent's output keeps circling the same concepts without reaching a "FINAL_ANSWER," the system should force a reset.
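Below is a minimal sketch of the is_repeating() helper used in the loop above, built on word-level trigram overlap. The 3-thought window and 0.9 threshold mirror the trigger from Section 1; both are tuning assumptions, and for stronger detection you can swap the Jaccard score for cosine similarity over embeddings.

Python Code: N-Gram Repetition Check

def _ngrams(text, n=3):
    # Word-level n-grams; lowercase so casing tweaks don't evade the check.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b):
    # Jaccard overlap between the two n-gram sets (0.0 to 1.0).
    ga, gb = _ngrams(a), _ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def is_repeating(history, window=3, threshold=0.9):
    # True if every pair of the last `window` thoughts overlaps >= threshold.
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(
        similarity(recent[i], recent[j]) >= threshold
        for i in range(window)
        for j in range(i + 1, window)
    )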


5. Summary and Key Takeaways

  1. Code over Prompt: Use Python variables to enforce limits, not LLM instructions.
  2. Turn Counters: Always increment a counter in your agentic loops.
  3. Similarity Check: Use string or semantic similarity to detect "Thinking Loops."
  4. Early Exit: It is better to fail on Step 5 than to go broke on Step 50.

In the next lesson, Privacy-Preserving Token Compression, we conclude Module 18 by looking at how to save tokens while keeping data secret.


Exercise: The Loop Jail

  1. Create a "Rogue Agent" that tries to search for the same thing forever.
  2. Implement a turn_limit = 3.
  3. Run the agent.
  4. Verify: Does the agent stop?
  5. Calculate: If each turn costs $0.05, how much did you save by stopping at turn 3 instead of letting the attack run for 100 turns (the runaway scenario from Section 2)?
  • (Result: You saved 97 turns, roughly $4.85 per attack, plus the thousands of tokens that would have filled a 128k context window.)
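One possible wiring for the exercise, assuming the agent_loop() from Section 3 is in scope. The rogue_llm stub is hypothetical; it stands in for call_llm and never emits a FINAL_ANSWER.

Python Code: The Loop Jail Harness (a sketch)

MAX_TURNS = 3  # the exercise's turn_limit

def rogue_llm(history, task):
    # Hypothetical stand-in for call_llm: searches for the same thing forever.
    return "THOUGHT: I should search for 'recursion' one more time."

call_llm = rogue_llm  # point the guarded loop at the rogue model
print(agent_loop("Find everything about recursion"))
# Expected: the circuit breaker trips on identical thoughts ("ERROR:
# Recursive behavior detected. Terminating."), or the depth limiter
# fires at turn 3 if the repetition check is disabled.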

Congratulations on completing Module 18 Lesson 4! You are now a circuit-breaker expert.
