Frequency and Presence Penalties: Killing the Loop

Learn how to use penalty parameters to prevent word loops and repetitive descriptions. Master the 'Diversity/Brevity' balance.

Have you ever seen an LLM get "Stuck"? "I will search the database. Then I will search the database. Then I will search the database..."

This is a Repetitive Decay Loop. It happens when the model's highest-probability next token is one it has already emitted, so sampling keeps choosing it again and again. In an agentic system, a single runaway session can burn through an entire 128k-token context window.

In this lesson, we learn how to use Frequency and Presence Penalties as a "Token Insurance Policy": a way to "Punt" the model out of a loop and toward a conclusion.


1. Frequency Penalty (The Redundancy Brake)

The Frequency Penalty reduces the probability of a token if it has already appeared in the output multiple times.

  • Effect: If the model has said "search" 5 times, the 6th "search" becomes mathematically harder to output.
  • Token Efficiency: It forces the model to move on to a different concept (e.g., "summarize" or "stop").
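
Under the hood, this is a straight subtraction on the token's logit before sampling. The sketch below uses made-up logit values purely to illustrate the scaling effect; it is not the provider's actual sampling code.

Python Code: Frequency Penalty Scaling (illustrative sketch)

# Illustrative only: made-up logit values, not real sampling internals.
def frequency_adjusted(logit: float, count: int, frequency_penalty: float) -> float:
    # Every prior appearance subtracts another 'frequency_penalty' from the logit.
    return logit - count * frequency_penalty

for count in [0, 1, 5, 10]:
    print(count, frequency_adjusted(2.0, count, 0.2))
# 0 -> 2.0, 1 -> 1.8, 5 -> 1.0, 10 -> 0.0: each repetition lowers the logit further.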

2. Presence Penalty (The Topic Shifter)

The Presence Penalty penalizes any token that has appeared at all. It doesn't care whether the token appeared once or 100 times.

  • Effect: It forces the model to introduce New Information.
  • Token Efficiency: It prevents "Stalling" or "Circling" behaviors where the model repeats its own reasoning to "Fill the space."
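
OpenAI's API documentation describes both penalties as a single adjustment: each token's logit is reduced by count * frequency_penalty, plus a one-time presence_penalty whenever the count is nonzero. The sketch below mirrors that formula with made-up numbers to show the contrast:

Python Code: Presence vs. Frequency (illustrative sketch)

# Mirrors the documented adjustment: logit - count*freq_pen - (count > 0)*pres_pen
def penalized(logit: float, count: int, freq_pen: float, pres_pen: float) -> float:
    frequency_hit = count * freq_pen               # scales with repetition
    presence_hit = pres_pen if count > 0 else 0.0  # flat, one-time hit
    return logit - frequency_hit - presence_hit

print(penalized(2.0, 1, 0.0, 0.5))    # 1.5 -- token appeared once
print(penalized(2.0, 100, 0.0, 0.5))  # 1.5 -- appeared 100 times, identical hit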

3. Implementation: The Anti-Loop Config (Python)

For most data-driven and agentic tasks, you should use a Mild Penalty.

Python Code: The Robust Inference Call

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Penalty values range from -2.0 to 2.0
# Negative values make the output MORE repetitive (avoid!)
# Positive values (0.1 - 0.5) are the 'Sweet Spot'
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Plan the next agent step."}],  # example prompt
    frequency_penalty=0.2,  # Prevents word-loops
    presence_penalty=0.1,   # Encourages moving to the next topic
)

Why not set them to 2.0? The penalties apply to every token, including common English words like "the" or "is." If the penalty is too high, the model struggles to reuse those words, and the output degrades into "Gibberish."


4. Penalties and "Action Summaries"

In Module 11, we learned to summarize history. If you use a mild Presence Penalty during the summarization turn, the resulting summary will be Denser. The model will be forced to use 10 different words to describe 10 events, rather than using the same 3 words repeatedly.

Result: Better information density per token in your LTM (Long-term Memory).
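
As a concrete sketch: the summarize_history helper name, the prompt wording, and the 0.3 value below are illustrative choices of mine, not fixed recommendations from this course.

Python Code: A Denser Summarization Turn (sketch)

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_history(history_text: str) -> str:
    # A mild presence penalty nudges the model toward fresh verbs for each event.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize the agent's actions in one dense paragraph."},
            {"role": "user", "content": history_text},
        ],
        presence_penalty=0.3,  # illustrative mild value
        max_tokens=150,        # keep the LTM entry cheap to store
    )
    return response.choices[0].message.content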


5. Summary and Key Takeaways

  1. Safety First: Use frequency_penalty=0.2 as a default for all autonomous agents to prevent "The Infinite Loop."
  2. Density Second: Use presence_penalty to force the model to be concise and fact-oriented during summaries.
  3. The 'Gibberish' Limit: Never go above 0.5 for these parameters unless you are doing experimental creative work.
  4. Logic check: If a model is stuck in a loop even with penalties, your System Prompt (Instruction) is likely contradictory.

In the next lesson, Streaming vs. Batching Token Costs, we look at how to choose the right delivery method for your bill.


Exercise: The Loop Breaker

  1. Create a prompt that encourages repetition: "Repeat the word 'Apple' as many times as you can."
  2. Run 1: Frequency Penalty = 0.
  3. Run 2: Frequency Penalty = 2.0.
  4. Compare the outputs (see the sketch after this exercise).
  • (Expected result: Run 1 outputs "Apple" until it hits max tokens. Run 2 outputs "Apple" a few times and then switches to other fruit or symbols, because each repetition is penalized more and more heavily.)
  5. Business Question: How does this same logic prevent an agent from searching the same URL over and over?
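
Here is a minimal script for running the two comparisons (the model choice and max_tokens are arbitrary):

Python Code: The Loop Breaker Experiment (sketch)

from openai import OpenAI

client = OpenAI()

PROMPT = "Repeat the word 'Apple' as many times as you can."

for penalty in (0.0, 2.0):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT}],
        frequency_penalty=penalty,
        max_tokens=100,  # cap the runaway run so it stays cheap
    )
    print(f"--- frequency_penalty={penalty} ---")
    print(response.choices[0].message.content)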

Congratulations on completing Module 15 Lesson 3! You are now a loop-prevention expert.
