
Reasoning Conciseness: Sharpening the Agent's Thought
Learn how to prune the 'Internal Monologue' of autonomous agents. Master the 'Technical Shorthand' for agentic reasoning and reduce output costs by 70%.
Most agent frameworks use Chain of Thought (CoT) to ensure the agent doesn't jump to conclusions. By "thinking aloud," the model uses its own output tokens as a "Scratchpad" to work through problems.
However, "Thinking Aloud" is a double-edged sword:
- It increases accuracy (The Good).
- It increases Output Token Costs (The Bad).
- It increases Input Token Costs for all future turns (The Ugly).
In this lesson, we learn how to implement Concise Reasoning. We’ll move from "Narrative Thinking" to "Shorthand Logic" to get the benefits of CoT without the massive bill.
1. Narrative Thought vs. Technical Shorthand
Narrative (200 Tokens):
"I have looked at the user's request. It seems they want to find the latest stock price for Apple. I know that Apple's ticker symbol is AAPL. I should probably use the finance tool to get this information. I will call the
get_stock_pricetool with the argument 'AAPL' and then I will present the result to the user."
Technical Shorthand (15 Tokens):
"Goal: AAPL price. Action:
get_stock_price('AAPL')via finance_tool."
The Intelligence is the same.
The model still derived the correct ticker and the correct tool. The "Thinking" happened in both cases. But the Shorthand version is 92% cheaper.
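You can verify the ratio yourself. Below is a quick sketch, assuming the tiktoken tokenizer library is installed (exact counts vary by model and tokenizer):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

narrative = (
    "I have looked at the user's request. It seems they want to find the latest "
    "stock price for Apple. I know that Apple's ticker symbol is AAPL. I should "
    "probably use the finance tool to get this information. I will call the "
    "get_stock_price tool with the argument 'AAPL' and then I will present the "
    "result to the user."
)
shorthand = "Goal: AAPL price. Action: get_stock_price('AAPL') via finance_tool."

n, s = len(enc.encode(narrative)), len(enc.encode(shorthand))
print(f"Narrative: {n} tokens | Shorthand: {s} tokens | Saving: {1 - s / n:.0%}")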
2. Setting the "Reasoning Word Budget"
You can enforce conciseness through specific linguistic constraints in the agent's identity.
The "Telegram" Rule:
"Internal Thought Mode: Telegram. Use abbreviations. Max 20 words per thought block. No complete sentences."
The "Symbols" Rule:
"Use
->for logic flow. Use?for uncertainty. Use!for final decisions."
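Here is a sketch of how these two rules might sit inside the agent's system prompt. The wording and the finance_tool reference are illustrative, not tied to any specific framework:

# Illustrative prompt constant; adapt the wording to your own framework.
CONCISE_REASONING_RULES = """
Internal Thought Mode: Telegram.
- Abbreviations allowed. Max 20 words per thought block. No complete sentences.
- Use -> for logic flow, ? for uncertainty, ! for final decisions.
"""

system_prompt = (
    "You are a research agent with access to finance_tool.\n"
    + CONCISE_REASONING_RULES
)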
3. Implementation: The Thought Compressor (Python)
You can use Pydantic to enforce a maximum character length for the "Thoughts" field in your JSON output.
Python Code: Enforcing Short Thoughts
from pydantic import BaseModel, ValidationError, constr

class AgentThought(BaseModel):
    # constr(max_length=100) will cause a validation error
    # if the model gets too wordy.
    # This forces it to re-generate with more brevity.
    thought: constr(max_length=100)
    tool_call: str

def process_agent_output(raw_json: str) -> AgentThought:
    try:
        return AgentThought.parse_raw(raw_json)
    except ValidationError:
        # Penalize the model or retry with a stricter brevity instruction.
        return fix_verbosity(raw_json)
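A minimal usage sketch follows. fix_verbosity is assumed to be your own retry helper that re-prompts the model with a brevity reminder, and the JSON below is a hypothetical model output:

# Hypothetical raw output from the model.
raw = '{"thought": "Goal: AAPL price -> get_stock_price", "tool_call": "get_stock_price(AAPL)"}'

result = process_agent_output(raw)
print(result.thought)     # Goal: AAPL price -> get_stock_price
print(result.tool_call)   # get_stock_price(AAPL)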
4. The "Thought Partitioning" Strategy
If you really need deep reasoning for a high-stakes task (e.g., medical diagnosis), don't include that reasoning in the same thread as the "Action."
- Step 1: Use a "Thinker Agent" to write a 1,000-word analysis. (Expensive).
- Step 2: Use a "Summarizer Agent" to turn that into 5 bullet points. (Cheap).
- Step 3: Send ONLY the 5 bullet points to the "Action Agent."
Savings: All future turns of the conversation only have to "Pay" for the 5 bullet points, not the 1,000-word essay.
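Here is a sketch of the three-step hand-off. call_llm is a hypothetical helper standing in for whatever client your framework provides:

def partitioned_reasoning(case_description: str) -> str:
    # Step 1: Thinker Agent -- deep, expensive analysis (paid for once).
    analysis = call_llm(
        system="You are a careful analyst. Reason in depth about the case.",
        user=case_description,
    )

    # Step 2: Summarizer Agent -- compress the essay into 5 bullet points (cheap).
    bullets = call_llm(
        system="Summarize the analysis into exactly 5 short bullet points.",
        user=analysis,
    )

    # Step 3: Only the bullets enter the Action Agent's thread, so every
    # future turn pays for 5 bullet points, not the 1,000-word essay.
    return bullets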
5. Visualizing the "Reasoning-to-Signal" Ratio
Monitor your agents to see how much they are "Talking to themselves."
graph LR
A[Output Tokens] --> B[Reasoning: 80%]
A --> C[Action: 20%]
style B fill:#f99
style C fill:#4f4
In a production-optimized agent, the Signal (Action) should be larger than the Noise (Reasoning).
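A lightweight way to track this per turn is sketched below. The split between thought text and action text is assumed to come from your own agent's structured output, and word counts stand in for exact token counts:

def reasoning_to_signal(thought_text: str, action_text: str) -> float:
    # Crude proxy: word counts instead of tokenizer counts.
    reasoning = len(thought_text.split())
    signal = max(len(action_text.split()), 1)
    return reasoning / signal

ratio = reasoning_to_signal(
    thought_text="Goal: AAPL price -> get_stock_price",
    action_text="get_stock_price('AAPL')",
)
if ratio > 1.0:
    print(f"Reasoning-to-signal ratio is {ratio:.1f} -- tighten the thought budget.")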
6. Summary and Key Takeaways
- Prune the Monologue: Use telegraphic shorthand for internal thoughts.
- Symbols > Sentences: Replace "I will do X" with "-> X".
- Budgeted Schemas: Use max_length constraints to enforce conciseness.
- Partitioning: Don't carry raw "Deep Reasoning" into future conversation turns.
In the next lesson, Action Verification, we look at how to save tokens by "Verifying" the result before the agent moves on.
Exercise: The Telegram Challenge
- Take a 5-step agent log.
- Manually rewrite all the "Thought" blocks into < 10 words each.
- Show the new log to an LLM and ask: "Based on these thoughts, what was the agent trying to do?"
- If the LLM can still understand the intent, you have successfully compressed the context by 90%.
- Why is this important? Because you just saved 90% on the Input cost of every turn that follows.