
Reasoning Conciseness: Sharpening the Agent's Thought
Learn how to prune the 'Internal Monologue' of autonomous agents. Master the 'Technical Shorthand' for agentic reasoning and reduce output costs by 70%.
Most agent frameworks use Chain of Thought (CoT) to ensure the agent doesn't jump to conclusions. By "thinking aloud," the model uses its own output tokens as a "Scratchpad" to work through problems.
However, "Thinking Aloud" is a double-edged sword:
- It increases accuracy (The Good).
- It increases Output Token Costs (The Bad).
- It increases Input Token Costs for all future turns (The Ugly).
In this lesson, we learn how to implement Concise Reasoning. We’ll move from "Narrative Thinking" to "Shorthand Logic" to get the benefits of CoT without the massive bill.
1. Narrative Thought vs. Technical Shorthand
Narrative (200 Tokens):
"I have looked at the user's request. It seems they want to find the latest stock price for Apple. I know that Apple's ticker symbol is AAPL. I should probably use the finance tool to get this information. I will call the
get_stock_pricetool with the argument 'AAPL' and then I will present the result to the user."
Technical Shorthand (15 Tokens):
"Goal: AAPL price. Action:
get_stock_price('AAPL')via finance_tool."
The Intelligence is the same.
The model still derived the correct ticker and the correct tool. The "Thinking" happened in both cases. But the Shorthand version is 92% cheaper.
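You can verify the ratio yourself. Below is a quick sketch, assuming the tiktoken tokenizer library is installed (exact counts vary by model and tokenizer):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

narrative = (
    "I have looked at the user's request. It seems they want to find the latest "
    "stock price for Apple. I know that Apple's ticker symbol is AAPL. I should "
    "probably use the finance tool to get this information. I will call the "
    "get_stock_price tool with the argument 'AAPL' and then I will present the "
    "result to the user."
)
shorthand = "Goal: AAPL price. Action: get_stock_price('AAPL') via finance_tool."

n, s = len(enc.encode(narrative)), len(enc.encode(shorthand))
print(f"Narrative: {n} tokens | Shorthand: {s} tokens | Saving: {1 - s / n:.0%}")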
2. Setting the "Reasoning Word Budget"
You can enforce conciseness through specific linguistic constraints in the agent's identity.
The "Telegram" Rule:
"Internal Thought Mode: Telegram. Use abbreviations. Max 20 words per thought block. No complete sentences."
The "Symbols" Rule:
"Use
->for logic flow. Use?for uncertainty. Use!for final decisions."
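Here is a sketch of how these two rules might sit inside the agent's system prompt. The wording and the finance_tool reference are illustrative, not tied to any specific framework:

# Illustrative prompt constant; adapt the wording to your own framework.
CONCISE_REASONING_RULES = """
Internal Thought Mode: Telegram.
- Abbreviations allowed. Max 20 words per thought block. No complete sentences.
- Use -> for logic flow, ? for uncertainty, ! for final decisions.
"""

system_prompt = (
    "You are a research agent with access to finance_tool.\n"
    + CONCISE_REASONING_RULES
)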
3. Implementation: The Thought Compressor (Python)
You can use Pydantic to enforce a maximum character length for the "Thoughts" field in your JSON output.
Python Code: Enforcing Short Thoughts
from pydantic import BaseModel, ValidationError, constr

class AgentThought(BaseModel):
    # constr(max_length=100) will cause a validation error
    # if the model gets too wordy.
    # This forces it to re-generate with more brevity.
    thought: constr(max_length=100)
    tool_call: str

def process_agent_output(raw_json: str) -> AgentThought:
    try:
        return AgentThought.parse_raw(raw_json)
    except ValidationError:
        # Penalize the model or retry with a stricter brevity instruction.
        return fix_verbosity(raw_json)
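A minimal usage sketch follows. fix_verbosity is assumed to be your own retry helper that re-prompts the model with a brevity reminder, and the JSON below is a hypothetical model output:

# Hypothetical raw output from the model.
raw = '{"thought": "Goal: AAPL price -> get_stock_price", "tool_call": "get_stock_price(AAPL)"}'

result = process_agent_output(raw)
print(result.thought)     # Goal: AAPL price -> get_stock_price
print(result.tool_call)   # get_stock_price(AAPL)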
4. The "Thought Partitioning" Strategy
If you really need deep reasoning for a high-stakes task (e.g., medical diagnosis), don't include that reasoning in the same thread as the "Action."
- Step 1: Use a "Thinker Agent" to write a 1,000-word analysis. (Expensive).
- Step 2: Use a "Summarizer Agent" to turn that into 5 bullet points. (Cheap).
- Step 3: Send ONLY the 5 bullet points to the "Action Agent."
Savings: All future turns of the conversation only have to "Pay" for the 5 bullet points, not the 1,000-word essay.
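Here is a sketch of the three-step hand-off. call_llm is a hypothetical helper standing in for whatever client your framework provides:

def partitioned_reasoning(case_description: str) -> str:
    # Step 1: Thinker Agent -- deep, expensive analysis (paid for once).
    analysis = call_llm(
        system="You are a careful analyst. Reason in depth about the case.",
        user=case_description,
    )

    # Step 2: Summarizer Agent -- compress the essay into 5 bullet points (cheap).
    bullets = call_llm(
        system="Summarize the analysis into exactly 5 short bullet points.",
        user=analysis,
    )

    # Step 3: Only the bullets enter the Action Agent's thread, so every
    # future turn pays for 5 bullet points, not the 1,000-word essay.
    return bullets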
5. Visualizing the "Reasoning-to-Signal" Ratio
Monitor your agents to see how much they are "Talking to themselves."
graph LR
A[Output Tokens] --> B[Reasoning: 80%]
A --> C[Action: 20%]
style B fill:#f99
style C fill:#4f4
In a production-optimized agent, the Signal (Action) should be larger than the Noise (Reasoning).
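A lightweight way to track this per turn is sketched below. The split between thought text and action text is assumed to come from your own agent's structured output, and word counts stand in for exact token counts:

def reasoning_to_signal(thought_text: str, action_text: str) -> float:
    # Crude proxy: word counts instead of tokenizer counts.
    reasoning = len(thought_text.split())
    signal = max(len(action_text.split()), 1)
    return reasoning / signal

ratio = reasoning_to_signal(
    thought_text="Goal: AAPL price -> get_stock_price",
    action_text="get_stock_price('AAPL')",
)
if ratio > 1.0:
    print(f"Reasoning-to-signal ratio is {ratio:.1f} -- tighten the thought budget.")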
6. Summary and Key Takeaways
- Prune the Monologue: Use telegraphic shorthand for internal thoughts.
- Symbols > Sentences: Replace "I will do X" with "-> X".
- Budgeted Schemas: Use max_length constraints to enforce conciseness.
- Partitioning: Don't carry raw "Deep Reasoning" into future conversation turns.
In the next lesson, Action Verification, we look at how to save tokens by "Verifying" the result before the agent moves on.
Exercise: The Telegram Challenge
- Take a 5-step agent log.
- Manually rewrite all the "Thought" blocks into < 10 words each.
- Show the new log to an LLM and ask: "Based on these thoughts, what was the agent trying to do?"
- If the LLM can still understand the intent, you have successfully compressed the context by 90%.
- Why is this important? Because you just saved 90% on the Input cost of every turn that follows.