
Multi-Agent Efficiency: The Power of Specialization
Learn how to reduce token waste by splitting large agents into many small, 'narrow' agents. Master the 'Supervisor Pattern' and 'Handoff' logic for cost-effective AI.
The most common mistake in agentic AI is building a "Swiss Army Knife" agent. You give one agent 50 tools, a 5,000-token system prompt covering every edge case, and all the available RAG context.
This agent is a Token Hog. Every turn costs thousands of tokens because of the "Instructions" alone.
In this lesson, we learn the Specialization Strategy. By splitting a "Generalist Agent" into five "Specialist Agents," you reduce the "Input Overhead" for every single turn. We will master the Supervisor Pattern and Agent Handoffs to keep our context windows thin and our reasoning sharp.
1. The Cost of Generality
- Generalist Agent: 3,000 tokens (System) + 1,000 tokens (50 Tool Descs) = 4,000 tokens per turn baseline.
- Specialist Agent (e.g. 'Searcher'): 200 tokens (System) + 100 tokens (2 Tool Descs) = 300 tokens per turn baseline.
The Math: If it takes 5 turns to solve a problem:
- Generalist: 20,000 tokens.
- Specialist: 1,500 tokens.
- Savings: 92%.
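The arithmetic above is easy to verify. The token counts below are the illustrative figures from this lesson, not measured values:

```python
# Illustrative per-turn baselines from the lesson (not measured values)
GENERALIST_BASELINE = 3000 + 1000   # system prompt + 50 tool descriptions
SPECIALIST_BASELINE = 200 + 100     # thin system prompt + 2 tool descriptions
TURNS = 5

generalist_total = GENERALIST_BASELINE * TURNS   # 20,000 tokens
specialist_total = SPECIALIST_BASELINE * TURNS   # 1,500 tokens
savings = 1 - specialist_total / generalist_total

print(f"Generalist: {generalist_total} tokens")
print(f"Specialist: {specialist_total} tokens")
print(f"Savings: {savings:.1%}")
```

Note that the savings compound: every additional turn widens the gap by another 3,700 tokens.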
2. The Supervisor Pattern
A Supervisor (or Router) doesn't do the work. It simply decides which specialist should handle the task.
- User asks a question.
- Supervisor (Thin Prompt) thinks: "This is a search task."
- Supervisor calls the Search Agent.
- Search Agent does the work and returns the result to the Supervisor.
```mermaid
graph TD
    U[User] --> S[Supervisor: 500 tokens]
    S --> A1[Searcher: 300 tokens]
    S --> A2[Coder: 300 tokens]
    S --> A3[Designer: 300 tokens]
    A1 & A2 & A3 --> S
    S --> Res[Final Answer]
    style S fill:#69f
```
3. Implementation: The Agent Handoff (LangGraph)
In LangGraph, you can explicitly define when one agent "Transfers" its session to another.
Python Code: The Handoff Logic
```python
def supervisor_node(state):
    # The supervisor only needs to know the CAPABILITIES
    # of the agents, not their full instructions.
    prompt = f"Task: route. Query: {state['query']}. Options: [SEARCH, CODE]"
    decision = call_llm(prompt)
    if decision == "SEARCH":
        return "search_agent"
    return "coding_agent"

def search_agent_node(state):
    # This node has a VERY thin system prompt
    # specifically for searching.
    return {"results": run_search(state['query'])}
```
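If you are not using LangGraph, the same routing logic can be sketched in plain Python. Here `call_llm`, `run_search`, and `run_code` are hypothetical stand-ins for your model call and specialist tools:

```python
# Framework-free sketch of the supervisor/handoff flow.
# call_llm, run_search, and run_code are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    # Stand-in router: a real system would call the model here.
    # It keys off the user-query portion of the routing prompt only.
    query = prompt.split("Query:")[1].split("Options:")[0]
    return "SEARCH" if "search" in query.lower() else "CODE"

def run_search(query: str) -> dict:
    return {"results": f"search results for: {query}"}

def run_code(task: str) -> dict:
    return {"results": f"code output for: {task}"}

SPECIALISTS = {"SEARCH": run_search, "CODE": run_code}

def handle(query: str) -> dict:
    # The supervisor sees only capability names, never the
    # specialists' full instructions or tool schemas.
    decision = call_llm(f"Task: route. Query: {query}. Options: [SEARCH, CODE]")
    return SPECIALISTS[decision](query)
```

The dictionary dispatch is the whole trick: the supervisor's prompt lists capability names, and each specialist's heavy instructions live only inside its own function.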
4. Avoiding "State Bloat" during Handoffs
When Agent A hands off to Agent B, don't pass the Reasoning History of Agent A.
- Bad: Pass the whole chat history.
- Good: Pass only the Result Object.
The "Clean Sheet" Rule: Every specialist agent should start with its own "Local" context. If it needs info from the previous agent, that info should be injected as a specific, pre-summarized variable.
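A minimal sketch of the Clean Sheet Rule (the helper and state keys here are hypothetical, not a specific framework's API):

```python
# Hypothetical handoff helper: the next agent receives only a
# pre-summarized result object, never the previous agent's history.

def make_handoff(state: dict, result_key: str) -> dict:
    """Build the minimal context for the next specialist."""
    return {
        "task": state["task"],            # what still needs doing
        result_key: state[result_key],    # the one result the next agent needs
        # NOT included: state["messages"] (Agent A's reasoning history)
    }

# Agent A finishes with a bloated state...
agent_a_state = {
    "task": "summarize findings",
    "search_results": ["doc1", "doc2"],
    "messages": ["<thought>", "<tool call>", "<observation>"] * 20,  # 60 entries
}

# ...but Agent B starts with a clean sheet.
agent_b_input = make_handoff(agent_a_state, "search_results")
print(sorted(agent_b_input))   # ['search_results', 'task']
```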
5. Token-Efficient "Tool Calling" for Specialists
Specialists only need the tools relevant to them.
- Searcher: needs `google_search` and `wiki_lookup`.
- Coder: needs `python_repl` and `github_api`.
By segregating tools, you reduce the JSON Schema overhead in your system prompt.
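The overhead saving is easy to measure. The tool registry below is hypothetical and the schemas are simplified, but the pattern of serializing only each agent's subset is the point:

```python
import json

# Hypothetical tool registry; every schema serialized into the prompt
# costs tokens on every single turn.
TOOL_SCHEMAS = {
    "google_search": {"name": "google_search", "parameters": {"query": "string"}},
    "wiki_lookup":   {"name": "wiki_lookup",   "parameters": {"title": "string"}},
    "python_repl":   {"name": "python_repl",   "parameters": {"code": "string"}},
    "github_api":    {"name": "github_api",    "parameters": {"endpoint": "string"}},
}

AGENT_TOOLS = {
    "searcher": ["google_search", "wiki_lookup"],
    "coder":    ["python_repl", "github_api"],
}

def schemas_for(agent: str) -> str:
    """Serialize only this agent's tool schemas into its prompt."""
    return json.dumps([TOOL_SCHEMAS[t] for t in AGENT_TOOLS[agent]])

full = len(json.dumps(list(TOOL_SCHEMAS.values())))
searcher_only = len(schemas_for("searcher"))
print(f"All tools: {full} chars; searcher's subset: {searcher_only} chars")
```

With real tool schemas (descriptions, parameter docs, enums), the per-agent subset is typically a small fraction of the full registry.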
6. Real-World Speed Gains
Multi-agent systems are often faster than single-agent systems. Why? Because smaller prompts (300 tokens vs 4,000 tokens) take less time to process, giving a much faster "Time to First Token" (TTFT) from the model provider.
7. Summary and Key Takeaways
- Abolish Generalists: Small agents are cheaper, faster, and more accurate.
- Supervisor Pattern: Use one thin "Brain" to orchestrate many thin "Servants."
- Differential Context: Only pass the data needed for the Next Step.
- Instruction Isolation: Keep agent logic local to the agent, not global in the state.
In the next lesson, Tool Call Optimization, we look at how to reduce the "Syntax Tax" of calling external APIs.
Exercise: The Architect's Split
- Take a requirement: "An agent that can write code, search the web, and send emails."
- Design a 3-agent system.
- Write the System Prompt for the Supervisor.
- Write the System Prompt for the "Email Agent."
- Evaluate: How many total tokens are in the "Email Agent" prompt compared to a "Full Agent" prompt that has all 3 capabilities?
- (Hint: The Email Agent doesn't need to know about Python libraries or Search APIs).