
The Advanced Supervisor: Cost-Aware Routing
Learn how to build a 'Financial Router' for multi-agent systems. Master the art of choosing models based on task complexity and remaining token budget.
In simpler systems, a "Supervisor" just routes tasks. In a production-grade fleet, a Supervisor is a Budget Manager.
The advanced supervisor doesn't just ask "Who should do this?"; it asks:
- "How complex is this task?"
- "Can a cheap model (GPT-4o mini) do it, or do I need the 'Expert' (Claude 3.5 Sonnet)?"
- "How many tokens are left in this session's budget?"
In this lesson, we learn Cost-Aware Routing. We’ll build a supervisor that optimizes for both Intelligence and Economy.
1. The "Intelligence Tiering" Strategy
Not every task requires your most expensive model.
- Tier 1 (Routine): Data formatting, simple extraction, greetings.
- Tier 2 (Analytical): RAG synthesis, summarization, simple logic.
- Tier 3 (Expert): Coding, complex legal reasoning, creative problem-solving.
The Supervisor's Job: Assign the Lowest Possible Tier that can successfully complete the task.
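One way to encode "assign the lowest possible tier" is a lookup table that maps each task category to the cheapest tier able to handle it. The categories below come from the tier descriptions above. This is a minimal sketch: the category keys, the `Tier` enum, and the rule that unknown categories default to Tier 3 are illustrative assumptions, not part of any framework. A task that mixes categories is only as cheap as its hardest part, so the sketch takes the maximum tier.

```python
from enum import IntEnum


class Tier(IntEnum):
    ROUTINE = 1      # Tier 1: cheapest models
    ANALYTICAL = 2   # Tier 2: mid-priced models
    EXPERT = 3       # Tier 3: most expensive models


# Hypothetical category names, taken from the tier descriptions above.
TASK_TIERS = {
    "formatting": Tier.ROUTINE,
    "extraction": Tier.ROUTINE,
    "greeting": Tier.ROUTINE,
    "rag_synthesis": Tier.ANALYTICAL,
    "summarization": Tier.ANALYTICAL,
    "simple_logic": Tier.ANALYTICAL,
    "coding": Tier.EXPERT,
    "legal_reasoning": Tier.EXPERT,
    "creative_problem_solving": Tier.EXPERT,
}


def assign_tier(categories: list[str]) -> Tier:
    """Return the lowest tier that covers every category the task involves.

    Unknown categories default to EXPERT (an assumption: fail safe rather
    than under-provision). An empty category list defaults to ROUTINE.
    """
    return max(
        (TASK_TIERS.get(c, Tier.EXPERT) for c in categories),
        default=Tier.ROUTINE,
    )


print(assign_tier(["extraction"]).name)               # ROUTINE
print(assign_tier(["summarization", "coding"]).name)  # EXPERT
```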
2. Decision Logic: The "Cheap-First" Pattern
A token-efficient supervisor follows the "Try Cheap, Escalate if Confused" pattern.
- Step 1: Send the task to a Tier 1 Agent (Cheap).
- Step 2: The Tier 1 agent includes a "Confidence Score" in its output.
- Step 3: If Score < 0.8, the Supervisor Escalates to a Tier 3 Agent.
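The three steps above can be sketched as a small escalation function. This sketch assumes the agent call is injected as a callable that already parses the agent's self-reported confidence into a number. The `AgentResult` type and the `call_agent` signature are hypothetical, and the fake agent exists only for demonstration. The 0.8 threshold comes from the text. Self-reported scores are not always well calibrated, so the threshold is worth tuning on your own traffic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResult:
    answer: str
    confidence: float  # self-reported by the agent, 0.0 to 1.0
    model: str


# Hypothetical signature: (model name, task) -> AgentResult.
AgentFn = Callable[[str, str], AgentResult]

CHEAP_MODEL = "gpt-4o-mini"
EXPERT_MODEL = "claude-3-5-sonnet"
CONFIDENCE_THRESHOLD = 0.8


def cheap_first(task: str, call_agent: AgentFn) -> AgentResult:
    """Try the Tier 1 model first; escalate to Tier 3 if confidence is low."""
    result = call_agent(CHEAP_MODEL, task)
    if result.confidence < CONFIDENCE_THRESHOLD:
        # Only pay for the expert when the cheap attempt is unsure.
        result = call_agent(EXPERT_MODEL, task)
    return result


# Fake agent for demonstration: the cheap model is unsure about anything mentioning "code".
def fake_agent(model: str, task: str) -> AgentResult:
    if model == CHEAP_MODEL and "code" in task:
        return AgentResult("not sure", 0.4, model)
    return AgentResult(f"done: {task}", 0.95, model)


print(cheap_first("format this date", fake_agent).model)         # gpt-4o-mini
print(cheap_first("write code for a parser", fake_agent).model)  # claude-3-5-sonnet
```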
```mermaid
graph TD
    U[User Query] --> S[Supervisor]
    S -->|Simple| L1[Cheap Agent $0.001]
    S -->|Complex| L2[Expert Agent $0.10]
    L1 -->|Confidence Low| S
    S -->|Escalate| L2
```
3. Implementation: The Cost-Aware Router (Python)
Python Code: Model-Toggling Supervisor
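```python
from fastapi import FastAPI

app = FastAPI()

# We maintain a mapping of Tiers to Models
TIERS = {
    "LOW": "gpt-4o-mini",
    "MEDIUM": "gpt-4o",
    "HIGH": "claude-3-5-sonnet",
}

def evaluate_task_complexity(task_input: str) -> str:
    # Placeholder heuristic for illustration only. In practice this is often a
    # 1-turn prompt to a cheap model that returns 'LOW', 'MEDIUM', or 'HIGH'.
    words = len(task_input.split())
    if words < 50:
        return "LOW"
    return "MEDIUM" if words < 500 else "HIGH"

def supervisor_router(task_input: str) -> str:
    complexity = evaluate_task_complexity(task_input)
    return TIERS[complexity]

@app.post("/agent-task")
async def handle_task(data: dict):
    model = supervisor_router(data["input"])
    # Call the fleet member with the SELECTED model (call not shown)
    return {"model": model}
```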
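The complexity check in the router is described as often being a one-turn prompt to a cheap model. Below is a minimal sketch of that classifier. It assumes a hypothetical `complete` callable that sends a prompt to your cheap model and returns the reply text. Because of that extra argument, its signature differs from the router's version. The prompt wording is also an assumption. Any reply that cannot be parsed falls back to "HIGH", so a confused classifier never under-provisions a hard task.

```python
from typing import Callable

VALID_LABELS = ("LOW", "MEDIUM", "HIGH")

# Hypothetical prompt; the tier examples mirror the tiering section.
CLASSIFIER_PROMPT = (
    "Classify the complexity of the task below. Answer with exactly one word: "
    "LOW (formatting, extraction, greetings), MEDIUM (summarization, RAG synthesis, "
    "simple logic), or HIGH (coding, complex reasoning, creative problem-solving).\n\n"
    "Task: {task}"
)


def evaluate_task_complexity(task: str, complete: Callable[[str], str]) -> str:
    """One-turn complexity classification using a cheap model via `complete`."""
    reply = complete(CLASSIFIER_PROMPT.format(task=task)).strip().upper()
    for label in VALID_LABELS:
        if reply.startswith(label):
            return label
    # Unparseable reply: fail safe by routing to the expert tier.
    return "HIGH"


print(evaluate_task_complexity("Greet the user", lambda prompt: "low"))                  # LOW
print(evaluate_task_complexity("Refactor this module", lambda prompt: "Hard to say..."))  # HIGH
```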
4. Budget-Aware Handoffs
The supervisor should track the Cumulative Cost. If the session is nearing its limit, the supervisor should force all specialists into "Aggressive Conciseness" mode.
The "Low-Power" Instruction:
"Alert: Session budget > 80%. All agent responses must be < 20 tokens. Use high-density shorthand only."
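A budget throttle needs two parts: a running total of spend, and a check that appends the "Low-Power" instruction to every specialist's prompt once the threshold is crossed. The sketch below makes a few assumptions. `SessionBudget` is a hypothetical class, and costs are recorded in whatever unit you budget in (dollars or tokens). The low-power text is appended to the system prompt rather than sent as a separate message. The 80% threshold and the instruction text come from above.

```python
from dataclasses import dataclass

LOW_POWER_INSTRUCTION = (
    "Alert: Session budget > 80%. All agent responses must be < 20 tokens. "
    "Use high-density shorthand only."
)


@dataclass
class SessionBudget:
    limit: float                       # total budget (dollars or tokens)
    spent: float = 0.0                 # cumulative cost so far
    low_power_threshold: float = 0.8   # fraction of the budget

    def record(self, cost: float) -> None:
        """Add the cost of one model call to the cumulative total."""
        self.spent += cost

    @property
    def fraction_used(self) -> float:
        return self.spent / self.limit

    def system_prompt(self, base_prompt: str) -> str:
        """Append the low-power instruction once the threshold is crossed."""
        if self.fraction_used > self.low_power_threshold:
            return f"{base_prompt}\n\n{LOW_POWER_INSTRUCTION}"
        return base_prompt


budget = SessionBudget(limit=1.00)
budget.record(0.50)
print(LOW_POWER_INSTRUCTION in budget.system_prompt("You are a researcher."))  # False
budget.record(0.35)
print(LOW_POWER_INSTRUCTION in budget.system_prompt("You are a researcher."))  # True
```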
5. Token ROI: The Tiered Savings
In a system that processes 1,000 tasks:
- Baseline (All Tier 3): $100.00.
- Cost-Aware (Hybrid): $12.00.
- Savings: 88%.
By moving the "Routine" tasks to cheaper models, you make your AI system mathematically sustainable at scale.
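The arithmetic behind these figures can be checked with the per-task prices from the routing diagram: $0.001 for the cheap agent and $0.10 for the expert. The lesson does not say how the $12.00 hybrid figure breaks down. Here it is reproduced under one hypothetical assumption: every task tries the cheap model first, and 11% of tasks are escalated. Other mixes of direct routing and escalation can reach the same total.

```python
# Per-task prices from the routing diagram; the escalation rate below is a
# hypothetical assumption chosen to reproduce the $12.00 figure.
CHEAP_PRICE = 0.001   # $ per task, Tier 1
EXPERT_PRICE = 0.10   # $ per task, Tier 3


def baseline_cost(n_tasks: int) -> float:
    """Every task goes straight to the expert model."""
    return n_tasks * EXPERT_PRICE


def cheap_first_cost(n_tasks: int, escalation_rate: float) -> float:
    """Every task tries the cheap model; a fraction is escalated to the expert."""
    escalated = round(n_tasks * escalation_rate)
    return n_tasks * CHEAP_PRICE + escalated * EXPERT_PRICE


n = 1_000
base = baseline_cost(n)              # 1,000 x $0.10 = $100.00
hybrid = cheap_first_cost(n, 0.11)   # $1.00 + 110 x $0.10 = $12.00
savings = 1 - hybrid / base
print(f"baseline=${base:.2f} hybrid=${hybrid:.2f} savings={savings:.0%}")
# baseline=$100.00 hybrid=$12.00 savings=88%
```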
6. Summary and Key Takeaways
- Model Matching: Use your best models only when strictly necessary.
- Confidence-Based Escalation: Use cheap models as the "Front Line" and escalate on failure.
- Budget Throttle: Change agent behavior based on remaining token limits.
- Cheap Oversight: The supervisor itself should always be a high-speed, cheap model.
In the next lesson, Communication Protocols for Efficiency, we look at how agents should "talk" to each other to save tokens.
Exercise: The Intelligence Sorter
- Take 5 tasks:
  - A: Find the date of the next meeting.
  - B: Write a full Python library for data processing.
  - C: Summarize a 1-page email.
  - D: Deduplicate a list of 100 names.
  - E: Explain the meaning of life.
- Assign a Tier (Tier 1 Routine, Tier 2 Analytical, or Tier 3 Expert) to each.
- Calculate the cost if you used GPT-4o for all vs. a Tiered approach.
- Most students find the "Tiered" approach is 5x cheaper on average.