
Capstone: Building the Multi-Model Router
Implement the 'Intelligence Controller' for your capstone project. Learn how to toggle between models based on task depth and remaining budget.
Capstone: Building the Multi-Model Router
The first layer of our "Budget-First" agent is the Router. We cannot afford to use Claude 3.5 Sonnet to "Format a JSON list." We need a central logic gate that dispatches tasks to the most efficient model.
In this project step, we will implement a Python-based Traffic Controller that manages our project's "Intelligence Tiering."
1. Defining the Tiers
For our project, we will use three distinct tiers:
- Tier 1 (The Worker):
gpt-4o-miniorgemini-1.5-flash.- Jobs: Summarizing search results, identifying keywords, JSON formatting.
- Tier 3 (The Brain):
claude-3-5-sonnetorgpt-4o.- Jobs: Final synthesis, complex logic, citing conflicting sources.
2. Implementation: The Router Code
We will build a Dispatcher class that handles the LLM calls and attributes costs (Module 17.4).
Python Code: The Capstone Router
class BudgetRouter:
def __init__(self):
self.stats = {"tier_1_calls": 0, "tier_3_calls": 0, "cost": 0.0}
def dispatch(self, tier, prompt, system_prompt=""):
if tier == 1:
model = "gpt-4o-mini"
price = 0.00015 / 1000 # Estimate
self.stats["tier_1_calls"] += 1
else:
model = "gpt-4o"
price = 0.015 / 1000
self.stats["tier_3_calls"] += 1
# Execute (Pseudo-code for the implementation)
response = call_llm(model, system_prompt, prompt)
# Track Usage
self.stats["cost"] += response.usage.total_tokens * price
return response.content
# Usage inside the Agent
router = BudgetRouter()
result = router.dispatch(1, "Fix this JSON: {name: 'test'}")
3. The "Budget Guardrail"
In your router, add a Fail-Safe. If the project's cumulative cost exceeds $0.08, the router should Force-Downgrade all subsequent calls to Tier 1.
Logic: It is better to have a "Slightly less smart" final report than to exceed our $0.10 project target.
if self.stats["cost"] > 0.08:
model = "gpt-4o-mini" #強制廉價模式
4. Next Steps
Now that we have our router, we need a way for the agent to Remember things without bloating the context window of every turn.
In the next lesson, we will implement Persistent Semantic Memory using a local vector store.
Exercise: The Routing Test
- Create 5 test cases for your agent.
- Assign Tiers:
- Case A: "Translate this sentence." (Tier 1).
- Case B: "Analyze these 10 conflicting research papers." (Tier 3).
- Run the Router: Verify that your stats correctly track the cost.
- Challenge: Try to complete Case B by move it into 10 smaller "Tier 1" summaries followed by 1 "Tier 3" synthesis.
- Compare the cost vs. move the whole 10 papers into Tier 3 at once.