Capstone: Building the Multi-Model Router

The first layer of our "Budget-First" agent is the Router. We cannot afford to use Claude 3.5 Sonnet to "Format a JSON list." We need a central logic gate that dispatches tasks to the most efficient model.

In this project step, we will implement a Python-based Traffic Controller that manages our project's "Intelligence Tiering."

1. Defining the Tiers

For our project, we will use three distinct tiers:

Tier 1 (The Worker): gpt-4o-mini or gemini-1.5-flash.
- Jobs: Summarizing search results, identifying keywords, JSON formatting.
Tier 3 (The Brain): claude-3-5-sonnet or gpt-4o.
- Jobs: Final synthesis, complex logic, citing conflicting sources.

2. Implementation: The Router Code

We will build a Dispatcher class that handles the LLM calls and attributes costs (Module 17.4).

Python Code: The Capstone Router

class BudgetRouter:
    def __init__(self):
        self.stats = {"tier_1_calls": 0, "tier_3_calls": 0, "cost": 0.0}

    def dispatch(self, tier, prompt, system_prompt=""):
        if tier == 1:
            model = "gpt-4o-mini"
            price = 0.00015 / 1000 # Estimate
            self.stats["tier_1_calls"] += 1
        else:
            model = "gpt-4o"
            price = 0.015 / 1000
            self.stats["tier_3_calls"] += 1
            
        # Execute (Pseudo-code for the implementation)
        response = call_llm(model, system_prompt, prompt)
        
        # Track Usage
        self.stats["cost"] += response.usage.total_tokens * price
        return response.content

# Usage inside the Agent
router = BudgetRouter()
result = router.dispatch(1, "Fix this JSON: {name: 'test'}")

3. The "Budget Guardrail"

In your router, add a Fail-Safe. If the project's cumulative cost exceeds $0.08, the router should Force-Downgrade all subsequent calls to Tier 1.

Logic: It is better to have a "Slightly less smart" final report than to exceed our $0.10 project target.

if self.stats["cost"] > 0.08:
    model = "gpt-4o-mini" #強制廉價模式

4. Next Steps

Now that we have our router, we need a way for the agent to Remember things without bloating the context window of every turn.

In the next lesson, we will implement Persistent Semantic Memory using a local vector store.

Exercise: The Routing Test

Create 5 test cases for your agent.
Assign Tiers:
- Case A: "Translate this sentence." (Tier 1).
- Case B: "Analyze these 10 conflicting research papers." (Tier 3).
Run the Router: Verify that your stats correctly track the cost.
Challenge: Try to complete Case B by move it into 10 smaller "Tier 1" summaries followed by 1 "Tier 3" synthesis.
Compare the cost vs. move the whole 10 papers into Tier 3 at once.

Capstone: Building the Multi-Model Router

Capstone: Building the Multi-Model Router

1. Defining the Tiers

2. Implementation: The Router Code

Python Code: The Capstone Router

3. The "Budget Guardrail"

4. Next Steps

Exercise: The Routing Test

One step closer to the $0.10 researcher!

Subscribe to our newsletter