
Prompt Routing: The Traffic Controller
Master the architecture of 'Dynamic Routing'. Learn how to build a router that sends simple queries to cheap models and complex queries to experts.
How do you know when a user's question is "Simple" vs. "Complex"? If the user says "Hello," you don't need GPT-4o. If the user says "Explain the security vulnerabilities in this C++ kernel module," you definitely do.
Prompt Routing is the automated logic that inspects a query and dispatches it to the most efficient model for that specific job.
In this lesson, we master Dynamic Routing Architectures. We’ll explore Semantic Routers, Keyword Routers, and the "Cascading Model" pattern.
1. The Semantic Router (The Intelligence Filter)
A semantic router uses a tiny embedding model (Module 8) to compare the user's query against "Known Intent Patterns."
- Pattern A: "Greeting/Small Talk" -> Route to Small Local Model.
- Pattern B: "Research Request" -> Route to Search Agent.
- Pattern C: "Code Review" -> Route to GPT-4o.
graph TD
U[User Query] --> R{Semantic Router}
R -->|Chat| L1[Llama 3 8B]
R -->|Logic| L2[GPT-4o mini]
R -->|Coding| L3[Claude 3.5 Sonnet]
style R fill:#f96,stroke-width:4px
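The routing logic above can be sketched in a few lines. This is a minimal, self-contained sketch: a toy bag-of-words "embedding" stands in for a real embedding model so the example runs without API keys, and all route names and sample phrases are illustrative.

```python
# Minimal semantic-router sketch. A real router would call a small
# embedding model; the toy word-count "embedding" here is a stand-in.
from collections import Counter
import math

ROUTES = {
    "chitchat": ["hi", "hello", "how are you", "good morning"],
    "coding":   ["write a function", "fix this bug", "review my code"],
}

def embed(text):
    # Toy embedding: word counts. Swap in a real embedding model in production.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(query):
    q = embed(query)
    # Score each route by its best-matching sample phrase.
    scores = {
        name: max(cosine(q, embed(s)) for s in samples)
        for name, samples in ROUTES.items()
    }
    return max(scores, key=scores.get)

print(route("hello there, how are you"))          # chitchat
print(route("please fix this bug in my function"))  # coding
```

The key design choice is scoring each route by its *best* matching sample rather than an average, so one strong match is enough to claim the query.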
2. The Keyword "Fast-Path"
Sometimes, you don't even need embeddings.
If a user's query contains http:// or www., they want a search. If it contains def or class, they are coding.
Regex Routing is the cheapest possible form of orchestration. It uses Zero Tokens and Zero Milliseconds to make a decision.
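A regex fast-path can sit in front of the semantic router as a first check. Here is a minimal sketch; the route names and fallback behavior are illustrative.

```python
# Zero-token "fast-path": route on obvious patterns before touching any model.
import re

FAST_PATHS = [
    (re.compile(r"https?://|www\."), "search"),       # URLs -> search agent
    (re.compile(r"\b(def|class|import)\b"), "coding"),  # code keywords -> coding model
]

def fast_path(query):
    for pattern, route in FAST_PATHS:
        if pattern.search(query):
            return route
    return None  # no obvious match: fall through to the semantic router

print(fast_path("summarize https://example.com"))  # search
print(fast_path("def add(a, b): return a + b"))    # coding
print(fast_path("tell me a joke"))                 # None
```

Returning None on a miss lets the caller fall through to the (slower, token-spending) semantic layer only when regex can't decide.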
3. Implementation: The Cascading Model Pattern
A "Cascade" is a series of models where each one either answers the query or "Elevates" it to the next, more expensive tier.
Python Code: The Router Wrapper
def cascade_completion(query):
    # 1. Tier 1: try the cheapest model first.
    # Use a tiny prompt to check if the mini model can handle it.
    res = call_mini_model("Can you answer this simple fact? Query: " + query)
    if "YES" in res:
        # 2. Complete with the cheap model.
        return call_mini_model(query)
    # 3. Tier 2: escalate to the expert.
    return call_expert_model(query)
Token Efficiency: You spend 20 tokens to see if you can save 2,000 tokens. This is a 100:1 ROI gamble that pays off in 90% of user sessions.
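To see the cascade run end to end, here is the same wrapper with stub model calls filled in. The stubs are fake by design: `call_mini_model` pretends it can only answer short factual queries, which is enough to demonstrate the escalation path.

```python
# Runnable demo of the cascade with stub "models".
def call_mini_model(prompt):
    # Stub: treat short queries as simple facts the mini model can handle.
    if prompt.startswith("Can you answer"):
        query = prompt.split("Query: ", 1)[1]
        return "YES" if len(query.split()) <= 6 else "NO"
    return "[mini] answer to: " + prompt

def call_expert_model(prompt):
    return "[expert] answer to: " + prompt

def cascade_completion(query):
    res = call_mini_model("Can you answer this simple fact? Query: " + query)
    if "YES" in res:
        return call_mini_model(query)   # Tier 1: cheap model handles it
    return call_expert_model(query)     # Tier 2: escalate to the expert

print(cascade_completion("What is 2+2"))
print(cascade_completion("Explain the security vulnerabilities in this C++ kernel module"))
```

The first query stays on the cheap tier; the second escalates to the expert.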
4. Libraries for Routing: Semantic Router
There are specialized libraries like semantic-router that manage this process for you using local vector search.
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

# Define our routes (utterances are example phrases for each intent)
chitchat = Route(name="chitchat", utterances=["hi", "how are you", "hello"])
coding = Route(name="coding", utterances=["write a function", "fix this bug"])

layer = RouteLayer(encoder=OpenAIEncoder(), routes=[chitchat, coding])

# ROUTE THE QUERY
route = layer("hello there!")
print(route.name)  # "chitchat" -> now call the tiny model!
5. Token Savings: Yearly Projection
In a production app with 10k users:
- No Routing: 100M tokens on GPT-4o = $30,000.
- With Routing: 80M tokens on Flash, 20M on GPT-4o = $6,100.
- Savings: 80%.
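The projection above can be reproduced with simple arithmetic. The per-million-token prices below are the illustrative figures implied by the lesson's numbers, not quoted vendor pricing.

```python
# Reproducing the yearly projection. Prices are illustrative, back-derived
# from the figures in this lesson (not real vendor pricing).
PRICE_PER_M = {"gpt4o": 300.0, "flash": 1.25}  # $ per 1M tokens

def cost(tokens_m, model):
    return tokens_m * PRICE_PER_M[model]

no_routing = cost(100, "gpt4o")
with_routing = cost(80, "flash") + cost(20, "gpt4o")
savings = 1 - with_routing / no_routing

print(f"No routing:   ${no_routing:,.0f}")    # $30,000
print(f"With routing: ${with_routing:,.0f}")  # $6,100
print(f"Savings: {savings:.0%}")              # 80%
```

Note that the cheap tier barely registers in the total: 80M tokens on the cheap model cost about as much as 0.3M tokens on the expert.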
6. Summary and Key Takeaways
- Classify first, Reason second: Spend tokens to decide which model to use.
- Regex is Free: Use traditional code to find obvious "Fast-Paths."
- Semantic Layers: Use tiny embeddings to map ambiguous queries to specific tiers.
- Cascading Logic: Start cheap and escalate only when necessary.
In the next lesson, Evaluating Model ROI for Specific Tasks, we look at how to measure the success of these routes.
Exercise: The Router Design
- Create a list of 10 random user queries.
- Manually route them to "Cheap" or "Expert."
- Check your reasoning:
- Did you route "What is 2+2" to the Expert? (If so, you failed).
- Did you route "Write a novel about space" to the Expert? (If so, you succeeded).
- Build a simple Python function that uses if "code" in query.lower() to route to a different model.
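One possible answer to that last step looks like this; the model names are placeholders.

```python
# Naive keyword router for the exercise. Model names are illustrative.
def pick_model(query):
    if "code" in query.lower():
        return "expert-coding-model"
    return "cheap-chat-model"

print(pick_model("Review this code for bugs"))  # expert-coding-model
print(pick_model("What is 2+2"))                # cheap-chat-model
```

It will misfire on queries like "what is a dress code?", which is exactly the gap the semantic router from section 1 is meant to close.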