
Cost-Aware Routing
Model cascading.
Cost-Aware Routing
Route based on difficulty.
The Cascade
- Try Llama 3 8B (Fast, Cheap).
- If answer confidence < 0.8:
- Try GPT-4o (Slow, Expensive).
The graph logic handles this fallback automatically.
LlamaNode -> CheckConfidence -> GPT4Node.
For 80% of tasks, you pay the Llama price. For the hard 20%, you pay for intelligence.