
Resilience in the Dark: FM Routing and Fallback Strategies
Architecture for high availability. Learn how to implement model routing to control costs and fallback strategies so your application survives outages and model failures.
The Intelligent Router
In a professional environment, you never rely on a single model without a safety net. Models can experience outages, latency spikes, or region-specific throttling. Furthermore, using a "smart but expensive" model for simple tasks is a waste of capital.
In this lesson, we will master the two most important resilience patterns for GenAI: Model Routing (choosing the right model for the job) and Fallback Strategies (what to do when things go wrong).
1. What is Model Routing?
Model routing is the logic that sits between your user and the AI. Instead of hardcoding Claude 3.5 Sonnet, you use a "Dispatcher" (usually a small Lambda function) to analyze the incoming request.
Routing Criteria:
- Complexity: If the prompt is "Hey," route to Haiku (Fast/Cheap). If it's "Write a quantum physics thesis," route to Opus (Slow/Smart).
- Cost: If the user is on a "Free Tier," route to Titan. If they are a "Premium Member," route to Claude.
- Availability: If the us-east-1 endpoint is reporting high latency, route to us-west-2.
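The three criteria above can be combined into a single dispatcher decision. The following is a minimal sketch, not a definitive implementation: the model labels, the word-count heuristic for complexity, the tier names, and the latency threshold are all assumptions chosen for illustration, and the latency map is presumed to come from your own health checks.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical labels for illustration; substitute real Bedrock model IDs.
CHEAP_MODEL = "haiku"
SMART_MODEL = "opus"
FREE_TIER_MODEL = "titan"

PRIMARY_REGION = "us-east-1"
SECONDARY_REGION = "us-west-2"


@dataclass
class RouteDecision:
    model: str
    region: str


def choose_route(prompt: str,
                 tier: str,
                 region_latency_ms: Mapping[str, float],
                 max_latency_ms: float = 2000.0,
                 complex_word_count: int = 50) -> RouteDecision:
    """Pick a model and region from cost, complexity, and availability."""
    # Cost: free-tier users always get the cheapest model family.
    if tier == "free":
        model = FREE_TIER_MODEL
    # Complexity: a crude word-count heuristic (an assumption, tune for your data).
    elif len(prompt.split()) < complex_word_count:
        model = CHEAP_MODEL
    else:
        model = SMART_MODEL

    # Availability: leave the primary region only when it looks slow.
    primary_latency = region_latency_ms.get(PRIMARY_REGION, 0.0)
    region = SECONDARY_REGION if primary_latency > max_latency_ms else PRIMARY_REGION
    return RouteDecision(model=model, region=region)


decision = choose_route("Hey", tier="premium", region_latency_ms={"us-east-1": 300.0})
print(decision)
```

Keeping the decision in a pure function like this makes the routing rules easy to unit test without calling any model.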
2. Fallback Strategies (The Safety Net)
A fallback is a "Plan B." In the AWS Certified Generative AI Developer – Professional exam, you must demonstrate how to implement these using AWS SDKs or Step Functions.
The "Cascade" Pattern:
- Try Primary: Claude 3.5 Sonnet in us-east-1.
- On Failure (Throttling/Timeout): Retry with Claude 3.5 Sonnet in us-west-2.
- On Continued Failure: Fall back to Claude 3.5 Haiku (Lower intelligence but higher availability).
- Final Fallback: Return a polite "The AI is currently busy" message instead of a raw JSON error.
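The cascade above can be sketched as an ordered list of (model, region) steps that are tried in turn. This is an illustrative sketch only: `TransientModelError`, the `invoke` callable, and the model labels are hypothetical names introduced here, and the region for the Haiku step is an assumption because the pattern above does not specify one. In a real application, `invoke` would wrap your Bedrock call and raise on throttling or timeouts.

```python
from typing import Callable, Sequence, Tuple

FALLBACK_MESSAGE = "The AI is currently busy. Please try again shortly."

# Hypothetical labels; the region of the last step is an assumption.
CASCADE = [
    ("claude-3-5-sonnet", "us-east-1"),
    ("claude-3-5-sonnet", "us-west-2"),
    ("claude-3-5-haiku", "us-east-1"),
]


class TransientModelError(Exception):
    """Hypothetical error the invoke callable raises for throttling or timeouts."""


def invoke_with_cascade(prompt: str,
                        steps: Sequence[Tuple[str, str]],
                        invoke: Callable[[str, str, str], str]) -> str:
    """Try each (model, region) step in order; return a polite message if all fail."""
    for model_id, region in steps:
        try:
            return invoke(model_id, region, prompt)
        except TransientModelError as exc:
            # Log and move on to the next step in the cascade.
            print(f"{model_id} in {region} failed: {exc}")
    # Final fallback: a friendly message instead of a raw error.
    return FALLBACK_MESSAGE


def demo_invoke(model_id: str, region: str, prompt: str) -> str:
    # Simulated backend: Sonnet is throttled everywhere, Haiku answers.
    if model_id == "claude-3-5-sonnet":
        raise TransientModelError("simulated throttling")
    return f"answer from {model_id}"


print(invoke_with_cascade("Hello", CASCADE, demo_invoke))
```

Passing the invoke function in as a parameter keeps the cascade logic independent of any SDK, so the same ordering could also be expressed as Step Functions states.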
3. Cross-Region Failover Architecture
graph TD
U[App Client] --> R[Intelligent Router: Lambda]
R -->|Primary request| B1[Amazon Bedrock: US-East-1]
R -->|On Error 429/500| B2[Amazon Bedrock: US-West-2]
subgraph Primary_Region
B1
end
subgraph Secondary_Region
B2
end
4. Implementing Blue/Green Model Deployments
When a new model version becomes available (e.g., v4 replaces v3.5), you shouldn't switch all traffic over at once. Instead, use a Blue/Green Deployment:
- Blue: Your stable, proven model (v3.5).
- Green: The new, potentially better model (v4).
- Action: Use weighted routing (e.g., 5% to Green, 95% to Blue) and monitor the "Error Rate" and "Response Accuracy" before fully migrating.
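A weighted split like the one described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the 5% default, and the error-rate tolerance are hypothetical, and "Response Accuracy" would need its own evaluation process that is not shown here.

```python
import random
from typing import Optional


def pick_deployment(green_weight: float = 0.05,
                    rng: Optional[random.Random] = None) -> str:
    """Return 'green' for roughly green_weight of requests, otherwise 'blue'."""
    if not 0.0 <= green_weight <= 1.0:
        raise ValueError("green_weight must be between 0 and 1")
    source = rng or random
    return "green" if source.random() < green_weight else "blue"


def green_is_healthy(green_errors: int, green_requests: int,
                     blue_errors: int, blue_requests: int,
                     tolerance: float = 0.01) -> bool:
    """Compare error rates; green passes if it is within tolerance of blue."""
    if green_requests == 0:
        return False  # no evidence yet, so do not promote
    green_rate = green_errors / green_requests
    blue_rate = blue_errors / blue_requests if blue_requests else 0.0
    return green_rate <= blue_rate + tolerance


# Simulate 10,000 requests with a seeded generator for repeatability.
rng = random.Random(42)
picks = [pick_deployment(0.05, rng) for _ in range(10_000)]
print("green share:", picks.count("green") / len(picks))
```

Once the Green error rate stays within tolerance over a meaningful sample, the weight can be raised in steps until Green takes all traffic.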
5. Code Example: A Python Model Router
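import json

import boto3
from botocore.exceptions import ClientError

HAIKU_ID = 'anthropic.claude-3-haiku-20240307-v1:0'
SONNET_ID = 'anthropic.claude-3-sonnet-20240229-v1:0'


def intelligent_invoke(prompt):
    bedrock = boto3.client('bedrock-runtime')

    # Route on prompt length: short prompts go to the cheaper model
    if len(prompt) < 100:
        model_id = HAIKU_ID
    else:
        model_id = SONNET_ID

    # Assumed request body: the Anthropic Messages format for Claude on Bedrock
    # (max_tokens is an illustrative value)
    body = json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 512,
        'messages': [{'role': 'user', 'content': prompt}],
    })

    try:
        # Try the primary model
        return bedrock.invoke_model(modelId=model_id, body=body)
    except ClientError:
        # Fall back to Haiku if Sonnet fails
        if model_id != HAIKU_ID:
            print("Sonnet failed. Falling back to Haiku.")
            return bedrock.invoke_model(modelId=HAIKU_ID, body=body)
        # Haiku was already the primary, so there is nothing left to try
        raise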
6. Token Throttling vs. Model Outage
Distinguishing between these two is vital for the exam.
- Throttling (429): You are sending requests faster than your quota allows. The solution is Exponential Backoff.
- Service Outage (503): The service itself is having trouble. The solution is Region Failover.
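The distinction above translates into two different code paths. The sketch below is one way to express it, assuming a hypothetical `ServiceError` exception that carries the HTTP status code and a hypothetical `invoke(region)` callable; the retry count and base delay are illustrative values. It adds random jitter to the backoff, a common refinement that is an addition here and not something the lesson prescribes.

```python
import random
import time
from typing import Callable, Sequence


class ServiceError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed call."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_resilience(invoke: Callable[[str], str],
                         regions: Sequence[str],
                         max_retries: int = 3,
                         base_delay: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Back off and retry on 429; fail over to the next region on 503."""
    for region in regions:
        for attempt in range(max_retries + 1):
            try:
                return invoke(region)
            except ServiceError as err:
                if err.status_code == 429 and attempt < max_retries:
                    # Throttled: wait up to base_delay * 2^attempt, then retry here.
                    sleep(random.uniform(0, base_delay * 2 ** attempt))
                    continue
                if err.status_code in (429, 503):
                    # Outage, or throttling that outlasted the retries: next region.
                    break
                raise  # any other error is not handled by this sketch
    raise RuntimeError("All regions exhausted")


def demo_invoke(region: str) -> str:
    # Simulated outage in the primary region.
    if region == "us-east-1":
        raise ServiceError(503)
    return f"ok from {region}"


print(call_with_resilience(demo_invoke, ["us-east-1", "us-west-2"]))
```

Injecting `sleep` as a parameter lets tests run instantly while production code keeps the real delay.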
Knowledge Check: Test Your Routing Knowledge
A developer is building a high-availability customer service agent. They want to ensure that if the Amazon Bedrock service in the US-East-1 region experiences high latency, the application can still function. Which architectural change provides the best resilience?
Summary
Resilience is not an accident; it is an engineering choice. By implementing Routing for cost and Fallbacks for uptime, you build an enterprise-worthy AI system. This concludes Module 7. In the final module of Domain 2, we dive into the most exciting pattern of all: Agentic AI Patterns.
Next Module: The Brain at Work: Agent Design and Activity Planning