Module 15 Lesson 2: Cost Monitoring
Protecting the Wallet. How to track token usage and set up alerts to prevent unexpected AWS bills from your GenAI apps.
Cost Tracking: No Surprises
Generative AI is one of the few cloud services where a single user can accidentally spend $100 in a minute, for example by triggering a logic loop that keeps calling the model. Cost monitoring is not a luxury; it is a survival requirement.
1. Token Metrics in CloudWatch
Bedrock automatically sends InputTokenCount and OutputTokenCount metrics to CloudWatch, under the AWS/Bedrock namespace.
- You can build a dashboard that shows your total token usage per hour, which is the raw input for any spend estimate.
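The same hourly numbers a dashboard would show can also be pulled programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the function names `build_token_queries` and `fetch_hourly_tokens` and the model ID you pass in are illustrative, not part of any AWS API.

```python
from datetime import datetime, timedelta, timezone


def build_token_queries(model_id: str, period_seconds: int = 3600) -> list:
    """Build CloudWatch MetricDataQueries for input and output token sums.

    The queries are scoped to a single model through the ModelId dimension.
    """
    queries = []
    for query_id, metric_name in (
        ("input_tokens", "InputTokenCount"),
        ("output_tokens", "OutputTokenCount"),
    ):
        queries.append({
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Bedrock",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "ModelId", "Value": model_id}],
                },
                "Period": period_seconds,  # 3600 seconds = one data point per hour
                "Stat": "Sum",
            },
            "ReturnData": True,
        })
    return queries


def fetch_hourly_tokens(model_id: str, hours: int = 24) -> dict:
    """Return {query_id: [(timestamp, token_sum), ...]} for the last `hours` hours."""
    import boto3  # imported lazily so the query builder works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_token_queries(model_id),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return {
        result["Id"]: list(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```

Keeping the query builder separate from the AWS call makes it easy to inspect or unit-test the request before anything is sent to CloudWatch.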
2. Setting up Alarms
- Go to CloudWatch Alarms.
- Create an alarm for InputTokenCount.
- Set a threshold (e.g., more than 1 million tokens in 1 hour).
- Action: send an SNS notification (email/SMS), or trigger a Lambda function that cuts off access to the Bedrock API to stop the bleeding.
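The console steps above can also be scripted. This is a rough sketch, assuming boto3 and an existing SNS topic; the helper names, the alarm name, and the ARN and model ID you pass in are hypothetical placeholders. It scopes the alarm to one model via the ModelId dimension, which is a choice made here rather than something the steps above require.

```python
def build_token_alarm(alarm_name: str, sns_topic_arn: str, model_id: str,
                      threshold_tokens: int = 1_000_000) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the hourly sum of InputTokenCount for `model_id`
    exceeds `threshold_tokens`.
    """
    return {
        "AlarmName": alarm_name,
        "AlarmDescription": "Hourly Bedrock input tokens above threshold",
        "Namespace": "AWS/Bedrock",
        "MetricName": "InputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",
        "Period": 3600,            # evaluate over one-hour windows
        "EvaluationPeriods": 1,    # a single breaching hour triggers the alarm
        "Threshold": float(threshold_tokens),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an incident
        "AlarmActions": [sns_topic_arn],     # the SNS topic fans out to email/SMS
    }


def create_token_alarm(alarm_name: str, sns_topic_arn: str, model_id: str) -> None:
    """Create or update the alarm in the current AWS account and region."""
    import boto3  # imported lazily so the builder can be tested offline

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_token_alarm(alarm_name, sns_topic_arn, model_id)
    )
```

A one-hour period matches the example threshold of 1 million tokens per hour; a shorter period with a proportionally lower threshold would react faster to a runaway loop, at the cost of more false alarms during legitimate bursts.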
3. Visualizing the Burn Rate
graph LR
Day1[1,000 Tokens] --> Day2[1,500 Tokens]
Day2 --> Day3[Infinite Loop Attack!]
Day3 --> Spike[1,000,000 Tokens]
Spike --> Alarm[ALARM TRIGGERED]
Alarm --> Cutoff[API Disabled]
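The final "API Disabled" step in the diagram has no single built-in switch. One possible way to implement it, sketched below, is a Lambda function that attaches an explicit Deny policy to the IAM role your application uses to call Bedrock. This is an assumption about how the cutoff could work, not the only approach: the `APP_ROLE_NAME` environment variable, the policy name, and the handler wiring are all hypothetical, and the Lambda's own execution role would need `iam:PutRolePolicy` on the target role.

```python
import json
import os

POLICY_NAME = "EmergencyBedrockCutoff"  # hypothetical name for the inline policy


def build_deny_policy() -> str:
    """Return an IAM policy document (JSON string) that denies model invocation."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": POLICY_NAME,
            "Effect": "Deny",  # an explicit Deny overrides any Allow on the role
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "*",
        }],
    })


def lambda_handler(event, context):
    """Triggered by the alarm (for example via SNS); blocks the app role."""
    import boto3  # available by default in the AWS Lambda Python runtime

    role_name = os.environ["APP_ROLE_NAME"]  # assumed to be set on the function
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=POLICY_NAME,
        PolicyDocument=build_deny_policy(),
    )
    return {"status": "blocked", "role": role_name}
```

Restoring access is then a matter of deleting the inline policy once the cause of the spike is understood. Because the cutoff takes the application offline, it is worth pairing with a notification so that someone knows it happened.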
4. Per-Model Tracking
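Not all tokens are equal. A Claude Opus token can cost 10x or more what a Haiku token costs, depending on the model versions and current pricing. Use Cost Allocation Tags to see which model is eating your budget; the CloudWatch token metrics also carry a ModelId dimension, so usage can be broken down per model.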
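To make the price gap concrete, token counts can be converted into an estimated spend per model. The sketch below uses a made-up price table: the model names and per-1,000-token prices are placeholders chosen for illustration, not real Bedrock pricing, so substitute the current figures from the AWS pricing page before relying on the output.

```python
# Hypothetical USD prices per 1,000 tokens. Placeholders only, NOT real pricing.
PRICES_PER_1K = {
    "example-large-model": {"input": 0.015, "output": 0.075},
    "example-small-model": {"input": 0.001, "output": 0.005},
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int,
                  prices: dict = PRICES_PER_1K) -> float:
    """Estimate spend in USD for one model from its token counts."""
    price = prices[model_id]
    return (input_tokens / 1000) * price["input"] \
        + (output_tokens / 1000) * price["output"]


def cost_by_model(usage: dict, prices: dict = PRICES_PER_1K) -> dict:
    """Map {model_id: (input_tokens, output_tokens)} to {model_id: USD estimate}."""
    return {
        model_id: estimate_cost(model_id, tokens_in, tokens_out, prices)
        for model_id, (tokens_in, tokens_out) in usage.items()
    }


if __name__ == "__main__":
    # Identical traffic on both models, priced with the placeholder table above.
    usage = {
        "example-large-model": (1_000_000, 200_000),
        "example-small-model": (1_000_000, 200_000),
    }
    for model_id, usd in cost_by_model(usage).items():
        print(f"{model_id}: ${usd:.2f}")
```

With these placeholder prices, the same traffic costs roughly $30 on the larger model and roughly $2 on the smaller one, which is why a per-model breakdown matters more than a single aggregate token count.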
Summary
- Token Metrics are sent to CloudWatch automatically.
- Alarms are your primary defense against infinite loops and budget overruns.
- Notifications ensure you are alerted before the bill arrives.
- Budgets should be set per-environment (Development vs Production).