Cost Tracking: No Surprises

Generative AI is one of the few cloud services where a single user can accidentally spend on the order of $100 in a minute, for example by triggering an infinite logic loop that keeps calling the model. Cost monitoring is not a luxury; it's a survival requirement.

1. Token Metrics in CloudWatch

Bedrock automatically publishes InputTokenCount and OutputTokenCount metrics to CloudWatch under the AWS/Bedrock namespace, broken down by ModelId.

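You can build a dashboard that shows your total token usage per hour. These metrics count tokens, not dollars, so multiply by your per-token price to turn usage into spend.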
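The hourly token totals behind such a dashboard can also be pulled programmatically. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names build_token_queries and fetch_hourly_tokens are hypothetical, and the model ID you pass in is whatever your application actually invokes.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/Bedrock"
TOKEN_METRICS = ("InputTokenCount", "OutputTokenCount")


def build_token_queries(model_id, period_seconds=3600):
    """Build CloudWatch GetMetricData queries that sum tokens per period."""
    queries = []
    for name in TOKEN_METRICS:
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": name.lower(),
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": name,
                    "Dimensions": [{"Name": "ModelId", "Value": model_id}],
                },
                "Period": period_seconds,  # 3600 s = one data point per hour
                "Stat": "Sum",
            },
            "ReturnData": True,
        })
    return queries


def fetch_hourly_tokens(model_id, hours=24):
    """Return {query_id: [(timestamp, token_sum), ...]} for the last N hours."""
    import boto3  # imported lazily so the query builder works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_token_queries(model_id),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return {
        result["Id"]: list(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```

Calling fetch_hourly_tokens with a model ID such as "anthropic.claude-3-haiku-20240307-v1:0" (used here only as an example) returns one series for input tokens and one for output tokens.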

2. Setting up Alarms

Go to CloudWatch Alarms.
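Create an alarm for InputTokenCount (and consider a second one for OutputTokenCount, since output tokens are usually priced higher).
Set a threshold on the hourly Sum (e.g., more than 1 million tokens in 1 hour).
Action: SNS notification (email/SMS), or a Lambda function that cuts off access to Bedrock (for example, by attaching a deny policy to the calling role) to stop the bleeding.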
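The same alarm can be created from code instead of the console. This is a minimal sketch, assuming boto3 and an existing SNS topic; build_token_alarm and create_token_alarm are hypothetical helper names, and the alarm name, model ID, and topic ARN are values you supply.

```python
def build_token_alarm(alarm_name, model_id, sns_topic_arn, threshold=1_000_000):
    """Build PutMetricAlarm parameters: fire when hourly input tokens exceed the threshold."""
    return {
        "AlarmName": alarm_name,
        "AlarmDescription": "Hourly Bedrock input tokens above threshold",
        "Namespace": "AWS/Bedrock",
        "MetricName": "InputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",
        "Period": 3600,                 # evaluate one-hour windows
        "EvaluationPeriods": 1,         # a single breaching hour triggers the alarm
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an incident
        "AlarmActions": [sns_topic_arn],     # notify via SNS when in ALARM
    }


def create_token_alarm(alarm_name, model_id, sns_topic_arn, threshold=1_000_000):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported lazily so the parameter builder works without AWS access

    params = build_token_alarm(alarm_name, model_id, sns_topic_arn, threshold)
    boto3.client("cloudwatch").put_metric_alarm(**params)
    return params
```

A one-hour period matches the example threshold above; a shorter period (for instance 300 seconds with a proportionally lower threshold) would catch a runaway loop sooner at the cost of more false alarms.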

3. Visualizing the Burn Rate

graph LR
    Day1[1,000 Tokens] --> Day2[1,500 Tokens]
    Day2 --> Day3[Infinite Loop Attack!]
    Day3 --> Spike[1,000,000 Tokens]
    Spike --> Alarm[ALARM TRIGGERED]
    Alarm --> Cutoff[API Disabled]
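The final cutoff step in the diagram has no single built-in switch, so it has to be implemented somehow. One possible approach, sketched below, is a Lambda function that attaches an inline deny policy to the IAM role your application uses to call Bedrock. The policy name and the APP_ROLE_NAME environment variable are hypothetical, and the Lambda's own role would need permission to call iam:PutRolePolicy.

```python
import json
import os

DENY_POLICY_NAME = "bedrock-emergency-cutoff"  # hypothetical policy name


def build_deny_policy():
    """IAM policy document that blocks model invocation on Bedrock."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "EmergencyBedrockCutoff",
            "Effect": "Deny",  # an explicit Deny overrides any Allow on the role
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "*",
        }],
    }


def handler(event, context):
    """Lambda entry point: attach the deny policy to the application role."""
    import boto3  # available in the Lambda Python runtime

    role_name = os.environ["APP_ROLE_NAME"]  # assumed to be set on the function
    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName=DENY_POLICY_NAME,
        PolicyDocument=json.dumps(build_deny_policy()),
    )
    return {"blocked_role": role_name, "policy": DENY_POLICY_NAME}
```

Deleting the inline policy from the role restores access once the cause of the spike has been fixed. IAM changes can take a short time to propagate, so the cutoff is not instantaneous.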

4. Per-Model Tracking

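Not all tokens are equal. A token on a large model such as Claude Opus can cost many times more than one on a small model such as Claude Haiku; the exact ratio depends on the model versions and on whether it is an input or an output token. Break usage down by the ModelId dimension in CloudWatch, and use Cost Allocation Tags (for on-demand models, typically attached through application inference profiles) to see which model is eating your budget.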
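To see why per-model tracking matters, it helps to turn token counts into dollars. The sketch below uses made-up placeholder prices and generic model names purely for illustration; substitute the current price-list values for the models you actually run.

```python
# Illustrative placeholder prices in USD per 1 million tokens.
# These are NOT real price-list values; replace them with current pricing.
PRICES_PER_MILLION = {
    "large-model": {"input": 10.00, "output": 30.00},
    "small-model": {"input": 1.00, "output": 3.00},
}


def estimate_cost(model, input_tokens, output_tokens, prices=PRICES_PER_MILLION):
    """Estimated USD cost of the given token usage on one model."""
    price = prices[model]
    input_cost = input_tokens * price["input"] / 1_000_000
    output_cost = output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


def cost_by_model(usage, prices=PRICES_PER_MILLION):
    """Map each model to its estimated cost, most expensive first.

    usage looks like {"model-name": {"input": tokens, "output": tokens}}.
    """
    costs = {
        model: estimate_cost(model, tokens["input"], tokens["output"], prices)
        for model, tokens in usage.items()
    }
    return dict(sorted(costs.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    same_traffic = {"input": 1_000_000, "output": 200_000}
    report = cost_by_model({"small-model": same_traffic, "large-model": same_traffic})
    for model, cost in report.items():
        print(f"{model}: ${cost:.2f}")
```

With these placeholder prices, identical traffic costs about ten times more on the larger model, which is why a single token threshold across all models can hide where the money is going.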

Summary

Token Metrics are sent to CloudWatch automatically.
Alarms are your primary defense against infinite loops and budget overruns.
Notifications ensure you are alerted before the bill arrives.
Budgets should be set per-environment (Development vs Production).

Module 15 Lesson 2: Cost Monitoring