Module 13 Lesson 4: Rate Limiting and Cost Guards
Economic security. Preventing runaway costs and protecting your API keys from exhaustion.
Rate Limiting: Protecting Your Wallet
Security isn't just about preventing hacks; it's also about availability and resources. If a user (or a bug) triggers your agent to loop 1,000 times, you are effectively suffering a Denial of Service attack on your credit card.
1. The "Bill Shock" Risk
Unlike traditional APIs, where a single request might cost $0.00001, a single agent run can cost $1.00 if it does deep research with a GPT-4-class model.
- At $1.00 per run, 10 malicious users each triggering 1,000 runs can cost you $10,000 in a single afternoon.
2. Token Quotas
In an ADK environment (Module 10), we implement Quotas at the user level.
- Rule: "User A can only spend 100,000 tokens per day."
- Implementation: The agent checks a Redis database before every turn. If the quota is hit, the agent returns: "You have reached your daily AI limit."
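A minimal sketch of that check, using an in-memory stand-in for the Redis counter (the key format, limit value, and `charge_tokens` helper are illustrative, not a specific ADK API; in production the stand-in would be replaced by redis-py's `INCRBY` plus an `EXPIRE` on the key):

```python
import datetime

DAILY_LIMIT = 100_000  # tokens per user per day (illustrative)

class QuotaStore:
    """In-memory stand-in for Redis. In production you would use
    redis-py: r.incrby(key, tokens) plus r.expire(key, 86400)."""
    def __init__(self):
        self._counts = {}

    def incrby(self, key: str, amount: int) -> int:
        self._counts[key] = self._counts.get(key, 0) + amount
        return self._counts[key]

def charge_tokens(store: QuotaStore, user_id: str, tokens: int) -> bool:
    """Add `tokens` to the user's daily counter before the turn runs.
    Returns False (and refunds the charge) if the quota is exhausted."""
    day = datetime.date.today().isoformat()
    key = f"quota:{user_id}:{day}"
    total = store.incrby(key, tokens)
    if total > DAILY_LIMIT:
        store.incrby(key, -tokens)  # refund the failed charge
        return False  # caller returns "You have reached your daily AI limit."
    return True

store = QuotaStore()
assert charge_tokens(store, "user_a", 60_000) is True
assert charge_tokens(store, "user_a", 60_000) is False  # would exceed 100k
assert charge_tokens(store, "user_a", 40_000) is True   # lands exactly on the limit
```

Charging the counter *before* the turn (and refunding on failure) keeps the check atomic: two concurrent turns cannot both squeeze under the limit.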
3. Concurrency Limits
Large language model providers have strict limits on how many prompts you can send at the same time.
- If 100 agents try to call OpenAI at the exact same second, 90 of them will fail with a `429: Too Many Requests` error.
- Solution: Use a Queue. Let the agents wait in line for their turn to "Reason."
4. Visualizing the Rate Limiter
```mermaid
graph TD
    User[User Request] --> RL[Rate Limiter: Is User under budget?]
    RL -- Yes --> Queue[Task Queue]
    RL -- No --> Err[Exit: Usage Limit Reached]
    Queue --> Agent[Agent Processing]
    Agent --> Update[Update Budget DB]
```
5. Cost-Aware Agent Logic
A truly smart agent should "know" its own cost.
- You can put the current token cost in the system prompt: "Warning: You have spent $0.45 on this task already. Please wrap up the final answer in the next turn."
Key Takeaways
- Economic Security is as important as data security.
- Quotas must be enforced at the application level (Redis/SQL).
- Queuing prevents your system from crashing when hit by bursts of traffic.
- Transparency (showing the user their usage) reduces frustration when limits are hit.