Module 4 Lesson 2: Cost-Aware Prompting
Saving Money by Design. How to optimize your prompts to use fewer tokens and reduce your AWS bill.
Economics of the Prompt: Saving Money
Every character you send to Bedrock and every word it generates costs you money. In an enterprise app with millions of users, "Prompt Bloat" can cost thousands of dollars a month.
1. The Token Counter
Bedrock models don't see words; they see Tokens. 1,000 tokens is roughly 750 words.
- You are billed for (Input Tokens) + (Output Tokens).
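To make the billing formula concrete, here is a minimal back-of-the-envelope cost estimator. The per-1,000-token prices and the 4-characters-per-token heuristic are illustrative assumptions, not real Bedrock rates; check the current pricing page for your model and region.

```python
# A rough cost estimator. PRICES ARE PLACEHOLDERS, not real Bedrock
# rates -- look up the current price for your model and region.
INPUT_PRICE_PER_1K = 0.003   # hypothetical $ per 1,000 input tokens
OUTPUT_PRICE_PER_1K = 0.015  # hypothetical $ per 1,000 output tokens

def rough_token_count(text: str) -> int:
    """Crude heuristic: English text averages ~4 characters per token."""
    return max(1, len(text) // 4)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one call: input tokens + output tokens."""
    return (input_tokens / 1_000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1_000) * OUTPUT_PRICE_PER_1K

# The "giant prompt" scenario from the diagram in section 3 below:
print(f"${estimate_cost(5_000, 50):.4f}")  # vs. estimate_cost(500, 50)
```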
2. Strategies for Efficiency
- Condense Your Context: Don't send a whole 50-page PDF if you only need page 3.
- Be Concise: Instead of "Please kindly write a short summary of about two paragraphs," write "Summarize in 2 paragraphs." That small change saves roughly 8 tokens, which adds up fast across millions of calls.
- Stop Sequences: Use `stopSequences` to tell the model exactly when to finish. This prevents it from "rambling" on and wasting output tokens (see the sketch after this list).
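Here is a minimal sketch of `stopSequences` in practice with the Bedrock Converse API via boto3. The region and model ID are examples; swap in whatever your account actually runs. The model is asked for exactly 3 tips, and generation is cut off the moment it tries to start a 4th:

```python
import boto3

# Region and model ID are examples -- use your own.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "List exactly 3 AWS cost-saving tips."}],
    }],
    inferenceConfig={
        "maxTokens": 200,          # hard ceiling on output spend
        "stopSequences": ["4."],   # stop if the model starts a 4th item
    },
)
print(response["output"]["message"]["content"][0]["text"])
```

Note that `maxTokens` and `stopSequences` work together: the stop sequence ends generation at a semantic boundary, while `maxTokens` is the absolute budget cap if the stop sequence never appears.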
3. Visualizing Token Flow
```mermaid
graph LR
    P[Giant 5,000-token prompt] --> M1[AI Brain] --> O1[50-token answer] --> T1[5,050 tokens paid]
    P2[Small 500-token prompt] --> M2[AI Brain] --> O2[50-token answer] --> T2[550 tokens paid: ~90% savings]
```
4. Temperature and Reproducibility
- Temperature: High (1.0) = Creative/Random; Low (0.0) = Precise/Boring.
- For business apps (data extraction, summarizing), use Temperature: 0. This makes outputs near-deterministic, so the model gives (almost) the same answer every time and doesn't "explore" expensive hallucinations. See the snippet below.
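A sketch of the same `converse()` call tuned for a data-extraction task. It reuses the `client` from the earlier example; only the prompt and `inferenceConfig` change, and the model ID is again just an example:

```python
# Deterministic-style settings for extraction; reuses `client` above.
response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Extract the invoice total from the text below as JSON."}],
    }],
    inferenceConfig={
        "temperature": 0,   # minimize sampling randomness
        "maxTokens": 100,   # extraction output should be tiny
    },
)
```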
Summary
- Input Tokens (your prompt) cost money too.
- Conciseness is an engineering skill, not just a writing style.
- `stopSequences` prevents expensive rambling.
- Temperature: 0 is the standard for stable, cost-aware business logic.