Module 4 Lesson 2: Cost-Aware Prompting

Saving Money by Design. How to optimize your prompts to use fewer tokens and reduce your AWS bill.

Economics of the Prompt: Saving Money

Every character you send to Bedrock and every word it generates costs you money. In an enterprise app with millions of users, "Prompt Bloat" can cost thousands of dollars a month.

1. The Token Counter

Bedrock models don't see words; they see Tokens. 1,000 tokens is roughly 750 words.

  • You are billed for (Input Tokens) + (Output Tokens).
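To build intuition for the bill, you can sketch a back-of-the-envelope estimator. This uses the common ~4-characters-per-token heuristic for English text; real billing uses each model's own tokenizer and per-model prices, so the heuristic and the price parameters below are illustrative assumptions, not Bedrock's actual rates.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 chars/token rule of thumb."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, completion: str,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """You pay for input tokens AND output tokens."""
    tokens_in = estimate_tokens(prompt)
    tokens_out = estimate_tokens(completion)
    return (tokens_in / 1000) * price_in_per_1k \
         + (tokens_out / 1000) * price_out_per_1k
```

Running this over your real prompt templates before shipping is a cheap way to spot bloat early.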

2. Strategies for Efficiency

  • Condense Your Context: Don't send a whole 50-page PDF if you only need page 3.
  • Be Concise: Instead of "Please kindly write a short summary of about two paragraphs," use "Summarize in 2 paragraphs." (Roughly half the tokens for the same instruction.)
  • Stop Sequences: Use stopSequences to tell the model exactly when to finish. This prevents it from "rambling" on and wasting output tokens.
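The stop-sequence and output-cap ideas above can be sketched as a request payload for the Bedrock Converse API. The `bedrock-runtime` client and `converse` call are real boto3 APIs, but the model ID, token cap, and stop sequence here are illustrative placeholders for your own values.

```python
def build_request(prompt: str) -> dict:
    """Assemble a cost-aware Converse request (no network call here)."""
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # example model
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": 300,                # hard cap on billable output
            "stopSequences": ["</answer>"],  # cut the model off at this marker
        },
    }

# To actually send it (requires AWS credentials):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**build_request("Summarize in 2 paragraphs: ..."))
```

Pairing `maxTokens` with `stopSequences` gives you both a hard ceiling and an early exit: the model stops at whichever comes first.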

3. Visualizing Token Flow

```mermaid
graph LR
    P[Giant 5,000-token prompt] --> M1[Model]
    M1 --> O1[50-token answer]
    O1 --> T1[5,050 tokens billed]

    P2[Trimmed 500-token prompt] --> M2[Model]
    M2 --> O2[50-token answer]
    O2 --> T2[550 tokens billed: ~89% savings]
```
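The savings in the diagram work out as follows, assuming for simplicity a single price of $0.003 per 1,000 tokens for both input and output (real Bedrock pricing varies by model and direction):

```python
PRICE_PER_1K = 0.003  # assumed flat $/1k tokens, for illustration only

def cost(total_tokens: int) -> float:
    return total_tokens / 1000 * PRICE_PER_1K

bloated = cost(5000 + 50)   # giant prompt + short answer
optimized = cost(500 + 50)  # trimmed prompt + same answer
savings = 1 - optimized / bloated
print(f"${bloated:.5f} vs ${optimized:.5f} -> {savings:.0%} saved")
```

At millions of requests per month, that per-call difference is where the "thousands of dollars" from the intro comes from.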

4. Temperature and Reproducibility

  • Temperature controls randomness: high values (near 1.0) produce creative, varied output; low values (near 0.0) produce precise, repeatable output.
  • For business workloads (data extraction, summarization), use Temperature: 0. The model then almost always returns the same answer for the same prompt, and it is less likely to wander into long, token-wasting tangents.
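As a concrete illustration, here are two inference configurations side by side. The field names follow the Bedrock Converse `inferenceConfig` shape; the specific values are assumptions to adapt, and note that temperature 0 greatly reduces, but does not strictly guarantee, run-to-run variation.

```python
# Deterministic-leaning settings for extraction/summarization jobs.
EXTRACTION_CONFIG = {
    "temperature": 0.0,  # pick the most likely token at every step
    "topP": 1.0,
    "maxTokens": 256,    # extraction output should be short anyway
}

# Looser settings for creative tasks where variety is the point.
CREATIVE_CONFIG = {
    "temperature": 1.0,  # sample broadly for varied output
    "topP": 0.9,
    "maxTokens": 512,
}
```

Keeping these as named constants makes it obvious in code review when someone ships a creative config into a billing-sensitive pipeline.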

Summary

  • Input Tokens (your prompt) cost money too.
  • Conciseness is an engineering skill, not just a writing style.
  • stopSequences prevent expensive rambling.
  • Temperature: 0 is the standard for stable, cost-aware business logic.
