The Art of Choice: Model Use Cases and Selection Criteria

Not all models are equal. Learn the professional framework for selecting the right foundation model based on latency, cost, reasoning, and context requirements.

Choosing Your Engine

In the AWS Certified Generative AI Developer – Professional exam, you will frequently be presented with architectural scenarios where multiple models could work, but only one is the "best" fit based on specific constraints.

As a developer, you must move beyond "using the most famous model" and learn to evaluate models based on four dimensions: Performance, Speed (Latency), Price, and Context Window.

In this lesson, we will build a selection matrix that will help you ace these scenario questions.


1. The Selection Framework: The "Four Pillars"

When a company asks you to build an AI feature, you should immediately ask: What is the priority?

  1. Reasoning Complexity: Does the model need to solve math or debug complex code? (Use Claude 3 Opus or Mistral Large).
  2. Speed (TTFT): Does a user need an answer in milliseconds? (Use Claude 3.5 Haiku or Llama 3 8B).
  3. Cost: Are you processing millions of simple requests? (Use Amazon Titan Text Lite).
  4. Context Window: Are you feeding the model 500-page PDF manuals? (Use Claude 3.5 with 200k context).
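
These four pillars can be reduced to a simple decision checklist. The sketch below is one illustrative policy, not official AWS guidance; the priority order and the shorthand model names are assumptions you would tune to your own workload:

```python
def pick_candidate(needs_deep_reasoning: bool, latency_critical: bool,
                   high_volume: bool, long_context: bool) -> str:
    """Map the dominant constraint to an illustrative model choice.

    The priority order (context > reasoning > latency > cost) is one
    reasonable policy, not the only one.
    """
    if long_context:
        return "claude-3-5-sonnet (200K context)"
    if needs_deep_reasoning:
        return "claude-3-opus"
    if latency_critical:
        return "claude-3-5-haiku"
    if high_volume:
        return "titan-text-lite"
    return "claude-3-5-sonnet"  # balanced default
```

In an exam scenario, read the question for the one constraint the customer names explicitly; that constraint is the branch that fires first.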

2. Comparing Model Families on AWS

AWS provides a diverse menu of model providers through Amazon Bedrock.

The Anthropic Claude Family (The Gold Standard)

  • Haiku: Fastest and cheapest. Perfect for basic classification and simple Q&A.
  • Sonnet: The "Sweet Spot." Excellent balance of intelligence and speed. The default choice for most enterprise agents.
  • Opus: The heavyweight. Most intelligent, highest reasoning, but slowest and most expensive.

The Meta Llama Family (Open Innovation)

  • Llama 3 (8B/70B) and Llama 3.1 (405B): Extremely versatile. Because these are open-weights models, they are strong candidates if you plan to move from Bedrock to SageMaker for fine-tuning.

The Amazon Titan Family (Security and Stability)

  • Titan Text: High focus on content safety and reliability.
  • Titan Image Generator: Reliable for enterprise-safe image creation (includes watermarking).
  • Titan Multimodal Embeddings: A common default for building RAG systems on AWS.
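
As a rough sketch of how a RAG pipeline would call a Titan embedding model through the Bedrock runtime, the helper below builds the invoke_model arguments. The model ID and request shape follow the Titan Text Embeddings V2 format; verify both against the current Bedrock documentation before relying on them:

```python
import json

def build_titan_embedding_request(text: str) -> dict:
    """Build the invoke_model keyword arguments for Titan Text Embeddings V2."""
    return {
        "modelId": "amazon.titan-embed-text-v2:0",
        "body": json.dumps({"inputText": text}),
    }

# At runtime you would pass these arguments to a bedrock-runtime client:
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(**build_titan_embedding_request("hello"))
#   vector = json.loads(response["body"].read())["embedding"]
```

Keeping the request-building step separate from the network call makes the payload easy to unit-test without AWS credentials.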

3. Performance vs. Cost (The Trade-off)

In the Professional exam, you might encounter a question like this:

"A customer handles 10 million classification requests per day. The classification into 3 categories is simple logic. Which model provides the best ROI?"

| Model | Price per 1K Tokens (Approx) | Intelligence | Speed |
| --- | --- | --- | --- |
| Claude 3 Opus | $$$$ | Highest | Slow |
| Claude 3.5 Sonnet | $$ | High | Medium |
| Claude 3.5 Haiku | $ | Medium | Fast |

The Strategy: Always start with the smallest model that can reliably perform the task. Over-provisioning a model is the fastest way to blow through an AWS budget.
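
To make the trade-off concrete, here is some back-of-the-envelope arithmetic for the 10-million-request scenario. The per-token prices below are illustrative placeholders, not current Bedrock pricing:

```python
def daily_cost(requests_per_day: int, tokens_per_request: int,
               price_per_1k_tokens: float) -> float:
    """Estimate daily spend for a uniform workload."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens

# 10M simple classifications, ~100 tokens each (prompt + label)
small_model = daily_cost(10_000_000, 100, 0.00025)  # illustrative price
large_model = daily_cost(10_000_000, 100, 0.015)    # illustrative price
print(f"Small model: ${small_model:,.0f}/day vs large model: ${large_model:,.0f}/day")
```

Even with made-up prices, the shape of the result is the point: at this volume, the gap between the cheapest and the most capable tier is tens of thousands of dollars per day, not a rounding error.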


4. Context Windows: Reading the Whole Book

A Context Window is the maximum amount of information (tokens) a model can "hold in its head" at one time.

  • Standard (8K - 32K): Good for general chat and single-document analysis.
  • Large (128K - 200K): Required for analyzing entire codebases, legal contracts, or multi-hour transcripts.

Warning: Larger context leads to higher costs and potentially higher latency. Just because a model can take 200k tokens doesn't mean you should send it 200k tokens every time.
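
A common guardrail is to budget tokens before sending a request. The sketch below uses the rough heuristic of ~4 characters per token for English text; real token counts vary by model and language, so treat this as a crude pre-flight check, not a tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_to_budget(text: str, max_tokens: int) -> str:
    """Truncate text so its estimated token count fits the budget."""
    if estimate_tokens(text) <= max_tokens:
        return text
    return text[: max_tokens * 4]
```

In a real pipeline you would trim (or, better, chunk and retrieve) before invoking the model, so a 200K-capable model only sees the tokens the task actually needs.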


5. Use Case Matrix for Developers

| Use Case | Recommended Model | Rationale |
| --- | --- | --- |
| Chatbots (Basic) | Claude 3.5 Haiku | Low latency ensures a "snappy" user experience. |
| Coding Assistant | Claude 3.5 Sonnet | Best-in-class code generation and debugging. |
| Legal Document Review | Claude 3 Opus | Highest reasoning for catching subtle nuances. |
| Sentiment Analysis | Amazon Titan Text Lite | Extremely low cost for high-volume, simple tasks. |
| Retrieval (RAG) | Titan Embeddings G1 | Optimized for vector search within the AWS ecosystem. |

6. Code Example: Selecting Models Programmatically

As a Professional Developer, you might build a "Router" that selects a model based on the input complexity.

def route_to_model(task_complexity):
    """Return the Bedrock model ID best suited to the task's complexity.

    Unknown complexity levels fall back to the mid-tier model (Sonnet).
    """
    # Mapping complexity to Bedrock model IDs
    model_mapping = {
        "low": "anthropic.claude-3-haiku-20240307-v1:0",
        "medium": "anthropic.claude-3-sonnet-20240229-v1:0",
        "high": "anthropic.claude-3-opus-20240229-v1:0",
    }

    selected_id = model_mapping.get(task_complexity, model_mapping["medium"])
    print(f"Routing task to: {selected_id}")
    return selected_id

# Usage
# If the task is just 'summarize this 1-sentence email', use Haiku ("low").
# If the task is 'solve this calculus problem', use Opus ("high").

In the exam, you may be asked how to reduce costs for a multi-stage agent. One solution is Model Routing: Use a small model for intent classification and a large model only for the final complex synthesis.
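
A minimal sketch of that routing pattern, assuming a two-stage agent. The `invoke(model_id, prompt)` callable is a stand-in you would implement with the Bedrock runtime client; injecting it keeps the routing logic testable without AWS calls:

```python
# Cheap model for the high-volume first pass, expensive model only when needed
STAGE_MODELS = {
    "intent_classification": "anthropic.claude-3-haiku-20240307-v1:0",
    "final_synthesis": "anthropic.claude-3-opus-20240229-v1:0",
}

def run_agent(user_query: str, invoke) -> str:
    """Two-stage agent: a cheap model classifies intent, and the large
    model is invoked only for queries classified as 'complex'."""
    intent = invoke(
        STAGE_MODELS["intent_classification"],
        f"Classify as 'simple' or 'complex': {user_query}",
    )
    if intent.strip() == "complex":
        return invoke(STAGE_MODELS["final_synthesis"], user_query)
    return invoke(STAGE_MODELS["intent_classification"], user_query)
```

The savings come from the ratio: if only a small fraction of queries are "complex", most traffic never touches the expensive model.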


Knowledge Check: Test Your Selection Logic


An international e-commerce site needs to implement an automated translation service for 50,000 product descriptions every hour. Cost is the primary constraint, and the product descriptions are simple, factual sentences. Which model family/type should the developer prioritize?


Summary

Selection is a balancing act. You must weigh Intelligence against Economics. In the next lesson, we will look at the different ways these models are delivered: Hosted vs. Managed vs. Custom.


Next Lesson: Where Does the Brain Live? Hosted vs. Managed vs. Custom Models
