The Deployment Dilemma: Bedrock vs. SageMaker vs. Custom

Architecture is about trade-offs. Learn the professional decision framework for choosing the right deployment stack for your production AI workloads.

Where the Model Meets the Road

In Domain 1, we learned about the "Brains" (Models) and the "Fuel" (Data). In Domain 2, we focus on the "Body"—the infrastructure that allows your application to function reliably in a production environment.

The most common architectural question in the AWS Certified Generative AI Developer – Professional exam is some variation of: "Which service should I use to host this model?" To answer this, you must master the differences between Amazon Bedrock, Amazon SageMaker, and Custom Stacks (EC2/EKS).


1. The Deployment Flowchart

When deciding on a deployment target, use this logic sequence:

graph TD
    A{Is the model available on Bedrock?} -->|Yes| B{Need custom weights?}
    A -->|No| E{Need extreme control?}
    B -->|No| D[Amazon Bedrock]
    B -->|Yes| E
    E -->|No| C[Amazon SageMaker]
    E -->|Yes| F[EC2 / EKS / Custom]
    
    style D fill:#ccffcc,stroke:#006600
    style C fill:#ffffcc,stroke:#666600
    style F fill:#ffcccc,stroke:#660000

2. Amazon Bedrock: The Serverless Standard

Bedrock is the "Path of Least Resistance."

  • Pros: Zero infrastructure management, automatic scaling, and pay-per-token (On-Demand) or Provisioned Throughput pricing.
  • Cons: Limited to the models AWS has licensed, and no influence over the underlying hardware (e.g., you can't choose to run on Graviton or Inferentia).
  • Pro Developer Tool: Provisioned Throughput. If a critical app needs guaranteed low latency and high RPM (Requests Per Minute), you purchase dedicated model units, effectively "renting" a fixed amount of capacity for a 1- or 6-month commitment. See the sketch below for what the call itself looks like.
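
The call pattern is identical for On-Demand and Provisioned Throughput; for the latter, you pass the provisioned model's ARN as the modelId. Here is a minimal boto3 sketch, where the model ID and region are placeholder assumptions:

import boto3

# Minimal sketch: model ID and region are placeholder assumptions.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    # For Provisioned Throughput, pass the provisioned model ARN instead.
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our Q3 report."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])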

3. Amazon SageMaker: The Infrastructure Powerhouse

SageMaker is for when you need to be "in the driver's seat."

  • Custom Models: If you need a model that isn't on Bedrock (e.g., a specific checkpoint from Hugging Face), you host it here.
  • Specialized Hardware: You can use AWS Inferentia or AWS Trainium chips to cut inference costs (commonly cited at 40-70% for high-volume workloads) versus comparable GPU instances.
  • Deep Control: You manage the Docker container, the environment variables, and the auto-scaling policy (e.g., scale out when invocations per instance exceed a target).
  • SageMaker JumpStart: A library of pre-trained models that you can deploy to a dedicated endpoint with one click. Once deployed, every variant is invoked the same way, as the sketch below shows.
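
Regardless of which model you host, the invocation pattern is the same. A minimal sketch, assuming a hypothetical endpoint name and a Hugging Face-style JSON payload (the exact schema depends on your serving container):

import json
import boto3

# Minimal sketch: the endpoint name and payload schema are placeholder
# assumptions; both depend on the serving container you deployed.
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = runtime.invoke_endpoint(
    EndpointName="llama3-70b-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "Summarize our Q3 report.",
                     "parameters": {"max_new_tokens": 512}}),
)

print(json.loads(response["Body"].read()))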

4. Custom Stacks: EC2 and EKS

Why would you ever manage an EC2 instance manually for AI?

  1. Model Size: Some massive models need multi-GPU configurations (e.g., tensor parallelism across eight GPUs) that a hand-tuned EC2 instance can exploit more precisely than SageMaker's standard abstractions.
  2. Legacy Infrastructure: Your company already runs everything on Kubernetes (EKS), and you want the AI model in the same cluster for network latency and security consistency.
  3. Proprietary Software: You are using a specialized inference engine (like vLLM or NVIDIA Triton) with custom C++ extensions; see the sketch after this list.
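
To make the self-managed route concrete, here is a minimal vLLM sketch as it might run on a GPU-backed EC2 instance; the model ID and the tensor_parallel_size value are placeholder assumptions:

from vllm import LLM, SamplingParams

# Minimal self-hosted sketch (GPU EC2 instance with vllm installed).
# The model ID and parallelism degree are placeholder assumptions.
llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct", tensor_parallel_size=8)

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(["Summarize our Q3 report."], params)

print(outputs[0].outputs[0].text)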

5. Decision Matrix: Deployment Patterns

| Requirement | Preferred Deployment |
| --- | --- |
| Small Team / Rapid MVP | Amazon Bedrock (On-Demand) |
| Guaranteed SLA / High Traffic | Amazon Bedrock (Provisioned Throughput) |
| Custom Fine-tuned Weights | Amazon SageMaker |
| Cost Optimization at Scale | Amazon SageMaker (Inferentia/Graviton) |
| Edge Deployment | SageMaker Neo / Greengrass |

6. Pro-Tip: The "SQUID" Pattern

Regardless of the stack, you should follow the Simple Queue UI Design (SQUID) pattern for production apps.

  1. Frontend submits request to a Queue (SQS).
  2. Lambda picks up the request and calls Bedrock/SageMaker.
  3. Response is written to a DB (DynamoDB).
  4. Frontend polls or uses WebSockets (AppSync) to get the result.

This prevents client-side timeout errors when inference takes longer than a synchronous request allows.
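
A minimal sketch of step 2, the queue worker, assuming a hypothetical DynamoDB table name, message shape, and the same placeholder model ID used earlier:

import json
import boto3

# Minimal sketch of the SQS -> Bedrock -> DynamoDB worker (step 2).
# Table name, message shape, and model ID are placeholder assumptions.
bedrock = boto3.client("bedrock-runtime")
table = boto3.resource("dynamodb").Table("InferenceResults")

def handler(event, context):
    for record in event["Records"]:       # SQS delivers a batch of messages
        job = json.loads(record["body"])  # e.g., {"job_id": ..., "prompt": ...}
        response = bedrock.converse(
            modelId="anthropic.claude-3-haiku-20240307-v1:0",
            messages=[{"role": "user", "content": [{"text": job["prompt"]}]}],
        )
        text = response["output"]["message"]["content"][0]["text"]
        # Step 3: persist the result so the frontend can poll for it (step 4).
        table.put_item(Item={"job_id": job["job_id"], "result": text})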


Knowledge Check: Test Your Deployment Strategy


A developer needs to deploy a fine-tuned version of a Llama-3-70B model. The model will be used by an internal application with very predictable traffic (exactly 100 requests per minute, 24/7). Which AWS deployment option is likely most cost-effective and provides the most control over model parameters?


Summary

You now have the framework for the "Great Service Debate." Bedrock for speed; SageMaker for scale and customization. In the next lesson, we move from the "Where" to the "How," focusing on Integrating Foundation Models into Applications.


Next Lesson: The Bridge: Integrating Foundation Models into Applications
