
Where Does the Brain Live? Hosted vs. Managed vs. Custom Models
Master the deployment spectrum. Learn when to use the serverless simplicity of Amazon Bedrock versus the full infrastructure control of Amazon SageMaker.
The Deployment Spectrum
As an AWS Generative AI Developer, one of your most critical architectural decisions is where and how to host your Foundation Model. AWS offers multiple ways to consume intelligence, ranging from "hands-off" serverless APIs to "full-control" custom infrastructure.
In this lesson, we will compare Managed Services (Amazon Bedrock), Infrastructure-based Hosting (Amazon SageMaker), and Custom Model Development. Understanding these trade-offs is essential for passing the implementation section of the AIP-C01 exam.
1. Managed Foundation Models (Amazon Bedrock)
Amazon Bedrock is a serverless, managed service that provides access to FMs via an API.
Characteristics:
- No Infrastructure: You don't manage servers, GPUs, or scaling.
- Pay-as-you-go: You are billed per 1,000 input/output tokens (on-demand) or for Provisioned Throughput.
- Speed to Market: You can have a working AI feature in minutes.
- Unified API: Switch between Claude, Llama, and Titan with minimal code changes.
Best For:
- Standard RAG applications.
- General-purpose chatbots.
- Rapid prototyping.
- Applications where "Least Operational Overhead" is the primary constraint.
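To make the "Unified API" point above concrete, here is a minimal sketch of a Bedrock call using the boto3 Converse API. The region, model ID, and prompt are illustrative assumptions; switching providers is largely a matter of changing modelId.

```python
# A minimal sketch of calling a model through the Bedrock Converse API.
# The region and model ID below are assumptions; any Bedrock-supported
# model identifier works in their place.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our Q3 results."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

# The reply comes back as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```

Swapping modelId to a Llama or Titan identifier leaves the rest of the call unchanged, which is exactly the portability Bedrock is selling.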
2. Hosted/Self-Managed Models (Amazon SageMaker)
If Bedrock is a "Restaurant" (you order from the menu), Amazon SageMaker is a "Professional Kitchen" (you bring your own ingredients and staff).
Characteristics:
- Full Infrastructure Control: You choose the instance type (e.g., ml.p4d.24xlarge).
- Open-Weights Flexibility: Run any model from Hugging Face or custom-trained weights.
- Persistence: The model is always "on" (real-time endpoints).
- Control: You manage the auto-scaling policies, VPC security groups, and containers.
Best For:
- Models not available on Bedrock.
- Highly specialized or proprietary weights.
- Scenarios requiring deep integration with a custom ML training pipeline.
- High-volume applications where dedicated capacity (e.g., via SageMaker Savings Plans) is cheaper than per-token pricing.
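As a sketch of what "Professional Kitchen" hosting looks like in practice, the snippet below deploys an open-weights Hugging Face model to a SageMaker real-time endpoint using the SageMaker Python SDK. The model ID, role ARN, framework versions, and instance type are all assumptions to adapt to your account.

```python
# A sketch of hosting an open-weights Hugging Face model on a SageMaker
# real-time endpoint. The role ARN is a placeholder, and the framework
# version combination should be checked against currently supported DLCs.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",  # any HF model ID
        "HF_TASK": "text-generation",
    },
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

# You pick the hardware -- this is the "bring your own kitchen" part.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "Explain VPC security groups in one line."}))
```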
3. Custom Models and Fine-tuning
Sometimes, a general-purpose model doesn't know your company's internal jargon or "brand voice." There are three ways to customize:
- RAG (Retrieval-Augmented Generation): Providing context in the prompt. (Technically not a different "model," but the most common customization).
- Fine-Tuning: Taking a base model and showing it a small dataset of labeled examples to change its behavior.
- Continued Pre-training: Feeding the model massive amounts of unlabeled domain-specific data (e.g., medical journals) to expand its fundamental knowledge.
```mermaid
graph TD
    A[Base Model] --> B{Need Data?}
    B -->|Yes, current data| C[RAG]
    B -->|Yes, internal style| D[Fine-Tuning]
    B -->|Yes, domain depth| E[Continued Pre-training]
    style C fill:#ccffcc,stroke:#006600
    style D fill:#ffffcc,stroke:#666600
    style E fill:#ffcccc,stroke:#660000
```
Visualization: The spectrum of model customization.
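Fine-tuning and continued pre-training both run through the same Bedrock customization API, differing mainly in the customizationType value. Below is a hedged sketch using boto3; every name, ARN, S3 path, and hyperparameter value is a placeholder, and supported hyperparameters vary by base model.

```python
# A hedged sketch of kicking off a Bedrock fine-tuning job. All names,
# ARNs, and S3 paths are placeholders; the base model must support
# customization in your region.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_model_customization_job(
    jobName="brand-voice-ft-001",
    customModelName="titan-brand-voice",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",  # or "CONTINUED_PRE_TRAINING"
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},  # model-specific
)
```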
4. The Shared Responsibility Model
For the Professional exam, you must know which responsibilities are yours and which belong to AWS.
| Component | Amazon Bedrock (Managed) | Amazon SageMaker (Hosted) |
|---|---|---|
| GPU Optimization | AWS | USER |
| Model Availability | AWS | USER (Auto-scaling) |
| API Security | AWS (IAM) | AWS (IAM) |
| Prompt Security | USER | USER |
| Container Patching | AWS | USER |
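The "Model Availability" row is where the SageMaker-side workload becomes visible: keeping the endpoint scaled is your job. Below is a sketch of wiring an endpoint variant to target-tracking auto-scaling via Application Auto Scaling; the endpoint name, capacity bounds, and target value are assumptions.

```python
# A sketch of the auto-scaling work that falls on YOUR side of the line
# with SageMaker. Endpoint and variant names are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-llm-endpoint/variant/AllTraffic"

# Register the endpoint variant as a scalable target (2-8 instances).
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=8,
)

# Scale on invocations per instance; the target value is an assumption.
autoscaling.put_scaling_policy(
    PolicyName="llm-invocations-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```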
5. Strategic Comparison Table
| Feature | Bedrock | SageMaker |
|---|---|---|
| Scaling | Automatic | Manual/Policy-based |
| Customization | Fine-tuning (Limited) | Full training/fine-tuning |
| Supported Models | Licensed providers only | Any model (Hugging Face) |
| Latency | Variable | Predictable (dedicated hardware) |
6. Real-World Scenario: The Migration Path
Question: A startup built its MVP on Amazon Bedrock using Claude 3. It now handles 100,000 requests per minute, costs are skyrocketing, and the team wants to run a Llama-3-70B model with weights they have fine-tuned themselves. What is their best path?
Answer: Migrate to Amazon SageMaker. By hosting the custom weights on SageMaker Real-Time Inference with SageMaker Savings Plans, they can control the cost-per-inference at high scale.
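To see why the economics flip at this scale, here is a back-of-the-envelope comparison. Every number in it is an illustrative assumption, not a quoted AWS price; the point is the shape of the math, not the figures.

```python
# Back-of-the-envelope break-even check: per-token pricing vs. dedicated
# hosting. ALL prices and token counts below are illustrative assumptions,
# not current AWS rates -- plug in real numbers from the pricing pages.
requests_per_minute = 100_000
tokens_per_request = 1_000          # input + output combined (assumed)
price_per_1k_tokens = 0.001         # USD, assumed on-demand rate

tokens_per_hour = requests_per_minute * 60 * tokens_per_request
on_demand_per_hour = tokens_per_hour / 1_000 * price_per_1k_tokens

gpu_instance_per_hour = 40.0        # USD, assumed discounted instance rate
instances_needed = 10               # assumed capacity for this load

dedicated_per_hour = gpu_instance_per_hour * instances_needed
print(f"On-demand:  ${on_demand_per_hour:,.0f}/hour")   # $6,000/hour
print(f"Dedicated:  ${dedicated_per_hour:,.0f}/hour")   # $400/hour
```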
Knowledge Check: Test Your Deployment Strategy
A developer needs to deploy a Generative AI application that uses an open-source model from Hugging Face that is not currently part of the Amazon Bedrock model lineup. The application requires high availability across three Availability Zones. Which AWS approach should be used?
Answer: Deploy the model to an Amazon SageMaker real-time endpoint with at least three instances; SageMaker spreads endpoint instances across Availability Zones when multiple instances are provisioned.
Summary
You now understand the "Where." Use Bedrock for speed and simplicity. Use SageMaker for control and custom weights. With the models selected and deployment strategy understood, we are ready to move to Domain 1's next challenge: Data Management for GenAI.
Next Module: The Fuel for the Fire: Building Data Pipelines and ETL for AI