
Where Does the Brain Live? Hosted vs. Managed vs. Custom Models
Master the deployment spectrum. Learn when to use the serverless simplicity of Amazon Bedrock versus the full infrastructure control of Amazon SageMaker.
The Deployment Spectrum
As an AWS Generative AI Developer, one of your most critical architectural decisions is where and how to host your Foundation Model. AWS offers multiple ways to consume intelligence, ranging from "hands-off" serverless APIs to "full-control" custom infrastructure.
In this lesson, we will compare Managed Services (Amazon Bedrock), Infrastructure-based Hosting (Amazon SageMaker), and Custom Model Development. Understanding these trade-offs is essential for passing the implementation section of the AIP-C01 exam.
1. Managed Foundation Models (Amazon Bedrock)
Amazon Bedrock is a serverless, managed service that provides access to FMs via an API.
Characteristics:
- No Infrastructure: You don't manage servers, GPUs, or scaling.
- Pay-as-you-go: You are billed per 1,000 input/output tokens (on-demand) or for Provisioned Throughput.
- Speed to Market: You can have a working AI feature in minutes.
- Unified API: Switch between Claude, Llama, and Titan with minimal code changes.
Best For:
- Standard RAG applications.
- General-purpose chatbots.
- Rapid prototyping.
- Applications where "Least Operational Overhead" is the primary constraint.
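To make the "Unified API" point above concrete, here is a minimal sketch of a Bedrock call using the boto3 Converse API. The region, model ID, and prompt are illustrative assumptions; switching providers is largely a matter of changing modelId.

```python
# A minimal sketch of calling a model through the Bedrock Converse API.
# The region and model ID below are assumptions; any Bedrock-supported
# model identifier works in their place.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our Q3 results."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

# The reply comes back as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```

Swapping modelId to a Llama or Titan identifier leaves the rest of the call unchanged, which is exactly the portability Bedrock is selling.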
2. Hosted/Self-Managed Models (Amazon SageMaker)
If Bedrock is a "Restaurant" (you order from the menu), Amazon SageMaker is a "Professional Kitchen" (you bring your own ingredients and staff).
Characteristics:
- Full Infrastructure Control: You choose the instance type (e.g., ml.p4d.24xlarge).
- Open-Weights Flexibility: Run any model from Hugging Face or custom-trained weights.
- Persistence: The model is always "on" (real-time endpoints).
- Control: You manage the auto-scaling policies, VPC security groups, and containers.
Best For:
- Models not available on Bedrock.
- Highly specialized or proprietary weights.
- Scenarios requiring deep integration with a custom ML training pipeline.
- High-volume applications where dedicated capacity (e.g., via SageMaker Savings Plans) is cheaper than per-token pricing.
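As a sketch of what "Professional Kitchen" hosting looks like in practice, the snippet below deploys an open-weights Hugging Face model to a SageMaker real-time endpoint using the SageMaker Python SDK. The model ID, role ARN, framework versions, and instance type are all assumptions to adapt to your account.

```python
# A sketch of hosting an open-weights Hugging Face model on a SageMaker
# real-time endpoint. The role ARN is a placeholder, and the framework
# version combination should be checked against currently supported DLCs.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",  # any HF model ID
        "HF_TASK": "text-generation",
    },
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

# You pick the hardware -- this is the "bring your own kitchen" part.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "Explain VPC security groups in one line."}))
```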
3. Custom Models and Fine-tuning
Sometimes, a general-purpose model doesn't know your company's internal jargon or "brand voice." There are three ways to customize:
- RAG (Retrieval-Augmented Generation): Providing context in the prompt. (Technically not a different "model," but the most common customization).
- Fine-Tuning: Taking a base model and showing it a small dataset of labeled examples to change its behavior.
- Continued Pre-training: Feeding the model massive amounts of unlabeled domain-specific data (e.g., medical journals) to expand its fundamental knowledge.
```mermaid
graph TD
    A[Base Model] --> B{Need Data?}
    B -->|Yes, current data| C[RAG]
    B -->|Yes, internal style| D[Fine-Tuning]
    B -->|Yes, domain depth| E[Continued Pre-training]
    style C fill:#ccffcc,stroke:#006600
    style D fill:#ffffcc,stroke:#666600
    style E fill:#ffcccc,stroke:#660000
```
Visualization: The spectrum of model customization.
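Fine-tuning and continued pre-training both run through the same Bedrock customization API, differing mainly in the customizationType value. Below is a hedged sketch using boto3; every name, ARN, S3 path, and hyperparameter value is a placeholder, and supported hyperparameters vary by base model.

```python
# A hedged sketch of kicking off a Bedrock fine-tuning job. All names,
# ARNs, and S3 paths are placeholders; the base model must support
# customization in your region.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_model_customization_job(
    jobName="brand-voice-ft-001",
    customModelName="titan-brand-voice",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",  # or "CONTINUED_PRE_TRAINING"
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},  # model-specific
)
```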
4. The Shared Responsibility Model
For the Professional exam, you must know which responsibilities are yours and which belong to AWS.
| Component | Amazon Bedrock (Managed) | Amazon SageMaker (Hosted) |
|---|---|---|
| GPU Optimization | AWS | USER |
| Model Availability | AWS | USER (Auto-scaling) |
| API Security | AWS (IAM) | AWS (IAM) |
| Prompt Security | USER | USER |
| Container Patching | AWS | USER |
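The "Model Availability" row is where the SageMaker-side workload becomes visible: keeping the endpoint scaled is your job. Below is a sketch of wiring an endpoint variant to target-tracking auto-scaling via Application Auto Scaling; the endpoint name, capacity bounds, and target value are assumptions.

```python
# A sketch of the auto-scaling work that falls on YOUR side of the line
# with SageMaker. Endpoint and variant names are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-llm-endpoint/variant/AllTraffic"

# Register the endpoint variant as a scalable target (2-8 instances).
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=8,
)

# Scale on invocations per instance; the target value is an assumption.
autoscaling.put_scaling_policy(
    PolicyName="llm-invocations-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```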
5. Strategic Comparison Table
| Feature | Bedrock | SageMaker |
|---|---|---|
| Scaling | Automatic | Manual/Policy-based |
| Customization | Fine-tuning (Limited) | Full training/fine-tuning |
| Supported Models | Licensed providers only | Any model (Hugging Face) |
| Latency | Variable | Predictable (dedicated hardware) |
6. Real-World Scenario: The Migration Path
Question: A startup built its MVP on Amazon Bedrock using Claude 3. It now handles 100,000 requests per minute, costs are skyrocketing, and the team wants to run a Llama-3-70B model with weights they have fine-tuned themselves. What is their best path?
Answer: Migrate to Amazon SageMaker. By hosting the custom weights on SageMaker Real-Time Inference with SageMaker Savings Plans, they can control the cost-per-inference at high scale.
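To see why the economics flip at this scale, here is a back-of-the-envelope comparison. Every number in it is an illustrative assumption, not a quoted AWS price; the point is the shape of the math, not the figures.

```python
# Back-of-the-envelope break-even check: per-token pricing vs. dedicated
# hosting. ALL prices and token counts below are illustrative assumptions,
# not current AWS rates -- plug in real numbers from the pricing pages.
requests_per_minute = 100_000
tokens_per_request = 1_000          # input + output combined (assumed)
price_per_1k_tokens = 0.001         # USD, assumed on-demand rate

tokens_per_hour = requests_per_minute * 60 * tokens_per_request
on_demand_per_hour = tokens_per_hour / 1_000 * price_per_1k_tokens

gpu_instance_per_hour = 40.0        # USD, assumed discounted instance rate
instances_needed = 10               # assumed capacity for this load

dedicated_per_hour = gpu_instance_per_hour * instances_needed
print(f"On-demand:  ${on_demand_per_hour:,.0f}/hour")   # $6,000/hour
print(f"Dedicated:  ${dedicated_per_hour:,.0f}/hour")   # $400/hour
```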
Knowledge Check: Test Your Deployment Strategy
A developer needs to deploy a Generative AI application that uses an open-source model from Hugging Face that is not currently part of the Amazon Bedrock model lineup. The application requires high availability across three Availability Zones. Which AWS approach should be used?
Answer: Deploy the model to an Amazon SageMaker real-time endpoint with at least three instances; SageMaker spreads endpoint instances across Availability Zones when multiple instances are provisioned.
Summary
You now understand the "Where." Use Bedrock for speed and simplicity. Use SageMaker for control and custom weights. With the models selected and deployment strategy understood, we are ready to move to Domain 1's next challenge: Data Management for GenAI.
Next Module: The Fuel for the Fire: Building Data Pipelines and ETL for AI