
The Deployment Dilemma: Bedrock vs. SageMaker vs. Custom
Architecture is about trade-offs. Learn the professional decision framework for choosing the right deployment stack for your production AI workloads.
Where the Model Meets the Road
In Domain 1, we learned about the "Brains" (Models) and the "Fuel" (Data). In Domain 2, we focus on the "Body"—the infrastructure that allows your application to function reliably in a production environment.
The most common architectural question in the AWS Certified Generative AI Developer – Professional exam is some variation of: "Which service should I use to host this model?" To answer this, you must master the differences between Amazon Bedrock, Amazon SageMaker, and Custom Stacks (EC2/EKS).
1. The Deployment Flowchart
When deciding on a deployment target, use this logic sequence:
```mermaid
graph TD
    A[Is the model available on Bedrock?] -->|Yes| B{Need custom weights?}
    B -->|No| D[Amazon Bedrock]
    B -->|Yes| E{Need extreme control?}
    A -->|No| E
    E -->|No| C[Amazon SageMaker]
    E -->|Yes| F[EC2 / EKS / Custom]
    style D fill:#ccffcc,stroke:#006600
    style C fill:#ffffcc,stroke:#666600
    style F fill:#ffcccc,stroke:#660000
```
2. Amazon Bedrock: The Serverless Standard
Bedrock is the "Path of Least Resistance."
- Pros: Zero infrastructure management, automatic scaling, pay-per-token or provisioned throughput.
- Cons: Limited to the models AWS has licensed, no influence over the underlying hardware (e.g., you can't choose to use Graviton or Inferentia).
- Pro Developer Tool: Provisioned Throughput. If you have a critical app that needs guaranteed low latency and a high, sustained request rate, you "rent" a fixed amount of model capacity for a 1-month or 6-month commitment term.
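To make the "zero infrastructure" point concrete, here is a minimal sketch of an on-demand Bedrock call using boto3's Converse API. The region and model ID are illustrative; substitute any model your account has access to.

```python
# Minimal on-demand Bedrock call via the Converse API: no servers to manage.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```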
3. Amazon SageMaker: The Infrastructure Powerhouse
SageMaker is for when you need to be "in the driver's seat."
- Custom Models: If you download a specific model from Hugging Face that isn't on Bedrock, you host it here.
- Specialized Hardware: You can use AWS Inferentia or AWS Trainium chips to reduce costs by 40-70% for high-volume inference.
- Deep Control: You manage the Docker container, the environment variables, and the auto-scaling policy (e.g., scale out when invocations per instance exceed a target; see the sketch after this list).
- SageMaker JumpStart: A library of pre-trained models that you can deploy to a dedicated endpoint with one click.
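To make the "Deep Control" bullet concrete, the sketch below attaches a target-tracking auto-scaling policy to an existing endpoint via the Application Auto Scaling API. The endpoint name, capacity bounds, and target value are hypothetical; tune them to your traffic.

```python
# Sketch: target-tracking auto-scaling for a SageMaker endpoint variant.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/llama3-endpoint/variant/AllTraffic"  # hypothetical endpoint

# Register the endpoint variant's instance count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Scale out when average invocations per instance exceed the target value.
autoscaling.put_scaling_policy(
    PolicyName="llama3-invocations-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # hypothetical invocations-per-instance target
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```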
4. Custom Stacks: EC2 and EKS
Why would you ever manage an EC2 instance manually for AI?
- Model Size: Some massive models require multi-GPU configurations (e.g., custom tensor-parallel sharding) that are easier to hand-tune on raw EC2 than through SageMaker's standard abstractions.
- Legacy Infrastructure: Your company already runs everything on Kubernetes (EKS) and you want to keep the AI model in the same "cluster" for network latency and security consistency.
- Specialized Software: You are running a custom inference engine (like vLLM or NVIDIA Triton Inference Server), possibly with custom C++ extensions, that needs full control of the host.
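Whichever engine you choose, a common design is to expose an OpenAI-compatible HTTP API from inside your VPC (vLLM ships one out of the box). A minimal client sketch, assuming a hypothetical internal hostname and a Llama 3 model served by vLLM:

```python
# Sketch: query a self-hosted vLLM server (OpenAI-compatible API) on EC2/EKS.
import requests

resp = requests.post(
    "http://vllm.internal:8000/v1/completions",  # hypothetical internal host
    json={
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "prompt": "Explain provisioned throughput in one sentence.",
        "max_tokens": 128,
        "temperature": 0.2,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```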
5. Decision Matrix: Deployment Patterns
| Requirement | Preferred Deployment |
|---|---|
| Small Team / Rapid MVP | Amazon Bedrock (On-Demand) |
| Guaranteed SLA / High Traffic | Amazon Bedrock (Provisioned Throughput) |
| Custom Fine-tuned Weights | Amazon SageMaker |
| Cost Optimization at Scale | Amazon SageMaker (Inferentia/Graviton) |
| Edge Deployment | SageMaker Neo / Greengrass |
6. Pro-Tip: The "SQUID" Pattern
Regardless of the stack, you should follow the Simple Queue UI Design (SQUID) pattern for production apps.
1. The frontend submits the request to a queue (SQS).
2. A Lambda function picks up the request and calls Bedrock/SageMaker.
3. The response is written to a database (DynamoDB).
4. The frontend polls, or uses WebSockets (AppSync), to retrieve the result.
This decoupling prevents gateway timeouts (for example, API Gateway's 29-second integration limit) when the model takes longer to respond than a synchronous request allows.
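Here is a sketch of the worker step in that flow: a Lambda function triggered by SQS that calls Bedrock and persists the result for the frontend to pick up. The queue wiring, table name, and model ID are hypothetical.

```python
# Sketch of the SQUID worker: SQS-triggered Lambda -> Bedrock -> DynamoDB.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
table = boto3.resource("dynamodb").Table("InferenceResults")  # hypothetical table

def handler(event, context):
    for record in event["Records"]:  # SQS delivers messages in batches
        job = json.loads(record["body"])
        response = bedrock.converse(
            modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative
            messages=[{"role": "user", "content": [{"text": job["prompt"]}]}],
        )
        answer = response["output"]["message"]["content"][0]["text"]
        # The frontend polls this table (or is notified via AppSync) by job_id.
        table.put_item(Item={"job_id": job["job_id"], "answer": answer})
```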
Knowledge Check: Test Your Deployment Strategy
?Knowledge Check
A developer needs to deploy a fine-tuned version of a Llama-3-70B model. The model will be used by an internal application with very predictable traffic (exactly 100 requests per minute, 24/7). Which AWS deployment option is likely most cost-effective and provides the most control over model parameters?
Summary
You now have the framework for the "Great Service Debate." Bedrock for speed; SageMaker for scale and customization. In the next lesson, we move from the "Where" to the "How," focusing on Integrating Foundation Models into Applications.
Next Lesson: The Bridge: Integrating Foundation Models into Applications