
LLMs on AWS: Bedrock vs. SageMaker
Master the AWS AI landscape. Learn when to use the serverless convenience of AWS Bedrock and when to leverage the full control of Amazon SageMaker for your LLM workloads.
Amazon Web Services (AWS) is the most widely used cloud provider for enterprise AI. As an LLM engineer, you need to navigate the two distinct paths AWS offers for building AI: AWS Bedrock and Amazon SageMaker.
One is for speed and ease; the other is for power and customization. In this lesson, we will help you choose the right path for your project.
1. AWS Bedrock: The Serverless Revolution
Bedrock is a "Models as a Service" (MaaS) platform. AWS manages the hardware, and you just call the models via an API.
Key Features:
- Foundation Models: Access to Claude (Anthropic), Llama (Meta), Mistral, and Titan (Amazon).
- Knowledge Bases: Built-in RAG orchestration. You point Bedrock to an S3 bucket, and it handles the chunking and embedding automatically.
- Agents for Bedrock: A managed service that handles the reasoning loop and tool calling for you.
Pros: Zero server management. Instant scaling.
Cons: Limited to the specific models AWS has partnered with. No low-level GPU control.
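To make the Knowledge Bases feature concrete: once Bedrock has indexed your S3 bucket, you query it through the `bedrock-agent-runtime` client's `retrieve_and_generate` call. A minimal sketch follows; the knowledge base ID and model ARN below are hypothetical placeholders you would replace with your own.

```python
import json

# Hypothetical IDs -- substitute the values from your own AWS account.
KB_ID = "EXAMPLEKB01"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"

def build_rag_request(question: str, kb_id: str = KB_ID, model_arn: str = MODEL_ARN) -> dict:
    """Build the request payload for Bedrock's managed RAG call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

# At runtime, with AWS credentials configured, you would send it like this:
# import boto3
# client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
# response = client.retrieve_and_generate(**build_rag_request("What is our refund policy?"))
# print(response["output"]["text"])
```

Note that retrieval, chunk ranking, and prompt assembly all happen inside AWS: your code never touches an embedding model directly.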
2. Amazon SageMaker: The Pro's Workbench
SageMaker is for when you need to "Own the Machine." It is a full Machine Learning platform.
When to use SageMaker?
- Fine-Tuning: If you want to fine-tune a model with your own specialized LoRA adapters (Module 6).
- Custom Serving: If you want to use vLLM or TensorRT to squeeze every drop of performance out of a GPU.
- Privacy: If you need to run a model that isn't available on Bedrock.
Pros: Total control over the hardware (A100s, H100s). Can host any model from Hugging Face.
Cons: Requires DevOps knowledge to manage endpoints, scaling policies, and instance types.
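Once you have deployed a model to a SageMaker endpoint, you call it through the `sagemaker-runtime` client rather than `bedrock-runtime`. The sketch below assumes a hypothetical endpoint name and a TGI-style JSON schema (`inputs`/`parameters`); the exact request format depends on which serving container you chose.

```python
import json

# Hypothetical endpoint name -- whatever you called your deployed model.
ENDPOINT_NAME = "my-llama3-vllm-endpoint"

def build_sm_request(prompt: str, max_new_tokens: int = 200) -> bytes:
    """Serialize a request body; the schema here follows the common TGI convention."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.5},
    }).encode("utf-8")

# With AWS credentials configured, the invocation would look like:
# import boto3
# runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
# response = runtime.invoke_endpoint(
#     EndpointName=ENDPOINT_NAME,
#     ContentType="application/json",
#     Body=build_sm_request("Explain LoRA in one sentence."),
# )
# print(json.loads(response["Body"].read()))
```

The contrast with Bedrock is the point: here you chose the instance type, the serving engine, and the request schema yourself.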
3. The Choice Matrix
| Requirement | Use AWS Bedrock | Use Amazon SageMaker |
|---|---|---|
| "I want to start in 5 minutes." | Yes | No |
| "I need to fine-tune Llama 3 on private data." | No (Limited) | Yes |
| "I just want to use Claude 3.5." | Yes | No (Not available natively) |
| "I want to use my own custom serving engine." | No | Yes |
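The matrix above can be encoded as a toy decision helper, useful as a sanity check during architecture reviews (this is an illustrative simplification, not an official AWS guideline):

```python
def choose_service(need_fine_tuning: bool, need_custom_engine: bool,
                   need_claude: bool = False) -> str:
    """Toy encoding of the choice matrix: any 'control' requirement pushes
    you to SageMaker; otherwise Bedrock is the faster path."""
    if need_fine_tuning or need_custom_engine:
        return "SageMaker"
    # Claude is only available as a managed model, which also means Bedrock.
    return "Bedrock"

print(choose_service(need_fine_tuning=False, need_custom_engine=False, need_claude=True))
```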
4. Architecting a Cloud-Native AI App
A professional AWS AI architecture usually looks like this:
- Frontend: React hosted on S3/CloudFront.
- Backend: FastAPI running on AWS Lambda (Serverless) or AWS ECS (Containers).
- AI Logic: Calling AWS Bedrock for the main reasoning.
- Memory/State: Stored in Amazon DynamoDB.
```mermaid
graph TD
    A[User] --> B[API Gateway]
    B --> C[Lambda Function: Python]
    C --> D{AWS Bedrock}
    C --> E[DynamoDB: Session Memory]
    D -- "Tool Call" --> F[S3: Knowledge Base]
```
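The Lambda function at the center of this diagram can be sketched as follows. The DynamoDB steps are left as comments because table and key names are deployment-specific; this is a skeleton of the flow, not a full implementation.

```python
import json

def lambda_handler(event, context):
    """Skeleton of the API Gateway -> Lambda -> Bedrock/DynamoDB flow."""
    body = json.loads(event.get("body") or "{}")
    session_id = body.get("session_id", "anonymous")
    prompt = body.get("prompt", "")

    # 1. (Hypothetical) load prior turns from DynamoDB session memory:
    #    history = table.get_item(Key={"session_id": session_id})
    # 2. Call Bedrock with history + prompt (see the boto3 example in this lesson).
    # 3. Persist the new turn back to DynamoDB for the next request.

    return {
        "statusCode": 200,
        "body": json.dumps({"session_id": session_id, "echo": prompt}),
    }
```

Keeping state in DynamoDB rather than in the Lambda itself is what makes the backend horizontally scalable: any Lambda instance can serve any session.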
Code Concept: Calling Bedrock with boto3
```python
import boto3
import json

# Model inference goes through the "bedrock-runtime" client, not "bedrock".
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Meta Llama models on Bedrock expect "prompt" and "max_gen_len"
# (not "max_tokens" -- each model family has its own request schema).
body = json.dumps({
    "prompt": "Explain quantum physics to a 5-year-old",
    "max_gen_len": 200,
    "temperature": 0.5,
})

response = client.invoke_model(
    body=body,
    modelId="meta.llama3-8b-instruct-v1:0",
)

# The response body is a stream; read and decode it before parsing.
result = json.loads(response["body"].read())
print(result["generation"])
```
Summary
- AWS Bedrock is the "Easy Button" for AI. Use it for most RAG and Agent tasks.
- Amazon SageMaker is the "Extreme Control" option. Use it for fine-tuning and high-performance serving.
- Cloud Integration allows you to connect AI to S3, DynamoDB, and IAM security.
In the next lesson, we will look at Kubernetes for AI, learning how to manage your own GPU clusters for maximum scale.
Exercise: The Architect's Choice
You are building a "Legal Discovery Bot" that needs to process 1,000,000 documents once every year. For the rest of the year, it sits idle.
- Which AWS service (Bedrock or SageMaker) would you choose to save money?
- Why?
Answer Logic: AWS Bedrock. Because it is serverless, you only pay for the tokens you use during the annual discovery run. With SageMaker, you would pay for the idle GPU instances the rest of the year unless you build a fairly complex auto-shutdown setup.
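A back-of-the-envelope calculation makes the trade-off visible. The prices below are hypothetical placeholders, not real AWS pricing; always check the current Bedrock and SageMaker price pages before deciding.

```python
# HYPOTHETICAL prices for illustration only -- not real AWS pricing.
BEDROCK_PRICE_PER_1K_TOKENS = 0.0005   # assumed $/1K tokens
SAGEMAKER_PRICE_PER_HOUR = 1.50        # assumed $/hour for one GPU instance

def bedrock_cost(total_tokens: int) -> float:
    """Pay-per-token: cost is zero when the bot is idle."""
    return total_tokens / 1000 * BEDROCK_PRICE_PER_1K_TOKENS

def sagemaker_cost(hours_running: float) -> float:
    """Pay-per-instance-hour: cost accrues whether or not requests arrive."""
    return hours_running * SAGEMAKER_PRICE_PER_HOUR

# 1,000,000 documents at ~1,000 tokens each, processed once a year:
spike = bedrock_cost(1_000_000 * 1_000)
# A single always-on GPU endpoint for the whole year:
always_on = sagemaker_cost(24 * 365)
print(f"Bedrock spike: ${spike:,.2f}  vs  SageMaker always-on: ${always_on:,.2f}")
```

Under these assumed prices the idle-time cost dominates, which is exactly why spiky, infrequent workloads favor the serverless option.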