
Hosted Embeddings via Bedrock
Leverage AWS Bedrock for enterprise-grade, scalable, and secure multimodal embeddings.
For enterprise RAG systems running on AWS, Amazon Bedrock provides a secure way to access foundation models (such as Amazon Titan and Cohere Embed) through a single API, without managing servers.
The AWS Bedrock Advantage
- Security & Compliance: Data stays within your AWS perimeter, which helps satisfy HIPAA, SOC 2, and GDPR requirements.
- Standardized API: One API to access embedding models from multiple providers (see the sketch after this list).
- Pay-as-you-go: Direct integration with AWS billing.
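The standardized API is easy to see in practice: the control-plane client can enumerate every embedding model available to your account, regardless of provider. A minimal sketch, assuming your AWS credentials and region are already configured:

import boto3

# Control-plane client ("bedrock"), distinct from the "bedrock-runtime" client used for inference
bedrock_control = boto3.client(service_name="bedrock")

# List foundation models whose output modality is an embedding vector
response = bedrock_control.list_foundation_models(byOutputModality="EMBEDDING")
for model in response["modelSummaries"]:
    print(model["providerName"], model["modelId"])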
Using Amazon Titan Multimodal Embeddings
Amazon Titan Multimodal Embeddings is designed for RAG systems that mix text and visual data: it maps text and images into a shared embedding space, so a text query can retrieve relevant images and vice versa.
Implementation with Boto3
import boto3
import json
import base64

# Bedrock Runtime client; region and credentials come from your AWS configuration
bedrock = boto3.client(service_name='bedrock-runtime')

def get_titan_multimodal_embedding(text=None, image_path=None):
    """Embed text, an image, or both with Amazon Titan Multimodal Embeddings."""
    body = {}
    if text:
        body["inputText"] = text
    if image_path:
        # Titan expects the image as a base64-encoded string
        with open(image_path, "rb") as image_file:
            body["inputImage"] = base64.b64encode(image_file.read()).decode('utf8')

    response = bedrock.invoke_model(
        body=json.dumps(body),
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )
    response_body = json.loads(response.get('body').read())
    return response_body.get('embedding')
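As a quick usage sketch (the caption and file name below are placeholders), the same helper can embed a text query, an image, or both at once:

# Hypothetical inputs: any short caption and any local image file will do
query_vector = get_titan_multimodal_embedding(text="red running shoes")
image_vector = get_titan_multimodal_embedding(image_path="shoes.jpg")

# Titan Multimodal returns 1024-dimensional vectors by default
print(len(query_vector), len(image_vector))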
Model Selection on Bedrock
| Model ID | Provider | Supported Modalities |
|---|---|---|
| amazon.titan-embed-text-v1 | Amazon | Text |
| amazon.titan-embed-image-v1 | Amazon | Text, Image |
| cohere.embed-english-v3 | Cohere | Text |
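Switching models is mostly a matter of changing the modelId, but each provider defines its own request schema. As a hedged sketch of the Cohere side (field names follow Cohere's Embed v3 request format on Bedrock), the body takes a list of texts and an input_type rather than Titan's inputText:

import boto3
import json

bedrock = boto3.client(service_name="bedrock-runtime")

# Cohere Embed expects a list of texts plus an input_type hint
body = json.dumps({
    "texts": ["What is our refund policy?"],
    "input_type": "search_query"  # use "search_document" when embedding your corpus
})

response = bedrock.invoke_model(
    body=body,
    modelId="cohere.embed-english-v3",
    accept="application/json",
    contentType="application/json"
)
embeddings = json.loads(response["body"].read())["embeddings"]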
Best Practices for Cloud Embeddings
- Error Handling: Implement exponential backoff for ThrottlingException errors (a sketch follows this list).
- Latency Optimization: Use the invoke_model endpoint for single queries and batch processing for bulk ingestion.
- Regions: Keep your Bedrock model in the same AWS region as your vector database (e.g., Pinecone or Aurora) to minimize latency.
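One way to implement the backoff is a small retry wrapper around invoke_model; a minimal sketch, with arbitrary retry counts and delays:

import json
import time
from botocore.exceptions import ClientError

def invoke_with_backoff(bedrock, body, model_id, max_retries=5, base_delay=1.0):
    """Retry invoke_model with exponential backoff on ThrottlingException."""
    for attempt in range(max_retries):
        try:
            return bedrock.invoke_model(
                body=json.dumps(body),
                modelId=model_id,
                accept="application/json",
                contentType="application/json"
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise  # only retry throttling; surface other errors immediately
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("Bedrock request was still throttled after all retries")

boto3 can also handle retries for you through its built-in configuration (for example, passing botocore.config.Config(retries={"mode": "adaptive"}) when creating the client).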
Exercises
- Set up an AWS account and enable the Titan Multimodal Embeddings model.
- Use the code above to embed a local image.
- What is the "Price per 1M characters" for the Titan model compared to OpenAI?