Module 9 Lesson 1: Bedrock Guardrails
Setting the safety net: how to use Amazon Bedrock Guardrails to filter sensitive content and block inappropriate prompts.
Bedrock Guardrails: The Safety Wrapper
Even with RAG, users might try to trick your AI into saying something offensive or leaking sensitive PII (Personally Identifiable Information). Bedrock Guardrails is a centralized security layer that sits in front of all your models.
1. Core Features of Guardrails
- Content Filters: Block hate speech, violence, or sexual content.
- Denied Topics: Strictly forbid the AI from talking about competitors or politics.
- PII Redaction: Automatically mask or block Social Security numbers, emails, or phone numbers.
- Word Filters: Block specific words or phrases with a custom "banned list" or a managed profanity list. All of these policies are wired up when you create the guardrail, as sketched below.
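A minimal sketch of that creation step using boto3. The guardrail name, denied topic, banned word, and PII choices here are illustrative assumptions, not required values:

import boto3

# The "bedrock" control-plane client manages guardrails;
# the "bedrock-runtime" client (used later) invokes models.
bedrock = boto3.client("bedrock")

guardrail = bedrock.create_guardrail(
    name="support-bot-guardrail",  # hypothetical name
    contentPolicyConfig={"filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
    ]},
    topicPolicyConfig={"topicsConfig": [
        {"name": "Competitors", "type": "DENY",
         "definition": "Any discussion of competing products or vendors."},
    ]},
    wordPolicyConfig={"wordsConfig": [{"text": "foobar"}]},  # custom banned word
    sensitiveInformationPolicyConfig={"piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
    ]},
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that information.",
)
print(guardrail["guardrailId"], guardrail["version"])  # new guardrails start at "DRAFT"

The call returns a working draft; you publish a numbered version before referencing it in production traffic.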
2. How it Works
When you call the Bedrock runtime API, you pass a guardrail identifier and version in guardrailConfig. Bedrock checks the Input (Prompt) AND the Output (Response) against your rules.
import boto3

client = boto3.client("bedrock-runtime")

messages = [{"role": "user", "content": [{"text": "What is your refund policy?"}]}]

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=messages,
    guardrailConfig={
        "guardrailIdentifier": "YOUR_GUARDRAIL_ID",  # guardrail ID or ARN
        "guardrailVersion": "1",                     # a published version, or "DRAFT"
    },
)
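When a guardrail fires, the Converse API reports it in the response's stopReason field. A minimal sketch of handling it, assuming the response object from the call above:

if response["stopReason"] == "guardrail_intervened":
    # The returned text is your configured blocked message, not model output.
    print("Blocked:", response["output"]["message"]["content"][0]["text"])
else:
    print(response["output"]["message"]["content"][0]["text"])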
3. Visualizing the Filter
graph TD
    User[Prompt] --> G1[Guardrail Input Check]
    G1 -->|Blocked| Err[Reject Prompt]
    G1 -->|Valid| Model[LLM Logic]
    Model --> G2[Guardrail Output Check]
    G2 -->|Blocked| Redact[Redact PII/Topic]
    G2 -->|Valid| Final[Safe User Response]
4. Why Use Guardrails over System Prompts?
- Consistency: One guardrail can be applied to 10 different models/apps, and even to non-Bedrock traffic through the standalone ApplyGuardrail API (see the sketch after this list).
- Reporting: CloudWatch logs record every time a guardrail is triggered, helping you identify attackers or problematic usage patterns.
- Reliability: Filtering is enforced by the service on every request, not by a text instruction inside the prompt that the model might ignore or a jailbreak might override.
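A minimal sketch of that standalone check, reusing the hypothetical guardrail ID and version from earlier. The source parameter says whether you are screening a prompt ("INPUT") or a model response ("OUTPUT"):

import boto3

runtime = boto3.client("bedrock-runtime")

# Evaluate arbitrary text against the guardrail; no model invocation needed.
result = runtime.apply_guardrail(
    guardrailIdentifier="YOUR_GUARDRAIL_ID",
    guardrailVersion="1",
    source="INPUT",
    content=[{"text": {"text": "My SSN is 123-45-6789"}}],
)

if result["action"] == "GUARDRAIL_INTERVENED":
    # "outputs" carries the blocked message or the redacted text.
    print(result["outputs"][0]["text"])

Because this API is decoupled from model calls, the same policy can screen text headed to a self-hosted or third-party model.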
Summary
- Guardrails provide a cross-model security layer.
- They can filter content, block topics, and redact PII.
- They check both input and output.
- This is the standard approach for enterprise compliance and safety.