Module 9 Lesson 1: Bedrock Guardrails
Setting the safety net: how to use Amazon Bedrock Guardrails to filter sensitive content and block inappropriate prompts.
Bedrock Guardrails: The Safety Wrapper
Even with RAG, users might try to trick your AI into saying something offensive or leaking sensitive PII (Personally Identifiable Information). Bedrock Guardrails is a centralized security layer that sits in front of all your models.
1. Core Features of Guardrails
- Content Filters: Block hate speech, violence, or sexual content.
- Denied Topics: Strictly forbid the AI from talking about competitors or politics.
- PII Redaction: Automatically mask or block Social Security numbers, emails, or phone numbers.
- Word Filters: Block specific words or phrases with a custom "banned list" or a managed profanity list. All of these policies are wired up when you create the guardrail, as sketched below.
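A minimal sketch of that creation step using boto3. The guardrail name, denied topic, banned word, and PII choices here are illustrative assumptions, not required values:

import boto3

# The "bedrock" control-plane client manages guardrails;
# the "bedrock-runtime" client (used later) invokes models.
bedrock = boto3.client("bedrock")

guardrail = bedrock.create_guardrail(
    name="support-bot-guardrail",  # hypothetical name
    contentPolicyConfig={"filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
    ]},
    topicPolicyConfig={"topicsConfig": [
        {"name": "Competitors", "type": "DENY",
         "definition": "Any discussion of competing products or vendors."},
    ]},
    wordPolicyConfig={"wordsConfig": [{"text": "foobar"}]},  # custom banned word
    sensitiveInformationPolicyConfig={"piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
    ]},
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that information.",
)
print(guardrail["guardrailId"], guardrail["version"])  # new guardrails start at "DRAFT"

The call returns a working draft; you publish a numbered version before referencing it in production traffic.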
2. How it Works
When you call the Bedrock runtime API, you pass a guardrail identifier and version in guardrailConfig. Bedrock checks the Input (Prompt) AND the Output (Response) against your rules.
import boto3

client = boto3.client("bedrock-runtime")

messages = [{"role": "user", "content": [{"text": "What is your refund policy?"}]}]

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=messages,
    guardrailConfig={
        "guardrailIdentifier": "YOUR_GUARDRAIL_ID",  # guardrail ID or ARN
        "guardrailVersion": "1",                     # a published version, or "DRAFT"
    },
)
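When a guardrail fires, the Converse API reports it in the response's stopReason field. A minimal sketch of handling it, assuming the response object from the call above:

if response["stopReason"] == "guardrail_intervened":
    # The returned text is your configured blocked message, not model output.
    print("Blocked:", response["output"]["message"]["content"][0]["text"])
else:
    print(response["output"]["message"]["content"][0]["text"])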
3. Visualizing the Filter
graph TD
    User[Prompt] --> G1[Guardrail Input Check]
    G1 -->|Blocked| Err[Reject Prompt]
    G1 -->|Valid| Model[LLM Logic]
    Model --> G2[Guardrail Output Check]
    G2 -->|Blocked| Redact[Redact PII/Topic]
    G2 -->|Valid| Final[Safe User Response]
4. Why Use Guardrails over System Prompts?
- Consistency: One guardrail can be applied to 10 different models/apps, and even to non-Bedrock traffic through the standalone ApplyGuardrail API (see the sketch after this list).
- Reporting: CloudWatch logs record every time a guardrail is triggered, helping you identify attackers or problematic usage patterns.
- Reliability: Filtering is enforced by the service on every request, not by a text instruction inside the prompt that the model might ignore or a jailbreak might override.
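A minimal sketch of that standalone check, reusing the hypothetical guardrail ID and version from earlier. The source parameter says whether you are screening a prompt ("INPUT") or a model response ("OUTPUT"):

import boto3

runtime = boto3.client("bedrock-runtime")

# Evaluate arbitrary text against the guardrail; no model invocation needed.
result = runtime.apply_guardrail(
    guardrailIdentifier="YOUR_GUARDRAIL_ID",
    guardrailVersion="1",
    source="INPUT",
    content=[{"text": {"text": "My SSN is 123-45-6789"}}],
)

if result["action"] == "GUARDRAIL_INTERVENED":
    # "outputs" carries the blocked message or the redacted text.
    print(result["outputs"][0]["text"])

Because this API is decoupled from model calls, the same policy can screen text headed to a self-hosted or third-party model.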
Summary
- Guardrails provide a cross-model security layer.
- They can filter content, block topics, and redact PII.
- They check both input and output.
- This is the standard approach for enterprise compliance and safety.