Guardrails and Constraints: The Brakes of Autonomous Systems

Ensure your Gemini agents operate safely and within enterprise boundaries. Learn to implement probabilistic and deterministic guardrails, manage prohibited actions, and design robust safety filters.

As we build increasingly powerful agents with the Gemini ADK, we must confront a difficult truth: Intelligence without control is dangerous. An autonomous agent tasked with "maximizing revenue" might accidentally violate an SEC regulation. An agent tasked with "helping users" might accidentally leak another user's private data (PII).

In the world of AI Engineering, Guardrails are the technical and logical barriers that keep an agent from causing harm. In this lesson, we will explore how to implement Probabilistic Guardrails (inside the prompt) and Deterministic Guardrails (inside the code), and how to build a multi-layered safety architecture for your agents.


1. Probabilistic vs. Deterministic Guardrails

To build a secure system, you need both "Soft" and "Hard" brakes.

A. Probabilistic Guardrails (Soft Brakes)

These are instructions written into the System Prompt.

  • Example: "Never mention a competitor by name."
  • How they work: They influence the model's likelihood of generating certain tokens.
  • Risk: They can be "jailbroken" if a user is clever enough to trick the model's reasoning.

B. Deterministic Guardrails (Hard Brakes)

These are written in code (Python/Middleware) and run outside the model.

  • Example: A regex check that blocks any response containing a Social Security Number (a minimal sketch follows this list).
  • How they work: They are binary and non-negotiable. If the condition is met, the response is blocked, regardless of what the LLM says.
  • Risk: They can be brittle and might miss "obfuscated" threats.
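
As a concrete illustration of the hard brake above, here is a minimal sketch of a regex check that blocks a US Social Security Number. The pattern and function name are illustrative, not a complete PII detector.

import re

# Minimal sketch of a hard brake: block any response containing a
# US SSN-style pattern (illustrative only, not an exhaustive PII check).
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

def block_if_ssn(response_text: str) -> str:
    if SSN_PATTERN.search(response_text):
        # Deterministic: the response is blocked no matter what the LLM "intended".
        return "Response blocked: possible SSN detected."
    return response_text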

2. Categories of Constraints

When designing an agent's "Rules of Engagement," we group constraints into four primary buckets.

1. Safety and Ethics (The "Never" List)

  • Prohibited Content: No hate speech, harassment, or self-harm instructions.
  • Bias Mitigation: "Always provide multiple viewpoints for politically sensitive topics."

2. Privacy and Security (The "PII" Border)

  • Data Leakage: "Do not repeat back the user's API keys or passwords."
  • System Access: "You are an agent for Database X only. Never attempt to connect to Database Y."

3. Operational Integrity (The "Budget" Limit)

  • Resource Limits: "You have a maximum of 5 turns to solve this. If you fail, stop and escalate." (A sketch of enforcing this budget in code follows these categories.)
  • Tone & Brand: "Always speak in a professional, neutral tone. Never use emojis."

4. Legal and Regulatory (The "Compliance" Gate)

  • Financial Advice: "Never provide specific 'Buy' recommendations."
  • Medical Advice: "Always include a disclaimer that you are an AI, not a doctor."
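
Most of these constraints can be phrased in the prompt, but the "Budget" limit above is easy to enforce in code as well. A minimal sketch, where run_one_turn and escalate_to_human are hypothetical placeholders for your agent step and your escalation path:

MAX_TURNS = 5  # operational budget: "maximum of 5 turns to solve this"

def run_one_turn(task: str, turn: int) -> dict:
    # Hypothetical placeholder for a single agent step (call your model here).
    return {"done": False, "answer": None}

def escalate_to_human(task: str, reason: str) -> str:
    # Hypothetical placeholder for your escalation path (ticket, queue, email...).
    return f"Escalated to a human operator: {reason}"

def run_with_turn_budget(task: str) -> str:
    for turn in range(MAX_TURNS):
        result = run_one_turn(task, turn)
        if result["done"]:
            return result["answer"]
    # Hard stop: the agent does not get to keep working past its budget.
    return escalate_to_human(task, reason="turn budget exhausted")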

3. The "Negative Prompting" Strategy

Negative prompting is the practice of telling the model what NOT to do. In Gemini 1.5, negative constraints are highly effective if they are specific and placed in the System Instructions.

Weak Constraint:

"Don't be mean."

Strong (Professional) Constraint:

"NEVER use derogatory language, sarcasm, or aggressive tones. If the user becomes hostile, maintain a neutral demeanor and politely suggest ending the session."


4. Architectural Pattern: The "Validator" Agent

For high-stakes applications, we use a Two-Agent Architecture.

  1. The Worker Agent: Performs the task and generates a response.
  2. The Validator Agent: Reviews the Worker's response against a set of safety criteria before it is sent to the user.

graph TD
    A[User Request] --> B[Worker Agent]
    B -->|Proposed Response| C{Validator Agent}
    C -->|Safe| D[Deliver to User]
    C -->|Unsafe| E[Reject & Re-generate]
    
    style C fill:#EA4335,color:#fff
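
A minimal sketch of this two-agent loop with the google-generativeai SDK follows; the validation criteria, the SAFE/UNSAFE verdict format, and the retry count are assumptions, not a fixed API:

import google.generativeai as genai

# genai.configure(api_key="YOUR_API_KEY")

worker = genai.GenerativeModel('gemini-1.5-flash')
validator = genai.GenerativeModel(
    'gemini-1.5-flash',
    system_instruction=(
        "You are a safety reviewer. Reply with exactly SAFE or UNSAFE. "
        "Flag PII, financial advice, and hostile language as UNSAFE."
    ),
)

def answer_with_validation(user_request: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        proposed = worker.generate_content(user_request).text
        verdict = validator.generate_content(
            "Review this draft response:\n\n" + proposed
        ).text.strip().upper()
        if verdict.startswith("SAFE"):
            return proposed  # Deliver to user
        # Unsafe: reject and loop to re-generate
    return "I'm sorry, I can't provide a safe response to that request."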

5. Implementation: The "Safety Filter" Middleware

Let's look at how to implement a deterministic guardrail in a Python agent using a simple validation function.

import re
import google.generativeai as genai

# Configure the client before use, e.g.:
# genai.configure(api_key="YOUR_API_KEY")

# 1. Define a Deterministic Guardrail (Regex for PII)
def contains_sensitive_data(text: str) -> bool:
    # Very simple check for a US-style phone number (e.g., 555-123-4567)
    phone_pattern = r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    return bool(re.search(phone_pattern, text))

# 2. Wrap the Agent Call
model = genai.GenerativeModel('gemini-1.5-flash')

def execute_agent_safely(prompt: str) -> str:
    response = model.generate_content(prompt)
    output = response.text

    # Check against our Hard Brake before anything reaches the user
    if contains_sensitive_data(output):
        return "ERROR: The agent generated sensitive data and was blocked for safety."

    return output

# result = execute_agent_safely("Tell me the phone number of the customer...")

6. Prohibited Actions and "The Red Line"

In an agentic system with tools, you must define Red Lines: actions that the code prevents the model from taking, even if it "wants" to.

Example: The "Delete" tool

Instead of just a delete_file(filename) tool, we build a safe_delete_file(filename) tool that checks if the filename is in a PROTECTED_FILES list before executing. This ensures that even if the agent is "jailbroken" or "hallucinates," it cannot delete the system configuration.
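
A minimal sketch of such a red line; the PROTECTED_FILES list and its contents are illustrative:

import os

# Files the agent may NEVER delete, regardless of what the model outputs.
PROTECTED_FILES = {"config.yaml", ".env", "credentials.json"}

def safe_delete_file(filename: str) -> str:
    if os.path.basename(filename) in PROTECTED_FILES:
        # Red line: the code refuses, even if the model was tricked into asking.
        return f"BLOCKED: '{filename}' is a protected file and cannot be deleted."
    os.remove(filename)
    return f"Deleted '{filename}'."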


7. Psychological Guardrails: The "Human in the Loop"

The ultimate guardrail is a human.

  • Low Risk: Autonomous execution.
  • Medium Risk: Logic check by another AI (Validator).
  • High Risk: Final approval by a human.

Golden Rule: If an agent is doing something that cannot be undone (sending money, deleting data, emailing a customer), a human must be in the loop.
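
As a minimal sketch of that rule, irreversible tools can be wrapped in a confirmation gate; the input() prompt below stands in for whatever approval workflow you actually use:

IRREVERSIBLE_ACTIONS = {"send_money", "delete_data", "email_customer"}

def execute_action(action: str, payload: dict) -> str:
    # Hypothetical dispatcher; replace the return values with your real tool calls.
    if action in IRREVERSIBLE_ACTIONS:
        approval = input(f"Agent wants to run '{action}' with {payload}. Approve? [y/N] ")
        if approval.strip().lower() != "y":
            return f"'{action}' was cancelled by the human reviewer."
    return f"'{action}' executed."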


8. Summary and Exercises

Guardrails are what make agents Production-Ready.

  • Use System Instructions for probabilistic behavioral control.
  • Use Middleware for deterministic data control.
  • Implement the Validator Pattern for high-stakes reasoning.
  • Define Physical Red Lines in your tool code.

Exercises

  1. Guardrail Design: You are building an agent for a school. Define three probabilistic guardrails and one deterministic guardrail to ensure student safety.
  2. Vulnerability Hunt: Write a prompt that tries to "trick" an agent into giving you its internal system instructions (this is called "Prompt Injection"). Now, rewrite the system instructions to prevent this "leak."
  3. Code Implementation: Modify the "Safety Filter" Python example above to also check for common "curse words" using a list of strings.

In the next module, we leave the "Instructions" behind and dive into the "Hands" of our agents: Tools and Action Execution.
