Responsible AI: Principles, Bias, and Safety

AI is powerful but dangerous. Learn Google's 7 AI Principles and how to identify and mitigate bias in your models.

The Responsibility Imperative

As a leader, you are not just responsible for the profit your AI generates; you are responsible for its impact. If your AI loan-approval agent systematically denies loans to women, you will be sued, and you will lose customer trust.

Google Cloud places Responsible AI at the center of the certification. You must know the 7 Principles and the types of bias.


1. Google's 7 AI Principles

You don't need to memorize them word for word, but you do need to know their spirit.

  1. Be socially beneficial. (Does this help society?)
  2. Avoid creating or reinforcing unfair bias. (The big one).
  3. Be built and tested for safety. (Don't kill people).
  4. Be accountable to people. (Humans must be able to appeal/understand decisions).
  5. Incorporate privacy design principles. (Don't leak data).
  6. Uphold high standards of scientific excellence.
  7. Be made available for uses that accord with these principles. (Don't sell facial recognition for mass surveillance).

2. Understanding Bias

Models aren't mathematically "neutral." They are trained on the internet, and in that historical data "Doctor" is linked more often to "Man," and "Nurse" to "Woman." If uncorrected, the model replicates these associations.

Types of Harm

  • Representational Harms: Reinforcing stereotypes (e.g., an image generator only showing white people for the prompt "CEO").
  • Allocative Harms: Denying resources (credit, housing, jobs) to a specific group based on race, gender, or age (a quick check for this follows the list).
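
A quick way to spot allocative harm before launch is to compare outcome rates across groups. Below is a minimal sketch, assuming a hypothetical pandas DataFrame of loan decisions with illustrative "gender" and "approved" columns (not real data):

import pandas as pd

# Hypothetical decision log: one row per applicant (illustrative values only).
decisions = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "approved": [0,   1,   0,   0,   1,   1,   1,   0],
})

# Approval rate per group; a large gap is a red flag for allocative harm.
rates = decisions.groupby("gender")["approved"].mean()
print(rates)
print(f"Parity gap: {rates.max() - rates.min():.2f}")  # 0.50 in this toy data

A gap that large between otherwise comparable groups is the signal to stop and investigate before launch.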

Mitigation Strategies

  • Dataset Auditing: Check your training data. Is it balanced? (A minimal audit sketch follows this list.)
  • Safety Filters: Using Vertex AI Safety Filters to block toxic output.
  • Red Teaming: Hiring people to try to break your model (make it say racist things) before you launch.
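
Dataset auditing can start with something as simple as counting who is represented. A minimal sketch, assuming a hypothetical training table ("training_data.csv") with a gender column and a binary label column; adapt the names to your own data:

import pandas as pd

# Hypothetical training data for a resume-screening model (assumed columns: gender, label).
train = pd.read_csv("training_data.csv")

# Share of examples per group: is any group badly under-represented?
print(train["gender"].value_counts(normalize=True))

# Share of positive labels per group: does the historical labeling itself skew?
print(train.groupby("gender")["label"].mean())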

3. Vertex AI Safety Filters

When you use Gemini, you aren't just getting the raw model. You are getting a model wrapped in safety layers.

You can configure thresholds for:

  • Hate Speech
  • Harassment
  • Sexually Explicit Content
  • Dangerous Content (How to make a bomb)

Code Example: Configuring Safety

In the SDK, you set a threshold per category, ranging from BLOCK_LOW_AND_ABOVE (strict) to BLOCK_ONLY_HIGH (permissive).

import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
)

# Assumes a configured GCP project and region; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,  # Strict
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
]

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Tell me a hateful joke", safety_settings=safety_settings
)

# When a prompt trips a filter, the blocked candidate carries no text, and
# accessing response.text raises an error. Check the finish reason instead.
print(response.candidates[0].finish_reason)
# Output: FinishReason.SAFETY
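
To see why a candidate was blocked, you can also inspect its per-category safety ratings. A minimal sketch, assuming the safety_ratings attribute exposed by the current vertexai SDK; verify against your installed version:

# Each rating reports the harm category and how likely the content is harmful.
for rating in response.candidates[0].safety_ratings:
    print(rating.category, rating.probability)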

4. Summary

  • Responsible AI is not "politeness"; it is risk management.
  • Bias in data leads to bias in outcomes.
  • Safety Filters are your first line of defense in Vertex AI.
  • Human Accountability means you (the leader) are responsible for the AI's mistakes.

In the next lesson, we discuss Data Governance. How do we stop the model from memorizing our secrets?


Knowledge Check

You are deploying a chatbot for a hiring portal. During testing, you notice the bot consistently ranks resumes from one demographic lower than others, even when qualifications are identical. Which AI Principle is being violated?
