Module 13 Lesson 2: Data Privacy and PII

Protecting the user: how to handle Personally Identifiable Information (PII) and build the technical safeguards your agent needs for GDPR/CCPA compliance.

Data Privacy and PII: The Trusted Agent

When you send a prompt to an LLM provider (OpenAI, Anthropic, Google), you are sending that data over the internet to someone else's infrastructure. If your user types their credit card number or medical history, that data now sits on a server you don't control. As an agentic engineer, you must build a "Privacy Shield."

1. What is PII?

Personally Identifiable Information includes:

  • Names, email addresses, phone numbers.
  • IP addresses, MAC addresses.
  • Social Security numbers, bank account details.
  • Specific medical or legal conditions.
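Structured identifiers in this list (emails, IP and MAC addresses, SSNs, phone numbers) follow recognizable formats, so a first pass can use regular expressions. The sketch below assumes deliberately simplified, US-centric patterns; `PII_PATTERNS` and `find_pii` are hypothetical names for illustration. Names and medical conditions have no fixed format, which is why ML-based tools such as Presidio (section 5) are needed for them.

```python
import re

# Hypothetical, deliberately simplified patterns. Real detectors need
# validation (e.g. checksums) and locale-specific formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only, no range check
    "MAC_ADDRESS": re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?:\+\d{1,3}[ -]?)?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}\b"),
}

def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs in order of appearance."""
    hits = []
    for entity, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), entity, match.group()))
    hits.sort()  # sort by position in the text
    return [(entity, value) for _, entity, value in hits]

print(find_pii("Reach me at sudeep@example.com or 415-555-0132"))
# [('EMAIL', 'sudeep@example.com'), ('PHONE', '415-555-0132')]
```

Regex detection is cheap and runs entirely locally, but it will both miss and over-match; treat it as a baseline, not a complete detector.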

2. The "Pre-Processing" Redaction

Don't send raw PII to the model. Redact it locally first.

  • User: "My email is sudeep@example.com"
  • Privacy Shield: Replaces email with [EMAIL_1].
  • LLM receives: "My email is [EMAIL_1]"

The model can still reason about the task without needing the specific private string.
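The `[EMAIL_1]` substitution above can be sketched in a few lines of Python, assuming emails are the only PII type and a simplified regex is good enough to find them; `redact_emails` and `EMAIL_RE` are hypothetical names. The function returns the redacted text plus a local placeholder map, which is what later lets you restore the real values.

```python
import re

# Hypothetical, simplified email pattern; a real shield covers every PII type.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def redact_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with [EMAIL_n]; return text and placeholder map."""
    mapping: dict[str, str] = {}  # placeholder -> original value (never sent out)
    seen: dict[str, str] = {}     # original value -> placeholder

    def _replace(match: re.Match) -> str:
        value = match.group()
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(_replace, text), mapping

redacted, mapping = redact_emails("My email is sudeep@example.com")
print(redacted)  # My email is [EMAIL_1]
```

Reusing the same placeholder for a repeated value lets the model see that two mentions refer to the same thing without ever seeing the value itself.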


3. Visualizing the Privacy Shield

graph LR
    User[User Data] --> Shield[Redaction Node]
    Shield -->|Clean Data| Brain[LLM Brain]
    Brain -->|Result| Unshield[De-redaction Node]
    Unshield -->|Final Result| User
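The diagram's round trip (redact, call the model, de-redact) can be sketched in Python as follows. This assumes regex detection for two PII types and a hypothetical `llm` callable standing in for a real provider client; `redact`, `unredact`, `shielded_call` and `fake_llm` are illustrative names, not an established API.

```python
import re
from typing import Callable

# Hypothetical, simplified detectors; a real shield would cover more PII types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Redaction Node: swap PII for numbered placeholders such as [EMAIL_1]."""
    mapping: dict[str, str] = {}  # placeholder -> original value (stays local)
    for entity, pattern in PATTERNS.items():
        seen: dict[str, str] = {}  # original value -> placeholder, per entity type

        def _sub(match: re.Match) -> str:
            value = match.group()
            if value not in seen:
                seen[value] = f"[{entity}_{len(seen) + 1}]"
                mapping[seen[value]] = value
            return seen[value]

        text = pattern.sub(_sub, text)
    return text, mapping

def unredact(text: str, mapping: dict[str, str]) -> str:
    """De-redaction Node: restore original values in the model's output."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text

def shielded_call(user_text: str, llm: Callable[[str], str]) -> str:
    """User -> Shield -> Brain -> Unshield -> User, as in the diagram."""
    clean, mapping = redact(user_text)  # only `clean` is sent to the provider
    return unredact(llm(clean), mapping)

sent_prompts: list[str] = []

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a cloud LLM client; records what it receives."""
    sent_prompts.append(prompt)
    return f"Sure, I will email you at {prompt.split()[-1]}."

answer = shielded_call("Contact me at sudeep@example.com", fake_llm)
print(sent_prompts[0])  # Contact me at [EMAIL_1]
print(answer)           # Sure, I will email you at sudeep@example.com.
```

The placeholder map never leaves your process; only placeholders cross the network. One caveat of this design: if the model rewrites a placeholder (for example, drops the brackets), plain string replacement will not restore it.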

4. Privacy as a Business Asset

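In regulated industries such as healthcare and finance, you often cannot send user data to cloud LLMs without a specific legal agreement with the provider. In US healthcare (HIPAA), for example, that agreement is a Business Associate Agreement (BAA).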

  • If you don't have a BAA, you must use Local Models (Module 13 Lesson 3).

5. Engineering Tip: Presidio

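Microsoft Presidio is an open-source library designed specifically for this. It combines pattern-based recognizers (regex, plus checksums for formats such as credit card numbers) with ML-based named-entity recognition (spaCy by default) to find PII in text, then anonymizes it before you send it to your agent's brain. It ships as two packages: presidio-analyzer finds the PII and presidio-anonymizer replaces it. The analyzer also needs a spaCy language model installed (en_core_web_lg by default).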

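from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "My name is Sudeep"

# Create the engines once and reuse them (the analyzer loads a spaCy NER model)
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

# 1. Analyze: detect PERSON entities in the text
results = analyzer.analyze(text=text, entities=["PERSON"], language="en")

# 2. Anonymize: by default each detected span is replaced with <ENTITY_TYPE>
anonymized_result = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized_result.text)
# e.g. "My name is <PERSON>" (if the NER model tags "Sudeep" as a PERSON)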

Key Takeaways

  • Privacy concerns are a common reason enterprise AI projects stall or get cancelled.
  • Always redact PII locally before sending it to the cloud.
  • Use an Anonymization Layer to keep your logs and prompts compliant.
  • Compliance (GDPR/HIPAA) requires technical guardrails, not just "hope."
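The "logs" half of the anonymization-layer takeaway can be sketched with Python's standard `logging` module: a `logging.Filter` attached to a handler rewrites each record before it is written. This minimal sketch assumes email is the only PII type to mask; `RedactPIIFilter` and `EMAIL_RE` are illustrative names, not part of any library.

```python
import logging
import re

# Hypothetical, simplified pattern; reuse your full PII detector in practice.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

class RedactPIIFilter(logging.Filter):
    """Masks emails in every record before the handler writes it out."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()          # merge msg with its %-style args
        record.msg = EMAIL_RE.sub("[EMAIL]", message)
        record.args = ()                       # args are already merged into msg
        return True                            # keep the record, just cleaned

logger = logging.getLogger("agent")
handler = logging.StreamHandler()              # writes to stderr by default
handler.addFilter(RedactPIIFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Prompt from user %s", "sudeep@example.com")
# stderr: Prompt from user [EMAIL]
```

Attaching the filter to the handler rather than the logger means records propagated from child loggers are cleaned too, since logger-level filters only see records logged directly on that logger.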