Module 13 Lesson 2: Data Privacy and PII
Protecting the user. How to handle Personally Identifiable Information (PII) and ensure your agent is GDPR/CCPA compliant.
Data Privacy and PII: The Trusted Agent
When you send a prompt to an LLM provider (OpenAI, Anthropic, Google), that data leaves your infrastructure. If your user types their credit card number or medical history, it now sits on a server you don't control. As an agentic engineer, you must build a "Privacy Shield."
1. What is PII?
Personally Identifiable Information includes:
- Names, Emails, Phone Numbers.
- IP addresses, MAC addresses.
- Social Security numbers, Bank details.
- Specific medical or legal conditions.
2. The "Pre-Processing" Redaction
Don't send raw PII to the model. Redact it locally first.
- User: "My email is sudeep@example.com"
- Privacy Shield: Replaces the email with [EMAIL_1].
- LLM receives: "My email is [EMAIL_1]"
The model can still reason about the task without needing the specific private string.
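The redaction step above can be sketched with plain regular expressions. This is a minimal illustration, not a complete PII detector: the `PII_PATTERNS` table and the `redact` function are hypothetical names, and the two patterns cover only emails and US-style Social Security numbers.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace each PII match with a numbered placeholder such as [EMAIL_1].

    Returns the redacted text and a mapping from placeholder to original value.
    """
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        seen = {}  # original value -> placeholder, so repeats reuse one token

        def substitute(match, label=label, seen=seen):
            value = match.group(0)
            if value not in seen:
                placeholder = f"[{label}_{len(seen) + 1}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, mapping

redacted, mapping = redact("My email is sudeep@example.com")
print(redacted)  # My email is [EMAIL_1]
```

Numbering the placeholders keeps distinct values distinguishable, so the model can still tell two different emails apart without seeing either one. The mapping stays on your side and never leaves the machine.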
3. Visualizing the Privacy Shield
graph LR
User[User Data] --> Shield[Redaction Node]
Shield -->|Clean Data| Brain[LLM Brain]
Brain -->|Result| Unshield[De-redaction Node]
Unshield -->|Final Result| User
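The flow in the diagram can be sketched as a round trip through three functions. The names `shield`, `unshield`, `fake_llm`, and `run` are hypothetical, and `fake_llm` is a stand-in that only echoes its prompt; a real agent would call a provider's API at that point. Only emails are handled here to keep the sketch short.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def shield(text):
    """Redaction node: swap emails for placeholders and remember the mapping."""
    mapping = {}  # placeholder -> original value

    def substitute(match):
        value = match.group(0)
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder  # reuse the token for a repeated value
        placeholder = f"[EMAIL_{len(mapping) + 1}]"
        mapping[placeholder] = value
        return placeholder

    return EMAIL.sub(substitute, text), mapping

def unshield(text, mapping):
    """De-redaction node: restore original values in the model's output."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

def fake_llm(prompt):
    """Hypothetical stand-in for the LLM call; it only ever sees placeholders."""
    return "You said: " + prompt

def run(user_text):
    clean, mapping = shield(user_text)   # User Data -> Redaction Node
    result = fake_llm(clean)             # Clean Data -> LLM Brain
    return unshield(result, mapping)     # Result -> De-redaction Node

print(run("My email is sudeep@example.com"))
# You said: My email is sudeep@example.com
```

The mapping lives only for the duration of the request, so the original value is restored locally and is never part of what the provider receives.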
4. Privacy as a Business Asset
In regulated industries (health, finance), you often cannot use cloud LLMs unless you have a specific legal agreement with the provider. For US health data covered by HIPAA, that agreement is a Business Associate Agreement (BAA).
- If you don't have the required agreement, you must use Local Models (Module 13 Lesson 3).
5. Engineering Tip: Presidio
Microsoft Presidio is an open-source library designed specifically for this. It uses a mix of regex and ML models to find PII in text and anonymize it before you send it to your agent's brain.
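from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
# Create the engines once and reuse them (the analyzer loads an NLP model)
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
text = "My name is Sudeep"
# 1. Analyze for PII
results = analyzer.analyze(text=text, entities=["PERSON"], language="en")
# 2. Anonymize
anonymized_result = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized_result.text)
# Expected, if the NER model tags "Sudeep" as a PERSON: "My name is <PERSON>"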
Key Takeaways
- Privacy concerns are a common reason enterprise projects stall or are cancelled.
- Always redact PII locally before sending it to the cloud.
- Use an Anonymization Layer to keep your logs and prompts compliant.
- Compliance (GDPR/HIPAA) requires technical guardrails, not just "hope."
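As one illustration of keeping logs clean, the standard `logging` module lets you attach a filter that rewrites each record before it is emitted. This is a sketch under stated assumptions: `RedactEmailFilter` and the `"agent"` logger name are hypothetical, and only emails are masked.

```python
import logging
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class RedactEmailFilter(logging.Filter):
    """Rewrite each record so email addresses never reach a log handler."""

    def filter(self, record):
        message = record.getMessage()  # merge args into the final message first
        record.msg = EMAIL.sub("[EMAIL]", message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record, now redacted

logger = logging.getLogger("agent")
logger.addFilter(RedactEmailFilter())
logger.warning("Prompt from %s received", "sudeep@example.com")
```

A filter attached to a logger only sees records logged through that logger directly. To cover records from child loggers as well, attach the same filter to the handler instead.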