
Audit Logging: Tracking the AI's Reading Habits
Learn how to record every interaction with your vector data. Master the art of 'Retrieval Auditing' for compliance and security.
In traditional databases, we log who deleted a record. In vector databases, we must also log who read a record. If an employee queries for "Internal financial projections" every day, your security team needs to know, even if that employee never actually downloads a file.
In this lesson, we learn how to implement Retrieval Auditing.
1. The Audit Log Schema
A good vector audit log doesn't just store the query; it also stores what the database decided to return.
Key Fields:
- timestamp: When did the search happen?
- user_id: Who made the request?
- query_text: What was the literal string searched?
- retrieved_ids: A list of the IDs that were returned as Top-K results.
- similarity_scores: How "close" was the match? (Very high scores on sensitive queries are red flags).
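The fields above can be sketched as a small record type. This is a minimal illustration, not a required format: the `RetrievalAuditRecord` class, the `make_record` helper, and the shape of `matches` (a list of dicts with `id` and `score` keys) are assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class RetrievalAuditRecord:
    """One audit entry per search, using the field names from this lesson."""
    timestamp: str
    user_id: str
    query_text: str
    retrieved_ids: list[str]
    similarity_scores: list[float]

    def to_json(self) -> str:
        # One JSON object per entry keeps the log easy to ship and parse.
        return json.dumps(asdict(self))


def make_record(user_id, query_text, matches):
    # `matches` is assumed to be a list of {"id": ..., "score": ...} dicts.
    return RetrievalAuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id,
        query_text=query_text,
        retrieved_ids=[m["id"] for m in matches],
        similarity_scores=[m["score"] for m in matches],
    )


# Hypothetical sample data, only to show the resulting shape.
record = make_record(
    "user_44",
    "internal financial projections",
    [{"id": "doc_17", "score": 0.91}, {"id": "doc_03", "score": 0.84}],
)
print(record.to_json())
```

Storing the IDs and scores as parallel lists keeps each entry flat; a list of (id, score) pairs would work equally well if your log pipeline prefers it.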
2. Real-Time Security Alerts
You can use your audit log to trigger alerts for suspicious behavior:
- Broad Scanning: One user searching for many different sensitive topics in a short time.
- Top-K Explosion: A user requesting top_k=500 to "exfiltrate" as much data as possible in one go.
- Access Violations: A query that attempted to use a namespace it didn't have access to.
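The three alert types above could be checked with a few simple rules, sketched below. Everything named here is an assumption for illustration: the thresholds, the keyword list in `SENSITIVE_TOPICS`, the `top_k` and `namespace` event fields, and the use of plain seconds for `timestamp`. A production system would more likely run rules like these inside its log platform.

```python
from collections import defaultdict, deque

# Hypothetical thresholds and keyword list; tune these for your own data.
MAX_TOP_K = 50
SCAN_WINDOW_SECONDS = 600
SCAN_TOPIC_LIMIT = 3
SENSITIVE_TOPICS = {"payroll", "financial", "merger", "salary"}

# user_id -> deque of (timestamp, topic) for recent sensitive queries
_recent_hits = defaultdict(deque)


def check_event(event, allowed_namespaces):
    """Return the names of the alerts triggered by one audit event."""
    alerts = []

    # Top-K Explosion: an unusually large result request.
    if event["top_k"] > MAX_TOP_K:
        alerts.append("top_k_explosion")

    # Access Violation: the namespace is not in the user's allowed set.
    if event["namespace"] not in allowed_namespaces.get(event["user_id"], set()):
        alerts.append("access_violation")

    # Broad Scanning: many distinct sensitive topics within the time window.
    hits = _recent_hits[event["user_id"]]
    words = set(event["query_text"].lower().split())
    for topic in SENSITIVE_TOPICS & words:
        hits.append((event["timestamp"], topic))
    while hits and event["timestamp"] - hits[0][0] > SCAN_WINDOW_SECONDS:
        hits.popleft()
    if len({topic for _, topic in hits}) >= SCAN_TOPIC_LIMIT:
        alerts.append("broad_scanning")

    return alerts


# Hypothetical events, with timestamps in seconds.
allowed = {"u1": {"hr"}}
events = [
    {"user_id": "u1", "timestamp": 0.0, "query_text": "payroll summary", "top_k": 5, "namespace": "hr"},
    {"user_id": "u1", "timestamp": 60.0, "query_text": "merger plans", "top_k": 500, "namespace": "hr"},
    {"user_id": "u1", "timestamp": 120.0, "query_text": "salary bands", "top_k": 5, "namespace": "finance"},
]
all_alerts = [check_event(e, allowed) for e in events]
print(all_alerts)
```

Keyword matching is a deliberately crude stand-in for topic detection; it is only here to make the sliding-window logic concrete.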
3. Implementation: The Logging Middleware
Since we are using the Middleman Pattern (Module 16.1), we log the interaction in our Python API.
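    import json
    import logging

    # The root logger defaults to WARNING, so INFO-level audit events
    # would be silently dropped without this configuration.
    logging.basicConfig(level=logging.INFO)
    audit_log = logging.getLogger("audit")

    def secure_vector_search(user_id, query_str):
        # `index` and `model` are assumed to be defined elsewhere
        # (the vector index client and the embedding model).
        # 1. Perform Search
        results = index.query(vector=model.encode(query_str), top_k=5)
        # 2. Extract Document IDs for the log
        doc_ids = [m['id'] for m in results['matches']]
        # 3. Write to Audit Log (or ELK stack / Datadog) as one JSON line
        audit_log.info(json.dumps({
            "event": "retrieval",
            "user": user_id,
            "query": query_str,
            "docs": doc_ids
        }))
        return results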
4. Compliance: Proof of Deletion
Under laws like GDPR, you must be able to show that a user's data was erased when they asked for it. Your audit logs are the evidence.
If a user submits a "Forget Me" request, you run a delete operation on the vector DB. Your audit log should record the event, for example: {"event": "deletion", "subject": "user_xyz_data"}. This log entry is the record you can show during a compliance audit.
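A deletion record like the one above could be written by a small wrapper around the delete call. This is a sketch under stated assumptions: `forget_user`, `delete_fn`, and the extra `deleted_count` and `timestamp` fields are hypothetical additions for illustration, and an in-memory dict stands in for the vector DB so the example runs on its own.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")


def forget_user(subject_id, vector_ids, delete_fn):
    """Delete a subject's vectors, then log evidence of the deletion.

    `delete_fn` is a hypothetical callable wrapping your vector DB's
    delete operation. The log entry is written only after it returns,
    so a failed delete never produces a "deletion" record.
    """
    delete_fn(vector_ids)
    entry = {
        "event": "deletion",
        "subject": subject_id,
        "deleted_count": len(vector_ids),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.info(json.dumps(entry))
    return entry


# In-memory stand-in for a real vector store.
fake_store = {"vec_1": [0.1], "vec_2": [0.2], "vec_3": [0.3]}


def fake_delete(ids):
    for vector_id in ids:
        fake_store.pop(vector_id, None)


entry = forget_user("user_xyz_data", ["vec_1", "vec_2"], fake_delete)
```

Logging a subject identifier and a count, rather than the deleted content, keeps the audit trail from retaining the very data that was removed.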
5. Summary and Key Takeaways
- Log the Output: Record which document IDs were retrieved, not just the query.
- Watch the Scores: High-similarity matches on sensitive topics require investigation.
- Immutable Logs: Send your audit logs to a separate system (like CloudWatch or Splunk) where they cannot be deleted by the database administrator.
- Visibility: Use dashboards to visualize "Most Retrieved Documents." (If a "Secret" doc is in the Top 10, your system has a leakage risk).
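The "Most Retrieved Documents" view can be approximated with a simple count over the audit log. The sketch below assumes each log line is a JSON object with the `event` and `docs` fields used in this lesson's retrieval entries; the `SENSITIVE_DOCS` set and both function names are hypothetical.

```python
import json
from collections import Counter

# Hypothetical set of document IDs that should never be widely retrieved.
SENSITIVE_DOCS = {"ceo_private_payroll_spreadsheet"}


def most_retrieved(log_lines, n=10):
    """Count how often each document ID appears in retrieval events."""
    counts = Counter()
    for line in log_lines:
        entry = json.loads(line)
        if entry.get("event") == "retrieval":
            counts.update(entry["docs"])
    return counts.most_common(n)


def flag_sensitive(top_docs):
    """Return the sensitive document IDs that appear in a top-N list."""
    return [doc for doc, _ in top_docs if doc in SENSITIVE_DOCS]


# Hypothetical log lines for illustration.
lines = [
    json.dumps({"event": "retrieval", "user": "User_44", "query": "How do I quit?",
                "docs": ["resignation_template", "hr_severance_policy_2024"]}),
    json.dumps({"event": "retrieval", "user": "User_12", "query": "severance",
                "docs": ["hr_severance_policy_2024"]}),
    json.dumps({"event": "deletion", "subject": "user_xyz_data"}),
]
top = most_retrieved(lines)
print(top)
print(flag_sensitive(top))
```

A real dashboard would compute the same aggregation inside the log platform; the point here is only that the retrieved IDs must be in the log for this view to exist at all.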
Exercise: The Security Auditor
- You see a log entry: User_44: query='How do I quit?', docs=['resignation_template', 'hr_severance_policy_2024'].
- Is this a security risk?
- The Question: If the docs list also included ceo_private_payroll_spreadsheet, what would be your first 3 steps to fix the system?