
Audit Logging in RAG Systems
Implement a comprehensive logging strategy to track data lineage, user queries, and system responses for compliance.
If your RAG system makes a mistake, you need to know who asked the question, which documents were retrieved, and why the model answered the way it did. That is the goal of audit logging.
What to Log?
A complete RAG log entry should include (a schema sketch follows the list):
- Timestamp and user ID.
- The original query.
- The pre-processed query (if you use multi-query or query rewriting).
- Retrieved document IDs and their similarity scores.
- Applied metadata filters.
- The full prompt sent to the LLM (including the system prompt).
- The generated response.
- Confidence score / verification results.
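These fields can be made explicit with a lightweight schema. Below is a minimal sketch using Python's TypedDict; the field names are illustrative, not a standard.

from typing import TypedDict

class RagLogEvent(TypedDict, total=False):
    timestamp: str          # ISO-8601, UTC
    user_id: str
    query: str              # the original user query
    rewritten_query: str    # after multi-query or rewriting
    filters: dict           # applied metadata filters
    retrieved: list[str]    # retrieved document IDs
    scores: list[float]     # similarity scores
    prompt: str             # full prompt, including the system prompt
    response: str           # generated response
    confidence: float       # confidence / verification result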
Implementation with Structured Logging
import json
import logging
from datetime import datetime, timezone

# In production, route this handler to CloudWatch or a dedicated audit DB.
logging.basicConfig(level=logging.INFO)

def log_rag_event(event: dict) -> None:
    # Emit the event as a single JSON line for easy ingestion and querying.
    logging.info(json.dumps(event))

# Example event
rag_log = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "user_id": "user_123",
    "query": "What is the policy on X?",
    "retrieved": ["doc_A_p5", "doc_B_p2"],
    "scores": [0.92, 0.88],
    "model": "claude-3-5-sonnet",
    "response": "The policy is...",
}

log_rag_event(rag_log)
Privacy in Logs
Warning: Never log raw PII. If the user's query contains an email address, redact it before it hits the audit logs.
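A minimal sketch of email redaction follows; the regex and the redact_pii helper are illustrative, not exhaustive, and production systems should use a dedicated PII-detection library.

import re

# Illustrative pattern: matches most common email address shapes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_pii(text: str) -> str:
    # Replace every email address with a placeholder before logging.
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

print(redact_pii("Ask alice@example.com about the policy"))
# Ask [REDACTED_EMAIL] about the policy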
Using Logs for "Ground Truth"
Over time, these logs become your "Golden Dataset." You can review the logs, identify perfect answers, and use them to test future versions of your RAG system.
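One possible workflow, sketched below, assumes the audit log is a JSON-lines file and that reviewers tag good answers with a hypothetical "review" field:

import json

golden = []
with open("audit_log.jsonl") as f:               # hypothetical log file, one JSON event per line
    for line in f:
        event = json.loads(line)
        if event.get("review") == "approved":    # hypothetical reviewer tag
            golden.append({"query": event["query"],
                           "expected": event["response"]})

# Save query/answer pairs as a regression test set for future versions.
with open("golden_dataset.json", "w") as f:
    json.dump(golden, f, indent=2)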
Compliance Requirements
- Retention: How long must you keep logs? (e.g., seven years for financial records).
- Immutability: Logs should be stored in Write-Once-Read-Many (WORM) storage to prevent tampering; see the sketch below.
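On AWS, for example, S3 Object Lock can enforce WORM semantics. Here is a minimal sketch using boto3, assuming a bucket created with Object Lock enabled; the bucket and key names are hypothetical.

import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
event = {"user_id": "user_123", "query": "What is the policy on X?"}

# COMPLIANCE mode blocks deletion and overwrite by any user until the
# retain-until date passes, a WORM guarantee enforced by S3 itself.
s3.put_object(
    Bucket="my-rag-audit-logs",      # hypothetical; must have Object Lock enabled
    Key="rag/2024/event_123.json",   # hypothetical key
    Body=json.dumps(event),
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=7 * 365),
)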
Exercises
- Design a SQL schema for storing RAG audit logs.
- Why is logging the "System Prompt" important for debugging?
- How can you use logs to identify "dead documents" (documents that are never retrieved)?