
Traversal Auditing: Who saw what?
Leave a digital breadcrumb trail. Learn how to log and audit the specific paths your AI agents take through your graph to ensure accountability and detect data misuse.
In a traditional search engine, an "Audit Log" says: "User A opened Document B." But in a Graph RAG system, the AI doesn't just "Open" documents; it Walks through them. It might touch 50 different nodes to answer a single question. If a sensitive piece of information is leaked, you need to know exactly which Relationship Path the AI followed to find it.
In this lesson, we will look at Graph Audit Trails. We will learn how to capture the "Trace" of a Cypher query and store it as a permanent record. We will understand how to build a dashboard that shows internal auditors exactly which nodes were used to construct an AI's response, providing a "Chain of Custody" for your company's knowledge.
1. Why Log the Traversal?
Standard logs only capture the "Input" and "Output."
- Input: "What is the budget?"
- Output: "It is $1M."
The Missing Data: Did the AI get that $1M from the (Official_Budget) node or from a (Slack_Message) between two employees? The Path is the evidence of truth. If the AI followed a [:GOSSIP] relationship instead of an [:OFFICIAL] relationship, your system has a Data Quality problem.
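To make the gap concrete, compare a flat log record with one that carries the traversal path. This is a hypothetical illustration; the field names and the path string are made up for this example.

# Flat log: records only what went in and what came out.
flat_log = {
    "user": "user_a",
    "input": "What is the budget?",
    "output": "It is $1M.",
}

# Path-aware log: also records *how* the answer was reached.
lineage_log = {
    **flat_log,
    # The traversal is the evidence: an [:OFFICIAL] hop is trustworthy,
    # while a [:GOSSIP] hop into a (Slack_Message) node signals a data quality problem.
    "path": "(Question)-[:OFFICIAL]->(Official_Budget)",
}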
2. Capturing the "Lineage"
A lineage record for a Graph RAG request should include the following fields (see the code sketch after this list):
- Request ID: Links to the user session.
- Cypher Query: The exact code that was run.
- Result Set IDs: A list of all Node IDs that were returned to the LLM.
- Relationships Traversed: A list of the edge types explored.
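One lightweight way to hold these four fields in Python is a small dataclass. This is only a sketch; the class and field names are illustrative, not a fixed schema.

from dataclasses import dataclass

@dataclass
class LineageRecord:
    request_id: str                      # Links to the user session
    cypher_query: str                    # The exact code that was run
    result_node_ids: list[int]           # All node IDs returned to the LLM
    relationships_traversed: list[str]   # Edge types explored during the walk

An instance of this record can be written to whatever audit store you use, which is exactly what the log_access() function in Section 4 stands in for.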
3. The "Audit Subgraph" Pattern
Instead of a flat text log, you can store the audit trail In the Graph.
(Audit:Record)-[:ACCESSED]->(Node:101)
(Audit:Record)-[:ACCESSED]->(Node:102)
The Benefit: You can now run GDS algorithms (Module 11) on your audit data! You can ask: "Which node in our knowledge base is the most 'Leaked'?" or "Which user is exploring the most 'Sensitive' neighborhoods?"
graph TD
U[User A] -->|Query| R[Audit: Record 99]
R -->|Touched| N1[Node: Finance]
R -->|Touched| N2[Node: Strategy]
R -->|Touched| N3[Node: Sudeep]
style R fill:#4285F4,color:#fff
note[Auditors can now see the 'Network of Access']
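Once access records live in the graph, the "most Leaked node" question above becomes an ordinary query. Below is a sketch that uses a plain Cypher aggregation rather than a full GDS algorithm; it assumes a py2neo connection and that audit nodes carry an :Audit label with one [:ACCESSED] edge per touched node (the label and property names are assumptions).

from py2neo import Graph  # assumption: py2neo client, matching the implementation in Section 4

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder connection details

# Rank knowledge-base nodes by how many audit records have touched them.
most_accessed = graph.run("""
    MATCH (a:Audit)-[:ACCESSED]->(n)
    RETURN coalesce(n.name, toString(id(n))) AS node, count(a) AS access_count
    ORDER BY access_count DESC
    LIMIT 10
""").data()

for row in most_accessed:
    print(f"{row['node']}: accessed {row['access_count']} times")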
4. Implementation: A Python Lineage Tracker
import uuid

from py2neo import Graph  # assumed client, matching the graph.run(...).to_subgraph() call below

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder connection details

def execute_and_log(user_id, cypher):
    request_id = str(uuid.uuid4())
    # 1. Run the query and collect the subgraph it touched
    results = graph.run(cypher).to_subgraph()
    # 2. Extract node IDs and relationship types for auditing
    node_ids = [n.identity for n in results.nodes]
    rel_types = [type(r).__name__ for r in results.relationships]
    # 3. Save to the Audit Log
    log_access(user_id, request_id, node_ids, rel_types)
    return results

def log_access(user, rid, nodes, rels):
    # Logic to save to a database or a secure, append-only file
    print(f"AUDIT | Request: {rid} | User: {user} | Nodes: {nodes} | Relationships: {rels}")
5. Summary and Exercises
Auditing turns a "Black Box" into a "Transparent Library."
- Path Lineage provides the evidence for the AI's reasoning.
- Node-Level Tracking identifies which specific facts are high-value or high-risk.
- Graph-based Auditing enables advanced security analytics.
- Accountability is required for HIPAA, GDPR, and SOC2 compliance.
Exercises
- Audit Drill: If an AI answer contains a mistake, how would you use the Lineage Log to find the "Bad Fact" node?
- Storage Task: Should the Audit Log be stored in the same database as the Knowledge Graph? Why or why not? (Hint: Think about "Privilege Escalation" if a hacker gets access).
- Visualization: Draw an "Audit Star" where one (Audit:Record) node points to 5 different (Fact) nodes.
In the next lesson, we will look at masking data: Redacting Sensitive Relationships.