
Traversal Auditing: Who saw what?
Leave a digital breadcrumb trail. Learn how to log and audit the specific paths your AI agents take through your graph to ensure accountability and detect data misuse.
In a traditional search engine, an "Audit Log" says: "User A opened Document B." But in a Graph RAG system, the AI doesn't just "Open" documents; it Walks through them. It might touch 50 different nodes to answer a single question. If a sensitive piece of information is leaked, you need to know exactly which Relationship Path the AI followed to find it.
In this lesson, we will look at Graph Audit Trails. We will learn how to capture the "Trace" of a Cypher query and store it as a permanent record. We will understand how to build a dashboard that shows internal auditors exactly which nodes were used to construct an AI's response, providing a "Chain of Custody" for your company's knowledge.
1. Why Log the Traversal?
Standard logs only capture the "Input" and "Output."
- Input: "What is the budget?"
- Output: "It is $1M."
The Missing Data: Did the AI get that $1M from the (Official_Budget) node or from a (Slack_Message) between two employees? The Path is the evidence of truth. If the AI followed a [:GOSSIP] relationship instead of an [:OFFICIAL] relationship, your system has a Data Quality problem.
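To make the gap concrete, compare a flat log record with one that carries the traversal path. This is a hypothetical illustration; the field names and the path string are made up for this example.

# Flat log: records only what went in and what came out.
flat_log = {
    "user": "user_a",
    "input": "What is the budget?",
    "output": "It is $1M.",
}

# Path-aware log: also records *how* the answer was reached.
lineage_log = {
    **flat_log,
    # The traversal is the evidence: an [:OFFICIAL] hop is trustworthy,
    # while a [:GOSSIP] hop into a (Slack_Message) node signals a data quality problem.
    "path": "(Question)-[:OFFICIAL]->(Official_Budget)",
}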
2. Capturing the "Lineage"
A lineage record for a Graph RAG request should include the following fields (see the code sketch after this list):
- Request ID: Links to the user session.
- Cypher Query: The exact code that was run.
- Result Set IDs: A list of all Node IDs that were returned to the LLM.
- Relationships Traversed: A list of the edge types explored.
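One lightweight way to hold these four fields in Python is a small dataclass. This is only a sketch; the class and field names are illustrative, not a fixed schema.

from dataclasses import dataclass

@dataclass
class LineageRecord:
    request_id: str                      # Links to the user session
    cypher_query: str                    # The exact code that was run
    result_node_ids: list[int]           # All node IDs returned to the LLM
    relationships_traversed: list[str]   # Edge types explored during the walk

An instance of this record can be written to whatever audit store you use, which is exactly what the log_access() function in Section 4 stands in for.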
3. The "Audit Subgraph" Pattern
Instead of a flat text log, you can store the audit trail In the Graph.
(Audit:Record)-[:ACCESSED]->(Node:101)
(Audit:Record)-[:ACCESSED]->(Node:102)
The Benefit: You can now run GDS algorithms (Module 11) on your audit data! You can ask: "Which node in our knowledge base is the most 'Leaked'?" or "Which user is exploring the most 'Sensitive' neighborhoods?"
graph TD
U[User A] -->|Query| R[Audit: Record 99]
R -->|Touched| N1[Node: Finance]
R -->|Touched| N2[Node: Strategy]
R -->|Touched| N3[Node: Sudeep]
style R fill:#4285F4,color:#fff
note[Auditors can now see the 'Network of Access']
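Once access records live in the graph, the "most Leaked node" question above becomes an ordinary query. Below is a sketch that uses a plain Cypher aggregation rather than a full GDS algorithm; it assumes a py2neo connection and that audit nodes carry an :Audit label with one [:ACCESSED] edge per touched node (the label and property names are assumptions).

from py2neo import Graph  # assumption: py2neo client, matching the implementation in Section 4

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder connection details

# Rank knowledge-base nodes by how many audit records have touched them.
most_accessed = graph.run("""
    MATCH (a:Audit)-[:ACCESSED]->(n)
    RETURN coalesce(n.name, toString(id(n))) AS node, count(a) AS access_count
    ORDER BY access_count DESC
    LIMIT 10
""").data()

for row in most_accessed:
    print(f"{row['node']}: accessed {row['access_count']} times")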
4. Implementation: A Python Lineage Tracker
import uuid

from py2neo import Graph  # assumed client, matching the graph.run(...).to_subgraph() call below

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder connection details

def execute_and_log(user_id, cypher):
    request_id = str(uuid.uuid4())
    # 1. Run the query and collect the subgraph it touched
    results = graph.run(cypher).to_subgraph()
    # 2. Extract node IDs and relationship types for auditing
    node_ids = [n.identity for n in results.nodes]
    rel_types = [type(r).__name__ for r in results.relationships]
    # 3. Save to the Audit Log
    log_access(user_id, request_id, node_ids, rel_types)
    return results

def log_access(user, rid, nodes, rels):
    # Logic to save to a database or a secure, append-only file
    print(f"AUDIT | Request: {rid} | User: {user} | Nodes: {nodes} | Relationships: {rels}")
5. Summary and Exercises
Auditing turns a "Black Box" into a "Transparent Library."
- Path Lineage provides the evidence for the AI's reasoning.
- Node-Level Tracking identifies which specific facts are high-value or high-risk.
- Graph-based Auditing enables advanced security analytics.
- Accountability is required for HIPAA, GDPR, and SOC2 compliance.
Exercises
- Audit Drill: If an AI answer contains a mistake, how would you use the Lineage Log to find the "Bad Fact" node?
- Storage Task: Should the Audit Log be stored in the same database as the Knowledge Graph? Why or why not? (Hint: Think about "Privilege Escalation" if a hacker gets access).
- Visualization: Draw an "Audit Star" where one (Audit:Record) node points to 5 different (Fact) nodes.
In the next lesson, we will look at masking data: Redacting Sensitive Relationships.