
Measuring Hallucination: The Multi-Hop Reality Check
Detect the invisible lies. Learn how multi-hop reasoning increases the risk of 'Imaginary Links' and how to build automated checks to verify every step of the AI's logical chain.
Hallucination is the "Final Boss" of RAG. In standard vector RAG, a hallucination is usually a factual error ("He was born in 1980" instead of 1985). But in Graph RAG, we face a more subtle and dangerous lie: The Hallucinated Relationship.
This happens when an LLM correctly finds Node A and Node C, but invents the link between them, skipping or misrepresenting the intermediate Node B. Because the result looks logical, it is very hard for a human to catch. In this lesson, we will learn how to detect these "Logical Fabrications" using Automated Path Verification and Triplet Extraction Cross-Referencing.
1. Why Multi-Hop Reasoning Increases High-Stakes Lies
When an LLM performs a 3-hop reasoning chain, it is essentially "Connecting the Dots" in its mind.
- Node A: Sudeep
- Node C: Project Titan
- Hallucination: "Sudeep is the CREATOR of Project Titan."
- Graph Reality: Sudeep is just a MEMBER, and the Creator is Jane.
The LLM "smooths" the relationship to make it sound more impressive or direct. This is Relationship Infidelity.
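To make this concrete, here is a tiny sketch contrasting the edges the graph actually contains with the shortcut the model invents; the triples follow the Sudeep / Project Titan example above and are purely illustrative.

```python
# The graph's real edges vs. the model's invented shortcut.
# Triples are (subject, predicate, object); names follow the example above.
graph_reality = [
    ("Sudeep", "MEMBER_OF", "Project Titan"),
    ("Jane", "CREATOR_OF", "Project Titan"),
]

hallucinated_claim = ("Sudeep", "CREATOR_OF", "Project Titan")

# The claim reuses two real endpoints, which is why it sounds plausible,
# but the connecting edge does not exist anywhere in the graph.
print(hallucinated_claim in graph_reality)  # -> False
```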
2. Strategy: The Triplet-Audit Loop
To detect this, we use a post-processing step:
- Extract Triplets from Answer: We ask a separate (small) LLM: "Extract all Subject-Predicate-Object facts from this AI answer."
- Verify against Graph: For every extracted fact (S, P, O), we perform a direct query against the Graph Database: `MATCH (S)-[r]->(O) RETURN type(r)`.
- Conflict Score: If the AI says `[Sudeep] -[:CREATOR]-> [Titan]` and the DB says "Relationship not found," we flag it as a Hallucination (see the sketch below).
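Below is a minimal sketch of the Triplet-Audit Loop, assuming a Neo4j backend reached through the official `neo4j` Python driver (v5 API), nodes keyed by a `name` property, and a hypothetical `extract_triplets()` helper standing in for the small extraction LLM. The connection details and names are placeholders, not a fixed implementation.

```python
# Sketch of the Triplet-Audit Loop against a Neo4j knowledge graph.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def extract_triplets(answer_text: str) -> list[tuple[str, str, str]]:
    """Placeholder for the LLM call returning (subject, predicate, object) facts."""
    raise NotImplementedError

def relationship_exists(tx, subject: str, predicate: str, obj: str) -> bool:
    # Fetch every edge type between the two named nodes and check whether
    # the claimed predicate is among them.
    record = tx.run(
        "MATCH (s {name: $subject})-[r]->(o {name: $obj}) "
        "RETURN collect(type(r)) AS rel_types",
        subject=subject,
        obj=obj,
    ).single()
    return record is not None and predicate in record["rel_types"]

def audit_answer(answer_text: str) -> list[dict]:
    """Return a flag for every claim that has no supporting edge in the graph."""
    flags = []
    with driver.session() as session:
        for subject, predicate, obj in extract_triplets(answer_text):
            grounded = session.execute_read(relationship_exists, subject, predicate, obj)
            if not grounded:
                flags.append({"claim": (subject, predicate, obj), "status": "HALLUCINATION"})
    return flags
```

In practice the extractor's predicate vocabulary rarely matches your edge types exactly, so normalize both sides (for example, uppercase with underscores) before comparing.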
3. The "Evidence Score" Metric
For every answer your bot gives, you should display an Evidence Score.
- 100%: Every factual claim in the answer is backed by a direct edge in the retrieved subgraph.
- 50%: Half the facts are from the graph; the other half are "Common Sense" from the LLM's weights.
RAG Pro Tip: In financial or medical RAG, you should block any answer with an Evidence Score below 90%.
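As a sketch of how that threshold might be enforced, the snippet below gates the final response on its score; the 90% cutoff, the function name, and the refusal wording are illustrative choices, not fixed requirements.

```python
# Sketch: gate the final response on its Evidence Score before showing it to the user.
# The 0.90 cutoff is an example for high-stakes (financial / medical) domains.
EVIDENCE_THRESHOLD = 0.90

def gate_answer(answer_text: str, evidence_score: float) -> str:
    """Return the answer annotated with its score, or a refusal if it is under-grounded."""
    if evidence_score < EVIDENCE_THRESHOLD:
        return (
            "I could not verify enough of this answer against the knowledge graph "
            f"(Evidence Score: {evidence_score:.0%}), so I am withholding it."
        )
    return f"{answer_text}\n\n(Evidence Score: {evidence_score:.0%})"

# Example: a half-grounded answer gets blocked under the 90% policy.
print(gate_answer("Sudeep is the creator of Project Titan.", 0.50))
```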
```mermaid
graph TD
    A[AI Answer] --> E[Triplet Extractor]
    E -->|Triplet: S-P-O| V[Graph Verifier]
    V -->|Query| DB[(Knowledge Graph)]
    DB -->|Exists?| RES[Resolution]
    RES -->|No| H[Hallucination Flag]
    RES -->|Yes| G[Grounded Fact]
    style H fill:#f44336,color:#fff
    style G fill:#34A853,color:#fff
```
4. Implementation: Verifying a Chain in Python
```python
def verify_answer(answer_text, retrieved_subgraph):
    # 1. Get the claims the AI made (as Subject-Predicate-Object triplets)
    claims = llm.extract_claims(answer_text)
    if not claims:
        return 0.0  # nothing verifiable, so nothing is grounded

    # 2. Check each claim against the evidence we actually retrieved
    verified_count = 0
    for claim in claims:
        # Does this exact relationship exist in our graph evidence?
        if is_in_graph(claim, retrieved_subgraph):
            verified_count += 1

    # 3. The Evidence (Reliability) Score: fraction of claims backed by the graph
    return verified_count / len(claims)
```
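One simple way to back the `is_in_graph()` helper is to represent the retrieved subgraph as a set of (subject, predicate, object) triples, which turns verification into a membership test. The sketch below uses illustrative names, not data from a real graph.

```python
# Sketch: the retrieved subgraph as a set of (subject, predicate, object) triples,
# so `is_in_graph` becomes a simple membership test.
def is_in_graph(claim, retrieved_subgraph):
    return tuple(claim) in retrieved_subgraph

subgraph = {
    ("Sudeep", "MEMBER_OF", "Project Titan"),
    ("Jane", "CREATOR_OF", "Project Titan"),
}

claims = [
    ("Sudeep", "MEMBER_OF", "Project Titan"),   # grounded in the graph
    ("Sudeep", "CREATOR_OF", "Project Titan"),  # the hallucinated relationship
]

verified = sum(1 for claim in claims if is_in_graph(claim, subgraph))
print(f"Evidence Score: {verified / len(claims):.0%}")  # -> Evidence Score: 50%
```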
5. Summary and Exercises
Hallucination in graphs is a failure of Connectivity Integrity.
- Relationship Infidelity occurs when the AI "Simplifies" a path.
- Triplet Extraction allows for programmatic "Fact Checking."
- Evidence Scores provide transparency to the end user.
- Multi-Step verification is necessary whenever the reasoning exceeds 1 hop.
Exercises
- Lie Detection: An AI says "Dr. Smith treated the patient with Aspirin." You look at the graph and see `[Dr. Smith] -[:WORKS_IN]-> [Hospital] <-[:TREATED_IN]- [Patient]`. Did the AI hallucinate a relationship? Why?
- Threshold Setting: If your bot is for "Movie Trivia," what is an acceptable Hallucination Rate? What if the bot is for "Prescription Drug Interactions"?
- Visualization: Draw a 3-hop path that is "True." Now, write a "Hallucinated" version of that path that sounds believable but changes the core relationship.
In the next lesson, we will look at technical benchmarks: End-to-End Performance Benchmarking.