Module 7 Lesson 3: Detecting Hallucinations

How can you tell if an AI is lying? In this lesson, we learn about Logprobs, Self-Consistency checks, and the 'Stochastic Signature' of a hallucination.


Hallucinations are tricky because they look like the truth. They share the same professional tone and grammatical perfection as correct answers. However, there are "tells"—patterns in the model's math and reasoning—that can help us spot a lie before it spreads.

In this lesson, we explore three ways to detect hallucinations: one technical, one behavioral, and one external.


1. Technical Detection: Logprobs

When a model predicts the next token, it doesn't just pick one; it assigns a probability to every token in its vocabulary. The logarithms of these probabilities are called logprobs (log probabilities), and many LLM APIs can return them alongside the generated text.

  • Correct Fact: If the model predicts "Tokyo" with 99.9% probability, it is very likely correct.
  • Hallucination: If the model predicts "Paris" with only 40% probability (meaning the remaining 60% of the probability mass is spread across other cities), it is essentially "guessing."

The Hack: If the average logprob of the key facts in a response is low, treat the entire answer as a potential hallucination.
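Here is a minimal sketch of this check in Python. It assumes you already have token-level logprobs for the answer's key fact tokens (most LLM APIs can return these); the token values, threshold, and function names are illustrative.

```python
import math

# Hypothetical logprobs for the *key fact* tokens of an answer, as returned by
# an LLM API that exposes token-level log probabilities. Values are illustrative.
key_fact_tokens = [
    ("Paris", math.log(0.40)),  # the model was far from certain about this token
    ("1889", math.log(0.55)),
]

def average_probability(token_logprobs):
    """Convert each logprob back to a probability and average them for readability."""
    return sum(math.exp(lp) for _, lp in token_logprobs) / len(token_logprobs)

CONFIDENCE_THRESHOLD = 0.7  # tune this on your own data

confidence = average_probability(key_fact_tokens)
if confidence < CONFIDENCE_THRESHOLD:
    print(f"Average confidence {confidence:.2f} is low: treat this answer as a potential hallucination.")
else:
    print(f"Average confidence {confidence:.2f} looks solid.")
```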


2. Behavioral Detection: Self-Consistency

This is a clever trick used by developers. Instead of asking the AI the question once, you ask it three times at a high temperature, so each run samples a different path through the model's probability distribution.

  • Fact: If you ask "Who won the Super Bowl in 1995?" three times, a knowledgeable model will say "Dallas Cowboys" every time.
  • Hallucination: If the model doesn't know, it might say "San Francisco 49ers" the first time, "Dallas Cowboys" the second time, and "Green Bay Packers" the third time.

Rule: If the AI can't agree with itself, treat the answer as a likely hallucination (see the sketch after the diagram below).

graph TD
    User["User Question"] --> Run1["Run 1 (Temp 0.8)"]
    User --> Run2["Run 2 (Temp 0.8)"]
    User --> Run3["Run 3 (Temp 0.8)"]
    Run1 --> Ans1["Answer: A"]
    Run2 --> Ans2["Answer: B"]
    Run3 --> Ans3["Answer: A"]
    Ans1 & Ans2 & Ans3 --> Logic["A matches A (2/3 majority)"]
    Logic --> Final["Reliable Answer: A"]
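The voting logic in the diagram fits in a few lines of Python. This is a rough sketch that assumes an ask_llm(question, temperature=...) helper wrapping whichever model you use; the helper name and the 2/3 agreement threshold are illustrative.

```python
from collections import Counter

def self_consistency_check(ask_llm, question, n_runs=3, temperature=0.8):
    """Ask the same question several times and return the majority answer
    along with how strongly the runs agree."""
    answers = [ask_llm(question, temperature=temperature) for _ in range(n_runs)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_runs, answers

# Example usage (ask_llm is a placeholder for your own model call):
# answer, agreement, runs = self_consistency_check(ask_llm, "Who won the Super Bowl in 1995?")
# if agreement < 2 / 3:
#     print("The runs disagree -- likely hallucination:", runs)
```

Exact string matching is the simplest comparison; in practice you may want to normalize the answers (lowercase, strip punctuation) or compare embeddings so that "Dallas Cowboys" and "The Cowboys" count as agreement.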

3. The "Chain of Thought" Audit

If you ask an AI to "Explain your reasoning step-by-step," hallucinations often fall apart.

  • The AI might claim a fact in Step 1.
  • By Step 3, that fact will contradict another part of its logic.
  • By reading the "Chain of Thought," a human can quickly see where the statistical "path" went off the rails.
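The audit itself can start very simply: append a "step by step" instruction to the prompt and split the reply into numbered steps so a human (or a second model) can scan them for contradictions. A minimal sketch, again assuming a placeholder ask_llm helper and an illustrative prompt suffix:

```python
import re

COT_SUFFIX = "\n\nExplain your reasoning step by step, numbering each step."

def audit_chain_of_thought(ask_llm, question):
    """Request step-by-step reasoning and split the reply into numbered steps
    for manual (or automated) contradiction checking."""
    response = ask_llm(question + COT_SUFFIX)
    # Split on newlines that are followed by a step number like "1." or "2)".
    steps = re.split(r"\n(?=\d+[.)]\s)", response.strip())
    for step in steps:
        print(step.strip())
    return steps
```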

4. External Verification (Grounding)

The most robust way to detect a hallucination is to compare it against a Source of Truth (like a database or Google Search). If the AI claims a company's revenue was $50M, but the latest SEC filing says $40M, the detection is clear. This is the foundation of RAG, which we will revisit in Lesson 4.
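The comparison itself is trivial once you have a trusted value; the hard part, retrieving that value, is what RAG addresses in Lesson 4. A minimal sketch using the revenue example above (the numbers and function name are illustrative):

```python
def claim_matches_source(claimed_value, source_value, tolerance=0.0):
    """Compare a numeric claim extracted from the model's answer against a
    value from a trusted source (database, search result, SEC filing, ...)."""
    return abs(claimed_value - source_value) <= tolerance

claimed_revenue = 50_000_000  # what the model said
filed_revenue = 40_000_000    # what the latest SEC filing says

if not claim_matches_source(claimed_revenue, filed_revenue):
    print("Claim contradicts the source of truth -- flag as a hallucination.")
```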


Lesson Exercise

Goal: Spot the "Stochastic Loophole."

  1. Ask an LLM a very obscure math question (e.g., "What is the 100th digit of Pi plus the 101st digit?").
  2. Now, press "Regenerate" twice.
  3. Do the numbers stay the same? Or does the AI give you a different "100th digit" every time?

Observation: If the answer changes, you've caught the model in the middle of a probability-based guess!


Summary

In this lesson, we established:

  • Logprobs represent the model's mathematical "confidence."
  • Self-consistency checks use diversity to find the majority truth.
  • Chain of Thought makes reasoning transparent, exposing logical cracks.
  • External verification (grounding) compares claims against a trusted source of truth.

Next Lesson: We wrap up Module 7 by focusing on the solutions. How do we build systems that Reduce Hallucinations to near zero?
