
Module 7 Lesson 2: Causes of Hallucinations
Why does it happen? Is it a data gap or a logic failure? In this lesson, we break down the four primary causes of LLM hallucinations: Gaps, Blur, Eagerness, and Source Confusion.
We know that hallucinations are wrong answers, but they don't happen at random. They occur because of specific tensions in the model's design. To understand why, we can look at the four most common categories of failure.
1. The Knowledge Gap (Training Cutoffs)
LLMs are like polaroids of the internet at a specific moment in time.
- If you train a model in 2023, it has no memory of events in 2024.
- When a user asks about 2024 events, the model's attention mechanism looks for relevant keys and finds... nothing.
- Instead of saying "I don't know," the model often finds the next best thing (e.g., info from 2023) and tries to "extrapolate" it into 2024.
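To make this concrete, here is a minimal Python sketch of the dynamic. It is a toy, not how a transformer actually works: the three-dimensional "embeddings", the stored facts (Acme Corp is fictional), and the similarity threshold are all invented for illustration.

```python
import numpy as np

# Toy "memory": facts absorbed before the 2023 cutoff, stored as (key vector, fact).
# The 3-dimensional vectors and the Acme Corp facts are invented for illustration.
memory = {
    "2022 World Cup winner": (np.array([0.9, 0.1, 0.0]), "Argentina won the 2022 World Cup."),
    "2023 Acme earnings":    (np.array([0.1, 0.9, 0.0]), "Acme Corp reported record revenue in 2023."),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(query_vec, threshold=0.95):
    # Find the stored key that best matches the query.
    best_topic = max(memory, key=lambda t: cosine(query_vec, memory[t][0]))
    best_score = cosine(query_vec, memory[best_topic][0])
    if best_score < threshold:
        # A calibrated system would stop here and say "I don't know".
        # An eager generative model instead stretches the nearest stale fact forward.
        return f"(extrapolating from '{best_topic}') {memory[best_topic][1]}"
    return memory[best_topic][1]

# A query about a 2024 event: nothing in memory truly matches, so the closest
# 2023-era fact gets pulled forward to cover it.
query_2024 = np.array([0.3, 0.7, 0.3])  # hypothetical embedding of "How did Acme do in 2024?"
print(answer(query_2024))
```

The interesting part is the `if best_score < threshold` branch: the toy has all the information it needs to abstain, but nothing in plain next-token training rewards the model for doing so.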
2. Topic Blur (Overgeneralization)
This happens when the model has seen so much data that two different concepts "blur" together in its vector space.
Example: The model has seen millions of articles about Steve Jobs. It has also seen millions of articles about Bill Gates. Because they share so many neighboring words (Tech, CEO, Microsoft/Apple, Founder), their vectors are close.
- Occasionally, the model might swap their facts, claiming Steve Jobs founded Microsoft, simply because the "statistical path" between those two entities is so intertwined (see the diagram and the sketch below).
graph LR
Entity1["Steve Jobs (Tech, CEO, Founder)"] --- Similar["Shared Context"]
Similar --- Entity2["Bill Gates (Tech, CEO, Founder)"]
Signal["Query: Who founded Microsoft?"] --> Confusion["Model Blurs Entities"]
Confusion --> Hallucination["Output: Steve Jobs"]
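Here is the same idea as a toy Python sketch. The five "context features" and their values are invented, and real embeddings are learned, high-dimensional vectors rather than hand-written ones, but the geometry is the point: two entities that share most of their context end up almost on top of each other.

```python
import numpy as np

# Hypothetical context features: tech, CEO, founder, Apple-ness, Microsoft-ness.
steve_jobs = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
bill_gates = np.array([1.0, 1.0, 1.0, 0.0, 1.0])
query      = np.array([1.0, 0.8, 1.0, 0.0, 0.9])  # rough stand-in for "Who founded Microsoft?"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("Jobs vs Gates: ", round(cosine(steve_jobs, bill_gates), 3))  # 0.75 -- very close neighbors
print("Query -> Gates:", round(cosine(query, bill_gates), 3))       # ~1.0 -- the right entity wins...
print("Query -> Jobs: ", round(cosine(query, steve_jobs), 3))       # ~0.75 -- ...only because of the
                                                                    # few dimensions they don't share.
# Blur those few distinguishing dimensions (noise, paraphrase, sparse data)
# and the margin separating the right answer from the wrong one shrinks fast.
```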
3. The "Yes-Man" Problem (Eagerness to Please)
During Fine-Tuning (which we studied in Module 4), models are heavily rewarded for being helpful and providing answers.
If human raters consistently prefer an answer that is polite but wrong over one that bluntly admits ignorance, the model learns that Fluency > Accuracy.
- This creates a "Yes-Man" bias where the model prioritizes making the user happy with a long, well-formatted response over the "painful" truth of being unable to answer.
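Below is a deliberately crude sketch of that training pressure. The "reward" function is made up: it scores only surface qualities (length, politeness, formatting) and has no term for factual accuracy, and the prompt about a "2031 summit" is fictional. Real RLHF reward models are learned networks, but if the preferences they are trained on lean this way, the effect is the same.

```python
# A toy reward function that mimics "Fluency > Accuracy" preferences.
def surface_reward(answer: str) -> float:
    reward = 0.0
    reward += min(len(answer.split()), 50) * 0.1                 # longer feels more "helpful"
    reward += 2.0 if "happy to help" in answer.lower() else 0.0  # politeness bonus
    reward += 1.0 if "\n-" in answer else 0.0                    # nicely formatted bullets
    reward -= 3.0 if "i don't know" in answer.lower() else 0.0   # refusals feel unhelpful
    return reward

confident_but_wrong = (
    "Happy to help! The 2031 summit produced three key outcomes:\n"
    "- A new trade accord\n- A climate pledge\n- A joint research fund"
)
honest_but_blunt = "I don't know; that event is after my training data ends."

print(surface_reward(confident_but_wrong))  # scores higher
print(surface_reward(honest_but_blunt))     # penalized, despite being the truthful answer
```

A model optimized against rewards like these learns to produce the first kind of answer even when it has nothing real to say.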
4. Source Confusion
LLMs are trained on huge swaths of the internet, including fiction, satire, and misinformation.
- If there are 10,000 fan-fiction stories claiming that Batman lives in New York City, and only 5,000 factual Wikipedia pages saying he lives in Gotham, the model might choose New York City simply because it is the more frequent pattern for that topic in its training data.
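In code, this is just maximum likelihood doing its job on skewed counts. The sketch below reuses the invented 10,000 / 5,000 split from the example: a frequency-driven predictor picks the most common completion, with no notion of which sources were reliable.

```python
from collections import Counter

# Invented corpus statistics mirroring the example above.
observed_completions = Counter({
    "Batman lives in New York City": 10_000,  # fan fiction, satire, mashups
    "Batman lives in Gotham City":    5_000,  # canon / encyclopedic sources
})

total = sum(observed_completions.values())
for completion, count in observed_completions.most_common():
    print(f"P({completion!r}) = {count / total:.2f}")

# The most likely completion is not the most accurate one.
print("Model's pick:", observed_completions.most_common(1)[0][0])
```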
Lesson Exercise
Goal: Identify "Topic Blur" in action.
- Ask an LLM to explain the plot of a TV show you know very well.
- Now, ask it to explain the plot of a show that is similar but less famous.
- Check if the AI accidentally "borrows" characters or plot points from the famous show to fill in the gaps for the less famous one.
Observation: You'll see how the "gravity" of the famous show's data pulls the model toward it, causing it to hallucinate elements from the more popular source.
Summary
In this lesson, we established:
- Knowledge gaps (training cutoffs) leave the model with nothing to retrieve, so it extrapolates from stale information instead of admitting ignorance.
- "Topic Blur" occurs when similar concepts sit so close together in high-dimensional space that their facts get mixed up.
- The "Yes-Man" bias is a side effect of rewarding helpfulness and fluency during fine-tuning.
- Source confusion happens when frequent but unreliable patterns (fiction, satire) outweigh accurate ones in the training data.
Next Lesson: We'll learn how to spot these errors. We'll explore the red flags and patterns used to Detect Hallucinations before they cause real-world problems.