
Why Hallucinations Still Happen
Understand the root causes of RAG errors and learn to distinguish between intrinsic, extrinsic, and negative hallucinations.
RAG significantly reduces hallucinations, but it does not eliminate them. To build a reliable system, you must understand why the model still makes things up even when the answer is right in front of it.
The Top 3 Causes of RAG Hallucinations
1. Context Overflow
If you provide 20,000 words of context, the model may skim past the one sentence that actually answers the query and fall back on its internal training data instead.
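A common mitigation is to cap how much retrieved text reaches the prompt at all. The sketch below assumes your retriever returns (score, text) pairs; the word budget is an arbitrary example value, not a recommended setting.

```python
def build_context(chunks, max_words=2000):
    """Keep only the highest-scoring chunks until a word budget is reached.

    `chunks` is assumed to be a list of (score, text) pairs from your retriever;
    adapt the structure to whatever your pipeline actually produces.
    """
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        words = len(text.split())
        if used + words > max_words:
            break  # stop before the context grows large enough to bury the answer
        selected.append(text)
        used += words
    return "\n\n".join(selected)
```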
2. Conflicting Documents
If Document A says "2023" and Document B says "2024", the model might invent a third date ("2023.5") or combine them in an incoherent way.
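Rather than letting the model reconcile the conflict silently, many teams add an instruction telling it to surface the disagreement. The wording below is one possible phrasing, not a canonical prompt:

```python
CONFLICT_RULE = (
    "If the provided documents disagree with each other, say so explicitly, "
    "quote each conflicting value with its source, and do not average, merge, "
    "or guess between them."
)

def with_conflict_rule(system_prompt: str) -> str:
    """Append the conflict-handling rule to an existing system prompt."""
    return f"{system_prompt}\n\n{CONFLICT_RULE}"
```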
3. Ambiguous Phrasing
If the query is "How do I fix the error?" and the context describes two different errors, the model might mix the fix for Error A with the symptoms of Error B.
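Labeling each chunk with its source before it goes into the prompt helps the model keep the two errors apart. The sketch below assumes your chunks are dicts with "title" and "text" keys; adjust the field names to your retriever's output.

```python
def format_chunks(chunks):
    """Prefix each chunk with its document title so a fix stays attached to the right error.

    `chunks` is assumed to be a list of dicts with "title" and "text" keys.
    """
    return "\n\n".join(f"[Source: {chunk['title']}]\n{chunk['text']}" for chunk in chunks)
```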
Hallucination Types
| Type | Description |
|---|---|
| Intrinsic | The model contradicts the provided context, asserting something the documents do not support. |
| Extrinsic | The model adds details that aren't in the context but are generally true (still bad for "Strict" RAG). |
| Negative Hallucination | The model says "I don't know" even when the answer is in the context. |
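These categories are easiest to act on if you label them during evaluation. One way is a second model call acting as a judge, sketched below; it assumes the Anthropic Python SDK, and the model name and prompt wording are placeholders rather than a standard grader.

```python
import anthropic

JUDGE_PROMPT = """Context:
{context}

Answer:
{answer}

Label the answer with exactly one of:
GROUNDED, INTRINSIC (contradicts the context), EXTRINSIC (adds facts not in
the context), or NEGATIVE (refuses although the context contains the answer).
Reply with the label only."""


def classify_answer(client: anthropic.Anthropic, context: str, answer: str) -> str:
    """Ask a second model to label an answer against the retrieved context."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model name; substitute your own
        max_tokens=10,
        temperature=0.0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(context=context, answer=answer),
        }],
    )
    return response.content[0].text.strip()
```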
The Role of Temperature
A high temperature (e.g., 0.9) makes the model more creative but increases hallucination risk. For RAG, always use a low temperature (0.0 to 0.2).
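In practice that means passing the parameter explicitly on every call. A minimal sketch, assuming the Anthropic Python SDK (the model name and sample context are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

retrieved_context = "The Acme 3000 ships with a two-year warranty."  # stand-in for retriever output
user_question = "How long is the Acme 3000 warranty?"

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model name; substitute your own
    max_tokens=512,
    temperature=0.0,  # low temperature keeps the answer close to the retrieved text
    system="Answer using only the provided context.",
    messages=[{
        "role": "user",
        "content": f"Context:\n{retrieved_context}\n\nQuestion: {user_question}",
    }],
)
print(response.content[0].text)
```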
Why "Native" Knowledge is the Enemy
If you ask about "The Eiffel Tower," the model already knows a lot about it. It might use its training data instead of your specific PDF guide about the Eiffel Tower.
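The standard countermeasure is a system prompt that forbids falling back on training data. The wording below is one possible phrasing, not the only one; pass it as the `system` parameter in the call shown above and tune it to your tolerance for refusals.

```python
STRICT_RAG_SYSTEM_PROMPT = (
    "Answer using only the information in the provided context. "
    "If the context does not contain the answer, reply exactly with \"I don't know.\" "
    "Do not rely on your general knowledge, even for well-known topics such as famous landmarks."
)
```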
Exercises
- Give Claude a context about a "Spacecraft" but ask it a question about a "Submarine." Does it try to connect them?
- What is the impact of setting `max_tokens` too low on hallucination?
- Why does "Chain of Thought" prompting (asking the model to "think step by step") reduce hallucinations?