
Common Terminology: Hallucinations, Prompts, and Sampling
Master the essential vocabulary of Generative AI. Learn why AI models hallucinate, how to fix it with Prompt Engineering, and how to tune model output using Temperature, Top-K, and Top-P.
Speaking the Language of GenAI
If you are leading an AI initiative, your team will come to you with problems like "The model is hallucinating" or "We need to adjust the temperature." If you don't know what these mean, you can't make informed decisions.
In this final lesson of Module 1, we will cover the three pillars of GenAI interaction: Hallucinations (The Risk), Prompt Engineering (The Skill), and Sampling Parameters (The Control).
1. Hallucinations
A Hallucination occurs when a Large Language Model generates a confident response that is factually incorrect or nonsensical.
- Example: You ask, "Who won the World Cup in 2024?" and the model confidently answers, "Brazil won the 2024 World Cup," even though no World Cup was held in 2024.
- Why it happens: Remember, LLMs are prediction engines, not databases. They don't "know" facts; they predict the next likely word. Sometimes, the most statistically probable word is not the factually correct word.
- Mitigation Strategy:
- Grounding: Connect the model to a real source of truth (Google Search or your internal database) so answers are constrained by retrieved facts (see the sketch after this list).
- Low Temperature: Reduce randomness so the model sticks to the most probable tokens.
- Human-in-the-Loop: Require human verification for high-stakes decisions.
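To make grounding concrete, here is a minimal sketch of the retrieve-then-prompt pattern. The search_hr_policies function and the canned policy text are hypothetical placeholders; a real system would query Vertex AI Search or a vector database, and this assumes vertexai.init() has already been called for your project.
from vertexai.preview.language_models import TextGenerationModel

def search_hr_policies(question: str) -> str:
    # Hypothetical retrieval step: in production this would query
    # Vertex AI Search or a vector database. A canned passage is
    # returned here purely to illustrate the pattern.
    return "Policy 4.2: Employees accrue 1.5 vacation days per month."

def grounded_answer(question: str) -> str:
    context = search_hr_policies(question)
    # Constrain the model to the retrieved text to reduce hallucinations.
    prompt = (
        "Answer ONLY using the policy text below. "
        "If the answer is not there, say 'I don't know.'\n\n"
        f"Policy text:\n{context}\n\nQuestion: {question}"
    )
    model = TextGenerationModel.from_pretrained("text-bison@001")
    return model.predict(prompt, temperature=0.1).text

print(grounded_answer("How many vacation days do I accrue each month?"))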
2. Prompt Engineering Strategies
Prompt Engineering is the art of phrasing your request to get the best possible output from the model. It is effectively "programming with English."
There are three main techniques you need to know for the exam and real-world leadership.
A. Zero-Shot Prompting
You ask the model to do something without giving it any examples. Foundation models are good at this because they have read so much data.
- Prompt: "Classify the sentiment of this text: 'The food was cold.'"
- Output: "Negative."
B. Few-Shot Prompting
You provide a few examples ("shots") of what you want. This radically improves consistency and accuracy.
- Prompt:
Great product -> Positive
Bad service -> Negative
Okay experience -> Neutral
The food was cold ->
- Output: "Negative"
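In code, a few-shot prompt is usually assembled by joining labeled examples into a template before the real query. This is a minimal sketch that reuses the example pairs above and assumes the same model setup as the zero-shot sketch.
from vertexai.preview.language_models import TextGenerationModel

examples = [
    ("Great product", "Positive"),
    ("Bad service", "Negative"),
    ("Okay experience", "Neutral"),
]
query = "The food was cold"

# Join the labeled examples, then leave the final label blank for the model.
prompt = "\n".join(f"{text} -> {label}" for text, label in examples)
prompt += f"\n{query} ->"

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(prompt, temperature=0.0)
print(response.text)  # typically: Negative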
C. Chain-of-Thought (CoT) Prompting
For complex reasoning, you ask the model to "show its work" or "think step-by-step." This forces the model to generate intermediate steps, which reduces errors in logic and math.
- Prompt: "A juggler has 10 balls. He drops 3. Then he buys 2 more. How many does he have? Let's think step by step."
- Output: "Start with 10. Drop 3, so 10 - 3 = 7. Buy 2, so 7 + 2 = 9. The answer is 9."
Without "step by step," models often just guess the final number wrong.
3. Controlling Creativity: Sampling Parameters
When you deploy a model in production (using Vertex AI), you can tune sampling parameters (sometimes loosely called "hyperparameters") that control how the model selects the next token.
A. Temperature (Randomness)
- Range: 0.0 to 1.0 (usually).
- Effect:
- Low (e.g., 0.1): Sharpens the probability curve. The model almost always picks the #1 most likely word, so output becomes near-deterministic and repetitive. Use for: Code, Classification.
- High (e.g., 0.9): Flattens the curve, so the model is more willing to pick the 2nd or 3rd most likely word. This introduces variety. Use for: Brainstorming, Creative Writing. (A numeric sketch of this effect follows the list.)
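To see why low temperature sharpens the distribution and high temperature flattens it, here is a small numeric sketch in plain Python (no SDK involved). The token scores are made up for illustration, and the 2.0 setting exaggerates the flattening beyond the usual 0.0 to 1.0 range.
import math

def softmax_with_temperature(logits, temperature):
    # Convert raw scores to probabilities, scaled by temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate tokens

print(softmax_with_temperature(logits, 1.0))  # [0.844, 0.114, 0.042] -- the raw curve
print(softmax_with_temperature(logits, 0.1))  # [1.0, 0.0, 0.0]       -- low temp: near-deterministic
print(softmax_with_temperature(logits, 2.0))  # [0.629, 0.231, 0.14]  -- high temp: flatter, more variety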
B. Top-K (The Shortlist)
- Definition: Tells the model to only consider the top K most probable tokens.
- Example (Top-K = 3):
- Next word probabilities: Blue (80%), Red (10%), Green (5%), Purple (1%)...
- The model will only roll the dice between Blue, Red, and Green. Purple is cut off.
- Effect: Removes "long tail" weirdness and keeps the output safe and on-topic (see the sketch after this list).
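In code, the Top-K cutoff is simply "sort, keep K, renormalize." A minimal sketch in plain Python, using the probabilities from the example above.
probs = {"Blue": 0.80, "Red": 0.10, "Green": 0.05, "Purple": 0.01}

def top_k_filter(probs, k):
    # Keep only the k most probable tokens and renormalize their probabilities.
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {token: round(p / total, 3) for token, p in kept.items()}

print(top_k_filter(probs, 3))
# {'Blue': 0.842, 'Red': 0.105, 'Green': 0.053} -- Purple never gets a chance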
C. Top-P (Nucleus Sampling)
- Definition: Tells the model to pick from the top tokens whose cumulative probability adds up to P.
- Example (Top-P = 0.9):
- The model adds tokens in probability order (Blue 80% + Red 10% = 90%) until the cumulative total reaches 0.9, then samples only from that set.
- Effect: Dynamically adjusts the size of the shortlist based on how confident the model is (see the sketch after this list).
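Top-P works the same way, except the cutoff is a running total rather than a fixed count. A minimal sketch, again in plain Python with the example probabilities.
def top_p_filter(probs, p):
    # Keep tokens in probability order until their cumulative total reaches p.
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: round(prob / total, 3) for token, prob in kept.items()}

probs = {"Blue": 0.80, "Red": 0.10, "Green": 0.05, "Purple": 0.01}
print(top_p_filter(probs, 0.9))
# {'Blue': 0.889, 'Red': 0.111} -- the shortlist stops once it covers 90%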
Visualizing the Sampling Flow
graph LR
Input[Prompt] --> Model{LLM probabilities}
subgraph "Decoding Strategy (The Filter)"
Model --> K[Top-K Cutoff]
K --> P[Top-P Cutoff]
P --> Temp[Temperature Adjustment]
end
Temp --> Selection((Final Token Selection))
Selection --> Output
style Temp fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
4. Code Example: Configuring Parameters in Python
Here is how you actually set these in the Google Cloud Vertex AI SDK.
import vertexai
from vertexai.preview.language_models import TextGenerationModel

# Assumes a Google Cloud project with Vertex AI enabled;
# replace the project ID and region with your own.
vertexai.init(project="your-project-id", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")

# Creative Settings (Brainstorming)
response_creative = model.predict(
    "Give me 5 unique names for a pet rock.",
    temperature=0.9,  # High creativity
    top_k=40,         # Wide net of words
    top_p=0.95,
)
print(response_creative.text)

# Deterministic Settings (Data Extraction)
response_strict = model.predict(
    "Extract the email address from this text: 'Contact us at support@example.com.'",
    temperature=0.1,  # Strict, low creativity
    top_k=5,          # Only the most likely tokens
    top_p=0.8,
)
print(response_strict.text)
5. Summary of Module 1
You have completed the Generative AI Fundamentals module!
- Lesson 1.1: You placed GenAI in the hierarchy (AI > ML > DL > GenAI).
- Lesson 1.2: You learned how LLMs use Tokens and Transformers to predict next words.
- Lesson 1.3: You mastered the vocabulary of Hallucinations, Prompting, and Sampling.
Key Takeaway for Leaders: GenAI is probabilistic, not deterministic. It doesn't give you the "right" answer; it gives you the "likely" answer. Your job is to structure the interaction (Prompt Engineering) and control the randomness (Parameters) to make that likelihood high enough for business value.
In Module 2, we will leave the theory behind and explore the Google Cloud Ecosystem. We will look at the tools you will actually use: Vertex AI, Model Garden, and Gen App Builder.
Knowledge Check
You are building a chatbot to answer questions about your company's HR policy. Accuracy is critical; you do not want the bot to invent policies. How should you configure the Temperature?