
Common Terminology: Hallucinations, Prompts, and Sampling
Master the essential vocabulary of Generative AI. Learn why AI models hallucinate, how to fix it with Prompt Engineering, and how to tune model output using Temperature, Top-K, and Top-P.
Speaking the Language of GenAI
If you are leading an AI initiative, your team will come to you with problems like "The model is hallucinating" or "We need to adjust the temperature." If you don't know what these mean, you can't make informed decisions.
In this final lesson of Module 1, we will cover the three pillars of GenAI interaction: Hallucinations (The Risk), Prompt Engineering (The Skill), and Sampling Parameters (The Control).
1. Hallucinations
A Hallucination occurs when a Large Language Model generates a confident response that is factually incorrect or nonsensical.
- Example: You ask, "Who won the World Cup in 2024?" and the model confidently answers, "Brazil won the 2024 World Cup," even though no World Cup was held in 2024.
- Why it happens: Remember, LLMs are prediction engines, not databases. They don't "know" facts; they predict the next likely word. Sometimes, the most statistically probable word is not the factually correct word.
- Mitigation Strategy:
- Grounding: Connect the model to a real source of truth (Google Search or your internal database) so answers are constrained by retrieved facts (see the sketch after this list).
- Low Temperature: Reduce randomness so the model sticks to the most probable tokens.
- Human-in-the-Loop: Require human verification for high-stakes decisions.
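To make grounding concrete, here is a minimal sketch of the retrieve-then-prompt pattern. The search_hr_policies function and the canned policy text are hypothetical placeholders; a real system would query Vertex AI Search or a vector database, and this assumes vertexai.init() has already been called for your project.
from vertexai.preview.language_models import TextGenerationModel

def search_hr_policies(question: str) -> str:
    # Hypothetical retrieval step: in production this would query
    # Vertex AI Search or a vector database. A canned passage is
    # returned here purely to illustrate the pattern.
    return "Policy 4.2: Employees accrue 1.5 vacation days per month."

def grounded_answer(question: str) -> str:
    context = search_hr_policies(question)
    # Constrain the model to the retrieved text to reduce hallucinations.
    prompt = (
        "Answer ONLY using the policy text below. "
        "If the answer is not there, say 'I don't know.'\n\n"
        f"Policy text:\n{context}\n\nQuestion: {question}"
    )
    model = TextGenerationModel.from_pretrained("text-bison@001")
    return model.predict(prompt, temperature=0.1).text

print(grounded_answer("How many vacation days do I accrue each month?"))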
2. Prompt Engineering Strategies
Prompt Engineering is the art of phrasing your request to get the best possible output from the model. It is effectively "programming with English."
There are three main techniques you need to know for the exam and real-world leadership.
A. Zero-Shot Prompting
You ask the model to do something without giving it any examples. Foundation models are good at this because they have read so much data.
- Prompt: "Classify the sentiment of this text: 'The food was cold.'"
- Output: "Negative."
B. Few-Shot Prompting
You provide a few examples ("shots") of what you want. This radically improves consistency and accuracy.
- Prompt:
Great product -> Positive
Bad service -> Negative
Okay experience -> Neutral
The food was cold ->
- Output: "Negative"
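In code, a few-shot prompt is usually assembled by joining labeled examples into a template before the real query. This is a minimal sketch that reuses the example pairs above and assumes the same model setup as the zero-shot sketch.
from vertexai.preview.language_models import TextGenerationModel

examples = [
    ("Great product", "Positive"),
    ("Bad service", "Negative"),
    ("Okay experience", "Neutral"),
]
query = "The food was cold"

# Join the labeled examples, then leave the final label blank for the model.
prompt = "\n".join(f"{text} -> {label}" for text, label in examples)
prompt += f"\n{query} ->"

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(prompt, temperature=0.0)
print(response.text)  # typically: Negative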
C. Chain-of-Thought (CoT) Prompting
For complex reasoning, you ask the model to "show its work" or "think step-by-step." This forces the model to generate intermediate steps, which reduces errors in logic and math.
- Prompt: "A juggler has 10 balls. He drops 3. Then he buys 2 more. How many does he have? Let's think step by step."
- Output: "Start with 10. Drop 3, so 10 - 3 = 7. Buy 2, so 7 + 2 = 9. The answer is 9."
Without "step by step," models often just guess the final number wrong.
3. Controlling Creativity: Sampling Parameters
When you deploy a model in production (using Vertex AI), you can tune sampling parameters (sometimes loosely called "hyperparameters") that control how the model selects the next token.
A. Temperature (Randomness)
- Range: 0.0 to 1.0 (usually).
- Effect:
- Low (e.g., 0.1): Sharpens the probability curve. The model almost always picks the #1 most likely word, so output becomes near-deterministic and repetitive. Use for: Code, Classification.
- High (e.g., 0.9): Flattens the curve, so the model is more willing to pick the 2nd or 3rd most likely word. This introduces variety. Use for: Brainstorming, Creative Writing. (A numeric sketch of this effect follows the list.)
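To see why low temperature sharpens the distribution and high temperature flattens it, here is a small numeric sketch in plain Python (no SDK involved). The token scores are made up for illustration, and the 2.0 setting exaggerates the flattening beyond the usual 0.0 to 1.0 range.
import math

def softmax_with_temperature(logits, temperature):
    # Convert raw scores to probabilities, scaled by temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate tokens

print(softmax_with_temperature(logits, 1.0))  # [0.844, 0.114, 0.042] -- the raw curve
print(softmax_with_temperature(logits, 0.1))  # [1.0, 0.0, 0.0]       -- low temp: near-deterministic
print(softmax_with_temperature(logits, 2.0))  # [0.629, 0.231, 0.14]  -- high temp: flatter, more variety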
B. Top-K (The Shortlist)
- Definition: Tells the model to only consider the top K most probable tokens.
- Example (Top-K = 3):
- Next word probabilities: Blue (80%), Red (10%), Green (5%), Purple (1%)...
- The model will only roll the dice between Blue, Red, and Green. Purple is cut off.
- Effect: Removes "long tail" weirdness and keeps the output safe and on-topic (see the sketch after this list).
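In code, the Top-K cutoff is simply "sort, keep K, renormalize." A minimal sketch in plain Python, using the probabilities from the example above.
probs = {"Blue": 0.80, "Red": 0.10, "Green": 0.05, "Purple": 0.01}

def top_k_filter(probs, k):
    # Keep only the k most probable tokens and renormalize their probabilities.
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {token: round(p / total, 3) for token, p in kept.items()}

print(top_k_filter(probs, 3))
# {'Blue': 0.842, 'Red': 0.105, 'Green': 0.053} -- Purple never gets a chance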
C. Top-P (Nucleus Sampling)
- Definition: Tells the model to pick from the top tokens whose cumulative probability adds up to P.
- Example (Top-P = 0.9):
- The model adds tokens in probability order (Blue 80% + Red 10% = 90%) until the cumulative total reaches 0.9, then samples only from that set.
- Effect: Dynamically adjusts the size of the shortlist based on how confident the model is (see the sketch after this list).
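Top-P works the same way, except the cutoff is a running total rather than a fixed count. A minimal sketch, again in plain Python with the example probabilities.
def top_p_filter(probs, p):
    # Keep tokens in probability order until their cumulative total reaches p.
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: round(prob / total, 3) for token, prob in kept.items()}

probs = {"Blue": 0.80, "Red": 0.10, "Green": 0.05, "Purple": 0.01}
print(top_p_filter(probs, 0.9))
# {'Blue': 0.889, 'Red': 0.111} -- the shortlist stops once it covers 90%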
Visualizing the Sampling Flow
graph LR
Input[Prompt] --> Model{LLM probabilities}
subgraph "Decoding Strategy (The Filter)"
Model --> K[Top-K Cutoff]
K --> P[Top-P Cutoff]
P --> Temp[Temperature Adjustment]
end
Temp --> Selection((Final Token Selection))
Selection --> Output
style Temp fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
4. Code Example: Configuring Parameters in Python
Here is how you actually set these in the Google Cloud Vertex AI SDK.
import vertexai
from vertexai.preview.language_models import TextGenerationModel

# Assumes a Google Cloud project with Vertex AI enabled;
# replace the project ID and region with your own.
vertexai.init(project="your-project-id", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")

# Creative Settings (Brainstorming)
response_creative = model.predict(
    "Give me 5 unique names for a pet rock.",
    temperature=0.9,  # High creativity
    top_k=40,         # Wide net of words
    top_p=0.95,
)
print(response_creative.text)

# Deterministic Settings (Data Extraction)
response_strict = model.predict(
    "Extract the email address from this text: 'Contact us at support@example.com.'",
    temperature=0.1,  # Strict, low creativity
    top_k=5,          # Only the most likely tokens
    top_p=0.8,
)
print(response_strict.text)
5. Summary of Module 1
You have completed the Generative AI Fundamentals module!
- Lesson 1.1: You placed GenAI in the hierarchy (AI > ML > DL > GenAI).
- Lesson 1.2: You learned how LLMs use Tokens and Transformers to predict next words.
- Lesson 1.3: You mastered the vocabulary of Hallucinations, Prompting, and Sampling.
Key Takeaway for Leaders: GenAI is probabilistic, not deterministic. It doesn't give you the "right" answer; it gives you the "likely" answer. Your job is to structure the interaction (Prompt Engineering) and control the randomness (Parameters) to make that likelihood high enough for business value.
In Module 2, we will leave the theory behind and explore the Google Cloud Ecosystem. We will look at the tools you will actually use: Vertex AI, Model Garden, and Gen App Builder.
Knowledge Check
You are building a chatbot to answer questions about your company's HR policy. Accuracy is critical; you do not want the bot to invent policies. How should you configure the Temperature?