
Handling Model Outputs: Streaming, JSON, and Safety
Master the art of receiving data from Gemini. Learn about streaming responses for UX, forcing JSON mode for code, and handling safety interruptions.
Handling Model Outputs
Getting the prompt right is half the battle. The other half is handling what Gemini sends back. In production, you can't just print(response.text). You need to handle Streaming, Structured Data, and Safety Blocks.
1. Streaming Responses
When you generate a long essay, it might take 10 seconds to finish. If the user stares at a spinner for 10 seconds, they will leave. Streaming lets you show words as they are generated (like a typewriter effect).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # example model name

# Standard (blocking): waits until the response is 100% done
# response = model.generate_content("Write a long story")
# print(response.text)

# Streaming: chunks arrive as the model generates them
response = model.generate_content("Write a long story", stream=True)
for chunk in response:
    print(chunk.text, end="", flush=True)

# In a web app, you would push chunk.text to the frontend via WebSocket or SSE
Why Stream?
- Perceived Latency: The user feels the app is fast because text appears in 0.5s, even if the total job takes 10s.
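To make the WebSocket/SSE comment above concrete, here is a minimal sketch that relays chunks to a browser using Server-Sent Events. Flask, the /story route, and the event_stream helper are illustrative choices for this lesson, not part of the Gemini SDK:

from flask import Flask, Response

app = Flask(__name__)

@app.route("/story")
def story():
    def event_stream():
        # Re-use the streaming call from above; 'model' is defined earlier
        response = model.generate_content("Write a long story", stream=True)
        for chunk in response:
            # One SSE frame per chunk (real code should escape newlines in chunk.text)
            yield f"data: {chunk.text}\n\n"
    return Response(event_stream(), mimetype="text/event-stream")

The browser can then consume this endpoint with a standard EventSource and append each frame to the page as it arrives.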
2. JSON Mode (Structured Output)
If you are building an app, you usually don't want a paragraph; you want JSON you can parse. Gemini lets you force the response into JSON.
import typing_extensions as typing

# Define the schema you want back
class Recipe(typing.TypedDict):
    name: str
    ingredients: list[str]

# Pass 'response_mime_type' and 'response_schema'
response = model.generate_content(
    "Give me a cookie recipe",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)
print(response.text)
# Output: {"name": "Cookies", "ingredients": ["Flour", "Sugar", ...]}
This is critical for reliability. It prevents the model from saying "Sure! Here is your JSON: ...", which breaks your parser.
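Because the output is guaranteed to be valid JSON, you can hand it straight to a parser with no string cleanup. A minimal sketch (the 'recipe' variable name is just for illustration):

import json

recipe = json.loads(response.text)  # no regex or prefix-stripping needed
print(recipe["name"], recipe["ingredients"])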
3. Handling Safety Blocks
Sometimes, Gemini refuses to answer.
- Scenario: User asks "How to make poison?"
- Result: response.text might be empty, or raise an error when you access it.
You must check for Finish Reasons.
# Check whether the prompt itself was blocked
if response.prompt_feedback.block_reason:
    print(f"Blocked due to: {response.prompt_feedback.block_reason}")

# Accessing .text on a filtered response raises a ValueError
try:
    print(response.text)
except ValueError:
    print("No text returned (likely filtered).")
Common Finish Reasons:
- STOP: Natural end of generation.
- SAFETY: Blocked by safety filter.
- MAX_TOKENS: Cut off because it hit the token limit.
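You can read the finish reason off the first candidate. A short sketch (assumes the response returned at least one candidate):

# Inspect why generation stopped
finish_reason = response.candidates[0].finish_reason
if finish_reason.name == "SAFETY":
    print("Filtered by the safety system.")
elif finish_reason.name == "MAX_TOKENS":
    print("Truncated; consider raising max_output_tokens.")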
Summary
- Stream long responses for better UX.
- Use JSON Mode for building robust applications.
- Catch Safety Blocks gracefully so your app doesn't crash on controversial inputs.
Module 2 Complete! You now understand the inner workings of the Model. In Module 3, we move to the tools: Google AI Studio Basics.