Module 12 Lesson 3: Output Parsing and Validation
Structured safety. Using Pydantic and JSON schemas to ensure the agent's output is machine-readable and error-free.
Output Parsing: The Structured Shield
A text response like "The price is $150" is hard for a computer to validate. A JSON response like {"price": 150, "currency": "USD"} is easy to validate. In agentic systems, we use Structured Output to prevent the LLM from drifting into creative (but incorrect) prose.
1. Using Pydantic for Validation
Pydantic is a Python library that enforces data types. In an agentic flow, we can force the model to fill in a Pydantic object.
from pydantic import BaseModel, Field

class StockReport(BaseModel):
    # Each description is carried into the generated schema shown to the LLM.
    symbol: str = Field(description="The ticker symbol")
    price: float = Field(description="The current price")
    is_bullish: bool = Field(description="Is the sentiment positive?")
If the LLM puts "banana" in the price field, Pydantic raises a ValidationError when the object is constructed, so the bad data is rejected before it reaches your database.
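To make the failure mode concrete, here is a minimal sketch of validating one good and one bad payload. The helper `parse_report` and the ticker "ACME" are hypothetical names chosen for illustration; the model class repeats the one above so the snippet runs on its own.

```python
from pydantic import BaseModel, Field, ValidationError


class StockReport(BaseModel):
    symbol: str = Field(description="The ticker symbol")
    price: float = Field(description="The current price")
    is_bullish: bool = Field(description="Is the sentiment positive?")


def parse_report(payload: dict):
    """Hypothetical helper: return (report, None) on success, (None, error_text) on failure."""
    try:
        return StockReport(**payload), None
    except ValidationError as exc:
        # The error text names the offending field, which is useful for logging
        # or for sending back to the model.
        return None, str(exc)


good, good_err = parse_report({"symbol": "ACME", "price": 150, "is_bullish": True})
bad, bad_err = parse_report({"symbol": "ACME", "price": "banana", "is_bullish": True})

print(good)     # a populated StockReport
print(bad_err)  # a validation error that mentions the price field
```

By default Pydantic coerces compatible values, so a string such as "150" would be accepted and converted to 150.0, while "banana" is rejected.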
2. Pydantic -> Prompt Conversion
You don't have to manually write the prompt for JSON. Frameworks like LangChain can convert your Pydantic class into a schema the LLM understands.
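To see what such a conversion involves, here is a minimal sketch that uses Pydantic alone rather than LangChain. It assumes Pydantic v2, where `model_json_schema()` is available; the function name `build_format_instructions` and the instruction wording are assumptions for illustration, not what any particular framework emits.

```python
import json

from pydantic import BaseModel, Field


class StockReport(BaseModel):
    symbol: str = Field(description="The ticker symbol")
    price: float = Field(description="The current price")
    is_bullish: bool = Field(description="Is the sentiment positive?")


def build_format_instructions(model_cls) -> str:
    """Hypothetical helper: turn a Pydantic model into prompt text."""
    schema = model_cls.model_json_schema()  # Pydantic v2 API (assumed)
    return (
        "Return only a JSON object that conforms to this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )


instructions = build_format_instructions(StockReport)
print(instructions)
```

The field descriptions from the class end up inside the schema, so they double as per-field guidance for the model.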
A simplified version of the prompt a framework generates automatically:
"Return the output in the following JSON format: { "symbol": "string", "price": "number" ... }"
3. Visualizing the Validation Pipe
graph LR
User[Query] --> Brain[LLM Brain]
Brain --> Raw[Raw String JSON]
Raw --> Parser[Pydantic Parser]
Parser -- Success --> App[Application logic]
Parser -- Fail --> Feedback[Send error back to LLM]
Feedback --> Brain
4. The "Correction" Loop
When the parser fails (e.g., the JSON is missing a comma), you don't crash. You send the Python error message back to the LLM and ask it to try again.
- System: "Your JSON was invalid: 'Expecting ',' delimiter: line 3 column 5'. Please fix it and return the corrected JSON."
- Models like GPT-4 are excellent at fixing their own syntax errors when provided with the error log.
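The loop described above can be sketched with the standard library alone. Everything named here is an assumption for illustration: `parse_with_retries`, the `max_attempts` cap, and `fake_llm`, which stands in for a real model call by returning a broken reply first and a fixed one second.

```python
import json
from typing import Any, Callable, List


def parse_with_retries(ask_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Any:
    """Ask for JSON, feeding parser errors back until it parses or attempts run out."""
    message = prompt
    for _ in range(max_attempts):
        raw = ask_llm(message)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            # Feed the exact parser error back so the model can repair its output.
            message = (
                f"Your JSON was invalid: {exc}. "
                "Please fix it and return only the corrected JSON.\n"
                f"Previous output:\n{raw}"
            )
    raise ValueError(f"No valid JSON after {max_attempts} attempts")


# Hypothetical stand-in for a real model: the first reply is missing a comma.
_replies = iter([
    '{"symbol": "ACME" "price": 150}',
    '{"symbol": "ACME", "price": 150}',
])
seen_prompts: List[str] = []


def fake_llm(message: str) -> str:
    seen_prompts.append(message)
    return next(_replies)


report = parse_with_retries(fake_llm, "Report on ACME as JSON.")
print(report)
```

Capping the number of attempts keeps a persistently malformed reply from looping forever. The same pattern extends to schema errors by also catching Pydantic's ValidationError and sending its text back.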
5. Why Validating "Types" Prevents Hallucinations
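Hallucinations are easier to hide in free-form prose. A strict JSON schema takes away the model's room to wander: it has to supply a number for the price field. If it can't find one, it's more likely to return a null (when the schema allows it) or trigger a validation error than to bury the failure in a long sentence. Type validation checks the shape of the output, not its truth, so a well-formed but wrong number still passes; the gain is that missing data becomes visible to your code.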
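One way to give the model an honest way out is to make the field nullable. The sketch below is an assumption layered on the StockReport example, not part of the original class: `price` becomes optional, and the hypothetical helper `needs_followup` flags reports where the model declined to guess.

```python
from typing import Optional

from pydantic import BaseModel, Field


class StockReport(BaseModel):
    symbol: str = Field(description="The ticker symbol")
    # Optional with an explicit default lets the model answer null instead of guessing.
    price: Optional[float] = Field(
        default=None, description="The current price, or null if unknown"
    )
    is_bullish: bool = Field(description="Is the sentiment positive?")


def needs_followup(report: StockReport) -> bool:
    """Hypothetical check: a missing price is routed to a retry or a human."""
    return report.price is None


unknown = StockReport(**{"symbol": "ACME", "price": None, "is_bullish": False})
known = StockReport(**{"symbol": "ACME", "price": 150.0, "is_bullish": True})

print(needs_followup(unknown), needs_followup(known))
```

The application can then branch on the null explicitly instead of parsing an apology out of prose.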
Key Takeaways
- Structured Output makes AI reliable enough for software integration.
- Pydantic is a widely used Python library for defining and validating these structures.
- Automated Correction Loops can fix many syntax errors without human intervention.
- Restricting a model's format is a powerful method for restricting its hallucinations.