Module 14 Lesson 4: Constraining Output with Outlines

Making format failures mathematically impossible: using libraries like Outlines and Guidance to force the LLM to follow a specific grammar or regex.

Constrained Decoding: Bending the Math

Until now, we have been asking the LLM nicely to return JSON or a specific word. But because LLMs are probabilistic, they can still disobey.

Outlines and Guidance are libraries that change the sampling logic of the LLM. Instead of letting the model choose any token, they "mask" the invalid tokens so the model physically cannot emit an invalid character.

1. The Regex Mask

Imagine you want a phone number format: (XXX) XXX-XXXX.

  • Standard LLM: predicts the next token freely. It might choose "A" or a space.
  • Outlines (constrained): at the first character, it bans every token except "(". At the second character, it bans every token that is not a digit.
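The masking idea can be simulated in plain Python. This is a toy sketch of the concept, not how Outlines is actually implemented (Outlines compiles the regex into a finite-state machine over the model's real token vocabulary):

```python
import re

DIGITS = set("0123456789")
TEMPLATE = "(DDD) DDD-DDDD"  # 'D' = any digit; every other character is a literal

def allowed_chars(pos):
    """Return the set of characters the mask permits at this position."""
    c = TEMPLATE[pos]
    return DIGITS if c == "D" else {c}

def constrained_sample(propose):
    """Build a string by masking the 'model' proposal at each position.

    `propose` stands in for the LLM: given a position, it returns a ranked
    list of candidate characters. Candidates outside the mask are skipped,
    so the result always matches the template.
    """
    out = []
    for pos in range(len(TEMPLATE)):
        mask = allowed_chars(pos)
        # Take the highest-ranked candidate that survives the mask;
        # if the model proposed nothing valid, force an allowed character.
        pick = next((c for c in propose(pos) if c in mask), min(mask))
        out.append(pick)
    return "".join(out)

# A deliberately unhelpful 'model' that always prefers letters:
result = constrained_sample(lambda pos: ["A", "B", "7"])
print(result)  # -> "(777) 777-7777": the format is guaranteed
```

Even though the fake model keeps proposing "A", the mask only ever lets a valid character through, so the output always matches the phone-number format.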

2. Guaranteed JSON

Frameworks like Pydantic (Module 12) validate the JSON after it's generated. Outlines enforces valid JSON while it's being generated.

  • This means you have a 0% failure rate for the JSON structure itself. The content can still be wrong, but the output will always parse.
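To see why generation-time constraints cannot fail, here is a minimal simulation of the idea for one hypothetical schema (again a sketch, not the Outlines internals): the structural tokens of the JSON are forced, and the model only gets to choose inside the value slot.

```python
import json

# Hypothetical schema: {"sentiment": <one of three choices>}
# None marks the only slot where the 'model' is free to choose.
SKELETON = ['{"sentiment": "', None, '"}']
CHOICES = ["positive", "negative", "neutral"]

def constrained_json(pick):
    """Emit forced structural tokens; let the model choose only in free slots."""
    parts = []
    for piece in SKELETON:
        if piece is None:
            parts.append(pick(CHOICES))  # the model's only decision point
        else:
            parts.append(piece)          # braces, quotes, key: all forced
    return "".join(parts)

text = constrained_json(lambda opts: opts[0])
data = json.loads(text)  # always parses, by construction
print(data)
```

Because the braces, quotes, and key are emitted deterministically, there is no sequence of model choices that produces malformed JSON; a post-run validator like Pydantic never sees a parse error.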

3. Comparing the Methods

Method    | Timing     | Failure Rate   | Effort
--------- | ---------- | -------------- | ------------
Prompting | Post-run   | High           | Low
Pydantic  | Post-run   | Medium (retry) | Medium
Outlines  | During run | 0%             | High (setup)

4. Visualizing the Token Mask

Prompt: "Pick a color: [Red, Blue]"
LLM Probability: 
- "Red": 40%
- "Blue": 30%
- "Green": 20%
- "Maybe": 10%

Outlines MASK:
- "Green": 0% (BLOCKED)
- "Maybe": 0% (BLOCKED)

New Probability:
- "Red": 57%
- "Blue": 43%

The model is forced to choose between your allowed options.
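The renormalization above is just arithmetic: zero out the blocked tokens and rescale the survivors so they sum to 100%. A few lines of Python reproduce the numbers in the diagram:

```python
# The model's original next-token probabilities
probs = {"Red": 0.40, "Blue": 0.30, "Green": 0.20, "Maybe": 0.10}
allowed = {"Red", "Blue"}

# Mask: blocked tokens are removed (probability forced to 0)
masked = {tok: p for tok, p in probs.items() if tok in allowed}

# Renormalize the survivors so they sum to 1
total = sum(masked.values())
renormed = {tok: p / total for tok, p in masked.items()}

print(renormed)  # Red: 0.40 / 0.70 ~ 57%, Blue: 0.30 / 0.70 ~ 43%
```

In a real implementation the mask is applied to the logits before softmax (blocked tokens are set to negative infinity), which produces the same renormalized distribution.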


5. Code Example: Forced Multiple Choice

import outlines

# Load a model (Outlines also supports local transformers models,
# which allow the full range of regex and grammar constraints)
model = outlines.models.openai("gpt-4o")

# Force the model to choose ONLY from this list
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])

sentiment = generator("What is the sentiment of: 'I love this course!'")
# sentiment is guaranteed to be one of the three strings.

Key Takeaways

  • Constrained sampling makes format hallucinations impossible.
  • It works by masking tokens during the inference process.
  • Outlines is the primary library for implementing this on local models and some cloud models.
  • Use this for critical routing nodes where a wrong word would crash your app.
