
Module 8 Lesson 1: Why You Can't Trust AI Output
The 'Implicit Trust' trap. Learn why AI-generated content must be treated as untrusted user input and the dangers of bypassing conventional security checks.
The most common security mistake in AI development is Implicit Trust. Developers assume that because the AI generated the text, the text must be safe. This is a dangerous assumption.
1. The "Untrusted" Source
In traditional security, we treat user input as untrusted. In AI security, you must treat AI output as untrusted too. Why? Because AI output is often just a refined version of user input.
- User Input: "Write an `<img src=x onerror=alert(1)>` tag."
- AI Output: "Sure, here is your tag: `<img src=x onerror=alert(1)>`."
If you render that output directly on your website, you have just given the user a way to execute an XSS attack via the AI.
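A minimal sketch of the fix, in Python with a hypothetical `render_ai_reply` helper: treat the model's text exactly like raw user input and escape it before it ever reaches HTML.

```python
import html

def render_ai_reply(ai_output: str) -> str:
    # Treat the model's text like any other untrusted string:
    # escape it before embedding it in HTML.
    # html.escape neutralises <, >, &, and quotes, so an injected
    # <img src=x onerror=alert(1)> is rendered as inert text.
    return f'<div class="ai-message">{html.escape(ai_output)}</div>'

# The "refined user input" from the example above stays harmless:
print(render_ai_reply('Sure, here is your tag: <img src=x onerror=alert(1)>'))
```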
2. The Chain of Trust Failure
Many developers build their defenses like this:
- Sanitize Input (Try to stop prompt injection).
- Generate Output (Assume the filter worked).
- Use Output (Execute a command, render HTML, etc.).
The Problem: If Step 1 fails even 0.1% of the time, Step 3 becomes a wide-open hole for remote code execution (RCE) or data theft.
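One way to close that hole is to validate at Step 3 instead of relying on Step 1. The sketch below assumes a hypothetical feature where the AI suggests a maintenance command; the output itself is checked against an allowlist and run without a shell, so a prompt-injected response cannot chain extra commands.

```python
import subprocess

ALLOWED_COMMANDS = {"ls", "df", "uptime"}  # hypothetical allowlist for this feature

def run_ai_suggested_command(ai_output: str) -> str:
    # Guard Step 3 itself: validate the model's output rather than
    # trusting that the input filter in Step 1 caught every injection.
    command = ai_output.strip()
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"Refusing to run unexpected command: {command!r}")
    # Passing a list (shell=False) means the string is never handed to a
    # shell, so "ls; rm -rf /" could not chain commands even if it slipped through.
    result = subprocess.run([command], capture_output=True, text=True)
    return result.stdout

# A prompt-injected response like "ls; rm -rf /" fails the allowlist check.
```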
3. Vulnerability Examples
- Displaying AI text in a dashboard: cross-site scripting (XSS).
- Passing AI text to a shell script: command injection.
- Using AI text to build a database query: SQL injection (see the sketch below).
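For the database case, the fix is the same one you already use for user input: bind the model's text as a parameter instead of concatenating it into SQL. A small sketch using Python's built-in sqlite3 module (the table and column names are made up for illustration):

```python
import sqlite3

def store_ai_summary(user_id: int, ai_summary: str) -> None:
    conn = sqlite3.connect("app.db")  # hypothetical database file
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS summaries (user_id INTEGER, body TEXT)"
        )
        # The AI text is bound as a parameter, so the driver treats it as
        # data; a payload like "'; DROP TABLE summaries; --" stays inert.
        conn.execute(
            "INSERT INTO summaries (user_id, body) VALUES (?, ?)",
            (user_id, ai_summary),
        )
        conn.commit()
    finally:
        conn.close()
```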
4. The "Zero Trust" Model for AI
The only safe way to build an AI application is to apply Zero Trust to the model's responses.
- Golden Rule: Never assume the AI followed your safety instructions. Even if you told it "Never output HTML," you must still sanitize its output as if it had output HTML.
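As a sketch of that rule in practice, the hypothetical wrapper below escapes every response regardless of what the system prompt said, and flags responses that ignored the "never output HTML" instruction so you can see how often the model disobeys:

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")  # crude check for anything tag-shaped

def enforce_no_html(ai_output: str) -> str:
    # Defense in depth: sanitize even though the prompt forbade HTML.
    if TAG_PATTERN.search(ai_output):
        # Hook this into your logging/monitoring pipeline (assumption).
        print("policy violation: model emitted markup despite instructions")
    return html.escape(ai_output)
```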
Exercise: The Trust Audit
- You are building an AI that generates "Welcome Emails" for new users. The user provides their name. If the user's name is `<script>alert('pwned')</script>`, will your AI include that script in the email?
- Why is Markdown rendering in AI chat interfaces a common security risk?
- If an AI is 100% accurate, is it 100% safe to trust its output?
- Research: What is "Self-Correction" in LLMs and why is it not a valid security control?
Summary
In AI, output is input. Every byte of text that comes out of the model should be treated with the same suspicion as text typed by a random user on the internet.
Next Lesson: The Browser Exploit: XSS and injection via LLM responses.