
Module 1 Lesson 3: AI as Probabilistic Systems
Why randomness is a feature, not a bug. Understand how the non-deterministic nature of AI creates unique security vulnerabilities and makes traditional testing difficult.
In computer science, we are taught that "Computer + Program + Input = Specific Output." This is why you can write unit tests that pass 1,000 times in a row. AI breaks this rule.
1. The Uncertainty Principle of AI
Large Language Models do not "retrieve" answers; they predict the next most likely token based on a probability distribution.
- The Distribution: For any given prompt, there isn't one "correct" answer in the model's eyes. There is a list of potential words with associated probabilities.
- The Roll of the Dice: When you set Temperature > 0, the model introduces randomness. It might pick the 2nd or 3rd most likely word to sound more "creative" (a minimal sampling sketch follows the diagram below).
```mermaid
graph TD
    A[Input Prompt] --> B{LLM Head}
    B --> C["Token A (45%)"]
    B --> D["Token B (30%)"]
    B --> E["Token C (15%)"]
    B --> F["... (10%)"]
    subgraph "Normal State"
        C
        D
    end
    subgraph "Adversarial State (Shifted)"
        H[Injection Payload] -- "Increases Probability" --> F
        F -- "Becomes Top Choice" --> I[Malicious Output]
    end
```
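To make the dice roll concrete, here is a minimal Python sketch of temperature sampling. The four logits and their token labels are invented for illustration; a real LLM applies the same softmax-and-sample step over a vocabulary of tens of thousands of tokens.

```python
import numpy as np

rng = np.random.default_rng()

def sample_token(logits, temperature):
    """Pick one token index from the temperature-scaled softmax distribution."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))       # greedy decoding: always the top token
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())   # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.6, 0.9, 0.2]  # hypothetical scores for Tokens A-D
print([sample_token(logits, 1.0) for _ in range(10)])  # a mix of indices, different every run
print([sample_token(logits, 0.0) for _ in range(10)])  # always index 0: same input, same output
```

Run it twice: the first line changes between runs, the second never does. That gap is the probabilistic behavior this lesson is about.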
2. Why Probability is a Security Nightmare
- Non-Reproducible Vulnerabilities: An attacker might find a prompt that causes an LLM to reveal a password. However, when the security team tries to reproduce it, the model (due to a different random seed) might provide a perfectly safe answer (see the sketch after this list).
- Edge Case Explosion: Traditional software has "edge cases" (like February 29th). AI has an infinite number of edge cases because the space of potential word combinations is nearly infinite.
- The "Maybe" Bypass: A firewall doesn't "almost" block a packet. But an AI guardrail might "mostly" block a request. Attackers can use Iterative Prompting to slowly push the model's probability distribution towards a failure state.
3. The Impact of "Stochasticity"
If a system is stochastic (random), it means:
- Monitoring is hard: You can't just look for "bad strings." You have to look for "bad trends."
- Attacks are cheap: An attacker can run a "Prompt Injection" script 1,000 times until the randomness flips in their favor (the arithmetic below shows why).
- Defenses are fuzzy: You can't write a regex to block every variation of a jailbreak.
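The "attacks are cheap" point is plain arithmetic: if a single attempt slips past a guardrail with probability p, then at least one of n independent attempts succeeds with probability 1 − (1 − p)^n. A quick sketch, assuming a 1% per-attempt bypass rate (an invented figure, not a benchmark):

```python
# Probability that at least one of n independent attempts bypasses a guardrail.
p_bypass = 0.01  # assumed per-attempt bypass rate, for illustration only

for n in (1, 10, 100, 1000):
    p_at_least_one = 1 - (1 - p_bypass) ** n
    print(f"{n:>5} attempts -> {p_at_least_one:6.1%} chance of at least one bypass")
```

At 1,000 attempts the bypass is all but guaranteed, which is why a guardrail that looks "99% effective" per request offers little protection against a patient, scripted attacker.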
4. Temperature and Security
High temperature settings make models more creative but also more prone to Hallucination (making things up) and Security Bypass.
- Low Temp (0.0): Near-deterministic, reproducible output. Best for business logic and data extraction.
- High Temp (0.7+): More varied and creative output, but high risk for security-sensitive applications.
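To see why the temperature knob is a security setting, here is a small sketch reusing the same invented four-token distribution from earlier, with "Token D" standing in for an undesirable completion. As temperature rises, the softmax distribution flattens and the tail token's probability climbs by roughly three orders of magnitude.

```python
import numpy as np

logits = np.array([2.0, 1.6, 0.9, 0.2])   # hypothetical scores; index 3 is "Token D"

for t in (0.2, 0.7, 1.5):
    scaled = logits / t
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    print(f"T={t}: P(Token D) = {probs[3]:.3%}")  # ~0.011% -> ~4.1% -> ~11.8%
```

A token that was a rounding error at T=0.2 becomes roughly a 1-in-8 event at T=1.5; combined with the retry arithmetic above, that is an exploitable margin.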
Exercise: The Probability Probe
- Set an LLM's temperature to 1.0. Ask it the same question 5 times. How much do the answers vary?
- Why does "Greedy Decoding" (picking the most likely token every time) not solve the security problem completely?
- If an AI is 99% safe, and it processes 1,000 interactions a day, how many "unsafe" events should you expect on average?
- Research: What is a "Softmax" function and how does it relate to the probability of an AI attack succeeding?
Summary
Understanding that AI is probabilistic is the first step toward Adversarial Thinking. You must stop asking "Is this system secure?" and start asking "What is the probability of a successful exploit in this context?"
Next Lesson: The Ethics Gate: Security vs safety vs alignment.