
Module 6 Lesson 1: What are Adversarial Examples?
Why models misidentify pandas as gibbons. Explore the phenomenon of adversarial examples and how imperceptible noise can fool neural networks.
In this lesson, we explore Adversarial Examples: inputs to a machine learning model that an attacker has intentionally designed to cause the model to make a mistake.
1. The "Panda to Gibbon" Crisis
The most famous adversarial example in AI research is an image of a panda, from Goodfellow, Shlens, and Szegedy's paper "Explaining and Harnessing Adversarial Examples" (ICLR 2015).
- The model identifies the photo as a "Panda" with 57.7% confidence.
- The researchers add a carefully computed layer of "Noise" (which looks like random static) to the image.
- To a human, the image still looks identical to a panda.
- The model now identifies the image as a "Gibbon" with 99.3% confidence.
This shows that AI "sees" the world differently than humans do: it relies on statistical textures and patterns, not the semantic shapes we use.
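To get a sense of how small the change is, here is a minimal sketch assuming 8-bit pixel values scaled to [0, 1] and the ε = 0.007 budget reported for that experiment. The real attack computes the noise from the model's gradients rather than at random; this snippet only shows the scale of the perturbation, not how to craft it.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))         # stand-in for a real photo, pixels in [0, 1]
noise = rng.standard_normal(image.shape)  # stand-in for the carefully crafted noise

epsilon = 0.007                            # budget used in the panda/gibbon experiment
perturbation = epsilon * np.sign(noise)    # every pixel moves by at most epsilon
adversarial = np.clip(image + perturbation, 0.0, 1.0)

max_change = np.abs(adversarial - image).max()
print(f"Largest per-pixel change: {max_change:.4f} (~{max_change * 255:.1f} of 255 gray levels)")
```

Every pixel shifts by less than 2 out of 255 gray levels, which is why the perturbed image looks identical to the original.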
2. Why do they exist?
Adversarial examples exist largely because neural networks behave in a surprisingly "Linear" way in high-dimensional space.
- Even though a model seems complex, its "Decision Boundaries" (the lines it draws to separate "Cat" from "Dog") often lie surprisingly close to real inputs.
- By nudging every input dimension a tiny bit in the right mathematical direction, an attacker can push an input over that line: millions of tiny changes add up to a large change in the model's output.
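A minimal sketch of that argument, using a toy linear model (all numbers here are illustrative assumptions, not taken from any specific paper): if the decision score is w·x, nudging every feature by a tiny ε in the direction sign(w) shifts the score by ε·‖w‖₁, and that shift grows with the number of input dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
epsilon = 0.01                               # tiny change per feature

for d in (10, 1_000, 100_000):
    w = rng.standard_normal(d)               # weights of a toy linear classifier
    x = rng.standard_normal(d)               # an arbitrary input
    x_adv = x + epsilon * np.sign(w)         # nudge every feature "in the right direction"
    shift = w @ x_adv - w @ x                # = epsilon * sum(|w|), grows with d
    print(f"d={d:>7}:  per-feature change {epsilon},  decision-score shift {shift:.1f}")
```

The per-feature change never grows, but with enough dimensions the combined effect is easily large enough to cross a decision boundary.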
3. Beyond Images
Adversarial examples aren't just for photos:
- Audio: Adding a "Buzz" to music that tells a voice assistant to "Unlock the front door."
- Text: Changing a few words in a review to make a "Negative" sentiment analyzer think it's "Positive" (see the toy sketch after this list).
- Malware: Changing the metadata of a virus so an AI-based virus scanner thinks it's a "Safe Windows Update."
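As a toy illustration of the text case, here is a hypothetical keyword-based sentiment scorer. Real sentiment analyzers are neural networks, but the failure mode is analogous: swapping a couple of characters the model keys on flips the label, while a human still reads the review as negative.

```python
# Toy, hypothetical lexicon-based sentiment scorer -- purely illustrative.
NEGATIVE = {"terrible", "boring", "awful"}
POSITIVE = {"great", "brilliant", "fun"}

def sentiment(review: str) -> str:
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "Positive" if score > 0 else "Negative"

original = "The effects were great but the plot was terrible and the acting was boring"
# Replace letters with look-alike digits ('l' -> '1', 'o' -> '0'): a human still
# reads "terrible" and "boring", but the scorer no longer matches those words.
adversarial = "The effects were great but the plot was terrib1e and the acting was b0ring"

print(sentiment(original))     # Negative (score = 1 - 2 = -1)
print(sentiment(adversarial))  # Positive (score = 1 - 0 = +1)
```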
4. The "Confidence" Problem
The most dangerous part of adversarial examples is that the model is often more confident in its wrong answer than it was in its right answer. This makes it very hard for automated systems to "flag" these inputs as suspicious.
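A short sketch of why a simple confidence threshold fails here. The probabilities are hypothetical, loosely modelled on the panda/gibbon numbers above:

```python
# Hypothetical softmax outputs -- illustrative only.
def flag_if_uncertain(probs: dict[str, float], threshold: float = 0.9) -> bool:
    """Flag a prediction for human review when the top score is below the threshold."""
    return max(probs.values()) < threshold

clean = {"panda": 0.577, "gibbon": 0.010, "other": 0.413}        # right, but unsure
adversarial = {"panda": 0.001, "gibbon": 0.993, "other": 0.006}  # wrong, but confident

print(flag_if_uncertain(clean))        # True  -- the correct prediction gets flagged
print(flag_if_uncertain(adversarial))  # False -- the wrong one sails straight through
```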
Exercise: Spot the Difference
- Look up the "Panda vs. Gibbon" image. Can your human eyes see the noise?
- If an AI is 99% confident, why should you still be skeptical?
- Give an example of how an adversarial example could be used to trick an Autonomous Vehicle.
- Research: What is "Adversarial Robustness," and why has it proven so difficult to achieve perfectly?
Summary
Adversarial examples are a fundamental "bug" in how neural networks perceive the world. They show that AI can be "Smart" and "Stupid" at the exact same time.
Next Lesson: Stealth Mode: Evasion attacks.