Module 11 Lesson 1: The AI Supply Chain

Who built your brain? Explore the complex supply chain of AI development, from dataset collection to model training and deployment security.

Securing the AI development lifecycle

In 2024, security shifted from "Protecting the Code" to "Protecting the Supply Chain." For AI, the supply chain is even more complex, because it includes not just code but also Data and Models.

1. The Three Pillars of the AI Supply Chain

  1. Datasets: Where do the training examples come from? If an attacker "poisons" the public datasets your model is trained on, they own the model's future logic.
  2. ML Libraries: Tools like PyTorch, TensorFlow, and LangChain. These are huge, complex pieces of software with their own traditional CVEs.
  3. Model Registries: Sites like Hugging Face. These are the "GitHub of AI." Downloading a model is like downloading an .exe file: depending on the serialization format, it can contain and execute malicious code (see the sketch below).
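
Whether a downloaded model can run code depends on its serialization format. Here is a minimal sketch of the difference, assuming PyTorch and the safetensors package are installed and a local llama-3-8b.safetensors file exists:

```python
import torch
from safetensors.torch import load_file

# RISKY: classic .bin/.pt checkpoints are Python pickles, and unpickling
# can execute arbitrary code planted by whoever built the file.
# state = torch.load("untrusted-model.bin")  # avoid on untrusted files

# Modern PyTorch can refuse pickled code objects during loading:
# state = torch.load("untrusted-model.bin", weights_only=True)

# SAFER: safetensors stores raw tensor data only, so loading it
# cannot trigger code execution by design.
state = load_file("llama-3-8b.safetensors")
print({name: tuple(t.shape) for name, t in list(state.items())[:3]})
```

This is a major reason the Hugging Face ecosystem has moved toward .safetensors as the default weight format.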

2. The "Pre-computation" Attack

Traditional CI/CD pipelines check your code for bugs. But they don't check your Model Weights.

  • An attacker could modify a few numbers in a 70-billion-parameter model.
  • The code looks fine. The tests pass. But on a specific "Trigger word," the model now produces attacker-chosen output, such as recommending backdoored code.
  • The Problem: We don't have a "linter" for model weights. The closest substitute is pinning the exact bytes you vetted, as sketched below.
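
There is no weight linter yet, but you can at least verify that the bytes you deploy are the bytes you reviewed. A minimal sketch using only the Python standard library; the expected digest is a hypothetical placeholder for the value a provider would publish:

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a (potentially huge) weights file and return its SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical pinned digest the provider published alongside the weights.
EXPECTED_SHA256 = "0" * 64

actual = sha256_file("llama-3-8b.safetensors")
if actual != EXPECTED_SHA256:
    raise RuntimeError(f"Weights do not match the pinned digest: {actual}")
```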

3. Dataset Integrity

The biggest risk in the lifecycle is Web Scraping. Companies often "Download the Internet" to assemble training corpora for their models.

  • Vector: An attacker buys a domain (e.g., python-docs-security.com).
  • They fill it with subtle, incorrect code examples.
  • The AI scraper finds it, and 6 months later the AI starts recommending vulnerable code templates to users, because it "learned" them during training. A domain allowlist, sketched below, is one crude defense.
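
A crude but real mitigation is refusing to scrape domains no human has vetted. A minimal sketch; the trusted domains are illustrative:

```python
from urllib.parse import urlparse

# Illustrative allowlist of documentation domains vetted by a human.
TRUSTED_DOMAINS = {"docs.python.org", "developer.mozilla.org"}

def is_trusted(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Accept exact matches or subdomains of a vetted domain; drop the rest.
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)

scraped = [
    "https://docs.python.org/3/library/hashlib.html",
    "https://python-docs-security.com/examples",  # attacker-bought lookalike
]
corpus = [u for u in scraped if is_trusted(u)]
print(corpus)  # the lookalike domain never enters the training set
```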

4. Best Practices for AI Pipelines

  • Signed Models: Only use models with a verified cryptographic signature from the provider (e.g., Meta, Google).
  • Isolated Training: Train models in "Air-gapped" clusters that cannot reach the public internet once the dataset is loaded.
  • Model Lineage: Keep an SBOM (Software Bill of Materials) for your AI, sometimes called an ML-BOM, that includes the hashes of the datasets, the base models, and every fine-tuning step. A minimal manifest is sketched below.
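
What might a lineage record look like? A minimal sketch: the field names are illustrative rather than a standard schema (real formats include CycloneDX's ML-BOM profile), and the digests are placeholders you would compute as in the hashing sketch above:

```python
import json

# Illustrative lineage manifest for one fine-tuned model; field names are
# hypothetical, and the digests are placeholders for real SHA-256 values.
manifest = {
    "model": "llama-3-8b-support-bot-v1",
    "base_model": {
        "name": "meta-llama/Meta-Llama-3-8B",
        "sha256": "<digest of the base weights>",
    },
    "datasets": [
        {"name": "internal-support-tickets-2024", "sha256": "<dataset digest>"},
    ],
    "fine_tuning_steps": [
        {"step": 1, "recipe": "lora-r16-3epochs", "output_sha256": "<output digest>"},
    ],
}

with open("model-bom.json", "w") as f:
    json.dump(manifest, f, indent=2)
```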

Exercise: The Supply Chain Auditor

  1. You are downloading a "llama-3-8b.safetensors" file from Hugging Face. How do you know it hasn't been modified by a hacker?
  2. Why is "Transfer Learning" (fine-tuning a pre-existing model) a security risk compared to training from scratch?
  3. What is an "Internal Model Registry" and why should big companies use them instead of direct API calls to third-party sites?
  4. Research: What is "Layer 7 Security" for AI models?

Summary

The supply chain is the "Invisible" attack surface. If you don't secure the pipeline that builds the AI, the AI itself is untrustworthy from day one.

Next Lesson: Hacking the framework: Vulnerabilities in ML libraries.
