Module 2 Lesson 4: Supply chain risks in AI

Software supply chain security (e.g., Log4Shell) is well known. AI introduces a new layer of "Trusted" assets that are often unvetted.

graph TD
    subgraph "External Sources"
    H[Hugging Face / Model Hubs]
    D[Public Datasets / Web Scrapes]
    O[Open Source Libraries - LangChain, etc.]
    end

    subgraph "Development Pipeline"
    P[Pickle/PyTorch Loader] -- "Malicious Code?" --> Dev[Developer Environment]
    DS[Training Data Cleaning] -- "Poisoning?" --> Model[Model Training]
    end

    subgraph "Production"
    Model --> Inf[Inference Server]
    Inf -- "API Dependency" --> Vendor[External AI Vendor]
    end

    H --> P
    D --> DS
    O --> Dev

1. Malicious Model Weights (.ckpt, .pth, .bin)

Many people download models from sites like Hugging Face. These files are often serialized Python objects (using pickle).

The Problem: Loading a "Pickled" file is equivalent to running a script. An attacker can hide a "Remote Code Execution" (RCE) payload inside a model file. When you do torch.load('model.pth'), you might also be running os.system('rm -rf /').
The Defense: Use Safetensors. This format is specifically designed to store weights without allowing code execution.

2. Poisoned Datasets

Your AI is only as good as its data.

Public Scrapers: If you use "Common Crawl" or other public datasets, you are ingesting data that attackers have intentionally placed online to bias or "Backdoor" future models.
Backdoor Triggers: An attacker might poison a dataset so that the model functions perfectly 99% of the time, but if it sees the word "Blueberry", it automatically outputs a "401 Unauthorized" bypass for a login screen.

3. Third-Party API Dependency

If you use api.openai.com or anthropic.ai, they are part of your supply chain.

Availability: If their API goes down, your app's "Security" might fail (e.g., your AI-based fraud detector stops working).
Privacy: You are sending your company's "Intent" and "Data" to an external party. Is that data being used to train their next model? (Refer to Module 1, Lesson 5 - Samsung Leak).

4. Library Vulnerabilities

Frameworks like LangChain, AutoGPT, and BabyAGI are moving incredibly fast. They often prioritize Features over Security.

Example: Early versions of some agentic frameworks allowed the AI to execute Python code in a non-sandboxed environment by default.

Exercise: Supply Chain Audit

Find a popular model on Hugging Face. Check its "Format." Is it using Safetensors or Pickle/PyTorch?
If you are using an open-source library for your AI project, when was the last time you ran a pip audit?
What is a "Model Manifest" and how can it help you verify the "Provenance" (origin) of a model?
Research: What is "Model Signatures" and how do they work in enterprise environments?

Summary

You have completed Module 2: AI System Architecture and Attack Surface. You now have a map of the "Holes" in your AI system, from the trust boundaries inside the model to the malicious weights coming from the internet.

Next Module: The Strategy: Module 3: Threat Modeling for AI Systems.

Module 2 Lesson 4: AI Supply Chain Risks