
Module 11 Lesson 2: Hacking ML Libraries
Vulnerabilities in the engine. Learn about common CVEs and security flaws in core machine learning frameworks like PyTorch, TensorFlow, and NumPy.
Underneath the "Intelligence" of AI is a massive amount of C++ and Python code. This code is subject to traditional software bugs like Buffer Overflows and Memory Corruption.
1. Why ML Libraries are Targets
Libraries like PyTorch and TensorFlow use specialized hardware (GPUs/TPUs). This requires "Kernel-level" drivers and very complex math libraries (like CUDA).
- Complexity = Vulnerabilities.
- The Attack: A malicious model file contains a "Malformed Operator." When the framework dispatches this operator to the GPU, it can trigger a crash or memory corruption in the server's graphics drivers.
2. Integer Overflows in NumPy
NumPy is the "math engine" for almost all Python AI.
- The Attack: An attacker provides a "Huge" image or tensor whose declared dimensions are enormous.
- The Flaw: When NumPy computes the number of bytes to allocate using a fixed-width integer, the product "Wraps around" to zero (or a tiny value). The library allocates far too little memory, then writes data into memory it doesn't own.
- Result: Crash or Remote Code Execution.
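The wrap-around is easy to demonstrate with NumPy's fixed-width integer types. This is a minimal sketch of the arithmetic only: modern NumPy releases validate allocation sizes in 64-bit precision, so this illustrates the bug class, not a live exploit.

```python
import numpy as np

# A 65536 x 65536 RGBA image: the true size is 2^34 bytes (~17 GB)
width, height, channels = np.int32(65536), np.int32(65536), np.int32(4)

# In 32-bit arithmetic the product is taken modulo 2^32
with np.errstate(over="ignore"):   # silence NumPy's overflow warning
    n_bytes = width * height * channels

print(int(n_bytes))  # 0 -- a vulnerable allocator would reserve zero bytes
```

Any subsequent write of the real image data into that zero-byte buffer lands in memory the library doesn't own, which is exactly the crash-or-RCE scenario above.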
3. Deserialization Vulnerabilities
This is the most common bug in AI frameworks. To "save" a model, the library must "Pickle" or "Serialize" the Python objects.
- The Flaw: Traditional Python `pickle` allows for Code Execution on Load.
- If you call `torch.load('malicious_model.pt')`, the model file can actually include a command to delete your hard drive or start a reverse shell.
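The code-execution primitive here is pickle's `__reduce__` hook, which tells the unpickler "to rebuild this object, call this function with these arguments." A harmless sketch, using `os.system("true")` as a stand-in for a real payload:

```python
import os
import pickle

class Exploit:
    # __reduce__ returns (callable, args); pickle stores that call, not the class
    def __reduce__(self):
        return (os.system, ("true",))  # stand-in for any shell command

payload = pickle.dumps(Exploit())

# Merely *loading* the bytes runs the command. The attacker's class never
# needs to exist on the victim's machine -- only the pickled call does.
result = pickle.loads(payload)
print(result)  # exit status of the shell command
```

This is why `torch.load` on an untrusted `.pt` file is equivalent to running untrusted code: the file format itself carries executable instructions.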
4. Mitigations for the Tech Stack
- Software Composition Analysis (SCA): Use tools like Snyk or GitHub Dependabot to scan your AI repo for outdated libraries.
- Safetensors: Use the `safetensors` format instead of `.pt` or `.h5`. Safetensors is a format designed by Hugging Face specifically to be Zero-Execute on Load: it contains only data, no code.
- Namespace Isolation: Run your model-loading logic in a dedicated container with no network access.
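When you cannot avoid pickle entirely, one standard-library mitigation is a restricted unpickler that refuses to resolve globals, which is the same idea behind PyTorch's `torch.load(..., weights_only=True)`. A minimal sketch; the allow-list below is hypothetical and would need tuning to whatever your checkpoints actually contain:

```python
import io
import os
import pickle

class SafeUnpickler(pickle.Unpickler):
    """Resolve only an explicit allow-list of globals; block everything else."""
    ALLOWED = {("collections", "OrderedDict")}  # hypothetical allow-list

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

# Plain containers and numbers use dedicated pickle opcodes and never touch
# find_class, so a data-only checkpoint loads fine:
safe = pickle.dumps({"layer1.weight": [0.1, 0.2, 0.3]})
print(SafeUnpickler(io.BytesIO(safe)).load())

# A __reduce__-style payload must resolve its callable through find_class,
# so it is rejected before any code runs:
class Evil:
    def __reduce__(self):
        return (os.system, ("true",))

try:
    SafeUnpickler(io.BytesIO(pickle.dumps(Evil()))).load()
except pickle.UnpicklingError as err:
    print("refused:", err)
```

The design choice mirrors safetensors: treat the file as data to be parsed, never as instructions to be executed.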
Exercise: The Library Auditor
- What is the difference between a "Dependency" and a "Transitive Dependency" in an AI project?
- Why is the `.safetensors` format considered more secure than the `.pth` (PyTorch) format?
- If you find a CVE in PyTorch, do you need to rewrite your model, or just update the library?
- Research: What is "CVE-2022-42902" and how did it affect the security of deep learning models?
Summary
The foundation of AI is built on software. If you don't keep your libraries updated and use secure formats for storage, you are leaving your front door unlocked.
Next Lesson: Stealing the brain: Model weights exfiltration and protection.