Module 14 Lesson 2: Project: Python AI/ML Workspace

Let Docker handle the heavy lifting. Learn how to containerize a Python data science environment with Jupyter Notebooks, Pandas, and GPU support.

Data science and AI projects are notorious for "Dependency Hell": one researcher uses CUDA 11, another uses CUDA 12, and nothing agrees. Docker solves this by ensuring your models run the same on your laptop as on the cluster.

1. The Heavyweight Dockerfile

# Start from a CUDA base image: it ships the CUDA runtime libraries.
# (The NVIDIA driver itself comes from the host, not from the image.)
FROM nvidia/cuda:11.8.0-base-ubuntu22.04

# 1. Install System Dependencies
RUN apt-get update && apt-get install -y \
    python3-pip \
    python3-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

# 2. Setup Working Dir
WORKDIR /workspace
COPY requirements.txt .

# 3. Install Python Libraries
RUN pip3 install --no-cache-dir -r requirements.txt

# 4. Install Jupyter (same no-cache flag to keep the image lean)
RUN pip3 install --no-cache-dir jupyterlab

# 5. Start Jupyter on port 8888
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--no-browser"]

2. Enabling GPU Support

By default, Docker doesn't "see" your graphics card. You need two things:

  1. NVIDIA Container Toolkit installed on your host machine.
  2. The --gpus all flag (or the Compose equivalent).
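
A quick way to confirm both pieces work, before involving your own image, is to run nvidia-smi inside a throwaway container. The image tag below matches the Dockerfile's base; ml-workspace is a placeholder for whatever you tag your build:

# Sanity check: can containers see the GPU at all?
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Run the workspace image with GPU access and Jupyter's port published
docker run --gpus all -p 8888:8888 ml-workspace

If the first command prints the familiar nvidia-smi table, the toolkit is wired up correctly.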

3. The Compose Setup

services:
  ml-workspace:
    build: .
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/workspace/notebooks
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
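
To bring it up (assuming Docker Compose v2, which understands the deploy.resources GPU syntax above):

# Build the image and start Jupyter; the login token is printed in the output
docker compose up --build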

4. Why Use Docker for ML?

  1. Reproducibility: You can share your Dockerfile with a collaborator, and they will have the exact same versions of PyTorch and TensorFlow.
  2. Scalability: You can train your model locally on a small dataset, and then push the same image to a massive AWS p3 instance with 8 GPUs for final training.
  3. Clean Host: You don't have to clutter your laptop's Python installation with 50 conflicting library versions.

Exercise: The Big Model Test

  1. Write a requirements.txt with pandas, numpy, matplotlib, and torch (step 4 needs it).
  2. Build the image and start the container with a volume mapping.
  3. Visit localhost:8888 and create a new notebook.
  4. Run import torch; print(torch.cuda.is_available()). Did it find your GPU? (If not, see the diagnostic snippet after this list.)
  5. Why is the --no-cache-dir flag (Section 1) especially important for AI images? (Hint: Think about the size of the torch library).
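
If step 4 prints False, this slightly longer check (run in a notebook cell; it assumes torch made it into your requirements.txt) helps separate "wrong torch build" from "container can't see the GPU":

import torch

print(torch.__version__)           # installed torch build
print(torch.version.cuda)          # CUDA version torch was compiled against (None = CPU-only wheel)
print(torch.cuda.is_available())   # can torch reach a GPU at runtime?
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU

A CPU-only wheel (torch.version.cuda is None) points to requirements.txt; a CUDA build that still reports False points to the --gpus flag or the host toolkit.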

Summary

AI and Machine Learning are where Docker’s isolation and portability truly shine. By containerizing your research environment, you move from fighting your tools to solving the problem.

Next Lesson: Rescuing the past: Dockerizing a Legacy PHP/MySQL application.
