
Module 14 Lesson 2: Project: Python AI/ML Workspace
Handle the heavy lifting. Learn how to containerize a Python Data Science environment with Jupyter Notebooks, Pandas, and GPU support.
Data Science and AI projects are notorious for "Dependency Hell." One researcher uses CUDA 11, another uses CUDA 12. Docker is the perfect solution to ensure your models run the same on your laptop and the cluster.
1. The Heavyweight Dockerfile
```dockerfile
# Start from a specialized base image that already includes the CUDA runtime
# (the NVIDIA driver itself stays on the host)
FROM nvidia/cuda:11.8.0-base-ubuntu22.04

# 1. Install system dependencies
RUN apt-get update && apt-get install -y \
    python3-pip \
    python3-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

# 2. Set up the working directory
WORKDIR /workspace
COPY requirements.txt .

# 3. Install Python libraries
RUN pip3 install --no-cache-dir -r requirements.txt

# 4. Install Jupyter
RUN pip3 install --no-cache-dir jupyterlab

# 5. Start Jupyter on port 8888
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--no-browser"]
```
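The `COPY requirements.txt .` step above assumes a requirements file sits next to the Dockerfile. A minimal example might look like the following; the exact version pins are illustrative, so adjust them to match your Python and CUDA releases:

```text
# Illustrative requirements.txt -- pin versions that match your environment
pandas==2.1.4
numpy==1.26.4
matplotlib==3.8.2
torch==2.1.2
```

Pinning exact versions is what makes the image reproducible: rebuilding it next month installs the same libraries, not whatever is newest.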
2. Enabling GPU Support
By default, Docker doesn't "see" your graphics card. You need two things:
- The NVIDIA Container Toolkit installed on your host machine.
- The `--gpus all` flag (or the Compose equivalent).
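With the toolkit installed, you can verify that a container actually sees the GPU before building anything heavy. A quick sanity check, assuming an NVIDIA driver is present on the host:

```shell
# Run nvidia-smi inside a throwaway CUDA container;
# it should list the same GPU(s) as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

If this command errors out, fix the host toolkit setup first; no amount of Dockerfile tweaking will help until the runtime can reach the driver.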
3. The Compose Setup
```yaml
services:
  ml-workspace:
    build: .
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/workspace/notebooks
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
4. Why Use Docker for ML?
- Reproducibility: Share your `Dockerfile` with a collaborator, and they get the exact same versions of PyTorch and TensorFlow.
- Scalability: Train your model locally on a small dataset, then push the same image to a massive AWS p3 instance with 8 GPUs for the final training run.
- Clean Host: You don't have to pollute your laptop's Python installation with 50 different library versions.
Exercise: The Big Model Test
- Write a `requirements.txt` with `pandas`, `numpy`, and `matplotlib`.
- Build the image and start the container with a volume mapping.
- Visit `localhost:8888` and create a new notebook.
- Run `import torch; print(torch.cuda.is_available())`. Did it find your GPU?
- Why is the `--no-cache-dir` flag (Section 1) especially important for AI images? (Hint: think about the size of the `torch` library.)
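For step 4, a defensive version of the check avoids a hard crash when `torch` isn't in the image yet. A small sketch, using only the standard library when torch is absent:

```python
import importlib.util

def gpu_status() -> str:
    """Report whether PyTorch can see a CUDA device, without crashing if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily, only when the package is present
    return f"CUDA available: {torch.cuda.is_available()}"

print(gpu_status())
```

If this prints `CUDA available: False` inside your container, the usual culprits are a missing `--gpus all` flag or a CPU-only `torch` wheel in `requirements.txt`.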
Summary
AI and Machine Learning are where Docker's isolation and portability truly shine. By containerizing your research environment, you move from fighting your tools to solving the problem.
Next Lesson: Rescuing the past: Dockerizing a Legacy PHP/MySQL application.