Module 13 Wrap-up: Your High-Performance Stack

Hands-on: Deployment with Docker Compose. Building a multi-container stack with Ollama and a Web UI.

Module 13 Wrap-up: The Production Architect

You have learned how to containerize, parallelize, and load-balance your AI. You have moved from a "Personal Tool" to "Scalable Infrastructure." Now, let's put it all into one final configuration file.


Hands-on Exercise: The AI Full-Stack

We are going to use Docker Compose to launch Ollama and a web-based chat interface (Open WebUI) simultaneously.

1. The Configuration

Create a file named docker-compose.yaml:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ./ollama_data:/root/.ollama   # persist downloaded models on the host
    ports:
      - "11434:11434"                 # Ollama API
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia          # requires the NVIDIA Container Toolkit
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"                   # UI served at http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # reach Ollama over the Compose network
    volumes:
      - ./webui_data:/app/backend/data        # persist chats, users, and settings
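
Before launching, you can ask Compose to validate the file and print the merged configuration (no containers are started):

docker compose config

Note that the deploy block assumes the NVIDIA Container Toolkit is installed on the host; on a CPU-only machine you can simply omit it.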

2. The Launch

In your terminal, run: docker compose up -d (or docker-compose up -d if you use the older standalone binary)
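
A few optional checks once the stack is up (the service names match the compose file above; the model name is only an example):

docker compose ps                                  # both services should show "running"
docker compose logs -f ollama                      # watch Ollama start and detect the GPU
docker compose exec ollama ollama pull llama3.2    # example: pre-pull a model inside the container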

3. The Result

  • Go to http://localhost:3000 in your browser.
  • You now have a private, ChatGPT-like interface running 100% on your hardware, professionally managed by Docker.
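
If you also want to confirm that the Ollama API is reachable from the host, a quick check (this endpoint lists the models currently stored in the volume):

curl http://localhost:11434/api/tags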

Module 13 Summary

  • Docker provides isolation, portability, and easy updates for Ollama.
  • Multi-GPU setups pool VRAM across cards, letting you run much larger models.
  • Concurrency (OLLAMA_NUM_PARALLEL) lets multiple users share one GPU.
  • Horizontal Scaling connects multiple computers into a single AI cloud.
  • Monitoring with Prometheus and Grafana ensures system reliability.
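
As a reminder, the concurrency and GPU-selection knobs from this module are plain environment variables on the ollama service. A minimal sketch extending the compose file above (the values are illustrative, not recommendations):

  ollama:
    environment:
      - OLLAMA_NUM_PARALLEL=4        # illustrative: concurrent requests served per loaded model
      - CUDA_VISIBLE_DEVICES=0,1     # illustrative: expose only GPUs 0 and 1 to Ollama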

Coming Up Next...

In Module 14, our final core module, we look at Deployment and Operations. We will learn how to run Ollama on your own VPS (Virtual Private Server) and how to manage "Remote" local AI.


Module 13 Checklist

  • I have successfully run Ollama inside a Docker container.
  • I understand how -v persists my model data.
  • I can describe how OLLAMA_NUM_PARALLEL affects my VRAM.
  • I have launched a docker-compose stack with a Web UI.
  • I know how to use CUDA_VISIBLE_DEVICES to pick my GPUs.
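
If you want to revisit the single-container setup from earlier in the module, the official image can also be run directly. A minimal sketch (the --gpus flag assumes the NVIDIA Container Toolkit, and ollama here is a named volume rather than the bind mount used above):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama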
