
Running and Managing Local LLMs with Ollama
Course Curriculum
15 modules designed to help you master the subject.
Module 1: Foundations of Local LLMs
Understand what local LLMs are, how they compare to cloud models, and the hardware required to run them.
Module 1 Lesson 1: What Local LLMs Are
An introduction to Local Large Language Models: performance, privacy, and the power of running AI on your own hardware.
Module 1 Lesson 2: Local vs Cloud-Based Models
A deep dive comparison between local LLMs and cloud-based giants like GPT-4. When to stay local and when to go to the cloud.
Module 1 Lesson 3: Privacy, Cost, and Control
The 'Triple Threat' that explains why local LLMs are winning. Understanding the economics and security of the Ollama ecosystem.
Module 1 Lesson 4: Hardware Requirements
What do you actually need to run an LLM? Breaking down VRAM, RAM, and storage for the Ollama user.
Module 1 Lesson 5: CPU vs GPU vs Apple Silicon
Choosing the right engine for your AI. A technical comparison of how different processors handle LLM workloads.
Module 1 Lesson 6: Memory and Storage Considerations
The math behind LLM files. Understanding how many GBs you need to store and run your favorite models.
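A taste of that math as a minimal Python sketch; the parameter counts and bit widths below are illustrative assumptions, not figures for any specific release:

    def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
        """Approximate file size: parameter count times bits per weight."""
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    print(model_size_gb(8, 16))  # ~16 GB at full 16-bit precision
    print(model_size_gb(8, 4))   # ~4 GB after 4-bit quantization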
Module 1 Wrap-up: Inspecting Your Resources
Prepare your machine for Ollama. A hands-on guide to checking your hardware and selecting your first model.
Module 2: Ollama Overview and Installation
Introduction to Ollama architecture and step-by-step installation on various operating systems.
Module 2 Lesson 1: What Ollama Is
The 'Docker for LLMs.' Understanding how Ollama revolutionized the local AI experience.
Module 2 Lesson 2: Ollama Architecture
How Ollama works under the hood. Understanding the service, the CLI, and the llama.cpp engine.
Module 2 Lesson 3: Supported Operating Systems
Cross-platform AI. Exploring how Ollama runs on macOS, Windows, and Linux, and the unique advantages of each.
Module 2 Lesson 4: Installing Ollama
Step-by-step installation guide for every platform. Get the service running and ready for models.
Module 2 Lesson 5: Ollama CLI Basics
Mastering the command line. A guide to pull, run, list, and manage models directly from your terminal.
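As a preview, here are the same management verbs driven from Python via subprocess (a minimal sketch assuming the ollama binary is installed and on your PATH):

    import subprocess

    # Mirrors the CLI commands covered in this lesson: pull, list, rm.
    subprocess.run(["ollama", "pull", "llama3"], check=True)
    subprocess.run(["ollama", "list"], check=True)
    subprocess.run(["ollama", "rm", "llama3"], check=True)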
Module 2 Lesson 6: Ollama Server and API Overview
Going beyond the terminal. Understanding the Ollama REST API and how to talk to your models via HTTP.
Module 2 Wrap-up: Your First Local Chat
Hands-on session: Pulling your first model and having a high-speed conversation with a local AI.
Module 3: Running Prebuilt Ollama Models
Learn how to use the Ollama model registry and run popular models like LLaMA and Mistral.
Module 3 Lesson 1: Ollama Model Registry
Exploring the library of AI. How to navigate the Ollama library to find the perfect model for your task.
Module 3 Lesson 2: Model Naming and Tags
Decoding the colon. Understanding what 'llama3:8b-instruct-q4_K_M' actually means.
Module 3 Lesson 3: Popular Ollama Models
Meet the family. A guide to the most important open-weights models available in Ollama today.
Module 3 Lesson 4: Model Sizes and Variants
Understanding the trade-offs of scale. Why a 70B model is smarter than an 8B model, and why you might not want to use it.
Module 3 Lesson 5: Prompting Local Models
Talking to the machine. Why prompting a local 8B model requires a different approach than ChatGPT.
Module 3 Lesson 6: Streaming Responses
Words as they happen. Why streaming is the secret to a fast-feeling AI application.
Module 3 Wrap-up: The Model Comparison Challenge
Put your knowledge to the test. Compare Llama, Mistral, and Gemma on speed, humor, and logic.
Module 4: Model Internals and Formats
Deep dive into Transformer architecture, quantization, and the GGUF model format.
Module 4 Lesson 1: Transformer Architecture Overview
The engine under the hood. A non-math guide to the Transformer architecture that powers all modern LLMs.
Module 4 Lesson 2: Quantization Concepts
Compressing intelligence. How we fit 100GB models into 5GB files without making them stupid.
Module 4 Lesson 3: GGUF Model Format
The universal file type. Why GGUF is the 'PDF of AI' and why it's the foundation of the Ollama ecosystem.
Module 4 Lesson 4: Context Length and Tokens
How much can the AI remember? Understanding the relationship between context windows and RAM usage.
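A rough preview of that relationship in Python; the model dimensions below are assumptions in the ballpark of an 8B-class model:

    # KV-cache memory grows linearly with context length:
    # 2 (keys + values) x layers x KV heads x head dim x context x bytes/element.
    n_layers, n_kv_heads, head_dim, bytes_per_elem = 32, 8, 128, 2  # fp16

    def kv_cache_gb(context_len: int) -> float:
        return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

    print(round(kv_cache_gb(2048), 2))  # ~0.27 GB
    print(round(kv_cache_gb(8192), 2))  # ~1.07 GB: 4x the context, 4x the memory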
Module 4 Lesson 5: Tokenization
The bridge between words and numbers. How LLMs translate your typing into something a computer can process.
Module 4 Lesson 6: Performance Trade-offs
Optimization 101. Balancing speed vs quality vs memory in your local AI setup.
Module 4 Wrap-up: Measuring Model Efficiency
Hands-on: Benchmarking your machine. Compare quantization levels and measure memory usage in real-time.
Module 5: Ollama Modelfiles
Master the Modelfile syntax to create custom models with specific system prompts and parameters.
Module 5 Lesson 1: What a Modelfile Is
The blueprint of a model. Understanding how to configure your AI using simple text files.
Module 5 Lesson 2: Modelfile Syntax
Mastering the commands. A deep dive into FROM, SYSTEM, PARAMETER, and ADAPTER.
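As a preview, a minimal sketch of the workflow, written in Python around the real ollama create CLI; the model name and prompt are hypothetical:

    import pathlib
    import subprocess
    import textwrap

    # A minimal Modelfile: base model, system prompt, and two runtime parameters.
    modelfile = textwrap.dedent("""\
        FROM llama3
        SYSTEM You are a terse assistant that answers in one sentence.
        PARAMETER temperature 0.2
        PARAMETER num_ctx 4096
        """)
    pathlib.Path("Modelfile").write_text(modelfile)
    subprocess.run(["ollama", "create", "terse-bot", "-f", "Modelfile"], check=True)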
Module 5 Lesson 3: System Prompts
The power of instruction. How to write effective system prompts that transform your model's personality.
Module 5 Lesson 4: Runtime Parameters
Fine-tuning the engine. A dictionary of PARAMETER options to control speed, creativity, and memory.
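The same knobs can also be set per request; a preview assuming the official ollama Python package (covered in Module 8), with illustrative values:

    import ollama

    # Per-request options override any PARAMETER defaults baked into the model.
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Name three uses for a brick."}],
        options={"temperature": 0.9, "num_predict": 128, "num_ctx": 4096},
    )
    print(response["message"]["content"])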
Module 5 Lesson 5: Model Inheritance
Standing on the shoulders of giants. How to create layers of custom models using the FROM command.
Module 5 Lesson 6: Versioning and Reproducibility
Creating stable AI systems. How to ensure your custom models remain the same over time.
Module 5 Wrap-up: Engineering Your Custom Bot
Hands-on: Creating a specialized AI persona from scratch. Move beyond the default registry.
Module 6: Importing Hugging Face Models
Learn how to bring any GGUF or compatible model from Hugging Face into your Ollama environment.
Module 6 Lesson 1: Hugging Face Model Ecosystem
The universe of open AI. Understanding the scale of Hugging Face and how it relates to Ollama.
Module 6 Lesson 2: Model Types and Licenses
Know your rights. A guide to AI licenses (MIT, Apache, Llama) and what they mean for your business.
Module 6 Lesson 3: Supported Architectures
Not all models are equal. Understanding which architectures (Llama, Mistral, BERT) work with the Ollama engine.
Module 6 Lesson 4: Converting Models to GGUF
The DIY path. How to take a raw PyTorch model and turn it into a GGUF file for Ollama.
Module 6 Lesson 5: Quantization Options
Going deep on compression. Exploring the technical differences between Q4_0, Q4_K_M, and Q8_0.
Module 6 Lesson 6: Compatibility Validation
Is it working? How to verify that your imported Hugging Face model is behaving correctly in Ollama.
Module 6 Wrap-up: Bringing Hugging Face Home
Hands-on: The full workflow from Hugging Face download to Ollama creation.
Module 7: Managing and Optimizing Local Models
Techniques for model caching, disk management, and RAM/VRAM optimization.
Module 7 Lesson 1: Model Caching
How Ollama handles memory. Understanding why the 'second' run is always faster than the 'first'.
Module 7 Lesson 2: Disk Space Management
Managing the gigabytes. How to clear space and move your Ollama model library to a larger drive.
Module 7 Lesson 3: RAM and VRAM Optimization
Squeezing every drop of performance. How to force Ollama to use the GPU and manage shared memory.
Module 7 Lesson 4: Context Window Tuning
Stability over scope. Why lowering your context window can actually make your AI feel faster and more stable.
Module 7 Lesson 5: Batch Inference
Processing at scale. How to optimize Ollama for high-volume tasks like document digestion.
Module 7 Wrap-up: The Optimization Challenge
Hands-on: Put Module 7 into practice. Tune caching, context windows, and RAM/VRAM settings, then measure the difference on your own machine.
Module 8: Ollama API and Integrations
Connect Ollama to your applications using its REST API, Python/JS libraries, and LangChain.
Module 8 Lesson 1: Ollama REST API
The universal bridge. How to talk to Ollama from any programming language using HTTP requests.
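A minimal sketch using Python's requests against the documented /api/generate endpoint, assuming the server is running on its default port (11434) and llama3 has been pulled:

    import requests

    # One-shot (non-streaming) generation against the local Ollama server.
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(r.json()["response"])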
Module 8 Lesson 2: Streaming API Responses
Words as they happen. How to handle NDJSON streams in your application for a professional AI feel.
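A preview of handling that stream, again with requests against the default local endpoint:

    import json
    import requests

    # Streaming is the default: the server sends one JSON object per line (NDJSON).
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Tell me a short joke."},
        stream=True,
    ) as r:
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break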
Module 8 Lesson 3: Python Integration
The AI Engineer's standard. Using the official Ollama Python library to build smart scripts.
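A minimal sketch, assuming the official ollama package (pip install ollama):

    import ollama

    # chat() wraps the REST API and takes a list of role/content messages.
    response = ollama.chat(
        model="llama3",
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "What does Ollama do?"},
        ],
    )
    print(response["message"]["content"])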
Module 8 Lesson 4: JavaScript Integration
AI in the browser and the server. Building with the Ollama JavaScript library.
Module 8 Lesson 5: LangChain Integration
Building complex AI workflows. Connecting Ollama to the world's most popular AI orchestration framework.
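A preview, assuming the langchain-ollama integration package (pip install langchain-ollama):

    from langchain_ollama import ChatOllama

    # LangChain treats the local model as just another chat backend.
    llm = ChatOllama(model="llama3", temperature=0.2)
    print(llm.invoke("What is retrieval-augmented generation?").content)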
Module 8 Lesson 6: Local Tool Calling
Giving the AI hands. How to let local models run functions, check the weather, or query a database.
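A hedged sketch of the flow, assuming a recent version of the ollama Python package and a tool-capable model; get_weather and its schema are hypothetical:

    import ollama

    def get_weather(city: str) -> str:
        """Hypothetical local function the model can ask us to run."""
        return f"Sunny and 22 degrees in {city}"

    response = ollama.chat(
        model="llama3.1",  # tool calling needs a model trained for it
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    )
    # If the model decided to call the tool, run it with the arguments it chose.
    for call in response["message"].get("tool_calls") or []:
        print(get_weather(**call["function"]["arguments"]))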
Module 8 Wrap-up: Building Your Python Chatbot
Hands-on: Creating a fully functional, streaming terminal chatbot using Python and Ollama.
Module 9: Prompting and Guardrails
Best practices for prompting smaller local models and implementing structured output and guardrails.
Module 9 Lesson 1: Prompt Design for Smaller Models
Optimization for 8B. Why 'Chain of Thought' is the secret weapon for making small models act like giants.
Module 9 Lesson 2: Advanced System Prompts
Hardening the persona. Using system prompts as a defensive layer to prevent 'Jailbreaking' and off-topic conversations.
Module 9 Lesson 3: Output Control
Precision generation. Techniques to limit the model's verbosity and ensure it stays within character limits.
Module 9 Lesson 4: JSON and Structured Output
AI that speaks code. How to force Ollama to output valid JSON every single time.
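A preview, assuming the official ollama package; note the prompt still asks for JSON, which the format flag then enforces:

    import json
    import ollama

    # format="json" constrains generation so the reply parses as valid JSON.
    response = ollama.chat(
        model="llama3",
        messages=[{
            "role": "user",
            "content": "List three primary colors as JSON with a 'colors' array.",
        }],
        format="json",
    )
    print(json.loads(response["message"]["content"]))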
Module 9 Lesson 5: Reducing Hallucinations
Stick to the facts. Techniques to prevent local AI from making up information.
Module 9 Wrap-up: Creating the Structured Expert
Hands-on: Combine system prompts, JSON mode, and negative constraints to build a production-ready data extractor.
Module 10: Retrieval-Augmented Generation (RAG)
Build a complete RAG system locally using Ollama for both embeddings and inference.
Module 10 Lesson 1: Why RAG Is Important for Local Models
Fixing the memory problem. How Retrieval-Augmented Generation gives local AI a 'library' to consult.
Module 10 Lesson 2: Embeddings with Local Models
Turning words into math. Understanding the 'Embeddings' that power local semantic search.
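A minimal sketch, assuming the official ollama package and a pulled embedding model such as nomic-embed-text:

    import ollama

    # Embeddings map text to a fixed-length vector; similar meanings land nearby.
    result = ollama.embeddings(model="nomic-embed-text", prompt="The cat sat on the mat.")
    print(len(result["embedding"]))  # vector dimensionality (768 for this model)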
Module 10 Lesson 3: Vector Stores
The AI's database. Where to store and how to query millions of AI vectors locally.
Module 10 Lesson 4: Chunking Strategies
How to slice your data. Techniques for breaking large documents into AI-sized pieces without losing context.
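As a preview, a minimal fixed-size chunker in Python; the sizes are illustrative, and real pipelines often split on sentence or paragraph boundaries instead:

    def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
        """Split text into fixed-size chunks that overlap, so a sentence falling
        on a boundary still appears whole in at least one chunk."""
        step = chunk_size - overlap
        return [text[i:i + chunk_size] for i in range(0, len(text), step)]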
Module 10 Lesson 5: Retrieval Pipelines
Connecting the dots. How a user's question travels through the vector store and back to the LLM.
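A minimal end-to-end sketch of that journey, assuming the ollama package with nomic-embed-text and llama3 pulled; the two 'documents' are toy stand-ins:

    import math
    import ollama

    def embed(text: str) -> list[float]:
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    # Index the chunks once, embed the question, pick the closest chunk,
    # and hand it to the chat model as context.
    chunks = ["Ollama runs language models locally.", "Paris is the capital of France."]
    index = [(c, embed(c)) for c in chunks]
    question = "Where does Ollama run models?"
    q = embed(question)
    context = max(index, key=lambda item: cosine(q, item[1]))[0]
    reply = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
    )
    print(reply["message"]["content"])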
Module 10 Wrap-up: Building Your Local Q&A System
Hands-on: The complete RAG project. Index a folder of text files and build a bot that can answer questions about them.
Module 11: Fine-Tuning and Adapters
Introduction to LoRA and adapter-based training for customizing small models.
Module 11 Lesson 1: When Fine-Tuning Is Needed
RAG vs Fine-Tuning. Knowing when to give the AI a book and when to perform surgery on its brain.
Module 11 Lesson 2: LoRA and Adapter-Based Training
Efficiency is key. How Low-Rank Adaptation (LoRA) allows us to train 8B models without a supercomputer.
Module 11 Lesson 3: Training Data Preparation
Garbage in, garbage out. How to format your data in JSONL for successful fine-tuning.
Module 11 Lesson 4: Training Tooling Overview
From scripts to studios. An overview of Unsloth, Axolotl, and MLX for local training.
Module 11 Lesson 5: Loading Adapters in Ollama
The final connection. Using the ADAPTER command in a Modelfile to bring your training to life.
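A preview of that final step, as a hedged sketch around the real ollama create CLI; the adapter file is hypothetical and must match the base model's architecture:

    import pathlib
    import subprocess
    import textwrap

    # A Modelfile that layers a LoRA adapter on top of its base model.
    modelfile = textwrap.dedent("""\
        FROM llama3
        ADAPTER ./my-lora-adapter.gguf
        """)
    pathlib.Path("Modelfile").write_text(modelfile)
    subprocess.run(["ollama", "create", "tuned-llama", "-f", "Modelfile"], check=True)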
Module 11 Wrap-up: The Fine-Tuning Journey
Review and Next Steps. Transitioning from a model user to a model builder.
Module 12: Security and Compliance
Ensuring data privacy, managing secrets, and understanding model licensing.
Module 12 Lesson 1: Local AI Security Model
Trust but verify. Understanding the security boundaries of the Ollama server and how to protect your API.
Module 12 Lesson 2: Data Privacy and Anonymization
Protecting the prompt. How to ensure sensitive user data like PII doesn't end up in your AI logs.
Module 12 Lesson 3: Auditing Ollama Interactions
Who said what? Setting up a robust logging system to track AI usage for compliance and security.
Module 12 Lesson 4: Running in Air-Gapped Environments
The ultimate privacy. How to install Ollama and your models on a machine with zero internet connection.
Module 12 Lesson 5: Compliance Standards
Meeting the requirements. How local AI helps you stay compliant with GDPR, HIPAA, and SOC2.
Module 12 Wrap-up: Hardening Your Local AI
Hands-on: Secure your environment. Final checks for a professional, compliant local AI setup.
Module 13: Performance and Scaling
Scaling Ollama with multi-model serving, load balancing, and parallel requests.
Module 13 Lesson 1: Running Ollama in Docker
Isolation and Portability. How to containerize Ollama for consistent deployment across any server.
Module 13 Lesson 2: Multi-GPU Support
Parallel power. How to configure Ollama to use multiple graphics cards for giant 70B models.
Module 13 Lesson 3: Concurrency and Parallelism
Serving the crowd. How to configure Ollama to handle multiple concurrent user requests.
Module 13 Lesson 4: Load Balancing Local AI
Going horizontal. How to use Nginx or HAProxy to distribute traffic across multiple Ollama servers.
Module 13 Lesson 5: Monitoring Performance Metrics
Visualizing the health of your cluster. Using Prometheus and Grafana to track tokens-per-second and VRAM usage.
Module 13 Wrap-up: Your High-Performance Stack
Hands-on: Deployment with Docker Compose. Building a multi-container stack with Ollama and a Web UI.
Module 14: Deployment and Operations
Running Ollama in production using Docker, monitoring, and health checks.
Module 14 Lesson 1: Setting up a Remote Ollama Server
Cloud-Local. How to rent a high-end GPU server and run your private Ollama instance remotely.
Module 14 Lesson 2: Hardware Selection for Production
Buying for the future. A guide to RAM, VRAM, and processing power for high-uptime AI applications.
Module 14 Lesson 3: Continuous Deployment (CD) for Models
Keep it fresh. Automating the pull and creation of your custom models across multiple servers.
Module 14 Lesson 4: Cost Management and ROI
Calculating the value. A business guide to weighing the costs of local AI hardware vs cloud API subscriptions.
Module 14 Lesson 5: Backup and Recovery Strategies
Protecting your models. How to back up your custom GGUFs and RAG databases for total system resilience.
Module 14 Wrap-up: The Full-Scale Transition
Hands-on: Deploying to a remote server. Final operational checks before going live.
Capstone Project: Private Local AI Platform
Build a comprehensive local AI platform integrating all the concepts learned in the course.
Course Overview
Format: Self-paced reading
Duration: Approx. 6-8 hours