Module 1 Lesson 4: Hardware Requirements
What do you actually need to run an LLM? Breaking down VRAM, RAM, and storage for the Ollama user.
Hardware Requirements: Can You Run It?
The most common question in local AI is: "I have [Laptop X], can I run Llama 3?"
To answer this, we need to understand what actually happens when an LLM runs. Unlike a video game, which is limited mostly by raw GPU compute, an LLM is limited primarily by memory bandwidth and memory capacity.
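A quick back-of-the-envelope calculation shows why bandwidth matters so much: every new token requires streaming essentially all of the model's weights through memory, so tokens-per-second is roughly capped at memory bandwidth divided by model size. The sketch below uses assumed, illustrative bandwidth figures (real throughput will be lower):

```python
# Rough upper bound on generation speed: each token requires reading
# (almost) all model weights, so memory bandwidth sets the ceiling.
# All numbers here are illustrative assumptions, not measured results.

model_size_gb = 5.0  # e.g. an 8B model quantized to ~4 bits per weight

bandwidth_gb_per_s = {
    "Typical dual-channel DDR5 system RAM": 60,
    "Apple M2 Max unified memory": 400,
    "NVIDIA RTX 3060 (12GB) VRAM": 360,
}

for device, bw in bandwidth_gb_per_s.items():
    ceiling = bw / model_size_gb
    print(f"{device}: ~{ceiling:.0f} tokens/sec (theoretical ceiling)")
```

This is why the same model can feel instant on a GPU and sluggish on a CPU: the weights are identical, but the memory feeding them to the processor is an order of magnitude slower.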
The Three Kings of Hardware
To run Ollama efficiently, there are three hardware components that matter, in this specific order:
- VRAM (Video RAM) / Unified Memory
- System RAM
- Storage (SSD speed)
1. VRAM (The Most Important)
LLMs are massive files composed of billions of "parameters" (numbers). To generate text quickly, these numbers need to be loaded into the fastest memory possible.
- Dedicated GPU (NVIDIA/AMD): This memory is called VRAM. If your model is 5GB and you have 8GB of VRAM, the whole model fits. Generation will be lightning-fast (e.g., 50+ tokens per second).
- Apple Silicon (M1/M2/M3/M4): Macs use "Unified Memory," a single pool shared by the CPU and GPU. If you have 32GB of RAM on an M2 Max, a large chunk of it can be dedicated to the model.
Rule of Thumb (for the ~4-bit quantized models Ollama downloads by default; see the sketch after this list):
- 7B - 8B Models: Need ~5GB - 8GB of RAM/VRAM.
- 13B - 14B Models: Need ~10GB - 12GB of RAM/VRAM.
- 30B+ Models: Need 24GB+ of RAM/VRAM.
- 70B+ Models: Need 40GB - 64GB of RAM/VRAM.
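These figures follow from simple arithmetic: parameters × bytes per parameter, plus some headroom for the context (KV cache) and runtime buffers. Here is a minimal sketch; the 4-bit assumption and the 1.2× overhead factor are rough guesses, and long contexts need more:

```python
def estimate_memory_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough memory estimate for a quantized model.

    bits_per_weight=4 assumes a ~4-bit quantization (common for Ollama
    downloads); overhead is a rough allowance for the KV cache and buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for size in (3, 8, 14, 34, 70):
    print(f"{size}B parameters -> ~{estimate_memory_gb(size):.1f} GB of RAM/VRAM")
```

Running this reproduces the rule of thumb fairly closely: an 8B model lands around 5GB, a 70B model around 40GB before you add a large context window.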
2. System RAM (The Fallback)
If you don't have a GPU, or your GPU's memory is full, Ollama keeps the layers that don't fit in system RAM and runs them on the CPU (a quick way to anticipate this is sketched below).
- The Good: You can run huge models even on a laptop without a GPU.
- The Bad: It is slow. System RAM has far less bandwidth than GPU VRAM, and the CPU has far less parallel compute. You might get 1-2 tokens per second instead of 50.
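You can roughly predict which situation you are in before downloading anything. This sketch compares a model's size against your VRAM and available system RAM; the model size and VRAM values are assumptions you would replace with your own, and it uses the third-party psutil library:

```python
import psutil  # third-party: pip install psutil

model_size_gb = 4.7   # assumption: a quantized 8B model
vram_gb = 8.0         # assumption: your GPU's VRAM (0 if no dedicated GPU)

available_ram_gb = psutil.virtual_memory().available / 1e9

if model_size_gb <= vram_gb:
    print("Model should fit entirely in VRAM: expect fast generation.")
elif model_size_gb <= vram_gb + available_ram_gb:
    print("Model will be split between VRAM and system RAM: expect a big slowdown.")
else:
    print("Not enough memory: the model may fail to load or swap to disk.")
```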
3. Storage (SSD)
Models are large files.
- Capacity: A typical 8B model is 5GB. A 70B model is 40GB. You need space to store these.
- Speed: Ollama has to load the model from disk into RAM/VRAM when you first run it. On an old mechanical hard drive (HDD), this can take minutes; on an NVMe SSD, it takes seconds (a quick free-space check is sketched below).
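Before pulling a large model, it's worth checking that you actually have room for it. A minimal sketch using Python's standard library follows; the path and model size are assumptions (Ollama typically stores models under your home directory, e.g. ~/.ollama):

```python
import shutil

model_size_gb = 40  # assumption: a quantized 70B model
free_gb = shutil.disk_usage("/").free / 1e9  # "/" is an assumption; point this
                                             # at the drive holding your models

if free_gb < model_size_gb * 1.1:  # keep ~10% headroom
    print(f"Only {free_gb:.0f} GB free: clear space before pulling this model.")
else:
    print(f"{free_gb:.0f} GB free: enough room to download the model.")
```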
Minimum vs. Recommended Specs
The "Just Learning" Tier (Minimum)
- CPU: Any modern 4-core processor.
- RAM: 8GB.
- GPU: Integrated graphics.
- Expect: To run small models (3B or 8B) at a readable speed.
The "Power User" Tier (Recommended)
- CPU: 8-core+ (Intel i7/i9, Ryzen 7/9, or Apple M-series).
- RAM: 16GB - 32GB.
- GPU: NVIDIA RTX 3060 (12GB VRAM) or better, or Apple M1/M2/M3 Pro/Max.
- Expect: Fast, smooth interaction with 8B and 14B models.
The "AI Engineer" Tier (Pro)
- Hardware: Mac Studio (64GB+ RAM) or PC with dual RTX 3090/4090s.
- Expect: To run 70B models (among the most capable open models available) locally at interactive speeds.
Summary Table
| Model Size | Min RAM/VRAM | Ideal Hardware |
|---|---|---|
| 3B (Phi-3) | 4GB | Any modern laptop |
| 8B (Llama 3) | 8GB | MacBook Air or PC w/ 8GB GPU |
| 14B (Phi-3 Medium) | 16GB | MacBook Pro or PC w/ 12GB GPU |
| 70B (Llama 3) | 64GB | Mac Studio or PC w/ 2x 3090s |
Key Takeaways
- VRAM is king: The more memory your GPU has, the faster and larger the models you can run.
- Quantization matters: We use "compressed" models to fit them on smaller hardware (covered in Module 4).
- SSDs are required: Do not try to run Ollama from an external mechanical hard drive.