Module 1 Lesson 6: Memory and Storage Considerations
The math behind LLM files. Understanding how many GBs you need to store and run your favorite models.
Memory and Storage: Calculations for AI
Before we install Ollama in Module 2, we need to do some quick math. Running an LLM is like packing for a trip: if you don't know how much space you have, you'll end up with a bag that won't close.
1. Calculating Memory (RAM/VRAM) Requirements
How much RAM does a 7 Billion parameter model (7B) actually need?
It depends on the precision of the stored weights.
- FP16 (Half Precision, the unquantized baseline): Each parameter takes 2 bytes. 7B * 2 bytes = 14GB.
- 4-bit Quantization (Standard): Each parameter takes roughly 0.5 to 0.7 bytes (4 bits per weight plus a little quantization metadata). 7B * 0.7 bytes = ~5GB.
Ollama defaults to 4-bit quantization because it provides the best balance between "intelligence" and "resource usage."
The "Overhead" Rule
Always add 2GB of extra RAM for "Context" (the memory the model uses to remember what you just said) and 1GB for the OS.
Example: Running Llama 3 (8B)
- Model size: ~5.5GB
- Context/OS overhead: ~2.5GB
- Total RAM Needed: 8GB (Absolute Minimum).
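A minimal Python sketch of this back-of-the-envelope formula. The bytes-per-parameter figures and the ~2.5GB overhead default are the rough rules of thumb from this lesson, not exact values:

```python
def estimate_ram_gb(params_billions: float, bytes_per_param: float, overhead_gb: float = 2.5) -> float:
    """Back-of-the-envelope estimate: weight memory plus context/OS overhead."""
    weights_gb = params_billions * bytes_per_param  # 1 billion params at 1 byte each is ~1GB
    return weights_gb + overhead_gb

# FP16 (2 bytes/param) vs. standard 4-bit quantization (~0.7 bytes/param incl. metadata)
print(f"7B model, FP16:  {estimate_ram_gb(7, 2.0):.1f} GB")   # ~16.5 GB
print(f"7B model, 4-bit: {estimate_ram_gb(7, 0.7):.1f} GB")   # ~7.4 GB
print(f"8B model, 4-bit: {estimate_ram_gb(8, 0.7):.1f} GB")   # ~8.1 GB -> the "8GB minimum"
```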
2. Storage Considerations
Models are downloaded and stored on your disk. They are not like normal software; they are "static weights."
How much space should you clear?
- The Library: If you like to experiment with different models (Llama 3, Mistral, Gemma, Phi-3), you should set aside at least 50GB to 100GB of SSD space.
- The Path: On Linux and macOS, Ollama stores models in ~/.ollama/models. On Windows, they live in %USERPROFILE%\.ollama\models. Ensure the drive containing your home folder has enough space.
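If you prefer to check from a script rather than Finder or Explorer, this stdlib-only Python snippet reports free space on the drive holding the default model directory mentioned above (adjust the path if you have relocated your models):

```python
import shutil
from pathlib import Path

# Default model directory on Linux/macOS; on Windows it sits under the
# user profile (e.g. %USERPROFILE%\.ollama\models).
models_dir = Path.home() / ".ollama" / "models"

# disk_usage needs an existing path, so fall back to the home directory
# if Ollama hasn't created the folder yet (same drive either way).
target = models_dir if models_dir.exists() else Path.home()
free_gb = shutil.disk_usage(target).free / 1e9

print(f"Free space on the drive holding {target}: {free_gb:.1f} GB")
if free_gb < 50:
    print("Tight: 50-100GB is a comfortable buffer for a model library.")
```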
SSD vs. HDD
Never run LLMs from a mechanical Hard Drive. The entire model file must be read from disk into RAM (or VRAM) every time the model is loaded.
- SSD: Takes 3-5 seconds to load an 8B model.
- HDD: Takes 1-3 minutes. This delay makes the AI feel broken.
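For a sense of where those numbers come from, here is a hedged sketch of the load-time arithmetic. The read speeds are ballpark assumptions for typical drives, and real-world loads run longer than this ideal because of seek time and other startup work:

```python
def load_seconds(model_gb: float, read_gb_per_s: float) -> float:
    """Idealized load time: model size divided by sustained sequential read speed."""
    return model_gb / read_gb_per_s

model_gb = 5.5  # roughly the 4-bit Llama 3 8B file from the example above
print(f"NVMe SSD (~3 GB/s):   {load_seconds(model_gb, 3.0):.0f} s")
print(f"SATA SSD (~0.5 GB/s): {load_seconds(model_gb, 0.5):.0f} s")
print(f"HDD     (~0.1 GB/s):  {load_seconds(model_gb, 0.1):.0f} s  (before seek overhead)")
```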
3. The "Context Window" Cost
The "Context Window" is how much text the model can "see" at once (e.g., 8,000 tokens or 128,000 tokens).
Increasing the context window increases RAM usage roughly linearly, because the model keeps a cached key/value entry for every token it is currently holding in memory. If you want to feed a 100-page PDF into Ollama, you will need significantly more RAM than if you are just asking for a joke, even if the model file itself is small.
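As an illustration of that linear growth, the sketch below estimates KV-cache size for a hypothetical 32-layer model with a 4,096-wide key/value state stored in FP16. The architecture numbers are assumptions for illustration (real models vary, and techniques like grouped-query attention shrink the cache considerably), but the linear relationship with context length is the point:

```python
def kv_cache_gb(context_tokens: int, layers: int = 32, kv_width: int = 4096,
                bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size: one key and one value vector per layer per token."""
    return 2 * layers * kv_width * bytes_per_value * context_tokens / 1e9

print(f"  8K-token context: ~{kv_cache_gb(8_000):.1f} GB on top of the weights")
print(f"128K-token context: ~{kv_cache_gb(128_000):.1f} GB on top of the weights")
```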
Summary Checklist
- Check your RAM: Open Task Manager and look at the Performance tab (Windows) or "About This Mac" (macOS); see the scripted alternative after this list.
- Check your Disk: Ensure you have at least 20GB of free space on your SSD to get started (50GB to 100GB if you plan to build a model library).
- Check your GPU: Download "GPU-Z" (Windows) or check System Report (macOS) to see your VRAM.
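If you want to script the RAM check, a tiny sketch using the third-party psutil package (install with pip install psutil) reports the same number as Task Manager or About This Mac; the 8GB threshold is this lesson's rough minimum for an 8B model:

```python
import psutil  # third-party: pip install psutil

ram_gb = psutil.virtual_memory().total / 1e9
status = "meets" if ram_gb >= 8 else "is below"
print(f"Total RAM: {ram_gb:.1f} GB ({status} the ~8GB minimum for an 8B model)")
```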
Key Takeaways
- Standard Ollama models use 4-bit quantization, requiring roughly 0.7GB of RAM per billion parameters.
- Always account for extra buffer RAM for the context window and operating system.
- SSD storage is mandatory for a good user experience.
- A 100GB dedicated "AI folder" is a good starting point for your local model library.