Module 7 Wrap-up: The Performance Lab
You have toured the "Engine Room" of your local AI. You know how to manage disk space, optimize memory, tune the context window, and process data in bulk. Now, let’s see the real-world impact of these changes.
Hands-on Exercise: The "Context Stress Test"
We are going to find the exact point where your machine runs out of steam.
1. The Small Start
Create a Modelfile that defines a model called LiteBot:
FROM llama3
PARAMETER num_ctx 2048
Run it. Measure the "Time to first token."
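A minimal sketch of this step from the shell, assuming the Modelfile above is saved as Modelfile.lite (the filename is illustrative). The --verbose flag makes Ollama print its own timing statistics after each response:

ollama create litebot -f Modelfile.lite
ollama run litebot --verbose "Say hello in one sentence."

In the --verbose output, "load duration" plus "prompt eval duration" is a reasonable proxy for time to first token, and "eval rate" gives you a tokens-per-second number to compare against the heavy run below.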
2. The Heavy Load
Create a second Modelfile that defines a model called HeavyBot:
FROM llama3
PARAMETER num_ctx 32768
Run it. Now, paste a very large document (e.g., a 10-page academic paper or code file) and ask for a summary.
- Observation: Note the delay before the model starts typing. Watch your VRAM usage.
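One way to run the heavy test from the shell, assuming the paper is saved as paper.txt (a hypothetical file) and the Modelfile above as Modelfile.heavy:

ollama create heavybot -f Modelfile.heavy
ollama run heavybot --verbose "Summarize the following document: $(cat paper.txt)"

While it runs, check memory from a second terminal: ollama ps shows the loaded model's size and its CPU/GPU split, and on NVIDIA hardware nvidia-smi shows live VRAM usage.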
3. The Resolution
If HeavyBot crashed or was roughly 10x slower than LiteBot, you have found your hardware's context ceiling: for your specific machine, stay below that num_ctx value for stable production work.
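Once you know your ceiling, bake a safe value into a production Modelfile so nobody accidentally runs past it. The 8192 below is a placeholder; substitute the largest num_ctx that ran comfortably in your test:

FROM llama3
PARAMETER num_ctx 8192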
Module 7 Summary
- Caching keeps models "hot" in RAM for instant use.
- The OLLAMA_MODELS variable is your friend for moving data to external drives.
- VRAM optimization means closing other GPU-heavy apps before you start a session.
- Context window tuning is the most effective way to balance memory and utility.
- Batch processing is boosted by the num_batch and OLLAMA_NUM_PARALLEL settings (see the sketch after this list).
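A minimal sketch of how these server-side knobs are set on Linux/macOS; the paths and values are illustrative, and the Ollama server reads these environment variables at startup, so restart ollama serve after changing them. num_batch, by contrast, is a per-request option passed through the API's options field rather than an environment variable:

export OLLAMA_MODELS=/mnt/external/ollama-models   # store weights on an external drive
export OLLAMA_NUM_PARALLEL=2                       # serve two requests concurrently
export OLLAMA_KEEP_ALIVE=10m                       # keep idle models loaded for 10 minutes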
Coming Up Next...
In Module 8, we finally start building apps. We will connect Ollama to Python, JavaScript, and LangChain to build the foundations of a "Private Local AI Platform."
Module 7 Checklist
- I have used ollama ps to see which models are in RAM.
- I checked my OLLAMA_MODELS path and know where my space is going.
- I tested the speed difference between small (2k) and large (32k) context windows.
- I closed my GPU-heavy apps and saw an increase in tokens-per-second.
- I can explain the keep_alive parameter to a teammate (see the example below).
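If you want something concrete to show that teammate, this request against Ollama's REST API preloads llama3 and keeps it resident for 10 minutes (keep_alive also accepts -1 to pin a model in memory and 0 to unload it immediately):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "keep_alive": "10m"
}'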