Module 7 Lesson 4: Context Window Tuning

Stability over scope. Why lowering your context window can actually make your AI feel faster and more stable.

Context Window Tuning: Less is More

In Module 4, we learned that the Context Window is the AI's "Short-Term Memory." While it's tempting to set it to 100,000 tokens, doing so can kill your performance.

Here is how to tune num_ctx for maximum stability.

1. The VRAM-to-Context Equation

Every token in the context window consumes VRAM; a rough way to estimate the cost is sketched after this list.

  • Small Context (2,048): Very light. The model stays stable even on 8GB of VRAM.
  • Medium Context (8,192): Standard. Fits on most modern GPUs.
  • Large Context (32,768+): Heavy. Can push a 12GB GPU into the "Slow Zone" (spilling over into system RAM).
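
To put rough numbers on that, here is a back-of-the-envelope estimate of the KV cache (the memory the context window actually claims). The figures below assume Llama 3 8B's architecture (32 layers, 8 key/value heads, a head dimension of 128) and an fp16 cache; other models, and quantized KV caches, will shift the numbers, so treat this as a sketch rather than a spec.

  # Per-token KV-cache cost: 2 (K and V) x layers x kv_heads x head_dim x 2 bytes (fp16)
  LAYERS=32 KV_HEADS=8 HEAD_DIM=128 FP16_BYTES=2
  PER_TOKEN=$(( 2 * LAYERS * KV_HEADS * HEAD_DIM * FP16_BYTES ))  # 131,072 bytes (~128 KiB)
  for CTX in 2048 8192 32768; do
    echo "num_ctx=$CTX -> ~$(( PER_TOKEN * CTX / 1024 / 1024 )) MiB of KV cache"
  done

That works out to roughly 256 MiB at 2,048 tokens, 1 GiB at 8,192, and 4 GiB at 32,768, which is why the large tier lands so hard on your VRAM budget.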

2. Why Tune Down?

If you are building a simple chatbot or a translation tool, you do not need 8,000 tokens of memory. By setting PARAMETER num_ctx 2048 in your Modelfile (a full example follows this list):

  1. Lower VRAM usage: You may be able to run a "smarter" model (a 14B instead of an 8B, say) because you saved VRAM on the context window.
  2. Faster TTFT (Time To First Token): The model spends less time pre-allocating the memory buffer before it can respond.
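
Here is a minimal sketch of that workflow. The base model (llama3:8b) and the tag (llama3-lean) are placeholders; substitute whatever you actually run.

  # Write a Modelfile that caps the context window at 2,048 tokens.
  printf 'FROM llama3:8b\nPARAMETER num_ctx 2048\n' > Modelfile

  # Build the lean variant and run it.
  ollama create llama3-lean -f Modelfile
  ollama run llama3-lean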

3. When to Tune Up?

You should only increase the context window for tasks like these (a per-request override is sketched after the list):

  • Code Review: When you need the AI to see 10 large files at once.
  • Legal/Academic Synthesis: Reading and comparing multiple PDFs.
  • Creative Writing: Writing a long chapter where the AI needs to remember what happened on page 1.
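
For one-off jobs like these, you do not have to bake a new Modelfile: Ollama's REST API accepts a per-request num_ctx override. A sketch, assuming a default local install on port 11434 and llama3:8b as a placeholder model:

  # Request a 32k context for this call only.
  curl http://localhost:11434/api/generate -d '{
    "model": "llama3:8b",
    "prompt": "Compare the two attached contracts...",
    "options": { "num_ctx": 32768 }
  }'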

4. How to Change Context Safely

If you are moving to a large context (e.g., 64,000 tokens), follow these steps:

  1. Check your total VRAM.
  2. Do the math: A 64k context for Llama 3 can take roughly 4GB of extra VRAM (the exact figure depends on the model's architecture and KV-cache precision).
  3. Subtract and Test: If your model weights take 5GB and your context takes 4GB, you need 9GB total. With only 8GB of VRAM, part of the load spills into system RAM and the model will be extremely slow.

Recommendation: Increase context in increments of 4096 and check ollama ps to see how much memory is being claimed.
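
In practice, that check loop looks something like this (the warm-up prompt and the model name are placeholders):

  # Load the model at the new context size...
  curl -s http://localhost:11434/api/generate -d '{
    "model": "llama3:8b",
    "prompt": "warm-up",
    "options": { "num_ctx": 4096 }
  }' > /dev/null

  # ...then check what was claimed. SIZE is total memory; PROCESSOR shows the
  # GPU/CPU split. Anything under 100% GPU means you are in the Slow Zone.
  ollama ps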


Key Takeaways

  • Context usage is dynamic: It grows as the conversation gets longer.
  • Lowering num_ctx frees up VRAM for larger, smarter models.
  • Increasing num_ctx above 16k requires a high-end GPU or Mac Studio.
  • Always match your context window to the specific task—never use "Max" by default.
