Module 6 Lesson 3: Supported Architectures
Not all models are equal. Understanding which architectures (Llama, Mistral, BERT) work with the Ollama engine.
Supported Architectures: Is Your Model Compatible?
Ollama is built on llama.cpp. While the name suggests it only runs "Llama" models, it actually supports dozens of different model architectures. If you find a model on Hugging Face, check whether its architecture is supported before you try to import it.
1. The "Big" Supported Architectures
These are fully supported and run efficiently (a quick smoke test follows the list):
- Llama / Llama 2 / Llama 3: The standard.
- Mistral / Mixtral: High-efficiency attention.
- Gemma: Google's transformer variant.
- Falcon: TII's general-purpose open-weight family.
- StarCoder: Designed for programming.
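If you want to sanity-check that one of these runs on your machine, a minimal smoke test against Ollama's local REST API (default port 11434) looks roughly like this. The model name `llama3` is only an example and assumes you have already pulled it:

```python
import json
import urllib.request

# Ask the local Ollama server (default: http://localhost:11434) for a short
# completion with a model you have already pulled, e.g. "llama3".
payload = {
    "model": "llama3",            # swap in "mistral", "gemma", etc.
    "prompt": "Say hello in five words.",
    "stream": False,              # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])           # the generated text
```

If the call returns text, the architecture is working end to end: download, quantized weights, and inference.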
2. MoE (Mixture of Experts)
Ollama has excellent support for MoE models like mixtral:8x7b.
- In an MoE model, only a small subset of the "experts" (sub-networks) wakes up for each token; the rest stay idle.
- Challenge: You need enough RAM to hold every expert, even though only a fraction of the compute is used per token. Ollama handles this "sparse" routing automatically (see the sketch below).
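To see why RAM, not compute, is the bottleneck, here is a back-of-the-envelope sketch. The parameter counts are the approximate published figures for Mixtral 8x7B, and the 4-bit quantization is an assumption for illustration:

```python
# Rough memory math for a Mixtral-style MoE model (illustrative, not exact).
total_params = 46.7e9     # all experts plus shared layers must sit in memory
active_params = 12.9e9    # roughly what is actually computed per token
bytes_per_param = 0.5     # ~4-bit (Q4) quantization is about half a byte per weight

ram_needed_gb = total_params * bytes_per_param / 1e9
compute_share = active_params / total_params

print(f"Weights to hold in RAM/VRAM: ~{ram_needed_gb:.0f} GB")
print(f"Fraction of weights used per token: ~{compute_share:.0%}")
```

In other words, you pay for the whole model in memory but only for about a quarter of it in compute on each token.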
3. What is NOT Supported?
Encoder-Only Models (BERT)
Models like BERT or RoBERTa are encoder-only: they are used for classification (is this email spam?) but they cannot generate text. Ollama is a generative engine, so it generally does not run these classification-only models.
Image Generators (Stable Diffusion)
Ollama is for LLMs (Text). It cannot run Stable Diffusion or Midjourney-style image generators.
Experimental Non-Transformer Models
New architectures like Mamba (a State Space Model rather than a Transformer) gain support gradually. Before you pull a Mamba-based model, check the Ollama GitHub Releases to confirm that your installed version supports it.
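One quick way to see what you are running is to ask the local server for its version and compare it against the release notes. This is a minimal sketch, assuming the server is up and that your build exposes the `/api/version` endpoint found in recent Ollama releases:

```python
import json
import urllib.request

# Ask the running Ollama server which version it is, so you can check the
# GitHub release notes for support of newer architectures like Mamba.
with urllib.request.urlopen("http://localhost:11434/api/version") as resp:
    version = json.loads(resp.read())["version"]

print(f"Local Ollama version: {version}")
```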
4. How to Check Compatibility
When looking at a model on Hugging Face:
- Go to the "Files" tab.
- Open the `config.json` file.
- Look for the field `"model_type"`.
- If it says `"llama"`, `"mistral"`, `"gemma"`, or `"qwen"`, you are 99% likely to be successful. A scripted version of this check follows below.
Key Takeaways
- Ollama supports the majority of Transformer-based Generative models.
- MoE (Mixture of Experts) is supported but requires high RAM.
- Encoder-only (BERT) and Image models (Stable Diffusion) are not supported.
- Check the `model_type` in the `config.json` on Hugging Face before you waste time downloading 40 GB of data.