Module 6 Lesson 3: Supported Architectures

Not all models are equal. Understanding which architectures (Llama, Mistral, BERT) work with the Ollama engine.

Supported Architectures: Is Your Model Compatible?

Ollama is built on llama.cpp. While the name suggests it only runs "Llama" models, it actually supports dozens of different mathematical structures (architectures). If you find a model on Hugging Face, you need to check if its "Architecture" is supported before you try to import it.

1. The "Big" Supported Architectures

These architectures are fully supported and run efficiently:

  • Llama / Llama 2 / Llama 3: The standard; the architecture llama.cpp was originally built around.
  • Mistral / Mixtral: High-efficiency attention (sliding-window and grouped-query attention).
  • Gemma: Google's transformer variant.
  • Falcon: TII's open generative models.
  • StarCoder: Designed for code generation.

2. MoE (Mixture of Experts)

Ollama has excellent support for MoE models like mixtral:8x7b.

  • In an MoE model, only a small part of the brain "wakes up" for each word.
  • Challenge: You need enough RAM to hold every expert in memory, even though only a few are active per token, so compute per token is a fraction of the model's full size. Ollama handles this "sparse" routing math automatically.
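The RAM-versus-compute trade-off above can be sketched with some simple arithmetic. This is a minimal illustration, not Ollama's internal accounting: the expert and shared parameter counts below are rough approximations for a Mixtral-8x7B-style model (8 experts, 2 active per token).

```python
# Rough sketch: why an MoE model needs full-model RAM
# but only a fraction of the compute per token.

def moe_footprint(n_experts: int, params_per_expert_b: float,
                  shared_params_b: float, active_experts: int):
    """Return (total params stored, params active per token), in billions.

    All experts must live in memory, but only `active_experts`
    of them run for any given token.
    """
    total = shared_params_b + n_experts * params_per_expert_b
    active = shared_params_b + active_experts * params_per_expert_b
    return total, active

# Illustrative Mixtral-8x7B-style numbers (approximate):
total_b, active_b = moe_footprint(n_experts=8, params_per_expert_b=5.6,
                                  shared_params_b=2.0, active_experts=2)
print(f"Stored: ~{total_b:.0f}B params, active per token: ~{active_b:.1f}B")
```

So you pay for roughly 47B parameters of memory while each token only touches roughly 13B parameters of compute, which is why an 8x7B model feels faster than a dense model of the same size.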

3. What is NOT Supported?

Encoder-Only Models (BERT)

Models like BERT or RoBERTa are used for classification (is this email spam?) but they cannot "chat." Ollama is a "Generative" engine, so it generally does not run these classification-only models.

Image Generators (Stable Diffusion)

Ollama is for LLMs (Text). It cannot run Stable Diffusion or Midjourney-style image generators.

Experimental Non-Transformer Models

New architectures like Mamba (State Space Models) are being added to the registry slowly. Before you pull a Mamba model, check the Ollama GitHub Releases to see if that specific version of Ollama supports it.


4. How to Check Compatibility

When looking at a model on Hugging Face:

  1. Go to the "Files" tab.
  2. Open the config.json file.
  3. Look for the field "model_type".
  4. If it says "llama", "mistral", "gemma", or "qwen", the import will almost certainly succeed.
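The steps above can be automated with a short script. This is a minimal sketch: the `SUPPORTED` set below is a hypothetical subset for illustration, not Ollama's authoritative architecture registry, and the URL pattern assumes the config lives on the repo's main branch.

```python
import json
import urllib.request

# Hypothetical subset of supported architectures -- check Ollama's
# docs/releases for the current list.
SUPPORTED = {"llama", "mistral", "mixtral", "gemma", "falcon", "starcoder", "qwen2"}

def is_supported(config: dict) -> bool:
    """Return True if a parsed config.json declares a supported model_type."""
    return config.get("model_type", "").lower() in SUPPORTED

def fetch_config(repo_id: str) -> dict:
    """Download config.json from a Hugging Face repo (main branch)."""
    url = f"https://huggingface.co/{repo_id}/resolve/main/config.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Usage (requires network):
# config = fetch_config("mistralai/Mistral-7B-v0.1")
# print(is_supported(config))
```

A few seconds of checking here beats discovering an unsupported architecture after a 40GB download.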

Key Takeaways

  • Ollama supports the majority of Transformer-based Generative models.
  • MoE (Mixture of Experts) is supported but requires high RAM.
  • Encoder-only (BERT) and Image models (Stable Diffusion) are not supported.
  • Check the model_type in the config.json on Hugging Face before you waste time downloading 40GB of data.
