Module 2 Lesson 6: Ollama Server and API Overview

Going beyond the terminal. Understanding the Ollama REST API and how to talk to your models via HTTP.

Ollama Server & API: The Developer's Gateway

The CLI is great for chatting, but most AI engineering involves connecting Ollama to a website, a Python script, or a mobile app. This is done through the Ollama REST API.

The Local Server

By default, Ollama starts a web server on: http://localhost:11434

You can test if the server is alive by going to that address in your web browser. You should see a simple message: "Ollama is running".
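
You can run the same check from a terminal; the root endpoint simply returns that plain-text status message:

curl http://localhost:11434

If the server is up, the command prints "Ollama is running".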


Core API Endpoints

Ollama's API is simple and clean. Here are the three endpoints you'll use most often:

1. POST /api/generate (Completion)

Used for simple "one-shot" tasks like summarizing a paragraph or translating a sentence.

cURL Example:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

2. POST /api/chat (Chat)

Used for conversations. It accepts a list of "messages" (User, Assistant, System) to maintain context.

cURL Example:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'

3. POST /api/embeddings (RAG)

Used to turn text into numerical vectors (embeddings) so you can search documents by meaning. This is the foundation of Module 10.
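
A minimal request follows the same shape as the other endpoints. This endpoint takes a "prompt" field (newer Ollama releases also expose /api/embed, which takes "input" instead); the model name below is just the same example model used throughout this lesson:

curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "The sky is blue because of Rayleigh scattering."
}'

The response contains an "embedding" field: a long array of floating-point numbers representing the text.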


Streaming vs. Non-Streaming

By default, Ollama streams responses: it sends the reply back in small chunks (tokens) as they are generated, instead of waiting for the whole answer. This makes user interfaces feel "alive."

If you are writing a simple script and want the whole answer at once, you can set "stream": false in your request body.
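
For example, a non-streaming version of the earlier generate request just adds the flag to the request body (same example model as above):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Instead of a series of JSON chunks, you get back a single JSON object containing the complete response.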


Remote Access: The OLLAMA_HOST Variable

By default, Ollama only listens for requests from your own computer (127.0.0.1). If you want your laptop to talk to a desktop PC running Ollama, you need to change a setting on the machine that runs Ollama (see the example after these steps):

  1. Stop Ollama.
  2. Set the environment variable OLLAMA_HOST to 0.0.0.0.
  3. Restart Ollama.
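
One quick way to try this from a shell (assuming you can run the server manually; if Ollama runs as a background service or desktop app, set the variable in that service's configuration instead and then restart it):

# Start the server listening on all network interfaces
# instead of only the loopback address 127.0.0.1.
OLLAMA_HOST=0.0.0.0 ollama serve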

Now any device on your local network that knows your machine's IP address can use your local AI models!
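
From another machine on the same network, you can then point requests at the host's IP address instead of localhost (192.168.1.50 below is just a placeholder; substitute the actual IP of the machine running Ollama):

curl http://192.168.1.50:11434/api/tags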


Security Warning

Ollama does not have built-in authentication. If you expose port 11434 to the open internet, anybody in the world can use your GPU/CPU and see your model library. Never expose this port on a public server without a reverse proxy (such as Nginx) configured to require authentication, or without restricting access through a VPN or firewall.


Summary Cheat Sheet

Endpoint         Purpose
/api/generate    Single prompt/response
/api/chat        Multi-turn conversation
/api/tags        List models (same as ollama list)
/api/pull        Download models via API
/api/show        Get technical details of a model
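
The last two endpoints accept a small JSON body naming the model. On recent Ollama versions the field is "model" (older releases used "name"), so adjust if your version complains:

curl http://localhost:11434/api/pull -d '{ "model": "llama3" }'
curl http://localhost:11434/api/show -d '{ "model": "llama3" }'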

Training Exercise

Try to "ping" your Ollama server using your terminal. Run: curl http://localhost:11434/api/tags

If it returns a JSON list of your models, you have successfully mastered the API layer!


Key Takeaways

  • Ollama runs a RESTful web server on port 11434.
  • Everything you can do in the CLI, you can do via HTTP requests.
  • Streaming is the default behavior and is best for user experience.
  • Be extremely careful about security when exposing your local server to a network.
