Module 2 Lesson 6: Ollama Server and API Overview
Going beyond the terminal: understanding the Ollama REST API and how to talk to your models over HTTP.
Ollama Server & API: The Developer's Gateway
The CLI is great for chatting, but most real AI engineering involves connecting Ollama to a website, a Python script, or a mobile app. That connection happens through the Ollama REST API.
The Local Server
By default, Ollama starts a web server on:
http://localhost:11434
You can test if the server is alive by going to that address in your web browser. You should see a simple message: "Ollama is running".
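If you prefer the terminal, the same health check works with curl:

curl http://localhost:11434
# Prints "Ollama is running" if the server is up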
Core API Endpoints
Ollama's API is simple and clean. Here are the three endpoints you'll use most often:
1. POST /api/generate (Completion)
Used for simple "one-shot" tasks like summarizing a paragraph or translating a sentence.
CURL Example:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
2. POST /api/chat (Chat)
Used for conversations. It accepts a list of "messages" (User, Assistant, System) to maintain context.
CURL Example:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
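To see how context is maintained, you send the whole conversation so far, including a system message and any earlier assistant replies. A minimal sketch (the assistant text here is just a placeholder for a previous response):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Hello!" },
    { "role": "assistant", "content": "Hi! How can I help?" },
    { "role": "user", "content": "What did I just say to you?" }
  ]
}'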
3. POST /api/embeddings (RAG)
Used to turn text into numbers (vectors) so you can search documents by meaning. This is the foundation of Retrieval-Augmented Generation (RAG) in Module 10.
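A minimal request sketch, assuming the classic /api/embeddings endpoint that takes a "prompt" field (newer Ollama releases also provide /api/embed, which takes "input" instead). The model name is only an example; a dedicated embedding model is a common choice here:

curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "The sky is blue because of Rayleigh scattering."
}'

The response contains an "embedding" field: a long array of floating-point numbers representing the text.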
Streaming vs. Non-Streaming
By default, Ollama streams responses. It sends the answer back token by token as it is generated, which makes user interfaces feel "alive."
If you are writing a simple script and want the whole answer at once, you can set "stream": false in your request body.
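For example, the same generate request as before, but returning one complete JSON object instead of a stream of chunks:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'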
Remote Access: The OLLAMA_HOST Variable
By default, Ollama only listens for requests from your own computer (127.0.0.1). If you want your laptop to talk to a desktop PC running Ollama, you need to change a setting:
- Stop Ollama.
- Set the environment variable OLLAMA_HOST to 0.0.0.0.
- Restart Ollama (see the example after this list).
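On Linux or macOS, those steps might look like this in a terminal (a sketch; on Windows you would set the variable in your system environment settings instead, and service-based installs configure it in the service definition):

# Make Ollama listen on all network interfaces, not just 127.0.0.1
export OLLAMA_HOST=0.0.0.0
ollama serve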
Now any device on your Wi-Fi network that knows your machine's IP address can use your local AI models!
Security Warning
Ollama does not have built-in authentication.
If you expose port 11434 to the open internet, anybody in the world can use your GPU/CPU and browse your model library. Never expose this port on a public server without a reverse proxy that adds authentication (like Nginx) or a VPN.
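As a rough sketch of that idea (not a hardened setup), an Nginx reverse proxy can put TLS and a password in front of Ollama. The domain, certificate paths, and password file below are placeholders:

# Hypothetical Nginx server block: TLS + HTTP Basic Auth in front of Ollama
server {
    listen 443 ssl;
    server_name ollama.example.com;                  # placeholder domain

    ssl_certificate     /etc/ssl/certs/ollama.crt;   # placeholder certificate
    ssl_certificate_key /etc/ssl/private/ollama.key; # and key paths

    location / {
        auth_basic           "Ollama API";           # prompt for credentials
        auth_basic_user_file /etc/nginx/.htpasswd;   # created with htpasswd
        proxy_pass           http://127.0.0.1:11434; # forward to local Ollama
    }
}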
Summary Cheat Sheet
| Endpoint | Purpose |
|---|---|
| /api/generate | Single prompt/response |
| /api/chat | Multi-turn conversation |
| /api/tags | List models (same as ollama list) |
| /api/pull | Download models via API |
| /api/show | Get technical details of a model |
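For example, model details can be fetched like this (a sketch; depending on your Ollama version the request field may be "model" or "name"):

curl http://localhost:11434/api/show -d '{
  "model": "llama3"
}'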
Training Exercise
Try to "ping" your Ollama server using your terminal. Run:
curl http://localhost:11434/api/tags
If it returns a JSON list of your models, you have successfully mastered the API layer!
Key Takeaways
- Ollama is a RESTful web server running on port 11434.
- Everything you can do in the CLI, you can also do via HTTP requests.
- Streaming is the default behavior and is best for user experience.
- Be extremely careful about security when exposing your local server to a network.