Module 2 Lesson 6: Ollama Server and API Overview
Going beyond the terminal: understanding the Ollama REST API and how to talk to your models over HTTP.
Ollama Server & API: The Developer's Gateway
The CLI is great for chatting, but most real AI engineering involves connecting Ollama to a website, a Python script, or a mobile app. That connection happens through the Ollama REST API.
The Local Server
By default, Ollama starts a web server on:
http://localhost:11434
You can test if the server is alive by going to that address in your web browser. You should see a simple message: "Ollama is running".
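If you prefer the terminal, the same health check works with curl:

curl http://localhost:11434
# Prints "Ollama is running" if the server is up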
Core API Endpoints
Ollama's API is simple and clean. Here are the three endpoints you'll use most often:
1. POST /api/generate (Completion)
Used for simple "one-shot" tasks like summarizing a paragraph or translating a sentence.
CURL Example:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
2. POST /api/chat (Chat)
Used for conversations. It accepts a list of "messages" (User, Assistant, System) to maintain context.
CURL Example:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
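To see how context is maintained, you send the whole conversation so far, including a system message and any earlier assistant replies. A minimal sketch (the assistant text here is just a placeholder for a previous response):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Hello!" },
    { "role": "assistant", "content": "Hi! How can I help?" },
    { "role": "user", "content": "What did I just say to you?" }
  ]
}'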
3. POST /api/embeddings (RAG)
Used to turn text into numbers (vectors) so you can search documents by meaning. This is the foundation of Retrieval-Augmented Generation (RAG) in Module 10.
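A minimal request sketch, assuming the classic /api/embeddings endpoint that takes a "prompt" field (newer Ollama releases also provide /api/embed, which takes "input" instead). The model name is only an example; a dedicated embedding model is a common choice here:

curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "The sky is blue because of Rayleigh scattering."
}'

The response contains an "embedding" field: a long array of floating-point numbers representing the text.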
Streaming vs. Non-Streaming
By default, Ollama streams responses. It sends the answer back token by token as it is generated, which makes user interfaces feel "alive."
If you are writing a simple script and want the whole answer at once, you can set "stream": false in your request body.
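For example, the same generate request as before, but returning one complete JSON object instead of a stream of chunks:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'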
Remote Access: The OLLAMA_HOST Variable
By default, Ollama only listens for requests from your own computer (127.0.0.1). If you want your laptop to talk to a desktop PC running Ollama, you need to change a setting:
- Stop Ollama.
- Set the environment variable OLLAMA_HOST to 0.0.0.0.
- Restart Ollama (see the example after this list).
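On Linux or macOS, those steps might look like this in a terminal (a sketch; on Windows you would set the variable in your system environment settings instead, and service-based installs configure it in the service definition):

# Make Ollama listen on all network interfaces, not just 127.0.0.1
export OLLAMA_HOST=0.0.0.0
ollama serve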
Now any device on your Wi-Fi network that knows your machine's IP address can use your local AI models!
Security Warning
Ollama does not have built-in authentication.
If you expose port 11434 to the open internet, anybody in the world can use your GPU/CPU and browse your model library. Never expose this port on a public server without a reverse proxy that adds authentication (like Nginx) or a VPN.
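As a rough sketch of that idea (not a hardened setup), an Nginx reverse proxy can put TLS and a password in front of Ollama. The domain, certificate paths, and password file below are placeholders:

# Hypothetical Nginx server block: TLS + HTTP Basic Auth in front of Ollama
server {
    listen 443 ssl;
    server_name ollama.example.com;                  # placeholder domain

    ssl_certificate     /etc/ssl/certs/ollama.crt;   # placeholder certificate
    ssl_certificate_key /etc/ssl/private/ollama.key; # and key paths

    location / {
        auth_basic           "Ollama API";           # prompt for credentials
        auth_basic_user_file /etc/nginx/.htpasswd;   # created with htpasswd
        proxy_pass           http://127.0.0.1:11434; # forward to local Ollama
    }
}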
Summary Cheat Sheet
| Endpoint | Purpose |
|---|---|
| /api/generate | Single prompt/response |
| /api/chat | Multi-turn conversation |
| /api/tags | List models (same as ollama list) |
| /api/pull | Download models via API |
| /api/show | Get technical details of a model |
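For example, model details can be fetched like this (a sketch; depending on your Ollama version the request field may be "model" or "name"):

curl http://localhost:11434/api/show -d '{
  "model": "llama3"
}'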
Training Exercise
Try to "ping" your Ollama server using your terminal. Run:
curl http://localhost:11434/api/tags
If it returns a JSON list of your models, you have successfully mastered the API layer!
Key Takeaways
- Ollama is a RESTful web server running on port 11434.
- Everything you can do in the CLI, you can also do via HTTP requests.
- Streaming is the default behavior and is best for user experience.
- Be extremely careful about security when exposing your local server to a network.