Module 8 Lesson 3: Python Integration

The AI Engineer's standard. Using the official Ollama Python library to build smart scripts.

Python Integration: The Official Library

While you can talk to the REST API directly with the requests library, the Ollama team provides an official Python package (ollama) that wraps the same endpoints in a cleaner interface and handles all the streaming logic for you.
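
For comparison, here is roughly what a raw HTTP call looks like (this assumes the default local server on port 11434, per Ollama's REST API docs); the official library saves you from writing this boilerplate:

import requests

# POST to the local Ollama server; stream=False requests one JSON body
# instead of a stream of chunks.
resp = requests.post(
  'http://localhost:11434/api/generate',
  json={'model': 'llama3', 'prompt': 'Why is the sky blue?', 'stream': False},
)
print(resp.json()['response'])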

1. Installation

pip install ollama
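
The library talks to a locally running Ollama server, so make sure ollama serve (or the desktop app) is running first. A quick way to verify the connection is to list your installed models:

import ollama

# Fails with a connection error if no Ollama server is reachable on localhost:11434.
print(ollama.list())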

2. Basic Generation (Non-Streaming)

import ollama

# One-shot completion: blocks until the full answer has been generated.
response = ollama.generate(model='llama3', prompt='Why is the sky blue?')
print(response['response'])
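
The returned object also carries the server's timing and token-count metadata (field names as defined by Ollama's generate endpoint), which is handy for quick benchmarking:

# Durations are reported in nanoseconds; eval_count is the number of tokens generated.
print(response['total_duration'] / 1e9, 'seconds total')
print(response['eval_count'], 'tokens generated')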

3. The Chat Loop (Streaming)

This is the core of a real-time chatbot in Python; a full multi-turn loop follows the example:

import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Tell me a story about a dragon.'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

  • end='': stops print from adding a newline after each chunk, so the reply flows as continuous text.
  • flush=True: forces each chunk to appear on screen immediately instead of waiting for the output buffer to fill.
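
The snippet above handles a single turn. For a real chatbot you keep appending both sides of the conversation to messages so the model sees the whole history on every call; here is a minimal sketch (the You: prompt and the quit command are arbitrary choices):

import ollama

messages = []  # full conversation history, oldest first

while True:
  user_input = input('You: ')
  if user_input.lower() in ('quit', 'exit'):
    break
  messages.append({'role': 'user', 'content': user_input})

  reply = ''
  for chunk in ollama.chat(model='llama3', messages=messages, stream=True):
    piece = chunk['message']['content']
    print(piece, end='', flush=True)
    reply += piece
  print()

  # Store the assistant's answer so the next turn has context.
  messages.append({'role': 'assistant', 'content': reply})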

4. Handling Images (Multimodal)

If you have pulled llava (ollama pull llava), you can send an image directly from your Python script:

with open('cat.jpg', 'rb') as f:
  response = ollama.generate(
    model='llava',
    prompt='Describe this image.',
    images=[f.read()]  # raw image bytes; the library handles the encoding
  )
print(response['response'])
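
As a shortcut, recent versions of the library also accept a file path string in images (this is a version-dependent convenience, so test it against your installed release); the bytes approach above always works:

response = ollama.generate(model='llava', prompt='Describe this image.', images=['cat.jpg'])
print(response['response'])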

5. Async Support

If you are building a modern web app (with FastAPI or Quart, for example), the library also ships an AsyncClient whose methods mirror the synchronous API. A minimal runnable sketch, with an arbitrary example prompt:

import asyncio
from ollama import AsyncClient

async def main():
  client = AsyncClient()
  messages = [{'role': 'user', 'content': 'Why is the sky blue?'}]
  response = await client.chat(model='llama3', messages=messages)
  print(response['message']['content'])

asyncio.run(main())
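
Streaming works in the async client too. Per the library's documented pattern, awaiting chat(..., stream=True) returns an async generator you can iterate with async for:

import asyncio
from ollama import AsyncClient

async def main():
  messages = [{'role': 'user', 'content': 'Tell me a story about a dragon.'}]
  # Awaiting chat(stream=True) returns an async iterator of chunks.
  async for chunk in await AsyncClient().chat(model='llama3', messages=messages, stream=True):
    print(chunk['message']['content'], end='', flush=True)

asyncio.run(main())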

Key Takeaways

  • The official Python library is the preferred way to interact with Ollama from code.
  • It simplifies streaming responses and multimodal inputs.
  • Use ollama.chat for conversations and ollama.generate for one-off tasks.
  • Async support is built in via AsyncClient for high-performance web applications.
