Ollama Generative LLM

New for 2025.

Ollama is a freely available package for running large language and vision models on your own computer. We will use it for experiments in interpreting camera data as a possible resource for building responsive and interactive systems.

The following notes may help you install and run the ollama client. Please note that we will be scripting it from Python, so the instructions default to installing a command-line version without any graphical interface.

macOS Installation Notes

The recommended method for installing Ollama on macOS is via Homebrew. If you already use Homebrew, then the following will install the command-line system:

brew install ollama

The chief advantage of Homebrew is that the components can be easily updated via a similar mechanism:

brew update
brew upgrade ollama

The alternative is to install the graphical application from the Downloads page on the Ollama web site. This should also install a command-line client, so most of the examples should work the same. However, the procedure for starting the local server and pulling model files is slightly different.

Starting a Local Server

When using the command-line system, the server can be manually started in its own Terminal or Console window:

ollama serve

The advantage of running the server this way is that the diagnostic messages will be visible.

The graphical system runs the server out of sight by default. It is also possible to configure the command-line system to run the server in the background.
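
For example, if the command-line system was installed via Homebrew, its service manager can start the server in the background:

brew services start ollama

The service can be stopped later with brew services stop ollama, but note that the diagnostic messages will no longer appear in a Terminal window.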

Model Installation

The Ollama system is a platform for downloading and running different language and vision models. The list of available models frequently changes, but can be found on the Ollama Models page.

We will be testing the gemma3:4b model, which is small enough to run on most laptops and supports both text and image processing.

Assuming the server is already running, a pull command can be issued in a new Terminal or Console window:

ollama pull gemma3:4b

This will download the 3.3GB file and make it available to the local server.

Similarly, the graphical version includes menu options to choose and pull a model.
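
In either case, the models known to the local server can be listed from the command line:

ollama list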

Interactive Testing

The command line client can be used for interactive chat. Assuming you have already started a local server, this will begin an interactive session:

ollama run gemma3:4b

A single response can also be generated from the command line:

ollama run gemma3:4b "Please tell me a joke in eight words or less."

The ollama client can also include a JPEG image file with the prompt. Assuming you have an image file named scene.jpg:

ollama run gemma3:4b "Please describe this image in a single sentence." scene.jpg

Question: is there a way to include an image in a prompt on the graphical client?

Python Scripting

We will primarily access the ollama server from Python scripts. This will allow more systematic testing of prompts and integration into project systems.

The first step is to install the Ollama Python Library (source on github), which is most easily done using the pip utility that came with your Python installation:

pip install ollama

The specifics of this process may vary with your installation. Note that the recommended practice is to set up a virtual environment so the package and its dependencies can be kept locally, but that is outside the scope of these instructions.
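
For reference, a minimal virtual environment setup might look like the following, although the details vary with your platform and Python installation:

python3 -m venv .venv
source .venv/bin/activate
pip install ollama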

The success of installation can be checked by running Python interactively and importing the package, which should look something like this:

$ python
Python 3.13.5 (v3.13.5:6cb20a219a8, Jun 11 2025, 12:23:45) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ollama
>>> help(ollama)
...

Python Documentation

The basic documentation for the Python API can be found in the README and example code on github.

The Python library is a relatively thin wrapper around the Ollama REST API, which has more details on the specific parameters.
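
To illustrate that correspondence, the following sketch issues the same kind of text generation request directly to the REST endpoint using the third-party requests package (installable with pip install requests). It assumes the server is running locally on the default port 11434; the reply is a JSON object using the same field names as the Python library responses.

# sketch: calling the Ollama REST API directly, assuming a local
# server on the default port 11434 and the gemma3:4b model
import requests

payload = {
    "model": "gemma3:4b",
    "prompt": "Please tell a short joke in ten words or less.",
    "stream": False,  # request a single JSON reply rather than a stream
}

reply = requests.post("http://localhost:11434/api/generate", json=payload)
result = reply.json()

print("Reply generated in %f seconds:" % (1e-9 * result["total_duration"]))
print(result["response"])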

Python Script Examples: Text Prompt

A good starting point is to try running the same simple text prompt, but issued from a Python script rather than the command-line client. The following samples can also be directly downloaded from ollama_examples.

# basic example of running an ollama model from Python

import ollama

prompt = "Please tell a short joke in ten words or less."
response = ollama.generate(model='gemma3:4b', prompt=prompt)

# total_duration is reported in nanoseconds
print("Reply generated in %f seconds:" % (1e-9*response.total_duration))
print(response.response)

It is possible to vary the output considerably by including other optional parameters as shown in the following.

# using optional ollama parameters with a text prompt

import ollama

prompt = "Please tell a short joke in ten words or less."
system_message = "You are playing the role of a gregarious ten year old boy."
options = { "temperature" : 10.0 }  # a high temperature increases sampling randomness

response = ollama.generate(model='gemma3:4b', prompt=prompt, system=system_message, options=options)

print("Reply generated in %f seconds:" % (1e-9*response.total_duration))
print(response.response)

Python Script Examples: Image Prompt

The Python generate API can include images using an optional parameter. The following sample directly passes the contents of a JPEG file along with a text prompt.

# example of sending an image prompt to an ollama model from Python

import argparse
import ollama

parser = argparse.ArgumentParser(description="Use ollama to produce a short image description.")
parser.add_argument('input', default="scene.jpg", type=str, nargs='?', help="Specify input jpeg file name (default: %(default)s).")
args = parser.parse_args()

# read the raw JPEG data; 'infile' avoids shadowing the built-in input()
with open(args.input, "rb") as infile:
    jpg_data = infile.read()

print("Read %d bytes of image data from %s." % (len(jpg_data), args.input))

prompt = "Please describe this image in twenty words or fewer."
response = ollama.generate(model='gemma3:4b', images=[jpg_data], prompt=prompt)

print("Reply generated in %f seconds:" % (1e-9*response.total_duration))
print(response.response)

Please note this uses the standard Python argparse library to interpret the command line. We will use this library extensively in utility scripts to allow specifying parameters and defaults from the command line.

This prompt may take considerable time to run, making it very clear that this Python API runs synchronously, i.e. it blocks until the generated reply is complete. A more complex example may need to use streaming mode and threading or asynchronous I/O to continue other processing while waiting.
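
As a starting point, the following sketch uses the library's streaming mode, in which generate returns an iterator of partial responses so that text can be handled as it arrives; integrating this with threading or asyncio is left as an exercise.

# sketch: streaming mode, which yields partial responses as they are
# generated instead of blocking until the full reply is complete
import ollama

prompt = "Please tell a short joke in ten words or less."

for chunk in ollama.generate(model='gemma3:4b', prompt=prompt, stream=True):
    # each chunk carries the next fragment of the reply text
    print(chunk.response, end='', flush=True)
print()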