Ollama REST API

Number of APIs: 13

Welcome to the Ollama Collection! This collection was created to help you get up and running with the Ollama API locally and quickly. It provides a comprehensive set of example requests, based on the official Ollama API docs, to help you make the most of the Ollama APIs.

Note: This collection is part of a workspace curated by the Qodex team to help you explore and work with useful APIs. Learn how to contribute to this collaborative space and its collections [here]

Run Ollama Locally

Ollama lets you run powerful LLMs locally on your machine and exposes a REST API on localhost to interact with them. Before starting, you must download Ollama and the models you want to use. We'll walk through this step by step below.

Step 1: Fork the collection

Fork the collection manually or by clicking the Run in Qodex button below.

Step 2: Download and install Ollama

Download and install Ollama for macOS, Linux, or Windows.

Step 3: Download the models

Download a model from the Ollama library to your local machine. To start, we'll download Llama 3.1 using the ollama pull command. In your favorite terminal, run $ ollama pull llama3.1:latest (this will take a little while; the smallest Llama 3.1 model is over 4 GB). Repeat this step for the following models:

  • ollama pull codellama:code

  • ollama pull mistral

  • ollama pull llama3.2

See all models available via the Ollama library.

(Optional) If you want to interact with llama3.1:latest in the terminal, run $ ollama run llama3.1:latest and ask it a question.
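Putting step 3 together, the downloads for this collection look like the sketch below (model tags follow the Ollama library at the time of writing and may change); ollama list confirms what you already have locally:

    # pull the models used by the example requests in this collection
    ollama pull llama3.1:latest
    ollama pull codellama:code
    ollama pull mistral
    ollama pull llama3.2

    # confirm the models are now available locally
    ollama list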

Step 4: Start the Ollama server

To run the API and use it in Qodex, run ollama serve in your terminal to start the server.

If you get an error stating port 11434 is already in use, step 3 may have already started the server for you. By default, Ollama listens on port 11434.

  • Run ollama list in your terminal to see if it is running. If yes, a list of your downloaded models will be returned.

  • NOTE: If you update Ollama to listen on a different port, be sure to update the baseUrl collection-level variable accordingly.
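As an extra sanity check (assuming the default port 11434), you can hit the REST API directly; a running server responds with a JSON list of your local models:

    # verify the Ollama server is reachable on the default port
    curl http://localhost:11434/api/tags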

Step 5: Make your first API request

Since the LLMs are running locally, API requests may take some time to complete depending on your hardware.
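If you'd like to see what a request looks like outside of Qodex first, here is a minimal non-streaming completion as a raw curl call (the model name is just an example; use any model you pulled in step 3):

    # single-turn, non-streaming completion via the /api/generate endpoint
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'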

Streaming

Some example requests are labeled streaming or no streaming. For example, try running the Completion Generate (streaming) request - it will stream the response, and a post-request script collects the stream and outputs it with console.log.

The only difference between the streaming and non-streaming requests is whether the request body includes the param "stream": false - streaming is the default.
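For example, sending the same /api/generate request without "stream": false makes the server stream the completion as newline-delimited JSON objects, each carrying a fragment of the response, ending with an object where "done" is true:

    # same request, streamed (streaming is the default behavior)
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2",
      "prompt": "Why is the sky blue?"
    }'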


There are three folders in this Collection:

  • Completion - Generate text completions from a local model using the /generate endpoint (used for single-turn text generation). It generates output based on the input prompt without maintaining context or conversation history.

  • Model - You can think of the APIs in this folder as Admin APIs that are used for managing the local models on your machine.

  • Chat - Generate text completions from a local model using the /chat endpoint (used for multi-turn conversations). It maintains a context or conversation history, allowing for more interactive, dynamic exchanges (see the example request below this list).
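As a quick sketch of the difference, a /chat request sends a messages array (which can include earlier turns to carry context forward), while /generate sends a single prompt. A minimal non-streaming example, assuming the llama3.2 model from step 3:

    # multi-turn style request via the /api/chat endpoint
    curl http://localhost:11434/api/chat -d '{
      "model": "llama3.2",
      "messages": [
        { "role": "user", "content": "Why is the sky blue?" }
      ],
      "stream": false
    }'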



Getting Support

We want you to get the best support possible when working with this workspace. If you're stuck and need help with Ollama-specific issues, we recommend exploring the following channels.

For Qodex-specific questions or feedback about this workspace:

  • [Qodex's Community Forum] - Provide feedback and ask questions about this workspace, ask general Qodex questions, understand how to use a feature, how to build a workflow, etc.

For Qodex-specific issues:



Troubleshooting

  • Error: command not found: ollama

    • Solution: If you've downloaded the app, you likely need to move it out of your Downloads folder and into your Applications (or similar) folder.
  • Error: listen tcp 127.0.0.1:11434: bind: address already in use after running ollama serve

    • Solution: Run $ export OLLAMA_HOST=127.0.0.1:3000, then run ollama serve again (the full sequence is sketched below this list).
    • NOTE: If you update Ollama to listen on a different port, be sure to update the baseUrl collection-level variable accordingly.
  • ollama serve --help is your best friend.
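If you do move the server to another port, the full sequence looks roughly like this (127.0.0.1:3000 is just an example value):

    # point Ollama at a non-default address, then start the server
    export OLLAMA_HOST=127.0.0.1:3000
    ollama serve

    # in another terminal, verify it responds on the new port
    curl http://127.0.0.1:3000/api/tags

    # then update the baseUrl collection variable to http://127.0.0.1:3000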

  1. Completion - Request with Image POST {{baseUrl}}/api/generate

  2. Model - use created model POST {{baseUrl}}/api/generate

  3. Model - create model POST {{baseUrl}}/api/create

  4. Model - list local models GET {{baseUrl}}/api/tags

  5. Model - show model POST {{baseUrl}}/api/show

  6. Model - copy a model POST {{baseUrl}}/api/copy

  7. Model - delete a model DELETE {{baseUrl}}/api/delete

  8. Model - pull a model: orca-mini POST {{baseUrl}}/api/pull

  9. Model - push a model POST {{baseUrl}}/api/push

  10. Model - generate embedding POST {{baseUrl}}/api/embed
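To give a flavor of the Model folder outside of Qodex, here are two of the calls above as raw curl sketches (field names follow the current Ollama API docs and may differ slightly across Ollama versions; the model name is just an example):

    # show details for a local model
    curl http://localhost:11434/api/show -d '{ "model": "llama3.2" }'

    # generate embeddings for a piece of text
    curl http://localhost:11434/api/embed -d '{
      "model": "llama3.2",
      "input": "Why is the sky blue?"
    }'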