API Reference
This document provides a detailed reference for the Ollama API endpoints supported by the Ollama Proxy. The proxy aims to be a drop-in replacement for the official Ollama server, so it maintains compatibility with the most common endpoints.
General Information
- Base URL: http://<host>:<port>
- Default Port: 11434
- Authentication: All requests require a valid OpenRouter API key to be configured on the proxy server.
- Content-Type: All POST requests should use Content-Type: application/json.
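The Python helpers below are a minimal sketch of these conventions, using only the standard library. They build requests against a proxy assumed to run on localhost at the default port. The names base_url, make_request, and send_json are hypothetical and are not part of the proxy. The sketch also assumes that clients send no credentials of their own, because the OpenRouter key lives on the proxy server.

```python
import json
import urllib.request
from typing import Optional

# Assumed defaults: point these at wherever your proxy actually runs.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 11434  # documented default port


def base_url(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT) -> str:
    """Return the http://<host>:<port> base URL."""
    return f"http://{host}:{port}"


def make_request(path: str, payload: Optional[dict] = None,
                 host: str = DEFAULT_HOST,
                 port: int = DEFAULT_PORT) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given.

    No Authorization header is added: the OpenRouter key is configured on
    the proxy server, and this sketch assumes clients send no credentials.
    """
    url = base_url(host, port) + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    # POST bodies are sent with Content-Type: application/json.
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_json(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request and decode its JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send_json(make_request("/api/tags")) would fetch the model list from a local proxy. Only base_url and make_request can run without a live server.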
Supported Endpoints
Health & Monitoring
GET /
Returns a simple health check message to confirm that the server is running.
- Success Response (200 OK):
GET /api/version
Returns the version of the proxy.
- Success Response (200 OK):
GET /health
Returns detailed health information about the proxy.
- Success Response (200 OK):
GET /metrics
Returns metrics for monitoring and observability.
- Success Response (200 OK):
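A quick liveness check can call all four monitoring endpoints and report only their HTTP status codes. It does not parse the bodies because their formats are not reproduced here. This is a hypothetical sketch, and monitoring_urls and probe are illustrative names only.

```python
import urllib.error
import urllib.request

# The four monitoring endpoints listed above.
MONITORING_PATHS = {
    "root": "/",
    "version": "/api/version",
    "health": "/health",
    "metrics": "/metrics",
}


def monitoring_urls(base: str) -> dict:
    """Map each monitoring endpoint name to its full URL."""
    base = base.rstrip("/")
    return {name: base + path for name, path in MONITORING_PATHS.items()}


def probe(base: str, timeout: float = 5.0) -> dict:
    """Return the HTTP status of each monitoring endpoint, or None if unreachable."""
    results = {}
    for name, url in monitoring_urls(base).items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = resp.status
        except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
            results[name] = exc.code
        except OSError:  # connection refused, DNS failure, timeout, ...
            results[name] = None
    return results
```

Because HTTPError is a subclass of OSError, the handler for HTTPError must come first. Otherwise, error responses would be reported as unreachable.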
Model Management
GET /api/tags
Lists all available models that are accessible through the proxy. The list is fetched from OpenRouter and can be filtered using the model filter configuration.
- Success Response (200 OK):
POST /api/show
Provides detailed information about a specific model. Note that much of the information is stubbed, since it is not available from the OpenRouter API.
- Request Body:
- Success Response (200 OK):
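The sketch below extracts model names from an /api/tags response and builds an /api/show request. The bodies are not shown above, so both payload shapes are assumptions borrowed from the upstream Ollama API: {"models": [{"name": ...}]} for /api/tags and {"model": ...} for /api/show. Check them against your proxy before relying on them. The names model_names and show_request are hypothetical.

```python
import json
import urllib.request


def model_names(tags_response: dict) -> list:
    """Extract model names from an /api/tags response.

    Assumes the Ollama-style shape {"models": [{"name": ...}, ...]}.
    """
    return [m["name"] for m in tags_response.get("models", [])]


def show_request(base: str, model: str) -> urllib.request.Request:
    """Build a POST /api/show request for one model.

    Assumes the Ollama-style body {"model": "<name>"}.
    """
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        base.rstrip("/") + "/api/show",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```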
Inference
POST /api/chat
Handles chat completion requests. This is the primary endpoint for interacting with models.
- Request Body:
- Response (Non-streaming):
{
  "model": "google/gemini-pro:latest",
  "created_at": "2023-12-12T14:00:00Z",
  "message": {
    "role": "assistant",
    "content": "The sky is blue because of Rayleigh scattering..."
  },
  "done": true,
  "total_duration": 0,
  "load_duration": 0,
  "prompt_eval_count": null,
  "prompt_eval_duration": 0,
  "eval_count": 0,
  "eval_duration": 0
}
- Response (Streaming): A stream of JSON objects, one per line. Each object carries the next fragment of the reply in message.content, and the last object has "done": true.
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" is"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" blue"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" because"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" of"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" Rayleigh"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":" scattering"},"done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","message":{"role":"assistant","content":"."},"done":true}
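The streamed reply is a sequence of JSON objects, so a client can assemble it by concatenating message.content until it sees "done": true. The sketch below assumes an Ollama-style request body with model, messages, and stream fields, because the request body is not reproduced above. The helpers chat_payload, iter_chat_content, and stream_chat are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def chat_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """Build an Ollama-style /api/chat body (field names are an assumption)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def iter_chat_content(lines: Iterable) -> Iterator[str]:
    """Yield message content from JSON lines, stopping after done=true."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between objects
        chunk = json.loads(raw)
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break


def stream_chat(base: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """POST to /api/chat with streaming enabled and return the assembled reply."""
    req = urllib.request.Request(
        base.rstrip("/") + "/api/chat",
        data=json.dumps(chat_payload(model, prompt)).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # The HTTP response object can be iterated line by line.
        return "".join(iter_chat_content(resp))
```

Applied to the streaming example above, iter_chat_content yields fragments that join into "The sky is blue because of Rayleigh scattering.". With stream set to false, the proxy instead returns the single object shown under Response (Non-streaming), and the reply is in message.content.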
POST /api/generate
Handles text generation requests. It is a simpler variant of /api/chat.
- Request Body:
- Response (Non-streaming):
- Response (Streaming): A stream of JSON objects, one per line. The text arrives in the response field, and the last object has "done": true.
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","response":" there","done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","response":" was","done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","response":" a","done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","response":" brave","done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","response":" knight","done":false}
{"model":"google/gemini-pro:latest","created_at":"2023-12-12T14:00:00Z","response":"...","done":true}
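The /api/generate stream is consumed the same way as the chat stream, except that the text comes from the top-level response field. The sketch below assumes an Ollama-style request body with model, prompt, and stream fields, which the section above does not reproduce. The names generate_payload and collect_generate_stream are hypothetical.

```python
import json
from typing import Iterable


def generate_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """Build an Ollama-style /api/generate body (field names are an assumption)."""
    return {"model": model, "prompt": prompt, "stream": stream}


def collect_generate_stream(lines: Iterable) -> str:
    """Concatenate the "response" fragments of a stream, stopping at done=true."""
    parts = []
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not text.strip():
            continue  # skip blank lines
        chunk = json.loads(text)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

For the streaming example above, collect_generate_stream returns " there was a brave knight...".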
Embeddings
POST /api/embed and POST /api/embeddings
Generates embeddings for a given input. Both endpoints are supported for compatibility.
- Request Body for /api/embed:
- Request Body for /api/embeddings:
- Success Response (200 OK):
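Both endpoints are supported, so a client may want to accept either response shape. This sketch assumes the upstream Ollama conventions, which are not shown above. Under those conventions, /api/embed takes {"model", "input"} and returns {"embeddings": [[...]]}, and /api/embeddings takes {"model", "prompt"} and returns {"embedding": [...]}. The helper names are hypothetical.

```python
from typing import List


def embed_payload(model: str, texts: List[str]) -> dict:
    """Body for POST /api/embed (Ollama-style; field names are assumed)."""
    return {"model": model, "input": texts}


def embeddings_payload(model: str, text: str) -> dict:
    """Body for POST /api/embeddings (Ollama-style; field names are assumed)."""
    return {"model": model, "prompt": text}


def extract_vectors(response: dict) -> List[List[float]]:
    """Normalize either assumed response shape to a list of vectors."""
    if "embeddings" in response:  # /api/embed: one vector per input
        return response["embeddings"]
    if "embedding" in response:  # /api/embeddings: a single vector
        return [response["embedding"]]
    raise ValueError("response contains no embedding field")
```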
Process Management
GET /api/ps
Lists running models (stubbed implementation).
- Success Response (200 OK):
Multi-Provider Endpoints
GET /api/providers
Lists all configured providers and their status.
- Success Response (200 OK):
{
  "providers": [
    {
      "type": "openrouter",
      "enabled": true,
      "healthy": true,
      "priority": 1,
      "request_count": 1234,
      "error_count": 5,
      "error_rate": 0.004,
      "avg_response_time_ms": 850.5
    },
    {
      "type": "openai",
      "enabled": true,
      "healthy": true,
      "priority": 2,
      "request_count": 567,
      "error_count": 2,
      "error_rate": 0.003,
      "avg_response_time_ms": 650.2
    }
  ]
}
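A client that wants to display routing state can read /api/providers directly. The sketch below lists usable providers in priority order and pools error counts across all providers. It assumes that a lower priority number marks a more preferred provider, which the response above does not state explicitly. The names usable_providers and overall_error_rate are hypothetical.

```python
from typing import List


def usable_providers(response: dict) -> List[str]:
    """Return enabled, healthy provider types ordered by ascending priority.

    Assumes a lower "priority" number means the provider is preferred.
    """
    candidates = [
        p for p in response.get("providers", [])
        if p.get("enabled") and p.get("healthy")
    ]
    candidates.sort(key=lambda p: p.get("priority", float("inf")))
    return [p["type"] for p in candidates]


def overall_error_rate(response: dict) -> float:
    """Pool error_count / request_count across all listed providers."""
    providers = response.get("providers", [])
    total = sum(p.get("request_count", 0) for p in providers)
    errors = sum(p.get("error_count", 0) for p in providers)
    return errors / total if total else 0.0
```

Applied to the example response above, usable_providers returns ["openrouter", "openai"], and overall_error_rate returns 7 / 1801.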
GET /api/providers/{provider_type}/stats
Returns detailed statistics for a specific provider.
- Success Response (200 OK):
{
  "provider_type": "openai",
  "enabled": true,
  "healthy": true,
  "priority": 2,
  "request_count": 567,
  "successful_requests": 565,
  "failed_requests": 2,
  "error_rate": 0.003,
  "avg_response_time_ms": 650.2,
  "min_response_time_ms": 200.1,
  "max_response_time_ms": 2500.8,
  "circuit_breaker_state": "closed",
  "last_health_check": "2023-12-12T14:00:00Z",
  "models_available": 25
}
GET /api/tags/{provider_type}
Lists models available from a specific provider.
- Success Response (200 OK):
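Both per-provider endpoints take the provider type as a path segment. The sketch below builds those paths and flags a provider whose statistics look unhealthy. The 5% error-rate threshold is an arbitrary example. Treating any circuit_breaker_state other than "closed" as a problem is also an assumption, because the states are not enumerated above. The function names are hypothetical.

```python
from urllib.parse import quote


def provider_stats_path(provider_type: str) -> str:
    """Path for GET /api/providers/{provider_type}/stats."""
    return f"/api/providers/{quote(provider_type, safe='')}/stats"


def provider_tags_path(provider_type: str) -> str:
    """Path for GET /api/tags/{provider_type}."""
    return f"/api/tags/{quote(provider_type, safe='')}"


def needs_attention(stats: dict, max_error_rate: float = 0.05) -> bool:
    """Flag a provider that is unhealthy, has a non-closed breaker, or errs too often.

    The default threshold is illustrative only.
    """
    return (
        not stats.get("healthy", False)
        or stats.get("circuit_breaker_state") != "closed"
        or stats.get("error_rate", 0.0) > max_error_rate
    )
```

For the openai statistics shown above, needs_attention returns False.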
Error Responses
The proxy returns standardized error responses for various conditions:
Model Not Found (404)
Model Forbidden (403)
OpenRouter API Error (502)
Internal Server Error (500)
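A client can map these status codes to readable messages. The exact body format of the standardized errors is not reproduced above, so this sketch assumes an Ollama-style JSON body such as {"error": "..."}. If the body is not JSON, it falls back to the raw text. The names ERROR_KINDS and describe_error are hypothetical.

```python
import json

# Status codes documented in this reference, mapped to short labels.
ERROR_KINDS = {
    403: "model forbidden",
    404: "model not found",
    500: "internal server error",
    501: "endpoint not implemented",
    502: "OpenRouter API error",
}


def describe_error(status: int, body: bytes) -> str:
    """Combine the status label with the error detail from the body."""
    kind = ERROR_KINDS.get(status, "unexpected error")
    text = body.decode("utf-8", errors="replace")
    try:
        # Assumed shape: {"error": "<message>"}.
        detail = json.loads(text).get("error", text)
    except (ValueError, AttributeError):  # not JSON, or JSON but not an object
        detail = text
    return f"{status} {kind}: {detail}"
```

In a client, describe_error pairs naturally with urllib.error.HTTPError. For example, call describe_error(exc.code, exc.read()) inside the except handler.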
Unsupported Endpoints
The following Ollama API endpoints are not supported by the proxy and will return an HTTP 501 Not Implemented error:
- POST /api/create
- POST /api/copy
- DELETE /api/delete
- POST /api/pull
- POST /api/push
- POST /api/blobs/{digest}
- HEAD /api/blobs/{digest}
These endpoints are related to local model management, which is not applicable when using the OpenRouter proxy.
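These calls always fail with 501, so a client may prefer to reject them locally before making a round trip. The sketch below matches a method and path against the list above and treats {digest} as a wildcard for one path segment. The names UNSUPPORTED and is_unsupported are hypothetical.

```python
from typing import List, Tuple

# Endpoints that the proxy answers with 501 Not Implemented.
UNSUPPORTED: List[Tuple[str, str]] = [
    ("POST", "/api/create"),
    ("POST", "/api/copy"),
    ("DELETE", "/api/delete"),
    ("POST", "/api/pull"),
    ("POST", "/api/push"),
    ("POST", "/api/blobs/{digest}"),
    ("HEAD", "/api/blobs/{digest}"),
]


def _segment_matches(template_seg: str, path_seg: str) -> bool:
    if template_seg.startswith("{") and template_seg.endswith("}"):
        return path_seg != ""  # a placeholder matches any non-empty segment
    return template_seg == path_seg


def is_unsupported(method: str, path: str) -> bool:
    """Return True if (method, path) is one of the endpoints listed above."""
    path_parts = path.split("?", 1)[0].strip("/").split("/")
    for m, template in UNSUPPORTED:
        t_parts = template.strip("/").split("/")
        if (m == method.upper() and len(t_parts) == len(path_parts)
                and all(_segment_matches(t, p) for t, p in zip(t_parts, path_parts))):
            return True
    return False
```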