Usage Examples
This guide shows practical examples of using the Ollama Proxy with common tools and applications, so you can get started quickly and integrate the proxy into your existing workflows.
Table of Contents
- Basic Usage with Ollama CLI
- Using with Ollama Clients
- Programming Language Examples
- Advanced Configuration Examples
- Troubleshooting Common Issues
Basic Usage with Ollama CLI
Listing Available Models
Once the proxy is running, you can list all available models:
# List models through the proxy
ollama list
# Or using curl directly
curl http://localhost:11434/api/tags
Chat with a Model
You can chat with any model available through OpenRouter:
# Start a chat session
ollama run gemini-pro:latest
# Or send a single message
echo "Why is the sky blue?" | ollama run gemini-pro:latest
Generate Text
Generate text using a model:
# Simple text generation
ollama run gemini-pro:latest "Write a short poem about programming"
# Using a prompt file (read from standard input; `ollama run` has no -f flag)
ollama run gemini-pro:latest < ./prompt.txt
Using with Ollama Clients
Python Client Example
If you're using the Ollama Python client, you can configure it to use the proxy:
import ollama
# Configure client to use proxy
client = ollama.Client(host='http://localhost:11434')
# List models
models = client.list()
print(models)
# Chat with a model
response = client.chat(
    model='gemini-pro:latest',
    messages=[{
        'role': 'user',
        'content': 'Why is the sky blue?',
    }]
)
print(response['message']['content'])
JavaScript/Node.js Client Example
For the JavaScript client:
import ollama from 'ollama';
// The client will automatically use http://localhost:11434
// unless you specify a different host
// List models
const models = await ollama.list();
console.log(models);
// Chat with a model
const response = await ollama.chat({
  model: 'gemini-pro:latest',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }]
});
console.log(response.message.content);
Programming Language Examples
Python Direct API Usage
You can also interact with the proxy directly using HTTP requests:
import requests
import json
# Proxy endpoint
proxy_url = 'http://localhost:11434'
# List available models
response = requests.get(f'{proxy_url}/api/tags')
models = response.json()
print("Available models:", [m['name'] for m in models['models']])
# Chat completion
chat_data = {
    "model": "gemini-pro:latest",
    "messages": [
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ]
}
response = requests.post(
    f'{proxy_url}/api/chat',
    headers={'Content-Type': 'application/json'},
    data=json.dumps(chat_data)
)
if response.status_code == 200:
    result = response.json()
    print("Response:", result['message']['content'])
else:
    print(f"Error: {response.status_code} - {response.text}")
Streaming Responses
To handle streaming responses in Python:
import requests
import json
proxy_url = 'http://localhost:11434'
# Streaming chat completion
chat_data = {
    "model": "gemini-pro:latest",
    "messages": [
        {
            "role": "user",
            "content": "Write a story about a robot learning to paint"
        }
    ],
    "stream": True
}
with requests.post(
    f'{proxy_url}/api/chat',
    headers={'Content-Type': 'application/json'},
    data=json.dumps(chat_data),
    stream=True
) as response:
    # Each non-empty line is one JSON chunk of the streamed reply
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line.decode('utf-8'))
            if 'message' in chunk and 'content' in chunk['message']:
                print(chunk['message']['content'], end='', flush=True)
            if chunk.get('done', False):
                break
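The parsing step in the loop above can be pulled into a small helper that is easy to test without a running proxy. This is a minimal sketch, assuming each streamed line is a single JSON object with the `message.content` and `done` fields used above; `iter_stream_content` is a hypothetical helper name, not part of any client library.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from newline-delimited JSON chunks.

    Assumed chunk shape (taken from the example above):
    {"message": {"content": "..."}, "done": false}
    """
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line.decode("utf-8"))
        content = (chunk.get("message") or {}).get("content")
        if content:
            yield content
        if chunk.get("done", False):
            break  # stop once the final chunk has been seen
```

With `requests`, it could be used as `for text in iter_stream_content(response.iter_lines()): print(text, end="", flush=True)`.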
cURL Examples
You can also use cURL to interact with the proxy:
# List models
curl http://localhost:11434/api/tags
# Chat completion
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-pro:latest",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
# Streaming chat completion
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-pro:latest",
    "messages": [
      {
        "role": "user",
        "content": "Count from 1 to 100."
      }
    ],
    "stream": true
  }'
Advanced Configuration Examples
Model Filtering
Create a custom models-filter.txt to limit which models are available:
# Only allow these specific models
gemini-pro:latest
gpt-4:latest
claude-3-5-sonnet:latest
# Allow all models from a specific family using wildcards
llama*:*latest
# Exclude specific models (prefix with !)
!llama-2-13b-chat:latest
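To reason about which models a filter like the one above would let through, the rules can be sketched in a few lines. This is only an illustration under stated assumptions: it treats `#` lines as comments, patterns as shell-style wildcards, and `!` entries as exclusions that win over inclusions. The proxy's actual matching logic is not shown in this guide and may differ; `parse_filter` and `is_allowed` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple


def parse_filter(text: str) -> Tuple[List[str], List[str]]:
    """Split filter text into (include, exclude) pattern lists."""
    include: List[str] = []
    exclude: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("!"):
            exclude.append(line[1:])  # assumed: "!" marks an exclusion
        else:
            include.append(line)
    return include, exclude


def is_allowed(model: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Assumed rule: excluded patterns win, otherwise any include match allows."""
    if any(fnmatchcase(model, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(model, pattern) for pattern in include)
```

Under these assumptions, `llama-3-8b:latest` would pass the example filter through the `llama*:*latest` pattern, while `llama-2-13b-chat:latest` would be rejected by its `!` entry.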
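Then start the proxy with your filter, for example by pointing MODELS_FILTER_PATH at your filter file as in the environment configuration below.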
Environment-based Configuration
Create a .env file for different environments:
# Development environment
OPENROUTER_API_KEY=your_dev_api_key_here
LOG_LEVEL=DEBUG
HOST=localhost
PORT=11434
MODELS_FILTER_PATH=./dev-models-filter.txt
For production:
# Production environment
OPENROUTER_API_KEY=your_prod_api_key_here
LOG_LEVEL=WARNING
HOST=0.0.0.0
PORT=11434
MODELS_FILTER_PATH=./prod-models-filter.txt
MAX_CONCURRENT_REQUESTS=200
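Before starting the proxy, it can help to sanity-check a `.env` file like the ones above. The following is a minimal sketch, not the proxy's own loader: `load_env_text` and `check_settings` are hypothetical helpers, the fallback values for `PORT` and `LOG_LEVEL` are assumptions, and the accepted log levels are assumed to be the standard Python logging names.

```python
from typing import Dict, List

# Assumed set of valid levels (standard Python logging names)
KNOWN_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def load_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems: List[str] = []
    if not settings.get("OPENROUTER_API_KEY"):
        problems.append("OPENROUTER_API_KEY is missing or empty")
    port = settings.get("PORT", "11434")  # assumed fallback
    try:
        port_ok = 1 <= int(port) <= 65535
    except ValueError:
        port_ok = False
    if not port_ok:
        problems.append(f"PORT is not a valid port number: {port!r}")
    level = settings.get("LOG_LEVEL", "INFO")  # assumed fallback
    if level.upper() not in KNOWN_LOG_LEVELS:
        problems.append(f"LOG_LEVEL is not a recognised level: {level!r}")
    return problems
```

Calling `check_settings(load_env_text(open(".env").read()))` returns an empty list when nothing looks wrong.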
Docker Compose with Custom Configuration
Create a docker-compose.prod.yml for production deployment:
version: '3.8'
services:
  ollama-proxy:
    build: .
    ports:
      - "11434:11434"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - LOG_LEVEL=INFO
      - MAX_CONCURRENT_REQUESTS=200
    volumes:
      - ./prod-models-filter.txt:/app/models-filter.txt
    restart: unless-stopped
Troubleshooting Common Issues
Connection Issues
If you can't connect to the proxy:
- Check if the proxy is running:
- Verify the proxy logs:
# If running with Docker
docker logs ollama-proxy-container
# If running directly
# Check your terminal where you started the proxy
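A quick scripted reachability check can complement the steps above. This sketch uses only the standard library and the `/api/tags` endpoint shown earlier in this guide; `check_proxy` is a hypothetical helper, and the default address assumes the proxy listens on `localhost:11434` as in the other examples.

```python
import json
import urllib.request
from typing import Tuple


def check_proxy(base_url: str = "http://localhost:11434",
                timeout: float = 3.0) -> Tuple[bool, str]:
    """Return (reachable, detail) by requesting /api/tags."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError as exc:
        # Covers refused connections, timeouts, and HTTP error statuses
        return False, f"request failed: {exc}"
    except ValueError as exc:
        # The server answered, but not with JSON
        return False, f"unexpected response body: {exc}"
    models = payload.get("models", []) if isinstance(payload, dict) else []
    return True, f"proxy answered with {len(models)} model(s)"
```

A `False` result with a "connection refused" detail usually means nothing is listening on that host and port.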
Model Not Found Errors
If you get "model not found" errors:
- List available models to see what's actually available:
- Check your model filter configuration:
- Try without filtering to see all models:
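When a name is rejected, comparing it against the `/api/tags` listing often reveals a typo or a missing tag suffix. The sketch below assumes the response shape used earlier in this guide (`{"models": [{"name": ...}]}`); `find_model` is a hypothetical helper that works on the already-parsed JSON, so it can be tried without a running proxy.

```python
from difflib import get_close_matches
from typing import Any, Dict, List, Tuple


def find_model(tags_payload: Dict[str, Any], wanted: str) -> Tuple[bool, List[str]]:
    """Return (found, suggestions) for a model name.

    Suggestions are close matches among the listed names and are
    only computed when the exact name is not present.
    """
    names = [m["name"] for m in tags_payload.get("models", [])]
    if wanted in names:
        return True, []
    return False, get_close_matches(wanted, names, n=3, cutoff=0.6)
```

For example, passing the parsed output of `requests.get(f'{proxy_url}/api/tags').json()` together with `gemini-pro` would suggest `gemini-pro:latest` if that model is listed.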
API Key Issues
If you get authentication errors:
- Verify your API key:
- Test the key directly with OpenRouter:
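A local check can rule out the most common mistakes before contacting OpenRouter at all. This sketch only inspects the `OPENROUTER_API_KEY` variable used in the configuration examples; it does not validate the key against OpenRouter. `describe_api_key` is a hypothetical helper, and the placeholder values are the ones from the `.env` examples in this guide.

```python
import os
from typing import Mapping, Optional, Tuple

# Placeholder values copied from the .env examples in this guide
PLACEHOLDERS = {"your_dev_api_key_here", "your_prod_api_key_here"}


def describe_api_key(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, str]:
    """Return (looks_usable, message) without printing the full key."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "")
    if not key.strip():
        return False, "OPENROUTER_API_KEY is not set"
    if key != key.strip():
        return False, "OPENROUTER_API_KEY has leading or trailing whitespace"
    if key in PLACEHOLDERS:
        return False, "OPENROUTER_API_KEY still holds a placeholder value"
    # Show only the first and last four characters
    masked = key[:4] + "..." + key[-4:] if len(key) > 8 else "****"
    return True, f"OPENROUTER_API_KEY is set ({masked})"
```

Calling `describe_api_key()` with no arguments reads the current process environment.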
Performance Issues
If responses are slow:
- Check the proxy logs for any errors or warnings.
- Monitor the proxy's health endpoint:
- Check the metrics endpoint for performance data:
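Independently of the proxy's own endpoints, client-side timing can show whether slowness comes from waiting for the first chunk or from the stream as a whole. This is a minimal sketch: `time_stream` and `StreamTiming` are hypothetical names, and the helper simply times any iterable of chunks, such as `response.iter_lines()` from the streaming example.

```python
import time
from typing import Iterable, NamedTuple, Optional


class StreamTiming(NamedTuple):
    first_chunk_seconds: Optional[float]  # None if no chunk arrived
    total_seconds: float
    chunks: int


def time_stream(chunks: Iterable[object]) -> StreamTiming:
    """Consume an iterable and record time to first chunk and total time."""
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    return StreamTiming(first, time.perf_counter() - start, count)
```

Start the timer as close to the request as possible, for example by calling `time_stream(response.iter_lines())` immediately after `requests.post(..., stream=True)` returns.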