Operations Guide
Day-to-day management: restart after reboot, update models, monitor health, troubleshoot.
docker-compose.yml lives in the directory where you ran the setup wizard (typically ~/llmfinder-testing/, or wherever you launched it from).
Essential commands
# Start everything (server + tunnel)
docker compose up -d
# Stop everything
docker compose down
# Restart
docker compose restart
# Live logs
docker compose logs -f
# Server logs only
docker compose logs -f llama-server
# Tunnel logs / get URL
docker logs llmfinder-tunnel 2>&1 | grep trycloudflare
# Container status
docker compose ps
# Resource usage (CPU, RAM)
docker stats
# GPU usage (host-level)
nvidia-smi
After reboot
Docker Compose uses restart: unless-stopped — your server and tunnel restart automatically when Docker starts. Make sure Docker itself starts on boot:
# Enable Docker on boot (run once)
sudo systemctl enable docker
# Verify after reboot
docker compose ps
Keeping the URL in sync
Option A — Run the script (easiest):
Every time you open the hoster script, it checks whether the tunnel URL has changed and syncs it to LLMFinder automatically:
python3 llmfinder-hoster.py
# Menu opens → auto-sync runs silently
# If URL changed: "✅ Server URL auto-updated: https://new-url.trycloudflare.com"
You can also use menu option 4 "Update Server URL" to manually sync or set a custom URL.
Option B — Update manually:
# Get current tunnel URL (take the last URL printed)
docker logs llmfinder-tunnel 2>&1 | grep trycloudflare
# Update via the hoster portal
# https://api.llmfinder.net/hosters/portal → API Connection Details → Update endpoint
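The "take the last URL printed" step can be scripted as a one-liner; the container name llmfinder-tunnel matches the one used throughout this guide:

```shell
# Pull the most recent quick-tunnel URL out of the container logs.
URL=$(docker logs llmfinder-tunnel 2>&1 \
  | grep -oE 'https://[a-z0-9-]+\.trycloudflare\.com' \
  | tail -1)
echo "Current tunnel URL: $URL"
```

The `tail -1` matters: cloudflared logs a fresh URL on every restart, and only the newest one is live.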
Option C — Permanent URL (no more changes):
Set up a named Cloudflare tunnel with a free Cloudflare account. The URL stays the same across all restarts:
# One-time setup
cloudflared tunnel login
cloudflared tunnel create llmfinder-node
cloudflared tunnel route dns llmfinder-node your-subdomain.yourdomain.com
# In docker-compose.yml, replace cloudflared command with:
# command: tunnel --no-autoupdate run --token YOUR_TUNNEL_TOKEN
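As a sketch, the replacement service might look like the following; the token placeholder comes from the Cloudflare Zero Trust dashboard, and the service/container names are assumptions mirroring this guide's quick-tunnel setup:

```yaml
# Sketch of a token-based named-tunnel service (assumed names; adapt to your file).
  cloudflared:
    image: cloudflare/cloudflared:latest
    container_name: llmfinder-tunnel
    restart: unless-stopped
    command: tunnel --no-autoupdate run --token YOUR_TUNNEL_TOKEN
```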
Your endpoint is now https://your-subdomain.yourdomain.com. Register it with LLMFinder once and you never need to update it again.
Update your server
# Pull latest images and restart
docker compose pull && docker compose up -d
# Check what version is running
docker compose images
Change your model
Edit docker-compose.yml and update the LLAMA_ARG_MODEL value:
# 1. Download the new model
curl -L "https://huggingface.co/.../new-model.gguf" -o ~/llmfinder-models/new-model.gguf
# 2. Edit docker-compose.yml
nano docker-compose.yml
# Change: LLAMA_ARG_MODEL=/models/new-model.gguf
# 3. Restart
docker compose up -d
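Steps 2 and 3 can be collapsed into a sed one-liner; the model filename below is a placeholder:

```shell
# Point LLAMA_ARG_MODEL at a different file in /models (GNU sed, edits in place).
MODEL="new-model.gguf"   # placeholder; use your actual filename
sed -i "s|LLAMA_ARG_MODEL=/models/.*|LLAMA_ARG_MODEL=/models/${MODEL}|" docker-compose.yml
docker compose up -d
```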
Change context size
Edit LLAMA_ARG_CTX_SIZE in docker-compose.yml and restart. See Context & GPU Settings for how to choose the right value.
nano docker-compose.yml
# Change: LLAMA_ARG_CTX_SIZE=32768
docker compose up -d
Health check
# Local health check
curl http://localhost:8080/health
# Through tunnel
curl https://your-tunnel.trycloudflare.com/health
# List loaded models
curl http://localhost:8080/v1/models
# Quick inference test
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{"model":"your-model.gguf","messages":[{"role":"user","content":"Say OK"}],"max_tokens":5}'
Troubleshooting
Server won't start
# Check logs for errors
docker compose logs llama-server --tail=50
# Common issues:
# - Model file not found → check LLAMA_ARG_MODEL path and ~/llmfinder-models/ contents
# - CUDA error → GPU may be in use by another process
# - Port conflict → change port in docker-compose.yml
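A quick way to surface the likely culprit is to filter the logs for common failure signatures; the grep pattern here is just a starting point:

```shell
# Show only lines that look like errors from the last 200 log lines.
docker compose logs llama-server --tail=200 2>&1 \
  | grep -iE 'error|failed|cuda|no such file' \
  | tail -20
```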
GPU not detected
# Verify NVIDIA driver + container toolkit
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
# If docker run works but compose doesn't:
# Make sure your compose file has the deploy.resources.reservations.devices (GPU) section
Tunnel URL not showing
# Give the tunnel ~15 seconds to come up, then check
sleep 15
docker logs llmfinder-tunnel 2>&1 | grep -i "trycloudflare\|https://"
# If still nothing, restart tunnel
docker compose restart cloudflared
sleep 10
docker logs llmfinder-tunnel 2>&1 | grep trycloudflare
Out of VRAM
# Reduce GPU layers (partial CPU offload)
nano docker-compose.yml
# Change: LLAMA_ARG_N_GPU_LAYERS=20 (tune down from 99)
# Or reduce context size
# Change: LLAMA_ARG_CTX_SIZE=8192
docker compose up -d
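To see why shrinking the context helps, here is a rough back-of-the-envelope KV-cache estimate; the layer and head counts are illustrative assumptions, not values read from your model:

```shell
# Rough f16 KV-cache size: 2 (K+V) * layers * ctx * kv_heads * head_dim * 2 bytes.
# All model dimensions below are example values, not your actual model's.
LAYERS=32; CTX=32768; KV_HEADS=8; HEAD_DIM=128; BYTES=2
KV=$(( 2 * LAYERS * CTX * KV_HEADS * HEAD_DIM * BYTES ))
echo "KV cache ~ $(( KV / 1024 / 1024 )) MiB"   # halving CTX halves this number
```

With these example numbers the cache is about 4 GiB, so dropping LLAMA_ARG_CTX_SIZE from 32768 to 8192 cuts it to roughly 1 GiB; the cache scales linearly with context size.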
Re-run the setup wizard
You can re-run the wizard at any time to add models, update your endpoint, or fix issues:
python3 llmfinder-hoster.py
The wizard detects existing Docker containers, asks before stopping them, and reuses your saved config from ~/llmfinder-hoster.json.