OctoLLM Unraid Deployment Guide
Complete guide for deploying OctoLLM on Unraid 7.2.0 with Dell PowerEdge R730xd hardware.
Table of Contents
- Introduction
- Prerequisites
- Hardware Requirements
- Installation
- Configuration
- GPU Setup
- Managing Services
- Accessing Services
- Local LLM Usage
- Troubleshooting
- Backup & Restore
- Performance Tuning
- Monitoring
- Security
- Migration to Cloud
Introduction
OctoLLM is a distributed AI architecture inspired by octopus neurobiology. This guide covers local deployment on Unraid, optimized for development with GPU-accelerated LLM inference.
Why Unraid?
- Native Docker Support: Excellent Docker management UI
- Hardware Flexibility: Mix and match drives, use cache effectively
- GPU Passthrough: Strong support for NVIDIA GPUs
- Community: Large community with extensive documentation
Deployment Architecture
┌───────────────────────────────────────────────────────────┐
│ Unraid Host (bond0) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Docker Bridge: octollm-net (172.20.0.0/16) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ Reflex │ │Orchestr. │ │ 6 Arms │ │ │
│ │ │ Layer │ │ │ │ (Planner, │ │ │
│ │ │ (Rust) │ │ (Python) │ │ Executor, │ │ │
│ │ │ │ │ │ │ Retriever, │ │ │
│ │ │ :3001 │ │ :3000 │ │ Coder, │ │ │
│ │ │ │ │ │ │ Judge, │ │ │
│ │ │ │ │ │ │ Guardian) │ │ │
│ │ │ │ │ │ │ :6001-6006 │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────────┬─────────┘ │ │
│ │ │ │ │ │ │
│ │ └─────────────┴─────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────┴──────────────────────┐ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────┐ ┌──────┐ ┌──────┐ ┌──────────┐ │ │
│ │ │PostgreSQL│ │Redis │ │Qdrant│ │ Ollama │ │ │
│ │ │ 15 │ │ 7 │ │ 1.7.4│ │ (Models) │ │ │
│ │ │ :3010 │ │:3011 │ │:3012 │ │ :3014 │ │ │
│ │ └──────────┘ └──────┘ └──────┘ └──────┬───┘ │ │
│ │ │ │ │
│ │ ┌──────────────────────────────────────┐ │ │ │
│ │ │ Monitoring Stack │ │ │ │
│ │ │ ┌──────────┐ ┌────────┐ ┌──────┐ │ │ │ │
│ │ │ │Prometheus│ │Grafana │ │ Loki │ │ │ │ │
│ │ │ │ :9090 │ │ :3030 │ │:3100 │ │ │ │ │
│ │ │ └──────────┘ └────────┘ └──────┘ │ │ │ │
│ │ └──────────────────────────────────────┘ │ │ │
│ └───────────────────────────────────────────┼─────────┘ │
│ │ │
│ ┌────▼──────┐ │
│ │ Tesla P40 │ │
│ │ 24GB │ │
│ │ VRAM │ │
│ └───────────┘ │
└───────────────────────────────────────────────────────────┘
Prerequisites
Software Requirements
| Software | Minimum Version | Recommended | Purpose |
|---|---|---|---|
| Unraid | 7.0.0 | 7.2.0+ | Host OS |
| Docker | 20.10 | 27.5.1+ | Container runtime |
| Docker Compose | 1.29 | 2.40.3+ (V2) | Orchestration |
| NVIDIA Driver | 510+ | 580.105.08+ | GPU support |
Unraid Plugins Required
Install from Community Applications:
- NVIDIA Driver (for GPU support)
  - Search: "nvidia driver"
  - Install: "nvidia-driver" by ich777
  - Reboot after installation
- Compose Manager (optional, for UI management)
  - Search: "compose manager"
  - Install: "compose.manager" by dcflachs
- NerdTools (optional, for additional utilities)
  - Useful for jq, git, and other tools
User Account Setup
Create Unraid user account with access to:
- Docker management
- Console/SSH access
- Appdata shares
Hardware Requirements
Minimum Configuration
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| CPU | 4 cores | 8+ cores | More cores = better parallelism |
| RAM | 16GB | 64GB+ | More RAM = larger models |
| Storage | 50GB free | 200GB+ free | Models are large (5-50GB each) |
| GPU | None | NVIDIA Tesla P40 | Optional but highly recommended |
| Network | 100Mbps | 1Gbps+ | For model downloads |
Recommended: Dell PowerEdge R730xd
This guide is optimized for:
CPU: Dual Intel Xeon E5-2683 v4 @ 2.10GHz
- 32 physical cores (64 threads with HT)
- 2 NUMA nodes
- 40MB L3 cache
RAM: 503.8 GiB DDR4 ECC
- 16× 32GB DIMMs
- 2400 MHz
- Error-correcting for reliability
GPU: NVIDIA Tesla P40
- 24GB GDDR5 VRAM
- 3840 CUDA cores
- 250W TDP
- CUDA 13.0 support
Storage: 144TB array (10 disks)
- 1.8TB SSD cache (btrfs)
- 128GB Docker vDisk
Network: 4× Intel I350 Gigabit NICs
- Bonded to 4Gbps aggregate (bond0)
- LACP mode 4
GPU Compatibility
Supported GPUs (tested):
- NVIDIA Tesla P40 (24GB) ✅
- NVIDIA Tesla P100 (16GB) ✅
- NVIDIA Tesla V100 (32GB) ✅
- NVIDIA RTX 3090 (24GB) ✅
- NVIDIA RTX 4090 (24GB) ✅
Minimum VRAM for models:
- Small models (7-13B): 8GB VRAM
- Medium models (30-70B): 24GB VRAM
- Large models (70B+): 48GB+ VRAM or multi-GPU
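To sanity-check headroom before pulling a model, compare free VRAM against the model's approximate size. A minimal sketch; the 2GB context/KV-cache overhead is a rough assumption, not a measured figure:
# Free VRAM in MiB on the first GPU
FREE_MIB=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n1)
MODEL_MIB=$((13 * 1024))  # e.g. a ~13GB model download as a rough proxy
OVERHEAD_MIB=2048         # assumed context/KV-cache overhead
if [ "$FREE_MIB" -ge $((MODEL_MIB + OVERHEAD_MIB)) ]; then
  echo "Model should fit in VRAM"
else
  echo "Expect CPU offload or OOM; choose a smaller or more heavily quantized model"
fi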
Installation
Step 1: Install NVIDIA Driver Plugin
- Open Unraid WebUI: http://tower.local (or your server IP)
- Navigate to Apps tab
- Search for "nvidia driver"
- Click Install on "nvidia-driver" by ich777
- Wait for installation to complete
- Reboot server
- After reboot, verify:
# SSH to Unraid
ssh root@tower.local
# Test NVIDIA driver
nvidia-smi
Expected Output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:03:00.0 Off | 0 |
| N/A 30C P0 49W / 250W | 0MiB / 24576MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Step 2: Clone Repository
# SSH to Unraid
ssh root@tower.local
# Navigate to appdata
cd /mnt/user/appdata
# Clone OctoLLM repository
git clone https://github.com/your-org/octollm.git
cd octollm
Step 3: Run Setup Script
The automated setup script will:
- Create directory structure
- Generate secure passwords
- Configure environment files
- Download Ollama models
- Initialize databases
- Start all services
cd /mnt/user/appdata/octollm/infrastructure/unraid
# Make script executable (if needed)
chmod +x setup-unraid.sh
# Run setup
bash setup-unraid.sh
Setup Process:
[INFO] Checking prerequisites...
[SUCCESS] Docker is installed: Docker version 27.5.1
[SUCCESS] Docker Compose V2 is installed: 2.40.3
[SUCCESS] NVIDIA driver is installed: 580.105.08
[SUCCESS] Detected GPU: Tesla P40 with 24576 MiB VRAM
[INFO] Creating directory structure in /mnt/user/appdata/octollm/...
[SUCCESS] Created directory: /mnt/user/appdata/octollm/postgres/data
[SUCCESS] Created directory: /mnt/user/appdata/octollm/redis/data
...
[INFO] Setting up environment configuration...
[SUCCESS] Environment file created: .env.unraid
[INFO] Secure passwords generated. Save these credentials:
PostgreSQL Password: xK9fL2mN8vP4qR7sT1wU6yZ3aB5cD0eF
Redis Password: gH4jK1lM7nP9qR2sT8vW5xY0zA3bC6dE
Qdrant API Key: fG1hI4jK7lM0nP3qR6sT9uV2wX5yZ8aB
Grafana Admin Password: cD0eF3gH6iJ9kL2mN5oP8qR1sT4uV7wX
[INFO] Creating PostgreSQL initialization script...
[SUCCESS] PostgreSQL initialization script created
[INFO] Setting up GPU and downloading Ollama models...
[WARNING] This may take 15-30 minutes depending on your internet speed.
[INFO] Pulling model: llama3.1:8b
[SUCCESS] Model llama3.1:8b downloaded successfully
...
[INFO] Starting OctoLLM services...
[SUCCESS] OctoLLM services started successfully
============================================================================
[SUCCESS] OctoLLM Unraid Setup Complete!
============================================================================
Access URLs:
Orchestrator API: http://192.168.4.6:3000
Orchestrator Docs: http://192.168.4.6:3000/docs
Reflex Layer API: http://192.168.4.6:3001
Grafana Dashboard: http://192.168.4.6:3030
Prometheus: http://192.168.4.6:9090
Ollama API: http://192.168.4.6:3014
Credentials:
Grafana:
Username: admin
Password: cD0eF3gH6iJ9kL2mN5oP8qR1sT4uV7wX
Step 4: Verify Installation
Run test suite:
# Test prerequisites
bash tests/test-prerequisites.sh
# Test GPU access
bash tests/test-gpu.sh
# Test Ollama inference
bash tests/test-ollama.sh
# Test service health (wait 2-3 minutes after startup)
bash tests/test-services.sh
All tests should pass:
============================================================================
OctoLLM Service Health Test
============================================================================
[PASS] orchestrator is healthy
[PASS] reflex-layer is healthy
[PASS] planner-arm is healthy
...
============================================================================
Summary: 11 passed, 0 failed
============================================================================
[SUCCESS] All services are healthy!
Configuration
Environment Variables
Edit /mnt/user/appdata/octollm/infrastructure/unraid/.env.unraid:
# Network Configuration
HOST_IP=192.168.4.6 # Change to your Unraid server IP
# Database Credentials (auto-generated by setup)
POSTGRES_DB=octollm
POSTGRES_USER=octollm
POSTGRES_PASSWORD=xK9fL2mN8vP4qR7sT1wU6yZ3aB5cD0eF
REDIS_PASSWORD=gH4jK1lM7nP9qR2sT8vW5xY0zA3bC6dE
QDRANT_API_KEY=fG1hI4jK7lM0nP3qR6sT9uV2wX5yZ8aB
# Local LLM Configuration
PREFER_LOCAL_LLM=true # Use GPU-accelerated local inference
OLLAMA_PRIMARY_MODEL=llama3.1:8b # Fast general-purpose model
OLLAMA_FALLBACK_MODEL=mixtral:8x7b # Advanced reasoning model
OLLAMA_NUM_PARALLEL=4 # Concurrent requests (GPU memory limited)
# Cloud LLM APIs (optional fallback)
OPENAI_API_KEY= # Leave empty to skip
ANTHROPIC_API_KEY= # Leave empty to skip
# Performance Tuning
MAX_PARALLEL_ARMS=5 # Max concurrent arm executions
TASK_TIMEOUT=300 # Task timeout in seconds
CACHE_TTL=3600 # Cache time-to-live in seconds
# Monitoring
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
GRAFANA_ADMIN_PASSWORD=cD0eF3gH6iJ9kL2mN5oP8qR1sT4uV7wX
Port Customization
If ports conflict with existing services, edit docker-compose.unraid.yml:
services:
orchestrator:
ports:
- "8000:8000" # Change 3000 → 8000 if needed
grafana:
ports:
- "3050:3000" # Change 3030 → 3050 if needed
After changes, restart services:
docker-compose down
docker-compose up -d
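Then confirm the orchestrator answers on the new host port, using the /health endpoint described under Accessing Services:
curl -s http://192.168.4.6:8000/health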
GPU Setup
Installing NVIDIA Driver
Method 1: Unraid Plugin (Recommended)
- Apps → Search "nvidia driver"
- Install "nvidia-driver" by ich777
- Reboot
- Verify:
nvidia-smi
Method 2: Manual Installation
# Download driver
cd /tmp
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/580.105.08/NVIDIA-Linux-x86_64-580.105.08.run
# Install
chmod +x NVIDIA-Linux-x86_64-580.105.08.run
./NVIDIA-Linux-x86_64-580.105.08.run --no-questions --ui=none
# Reboot
reboot
Configuring Docker NVIDIA Runtime
Edit /etc/docker/daemon.json:
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
Restart Docker:
/etc/rc.d/rc.docker restart
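Confirm Docker picked up the runtime change:
docker info 2>/dev/null | grep -i runtime
# Expect "nvidia" listed under Runtimes and "Default Runtime: nvidia"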
Testing GPU Access
# Test from host
nvidia-smi
# Test from Docker
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
GPU Monitoring
Real-time monitoring:
# Simple watch
nvidia-smi -l 1
# Detailed with scripts/monitor-resources.sh
cd /mnt/user/appdata/octollm/infrastructure/unraid
bash scripts/monitor-resources.sh
Grafana dashboard:
- Navigate to http://192.168.4.6:3030
- Login with admin / [password from .env.unraid]
- Dashboard: "OctoLLM Unraid Dashboard"
- GPU section shows:
- Utilization %
- Temperature
- Memory usage
- Power consumption
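For lightweight logging without Grafana, nvidia-smi can also stream the same metrics to CSV:
# Sample GPU stats every 5 seconds and save to CSV (Ctrl+C to stop)
nvidia-smi --query-gpu=timestamp,utilization.gpu,temperature.gpu,memory.used,power.draw \
  --format=csv -l 5 | tee gpu-stats.csv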
Managing Services
Docker Compose Commands
Navigate to compose directory first:
cd /mnt/user/appdata/octollm/infrastructure/unraid
Start all services:
docker-compose up -d
Stop all services:
docker-compose stop
Restart all services:
docker-compose restart
Stop and remove containers:
docker-compose down
View status:
docker-compose ps
View logs:
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f orchestrator
# Last 100 lines
docker-compose logs --tail=100 orchestrator
Individual Service Management
Restart single service:
docker-compose restart orchestrator
Rebuild single service:
docker-compose build orchestrator
docker-compose up -d orchestrator
Scale arms (if needed):
docker-compose up -d --scale planner-arm=2
Unraid Docker UI
Services also appear in Unraid Docker tab:
- Click container name to view logs
- Click "Console" for shell access
- Click "Edit" to modify settings
- Use "Autostart" to start on boot
Accessing Services
Web Interfaces
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://192.168.4.6:3030 | admin / [.env.unraid] |
| Prometheus | http://192.168.4.6:9090 | None |
| Orchestrator Docs | http://192.168.4.6:3000/docs | None |
| cAdvisor | http://192.168.4.6:8080 | None |
API Endpoints
Orchestrator (Main API):
# Health check
curl http://192.168.4.6:3000/health
# API documentation
open http://192.168.4.6:3000/docs
# Submit task
curl -X POST http://192.168.4.6:3000/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Explain quantum computing in simple terms",
"constraints": {"max_tokens": 500}
}'
# Get task status
curl http://192.168.4.6:3000/api/v1/tasks/abc123
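A sketch that submits a task and polls until it finishes. The task_id and status response fields are assumptions about the API schema, not confirmed by this guide; jq is available via NerdTools:
# Submit a task and capture its ID (field name "task_id" is assumed)
TASK_ID=$(curl -s -X POST http://192.168.4.6:3000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{"goal": "Summarize the benefits of local inference"}' | jq -r '.task_id')
# Poll until the task reaches a terminal state (field name "status" is assumed)
while true; do
  STATUS=$(curl -s http://192.168.4.6:3000/api/v1/tasks/$TASK_ID | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] && break
  sleep 2
done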
Ollama (Local LLM):
# List models
curl http://192.168.4.6:3014/api/tags
# Generate completion
curl http://192.168.4.6:3014/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Why is the sky blue?",
"stream": false
}'
# Chat completion
curl http://192.168.4.6:3014/api/chat -d '{
"model": "llama3.1:8b",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
Prometheus (Metrics):
# Query API
curl 'http://192.168.4.6:9090/api/v1/query?query=up'
# GPU metrics
curl 'http://192.168.4.6:9090/api/v1/query?query=DCGM_FI_DEV_GPU_UTIL'
Local LLM Usage
Ollama Model Management
List installed models:
docker exec octollm-ollama ollama list
Pull new model:
# Small model (< 10GB)
docker exec octollm-ollama ollama pull llama3:8b
# Medium model (< 30GB)
docker exec octollm-ollama ollama pull mixtral:8x7b
# Large model (requires 48GB+ VRAM or multi-GPU)
docker exec octollm-ollama ollama pull llama3:70b
# Specialized models
docker exec octollm-ollama ollama pull codellama:13b # Code generation
docker exec octollm-ollama ollama pull nomic-embed-text # Embeddings
docker exec octollm-ollama ollama pull llama3-vision # Image understanding
Remove model:
docker exec octollm-ollama ollama rm llama3:70b
Model disk usage:
du -sh /mnt/user/appdata/octollm/ollama/models
Recommended Models by Use Case
| Use Case | Model | VRAM | Speed | Quality |
|---|---|---|---|---|
| General Chat | llama3.1:8b | 8GB | Fast | Good |
| Advanced Reasoning | mixtral:8x7b | 24GB | Medium | Excellent |
| Code Generation | codellama:13b | 13GB | Medium | Excellent |
| Code Completion | codellama:7b | 7GB | Fast | Good |
| Embeddings | nomic-embed-text | 1GB | Very Fast | Excellent |
| Long Context | llama3-longcontext:70b | 48GB | Slow | Excellent |
Performance Tuning
Concurrent requests:
# .env.unraid
OLLAMA_NUM_PARALLEL=4 # Reduce if OOM errors, increase if underutilized
Model keep-alive:
# .env.unraid
OLLAMA_KEEP_ALIVE=5m # How long to keep model in VRAM
Max loaded models:
# .env.unraid
OLLAMA_MAX_LOADED_MODELS=3 # Max models in VRAM simultaneously
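To measure how a setting change affects throughput, time a generation and compute tokens/second from the eval_count and eval_duration (nanoseconds) fields Ollama returns in non-streaming /api/generate responses:
# Rough tokens/sec benchmark against the local Ollama API
curl -s http://192.168.4.6:3014/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a haiku about octopuses.",
  "stream": false
}' | jq '{tokens: .eval_count,
         seconds: (.eval_duration / 1e9),
         tok_per_sec: (.eval_count / (.eval_duration / 1e9))}'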
Switching Between Local and Cloud
Use local LLM (default, cost-free):
# .env.unraid
PREFER_LOCAL_LLM=true
Use cloud APIs (when local unavailable):
# .env.unraid
PREFER_LOCAL_LLM=false
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
Automatic fallback (best of both worlds):
# .env.unraid
PREFER_LOCAL_LLM=true
OPENAI_API_KEY=sk-proj-... # Used only if local fails
Troubleshooting
Common Issues
1. Services Won't Start
Symptom: docker-compose up -d fails or services crash immediately.
Check logs:
docker-compose logs orchestrator
Common causes:
- Port conflicts
- Insufficient resources
- Missing environment variables
Solutions:
# Check port availability
ss -tuln | grep -E ':(3000|3001|6001|9090)'
# Check Docker resources
docker info | grep -E "CPUs|Total Memory"
# Verify .env.unraid exists
ls -la .env.unraid
# Recreate from scratch
docker-compose down -v
bash setup-unraid.sh
2. GPU Not Detected
Symptom: nvidia-smi: command not found or Ollama not using GPU.
Diagnose:
# Test NVIDIA driver
nvidia-smi
# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Check Ollama logs
docker logs octollm-ollama | grep -i gpu
Solutions:
# Reinstall NVIDIA driver plugin
# Apps → nvidia-driver → Force Update
# Reboot server
# Check Docker NVIDIA runtime
cat /etc/docker/daemon.json
# Should have "nvidia" runtime configured
# Restart Ollama with GPU
docker-compose restart ollama
3. Out of Memory Errors
Symptom: Containers killed with OOM, logs show memory errors.
Check memory usage:
free -h
docker stats --no-stream
Solutions:
# Reduce concurrent requests
# Edit .env.unraid:
OLLAMA_NUM_PARALLEL=2
MAX_PARALLEL_ARMS=3
# Increase container memory limits
# Edit docker-compose.unraid.yml:
services:
ollama:
deploy:
resources:
limits:
memory: 24G # Increase from 16G
# Use smaller models
docker exec octollm-ollama ollama pull llama3:8b
# Instead of mixtral:8x7b
4. Slow Inference
Symptom: LLM responses take > 30 seconds.
Check GPU usage:
nvidia-smi -l 1
If GPU usage is low:
- Model not loaded properly
- CPU inference fallback
- Queue backlog
Solutions:
# Force model load
docker exec octollm-ollama ollama run llama3.1:8b "Hello"
# Check Ollama logs for errors
docker logs octollm-ollama --tail=100
# Verify GPU passthrough
docker inspect octollm-ollama | grep -A5 DeviceRequests
# Restart Ollama
docker-compose restart ollama
If GPU usage is high (100%):
- Normal behavior during inference
- Consider faster model or more GPUs
- Reduce parallel requests
5. Database Connection Errors
Symptom: Services can't connect to PostgreSQL/Redis.
Check database health:
docker-compose ps postgres redis
docker logs octollm-postgres --tail=50
docker logs octollm-redis --tail=50
Solutions:
# Wait for health checks
docker-compose ps # Check health status
# Manual health check
docker exec octollm-postgres pg_isready -U octollm
docker exec octollm-redis redis-cli ping
# Restart databases
docker-compose restart postgres redis
# Check network connectivity
docker exec octollm-orchestrator ping postgres
docker exec octollm-orchestrator ping redis
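If ping is not available inside the image, test the TCP ports instead; this sketch assumes the orchestrator image ships Python (it is a Python service):
docker exec octollm-orchestrator python3 -c \
  "import socket; socket.create_connection(('postgres', 5432), 3); print('postgres: ok')"
docker exec octollm-orchestrator python3 -c \
  "import socket; socket.create_connection(('redis', 6379), 3); print('redis: ok')"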
6. Port Conflicts
Symptom: "bind: address already in use"
Find conflicting process:
ss -tuln | grep :3000
lsof -i :3000
Solutions:
# Stop conflicting service
docker stop conflicting-container
# Or change OctoLLM ports in docker-compose.unraid.yml
# Use alternative ports
# Edit docker-compose.unraid.yml:
services:
orchestrator:
ports:
- "8000:8000" # Changed from 3000
Logging and Debugging
Enable debug logging:
# Edit .env.unraid
LOG_LEVEL=DEBUG
RUST_LOG=debug
RUST_BACKTRACE=1
# Restart services
docker-compose restart
View aggregated logs:
# All services, follow mode
docker-compose logs -f
# Specific time range
docker-compose logs --since="2024-01-15T10:00:00"
# Filter by keyword
docker-compose logs | grep ERROR
Access container shell:
# Orchestrator (Python)
docker exec -it octollm-orchestrator bash
# Ollama (check models)
docker exec -it octollm-ollama bash
ls -lh /root/.ollama/models
Check resource usage:
# Real-time stats
docker stats
# Per-container stats
docker stats octollm-ollama
# Custom monitoring script
bash scripts/monitor-resources.sh
Getting Help
- Check logs first: docker-compose logs [service]
- Search GitHub issues: https://github.com/your-org/octollm/issues
- Ask in discussions: https://github.com/your-org/octollm/discussions
- Unraid forum: https://forums.unraid.net
When reporting issues, include:
- Unraid version: cat /etc/unraid-version
- Hardware specs: CPU, RAM, GPU
- Docker version: docker --version
- Logs: docker-compose logs [service] --tail=100
- Config: .env.unraid (redact passwords!)
Backup & Restore
Automated Backup
Run backup script:
cd /mnt/user/appdata/octollm/infrastructure/unraid
bash scripts/backup-data.sh
Output:
Starting OctoLLM backup...
Timestamp: 20250112_143022
Stopping services...
Backing up PostgreSQL...
Backing up data directories...
Backup complete!
PostgreSQL: 150M
Data files: 2.5G
Location: /mnt/user/backups/octollm
Restarting services...
Done!
Backup location:
/mnt/user/backups/octollm/
├── octollm_backup_20250112_143022_postgres.sql
└── octollm_backup_20250112_143022_data.tar.gz
Manual Backup
PostgreSQL only:
docker exec octollm-postgres pg_dumpall -U octollm > backup_$(date +%Y%m%d).sql
Data directories:
tar -czf octollm_data_$(date +%Y%m%d).tar.gz \
-C /mnt/user/appdata \
--exclude='octollm/ollama/models' \
octollm/
Ollama models (optional, large):
tar -czf octollm_models_$(date +%Y%m%d).tar.gz \
-C /mnt/user/appdata/octollm/ollama \
models/
Restore from Backup
Step 1: Stop services:
cd /mnt/user/appdata/octollm/infrastructure/unraid
docker-compose down
Step 2: Restore data directories:
cd /mnt/user/appdata
tar -xzf /mnt/user/backups/octollm/octollm_backup_20250112_143022_data.tar.gz
Step 3: Restore PostgreSQL:
docker-compose up -d postgres
sleep 10
docker exec -i octollm-postgres psql -U octollm < /mnt/user/backups/octollm/octollm_backup_20250112_143022_postgres.sql
Step 4: Restart all services:
docker-compose up -d
Backup Schedule
Unraid User Scripts plugin (recommended):
- Install "User Scripts" plugin from Community Applications
- Add new script:
#!/bin/bash
cd /mnt/user/appdata/octollm/infrastructure/unraid
bash scripts/backup-data.sh
# Optional: delete backups older than 7 days
find /mnt/user/backups/octollm -type f -mtime +7 -delete
- Schedule: Daily at 2:00 AM
Cloud Backup
Sync to cloud storage:
# AWS S3
aws s3 sync /mnt/user/backups/octollm s3://my-bucket/octollm-backups/
# Google Cloud Storage
gsutil -m rsync -r /mnt/user/backups/octollm gs://my-bucket/octollm-backups/
# Rclone (any provider)
rclone sync /mnt/user/backups/octollm remote:octollm-backups/
Performance Tuning
CPU Pinning (NUMA Optimization)
Dell PowerEdge R730xd has 2 NUMA nodes. Pin containers to specific nodes for better performance.
Check NUMA topology:
lscpu | grep NUMA
numactl --hardware
Edit docker-compose.unraid.yml:
services:
  ollama:
    cpuset: "0-15,32-47"    # CPUs on NUMA node 0
  orchestrator:
    cpuset: "16-31,48-63"   # CPUs on NUMA node 1
Compose has no key for memory-node pinning; if you also want to pin memory, run docker update --cpuset-mems=0 octollm-ollama (and --cpuset-mems=1 octollm-orchestrator) after the containers start.
PostgreSQL Tuning
Create custom config:
cat > /mnt/user/appdata/octollm/postgres/postgresql.conf << EOF
# OctoLLM PostgreSQL Performance Tuning
# Memory
shared_buffers = 2GB # ~25% of the RAM allotted to Postgres
effective_cache_size = 8GB # estimate of RAM available for caching
work_mem = 64MB # Per query operation
maintenance_work_mem = 512MB # VACUUM, CREATE INDEX
# Connections
max_connections = 200
# Query Planner
random_page_cost = 1.1 # SSD optimization
effective_io_concurrency = 200 # SSD parallel I/O
# WAL
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y%m%d.log'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_statement = 'none' # 'all' for debugging
log_duration = off
log_min_duration_statement = 1000 # Log slow queries (> 1s)
EOF
Mount in docker-compose.unraid.yml:
services:
postgres:
volumes:
- /mnt/user/appdata/octollm/postgres/postgresql.conf:/var/lib/postgresql/data/postgresql.conf:ro
command: postgres -c config_file=/var/lib/postgresql/data/postgresql.conf
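After restarting PostgreSQL, confirm the tuned values are live:
docker exec octollm-postgres psql -U octollm \
  -c "SHOW shared_buffers;" -c "SHOW effective_cache_size;"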
Redis Tuning
Edit .env.unraid:
# Redis Configuration
REDIS_MAXMEMORY=4gb
REDIS_MAXMEMORY_POLICY=allkeys-lru
# Persistence (reduce writes for performance)
REDIS_SAVE_SECONDS=900 1 # Save after 15 min if 1+ key changed
REDIS_SAVE_SECONDS_2=300 10 # Save after 5 min if 10+ keys changed
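To confirm Redis picked up the settings (use the password from .env.unraid):
docker exec octollm-redis redis-cli -a "$REDIS_PASSWORD" CONFIG GET maxmemory
docker exec octollm-redis redis-cli -a "$REDIS_PASSWORD" CONFIG GET maxmemory-policy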
Ollama GPU Performance
Maximize throughput:
# .env.unraid
OLLAMA_NUM_PARALLEL=4 # Max concurrent requests (GPU memory limited)
OLLAMA_KEEP_ALIVE=10m # Keep models loaded longer
OLLAMA_MAX_LOADED_MODELS=2 # Reduce model swapping
Power limit (the Tesla P40's maximum is 250W):
# Ensure the card runs at its full 250W limit (if cooling allows)
nvidia-smi -pl 250
# Monitor temperature
nvidia-smi -l 1
# Should stay below 85°C
Network Optimization
MTU tuning (for 4Gbps bond):
# Check current MTU
ip link show bond0
# Increase MTU (if your switch supports jumbo frames)
ip link set dev bond0 mtu 9000
# Test with jumbo frames
ping -M do -s 8972 192.168.4.6
Docker network tuning:
# Edit docker-compose.unraid.yml
networks:
octollm-net:
driver: bridge
driver_opts:
com.docker.network.driver.mtu: 9000 # Jumbo frames
Monitoring
Grafana Dashboards
Access Grafana:
- URL: http://192.168.4.6:3030
- Username: admin
- Password: [from .env.unraid]
Pre-configured dashboards:
- OctoLLM Unraid Dashboard (default)
  - System overview (CPU, RAM, disk, network)
  - GPU metrics (utilization, temperature, memory, power)
  - Service health status
  - Database performance
  - Ollama LLM metrics
  - Container resources
Import additional dashboards:
- Click "+ → Import"
- Enter dashboard ID or upload JSON
- Recommended IDs:
  - 1860: Node Exporter Full
  - 179: Docker Host & Container Overview
  - 12321: NVIDIA DCGM Exporter
Prometheus Alerts
View alerts:
- URL: http://192.168.4.6:9090/alerts
Alert rules (from prometheus/alerts.unraid.yml):
- High CPU usage (> 80%)
- High memory usage (> 85%)
- Low disk space (< 10%)
- High GPU temperature (> 80°C)
- Service down
- Database connection exhaustion
- High error rate
Configure alerting (Slack, email, PagerDuty):
Edit /mnt/user/appdata/octollm/prometheus/config/prometheus.yml:
alerting:
alertmanagers:
- static_configs:
- targets:
- 'alertmanager:9093'
Deploy Alertmanager:
# Add to docker-compose.unraid.yml
services:
alertmanager:
image: prom/alertmanager:latest
ports:
- "9093:9093"
volumes:
- ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
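A minimal alertmanager.yml sketch for Slack notifications, following the same heredoc pattern as the PostgreSQL config above; the webhook URL and channel are placeholders, and the single-receiver route is the simplest possible (assumptions, not shipped with this repo):
cat > /mnt/user/appdata/octollm/infrastructure/unraid/alertmanager.yml << 'EOF'
route:
  receiver: slack            # send every alert to one receiver
receivers:
  - name: slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook
        channel: '#octollm-alerts'                               # placeholder channel
EOF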
Real-Time Monitoring
Custom monitoring script:
bash scripts/monitor-resources.sh
Output:
╔════════════════════════════════════════════════════════════════════════════╗
║ OctoLLM Resource Monitor - tower
║ Uptime: up 5 days, 12 hours
╚════════════════════════════════════════════════════════════════════════════╝
CPU (64 cores): 45.2%
[██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░]
RAM (504GB): 125GB / 504GB (24.8%)
[████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NVIDIA Tesla P40 GPU
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Utilization: 87%
VRAM: 18432MB / 24576MB (75.0%)
Temperature: 72°C
Power: 187W / 250W
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Storage (/mnt/user)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Usage: 93TB / 144TB (64%)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Network (bond0 - 4Gbps)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Download: 42 MB/s | Upload: 18 MB/s
Logging
View logs in Grafana (Loki integration):
- Navigate to Explore
- Select "Loki" datasource
- Query:
{container_name=~"octollm-.*"}
Command-line log access:
# Real-time logs
docker-compose logs -f orchestrator
# Search logs
docker-compose logs orchestrator | grep ERROR
# Export logs
docker-compose logs --no-color > octollm-logs-$(date +%Y%m%d).txt
Security
Network Isolation
Firewall rules (iptables):
# Allow from local network only
iptables -A INPUT -p tcp -s 192.168.0.0/16 --dport 3000:9999 -j ACCEPT
# Block from internet
iptables -A INPUT -p tcp --dport 3000:9999 -j DROP
# Save rules so they can be re-applied at boot (e.g. from the go script)
iptables-save > /boot/config/firewall-rules
Docker network isolation:
# docker-compose.unraid.yml
networks:
octollm-net:
driver: bridge
internal: false # Set to true to disable internet access
ipam:
config:
- subnet: 172.20.0.0/16
VPN Access (Recommended)
Option 1: Tailscale (easiest):
# Install Tailscale on Unraid
curl -fsSL https://tailscale.com/install.sh | sh
# Authenticate
tailscale up
# Access from anywhere
# http://tower.tail-scale.ts.net:3000
Option 2: WireGuard (manual):
- Install WireGuard plugin from Community Applications
- Configure peer
- Access via VPN tunnel
Secrets Management
Never commit these files:
- .env.unraid
- .env.unraid.backup
- backups/*.sql
Verify gitignore:
cd /mnt/user/appdata/octollm
git status --ignored
# Should NOT list .env.unraid
Rotate passwords regularly:
# Regenerate all passwords
cd infrastructure/unraid
bash setup-unraid.sh
# Answer "y" when prompted to overwrite .env.unraid
TLS/SSL (Production)
Behind reverse proxy (NGINX Proxy Manager):
- Install NGINX Proxy Manager from Community Applications
- Create proxy host:
- Domain: octollm.yourdomain.com
- Forward to: 192.168.4.6:3000
- Enable SSL (Let's Encrypt)
- Access via: https://octollm.yourdomain.com
Direct TLS (advanced):
# Generate self-signed cert
openssl req -x509 -newkey rsa:4096 -nodes \
-keyout /mnt/user/appdata/octollm/certs/key.pem \
-out /mnt/user/appdata/octollm/certs/cert.pem \
-days 365
# Edit .env.unraid
ENABLE_TLS=true
TLS_CERT_PATH=/mnt/user/appdata/octollm/certs/cert.pem
TLS_KEY_PATH=/mnt/user/appdata/octollm/certs/key.pem
Audit Logging
PostgreSQL audit table (already created by setup):
SELECT * FROM audit.api_logs
ORDER BY timestamp DESC
LIMIT 100;
Query audit logs:
docker exec -it octollm-postgres psql -U octollm -c "
SELECT
timestamp,
endpoint,
method,
status_code,
user_id,
ip_address
FROM audit.api_logs
WHERE timestamp > NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC;
"
Migration to Cloud
When ready to deploy to production (GKE/EKS):
Step 1: Export Data
# Backup all data
cd /mnt/user/appdata/octollm/infrastructure/unraid
bash scripts/backup-data.sh
# Upload to cloud storage
aws s3 cp /mnt/user/backups/octollm/ s3://my-bucket/octollm-migration/ --recursive
Step 2: Update Configuration
Switch to cloud LLMs:
# .env.cloud
PREFER_LOCAL_LLM=false
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
Use managed databases:
# .env.cloud
DATABASE_URL=postgresql://user:pass@cloud-sql-instance:5432/octollm
REDIS_URL=redis://redis-memorystore:6379
QDRANT_URL=https://my-cluster.qdrant.io
Step 3: Deploy to Kubernetes
cd /mnt/user/appdata/octollm/infrastructure/kubernetes
# Apply namespace
kubectl apply -f namespaces/octollm-prod-namespace.yaml
# Deploy with Helm (recommended)
helm install octollm ./charts/octollm \
--namespace octollm-prod \
--values ./charts/octollm/values-prod.yaml
# Or apply manifests directly
kubectl apply -k overlays/prod
Step 4: Data Migration
PostgreSQL:
# Restore to Cloud SQL
cat backup_postgres.sql | psql "$DATABASE_URL"
Qdrant vectors:
# Use Qdrant snapshot API
curl -X POST http://192.168.4.6:3012/collections/octollm/snapshots
curl -X GET http://192.168.4.6:3012/collections/octollm/snapshots/snapshot_name/download > snapshot.tar
# Upload to Qdrant Cloud
curl -X POST https://my-cluster.qdrant.io/collections/octollm/snapshots/upload \
-F "snapshot=@snapshot.tar"
Cost Comparison
| Component | Unraid (Monthly) | GKE (Monthly) | Difference |
|---|---|---|---|
| Compute | $0 (owned) | $200-500 | +$200-500 |
| LLM APIs | $0 (local) | $150-700 | +$150-700 |
| Databases | $0 | $100-300 | +$100-300 |
| Storage | $0 | $20-50 | +$20-50 |
| Networking | $0 | $50-100 | +$50-100 |
| Total | ~$50 electricity | $520-1,650 | +$470-1,600/mo |
Break-even analysis:
- Development on Unraid: ~$50/month
- Production on GKE: ~$1,000/month
- Savings during development: $950/month × 6 months = $5,700
See full Cloud Migration Guide for detailed steps.
Conclusion
You now have a fully functional OctoLLM deployment on Unraid with:
✅ GPU-accelerated local LLM inference (Tesla P40)
✅ Complete monitoring stack (Prometheus, Grafana, Loki)
✅ Automated backups and health checks
✅ Production-ready architecture
✅ Cost savings: $150-700/month in LLM API fees
Next Steps
- Explore API: http://192.168.4.6:3000/docs
- Monitor with Grafana: http://192.168.4.6:3030
- Submit test tasks: See API examples above
- Optimize performance: Tune based on your workload
- Join community: https://github.com/your-org/octollm/discussions
Support
- Documentation: https://github.com/your-org/octollm/docs
- Issues: https://github.com/your-org/octollm/issues
- Discord: https://discord.gg/octollm
- Email: support@octollm.io
Last Updated: 2025-11-12
Version: 1.0.0
Tested On: Unraid 7.2.0, Dell PowerEdge R730xd, Tesla P40