Kubernetes Deployment Guide

Estimated Time: 2-3 hours
Difficulty: Advanced
Prerequisites: Kubernetes cluster access, kubectl configured, basic Kubernetes knowledge

Overview

This guide walks you through deploying OctoLLM to a production Kubernetes cluster with:

  • High availability and auto-scaling
  • Persistent storage for databases
  • Service mesh integration (optional)
  • Monitoring and observability
  • Security best practices

Table of Contents

  1. Prerequisites
  2. Cluster Requirements
  3. Namespace Setup
  4. Storage Configuration
  5. Database Deployment
  6. Core Services Deployment
  7. Ingress Configuration
  8. Scaling Configuration
  9. Security Hardening
  10. Monitoring Setup
  11. Verification
  12. Troubleshooting

Prerequisites

Required Tools

# Verify kubectl installation
kubectl version --client

# Verify Helm installation (v3+)
helm version

# Verify cluster access
kubectl cluster-info
kubectl get nodes

Component         | Minimum Version | Recommended
------------------|-----------------|----------------
Kubernetes        | 1.25+           | 1.28+
kubectl           | 1.25+           | 1.28+
Helm              | 3.10+           | 3.13+
Container Runtime | containerd 1.6+ | containerd 1.7+

Required Kubernetes Features

  • StorageClasses - For persistent volumes
  • RBAC - For service accounts and permissions
  • NetworkPolicies - For network isolation
  • HorizontalPodAutoscaler - For auto-scaling
  • Ingress Controller - For external access (nginx, traefik, etc.)
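
A quick, non-exhaustive way to confirm these features before deploying (read-only checks, assuming your current kubectl context points at the target cluster):

# StorageClasses and an ingress controller must exist
kubectl get storageclass
kubectl get pods -A -l app.kubernetes.io/name=ingress-nginx  # label varies by controller

# RBAC and NetworkPolicy APIs should be served
kubectl api-resources | grep -E 'networkpolicies|rbac.authorization'

# HPA needs metrics-server; this command errors if it is missing
kubectl top nodes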

Cluster Requirements

Node Resources

Minimum Cluster (Development/Testing):

  • 3 nodes (1 master, 2 workers)
  • 4 vCPU per node
  • 16 GB RAM per node
  • 100 GB SSD storage per node

Production Cluster:

  • 5+ nodes (1 master, 4+ workers)
  • 8 vCPU per node
  • 32 GB RAM per node
  • 200 GB SSD storage per node
  • Separate node pool for databases (higher IOPS); see the pinning sketch below
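
One way to honor the last point is to pin the database StatefulSets to the dedicated pool with a nodeSelector and toleration. The fragment below is a sketch that assumes the database nodes carry a hypothetical label and taint of workload=database; substitute whatever your node pool actually uses.

# Sketch: add under spec.template.spec of the postgres/redis/qdrant StatefulSets
nodeSelector:
  workload: database        # hypothetical node label
tolerations:
- key: workload
  operator: Equal
  value: database
  effect: NoSchedule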

Network Requirements

# Required network connectivity
- Intra-cluster: All pods must communicate (CNI configured)
- External API access: OpenAI, Anthropic, etc. (egress allowed)
- Ingress: HTTPS (443) for external requests
- Monitoring: Prometheus scraping (internal)
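
To spot-check egress to the LLM providers from inside the cluster, a throwaway pod is usually enough. This sketch assumes the public curlimages/curl image can be pulled by your nodes; even a 401 response proves outbound HTTPS works.

# One-off pod that prints the HTTP status line from an external API
kubectl run egress-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -sI https://api.openai.com/v1/models | head -n 1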

Namespace Setup

Create OctoLLM Namespace

# Create namespace
kubectl create namespace octollm

# Set as default for this session
kubectl config set-context --current --namespace=octollm

# Verify
kubectl get namespace octollm

Namespace Configuration

# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: octollm
  labels:
    name: octollm
    env: production
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: octollm-quota
  namespace: octollm
spec:
  hard:
    requests.cpu: "32"
    requests.memory: 64Gi
    requests.storage: 500Gi
    persistentvolumeclaims: "10"
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: octollm-limits
  namespace: octollm
spec:
  limits:
  - max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 100m
      memory: 128Mi
    type: Container

Apply the configuration:

kubectl apply -f k8s/namespace.yaml

Storage Configuration

StorageClass Configuration

# k8s/storage/storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: octollm-fast-ssd
provisioner: ebs.csi.aws.com  # Change based on cloud provider; requires the matching CSI driver
parameters:
  type: gp3
  iops: "3000"  # gp3 baseline; raise for database node pools
  encrypted: "true"
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

For different cloud providers:

AWS (EBS):

provisioner: ebs.csi.aws.com
parameters:
  type: gp3  # or io2 for higher IOPS
  iops: "3000"

GCP (Persistent Disk):

provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd

Azure (Disk):

provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS

Apply storage configuration:

kubectl apply -f k8s/storage/storageclass.yaml

Database Deployment

PostgreSQL Deployment

# k8s/databases/postgres.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: octollm
data:
  POSTGRES_DB: octollm
  POSTGRES_USER: octollm
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: octollm
type: Opaque
stringData:
  POSTGRES_PASSWORD: "CHANGE_ME_SECURE_PASSWORD"  # Use sealed secrets in production
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: octollm
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: octollm-fast-ssd
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: octollm
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
          name: postgres
        envFrom:
        - configMapRef:
            name: postgres-config
        - secretRef:
            name: postgres-secret
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
          subPath: postgres
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - octollm
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - octollm
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: octollm
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
  clusterIP: None  # Headless service for StatefulSet

Redis Deployment

# k8s/databases/redis.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: octollm
data:
  redis.conf: |
    maxmemory 2gb
    maxmemory-policy allkeys-lru
    appendonly yes
    appendfsync everysec
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: octollm
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: octollm-fast-ssd
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: octollm
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
          name: redis
        command:
        - redis-server
        - /etc/redis/redis.conf
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis
        - name: redis-storage
          mountPath: /data
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 1000m
            memory: 4Gi
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: redis-config
        configMap:
          name: redis-config
      - name: redis-storage
        persistentVolumeClaim:
          claimName: redis-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: octollm
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
  clusterIP: None

Qdrant Deployment

# k8s/databases/qdrant.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qdrant-pvc
  namespace: octollm
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: octollm-fast-ssd
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: octollm
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
      - name: qdrant
        image: qdrant/qdrant:v1.7.0
        ports:
        - containerPort: 6333
          name: http
        - containerPort: 6334
          name: grpc
        volumeMounts:
        - name: qdrant-storage
          mountPath: /qdrant/storage
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          httpGet:
            path: /
            port: 6333
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 6333
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: qdrant-storage
        persistentVolumeClaim:
          claimName: qdrant-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: octollm
spec:
  selector:
    app: qdrant
  ports:
  - port: 6333
    targetPort: 6333
    name: http
  - port: 6334
    targetPort: 6334
    name: grpc
  clusterIP: None

Deploy all databases:

kubectl apply -f k8s/databases/postgres.yaml
kubectl apply -f k8s/databases/redis.yaml
kubectl apply -f k8s/databases/qdrant.yaml

# Wait for databases to be ready
kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
kubectl wait --for=condition=ready pod -l app=redis --timeout=300s
kubectl wait --for=condition=ready pod -l app=qdrant --timeout=300s
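
Because each database is a single-replica StatefulSet, the pods are named postgres-0, redis-0, and qdrant-0. A quick smoke test from your workstation (interactive shell assumed) might look like:

# PostgreSQL: should return a single row
kubectl exec -n octollm postgres-0 -- psql -U octollm -d octollm -c 'SELECT 1;'

# Redis: should answer PONG
kubectl exec -n octollm redis-0 -- redis-cli ping

# Qdrant: list collections through a temporary port-forward
kubectl port-forward -n octollm svc/qdrant 6333:6333 &
curl -s http://localhost:6333/collections
kill %1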

Core Services Deployment

ConfigMap for Shared Configuration

# k8s/core/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: octollm-config
  namespace: octollm
data:
  LOG_LEVEL: "info"
  ENVIRONMENT: "production"

  # Database URLs (internal DNS)
  POSTGRES_HOST: "postgres.octollm.svc.cluster.local"
  POSTGRES_PORT: "5432"
  POSTGRES_DB: "octollm"

  REDIS_HOST: "redis.octollm.svc.cluster.local"
  REDIS_PORT: "6379"

  QDRANT_HOST: "qdrant.octollm.svc.cluster.local"
  QDRANT_PORT: "6333"

Secret for API Keys

# k8s/core/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: octollm-secrets
  namespace: octollm
type: Opaque
stringData:
  # LLM API Keys (replace with actual keys)
  OPENAI_API_KEY: "sk-XXXXXXXXXXXXXXXXXXXXX"
  ANTHROPIC_API_KEY: "sk-ant-XXXXXXXXXXXXXXXXXXXXX"

  # Database credentials
  POSTGRES_PASSWORD: "SECURE_PASSWORD_HERE"

  # JWT Secret for API authentication
  JWT_SECRET: "SECURE_RANDOM_STRING_32_CHARS_MIN"

IMPORTANT: In production, use Sealed Secrets or External Secrets Operator to manage secrets securely:

# Example with Sealed Secrets
kubeseal --format=yaml < k8s/core/secrets.yaml > k8s/core/sealed-secrets.yaml
kubectl apply -f k8s/core/sealed-secrets.yaml
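
If you prefer the External Secrets Operator instead, the equivalent pattern is an ExternalSecret that materializes octollm-secrets from an external store. The sketch below assumes the operator is installed and that a ClusterSecretStore named aws-secrets (hypothetical) already points at your backend; the remote key paths are placeholders.

# k8s/core/external-secret.yaml (sketch)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: octollm-secrets
  namespace: octollm
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets               # hypothetical store name
  target:
    name: octollm-secrets
  data:
  - secretKey: OPENAI_API_KEY
    remoteRef:
      key: octollm/openai-api-key   # placeholder path in your backend
  - secretKey: POSTGRES_PASSWORD
    remoteRef:
      key: octollm/postgres-password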

Reflex Layer Deployment

# k8s/core/reflex-layer.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reflex-layer
  namespace: octollm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: reflex-layer
  template:
    metadata:
      labels:
        app: reflex-layer
    spec:
      containers:
      - name: reflex-layer
        image: octollm/reflex-layer:latest
        ports:
        - containerPort: 8001
          name: http
        envFrom:
        - configMapRef:
            name: octollm-config
        - secretRef:
            name: octollm-secrets
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 8001
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8001
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: reflex-layer
  namespace: octollm
spec:
  selector:
    app: reflex-layer
  ports:
  - name: http
    port: 8001
    targetPort: 8001
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: reflex-layer-hpa
  namespace: octollm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reflex-layer
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Orchestrator Deployment

# k8s/core/orchestrator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orchestrator
  namespace: octollm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orchestrator
  template:
    metadata:
      labels:
        app: orchestrator
    spec:
      containers:
      - name: orchestrator
        image: octollm/orchestrator:latest
        ports:
        - containerPort: 8000
          name: http
        envFrom:
        - configMapRef:
            name: octollm-config
        - secretRef:
            name: octollm-secrets
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: orchestrator
  namespace: octollm
spec:
  selector:
    app: orchestrator
  ports:
  - name: http
    port: 8000
    targetPort: 8000
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orchestrator-hpa
  namespace: octollm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orchestrator
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Arm Deployments (Example: Planner Arm)

# k8s/arms/planner-arm.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: planner-arm
  namespace: octollm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: planner-arm
  template:
    metadata:
      labels:
        app: planner-arm
    spec:
      containers:
      - name: planner-arm
        image: octollm/planner-arm:latest
        ports:
        - containerPort: 8100
          name: http
        envFrom:
        - configMapRef:
            name: octollm-config
        - secretRef:
            name: octollm-secrets
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8100
          initialDelaySeconds: 15
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8100
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: planner-arm
  namespace: octollm
spec:
  selector:
    app: planner-arm
  ports:
  - name: http
    port: 8100
    targetPort: 8100
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: planner-arm-hpa
  namespace: octollm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: planner-arm
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Deploy core services:

kubectl apply -f k8s/core/configmap.yaml
kubectl apply -f k8s/core/secrets.yaml
kubectl apply -f k8s/core/reflex-layer.yaml
kubectl apply -f k8s/core/orchestrator.yaml
kubectl apply -f k8s/arms/planner-arm.yaml

# Deploy remaining arms similarly...
# kubectl apply -f k8s/arms/executor-arm.yaml
# kubectl apply -f k8s/arms/coder-arm.yaml
# kubectl apply -f k8s/arms/judge-arm.yaml
# kubectl apply -f k8s/arms/guardian-arm.yaml
# kubectl apply -f k8s/arms/retriever-arm.yaml
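
Before moving on, it is worth waiting for the rollouts to complete:

# Watch each Deployment finish rolling out (add the other arms as you deploy them)
for d in reflex-layer orchestrator planner-arm; do
  kubectl rollout status deployment/$d -n octollm --timeout=300s
done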

Ingress Configuration

NGINX Ingress Controller

# k8s/ingress/nginx-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: octollm-ingress
  namespace: octollm
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.octollm.example.com
    secretName: octollm-tls
  rules:
  - host: api.octollm.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: orchestrator
            port:
              number: 8000

Install cert-manager for TLS

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# Create ClusterIssuer for Let's Encrypt
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF

# Apply ingress
kubectl apply -f k8s/ingress/nginx-ingress.yaml
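
Once applied, cert-manager creates a Certificate named after the secretName above (octollm-tls). Confirm the ingress has an address and the certificate reports Ready before testing externally:

kubectl get ingress octollm-ingress -n octollm
kubectl get certificate octollm-tls -n octollm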

Scaling Configuration

Cluster Autoscaler (AWS Example)

# k8s/scaling/cluster-autoscaler.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
          - ./cluster-autoscaler
          - --v=4
          - --stderrthreshold=info
          - --cloud-provider=aws
          - --skip-nodes-with-local-storage=false
          - --expander=least-waste
          - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/octollm-cluster

Pod Disruption Budgets

# k8s/scaling/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orchestrator-pdb
  namespace: octollm
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: orchestrator
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: reflex-layer-pdb
  namespace: octollm
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: reflex-layer
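
The scaling manifests follow the same apply pattern. Note that the cluster autoscaler also needs cloud-side permissions (for example, an IAM role on AWS) that are outside the scope of this file.

kubectl apply -f k8s/scaling/cluster-autoscaler.yaml
kubectl apply -f k8s/scaling/pdb.yaml

# Both budgets should report their allowed disruptions
kubectl get pdb -n octollm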

Security Hardening

Network Policies

# k8s/security/network-policies.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: orchestrator-network-policy
  namespace: octollm
spec:
  podSelector:
    matchLabels:
      app: orchestrator
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: reflex-layer
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to:
    - podSelector:
        matchLabels:
          app: qdrant
    ports:
    - protocol: TCP
      port: 6333
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 53  # DNS
    - protocol: UDP
      port: 53
  - to:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 8100  # Arms
    - protocol: TCP
      port: 8101
    - protocol: TCP
      port: 8102
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-network-policy
  namespace: octollm
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: orchestrator
    - podSelector:
        matchLabels:
          app: planner-arm
    ports:
    - protocol: TCP
      port: 5432

Pod Security Standards

# k8s/security/pod-security.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: octollm
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Security Context Example

# Add to deployment templates
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
- name: orchestrator
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop:
      - ALL

Apply security configurations:

kubectl apply -f k8s/security/network-policies.yaml
kubectl apply -f k8s/security/pod-security.yaml
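
Because the namespace already exists, re-applying a Namespace manifest that omits the original name/env labels can drop them during the apply merge. An alternative is to attach the Pod Security labels in place:

kubectl label namespace octollm \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted --overwrite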

Monitoring Setup

Prometheus ServiceMonitor

# k8s/monitoring/service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: octollm-metrics
  namespace: octollm
spec:
  selector:
    matchLabels:
      monitoring: "true"
  endpoints:
  - port: http
    path: /metrics
    interval: 30s

Add monitoring labels to services

# Update services with label
metadata:
  labels:
    monitoring: "true"
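
For Services that are already deployed, the label can be attached directly and the ServiceMonitor will start selecting them (the Service ports are named http above, which is what the endpoint definition references):

kubectl label service reflex-layer orchestrator planner-arm \
  monitoring=true -n octollm --overwrite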

Grafana Dashboard ConfigMap

# k8s/monitoring/grafana-dashboard.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: octollm-dashboard
  namespace: monitoring
data:
  octollm-overview.json: |
    {
      "dashboard": {
        "title": "OctoLLM Overview",
        "panels": [
          {
            "title": "Request Rate",
            "targets": [
              {
                "expr": "rate(http_requests_total{namespace=\"octollm\"}[5m])"
              }
            ]
          }
        ]
      }
    }
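
If Grafana is deployed via kube-prometheus-stack, its dashboard sidecar discovers ConfigMaps by label; by default that label is grafana_dashboard, so marking the ConfigMap is usually all that is needed (adjust if your sidecar is configured differently):

kubectl label configmap octollm-dashboard grafana_dashboard=1 -n monitoring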

Verification

Deployment Verification Script

#!/bin/bash
# k8s/scripts/verify-deployment.sh

set -e

NAMESPACE="octollm"

echo "=== OctoLLM Kubernetes Deployment Verification ==="

# Check namespace
echo -n "Checking namespace... "
kubectl get namespace $NAMESPACE &> /dev/null && echo "✓" || (echo "✗" && exit 1)

# Check databases
echo -n "Checking PostgreSQL... "
kubectl wait --for=condition=ready pod -l app=postgres -n $NAMESPACE --timeout=60s &> /dev/null && echo "✓" || echo "✗"

echo -n "Checking Redis... "
kubectl wait --for=condition=ready pod -l app=redis -n $NAMESPACE --timeout=60s &> /dev/null && echo "✓" || echo "✗"

echo -n "Checking Qdrant... "
kubectl wait --for=condition=ready pod -l app=qdrant -n $NAMESPACE --timeout=60s &> /dev/null && echo "✓" || echo "✗"

# Check core services
echo -n "Checking Reflex Layer... "
kubectl wait --for=condition=ready pod -l app=reflex-layer -n $NAMESPACE --timeout=60s &> /dev/null && echo "✓" || echo "✗"

echo -n "Checking Orchestrator... "
kubectl wait --for=condition=ready pod -l app=orchestrator -n $NAMESPACE --timeout=60s &> /dev/null && echo "✓" || echo "✗"

# Check arms
for arm in planner executor coder judge guardian retriever; do
  echo -n "Checking ${arm} arm... "
  kubectl wait --for=condition=ready pod -l app=${arm}-arm -n $NAMESPACE --timeout=60s &> /dev/null && echo "✓" || echo "✗"
done

# Test API endpoint
echo -n "Testing API health endpoint... "
ORCHESTRATOR_POD=$(kubectl get pod -l app=orchestrator -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n $NAMESPACE $ORCHESTRATOR_POD -- curl -sf http://localhost:8000/health &> /dev/null && echo "✓" || echo "✗"

echo ""
echo "=== Deployment Status ==="
kubectl get pods -n $NAMESPACE

Run verification:

chmod +x k8s/scripts/verify-deployment.sh
./k8s/scripts/verify-deployment.sh

Test API from Outside Cluster

# Get ingress IP/hostname
INGRESS_HOST=$(kubectl get ingress octollm-ingress -n octollm -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Test health endpoint
curl https://$INGRESS_HOST/health

# Submit test task
curl -X POST https://$INGRESS_HOST/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -d '{
    "goal": "Test deployment",
    "constraints": ["Quick verification"],
    "priority": "low"
  }'

Troubleshooting

Common Issues

1. Pods Not Starting

# Check pod status
kubectl get pods -n octollm

# Describe pod for events
kubectl describe pod <pod-name> -n octollm

# Check logs
kubectl logs <pod-name> -n octollm --previous

Common causes:

  • Image pull errors (check image name/tag)
  • Resource limits too low
  • Missing secrets or configmaps
  • Node capacity issues

2. Database Connection Failures

# Test database connectivity from orchestrator pod
kubectl exec -it <orchestrator-pod> -n octollm -- sh

# Inside pod, test PostgreSQL
nc -zv postgres.octollm.svc.cluster.local 5432

# Test Redis
nc -zv redis.octollm.svc.cluster.local 6379

Solutions:

  • Verify service DNS resolution
  • Check network policies
  • Ensure databases are ready before deploying apps

3. Ingress Not Working

# Check ingress status
kubectl get ingress -n octollm
kubectl describe ingress octollm-ingress -n octollm

# Check nginx ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

Solutions:

  • Verify ingress controller is installed
  • Check DNS configuration
  • Verify TLS certificate issuance
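
For the last item, cert-manager's intermediate resources usually show where issuance is stuck. These commands assume cert-manager was installed into the cert-manager namespace as shown earlier:

kubectl get certificate,certificaterequest,order,challenge -n octollm
kubectl logs -n cert-manager deploy/cert-manager --tail=50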

4. Auto-scaling Not Triggering

# Check HPA status
kubectl get hpa -n octollm
kubectl describe hpa orchestrator-hpa -n octollm

# Check metrics server
kubectl top pods -n octollm

Solutions:

  • Install metrics-server if missing (install command below)
  • Verify resource requests are set
  • Check HPA metric thresholds
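
If metrics-server is missing, it can be installed from the upstream manifest (review it first for your environment):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml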

Debugging Commands

# Get all resources in namespace
kubectl get all -n octollm

# Check events
kubectl get events -n octollm --sort-by='.lastTimestamp'

# Port forward for local access
kubectl port-forward svc/orchestrator 8000:8000 -n octollm

# Execute shell in pod
kubectl exec -it <pod-name> -n octollm -- /bin/sh

# View logs with follow
kubectl logs -f <pod-name> -n octollm

# View logs from all replicas
kubectl logs -l app=orchestrator -n octollm --tail=50

Production Checklist

Before going to production, ensure:

Security

  • Secrets managed with Sealed Secrets or External Secrets
  • Network policies applied and tested
  • Pod security standards enforced
  • RBAC properly configured
  • TLS certificates configured
  • Image scanning enabled
  • Security context configured for all pods

Reliability

  • Resource requests and limits set
  • Liveness and readiness probes configured
  • HPA configured and tested
  • PDB configured for critical services
  • Backup strategy for databases
  • Disaster recovery plan documented

Monitoring

  • Prometheus metrics exposed
  • Grafana dashboards created
  • Alerting rules configured
  • Log aggregation configured
  • Distributed tracing enabled

Performance

  • Load testing completed
  • Database indexes optimized
  • Connection pooling configured
  • Caching strategy verified
  • Resource limits tuned

Next Steps

After successful deployment:

  1. Set up monitoring - Follow Monitoring and Alerting Guide
  2. Configure backups - Set up automated database backups
  3. Load testing - Use Performance Tuning Guide
  4. Disaster recovery - Test recovery procedures
  5. Documentation - Document your specific configuration

See Also