OctoLLM Documentation

Welcome to the OctoLLM comprehensive technical documentation. This guide covers the complete architecture, implementation, API reference, and operational workflows for the distributed AI system.

What is OctoLLM?

OctoLLM is a novel distributed AI architecture inspired by octopus neurobiology, designed specifically for offensive security operations and advanced developer tooling. By modeling cognitive processing after the octopus's distributed nervous system—where each arm possesses autonomous decision-making capabilities coordinated by a central brain—OctoLLM achieves superior modularity, security isolation, and operational efficiency compared to monolithic LLM systems.

Core Innovation

Rather than relying on a single large language model to handle all tasks, OctoLLM employs specialized "arm" modules that operate semi-autonomously under the guidance of a central "brain" orchestrator. This architecture enables:

  • Enhanced Security: Capability isolation and compartmentalization prevent lateral movement of compromised components
  • Cost Efficiency: Lightweight reflexes and specialized models handle routine tasks without engaging expensive central processing
  • Operational Resilience: Individual component failures don't cascade through the system
  • Rapid Adaptation: New capabilities can be added as independent modules without system-wide reengineering

System Architecture

Core Components

| Component | Purpose | Technology |
|---|---|---|
| Central Brain (Orchestrator) | Strategic planning using frontier LLMs | Python + FastAPI, GPT-4/Claude Opus |
| Autonomous Arms | Specialized modules with domain expertise | Python/Rust, smaller models |
| Reflex Layer | Fast preprocessing bypassing LLM calls | Rust, regex/classifiers |
| Distributed Memory | Global semantic + local episodic stores | PostgreSQL, Redis, Qdrant |

Layer Architecture

Layer 1: Ingress (API Gateway + Reflex)

  • Technology: NGINX/Traefik + Rust
  • Latency Target: <10ms cache hits, <50ms reflex decisions
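The reflex layer itself is written in Rust; purely to illustrate the decision flow (cache hit first, then pattern-based reflex, then escalation to the brain), here is a minimal Python sketch. All names and patterns below are hypothetical, not the actual OctoLLM API.

```python
import re

# Hypothetical in-memory cache and pattern table standing in for the
# Rust reflex layer's real data structures.
CACHE = {"ping": "pong"}
PATTERNS = [
    (re.compile(r"^\s*help\b", re.IGNORECASE), "show_help"),
    (re.compile(r"^\s*status\b", re.IGNORECASE), "report_status"),
]

def reflex(request: str):
    """Return a fast-path answer, or None to escalate to the orchestrator."""
    # 1. Exact cache hit: answer without any model call (<10ms target).
    if request in CACHE:
        return ("cache", CACHE[request])
    # 2. Pattern-based reflex decision (<50ms target).
    for pattern, action in PATTERNS:
        if pattern.search(request):
            return ("reflex", action)
    # 3. No fast path: hand the request off to the brain.
    return None
```

Anything that falls through both fast paths costs an LLM call, which is why the cache-hit and reflex-decision latency targets are measured separately.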

Layer 2: Orchestration (The Brain)

  • Technology: Python + FastAPI, LangChain
  • Main Loop: Cache → Plan → Execute → Integrate → Validate
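The main loop above can be sketched as a single pass in Python. The function names and signatures are illustrative, not the real orchestrator API; the point is the ordering of the five stages.

```python
def handle_task(task, cache, plan, execute, integrate, validate):
    """One pass of the Cache -> Plan -> Execute -> Integrate -> Validate loop."""
    # Cache: return a previously validated result if one exists.
    cached = cache.get(task)
    if cached is not None:
        return cached
    # Plan: the brain decomposes the task into subtasks for the arms.
    subtasks = plan(task)
    # Execute: dispatch each subtask to the appropriate arm.
    results = [execute(subtask) for subtask in subtasks]
    # Integrate: merge arm outputs into a single candidate answer.
    answer = integrate(results)
    # Validate: Judge/Safety-style checks run before anything is returned.
    if not validate(answer):
        raise ValueError("validation failed")
    cache[task] = answer
    return answer
```

Because validation happens before the cache write, only answers that passed the checks can ever be served from the cache on a later request.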

Layer 3: Execution (The Arms)

  • Planner: Task decomposition
  • Tool Executor: Sandboxed external actions
  • Retriever: Knowledge base search
  • Coder: Code generation/debugging
  • Judge: Output validation
  • Safety Guardian: PII detection, content filtering
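Since each arm is an independent module behind a common contract, a shared interface is what lets the orchestrator dispatch subtasks uniformly. The sketch below is a hypothetical version of such a contract (the `Arm` protocol, `ArmResult`, and the toy `SafetyGuardian` are all illustrative, not the real OctoLLM types):

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ArmResult:
    ok: bool
    output: str

class Arm(Protocol):
    """Minimal contract every arm implements (illustrative, not the real spec)."""
    name: str
    def execute(self, subtask: str) -> ArmResult: ...

class SafetyGuardian:
    """Toy arm: blocks output containing a (very naive) PII marker."""
    name = "safety_guardian"

    def execute(self, subtask: str) -> ArmResult:
        flagged = "ssn:" in subtask.lower()
        return ArmResult(ok=not flagged, output="blocked" if flagged else "clean")
```

A uniform interface like this is what allows new arms to be added without touching the orchestrator, as described under Core Innovation.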

Layer 4: Persistence

  • PostgreSQL (global memory), Redis (caching), Qdrant (vectors)

Layer 5: Observability

  • Prometheus (metrics), Loki (logs), Jaeger (tracing)

Current Status

Phase: Phase 0 (Architecture) → Phase 1 (Proof of Concept)
Sprint: Sprint 1.2 COMPLETE (Orchestrator Core v1.2.0)
Progress: ~22% overall, Phase 1 ~40%

Completed Components

✅ Phase 0: Complete architecture, documentation, specifications (100%)

✅ Sprint 1.1: Reflex Layer production-ready (v1.1.0)

  • Cache hit latency: <5ms (2x better than target)
  • Pattern match latency: <8ms (6x better than target)
  • Memory usage: ~12MB (4x better than target)

✅ Sprint 1.2: Orchestrator Core production-ready (v1.2.0)

  • 1,776 lines Python code
  • 2,776 lines tests (87 tests, 87% pass rate, 85%+ coverage)
  • 6 REST endpoints operational
  • API latency P95: <100ms (5x better than target)
  • Database query P95: <5ms (2x better than target)

In Progress

🚧 Sprint 1.3: Planner Arm (PLANNED)

  • Task decomposition into subtasks
  • Acceptance criteria generation
  • Resource estimation

Documentation Structure

This documentation is organized into the following major sections:

1. Project Overview

  • Vision, goals, and success metrics
  • Biological inspiration from octopus neurobiology
  • Core concepts and design principles
  • Complete roadmap (7 phases)

2. Architecture

  • System architecture and layer design
  • Data structures (TaskContract, ArmCapability, Memory Models)
  • Data flow and swarm decision-making
  • Architecture Decision Records (ADRs)

3. Components

  • Reflex Layer (preprocessing and caching)
  • Orchestrator (central coordination)
  • All 6 Arms (specialized modules)
  • Persistence layer

4. API Documentation

  • REST API overview and contracts
  • OpenAPI 3.0 specifications for all services
  • Data models and schemas
  • Authentication and error handling

5. Development

  • Getting started guide
  • Development environment setup
  • Testing strategies and debugging
  • Custom arm development
  • Contributing guidelines

6. Operations

  • Deployment guides (Docker Compose, Kubernetes, Unraid)
  • Monitoring and alerting setup
  • Troubleshooting playbooks
  • Performance tuning and scaling

7. Security

  • Security model and threat model
  • Capability isolation and PII protection
  • Secrets management
  • Security testing and compliance

8. Sprint Progress

  • Phase 0 sprints (0.1-0.7) - Complete
  • Phase 1 sprints (1.1-1.3) - In progress
  • Sprint completion reports with metrics

9. Project Tracking

  • Master TODO with all 7 phases
  • Roadmap and phase details
  • Current status and checklists

10. Reference

  • Configuration reference
  • Glossary and diagrams
  • Documentation summary

For New Users

For Developers

For Operators

For Security Engineers

Key Metrics

| Metric | Target | Current Status |
|---|---|---|
| Task Success Rate | >95% vs baseline | Not yet measured (Phase 1.3+) |
| P99 Latency | <30s critical tasks | Reflex: <8ms ✅, Orchestrator: <100ms ✅ |
| Cost per Task | <50% monolithic LLM | Not yet measured |
| Reflex Cache Hit Rate | >60% over time | Not yet measured |
| PII Leakage Rate | <0.1% outputs | Not yet measured |
| Test Coverage | >85% | Reflex: 90%+ ✅, Orchestrator: 85%+ ✅ |

Repository

GitHub: github.com/doublegate/OctoLLM
Documentation: doublegate.github.io/OctoLLM


Use the sidebar to explore the documentation. All pages include:

  • Links to source code in the repository
  • Related documentation pages
  • API references where applicable
  • Version information

Need help? Check the Troubleshooting Playbooks or review the FAQ section.

Want to contribute? See the Contributing Guide.