OctoLLM Documentation
Welcome to the OctoLLM comprehensive technical documentation. This guide covers the complete architecture, implementation, API reference, and operational workflows for the distributed AI system.
What is OctoLLM?
OctoLLM is a novel distributed AI architecture inspired by octopus neurobiology, designed specifically for offensive security operations and advanced developer tooling. By modeling cognitive processing after the octopus's distributed nervous system—where each arm possesses autonomous decision-making capabilities coordinated by a central brain—OctoLLM achieves superior modularity, security isolation, and operational efficiency compared to monolithic LLM systems.
Core Innovation
Rather than relying on a single large language model to handle all tasks, OctoLLM employs specialized "arm" modules that operate semi-autonomously under the guidance of a central "brain" orchestrator. This architecture enables:
- Enhanced Security: Capability isolation and compartmentalization prevent lateral movement of compromised components
- Cost Efficiency: Lightweight reflexes and specialized models handle routine tasks without engaging expensive central processing
- Operational Resilience: Individual component failures don't cascade through the system
- Rapid Adaptation: New capabilities can be added as independent modules without system-wide reengineering
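To make the brain/arm split concrete, the following minimal Python sketch shows a central orchestrator routing tasks to arms by declared capability. All names here (`Task`, `Arm`, `Orchestrator.dispatch`, the example arms) are illustrative assumptions, not OctoLLM's actual interfaces.

```python
# Illustrative sketch of capability-based routing from a central "brain"
# to specialized "arm" modules. All names are hypothetical.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Task:
    kind: str          # e.g. "retrieve", "code", "plan"
    payload: str


class Arm(Protocol):
    capability: str

    def handle(self, task: Task) -> str: ...


class RetrieverArm:
    capability = "retrieve"

    def handle(self, task: Task) -> str:
        # A real arm would query the knowledge base; this just echoes.
        return f"retrieved context for: {task.payload}"


class CoderArm:
    capability = "code"

    def handle(self, task: Task) -> str:
        return f"generated code for: {task.payload}"


class Orchestrator:
    """The 'brain': routes each task to the arm that claims its capability."""

    def __init__(self, arms: list[Arm]):
        self._arms = {arm.capability: arm for arm in arms}

    def dispatch(self, task: Task) -> str:
        arm = self._arms.get(task.kind)
        if arm is None:
            raise ValueError(f"no arm registered for capability {task.kind!r}")
        return arm.handle(task)


if __name__ == "__main__":
    brain = Orchestrator([RetrieverArm(), CoderArm()])
    print(brain.dispatch(Task(kind="code", payload="parse nmap output")))
```

Because each arm only sees tasks matching its capability, adding a new capability is a registration change rather than a system-wide rewrite.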
System Architecture
Core Components
| Component | Purpose | Technology |
|---|---|---|
| Central Brain (Orchestrator) | Strategic planning using frontier LLMs | Python + FastAPI, GPT-4/Claude Opus |
| Autonomous Arms | Specialized modules with domain expertise | Python/Rust, smaller models |
| Reflex Layer | Fast preprocessing bypassing LLM calls | Rust, regex/classifiers |
| Distributed Memory | Global semantic + local episodic stores | PostgreSQL, Redis, Qdrant |
Layer Architecture
Layer 1: Ingress (API Gateway + Reflex)
- Technology: NGINX/Traefik + Rust
- Latency Target: <10ms cache hits, <50ms reflex decisions
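The reflex layer itself is implemented in Rust; the Python sketch below only illustrates the intended decision order (answer from cache if possible, then try cheap pattern rules, otherwise escalate to the orchestrator). The function name, cache, and patterns are assumptions for illustration.

```python
# Illustrative decision flow for the ingress/reflex layer (hypothetical names).
import re

_cache: dict[str, str] = {}          # stands in for a real response cache
_PATTERNS = [
    (re.compile(r"^ping$", re.IGNORECASE), "pong"),            # trivial health check
    (re.compile(r"^version\b", re.IGNORECASE), "octollm 1.2"),
]


def reflex_handle(request: str) -> tuple[str, str]:
    """Return (route, response); route is 'cache', 'reflex', or 'orchestrator'."""
    if request in _cache:            # cache hit path (target: <10ms)
        return "cache", _cache[request]

    for pattern, canned in _PATTERNS:  # cheap pattern rules (target: <50ms)
        if pattern.search(request):
            _cache[request] = canned
            return "reflex", canned

    # Nothing cheap matched: forward to the Layer 2 orchestrator.
    return "orchestrator", ""


if __name__ == "__main__":
    print(reflex_handle("ping"))     # ('reflex', 'pong') on first call
    print(reflex_handle("ping"))     # ('cache', 'pong') afterwards
```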
Layer 2: Orchestration (The Brain)
- Technology: Python + FastAPI, LangChain
- Main Loop: Cache → Plan → Execute → Integrate → Validate
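A compressed sketch of the five-stage main loop named above is shown below; the stage functions are placeholder stubs, not the real LangChain-based implementation.

```python
# Hypothetical skeleton of the orchestrator main loop
# (Cache -> Plan -> Execute -> Integrate -> Validate); stage functions are stubs.

def plan(request: str) -> list[str]:
    return [f"analyze: {request}", f"answer: {request}"]     # Planner arm stand-in

def execute(subtask: str) -> str:
    return f"result({subtask})"                              # Tool/Coder arm stand-in

def integrate(request: str, partials: list[str]) -> str:
    return " | ".join(partials)                              # brain merges partial results

def validate(candidate: str) -> bool:
    return bool(candidate)                                   # Judge/Safety arm stand-in

def run_task(request: str, cache: dict[str, str]) -> str:
    if request in cache:                         # 1. Cache
        return cache[request]
    subtasks = plan(request)                     # 2. Plan
    partials = [execute(s) for s in subtasks]    # 3. Execute
    candidate = integrate(request, partials)     # 4. Integrate
    if not validate(candidate):                  # 5. Validate
        raise RuntimeError("validation failed; task needs replanning")
    cache[request] = candidate
    return candidate

if __name__ == "__main__":
    print(run_task("summarize scan results", cache={}))
```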
Layer 3: Execution (The Arms)
- Planner: Task decomposition
- Tool Executor: Sandboxed external actions
- Retriever: Knowledge base search
- Coder: Code generation/debugging
- Judge: Output validation
- Safety Guardian: PII detection, content filtering
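As one concrete example of an arm's narrow responsibility, a PII screening pass of the kind the Safety Guardian performs could look roughly like the sketch below; the patterns and categories are illustrative only, not the shipped rule set.

```python
# Illustrative PII screening pass for a Safety Guardian-style arm (hypothetical).
import re

_PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def screen_output(text: str) -> dict[str, list[str]]:
    """Return any PII-like matches found in an arm's output, keyed by category."""
    findings = {name: pat.findall(text) for name, pat in _PII_PATTERNS.items()}
    return {name: hits for name, hits in findings.items() if hits}


if __name__ == "__main__":
    report = screen_output("Contact alice@example.com from 10.0.0.5")
    print(report)   # {'email': ['alice@example.com'], 'ipv4': ['10.0.0.5']}
```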
Layer 4: Persistence
- PostgreSQL (global memory), Redis (caching), Qdrant (vectors)
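The intended division of labor between the stores could be exercised roughly as below. Connection details, key and collection names, and the query vector are assumptions, and the snippet uses the standard `redis` and `qdrant-client` Python packages rather than OctoLLM's own persistence layer (PostgreSQL-backed global memory is omitted for brevity).

```python
# Rough sketch of the store split (hypothetical keys/collections):
# Redis for short-lived caching, Qdrant for semantic (vector) recall.
import json

import redis
from qdrant_client import QdrantClient

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
q = QdrantClient(url="http://localhost:6333")


def recall(query: str, query_vector: list[float]) -> list[dict]:
    cached = r.get(f"recall:{query}")
    if cached:                                   # fast path: recent identical query
        return json.loads(cached)

    hits = q.search(                             # semantic search over stored memories
        collection_name="octollm_memories",
        query_vector=query_vector,
        limit=5,
    )
    results = [{"id": str(h.id), "score": h.score} for h in hits]
    r.setex(f"recall:{query}", 300, json.dumps(results))   # cache for 5 minutes
    return results
```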
Layer 5: Observability
- Prometheus (metrics), Loki (logs), Jaeger (tracing)
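For instance, exposing latency and hit-rate counters for Prometheus to scrape could look like the following with the standard `prometheus_client` package; the metric names and port are illustrative.

```python
# Minimal Prometheus instrumentation sketch (hypothetical metric names).
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REFLEX_HITS = Counter("octollm_reflex_cache_hits_total", "Reflex cache hits")
TASK_LATENCY = Histogram("octollm_task_latency_seconds", "End-to-end task latency")


def handle_task() -> None:
    with TASK_LATENCY.time():            # records the duration into the histogram
        time.sleep(random.uniform(0.01, 0.05))
        if random.random() < 0.6:
            REFLEX_HITS.inc()


if __name__ == "__main__":
    start_http_server(9100)              # metrics served on :9100/metrics
    while True:
        handle_task()
```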
Current Status
- Phase: Phase 0 (Architecture) → Phase 1 (Proof of Concept)
- Sprint: Sprint 1.2 COMPLETE (Orchestrator Core v1.2.0)
- Progress: ~22% overall, Phase 1 ~40%
Completed Components
✅ Phase 0: Complete architecture, documentation, specifications (100%)
✅ Sprint 1.1: Reflex Layer production-ready (v1.1.0)
- Cache hit latency: <5ms (2x better than target)
- Pattern match latency: <8ms (6x better than target)
- Memory usage: ~12MB (4x better than target)
✅ Sprint 1.2: Orchestrator Core production-ready (v1.2.0)
- 1,776 lines of Python code
- 2,776 lines of test code (87 tests, 87% pass rate, 85%+ coverage)
- 6 REST endpoints operational
- API latency P95: <100ms (5x better than target)
- Database query P95: <5ms (2x better than target)
In Progress
🚧 Sprint 1.3: Planner Arm (PLANNED)
- Task decomposition into subtasks
- Acceptance criteria generation
- Resource estimation
Documentation Structure
This documentation is organized into the following major sections:
1. Project Overview
- Vision, goals, and success metrics
- Biological inspiration from octopus neurobiology
- Core concepts and design principles
- Complete roadmap (7 phases)
2. Architecture
- System architecture and layer design
- Data structures (TaskContract, ArmCapability, Memory Models)
- Data flow and swarm decision-making
- Architecture Decision Records (ADRs)
3. Components
- Reflex Layer (preprocessing and caching)
- Orchestrator (central coordination)
- All 6 Arms (specialized modules)
- Persistence layer
4. API Documentation
- REST API overview and contracts
- OpenAPI 3.0 specifications for all services
- Data models and schemas
- Authentication and error handling
5. Development
- Getting started guide
- Development environment setup
- Testing strategies and debugging
- Custom arm development
- Contributing guidelines
6. Operations
- Deployment guides (Docker Compose, Kubernetes, Unraid)
- Monitoring and alerting setup
- Troubleshooting playbooks
- Performance tuning and scaling
7. Security
- Security model and threat model
- Capability isolation and PII protection
- Secrets management
- Security testing and compliance
8. Sprint Progress
- Phase 0 sprints (0.1-0.7) - Complete
- Phase 1 sprints (1.1-1.3) - In progress
- Sprint completion reports with metrics
9. Project Tracking
- Master TODO with all 7 phases
- Roadmap and phase details
- Current status and checklists
10. Reference
- Configuration reference
- Glossary and diagrams
- Documentation summary
Quick Links
For New Users
- Getting Started - Setup and installation
- Core Concepts - Understanding the architecture
- Quickstart Guide - Run your first task
For Developers
- Development Environment - Python/Rust setup
- Testing Guide - Unit/integration tests
- Custom Arms - Build new specialized modules
- Contributing - How to contribute
For Operators
- Docker Compose Setup - Local deployment
- Kubernetes Deployment - Production deployment
- Monitoring Runbook - Operations guide
- Troubleshooting Playbooks - Common issues
For Security Engineers
- Security Overview - Security architecture
- Threat Model - Attack vectors and mitigations
- Security Testing - Security test suite
Key Metrics
| Metric | Target | Current Status |
|---|---|---|
| Task Success Rate | >95% vs baseline | Not yet measured (Phase 1.3+) |
| P99 Latency | <30s critical tasks | Reflex: <8ms ✅, Orchestrator: <100ms ✅ |
| Cost per Task | <50% monolithic LLM | Not yet measured |
| Reflex Cache Hit Rate | >60% over time | Not yet measured |
| PII Leakage Rate | <0.1% outputs | Not yet measured |
| Test Coverage | >85% | Reflex: 90%+ ✅, Orchestrator: 85%+ ✅ |
Repository
- GitHub: github.com/doublegate/OctoLLM
- Documentation: doublegate.github.io/OctoLLM
Navigation
Use the sidebar to explore the documentation. All pages include:
- Links to source code in the repository
- Related documentation pages
- API references where applicable
- Version information
Need help? Check the Troubleshooting Playbooks or review the FAQ section.
Want to contribute? See the Contributing Guide.