Core Concept

Extracted from: ref-docs/OctoLLM-Concept_Idea.md

Architectures to Borrow from the Octopus

1. Local-Autonomy "Arms," Central-Integration "Brain"

Spin up task-specific peripheral controllers (code tools, web searchers, planners, UI drivers, data labelers) with narrow policies and short-term memory.
A central integrator (LLM) sets intent, allocates subtasks, imposes constraints, and fuses results—only intervening when goals or safety are at stake.
Mechanism: hierarchical control + explicit contracts (inputs/outputs/invariants). Think: Mixture-of-Experts + Orchestrator rather than a single giant monolith.

2. Reflex Layer Before Cognition

Pre-LLM reflex filters handle fast, predictable decisions (schema validation, PII/safety checks, rate limiting, cache hits) using small models/finite-state machines.
The LLM only engages for "novelty." This reduces latency, cost, and attack surface.

3. Decentralized Memory

Each arm has a local episodic store (vector DB or KV cache) bounded by its domain ontology; the brain has a global semantic map.
Routing: classifier/gating picks which memories to consult.
Prevents cross-domain contamination and keeps retrieval precise.

4. Embodied Tool-Use

Treat tools as sensors/actuators. The arm owns its tools (APIs, shells, browsers), maintains affordances/capabilities metadata, and reports action traces upward.
The brain reasons over traces, not raw environments—like a commander reading squad reports.

5. Elastic Specialization via MoE + Skill Distillation

Train small specialists per domain (planning, SQL, regex, code fixes, UI automation); distill their strengths back into a generalist for robustness while keeping specialists online for hard cases.
Gate by uncertainty/entropy or cost budget.

6. Swarm Deliberation with Quorum

For critical decisions, run N lightweight "arm" proposals (diverse prompts/seeds/models), aggregate with verifiable voters (majority, Borda, or learned ranker).
The brain resolves conflicts using explicit rules (risk thresholds, SLAs).

7. Active Inference for Exploration

Arms maintain simple world models and choose actions that reduce expected uncertainty (information gain) subject to task goals.
Great for web research agents and code-repair loops.

Concrete System Design (Drop-In Blueprint)

Orchestrator (Brain)

One robust LLM with a Task Contract Schema:

goal, constraints, budget (tokens/time/$), security policy, deliverables, acceptance tests.

Arms (Specialists)

Planner: Decomposes tasks → subgoals + acceptance criteria.
Retriever: Structured + vector search with domain ontologies.
Tool-Executor: Browser/API/shell; enforces allowlists; captures provenance.
Coder: Patch proposals + self-tests.
Judge: Spec compliance, hallucination detection, unit/property checks.
Safety/PII Guardian: Static rules + tiny classifier; runs before and after LLM calls.

Memories

Local: Per-arm episodic stores (short retention, domain schema).
Global: Project knowledge graph (entities, tasks, decisions, citations).

Control

Reflex gate → Arm(s) → Orchestrator escalate-on-novelty.
Uncertainty triggers: escalate, fork more arms, or ask for user input (with minimally sufficient questions).

Provenance

Every artifact tagged with tool, prompt hash, data source, time, and tests passed.

Quick-Start Experiments You Can Run This Week

Reflex gate + cache: Put a rules/regex/PII filter + embedding cache in front of your LLM; measure latency/cost drop on your common tickets.
Two-arm prototype: Planner → Tool-Executor (browser or repo) with a Judge. Orchestrator only resolves conflicts.
Specialist MoE: Add a code-fix small model (e.g., 1–3B) gated by a classifier; fall back to the big model on low confidence.
Decentralized memory: Split your RAG into per-domain stores; add a router; watch precision improve and leakage drop.
Quorum for critical ops: Require 3 proposals for risky actions; aggregate; compare error rates.

OctoLLM Documentation

Core Concept

Architectures to Borrow from the Octopus

1. Local-Autonomy "Arms," Central-Integration "Brain"

2. Reflex Layer Before Cognition

3. Decentralized Memory

4. Embodied Tool-Use

5. Elastic Specialization via MoE + Skill Distillation

6. Swarm Deliberation with Quorum

7. Active Inference for Exploration

Concrete System Design (Drop-In Blueprint)

Orchestrator (Brain)

Arms (Specialists)

Memories

Control

Provenance

Quick-Start Experiments You Can Run This Week

See Also

Keyboard shortcuts

OctoLLM Documentation