ArmCapability Schema Reference
Overview
The ArmCapability schema defines how specialized arms register their capabilities with the Orchestrator. This registry enables dynamic task routing, cost-aware scheduling, and capability-based delegation across the OctoLLM system.
Used By: Orchestrator (for arm registry), all Arms (for self-registration)
Primary Endpoint: GET /capabilities
Format: JSON
Structure
ArmCapability
Complete arm registration structure returned by the capabilities endpoint.
interface ArmCapability {
  arm_id: string;               // Required: Unique arm identifier
  name: string;                 // Required: Human-readable name
  description: string;          // Required: Purpose and specialization
  capabilities: string[];       // Required: Capability tags
  cost_tier: number;            // Required: 1-5 (1=cheap, 5=expensive)
  endpoint: string;             // Required: Service URL
  status?: ArmStatus;           // Optional: Current health status
  input_schema?: JSONSchema;    // Optional: Request schema
  output_schema?: JSONSchema;   // Optional: Response schema
  metadata?: ArmMetadata;       // Optional: Additional info
}

type ArmStatus = 'healthy' | 'degraded' | 'unavailable';

interface ArmMetadata {
  version?: string;               // Arm version (e.g., "0.3.0")
  technology?: string;            // Tech stack (e.g., "Python/FastAPI")
  model?: string;                 // LLM model if applicable
  average_latency_ms?: number;    // Typical response time
  max_concurrent_tasks?: number;  // Concurrency limit
  uptime_percentage?: number;     // 30-day uptime (0-100)
}
Field Definitions
arm_id (required)
Type: string
Constraints: Lowercase, alphanumeric with hyphens
Description: Unique identifier used for arm routing and discovery
Valid Arm IDs (current system):
type ArmId =
| 'planner'
| 'executor'
| 'retriever'
| 'coder'
| 'judge'
| 'safety-guardian';
Validation:
function validateArmId(armId: string): boolean {
  const pattern = /^[a-z0-9]+(-[a-z0-9]+)*$/;
  if (!pattern.test(armId)) {
    throw new Error("arm_id must be lowercase alphanumeric with hyphens");
  }
  return true;
}
name (required)
Type: string
Constraints: 3-50 characters
Description: Human-readable display name for the arm
Examples:
"Planner Arm"
"Tool Executor Arm"
"Code Generation Arm"
"Safety Guardian Arm"
description (required)
Type: string
Constraints: 10-200 characters
Description: Concise explanation of the arm's purpose and specialization
Best Practices:
- Start with the primary function
- Mention key specializations
- Keep under 200 characters
Examples:
"Task decomposition and planning specialist"
"Sandboxed command execution specialist with capability-based security"
"Hybrid vector and keyword search over knowledge bases"
"Code generation, debugging, and refactoring using GPT-4"
capabilities (required)
Type: array of strings
Constraints: At least 1 capability tag
Description: Tags describing what the arm can do, used for task routing
Capability Tag Taxonomy
Planning Capabilities:
- task_planning - Task decomposition into subtasks
- goal_decomposition - Breaking down high-level goals
- dependency_resolution - Managing task dependencies
- acceptance_criteria - Defining success conditions
Execution Capabilities:
- shell_execution - Running shell commands
- http_requests - Making HTTP/HTTPS requests
- python_execution - Running Python scripts
- network_scanning - Port scanning and network recon
Knowledge Capabilities:
- vector_search - Semantic similarity search
- keyword_search - Traditional keyword-based search
- rag_retrieval - Retrieval-Augmented Generation
- citation_generation - Creating source citations
Code Capabilities:
- code_generation - Creating new code
- code_debugging - Finding and fixing bugs
- code_refactoring - Improving code structure
- code_analysis - Understanding existing code
- test_generation - Creating unit tests
- code_explanation - Documenting code
Validation Capabilities:
- schema_validation - Validating data structures
- fact_checking - Verifying factual claims
- criteria_validation - Checking acceptance criteria
- hallucination_detection - Identifying LLM hallucinations
- quality_assessment - Evaluating output quality
Safety Capabilities:
- pii_detection - Finding personally identifiable information
- secret_detection - Identifying API keys, passwords, tokens
- content_filtering - Blocking inappropriate content
- input_sanitization - Cleaning user input
- output_redaction - Removing sensitive data
Example Capability Sets:
// Planner Arm
{
"capabilities": [
"task_planning",
"goal_decomposition",
"dependency_resolution",
"acceptance_criteria"
]
}
// Executor Arm
{
"capabilities": [
"shell_execution",
"http_requests",
"python_execution",
"network_scanning"
]
}
// Coder Arm
{
"capabilities": [
"code_generation",
"code_debugging",
"code_refactoring",
"code_analysis",
"test_generation",
"code_explanation"
]
}
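One way to keep tags consistent with the taxonomy is to check them when an arm registers. The sketch below assumes the tag lists in this section are the complete vocabulary; `CAPABILITY_TAXONOMY`, `unknown_tags`, and `categories_for` are hypothetical names used for illustration, not part of the OctoLLM API.

```python
# Taxonomy copied from the tag lists in this section (assumed to be complete).
CAPABILITY_TAXONOMY: dict[str, set[str]] = {
    "planning": {"task_planning", "goal_decomposition",
                 "dependency_resolution", "acceptance_criteria"},
    "execution": {"shell_execution", "http_requests",
                  "python_execution", "network_scanning"},
    "knowledge": {"vector_search", "keyword_search",
                  "rag_retrieval", "citation_generation"},
    "code": {"code_generation", "code_debugging", "code_refactoring",
             "code_analysis", "test_generation", "code_explanation"},
    "validation": {"schema_validation", "fact_checking", "criteria_validation",
                   "hallucination_detection", "quality_assessment"},
    "safety": {"pii_detection", "secret_detection", "content_filtering",
               "input_sanitization", "output_redaction"},
}


def unknown_tags(tags: list[str]) -> list[str]:
    """Return the tags that do not appear anywhere in the taxonomy."""
    known = set().union(*CAPABILITY_TAXONOMY.values())
    return [tag for tag in tags if tag not in known]


def categories_for(tags: list[str]) -> list[str]:
    """Return the sorted taxonomy categories that the given tags touch."""
    wanted = set(tags)
    return sorted(cat for cat, members in CAPABILITY_TAXONOMY.items()
                  if members & wanted)
```

A registration that lists a broad tag such as "coding" would be flagged by `unknown_tags`, which gives the arm author a chance to pick a more specific tag.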
cost_tier (required)
Type: integer
Constraints: 1-5
Description: Relative cost indicator for resource-aware scheduling
Cost Tier Definitions
| Tier | Name | Characteristics | LLM Usage | Typical Cost/Task |
|---|---|---|---|---|
| 1 | Cheap | No LLM calls, pure computation | None | $0.00 |
| 2 | Low | Small model, simple tasks | GPT-3.5-turbo | $0.01-0.05 |
| 3 | Medium | Medium model or sandboxing overhead | GPT-3.5-turbo (complex) | $0.05-0.10 |
| 4 | High | Large model, complex tasks | GPT-4 | $0.10-0.50 |
| 5 | Expensive | Frontier model, multi-step reasoning | GPT-4/Claude Opus | $0.50-2.00 |
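The table can also be read as a rough budget calculator. The sketch below sums the typical per-task ranges for a sequence of arm calls. The function name is hypothetical, and the figures are the table's ballpark values rather than billing data.

```python
# Typical cost per task in USD (low, high), taken from the table above.
TYPICAL_COST_USD: dict[int, tuple[float, float]] = {
    1: (0.00, 0.00),
    2: (0.01, 0.05),
    3: (0.05, 0.10),
    4: (0.10, 0.50),
    5: (0.50, 2.00),
}


def estimate_cost_range(tiers: list[int]) -> tuple[float, float]:
    """Sum the typical low and high cost for a sequence of cost tiers."""
    for tier in tiers:
        if tier not in TYPICAL_COST_USD:
            raise ValueError(f"cost_tier must be 1-5, got {tier}")
    low = sum(TYPICAL_COST_USD[tier][0] for tier in tiers)
    high = sum(TYPICAL_COST_USD[tier][1] for tier in tiers)
    return round(low, 2), round(high, 2)
```

For example, a plan that calls a tier 2 arm, a tier 4 arm, and another tier 2 arm would be expected to cost between roughly $0.12 and $0.60.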
Cost Tier Examples
Tier 1 - Cheap:
{
"arm_id": "reflex-layer",
"cost_tier": 1,
"rationale": "Cache lookups and regex pattern matching only"
}
{
"arm_id": "safety-guardian",
"cost_tier": 1,
"rationale": "Regex-based PII/secret detection without LLM"
}
Tier 2 - Low:
{
"arm_id": "planner",
"cost_tier": 2,
"rationale": "GPT-3.5-turbo for task decomposition (500-2000 tokens)"
}
{
"arm_id": "judge",
"cost_tier": 2,
"rationale": "GPT-3.5-turbo for validation (1000-3000 tokens)"
}
Tier 3 - Medium:
{
"arm_id": "executor",
"cost_tier": 3,
"rationale": "Docker sandboxing overhead, no LLM but resource-intensive"
}
{
"arm_id": "retriever",
"cost_tier": 3,
"rationale": "Vector database queries and embedding generation"
}
Tier 4 - High:
{
"arm_id": "coder",
"cost_tier": 4,
"rationale": "GPT-4 for complex code generation (5000-10000 tokens)"
}
Tier 5 - Expensive:
{
"arm_id": "orchestrator",
"cost_tier": 5,
"rationale": "GPT-4/Claude Opus with multi-step reasoning and synthesis"
}
endpoint (required)
Type: string (URI format)
Description: HTTP(S) URL where the arm service is accessible
Environment-Specific Endpoints:
// Local Development (Docker Compose)
const endpoints = {
planner: "http://planner:8002",
executor: "http://executor:8003",
retriever: "http://retriever:8004",
coder: "http://coder:8005",
judge: "http://judge:8006",
safetyGuardian: "http://safety-guardian:8007"
};
// Kubernetes (Internal)
const k8sEndpoints = {
planner: "http://planner.octollm.svc.cluster.local:8002",
executor: "http://executor.octollm.svc.cluster.local:8003"
};
// Production (External)
const prodEndpoints = {
planner: "https://planner.api.octollm.example.com",
executor: "https://executor.api.octollm.example.com"
};
Validation:
function validateEndpoint(endpoint: string): boolean {
  let url: URL;
  try {
    url = new URL(endpoint);
  } catch (error) {
    // new URL() throws when the string cannot be parsed as a URL
    throw new Error(`Invalid endpoint URL: ${endpoint}`);
  }
  // Checked outside the try block so the catch cannot mask this error
  if (!['http:', 'https:'].includes(url.protocol)) {
    throw new Error("Endpoint must use HTTP or HTTPS protocol");
  }
  return true;
}
status (optional)
Type: enum
Values: 'healthy' | 'degraded' | 'unavailable'
Description: Current operational status of the arm
Status Definitions
healthy - Arm is fully operational
- All endpoints responding normally
- Latency within acceptable range
- Error rate <1%
degraded - Arm is partially operational
- Endpoints responding but slowly
- Latency 2-3x normal
- Error rate 1-5%
- Some features may be disabled
unavailable - Arm is not operational
- Endpoints not responding
- Network connectivity lost
- Service crashed or restarting
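The thresholds above can be expressed as a small classifier. This is a minimal sketch, not the Orchestrator's actual logic: `classify_status` is a hypothetical helper, the error rate is passed as a fraction, and the latency ratio is observed latency divided by the arm's normal latency. The definitions do not say how to treat an arm that still responds with an error rate above 5% or latency above 3x normal, so the sketch assumes such an arm is reported as degraded.

```python
def classify_status(reachable: bool, error_rate: float, latency_ratio: float) -> str:
    """Map basic health measurements onto an ArmStatus value.

    error_rate: fraction of failed requests (0.02 means 2%).
    latency_ratio: observed latency divided by normal latency.
    """
    if not reachable:
        # Endpoints not responding, connectivity lost, or service restarting
        return "unavailable"
    if error_rate >= 0.01 or latency_ratio >= 2.0:
        # Assumption: anything responding but outside the healthy band,
        # including values beyond 5% errors or 3x latency, is degraded.
        return "degraded"
    return "healthy"
```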
Status Checks:
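from typing import Literal

ArmStatus = Literal["healthy", "degraded", "unavailable"]

# Assumes `http_client` is an async HTTP client (for example an httpx.AsyncClient)
# and `logger` is a configured logging.Logger.
async def check_arm_status(arm_endpoint: str) -> ArmStatus:
    """Check arm health and return status."""
    try:
        response = await http_client.get(f"{arm_endpoint}/health", timeout=5)
        if response.status_code == 200:
            latency_ms = response.elapsed.total_seconds() * 1000
            # A successful but slow response counts as degraded
            if latency_ms > 3000:
                return "degraded"
            return "healthy"
        # Any non-200 response: the arm is reachable but impaired
        return "degraded"
    except Exception as e:
        logger.error(f"Arm {arm_endpoint} health check failed: {e}")
        return "unavailable"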
input_schema (optional)
Type: JSON Schema object
Description: Formal schema defining the arm's expected request format
Example - Planner Arm Input:
{
"input_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["goal"],
"properties": {
"goal": {
"type": "string",
"minLength": 10,
"maxLength": 2000
},
"constraints": {
"type": "array",
"items": {"type": "string"}
},
"context": {
"type": "object",
"additionalProperties": true
}
}
}
}
Example - Executor Arm Input:
{
"input_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["action_type", "command", "capability_token"],
"properties": {
"action_type": {
"type": "string",
"enum": ["shell", "http", "python"]
},
"command": {
"type": "string"
},
"args": {
"type": "array",
"items": {"type": "string"}
},
"timeout_seconds": {
"type": "integer",
"minimum": 1,
"maximum": 300,
"default": 30
},
"capability_token": {
"type": "string",
"pattern": "^tok_[a-zA-Z0-9]{16}$"
}
}
}
}
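To show what the Executor input schema enforces, the sketch below hand-checks a request using only the standard library. It is an illustration under stated assumptions: `check_executor_request` is a hypothetical helper, and a real deployment would more likely pass the schema to a JSON Schema validator.

```python
import re

# Values copied from the Executor input schema above.
TOKEN_PATTERN = re.compile(r"^tok_[a-zA-Z0-9]{16}$")
ACTION_TYPES = ("shell", "http", "python")
REQUIRED_FIELDS = ("action_type", "command", "capability_token")


def check_executor_request(request: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the request passes."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in request]

    if "action_type" in request and request["action_type"] not in ACTION_TYPES:
        errors.append("action_type must be one of shell, http, python")

    # The schema declares a default of 30 seconds when the field is omitted.
    timeout = request.get("timeout_seconds", 30)
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 300:
        errors.append("timeout_seconds must be an integer from 1 to 300")

    token = request.get("capability_token")
    if token is not None and (not isinstance(token, str)
                              or not TOKEN_PATTERN.fullmatch(token)):
        errors.append("capability_token must match ^tok_[a-zA-Z0-9]{16}$")
    return errors
```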
output_schema (optional)
Type: JSON Schema object
Description: Formal schema defining the arm's response format
Example - Judge Arm Output:
{
"output_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["valid", "confidence", "issues"],
"properties": {
"valid": {
"type": "boolean"
},
"confidence": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0
},
"issues": {
"type": "array",
"items": {
"type": "object",
"required": ["severity", "type", "message"],
"properties": {
"severity": {
"type": "string",
"enum": ["error", "warning", "info"]
},
"type": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
}
}
}
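A consumer of the Judge output will often want a per-severity tally of the reported issues. The following sketch assumes a response that already conforms to the schema above; `summarize_issues` is a hypothetical helper name.

```python
from collections import Counter

SEVERITIES = ("error", "warning", "info")  # enum values from the schema above


def summarize_issues(judge_output: dict) -> dict[str, int]:
    """Count the issues in a Judge response by severity."""
    counts = Counter(issue["severity"] for issue in judge_output["issues"])
    # Always report all three severities, even when a count is zero.
    return {severity: counts.get(severity, 0) for severity in SEVERITIES}
```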
metadata (optional)
Type: object
Description: Additional metadata about the arm's capabilities and performance
Common Metadata Fields:
- version: Arm version (semantic versioning)
- technology: Tech stack (e.g., "Python 3.11/FastAPI", "Rust 1.75/Axum")
- model: LLM model if applicable (e.g., "gpt-4", "gpt-3.5-turbo")
- average_latency_ms: Typical response time
- max_concurrent_tasks: Maximum parallel task capacity
- uptime_percentage: 30-day uptime (0-100)
Example:
{
"metadata": {
"version": "0.3.0",
"technology": "Python 3.11 / FastAPI 0.104",
"model": "gpt-4",
"average_latency_ms": 8500,
"max_concurrent_tasks": 10,
"uptime_percentage": 99.7
}
}
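Two of these fields combine into a rough capacity figure. The sketch below treats `max_concurrent_tasks` divided by `average_latency_ms` as an upper bound on throughput, assuming every slot stays busy and latency does not grow under load. `estimate_max_throughput_per_minute` is a hypothetical helper, and the result is an estimate, not a measured value.

```python
from typing import Optional


def estimate_max_throughput_per_minute(metadata: dict) -> Optional[float]:
    """Upper-bound tasks per minute from concurrency and average latency.

    Returns None when either metadata field is missing or zero.
    """
    latency_ms = metadata.get("average_latency_ms")
    slots = metadata.get("max_concurrent_tasks")
    if not latency_ms or not slots:
        return None
    tasks_per_second = slots / (latency_ms / 1000)
    return round(tasks_per_second * 60, 1)
```

With the example metadata above (8500 ms average latency, 10 concurrent tasks), the bound works out to about 70.6 tasks per minute.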
Complete Examples
Example 1: Planner Arm
{
"arm_id": "planner",
"name": "Planner Arm",
"description": "Task decomposition and planning specialist",
"capabilities": [
"task_planning",
"goal_decomposition",
"dependency_resolution",
"acceptance_criteria"
],
"cost_tier": 2,
"endpoint": "http://planner:8002",
"status": "healthy",
"input_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["goal"],
"properties": {
"goal": {"type": "string", "minLength": 10, "maxLength": 2000},
"constraints": {"type": "array", "items": {"type": "string"}},
"context": {"type": "object"}
}
},
"output_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["plan_id", "steps"],
"properties": {
"plan_id": {"type": "string"},
"steps": {"type": "array", "items": {"type": "object"}}
}
},
"metadata": {
"version": "0.3.0",
"technology": "Python 3.11 / FastAPI",
"model": "gpt-3.5-turbo",
"average_latency_ms": 2500,
"max_concurrent_tasks": 20,
"uptime_percentage": 99.8
}
}
Example 2: Tool Executor Arm
{
"arm_id": "executor",
"name": "Tool Executor Arm",
"description": "Sandboxed command execution specialist",
"capabilities": [
"shell_execution",
"http_requests",
"python_execution",
"network_scanning"
],
"cost_tier": 3,
"endpoint": "http://executor:8003",
"status": "healthy",
"input_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["action_type", "command", "capability_token"],
"properties": {
"action_type": {"type": "string", "enum": ["shell", "http", "python"]},
"command": {"type": "string"},
"args": {"type": "array", "items": {"type": "string"}},
"timeout_seconds": {"type": "integer", "minimum": 1, "maximum": 300},
"capability_token": {"type": "string"}
}
},
"output_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["success", "provenance"],
"properties": {
"success": {"type": "boolean"},
"stdout": {"type": "string"},
"stderr": {"type": "string"},
"exit_code": {"type": "integer"},
"duration_ms": {"type": "number"},
"provenance": {"type": "object"}
}
},
"metadata": {
"version": "0.3.0",
"technology": "Rust 1.75 / Axum",
"average_latency_ms": 850,
"max_concurrent_tasks": 15,
"uptime_percentage": 99.5
}
}
Example 3: Retriever Arm
{
"arm_id": "retriever",
"name": "Retriever Arm",
"description": "Hybrid vector and keyword search over knowledge bases",
"capabilities": [
"vector_search",
"keyword_search",
"rag_retrieval",
"citation_generation"
],
"cost_tier": 3,
"endpoint": "http://retriever:8004",
"status": "healthy",
"metadata": {
"version": "0.3.0",
"technology": "Python 3.11 / FastAPI + Qdrant",
"average_latency_ms": 1200,
"max_concurrent_tasks": 25,
"uptime_percentage": 99.9
}
}
Example 4: Coder Arm
{
"arm_id": "coder",
"name": "Code Generation Arm",
"description": "Code generation, debugging, and refactoring using GPT-4",
"capabilities": [
"code_generation",
"code_debugging",
"code_refactoring",
"code_analysis",
"test_generation",
"code_explanation"
],
"cost_tier": 4,
"endpoint": "http://coder:8005",
"status": "healthy",
"metadata": {
"version": "0.3.0",
"technology": "Python 3.11 / FastAPI",
"model": "gpt-4",
"average_latency_ms": 8500,
"max_concurrent_tasks": 10,
"uptime_percentage": 99.6
}
}
Example 5: Judge Arm
{
"arm_id": "judge",
"name": "Judge Arm",
"description": "Multi-layer validation of outputs against criteria and facts",
"capabilities": [
"schema_validation",
"fact_checking",
"criteria_validation",
"hallucination_detection",
"quality_assessment"
],
"cost_tier": 2,
"endpoint": "http://judge:8006",
"status": "healthy",
"metadata": {
"version": "0.3.0",
"technology": "Python 3.11 / FastAPI",
"model": "gpt-3.5-turbo",
"average_latency_ms": 3200,
"max_concurrent_tasks": 20,
"uptime_percentage": 99.7
}
}
Example 6: Safety Guardian Arm
{
"arm_id": "safety-guardian",
"name": "Safety Guardian Arm",
"description": "PII detection, secret detection, and content filtering",
"capabilities": [
"pii_detection",
"secret_detection",
"content_filtering",
"input_sanitization",
"output_redaction"
],
"cost_tier": 1,
"endpoint": "http://safety-guardian:8007",
"status": "healthy",
"metadata": {
"version": "0.3.0",
"technology": "Python 3.11 / FastAPI (regex-based, no LLM)",
"average_latency_ms": 75,
"max_concurrent_tasks": 50,
"uptime_percentage": 99.9
}
}
Usage Patterns
Pattern 1: Querying Available Capabilities
Retrieve all registered arms to understand system capabilities.
curl http://orchestrator:8000/capabilities \
-H "Authorization: Bearer $SERVICE_TOKEN"
Response:
{
"arms": [
{
"arm_id": "planner",
"name": "Planner Arm",
"description": "Task decomposition and planning specialist",
"capabilities": ["task_planning", "goal_decomposition"],
"cost_tier": 2,
"endpoint": "http://planner:8002",
"status": "healthy"
},
{
"arm_id": "executor",
"name": "Tool Executor Arm",
"description": "Sandboxed command execution specialist",
"capabilities": ["shell_execution", "http_requests", "python_execution"],
"cost_tier": 3,
"endpoint": "http://executor:8003",
"status": "healthy"
}
]
}
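Once the response is loaded, it is often convenient to invert it into a lookup from capability tag to arm IDs. The sketch below works on the parsed JSON body shown above; `build_capability_index` is a hypothetical helper, not an Orchestrator API.

```python
from collections import defaultdict


def build_capability_index(response: dict) -> dict[str, list[str]]:
    """Map each capability tag to the IDs of the arms that advertise it."""
    index: dict[str, list[str]] = defaultdict(list)
    for arm in response["arms"]:
        for tag in arm["capabilities"]:
            index[tag].append(arm["arm_id"])
    return dict(index)
```

Looking up `index.get("shell_execution", [])` then answers "which arms can run shell commands?" without rescanning the full list.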
Pattern 2: Capability-Based Task Routing
Select the appropriate arm based on required capabilities.
interface TaskRoutingRequest {
requiredCapabilities: string[];
preferLowCost?: boolean;
}
async function routeTask(request: TaskRoutingRequest): Promise<ArmCapability> {
// Fetch all arms
const response = await fetch('http://orchestrator:8000/capabilities', {
headers: { 'Authorization': `Bearer ${serviceToken}` }
});
const { arms } = await response.json();
// Filter arms with all required capabilities
const compatibleArms = arms.filter(arm =>
request.requiredCapabilities.every(cap =>
arm.capabilities.includes(cap)
)
);
if (compatibleArms.length === 0) {
throw new Error(`No arm found with capabilities: ${request.requiredCapabilities}`);
}
// Sort by cost tier if preferLowCost is true
if (request.preferLowCost) {
compatibleArms.sort((a, b) => a.cost_tier - b.cost_tier);
}
// Return first healthy arm
const healthyArm = compatibleArms.find(arm => arm.status === 'healthy');
if (!healthyArm) {
throw new Error('No healthy arms available');
}
return healthyArm;
}
// Example usage
const arm = await routeTask({
requiredCapabilities: ['code_generation', 'test_generation'],
preferLowCost: false
});
console.log(`Routing to: ${arm.name} (cost tier ${arm.cost_tier})`);
// Output: "Routing to: Code Generation Arm (cost tier 4)"
Pattern 3: Cost-Aware Scheduling
Choose the cheapest arm that meets requirements.
from typing import List, Optional

# Assumes `http_client` (async HTTP client) and `service_token` are defined elsewhere.
async def schedule_task_cost_aware(
    required_capabilities: List[str],
    max_cost_tier: int = 5
) -> Optional[ArmCapability]:
    """Schedule task to cheapest compatible arm."""
    response = await http_client.get(
        "http://orchestrator:8000/capabilities",
        headers={"Authorization": f"Bearer {service_token}"}
    )
    arms = response.json()["arms"]

    # Filter by capabilities, cost tier, and health.
    # status is optional in the schema, so use .get() to avoid a KeyError.
    compatible = [
        arm for arm in arms
        if all(cap in arm["capabilities"] for cap in required_capabilities)
        and arm["cost_tier"] <= max_cost_tier
        and arm.get("status") == "healthy"
    ]
    if not compatible:
        return None

    # Sort by cost tier (ascending)
    compatible.sort(key=lambda a: a["cost_tier"])
    cheapest_arm = compatible[0]
    print(f"Scheduled to {cheapest_arm['name']} (tier {cheapest_arm['cost_tier']})")
    return cheapest_arm

# Example usage
arm = await schedule_task_cost_aware(
    required_capabilities=["pii_detection", "secret_detection"],
    max_cost_tier=3
)
# Output: "Scheduled to Safety Guardian Arm (tier 1)"
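When several compatible arms share a cost tier, the scheduler needs a tie-breaker. One option, sketched below, is to prefer the arm with the lower `average_latency_ms` from its metadata. This is an assumption about a reasonable policy rather than documented Orchestrator behavior, and `pick_cheapest_then_fastest` is a hypothetical name.

```python
from typing import Optional


def pick_cheapest_then_fastest(arms: list[dict], required: list[str]) -> Optional[dict]:
    """Pick the healthy compatible arm with the lowest cost tier, then lowest latency."""
    compatible = [
        arm for arm in arms
        if all(cap in arm["capabilities"] for cap in required)
        and arm.get("status") == "healthy"
    ]
    return min(
        compatible,
        key=lambda arm: (
            arm["cost_tier"],
            # Arms without latency metadata sort last within their tier.
            arm.get("metadata", {}).get("average_latency_ms", float("inf")),
        ),
        default=None,
    )
```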
Pattern 4: Health Monitoring
Continuously monitor arm health and adjust routing.
class ArmHealthMonitor {
private arms: Map<string, ArmCapability> = new Map();
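private healthCheckInterval = 30000; // 30 seconds
// Assumption: the token is read from the SERVICE_TOKEN environment variable (Node.js)
private serviceToken = process.env.SERVICE_TOKEN ?? '';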
async start() {
setInterval(() => this.refreshCapabilities(), this.healthCheckInterval);
await this.refreshCapabilities();
}
async refreshCapabilities() {
const response = await fetch('http://orchestrator:8000/capabilities', {
headers: { 'Authorization': `Bearer ${this.serviceToken}` }
});
const { arms } = await response.json();
for (const arm of arms) {
// Log status changes (read the previous entry before overwriting it)
const previous = this.arms.get(arm.arm_id);
if (previous && previous.status !== arm.status) {
console.warn(`Arm ${arm.name} status changed: ${previous.status} → ${arm.status}`);
}
this.arms.set(arm.arm_id, arm);
}
}
getHealthyArms(capability: string): ArmCapability[] {
return Array.from(this.arms.values()).filter(
arm => arm.capabilities.includes(capability) && arm.status === 'healthy'
);
}
getCheapestHealthyArm(capability: string): ArmCapability | null {
const healthyArms = this.getHealthyArms(capability);
if (healthyArms.length === 0) return null;
return healthyArms.reduce((cheapest, arm) =>
arm.cost_tier < cheapest.cost_tier ? arm : cheapest
);
}
}
// Example usage
const monitor = new ArmHealthMonitor();
await monitor.start();
const arm = monitor.getCheapestHealthyArm('code_generation');
if (arm) {
console.log(`Using ${arm.name} (${arm.status})`);
} else {
console.error('No healthy arms available for code generation');
}
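The status-change detection at the heart of this pattern can be isolated into a small function. The Python sketch below compares each poll against the last known statuses and must read the old value before storing the new one. `record_status_changes` is a hypothetical helper, and treating a missing status field as "unknown" is an assumption.

```python
def record_status_changes(
    known: dict[str, str], arms: list[dict]
) -> list[tuple[str, str, str]]:
    """Update `known` in place and return (arm_id, old_status, new_status) changes."""
    changes = []
    for arm in arms:
        arm_id = arm["arm_id"]
        # status is optional in the schema; "unknown" is an assumed placeholder
        new_status = arm.get("status", "unknown")
        old_status = known.get(arm_id)  # read before overwriting
        if old_status is not None and old_status != new_status:
            changes.append((arm_id, old_status, new_status))
        known[arm_id] = new_status
    return changes
```

Arms seen for the first time are recorded without being reported as a change, so the first poll produces an empty list.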
Best Practices
1. Always Check Arm Status Before Routing
Why: Prevents routing to unhealthy arms
How: Filter by status: 'healthy' before delegation
const healthyArms = arms.filter(arm => arm.status === 'healthy');
2. Use Cost Tiers for Budget Control
Why: Prevents runaway costs on simple tasks
How: Set max_cost_tier constraints
# Use cheap arms (tier 1-2) for simple validation
arm = await schedule_task_cost_aware(required_capabilities=["pii_detection"], max_cost_tier=2)
# Allow expensive arms (tier 4-5) for complex reasoning
arm = await schedule_task_cost_aware(required_capabilities=["code_generation"], max_cost_tier=5)
3. Capability Tags Should Be Granular
Why: Enables precise routing and prevents over-delegation
How: Use specific capability tags
Bad (too broad):
{"capabilities": ["coding"]}
Good (granular):
{
"capabilities": [
"code_generation",
"code_debugging",
"code_refactoring",
"test_generation"
]
}
4. Monitor Arm Health Continuously
Why: Enables graceful degradation and failover
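How: Poll the /capabilities endpoint every 30-60 seconds
import asyncio

# Assumes get_capabilities() wraps GET /capabilities and `logger` is a configured logger.
async def monitor_arms():
    while True:
        response = await get_capabilities()
        for arm in response["arms"]:
            # status is optional, so a missing value is also reported
            if arm.get("status") != "healthy":
                logger.warning(f"Arm {arm['name']} is {arm.get('status')}")
        await asyncio.sleep(30)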
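Graceful degradation can be sketched as a selection rule that prefers healthy arms but falls back to a degraded one when nothing healthy is left. This goes beyond the strict "healthy only" filter in practice 1 and is an assumption about one possible failover policy. `select_with_failover` is a hypothetical name.

```python
from typing import Optional

# Lower rank is preferred; unavailable arms are never selected.
STATUS_RANK = {"healthy": 0, "degraded": 1}


def select_with_failover(arms: list[dict], capability: str) -> Optional[dict]:
    """Prefer healthy arms, fall back to degraded ones, cheapest first within each."""
    candidates = [
        arm for arm in arms
        if capability in arm["capabilities"] and arm.get("status") in STATUS_RANK
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda arm: (STATUS_RANK[arm["status"]], arm["cost_tier"]))
```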
Related Documentation
- Orchestrator API Reference
- TaskContract Schema
- Arm Registration Guide (coming soon)
- Cost Optimization Guide (coming soon)
JSON Schema
Complete JSON Schema for validation:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "ArmCapability",
"type": "object",
"required": ["arm_id", "name", "description", "capabilities", "cost_tier", "endpoint"],
"properties": {
"arm_id": {
"type": "string",
"pattern": "^[a-z0-9]+(-[a-z0-9]+)*$",
"description": "Unique arm identifier (lowercase alphanumeric with hyphens)"
},
"name": {
"type": "string",
"minLength": 3,
"maxLength": 50,
"description": "Human-readable arm name"
},
"description": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Arm purpose and specialization"
},
"capabilities": {
"type": "array",
"items": {"type": "string"},
"minItems": 1,
"description": "List of capability tags"
},
"cost_tier": {
"type": "integer",
"minimum": 1,
"maximum": 5,
"description": "Cost tier (1=cheap, 5=expensive)"
},
"endpoint": {
"type": "string",
"format": "uri",
"description": "Arm service endpoint URL"
},
"status": {
"type": "string",
"enum": ["healthy", "degraded", "unavailable"],
"description": "Current operational status"
},
"input_schema": {
"type": "object",
"description": "JSON Schema for arm input validation"
},
"output_schema": {
"type": "object",
"description": "JSON Schema for arm output validation"
},
"metadata": {
"type": "object",
"properties": {
"version": {"type": "string"},
"technology": {"type": "string"},
"model": {"type": "string"},
"average_latency_ms": {"type": "number"},
"max_concurrent_tasks": {"type": "integer"},
"uptime_percentage": {"type": "number", "minimum": 0, "maximum": 100}
}
}
}
}
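For readers who want to see the constraints applied without a schema library, the sketch below mirrors the required fields and limits of the schema using only the Python standard library. It is an illustration, not the Orchestrator's validator: `validate_arm_capability` is a hypothetical name, nested schemas and metadata are not checked, and the HTTP(S) restriction comes from the endpoint field definition rather than from the `uri` format itself.

```python
import re
from urllib.parse import urlparse

ARM_ID_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
REQUIRED_FIELDS = ("arm_id", "name", "description", "capabilities", "cost_tier", "endpoint")
STATUSES = ("healthy", "degraded", "unavailable")


def validate_arm_capability(arm: dict) -> list[str]:
    """Return a list of problems; an empty list means the registration looks valid."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in arm]
    if errors:
        return errors

    arm_id, name, description = arm["arm_id"], arm["name"], arm["description"]
    if not isinstance(arm_id, str) or not ARM_ID_PATTERN.fullmatch(arm_id):
        errors.append("arm_id must be lowercase alphanumeric with hyphens")
    if not isinstance(name, str) or not 3 <= len(name) <= 50:
        errors.append("name must be 3-50 characters")
    if not isinstance(description, str) or not 10 <= len(description) <= 200:
        errors.append("description must be 10-200 characters")

    caps = arm["capabilities"]
    if not isinstance(caps, list) or not caps or not all(isinstance(c, str) for c in caps):
        errors.append("capabilities must be a non-empty list of strings")

    tier = arm["cost_tier"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(tier, bool) or not isinstance(tier, int) or not 1 <= tier <= 5:
        errors.append("cost_tier must be an integer from 1 to 5")

    endpoint = arm["endpoint"]
    parsed = urlparse(endpoint) if isinstance(endpoint, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.hostname:
        errors.append("endpoint must be an HTTP(S) URL")

    if "status" in arm and arm["status"] not in STATUSES:
        errors.append("status must be healthy, degraded, or unavailable")
    return errors
```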