ADR-001: Technology Stack Selection

Status: Accepted
Date: 2025-11-10
Decision Makers: Architecture Team, Engineering Leads
Consulted: Development Team, DevOps Team

Context

OctoLLM requires a technology stack that supports:

High-performance request processing (>10,000 req/s for Reflex Layer)
Async I/O for LLM API calls and database operations
Vector similarity search for episodic memory
Reliable data storage with ACID guarantees
Fast caching for frequently accessed data
Multiple specialized components (orchestrator, arms, reflex layer)
Cloud-native deployment (Kubernetes)
Developer productivity and maintainability

The system has diverse performance requirements:

Reflex Layer: <10ms P95 latency, >10,000 req/s throughput
Orchestrator: Complex routing logic, multiple concurrent operations
Arms: LLM integration, specialized processing
Memory: Vector search, relational queries, caching
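
To make the P95 target concrete, the sketch below computes a nearest-rank percentile over a batch of latency samples and checks it against the 10 ms budget. The sample values and the `percentile` helper are illustrative assumptions, not measurements from OctoLLM.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latency samples in milliseconds (illustrative only)
latencies_ms = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 9.2, 3.3, 2.9, 14.7,
                3.0, 2.7, 3.8, 2.6, 3.5, 4.4, 2.4, 3.2, 5.1, 2.3]

p95 = percentile(latencies_ms, 95)
meets_target = p95 < 10.0  # the Reflex Layer budget stated above
```

With these samples the single 14.7 ms outlier does not break the target, because P95 tolerates the slowest 5% of requests.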

Decision

We will use the following technology stack:

Core Languages

Python 3.11+ (Primary)

Used for: Orchestrator, all Arms, API services
Framework: FastAPI for HTTP APIs
Async: asyncio for concurrent operations
Reasons:
- Excellent LLM ecosystem (OpenAI, Anthropic SDKs)
- Strong async support with asyncio/FastAPI
- Rich data processing libraries
- High developer productivity
- Large talent pool
- Extensive testing frameworks
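
As a rough illustration of why asyncio suits I/O-bound LLM calls, the sketch below fans out several requests concurrently with `asyncio.gather`. The `call_llm` coroutine is a hypothetical stand-in that sleeps instead of calling a real provider SDK.

```python
import asyncio

async def call_llm(prompt: str, delay: float) -> str:
    """Hypothetical stand-in for an LLM API call; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return f"response to {prompt!r}"

async def fan_out(prompts: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(call_llm(p, 0.05) for p in prompts))

results = asyncio.run(fan_out(["plan", "code", "review"]))
```

The three calls overlap, so the total wait is close to one call's delay rather than the sum of all three.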

Rust 1.75+ (Performance-Critical)

Used for: Reflex Layer, Tool Executor
Framework: Axum for HTTP
Reasons:
- Zero-cost abstractions for performance
- Memory safety without garbage collection
- Excellent async runtime (tokio)
- Pattern matching for PII detection
- Minimal runtime (no interpreter or VM)
- Strong type system prevents bugs

Databases

PostgreSQL 15+ (Primary Data Store)

Used for: Global knowledge graph, task history, provenance
Reasons:
- ACID guarantees for critical data
- JSONB for flexible schemas
- Full-text search with GIN indexes
- Excellent performance for relational queries
- Mature replication and backup tools
- Strong community support
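
The atomicity guarantee can be sketched as follows. The example uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. The table names and the `record_task` helper are assumptions for illustration, not the OctoLLM schema.

```python
import sqlite3

# sqlite3 is a stand-in for PostgreSQL here; the schema is hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_history (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE TABLE provenance (task_id INTEGER NOT NULL, source TEXT NOT NULL)")
conn.commit()

def record_task(conn, task_id, status, source):
    """Write the task row and its provenance row together, or neither."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO task_history (id, status) VALUES (?, ?)", (task_id, status))
        conn.execute("INSERT INTO provenance (task_id, source) VALUES (?, ?)", (task_id, source))

record_task(conn, 1, "done", "planner-arm")
try:
    record_task(conn, 2, "done", None)  # violates NOT NULL on provenance.source
except sqlite3.IntegrityError:
    pass  # the task_history insert for id 2 was rolled back as well

task_count = conn.execute("SELECT COUNT(*) FROM task_history").fetchone()[0]
provenance_count = conn.execute("SELECT COUNT(*) FROM provenance").fetchone()[0]
```

Only the first task is persisted; the failed second write leaves no orphaned history row.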

Qdrant 1.7+ (Vector Database)

Used for: Episodic memory (code examples, patterns)
Reasons:
- Optimized for similarity search
- Built in Rust (high performance)
- Filtering support for hybrid search
- Supports multiple distance metrics
- Good Python SDK
- Active development
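
Filtered similarity search can be illustrated with a brute-force sketch: restrict candidates by a payload field, then rank by cosine similarity. The records, the `lang` field, and the `search` helper are hypothetical. This is not the Qdrant API, and a real vector database uses an approximate index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical episodic-memory records (illustrative only)
memories = [
    {"id": "m1", "lang": "python", "vector": [1.0, 0.0, 0.0]},
    {"id": "m2", "lang": "rust", "vector": [0.9, 0.1, 0.0]},
    {"id": "m3", "lang": "python", "vector": [0.0, 1.0, 0.0]},
    {"id": "m4", "lang": "python", "vector": [0.7, 0.7, 0.0]},
]

def search(query, lang, top_k=2):
    # Filter on the payload field first, then rank the survivors by similarity
    candidates = [m for m in memories if m["lang"] == lang]
    ranked = sorted(candidates,
                    key=lambda m: cosine_similarity(query, m["vector"]),
                    reverse=True)
    return [m["id"] for m in ranked[:top_k]]

top_ids = search([1.0, 0.2, 0.0], lang="python")
```

Record `m2` is the closest vector overall, but the `lang` filter excludes it, which is the behavior hybrid search relies on.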

Redis 7+ (Cache & Pub/Sub)

Used for: L2 cache, rate limiting, session state, events
Reasons:
- In-memory performance (<1ms latency)
- Rich data structures (strings, hashes, sets, sorted sets)
- Pub/sub for event messaging
- TTL support for automatic expiration
- Persistence options (AOF, RDB)
- Cluster mode for scale
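
The TTL behavior relied on for caching and session state can be sketched with an in-process stand-in. `TTLCache` and `FakeClock` are hypothetical helpers written for this example. They are not a Redis client, and they expire keys lazily on read only.

```python
import time

class TTLCache:
    """In-process stand-in for a cache with per-key expiry (not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazy expiry: drop the key when it is read after its deadline
            return None
        return value

class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now

clock = FakeClock()
cache = TTLCache(clock=clock)
cache.set("session:42", {"user": "alice"}, ttl_seconds=30)
hit = cache.get("session:42")   # still within the 30 s TTL
clock.now = 31.0
miss = cache.get("session:42")  # expired, so the lookup returns None
```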

Web Framework

FastAPI (Python)

Reasons:
- Built on Starlette (async ASGI)
- Automatic OpenAPI documentation
- Pydantic integration for validation
- Excellent async support
- Dependency injection
- WebSocket support
- Strong type hints

Axum (Rust)

Reasons:
- Built on tokio (async runtime)
- Type-safe routing
- Minimal overhead
- Good ecosystem integration
- Composable middleware

Async Runtime

Python: asyncio + uvicorn

ASGI server with excellent performance
Integrates with FastAPI
Multiple worker processes for CPU utilization

Rust: tokio

Industry-standard async runtime
Work-stealing scheduler
Efficient I/O operations

Deployment

Docker + Docker Compose

Development: Easy local setup
Production: Standardized containers
CI/CD: Consistent builds

Kubernetes

Production orchestration
Auto-scaling with HPA
Rolling updates
Service discovery
Health checks
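
For intuition on HPA auto-scaling, the sketch below applies the documented core formula, desired = ceil(current replicas × current metric / target metric), and clamps the result to the configured bounds. The function name, default bounds, and metric values are assumptions. The real controller also applies a tolerance band and stabilization windows, which are omitted here.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Core HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))

# Hypothetical CPU utilization figures (percent) against a 60% target
scaled_up = desired_replicas(4, current_metric=90, target_metric=60)
scaled_down = desired_replicas(4, current_metric=20, target_metric=60)
capped = desired_replicas(4, current_metric=300, target_metric=60)
```

With four replicas at 90% utilization the rule asks for six. At 300% the raw answer of twenty is capped at the maximum of ten.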

Supporting Tools

Monitoring:

Prometheus: Metrics collection
Grafana: Visualization
Alertmanager: Alert routing
Loki: Log aggregation (optional)
Jaeger: Distributed tracing (optional)

Development:

Poetry: Python dependency management
Cargo: Rust build tool
Black/isort/ruff: Python formatting/linting
rustfmt/clippy: Rust formatting/linting
pre-commit: Git hooks
pytest: Python testing
cargo test: Rust testing

Consequences

Positive

Performance:
- Rust makes the <10ms P95 latency target achievable for the Reflex Layer
- Async Python handles thousands of concurrent operations
- Redis provides sub-millisecond caching
- Qdrant is optimized for vector search
Developer Experience:
- Python enables rapid development
- FastAPI auto-generates API docs
- Strong typing catches bugs early
- Extensive libraries available
Scalability:
- Kubernetes enables horizontal scaling
- Stateless services are easy to replicate
- Database clustering supported
- Redis can scale with cluster mode
Maintainability:
- Type hints improve code clarity
- Rust prevents memory bugs
- PostgreSQL ensures data integrity
- Docker standardizes deployments
Ecosystem:
- Rich LLM integration libraries
- Mature database drivers
- Active communities
- Abundant learning resources

Negative

Complexity:
- Two languages to maintain (Python + Rust)
- Different build tools and workflows
- Team needs skills in both languages
- More complex CI/CD pipeline
Learning Curve:
- Rust has a steep learning curve
- Async programming can be challenging
- Kubernetes requires operations expertise
- Multiple databases to manage
Resource Usage:
- Three databases increase infrastructure cost
- Kubernetes overhead for small deployments
- Development environment is heavyweight
- Local testing requires significant resources
Operational Overhead:
- More components to monitor
- More failure modes
- Complex troubleshooting
- Data consistency across databases

Mitigation Strategies

Language Complexity:
- Keep Rust components minimal (Reflex, Executor only)
- Provide Python fallbacks where feasible
- Comprehensive documentation
- Code review focus on readability
Learning Curve:
- Training programs for team
- Pair programming for knowledge sharing
- Start contributors with Python
- Document common patterns
Resource Usage:
- Provide lightweight dev mode (Docker Compose)
- Use resource limits in Kubernetes
- Optimize container images
- Implement efficient caching
Operational Complexity:
- Comprehensive monitoring and alerting
- Automated deployment pipelines
- Disaster recovery procedures
- Regular operational training

Alternatives Considered

1. Go for Performance-Critical Components

Pros:

Good performance (better than Python)
Simpler than Rust
Excellent concurrency model
Single binary deployment

Cons:

Typically slower than Rust, leaving little headroom under the <10ms P95 requirement
Garbage collection introduces latency variance
Weaker type system than Rust
Weaker memory-safety guarantees (data races are not ruled out at compile time)

Why Rejected: Rust provides better latency guarantees and memory safety for our <10ms P95 requirement.

2. Node.js/TypeScript for All Services

Pros:

Single language across stack
Good async support
Large ecosystem
Fast development

Cons:

Not ideal for CPU-intensive tasks
Weaker LLM library support
Memory usage higher than Python
Type system not as strong as Python + mypy

Why Rejected: Python has superior LLM ecosystem and better data processing libraries.

3. Java/Spring Boot

Pros:

Mature enterprise ecosystem
Strong typing
Excellent tooling
Large talent pool

Cons:

Slower development than Python
Higher memory usage
More verbose code
Weaker LLM integration

Why Rejected: Python provides better developer experience and LLM integration.

4. All Python (including performance-critical)

Pros:

Single language
Simpler deployment
Easier team management
Unified tooling

Cons:

Cannot meet <10ms P95 latency consistently
GIL limits true parallelism
Higher memory usage
No compile-time safety

Why Rejected: Cannot achieve required performance for Reflex Layer without Rust.

5. MongoDB instead of PostgreSQL

Pros:

Flexible schema
Horizontal scaling built-in
Good for unstructured data

Cons:

Weaker ACID guarantees
No native SQL JOINs (cross-collection joins rely on $lookup in the aggregation pipeline)
Transaction model more limited
Less mature tooling

Why Rejected: Need ACID guarantees for critical data and complex relational queries.

6. Elasticsearch instead of Qdrant

Pros:

Mature ecosystem
Excellent full-text search
Powerful aggregations

Cons:

Not optimized for vector search
Higher resource usage
More complex to operate
Slower vector operations

Why Rejected: Qdrant is purpose-built for vector similarity search with better performance.

References

Last Review: 2025-11-10
Next Review: 2026-05-10 (6 months)
Related ADRs: ADR-002, ADR-003, ADR-005
