SimpleAgents

Architecture Overview

This document describes the architecture and design decisions behind SimpleAgents.

Table of Contents

  1. Design Philosophy
  2. System Architecture
  3. Core Abstractions
  4. Data Flow
  5. Provider System
  6. Caching Layer
  7. Error Handling
  8. Security Model
  9. Performance Optimizations
  10. Design Decisions
  11. Future Architecture

Design Philosophy

SimpleAgents is built on these core principles:

1. Type Safety First: unified, strongly typed request, response, and message types shared across all providers.

2. Zero-Cost Abstractions: borrowed data, Cow-based static strings, and monomorphized generics keep abstraction overhead out of the hot path.

3. Security by Default: input validation and constant-time API key comparison out of the box.

4. Extensibility: providers and caches plug in through small traits, so users can add their own.

5. Developer Experience: structured errors with context and a small, predictable API surface.

System Architecture

┌─────────────────────────────────────────────────────────┐
│                      Application                         │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│                  simple-agents-types                     │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐ │
│  │   Request   │  │   Response   │  │    Message     │ │
│  └─────────────┘  └──────────────┘  └────────────────┘ │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐ │
│  │  Provider   │  │    Cache     │  │   Validation   │ │
│  │   (trait)   │  │   (trait)    │  │    (ApiKey)    │ │
│  └─────────────┘  └──────────────┘  └────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
┌───────────────────────┐   ┌──────────────────────┐
│ simple-agents-        │   │ simple-agents-cache  │
│ providers             │   │                      │
│ ┌─────────────────┐   │   │ ┌────────────────┐  │
│ │  OpenAI         │   │   │ │  InMemory      │  │
│ │  Provider       │   │   │ │  (LRU + TTL)   │  │
│ └─────────────────┘   │   │ └────────────────┘  │
│ ┌─────────────────┐   │   │ ┌────────────────┐  │
│ │  Anthropic      │   │   │ │  NoOp          │  │
│ │  Provider       │   │   │ │  (testing)     │  │
│ └─────────────────┘   │   │ └────────────────┘  │
│ ┌─────────────────┐   │   └──────────────────────┘
│ │  Retry Logic    │   │
│ └─────────────────┘   │
└───────────────────────┘
            │
            ▼
    ┌──────────────┐
    │  HTTP/2      │
    │  Connection  │
    │  Pool        │
    └──────────────┘
            │
            ▼
    ┌──────────────┐
    │  LLM API     │
    │  (OpenAI,    │
    │   Anthropic) │
    └──────────────┘

Core Abstractions

Provider Trait

The Provider trait defines a three-phase architecture for LLM interactions:

use async_trait::async_trait;

#[async_trait]
pub trait Provider: Send + Sync {
    // Phase 1: Transform unified request to provider format
    fn transform_request(&self, req: &CompletionRequest)
        -> Result<ProviderRequest>;

    // Phase 2: Execute HTTP request
    async fn execute(&self, req: ProviderRequest)
        -> Result<ProviderResponse>;

    // Phase 3: Transform provider response to unified format
    fn transform_response(&self, resp: ProviderResponse)
        -> Result<CompletionResponse>;
}
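
A sketch of how the three phases compose (complete here is an illustrative helper, not part of the trait):

async fn complete<P: Provider>(
    provider: &P,
    request: &CompletionRequest,
) -> Result<CompletionResponse> {
    // Phase 1: pure transformation, no I/O
    let provider_req = provider.transform_request(request)?;
    // Phase 2: the only phase with side effects
    let provider_resp = provider.execute(provider_req).await?;
    // Phase 3: pure transformation back to the unified type
    provider.transform_response(provider_resp)
}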

Benefits:

  • The two transform phases are pure, so they unit-test without any network I/O
  • Retry and caching logic wraps only the execute phase
  • A failure is easy to attribute to a single phase

Request/Response Types

Unified Types (application-facing):

  • CompletionRequest - what the application builds and sends
  • CompletionResponse - what the application gets back
  • Message - the conversation messages carried in a request

Provider Types (provider-facing):

  • ProviderRequest - a provider's wire-format request (headers, body)
  • ProviderResponse - the raw provider reply before normalization

This separation allows:

  • Application code to stay provider-agnostic
  • Each provider to evolve its wire format independently
  • Both transforms to be tested in isolation

Cache Trait

Simple async trait for caching:

use async_trait::async_trait;
use std::time::Duration;

#[async_trait]
pub trait Cache: Send + Sync {
    async fn get(&self, key: &str) -> Result<Option<Vec<u8>>>;
    async fn set(&self, key: &str, value: Vec<u8>, ttl: Duration) -> Result<()>;
    async fn delete(&self, key: &str) -> Result<()>;
    async fn clear(&self) -> Result<()>;
}

Key Features:

  • Fully async API (via async_trait)
  • Values are opaque bytes (Vec<u8>), so any serializable response can be stored
  • Per-entry TTL supplied at set time
  • Send + Sync, so one cache instance can be shared across tasks
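
As a concrete example, the NoOp cache used for testing can be written in a few lines (an illustrative sketch, not necessarily the crate's exact code):

use async_trait::async_trait;
use std::time::Duration;

pub struct NoOpCache;

#[async_trait]
impl Cache for NoOpCache {
    async fn get(&self, _key: &str) -> Result<Option<Vec<u8>>> {
        Ok(None) // always a miss, so every request goes to the provider
    }
    async fn set(&self, _key: &str, _value: Vec<u8>, _ttl: Duration) -> Result<()> {
        Ok(()) // silently drop the value
    }
    async fn delete(&self, _key: &str) -> Result<()> {
        Ok(())
    }
    async fn clear(&self) -> Result<()> {
        Ok(())
    }
}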

Data Flow

Typical Request Flow

1. Application creates CompletionRequest
   ↓
2. transform_request() → ProviderRequest
   ↓
3. execute() → HTTP call → ProviderResponse
   ↓
4. transform_response() → CompletionResponse
   ↓
5. Application uses response

With Caching

1. Application creates CompletionRequest
   ↓
2. Generate cache key from request
   ↓
3. Check cache.get(key)
   ├─ Hit → Return cached response
   └─ Miss ↓
4. transform_request() → ProviderRequest
   ↓
5. execute() → HTTP call → ProviderResponse
   ↓
6. transform_response() → CompletionResponse
   ↓
7. cache.set(key, response, ttl)
   ↓
8. Return response to application
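
In code, the cached path looks roughly like this (cache_key, serialize_response, and deserialize_response are assumed helpers; the crate's actual wiring may differ):

use std::time::Duration;

async fn complete_cached<P: Provider, C: Cache>(
    provider: &P,
    cache: &C,
    request: &CompletionRequest,
    ttl: Duration,
) -> Result<CompletionResponse> {
    let key = cache_key(request); // blake3-based, see Cache Key Generation
    // Step 3: check the cache first
    if let Some(bytes) = cache.get(&key).await? {
        return deserialize_response(&bytes); // hit: skip the provider entirely
    }
    // Steps 4-6: miss, run the three provider phases
    let provider_req = provider.transform_request(request)?;
    let provider_resp = provider.execute(provider_req).await?;
    let response = provider.transform_response(provider_resp)?;
    // Step 7: store for next time
    cache.set(&key, serialize_response(&response)?, ttl).await?;
    Ok(response)
}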

With Retry Logic

1. Application creates CompletionRequest
   ↓
2. transform_request() → ProviderRequest
   ↓
3. execute_with_retry()
   ├─ Attempt 1 → Fail (retryable error)
   ├─ Backoff (exponential + jitter)
   ├─ Attempt 2 → Fail (retryable error)
   ├─ Backoff (exponential + jitter)
   └─ Attempt 3 → Success ↓
4. transform_response() → CompletionResponse
   ↓
5. Application uses response
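
A sketch of what execute_with_retry can look like (assuming a tokio runtime, the rand crate for jitter, and ProviderRequest: Clone; the base delay is illustrative, and is_retryable is sketched under Retryable Errors below):

use rand::Rng;
use std::time::Duration;

async fn execute_with_retry<P: Provider>(
    provider: &P,
    req: ProviderRequest,
    max_attempts: u32,
) -> Result<ProviderResponse> {
    let mut backoff = Duration::from_millis(100); // illustrative base delay
    let mut attempt = 1;
    loop {
        match provider.execute(req.clone()).await {
            Ok(resp) => return Ok(resp),
            Err(err) if attempt < max_attempts && is_retryable(&err) => {
                // Full jitter: sleep a random duration up to the current backoff
                let ms = rand::thread_rng().gen_range(0..=backoff.as_millis() as u64);
                tokio::time::sleep(Duration::from_millis(ms)).await;
                backoff *= 2; // double the ceiling each attempt
                attempt += 1;
            }
            Err(err) => return Err(err), // non-retryable, or out of attempts
        }
    }
}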

Provider System

OpenAI Provider

Request Transformation:

CompletionRequest → OpenAICompletionRequest → JSON

Key Features:

  • Maps unified messages and parameters onto OpenAI's chat completions format
  • Sends requests through the shared HTTP/2 connection pool
  • Participates in the retry logic for retryable failures

Error Handling:

  • HTTP status codes and OpenAI error payloads map onto ProviderError variants (Authentication, RateLimit, ModelNotFound, ContextLengthExceeded, ...)
  • Rate-limit responses carry retry_after so backoff can honor the server's hint

Adding New Providers

New providers implement:

  1. Request/response models
  2. Error types and mapping
  3. Provider trait implementation
  4. Tests for all three phases

Example Structure:

providers/
└── myprovider/
    ├── mod.rs        # Provider implementation
    ├── models.rs     # Request/response types
    └── error.rs      # Error mapping
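
A minimal skeleton for the mod.rs of a hypothetical provider (all names illustrative):

use async_trait::async_trait;

pub struct MyProvider {
    api_key: ApiKey,
    base_url: String,
}

#[async_trait]
impl Provider for MyProvider {
    fn transform_request(&self, req: &CompletionRequest) -> Result<ProviderRequest> {
        // Map unified messages/params into this provider's wire format
        todo!("build ProviderRequest from the types in models.rs")
    }

    async fn execute(&self, req: ProviderRequest) -> Result<ProviderResponse> {
        // Perform the HTTP call; map transport failures via error.rs
        todo!("send the request over the shared HTTP client")
    }

    fn transform_response(&self, resp: ProviderResponse) -> Result<CompletionResponse> {
        // Map the wire format back into the unified response
        todo!("parse ProviderResponse into CompletionResponse")
    }
}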

Caching Layer

InMemoryCache

Architecture:

┌─────────────────────────────────────┐
│         InMemoryCache               │
│                                     │
│  ┌───────────────────────────────┐ │
│  │  Arc<RwLock<HashMap>>         │ │
│  │                               │ │
│  │  CacheEntry {                 │ │
│  │    data: Vec<u8>,             │ │
│  │    expires_at: Instant,       │ │
│  │    last_accessed: Instant     │ │
│  │  }                            │ │
│  └───────────────────────────────┘ │
│                                     │
│  Eviction Strategies:               │
│  • TTL-based (expires_at)          │
│  • LRU (last_accessed)             │
│  • Size-based (max_size)           │
│  • Count-based (max_entries)       │
└─────────────────────────────────────┘

Eviction Algorithm:

  1. On every get: Remove expired entries
  2. On every set: Check limits
  3. If over limits: Sort by last_accessed, remove oldest
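
A simplified, single-threaded sketch of that algorithm (the real cache holds the map behind Arc<RwLock<...>>; max_size handling is omitted):

use std::collections::HashMap;
use std::time::Instant;

struct CacheEntry {
    data: Vec<u8>,
    expires_at: Instant,
    last_accessed: Instant,
}

fn evict(map: &mut HashMap<String, CacheEntry>, max_entries: usize) {
    let now = Instant::now();
    // 1. Drop anything past its TTL
    map.retain(|_, e| e.expires_at > now);
    // 2. If still over the entry limit, drop least recently used first
    while map.len() > max_entries {
        let oldest = map
            .iter()
            .min_by_key(|(_, e)| e.last_accessed)
            .map(|(k, _)| k.clone());
        match oldest {
            Some(k) => {
                map.remove(&k);
            }
            None => break,
        }
    }
}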

Thread Safety:

  • The map lives behind Arc<RwLock<...>>: many concurrent readers, one writer at a time
  • Cache implementations are Send + Sync, so a single instance is safely shared across tasks

Cache Key Generation

Uses blake3 for fast, deterministic hashing:

pub fn from_parts(provider: &str, model: &str, content: &str) -> String {
    let mut hasher = blake3::Hasher::new();
    hasher.update(provider.as_bytes());
    hasher.update(model.as_bytes());
    hasher.update(content.as_bytes());
    let hash = hasher.finalize();
    format!("{}:{}:{}", provider, model, hash.to_hex())
}

Format: provider:model:hash
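
Usage sketch ("gpt-4" is only an example model string):

let key = from_parts("openai", "gpt-4", "What is Rust?");
assert!(key.starts_with("openai:gpt-4:"));
// Deterministic: the same inputs always yield the same key
assert_eq!(key, from_parts("openai", "gpt-4", "What is Rust?"));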

Benefits:

  • Deterministic: identical requests always map to the same key
  • Fast: blake3 hashes even large prompts cheaply
  • Fixed-length hash component regardless of content size
  • The human-readable provider:model prefix makes keys easy to debug

Error Handling

Error Hierarchy

SimpleAgentsError (enum)
├── Validation(ValidationError)
│   ├── Empty { field }
│   ├── TooShort { field, min }
│   ├── TooLong { field, max }
│   ├── OutOfRange { field, min, max }
│   └── InvalidFormat { field, reason }
│
├── Provider(ProviderError)
│   ├── Authentication(String)
│   ├── RateLimit { retry_after, message }
│   ├── InvalidResponse(String)
│   ├── ModelNotFound(String)
│   ├── ContextLengthExceeded { max_tokens }
│   ├── Timeout(Duration)
│   └── UnsupportedFeature(String)
│
├── Network(String)
├── Serialization(String)
├── Cache(String)
└── Config(String)

Error Context

Errors include:

  • The field that failed validation and the bound it violated (min, max)
  • Provider-supplied messages and retry hints (retry_after)
  • The limit that was exceeded (max_tokens for context length)
  • The elapsed duration for timed-out requests

Retryable Errors

The system distinguishes between:

Retryable:

  • RateLimit (after honoring retry_after)
  • Timeout
  • Network errors (transient connectivity failures)

Non-retryable:

  • Authentication failures
  • Validation errors
  • ModelNotFound
  • ContextLengthExceeded
  • UnsupportedFeature
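
A sketch of how that classification can be expressed over the hierarchy above (exact matching in the crate may differ):

fn is_retryable(err: &SimpleAgentsError) -> bool {
    match err {
        // Transient: worth another attempt after backoff
        SimpleAgentsError::Network(_) => true,
        SimpleAgentsError::Provider(ProviderError::RateLimit { .. })
        | SimpleAgentsError::Provider(ProviderError::Timeout(_)) => true,
        // Deterministic failures repeat identically on retry
        _ => false,
    }
}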

Security Model

Defense in Depth

Layer 1: Input Validation - inbound fields are checked against the ValidationError rules (length, range, format) before a request is built.

Layer 2: Secrets Handling - API keys live inside the ApiKey wrapper type rather than as bare strings.

Layer 3: Cryptographic Security - constant-time key comparison (below) and blake3 hashing for cache keys.

Layer 4: Network Security - all provider traffic goes through the shared HTTPS client and its connection pool.

Constant-Time Operations

Critical for preventing timing attacks:

use subtle::ConstantTimeEq;

// API key comparison
impl PartialEq for ApiKey {
    fn eq(&self, other: &Self) -> bool {
        // Takes same time regardless of where keys differ
        self.0.as_bytes().ct_eq(other.0.as_bytes()).into()
    }
}

Why it matters: an attacker could guess an API key one character at a time. With an ordinary comparison, the function returns faster when the first character is wrong than when a long prefix is correct, and that timing difference leaks information about the key.

Performance Optimizations

1. Zero-Copy Message Passing

// Before: Clones all messages
pub struct Request {
    pub messages: Vec<Message>,  // Owned
}

// After: Borrows messages
pub struct Request<'a> {
    pub messages: &'a [Message],  // Borrowed
}

Impact: avoids cloning message histories that can run to megabytes per request.

2. Static String Allocation

// Headers use Cow for zero-allocation static strings
pub headers: Vec<(Cow<'static, str>, Cow<'static, str>)>

// Common headers are static
headers.push((
    Cow::Borrowed("Content-Type"),
    Cow::Borrowed("application/json")
));

Impact: Eliminates heap allocations for common headers.

3. Connection Pooling

use reqwest::Client;
use std::time::Duration;

let client = Client::builder()
    .pool_max_idle_per_host(10)                  // keep warm connections around
    .pool_idle_timeout(Duration::from_secs(90))  // drop idle connections after 90s
    .http2_prior_knowledge()                     // skip the HTTP/1.1 upgrade round trip
    .build()?;

Impact:

  • No TCP/TLS handshake per request; idle connections are reused
  • HTTP/2 multiplexes concurrent requests over a single connection

4. Smart Caching

LRU Eviction:

  • When over max_entries or max_size, the least recently used entries are removed first

TTL-based Expiry:

  • Every entry carries expires_at; expired entries are swept on each get

Blake3 Hashing:

  • Key generation stays cheap even for very large prompts

5. Lazy Validation

Validation happens only when:

  • A value first crosses an API boundary (e.g. when a request is built), not on every access

Benefits:

  • Hot paths avoid redundant re-validation of values that have already been checked

Design Decisions

Why Traits Over Enums for Providers?

Considered:

enum Provider {
    OpenAI(OpenAIProvider),
    Anthropic(AnthropicProvider),
}

Chose Trait Instead:

trait Provider { ... }

Reasons:

  1. Open for extension (users can add providers)
  2. No dispatch overhead in generic code (monomorphization); dyn Provider remains available when needed
  3. Better encapsulation
  4. Easier to test

Why Three-Phase Provider Architecture?

Separates:

  1. Transform Request - Pure, testable
  2. Execute - Side effects, retries
  3. Transform Response - Pure, testable

Benefits:

  • Transforms are pure functions: unit-testable without HTTP mocks
  • Retries and caching wrap only the execute phase
  • A failure is immediately attributable to one phase

Why Separate Crates?

simple-agents-types:

  • Core traits (Provider, Cache) and the unified types; minimal dependencies

simple-agents-providers:

  • OpenAI and Anthropic implementations plus the retry logic

simple-agents-cache:

  • InMemory (LRU + TTL) and NoOp implementations

Benefits:

  • Applications depend only on what they use
  • Providers and caches version and evolve independently
  • The types crate stays small and stable as the shared contract

Why Async?

LLM APIs are:

  • Slow: a completion can take seconds
  • Network-bound: most wall time is spent waiting on I/O
  • Frequently called concurrently

Async allows:

  • Many in-flight requests without a thread per call
  • Natural timeout and cancellation handling
  • Efficient use of the pooled HTTP/2 connections
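
For instance, many completions can be awaited concurrently (a sketch; join_all comes from the futures crate, and a tokio or similar runtime is assumed):

use futures::future::join_all;

async fn complete_many<P: Provider>(
    provider: &P,
    requests: &[CompletionRequest],
) -> Vec<Result<CompletionResponse>> {
    // All requests are in flight at once: total latency is roughly the
    // slowest call, not the sum of all calls.
    join_all(requests.iter().map(|req| async move {
        let pr = provider.transform_request(req)?;
        let resp = provider.execute(pr).await?;
        provider.transform_response(resp)
    }))
    .await
}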

Future Architecture

Planned Improvements

  1. Rate Limiting
    • Token bucket algorithm
    • Per-provider limits
    • Automatic backoff
  2. Observability
    • Metrics collection
    • Distributed tracing
    • Performance monitoring
  3. Advanced Caching
    • Redis backend
    • Semantic caching
    • Cache warming
  4. Streaming
    • Complete SSE parsing
    • Backpressure handling
    • Chunk aggregation
  5. Advanced Routing
    • Load balancing
    • Fallback providers
    • Cost optimization
