This document describes the architecture and design decisions behind SimpleAgents.
SimpleAgents is built on a few core principles: a unified, application-facing type layer, providers behind a common trait, and caching as an optional, pluggable layer. The layered architecture below reflects those principles:
┌─────────────────────────────────────────────────────────┐
│                       Application                       │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│                   simple-agents-types                   │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │   Request   │  │   Response   │  │    Message     │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │  Provider   │  │    Cache     │  │   Validation   │  │
│  │   (trait)   │  │   (trait)    │  │    (ApiKey)    │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
└─────────────────────────┬───────────────────────────────┘
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
┌───────────────────────┐    ┌──────────────────────┐
│    simple-agents-     │    │ simple-agents-cache  │
│       providers       │    │                      │
│  ┌─────────────────┐  │    │  ┌────────────────┐  │
│  │     OpenAI      │  │    │  │    InMemory    │  │
│  │    Provider     │  │    │  │  (LRU + TTL)   │  │
│  └─────────────────┘  │    │  └────────────────┘  │
│  ┌─────────────────┐  │    │  ┌────────────────┐  │
│  │    Anthropic    │  │    │  │      NoOp      │  │
│  │    Provider     │  │    │  │   (testing)    │  │
│  └─────────────────┘  │    │  └────────────────┘  │
│  ┌─────────────────┐  │    └──────────────────────┘
│  │   Retry Logic   │  │
│  └─────────────────┘  │
└───────────────────────┘
            │
            ▼
     ┌──────────────┐
     │    HTTP/2    │
     │  Connection  │
     │     Pool     │
     └──────────────┘
            │
            ▼
     ┌──────────────┐
     │   LLM API    │
     │   (OpenAI,   │
     │  Anthropic)  │
     └──────────────┘
The Provider trait defines a three-phase architecture for LLM interactions:
#[async_trait]
pub trait Provider: Send + Sync {
    // Phase 1: Transform unified request to provider format
    fn transform_request(&self, req: &CompletionRequest)
        -> Result<ProviderRequest>;

    // Phase 2: Execute HTTP request
    async fn execute(&self, req: ProviderRequest)
        -> Result<ProviderResponse>;

    // Phase 3: Transform provider response to unified format
    fn transform_response(&self, resp: ProviderResponse)
        -> Result<CompletionResponse>;
}
Benefits: the two transformation phases are pure, synchronous functions that can be unit-tested without network I/O, while cross-cutting concerns such as retries and caching can wrap execute() without touching provider-specific code.
Unified Types (application-facing):
- CompletionRequest - Standard request format
- CompletionResponse - Standard response format
- Message - Conversation messages

Provider Types (provider-facing):
- ProviderRequest - HTTP request details
- ProviderResponse - HTTP response details

This separation allows provider wire formats to evolve without changing the application-facing API.
Simple async trait for caching:
#[async_trait]
pub trait Cache: Send + Sync {
    async fn get(&self, key: &str) -> Result<Option<Vec<u8>>>;
    async fn set(&self, key: &str, value: Vec<u8>, ttl: Duration) -> Result<()>;
    async fn delete(&self, key: &str) -> Result<()>;
    async fn clear(&self) -> Result<()>;
}
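As a concrete example, a minimal sketch of what the NoOp implementation used for testing might look like (the struct layout and module paths are assumptions; Result is the crate's alias):

use std::time::Duration;
use async_trait::async_trait;

/// No-op cache: stores nothing, so every lookup is a miss.
pub struct NoOpCache;

#[async_trait]
impl Cache for NoOpCache {
    async fn get(&self, _key: &str) -> Result<Option<Vec<u8>>> {
        Ok(None) // always a miss
    }

    async fn set(&self, _key: &str, _value: Vec<u8>, _ttl: Duration) -> Result<()> {
        Ok(()) // silently drop the value
    }

    async fn delete(&self, _key: &str) -> Result<()> {
        Ok(())
    }

    async fn clear(&self) -> Result<()> {
        Ok(())
    }
}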
Key Features:
1. Application creates CompletionRequest
   ↓
2. transform_request() → ProviderRequest
   ↓
3. execute() → HTTP call → ProviderResponse
   ↓
4. transform_response() → CompletionResponse
   ↓
5. Application uses response
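A sketch of how a caller could drive these phases end to end (the free function complete() is illustrative, not necessarily the crate's actual API):

async fn complete(
    provider: &dyn Provider,
    request: &CompletionRequest,
) -> Result<CompletionResponse> {
    // 2. Unified request → provider wire format
    let provider_req = provider.transform_request(request)?;
    // 3. HTTP call via the provider's client
    let provider_resp = provider.execute(provider_req).await?;
    // 4. Provider wire format → unified response
    provider.transform_response(provider_resp)
}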
1. Application creates CompletionRequest
   ↓
2. Generate cache key from request
   ↓
3. Check cache.get(key)
   ├─ Hit → Return cached response
   └─ Miss ↓
4. transform_request() → ProviderRequest
   ↓
5. execute() → HTTP call → ProviderResponse
   ↓
6. transform_response() → CompletionResponse
   ↓
7. cache.set(key, response, ttl)
   ↓
8. Return response to application
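A sketch of the cached variant (the cache_key() helper, the serde_json round-trip with its error conversion, and the fixed TTL are illustrative assumptions; complete() is the sketch from the previous flow):

use std::time::Duration;

async fn complete_cached(
    provider: &dyn Provider,
    cache: &dyn Cache,
    request: &CompletionRequest,
) -> Result<CompletionResponse> {
    // 2. Derive a deterministic key from the request (e.g. via from_parts()).
    let key = cache_key(request);

    // 3. Cache hit: return immediately.
    if let Some(bytes) = cache.get(&key).await? {
        return Ok(serde_json::from_slice(&bytes)?);
    }

    // 4-6. Cache miss: run the normal three-phase flow.
    let response = complete(provider, request).await?;

    // 7. Store the serialized response for subsequent identical requests.
    cache.set(&key, serde_json::to_vec(&response)?, Duration::from_secs(300)).await?;

    // 8. Return to the application.
    Ok(response)
}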
1. Application creates CompletionRequest
   ↓
2. transform_request() → ProviderRequest
   ↓
3. execute_with_retry()
   ├─ Attempt 1 → Fail (retryable error)
   ├─ Backoff (exponential + jitter)
   ├─ Attempt 2 → Fail (retryable error)
   ├─ Backoff (exponential + jitter)
   └─ Attempt 3 → Success ↓
4. transform_response() → CompletionResponse
   ↓
5. Application uses response
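A sketch of the retry wrapper around execute() (the attempt count, base delay, an assumed is_retryable() helper, and ProviderRequest: Clone are illustrative assumptions):

use rand::Rng;
use std::time::Duration;

async fn execute_with_retry(
    provider: &dyn Provider,
    req: ProviderRequest,
    max_attempts: u32,
) -> Result<ProviderResponse> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match provider.execute(req.clone()).await {
            Ok(resp) => return Ok(resp),
            // Retry only transient failures, and only while attempts remain.
            Err(err) if attempt < max_attempts && err.is_retryable() => {
                // Exponential backoff (100ms, 200ms, 400ms, ...) plus up to 100ms of jitter.
                let backoff = Duration::from_millis(100 * 2u64.pow(attempt - 1));
                let jitter = Duration::from_millis(rand::thread_rng().gen_range(0..100));
                tokio::time::sleep(backoff + jitter).await;
            }
            Err(err) => return Err(err),
        }
    }
}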
Request Transformation:
CompletionRequest → OpenAICompletionRequest → JSON
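A sketch of that first hop (the field names on both structs are illustrative, and the serde_json error is assumed to convert into the crate's error type; only the shape of the mapping matters here):

use serde::Serialize;

// Illustrative subset of OpenAI's wire format.
#[derive(Serialize)]
struct OpenAICompletionRequest<'a> {
    model: &'a str,
    messages: &'a [Message],
    #[serde(skip_serializing_if = "Option::is_none")]
    temperature: Option<f32>,
}

fn to_openai_json(req: &CompletionRequest) -> Result<Vec<u8>> {
    // CompletionRequest → OpenAICompletionRequest → JSON bytes
    let wire = OpenAICompletionRequest {
        model: req.model.as_str(),
        messages: req.messages.as_slice(),
        temperature: req.temperature,
    };
    Ok(serde_json::to_vec(&wire)?)
}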
Key Features:
- Cow<'static, str> header names and values avoid allocations for common static strings

Error Handling:
New providers implement the Provider trait (transform_request(), execute(), and transform_response()), along with their own request/response models and error mapping; a skeleton is sketched after the example structure below.
Example Structure:
providers/
└── myprovider/
    ├── mod.rs       # Provider implementation
    ├── models.rs    # Request/response types
    └── error.rs     # Error mapping
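mod.rs then ties these pieces together by implementing the trait. A skeleton might look like this (MyProvider and its fields are hypothetical):

// providers/myprovider/mod.rs (skeleton)
use async_trait::async_trait;

pub struct MyProvider {
    api_key: ApiKey,
    client: reqwest::Client,
}

#[async_trait]
impl Provider for MyProvider {
    fn transform_request(&self, req: &CompletionRequest) -> Result<ProviderRequest> {
        // models.rs: map CompletionRequest onto this provider's wire format
        todo!()
    }

    async fn execute(&self, req: ProviderRequest) -> Result<ProviderResponse> {
        // issue the HTTP call through the shared connection pool
        todo!()
    }

    fn transform_response(&self, resp: ProviderResponse) -> Result<CompletionResponse> {
        // error.rs: map provider-specific failures onto ProviderError variants
        todo!()
    }
}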
Architecture:
┌─────────────────────────────────────┐
│            InMemoryCache            │
│                                     │
│  ┌───────────────────────────────┐  │
│  │     Arc<RwLock<HashMap>>      │  │
│  │                               │  │
│  │  CacheEntry {                 │  │
│  │      data: Vec<u8>,           │  │
│  │      expires_at: Instant,     │  │
│  │      last_accessed: Instant   │  │
│  │  }                            │  │
│  └───────────────────────────────┘  │
│                                     │
│  Eviction Strategies:               │
│  • TTL-based (expires_at)           │
│  • LRU (last_accessed)              │
│  • Size-based (max_size)            │
│  • Count-based (max_entries)        │
└─────────────────────────────────────┘
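In code, the state shown above amounts to roughly the following (an illustrative sketch of the internal types; names match the diagram):

use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::Instant;

struct CacheEntry {
    data: Vec<u8>,
    expires_at: Instant,
    last_accessed: Instant,
}

// Shared by all handles to the cache; RwLock permits concurrent readers.
type Store = Arc<RwLock<HashMap<String, CacheEntry>>>;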
Eviction Algorithm:
- get: remove expired entries
- set: check limits; if exceeded, sort by last_accessed and remove the oldest entries

Thread Safety:
- Arc<RwLock<>> for shared state

Cache keys use blake3 for fast, deterministic hashing:
pub fn from_parts(provider: &str, model: &str, content: &str) -> String {
    let mut hasher = blake3::Hasher::new();
    hasher.update(provider.as_bytes());
    hasher.update(model.as_bytes());
    hasher.update(content.as_bytes());
    let hash = hasher.finalize();
    format!("{}:{}:{}", provider, model, hash.to_hex())
}
Format: provider:model:hash
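For example (the digest is truncated for readability; the actual value is a 64-character blake3 hex string):

let key = from_parts("openai", "gpt-4", "What is Rust?");
// key == "openai:gpt-4:<64 hex chars>", e.g. "openai:gpt-4:9f2c…"
assert!(key.starts_with("openai:gpt-4:"));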
Benefits: keys are deterministic (identical requests map to the same key), fixed-length, and cheap to compute.
SimpleAgentsError (enum)
├── Validation(ValidationError)
│   ├── Empty { field }
│   ├── TooShort { field, min }
│   ├── TooLong { field, max }
│   ├── OutOfRange { field, min, max }
│   └── InvalidFormat { field, reason }
│
├── Provider(ProviderError)
│   ├── Authentication(String)
│   ├── RateLimit { retry_after, message }
│   ├── InvalidResponse(String)
│   ├── ModelNotFound(String)
│   ├── ContextLengthExceeded { max_tokens }
│   ├── Timeout(Duration)
│   └── UnsupportedFeature(String)
│
├── Network(String)
├── Serialization(String)
├── Cache(String)
└── Config(String)
Errors include structured context: the offending field, the violated limits (min/max), and retry hints such as retry_after, rather than bare message strings.
The system distinguishes between:
Retryable: transient failures such as rate limits (RateLimit), timeouts (Timeout), and network errors (Network).
Non-retryable: authentication failures, validation errors, ModelNotFound, ContextLengthExceeded, and other errors where retrying cannot succeed.
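A sketch of how that split could be expressed on the error type (the method name and the exact variant partitioning are illustrative):

impl SimpleAgentsError {
    /// Whether the operation is worth retrying with backoff.
    pub fn is_retryable(&self) -> bool {
        use SimpleAgentsError::*;
        match self {
            // Transient: likely to succeed on a later attempt.
            Network(_) => true,
            Provider(ProviderError::RateLimit { .. }) => true,
            Provider(ProviderError::Timeout(_)) => true,
            // Everything else (auth, validation, bad model, ...) is permanent.
            _ => false,
        }
    }
}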
Layer 1: Input Validation
Layer 2: Secrets Handling
- ApiKey type prevents accidental exposure; an explicit .expose() call is required for intentional access

Layer 3: Cryptographic Security
Layer 4: Network Security
Critical for preventing timing attacks:
use subtle::ConstantTimeEq;

// API key comparison
impl PartialEq for ApiKey {
    fn eq(&self, other: &Self) -> bool {
        // Takes same time regardless of where keys differ
        self.0.as_bytes().ct_eq(other.0.as_bytes()).into()
    }
}
Why it matters: An attacker could try guessing API keys character by character. With normal comparison, the function returns faster when the first character is wrong vs when many characters are correct. This timing difference leaks information.
// Before: Clones all messages
pub struct Request {
    pub messages: Vec<Message>,  // Owned
}

// After: Borrows messages
pub struct Request<'a> {
    pub messages: &'a [Message],  // Borrowed
}
Impact: Eliminates potentially megabytes of allocations per request.
// Headers use Cow for zero-allocation static strings
pub headers: Vec<(Cow<'static, str>, Cow<'static, str>)>,

// Common headers are static
headers.push((
    Cow::Borrowed("Content-Type"),
    Cow::Borrowed("application/json"),
));
Impact: Eliminates heap allocations for common headers.
let client = Client::builder()
    .pool_max_idle_per_host(10)
    .pool_idle_timeout(Duration::from_secs(90))
    .http2_prior_knowledge()
    .build()?;
Impact: connections are reused across requests, avoiding a new TCP and TLS handshake for every API call.
LRU Eviction: when size or entry-count limits are reached, the least recently used entries (by last_accessed) are evicted first.
TTL-based Expiry: expired entries are dropped on access.
Blake3 Hashing: cache keys are deterministic and cheap to compute.
Validation happens only when a request builder's .build() is called, rather than on every setter or field access.

Benefits: no repeated validation work on hot paths, and validation errors surface at a single, predictable point (see the sketch below).
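A sketch of the resulting call pattern (the builder and constructor names are illustrative):

let request = CompletionRequest::builder()
    .model("gpt-4")
    .message(Message::user("Hello"))  // setters do no validation
    .build()?;                        // all validation runs once, here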
Considered:
enum Provider {
    OpenAI(OpenAIProvider),
    Anthropic(AnthropicProvider),
}
Chose Trait Instead:
trait Provider { ... }
Reasons: a trait lets downstream crates add their own providers without modifying a central enum, and providers can be used behind trait objects (e.g. Box<dyn Provider>) or generics as needed.
Separates:
Benefits:
simple-agents-types: the shared request/response/message types plus the Provider and Cache traits
simple-agents-providers: the OpenAI and Anthropic provider implementations and the retry logic
simple-agents-cache: the InMemory (LRU + TTL) and NoOp cache implementations
Benefits:
LLM APIs are network-bound and high-latency: a single completion can take anywhere from hundreds of milliseconds to many seconds. Async allows many requests to be in flight concurrently without dedicating an OS thread to each, and lets retries, timeouts, and caching compose naturally around the slow network call.