# Troubleshooting Guide

Common issues, error patterns, and solutions for SimpleAgents core and binding workflows.
## Table of Contents
- Environment & Credential Issues
- Installation & Build Issues
- YAML Workflow Errors
- Language Binding Issues
- Custom Worker Errors
- Streaming Issues
- Observability & Tracing Issues
- Performance Issues
- Provider-Specific Issues
- Debugging Techniques
- Getting Help
## Environment & Credential Issues

### Issue: "API key not found" or authentication errors

**Symptoms:**

```
RuntimeError: API key not found for provider 'openai'
Error: Missing required environment variable WORKFLOW_API_KEY
```

**Solutions:**

Verify the `.env` file is loaded:

```python
from dotenv import load_dotenv

load_dotenv()  # Must be called before creating Client
```

Check environment variable names:

```bash
# For workflow runner
WORKFLOW_PROVIDER=openai
WORKFLOW_API_BASE=https://api.openai.com/v1
WORKFLOW_API_KEY=sk-your-key

# For direct provider usage
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=sk-ant-...
```

Verify the key is set:

```bash
echo $WORKFLOW_API_KEY  # Should print your key
```

For Docker/containerized environments:

```bash
# Pass env vars explicitly
docker run -e WORKFLOW_API_KEY=$WORKFLOW_API_KEY myapp
```
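To catch missing credentials before a workflow run rather than mid-execution, a small startup check can be useful. This is a local debugging sketch, not part of SimpleAgents; it assumes the workflow-runner variable names shown above.

```python
import os

# Variable names match the workflow-runner settings shown above.
REQUIRED = ("WORKFLOW_PROVIDER", "WORKFLOW_API_BASE", "WORKFLOW_API_KEY")

def missing_workflow_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Example: report everything missing from the current shell
print("Missing:", missing_workflow_env())
```

Call it once at startup and fail fast with a clear message instead of waiting for an opaque authentication error from the provider.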
Issue: "Provider not found" errors
Symptoms:
Error: Unknown provider 'azure'
RuntimeError: Provider 'custom' is not supportedSolutions:
Use openai as the provider with a custom base URL:
# Correct - Azure OpenAI
client = Client(
provider="openai",
api_base="https://your-resource.openai.azure.com/...",
api_key="your-azure-key"
)Issue: Live tests are skipped
**Symptoms:**

```
SKIPPED: Live tests require API credentials
```

**Solutions:**

Set the required environment variables:

```bash
export PROVIDER=openai
export CUSTOM_API_MODEL=gpt-4.1-mini
export CUSTOM_API_KEY=sk-your-key
export CUSTOM_API_BASE=https://api.openai.com/v1  # optional
```

For Node tests specifically:

```bash
cd crates/simple-agents-napi
npm run test:live  # Will skip without env vars by design
```

## Installation & Build Issues
### Issue: Python bindings fail to install

**Symptoms:**

```
ERROR: Could not build wheels for simple-agents-py
ImportError: cannot import name 'Client' from 'simple_agents_py'
```

**Solutions:**

Ensure Python 3.8+:

```bash
python --version  # Should be 3.8 or higher
```

Use `uv` (recommended):

```bash
uv pip install simple-agents-py
```

For development builds:

```bash
cd crates/simple-agents-py
uv build
uv pip install -e .
```

If using pip:

```bash
pip install --upgrade pip
pip install simple-agents-py
```
### Issue: Stale Python bindings cache

**Symptoms:**

```
TypeError: argument 'prompt': 'list' object cannot be converted to 'PyString'
AttributeError: 'Client' object has no attribute 'run_workflow'
```

**Solution:** Clear the `uv` cache and rebuild:

```bash
cd examples
rm -rf .venv .uv-cache
uv run --no-cache python python_client.py
```

### Issue: Node bindings fail to build
**Symptoms:**

```
Error: Cannot find module 'simple-agents-node'
napi: Error: Failed to load native addon
```

**Solutions:**

Install dependencies:

```bash
cd crates/simple-agents-napi
npm ci
```

Rebuild the native module:

```bash
npm run build  # or npm run build:debug
```

Verify the Node version:

```bash
node --version  # Should be 16+
```
### Issue: Rust build failures

**Symptoms:**

```
error: linking with `cc` failed
error: could not compile `simple-agents-core`
```

**Solutions:**

Ensure Rust 1.75+:

```bash
rustc --version
rustup update
```

Install system dependencies:

```bash
# Ubuntu/Debian
sudo apt-get install build-essential libssl-dev pkg-config

# macOS
xcode-select --install
```

Clean and rebuild:

```bash
cargo clean
cargo build --all
```
### Issue: WASM bindings build fails

**Symptoms:**

```
error: wasm-bindgen not found
error: failed to run custom build command for `simple-agents-wasm`
```

**Solutions:**

Install `wasm-bindgen-cli`:

```bash
# Makefile target does this automatically:
make ensure-wasm-bindgen

# Or manually:
cargo install wasm-bindgen-cli --version 0.2.117
```

## YAML Workflow Errors
Issue: "Invalid YAML syntax"
Symptoms:
Error: failed to parse workflow yaml: invalid type
RuntimeError: YamlError: mapping values are not allowed hereSolutions:
Validate YAML syntax:
bash# Use yamllint or online validator yamllint workflow.yamlCommon YAML mistakes:
yaml# WRONG - Missing space after colon model:gpt-4.1-mini # CORRECT model: gpt-4.1-mini # WRONG - Tab characters instead of spaces nodes: - id: classify # CORRECT - Use spaces only nodes: - id: classifyUse quotes for strings with special characters:
yamlprompt: | This is a multi-line string with "quotes" and 'apostrophes'
Issue: "Node not found" or "Invalid node reference"
Symptoms:
Error: Node 'classify' not found in workflow
RuntimeError: Invalid node reference: extract_companySolutions:
Check node ID spelling:
yamlnodes: - id: classify # Defined as 'classify' - id: route node_type: switch: branches: - condition: '$.nodes.clasify.output.category == "billing"' # WRONG - typo target: handle_billingEnsure entry_node exists:
yamlid: my-workflow version: 1.0.0 entry_node: classify # Must match a node id nodes: - id: classify # This must existVerify edge references:
yamledges: - from: classify # Must be a valid node id to: route # Must be a valid node id
Issue: "Invalid JSONPath expression"
Symptoms:
Error: Failed to evaluate condition: Invalid JSONPath
RuntimeError: JSONPath error: path not foundSolutions:
Test JSONPath syntax:
yaml# WRONG - Missing $ prefix condition: 'nodes.classify.output.category == "billing"' # CORRECT condition: '$.nodes.classify.output.category == "billing"'Common JSONPath patterns:
yaml# Access node output "{{ nodes.node_id.output.field }}" # Access input messages "{{ input.messages[0].content }}" # Access globals "{{ globals.config_value }}" # In switch conditions condition: '$.nodes.previous.output.status == "success"'Use quotes for string comparisons:
yamlcondition: '$.nodes.classify.output.category == "billing"' # Quotes required condition: '$.nodes.score.output.value > 0.5' # Numbers ok without
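SimpleAgents evaluates these paths internally, but when debugging a "path not found" error it can help to resolve the dotted path by hand against a sample execution state. The `resolve_path` helper below is a hypothetical stdlib sketch (dotted segments only, no filters or indexing), not the library's evaluator:

```python
def resolve_path(state, path):
    """Resolve a simplified JSONPath like '$.nodes.classify.output.category'
    against a nested dict. A KeyError pinpoints the bad segment, mirroring
    the 'path not found' errors above."""
    if not path.startswith("$."):
        raise ValueError(f"JSONPath must start with '$.': {path}")
    value = state
    for segment in path[2:].split("."):
        value = value[segment]  # KeyError here identifies the typo
    return value

state = {"nodes": {"classify": {"output": {"category": "billing"}}}}
assert resolve_path(state, "$.nodes.classify.output.category") == "billing"
```

Feeding the misspelled path from the example above (`clasify`) raises `KeyError: 'clasify'`, which is usually faster to read than the runtime's generic error.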
### Issue: Schema validation failures

**Symptoms:**

```
Error: Schema validation failed: missing required field 'response'
Error: additionalProperties 'extra_field' not allowed
```

**Solutions:**

Use strict schemas with `additionalProperties: false`:

```yaml
config:
  output_schema:
    type: object
    properties:
      response:
        type: string
      confidence:
        type: number
    required: [response]
    additionalProperties: false  # Rejects unexpected fields
```

Enable healing for malformed JSON:

```yaml
node_type:
  llm_call:
    model: gpt-4.1-mini
    heal: true  # Auto-fixes common JSON errors
```

Improve the prompt:

```yaml
prompt: |
  Return ONLY valid JSON. No markdown, no explanation.
  Required fields: response (string), confidence (number between 0-1)
  Example: {"response": "Hello", "confidence": 0.95}
```
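To reproduce these validation failures offline against a sample model output, a hand-rolled check of the two schema keywords involved (`required` and `additionalProperties: false`) is enough. This is a debugging aid for flat object schemas, not the library's validator:

```python
def check_strict_object(schema, data):
    """Return violations for a flat object schema using 'required' and
    'additionalProperties: false', mirroring the two errors shown above."""
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in data:
            errors.append(f"missing required field '{field}'")
    if schema.get("additionalProperties") is False:
        for field in data:
            if field not in props:
                errors.append(f"additionalProperties '{field}' not allowed")
    return errors

schema = {
    "type": "object",
    "properties": {"response": {"type": "string"}, "confidence": {"type": "number"}},
    "required": ["response"],
    "additionalProperties": False,
}
print(check_strict_object(schema, {"response": "Hello", "extra_field": 1}))
```

For full JSON Schema coverage (types, ranges, nesting), a real validator library is the better tool; this sketch only isolates the two failure modes above.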
## Language Binding Issues

### Issue: Python binding contract failures

**Symptoms:**

```
FAILED tests/test_binding_contract.py::test_contract - AssertionError
```

**Solutions:**

Run the contract tests to identify the mismatch:

```bash
./scripts/run-binding-contracts.sh
```

Check the fixture file:

```bash
cat parity-fixtures/binding_contract.json
```

Common causes:

- Binding API changed without a fixture update
- Symbol missing in generated declarations
- Streaming event shape changed

Update the fixtures if needed:

```bash
# After intentional API changes
./scripts/update-binding-contracts.sh
```
### Issue: Node binding contract failures

**Symptoms:**

```
not ok 1 - Contract test: complete should return ResponseWithMetadata
TypeError: Cannot read property 'content' of undefined
```

**Solutions:**

Rebuild the addon:

```bash
cd crates/simple-agents-napi
npm run build
```

Run TypeScript checks:

```bash
make node-typecheck
```

Check the declaration file:

```bash
cat crates/simple-agents-napi/index.d.ts
```
### Issue: Pydantic validation errors (Python)

**Symptoms:**

```
pydantic.error_wrappers.ValidationError: 1 validation error for WorkflowExecutionRequest
field required (type=value_error.missing)
```

**Solutions:**

Install the pydantic extra:

```bash
pip install simple-agents-py[pydantic]
```

Check field names:

```python
from simple_agents_py.workflow_request import (
    WorkflowExecutionRequest,
    WorkflowMessage,
    WorkflowRole,
)

# CORRECT
req = WorkflowExecutionRequest(
    workflow_path="workflow.yaml",  # Note: underscore, not camelCase
    messages=[WorkflowMessage(role=WorkflowRole.USER, content="Hello")]
)
```

Use dict-based requests as a fallback:

```python
# Skip Pydantic models entirely
req = {
    "workflow_path": "workflow.yaml",
    "messages": [{"role": "user", "content": "Hello"}]
}
```
### Issue: TypeScript type errors

**Symptoms:**

```
error TS2345: Argument of type 'string' is not assignable to parameter of type 'MessageInput[]'
```

**Solutions:**

Check import paths:

```typescript
// CORRECT
import { Client } from "simple-agents-node";
import type { MessageInput } from "simple-agents-node";

// For workflow events
import { parseWorkflowEvent } from "simple-agents-node/workflow_event";
```

Use the proper message format:

```typescript
const messages: MessageInput[] = [
  { role: "user", content: "Hello" }  // Content should be string or array
];
```

Run the type checker:

```bash
make node-typecheck
```
## Custom Worker Errors

### Issue: "Handler not found" (Python)

**Symptoms:**

```
RuntimeError: Custom worker handler 'lookup_company' not found
Error: No module named 'handlers'
```

**Solutions:**

Ensure `handlers.py` exists:

```bash
ls -la handlers.py  # Must be in the same directory as workflow.yaml
```

Check that the function name matches exactly:

```python
# handlers.py
def lookup_company(*, context, payload):  # Must match YAML exactly
    ...
```

```yaml
# workflow.yaml
node_type:
  custom_worker:
    handler: lookup_company   # Must match the function name
    handler_file: handlers.py # Optional, defaults to handlers.py
```

Verify the function signature:

```python
# CORRECT - Keyword-only args
def my_handler(*, context, payload):
    ...

# WRONG - Positional args
def my_handler(context, payload):
    ...
```

Check for import errors in `handlers.py`:

```bash
python -c "import handlers"
```
Issue: "Handler not found" (TypeScript)
Symptoms:
Error: custom_worker requires customWorkerDispatch callback
Error: unknown custom worker handler: lookup_companySolutions:
Always pass the dispatch callback:
typescriptfunction customWorkerDispatch(req: { handler: string; payload: unknown; context: unknown; }): string { if (req.handler === "lookup_company") { return JSON.stringify({ result: "found" }); } throw new Error(`unknown handler: ${req.handler}`); } // Pass as LAST argument const result = await client.runWorkflow( workflowPath, input, undefined, // workflowOptions undefined, // executionFlags customWorkerDispatch // REQUIRED for custom_worker nodes );Use executeWorkflowYaml (recommended):
typescriptimport { Client } from "simple-agents-node"; const client = new Client("openai"); // This signature is cleaner const result = client.executeWorkflowYaml({ workflowPath: "workflow.yaml", messages: [...], customWorkerDispatch: myDispatch, // Built into request object });
### Issue: Custom worker returns wrong format

**Symptoms:**

```
Error: Custom worker output is not valid JSON
RuntimeError: Failed to parse custom worker result
```

**Solutions:**

Return JSON-serializable data:

```python
# CORRECT
def my_handler(*, context, payload):
    return {
        "status": "success",
        "data": {"key": "value"}
    }

# WRONG - Returns a non-serializable object
def my_handler(*, context, payload):
    return SomeCustomClass()  # Don't return custom objects
```

TypeScript - return a JSON string:

```typescript
function dispatch(req) {
  if (req.handler === "my_handler") {
    return JSON.stringify({ result: "data" });  // Must return a string
  }
}
```

Handle errors gracefully:

```python
def my_handler(*, context, payload):
    try:
        result = risky_operation()
        return {"success": True, "result": result}
    except Exception as e:
        return {"success": False, "error": str(e)}
```
### Issue: TypeError in custom worker (Python)

**Symptoms:**

```
TypeError: my_handler() got an unexpected keyword argument 'context'
```

**Solution:**

Update to the new signature:

```python
# OLD (deprecated)
def my_handler(input, nodes, globals):
    ...

# NEW (current)
def my_handler(*, context, payload):
    input_data = context["input"]
    nodes = context["nodes"]
    globals_ = context["globals"]
    ...
```

## Streaming Issues
### Issue: Streaming not working

**Symptoms:**

- No events received
- `on_event` callback never fires
- Workflow completes without streaming

**Solutions:**

Enable streaming at both levels:

```yaml
# YAML level
nodes:
  - id: generate
    node_type:
      llm_call:
        model: gpt-4.1-mini
        stream: true  # Enable for this node
```

```python
# Runtime level
execution=WorkflowExecutionFlags(
    node_llm_streaming=True,  # Master switch
)
```

Check the streaming logic: a node streams only when the YAML `stream: true` AND the runtime `node_llm_streaming: true` flags are both enabled.

Use proper event handling:

```python
def on_event(event):
    event_type = event.get("event_type")
    if event_type == "node_stream_delta":
        print(event.get("delta", ""), end="")
    elif event_type == "workflow_error":
        print(f"Error: {event}")

result = client.stream_workflow(request, on_event=on_event)
```
Issue: "stream and heal cannot both be true"
Symptoms:
ValidationError: stream=True and heal=True cannot both be enabled
RuntimeError: Node config conflict: streaming and healing are mutually exclusiveSolution:
Choose one mode per node:
# For real-time UI feedback
nodes:
- id: chat
node_type:
llm_call:
model: gpt-4.1-mini
stream: true
heal: false
# For reliable data extraction
nodes:
- id: extract
node_type:
llm_call:
model: gpt-4.1-mini
stream: false
heal: trueWorkaround for both: Use separate nodes - one for streaming (user-facing) and one for healing (data extraction).
### Issue: Streaming structured output is garbled

**Symptoms:**

- Partial JSON looks corrupted
- Events show incomplete/malformed data
- Parsing fails mid-stream

**Solutions:**

Use `stream_json_as_text`:

```yaml
node_type:
  llm_call:
    model: gpt-4.1-mini
    stream: true
    stream_json_as_text: true  # Stream as text, not parsed JSON
```

Parse incrementally:

```python
from simple_agents_py import StreamingParser

parser = StreamingParser()

def on_event(event):
    if event.get("event_type") == "node_stream_delta":
        parser.feed(event.get("delta", ""))
        partial = parser.try_parse()
        if partial:
            print(f"Partial: {partial.value}")
```
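If `StreamingParser` is not available in your build, a minimal stdlib fallback is to accumulate the text deltas and re-attempt a full parse on each event. This only works with `stream_json_as_text: true` (deltas are raw text), and unlike a real incremental parser it yields nothing until the JSON is complete:

```python
import json

class JsonBuffer:
    """Accumulate streamed text deltas and retry a full JSON parse.
    Crude compared to a true incremental parser, but dependency-free."""

    def __init__(self):
        self._chunks = []

    def feed(self, delta):
        self._chunks.append(delta)

    def try_parse(self):
        """Return the parsed value once the buffer holds complete JSON, else None."""
        try:
            return json.loads("".join(self._chunks))
        except json.JSONDecodeError:
            return None

buf = JsonBuffer()
for delta in ['{"resp', 'onse": "He', 'llo"}']:
    buf.feed(delta)
print(buf.try_parse())
```

Wire `feed` into the `node_stream_delta` branch of your `on_event` callback exactly as with `StreamingParser` above.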
## Observability & Tracing Issues

### Issue: Traces not appearing in Langfuse/Jaeger

**Symptoms:**

- No traces visible in the UI
- Workflow runs but no observability data

**Solutions:**

Verify environment variables:

```bash
echo $SIMPLE_AGENTS_TRACING_ENABLED  # Should be "true"
echo $OTEL_EXPORTER_OTLP_ENDPOINT    # Should be set
```

Check endpoint connectivity:

```bash
curl $OTEL_EXPORTER_OTLP_ENDPOINT
```

Enable telemetry in the request:

```python
workflow_options=WorkflowRunOptions(
    telemetry=WorkflowTelemetryConfig(enabled=True)
)
```

For Langfuse specifically:

```python
import base64

# Verify the token is correct
token = base64.b64encode(f"{public}:{secret}".encode()).decode("ascii")
print(f"Token length: {len(token)}")  # Should be > 0
```
Issue: "OTEL exporter failed"
Symptoms:
ERROR opentelemetry_otlp: Export failed: connection refused
Warning: Failed to export spansSolutions:
Verify collector is running:
bash# For Jaeger docker run -d --name jaeger \ -e COLLECTOR_OTLP_ENABLED=true \ -p 16686:16686 \ -p 4317:4317 \ jaegertracing/all-in-one:latest # Check if running curl http://localhost:16686Match protocol to endpoint:
bash# gRPC endpoint OTEL_EXPORTER_OTLP_PROTOCOL=grpc OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 # HTTP endpoint OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318Check headers format:
bash# Correct format OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token,header2=value"
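A quick way to sanity-check an `OTEL_EXPORTER_OTLP_HEADERS` value before exporting it is to parse it the way OTLP exporters generally do: comma-separated `key=value` pairs, with only the first `=` splitting each pair. This parser is a local debugging aid, not part of SimpleAgents:

```python
def parse_otlp_headers(raw):
    """Split 'k1=v1,k2=v2' into a dict, raising on malformed entries."""
    headers = {}
    for pair in raw.split(","):
        if "=" not in pair:
            raise ValueError(f"Malformed header entry: {pair!r}")
        key, value = pair.split("=", 1)  # only the first '=' separates key/value
        headers[key.strip()] = value.strip()
    return headers

print(parse_otlp_headers("Authorization=Bearer token,header2=value"))
```

If this raises, the exporter will likely reject the value too; fix the shell quoting or stray commas first.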
### Issue: High cardinality traces

**Symptoms:**

- Langfuse/Jaeger UI slow
- Too many unique traces
- Storage growing rapidly

**Solutions:**

Enable sampling:

```python
workflow_options={
    "telemetry": {
        "enabled": True,
        "sample_rate": 0.1  # Only 10% of traces
    }
}
```

Disable nerdstats if not needed:

```python
telemetry=WorkflowTelemetryConfig(
    enabled=True,
    nerdstats=False  # Reduces span count
)
```
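If you would rather sample at the application level (for example, only building the telemetry config for a fraction of runs), the decision is a single probabilistic check. This is a generic sketch, not a SimpleAgents API:

```python
import random

def should_trace(sample_rate, rng=random):
    """Return True for roughly sample_rate of calls; callers can skip
    enabling telemetry for the rest."""
    return rng.random() < sample_rate

# Seeded for a reproducible demonstration of the ~10% keep rate
rng = random.Random(0)
kept = sum(should_trace(0.1, rng) for _ in range(10_000))
print(f"Kept {kept} of 10000 runs")
```

Per-run sampling like this drops whole traces uniformly; if you need tail-based or error-biased sampling, do it in the collector instead.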
## Performance Issues

### Issue: Workflows are slow

**Symptoms:**

- High latency (>2s for simple workflows)
- Slow step execution
- Poor throughput

**Solutions:**

Check the LLM model:

```yaml
# Faster models
model: gpt-4.1-mini   # Faster than gpt-4
model: gpt-3.5-turbo  # Fastest OpenAI model
```

Enable streaming for user-facing workflows:

```yaml
stream: true  # Reduces time-to-first-token
```

Profile with nerdstats:

```python
telemetry=WorkflowTelemetryConfig(nerdstats=True)
# Check step_timings in the result for slow nodes
```

Parallelize independent nodes:

```yaml
# These nodes run in parallel if there are no dependencies
nodes:
  - id: extract_a
    ...
  - id: extract_b
    ...
edges: []  # No dependencies = parallel execution
```
### Issue: High memory usage

**Symptoms:**

- OOM errors
- Memory growing over time
- Rust panics on allocation

**Solutions:**

Limit output size:

```yaml
node_type:
  llm_call:
    model: gpt-4.1-mini
    max_tokens: 500  # Limit response size
```

Process large workloads in chunks:

```python
# Don't load all data at once
for chunk in batches:
    result = client.run_workflow(
        WorkflowExecutionRequest(
            workflow_path="process.yaml",
            messages=[{"role": "user", "content": chunk}]
        )
    )
```

Check for memory leaks in custom workers:

```python
def my_handler(*, context, payload):
    # Clear large objects explicitly
    result = expensive_operation()
    del payload["large_data"]  # Free memory
    return result
```
## Provider-Specific Issues

### Issue: Azure OpenAI authentication errors

**Symptoms:**

```
Error: 401 - Authentication failed
Error: Deployment not found
```

**Solutions:**

Use the correct endpoint format:

```python
api_base = "https://your-resource.openai.azure.com/openai/deployments/your-deployment-name"
# NOT just the base resource URL
```

Set the API version in headers if needed:

```python
# SimpleAgents handles this, but if overriding:
headers = {"api-version": "2024-02-01"}
```

Verify the deployment name matches:

```bash
# In the Azure Portal, check the deployment name
# Must match exactly (case-sensitive)
```
### Issue: Local models (Ollama/vLLM) not working

**Symptoms:**

```
Error: Connection refused
Error: 404 Not Found
Model not found
```

**Solutions:**

Verify the server is running:

```bash
# Ollama
curl http://localhost:11434/api/tags

# vLLM
curl http://localhost:8000/v1/models
```

Use the correct base URL:

```python
# Ollama OpenAI-compatible endpoint
api_base = "http://localhost:11434/v1"

# vLLM
api_base = "http://localhost:8000/v1"
```

Model name format:

```yaml
# Ollama - use the model tag name
model: llama2

# vLLM - use the full path or a simplified name
model: meta-llama/Llama-2-70b-chat-hf
```

Check CORS (Ollama):

```bash
OLLAMA_ORIGINS="*" ollama serve
```
### Issue: Rate limiting from provider

**Symptoms:**

```
Error: 429 Too Many Requests
Error: Rate limit exceeded
```

**Solutions:**

SimpleAgents has built-in retries:

```yaml
# Automatic retry with exponential backoff
node_type:
  llm_call:
    model: gpt-4.1-mini
```

Add delays between batches:

```python
import time

for item in items:
    result = client.run_workflow(...)
    time.sleep(0.5)  # Rate limit yourself
```

Use a slower tier:

```yaml
model: gpt-3.5-turbo  # Higher rate limits than GPT-4
```
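If you need client-side retries on top of the built-ins (for example, around batched `run_workflow` calls), a standard exponential-backoff wrapper looks like this. `RateLimitError` here is a stand-in for whatever exception your binding raises on a 429:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the binding's 429 error type."""

def with_backoff(fn, retries=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises on the
    final attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Usage: `with_backoff(lambda: client.run_workflow(request))`. Injecting `sleep` keeps the wrapper testable; adding random jitter to the delay helps when many workers retry at once.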
## Debugging Techniques

### Enable Debug Logging

Rust:

```bash
RUST_LOG=debug cargo run
RUST_LOG=simple_agents=trace cargo test
```

Python:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

Tracing subscriber:

```rust
use tracing_subscriber;

tracing_subscriber::fmt::init();
```

### Inspect Workflow Events

```python
def debug_on_event(event):
    """Print all event details for debugging."""
    print(f"\n=== Event: {event.get('event_type')} ===")
    for key, value in event.items():
        print(f"  {key}: {value}")

result = client.stream_workflow(request, on_event=debug_on_event)
```

### Validate YAML Without Running
```python
import yaml

# Load and validate structure
with open("workflow.yaml") as f:
    data = yaml.safe_load(f)

# Check required fields
assert "id" in data, "Missing workflow id"
assert "entry_node" in data, "Missing entry_node"
assert "nodes" in data, "Missing nodes"

# Check node references
node_ids = {n["id"] for n in data["nodes"]}
assert data["entry_node"] in node_ids, "entry_node not in nodes"
for edge in data.get("edges", []):
    assert edge["from"] in node_ids, f"Edge from '{edge['from']}' not found"
    assert edge["to"] in node_ids, f"Edge to '{edge['to']}' not found"
```

### Test Custom Workers in Isolation
```python
# Test the handler independently
def test_handler():
    context = {
        "input": {"messages": [{"role": "user", "content": "test"}]},
        "nodes": {},
        "globals": {}
    }
    payload = {"company_name": "Test Corp"}
    result = lookup_company(context=context, payload=payload)
    print(f"Result: {result}")
    return result

test_handler()
```

## Getting Help
### Before Reporting an Issue

- Search existing issues: https://github.com/CraftsMan-Labs/SimpleAgents/issues
- Check this troubleshooting guide for your specific error
- Run validation commands:

```bash
make check-publish                   # Full validation suite
make test                            # Run all tests
./scripts/run-binding-contracts.sh   # Check binding parity
```
### Information to Include

When reporting issues, include:

**Environment:**

```bash
python --version  # or node --version
rustc --version
pip show simple-agents-py  # or npm list simple-agents-node
```

**Minimal reproduction:**

- The smallest YAML/workflow that triggers the issue
- Minimal code to reproduce

**Error output:**

- The full error message with stack trace
- Log output with `RUST_LOG=debug`

**Configuration:**

- Redacted `.env` (remove API keys)
- Workflow YAML (sanitized)
### Where to Report

- Bugs: https://github.com/CraftsMan-Labs/SimpleAgents/issues
- Documentation: Same as above with the label `documentation`
- Feature requests: Issues with the label `enhancement`

### Community Resources

- Examples: `examples/` directory in the repo
- Playground: https://yamslam.craftsmanlabs.net/playground
- Full Docs: https://docs.simpleagents.craftsmanlabs.net/
## Quick Reference: Common Error Codes

| Error | Meaning | Quick Fix |
|---|---|---|
| API key not found | Missing credentials | Check `.env` file, verify `load_dotenv()` |
| Invalid YAML | Syntax error | Use yamllint, check indentation |
| Node not found | Invalid reference | Check node ID spelling |
| Handler not found | Custom worker missing | Verify `handlers.py` exists, function name matches |
| Schema validation failed | Output doesn't match schema | Enable healing, adjust prompt, check schema |
| Stream + heal conflict | Both enabled | Choose one per node |
| 429 Rate limit | Too many requests | Add delays, use retries, check tier |
| 401 Unauthorized | Bad API key | Verify key, check base URL |
| Connection refused | Server not running | Check local model server |
Last updated: 2026-04-20
For the latest troubleshooting information, check the GitHub repository.