# YAML Workflow System Guide
This guide shows how to design, run, and troubleshoot YAML workflows in SimpleAgents. By the end, you will understand the workflow file model, supported node behavior, schema contracts, runtime telemetry, and practical production checks.
## Prerequisites

- Familiarity with the Quick Start and Usage Guide
- A runnable workspace with `cargo`, and optionally `uv` for the Python examples
- Basic JSON Schema knowledge for `llm_call` output contracts
## Quick Path

- Create a minimal YAML workflow with one `llm_call` node.
- Add an explicit `config.output_schema` for structured-output stability.
- Run the workflow via the Rust API or the examples runner.
- Render the workflow graph to Mermaid for fast wiring validation.
- Inspect trace/timing fields and iterate.
Minimal workflow skeleton:

```yaml
id: my-workflow
version: 1.0.0
entry_node: first_node
nodes:
  - id: first_node
    node_type:
      llm_call:
        model: gpt-4.1
        config:
          output_schema:
            type: object
            properties:
              status: { type: string }
            required: [status]
            additionalProperties: false
        prompt: |
          Return {"status":"ok"}
edges:
  - from: first_node
    to: first_node
```

The required top-level fields are `id`, `entry_node`, and a non-empty `nodes` list.
## Mental Model
| Layer | What it does |
|---|---|
| YAML authoring | Defines graph, prompts, routing, workers, and state updates |
| Runtime model | Converts YAML to canonical IR when compatible, otherwise runs YAML-specific path |
| Execution + telemetry | Runs node-by-node and emits trace, timings, and event diagnostics |
Keep product logic in YAML; treat runtime output as verification and observability material.
## Supported Node Types

- `llm_call`: structured LLM generation with optional tools and streaming flags
- `switch`: condition-driven routing with a deterministic default
- `custom_worker`: deterministic external logic handler
### llm_call essentials

```yaml
node_type:
  llm_call:
    model: gpt-4.1
    stream: false
    heal: true
    messages_path: input.messages
    append_prompt_as_user: true
    config:
      output_schema: { ...json schema... }
    prompt: |
      ...
```

Behavior notes:

- `model` is required.
- `config.output_schema` should be explicit for every `llm_call`.
- `config.schema` is accepted as an alias, but prefer `output_schema`.
- If the schema is omitted, the runtime falls back to permissive object behavior.
Tool calling (per-node strict format):

- `tools_format`: `openai` or `simplified`
- `tools`, `tool_choice`, `max_tool_roundtrips`, `tool_calls_global_key`
- Mixed tool declaration formats in one node fail validation.
- A tool output schema mismatch hard-fails node execution.
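A per-node tool declaration might look like the following sketch. The `tools_format`, `tools`, `tool_choice`, and `max_tool_roundtrips` keys come from this guide; the inner function shape assumes the OpenAI-style format, and the tool name and parameter schema are illustrative, not part of SimpleAgents:

```yaml
node_type:
  llm_call:
    model: gpt-4.1
    tools_format: openai      # do not mix openai and simplified in one node
    tool_choice: auto
    max_tool_roundtrips: 3
    tools:
      - type: function
        function:
          name: lookup_order  # example tool; name and schema are illustrative
          parameters:
            type: object
            properties:
              order_id: { type: string }
            required: [order_id]
            additionalProperties: false
```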
### switch essentials

```yaml
node_type:
  switch:
    branches:
      - condition: '$.nodes.classifier.output.category == "x"'
        target: branch_x
    default: fallback_node
```

Always define deterministic default behavior.
### custom_worker essentials

```yaml
node_type:
  custom_worker:
    handler: GetRagData
    config:
      payload:
        topic: termination
```

The worker context includes trace correlation fields under `context.trace`, so external code can propagate telemetry.
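For illustration only, a worker-side helper that forwards those correlation fields to downstream calls could look like this sketch; the exact `context.trace` field names and the header names are assumptions, not documented SimpleAgents API:

```python
# Hypothetical sketch: forward trace correlation fields from the worker
# context as outbound HTTP headers so downstream services can join traces.
# The context shape ({"trace": {"trace_id": ..., "span_id": ...}}) is an
# assumption for illustration.
def build_trace_headers(context: dict) -> dict:
    trace = context.get("trace", {})
    headers = {}
    if "trace_id" in trace:
        headers["x-trace-id"] = trace["trace_id"]
    if "span_id" in trace:
        headers["x-span-id"] = trace["span_id"]
    return headers
```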
## Prompt Context and Run Memory

Templates can resolve from:

- `input.*`
- `nodes.<node_id>.output.*`
- `globals.*`

Memory updates are available via:

- `config.set_globals`
- `config.update_globals` with `set` | `append` | `increment` | `merge`

Use globals for run-level state, not for long-term secret storage.
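As a sketch, a node that maintains run-level state might combine both mechanisms; the `set_globals`/`update_globals` keys and the `set|append|increment|merge` operations come from this guide, but the exact value shapes below are assumptions:

```yaml
# Illustrative only: exact value shapes for set_globals/update_globals
# are assumptions, not the documented format.
config:
  set_globals:
    session_stage: intake        # overwrite-or-create a run-level global
  update_globals:
    - key: question_count
      op: increment              # set | append | increment | merge
    - key: asked_questions
      op: append
      value: "What is your order id?"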
## Chat-History Workflows

Pass chat arrays in `input.messages`:

```json
{
  "email_text": "optional scalar input",
  "messages": [
    {"role":"system","content":"..."},
    {"role":"user","content":"..."}
  ]
}
```

Supported `role` values: `system`, `user`, `assistant`, and `tool` (which requires `tool_call_id`).
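A `tool` turn can be sketched as follows. The guide only specifies that `tool` messages require `tool_call_id`; the preceding assistant `tool_calls` shape below follows OpenAI conventions and is an assumption:

```json
{
  "messages": [
    {"role":"assistant","content":null,
     "tool_calls":[{"id":"call_1","type":"function",
                    "function":{"name":"lookup_order","arguments":"{}"}}]},
    {"role":"tool","tool_call_id":"call_1","content":"{\"status\":\"shipped\"}"}
  ]
}
```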
## Running Workflows

Rust API:

```rust
use serde_json::json;
use simple_agents_workflow::run_workflow_yaml_file_with_client;

let output = run_workflow_yaml_file_with_client(
    std::path::Path::new("examples/workflow_email/email-unified-chat-intake-classification.yaml"),
    &json!({ "email_text": "Need replacement", "messages": [] }),
    &client,
)
.await?;
```

Python examples:

```sh
uv run --directory examples python workflow_email/run_with_chat_history.py
uv run --directory examples python workflow_email/run_with_unified_system.py
```

Graph visualization:

```sh
cargo run -p simple-agents-cli -- workflow mermaid examples/workflow_email/python-intern-fun-interview-system.yaml
```

## Telemetry and Diagnostics
Workflow outputs include:

- `trace`: node order
- `step_timings`: per-node timings
- `total_elapsed_ms`
- `trace_id`
- `metadata.telemetry.trace_id`
- `metadata.telemetry.sampled`
Runtime options can include telemetry sampling, payload mode, tool trace mode, retention, and tenant context. Use `conversation_id` to group multi-turn traces reliably. `telemetry.sample_rate` must be between 0.0 and 1.0 and is applied deterministically per trace id.
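Deterministic per-trace sampling can be sketched as hashing the trace id into [0, 1) and comparing against the sample rate, so the same trace id always yields the same decision; this is an illustrative implementation, not necessarily the runtime's exact algorithm:

```python
import hashlib

def is_sampled(trace_id: str, sample_rate: float) -> bool:
    """Illustrative deterministic sampler: hash the trace id to a
    number in [0, 1) and keep the trace when it falls below
    sample_rate. Identical trace ids always sample identically."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # First 8 bytes as an unsigned integer, normalized to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```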
Exporter configuration is environment-driven and shared across tracing backends:

- `SIMPLE_AGENTS_TRACING_ENABLED`
- `OTEL_EXPORTER_OTLP_ENDPOINT`
- `OTEL_EXPORTER_OTLP_PROTOCOL` (`grpc` or `http/protobuf`)
- `OTEL_EXPORTER_OTLP_HEADERS`
- `OTEL_SERVICE_NAME`
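A typical exporter setup might look like the following; the endpoint, header, and service-name values are placeholders, not SimpleAgents defaults:

```shell
# Placeholder values: substitute your collector endpoint and credentials.
export SIMPLE_AGENTS_TRACING_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer example-token"
export OTEL_SERVICE_NAME=simple-agents
```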
## Design Patterns That Work Well

- Classifier node -> `switch` router -> action node
- An LLM action plus a deterministic guardrail worker
- One-question-at-a-time interview/chat progression
- An explicit output schema for every `llm_call`
- Explicit closed terminal states for completed sessions
## Troubleshooting

### Stale Python bindings in examples

```sh
uv sync --directory examples --reinstall-package simple-agents-py
```

### Graph validation issues
Render Mermaid output first to confirm parse and wiring:

```sh
cargo run -p simple-agents-cli -- workflow mermaid examples/workflow_email/email-unified-chat-intake-classification.yaml
```

### Non-deterministic routing behavior
Verify that every `switch` has a deterministic `default` and that every branch target points to an existing node id.
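That check can be automated. The helper below is an illustrative script, not part of SimpleAgents; it operates on a workflow already parsed into a dict (for example via a YAML loader) and mirrors the node layout shown earlier in this guide:

```python
# Illustrative helper: report switch nodes whose branch targets or
# default do not refer to declared node ids, or that lack a default.
def find_bad_switch_targets(workflow: dict) -> list:
    node_ids = {node["id"] for node in workflow.get("nodes", [])}
    problems = []
    for node in workflow.get("nodes", []):
        switch = node.get("node_type", {}).get("switch")
        if switch is None:
            continue
        targets = [b["target"] for b in switch.get("branches", [])]
        if "default" in switch:
            targets.append(switch["default"])
        else:
            problems.append(f"{node['id']}: missing deterministic default")
        for target in targets:
            if target not in node_ids:
                problems.append(f"{node['id']}: unknown target '{target}'")
    return problems
```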
### Schema drift in LLM output

Define `config.output_schema` on every `llm_call` node and keep it strict (`additionalProperties: false` where appropriate).
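A strict classifier schema can be sketched like this; the field names and enum values are examples, not part of SimpleAgents:

```yaml
# Illustrative strict output contract for a classifier node.
config:
  output_schema:
    type: object
    properties:
      category: { type: string, enum: [billing, shipping, other] }
      confidence: { type: number, minimum: 0, maximum: 1 }
    required: [category, confidence]
    additionalProperties: false
```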
## Production Checklist

- Every `llm_call` has an explicit `config.output_schema`.
- Every `switch` defines deterministic default routing.
- Sensitive logic is represented in deterministic worker nodes where needed.
- Trace/timing output is captured and retained for audit/debug use.
- Session-close states are explicitly modeled.
## Next Steps
- Use Workflow Debugging UX for replay and retry inspection.
- Tune runtime characteristics in Workflow Performance.
- Apply guardrails from Workflow Security.
- For YAML/code conversion, follow Workflow DSL Migration Cookbook.