Tracing Architecture (OTLP Backends)
This document defines tracing data flow for SimpleAgents workflow runs when used from an external API layer, with OTLP destinations such as Jaeger and Langfuse.
Goals
- Correlate external API traces and in-repo workflow traces as one transaction.
- Emit mandatory per-handler spans for custom worker execution.
- Return trace correlation IDs in workflow outputs.
- Keep payload capture full by default while remaining toggle-ready for redaction.
Correlation model
Required attributes across spans/events:
`trace_id`, `span_id`, `workspace_id`, `user_id`, `conversation_id`, `request_id`, `workflow_id`, `run_id`, `node_id`, `handler_lang`
Runtime options contract
Workflow execution accepts a structured options object (`YamlWorkflowRunOptions`):
- `telemetry`: `enabled` (`true` by default), `sample_rate` (`1.0` by default), `payload_mode` (`full_payload` default, `redacted_payload` optional), `retention_days` (`30` by default), `multi_tenant` (`true` by default)
- `trace`:
  - `context`: `trace_id`, `span_id`, `parent_span_id`, `traceparent`, `tracestate`, `baggage`
  - `tenant`: `workspace_id`, `user_id`, `conversation_id`, `request_id`, `run_id`
Runtime exporter env contract
Tracing exporter configuration is clean-break and OTLP-native:
- `SIMPLE_AGENTS_TRACING_ENABLED`
- `OTEL_EXPORTER_OTLP_ENDPOINT`
- `OTEL_EXPORTER_OTLP_PROTOCOL` (`grpc` or `http/protobuf`)
- `OTEL_EXPORTER_OTLP_HEADERS` (`k=v,k2=v2`)
- `OTEL_SERVICE_NAME`
Notes:
- For `http/protobuf`, the endpoint can be a base OTLP URL; traces are emitted to the traces signal path (`/v1/traces`).
- OTLP headers are applied in both `grpc` and `http/protobuf` modes.
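As a sketch, the base-URL handling for `http/protobuf` can be expressed like this (the helper name is illustrative, not part of the runtime API):

```python
def resolve_traces_endpoint(base: str) -> str:
    """Append the traces signal path to a base OTLP URL, idempotently."""
    base = base.rstrip("/")
    if base.endswith("/v1/traces"):
        return base
    return base + "/v1/traces"
```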
Jaeger / Collector example
```shell
export SIMPLE_AGENTS_TRACING_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_SERVICE_NAME=simple-agents-workflow
```
Langfuse API example
```shell
export SIMPLE_AGENTS_TRACING_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64(public_key:secret_key)>,x-langfuse-ingestion-version=4"
export OTEL_SERVICE_NAME=simple-agents-workflow
```
Output contract
Workflow results include correlation IDs in both locations:
- top-level `trace_id`
- `metadata.telemetry.trace_id`
- `metadata.telemetry.sampled`
- `metadata.trace.tenant.*` (including `conversation_id` when provided)
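A minimal reader for the output contract above (the helper name is hypothetical; the dictionary shape follows the contract):

```python
def extract_trace_id(result: dict) -> str:
    # Both locations carry the same value; the top-level key is the convenience copy.
    return result.get("trace_id") or result["metadata"]["telemetry"]["trace_id"]
```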
Sampling behavior:
- `sample_rate` is validated at runtime and must be between `0.0` and `1.0` (inclusive).
- The sampling decision is deterministic per `trace_id` (same `trace_id` + same `sample_rate` => same sampled outcome).
- `trace_id` is still returned even when a run is not sampled, so logs/events can stay correlated.
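A deterministic decision of this kind can be sketched as follows (the runtime's actual hash function is not specified in this document; this only illustrates the `trace_id`-keyed determinism):

```python
import hashlib

def is_sampled(trace_id: str, sample_rate: float) -> bool:
    """Map trace_id to a stable value in [0, 1) and compare against sample_rate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

The same `trace_id` and `sample_rate` always produce the same outcome, so retries and replays of a run keep a consistent sampling decision.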
Programmatic Correlation In External Code
What is possible
- Reuse the same `trace_id` across multiple turns/runs.
- Pass `conversation_id` (UUID) for chat/session grouping.
- Correlate external handler telemetry with workflow telemetry using shared IDs.
- Custom worker `context` now includes an injected `trace` block automatically.
Important: traces are append-only. You do not mutate old emitted spans; each run adds new spans/events.
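A minimal way to mint these IDs once per conversation, using only the stdlib (assuming W3C-style 32-hex trace IDs, matching the format used elsewhere in this document):

```python
import uuid

# One conversation-scoped ID for the whole chat session; mint it once.
conversation_id = str(uuid.uuid4())   # canonical UUID string for chat/session grouping

# A fresh trace per turn is the default model; reuse this value across
# runs only if you deliberately want them grouped into one trace.
trace_id = uuid.uuid4().hex           # 32 lowercase hex chars (W3C trace-id format)
```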
Python example (streaming)
```python
result = client.run_workflow_yaml_stream(
    "examples/workflow_email/email-chat-draft-or-clarify.yaml",
    workflow_input,
    on_event=on_event,
    workflow_options={
        "telemetry": {"enabled": True, "nerdstats": True},
        "trace": {
            "context": {
                "trace_id": "4f6f4a0e7b6e4f7a85fd5ec3e3de31a9",
            },
            "tenant": {
                "conversation_id": "6e6d3125-b9f1-4af2-af1f-7cca024a2c42",
                "request_id": "turn-12",
                "run_id": "chat-turn-12",
            },
        },
    },
)
```
YAML custom worker payload forwarding example
If your external handler emits its own logs/metrics/spans, forward IDs in payload/context:
```yaml
- id: rag_lookup
  node_type:
    custom_worker:
      handler: GetRagData
      config:
        payload:
          topic: termination_repeated_offense
          trace_id: "{{ input.trace_id }}"
          conversation_id: "{{ input.conversation_id }}"
```
Automatic trace context in custom worker context
`YamlWorkflowCustomWorkerExecutor.execute(..., context)` receives this shape:
```json
{
  "input": { "...": "..." },
  "nodes": { "...": "..." },
  "globals": { "...": "..." },
  "trace": {
    "context": {
      "trace_id": "...",
      "span_id": "...",
      "parent_span_id": "...",
      "traceparent": "00-...-...-01",
      "tracestate": "...",
      "baggage": { "...": "..." }
    },
    "tenant": {
      "workspace_id": "...",
      "user_id": "...",
      "conversation_id": "...",
      "request_id": "...",
      "run_id": "..."
    }
  }
}
```
Use `context.trace.context.traceparent` (or `trace_id`) when starting external spans, and carry `context.trace.tenant.conversation_id` as a span attribute for chat-level grouping.
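If external code needs the raw IDs, the `traceparent` field follows the W3C Trace Context format (`version-traceid-parentid-flags`) and can be parsed without any OTel dependency:

```python
def parse_traceparent(traceparent: str) -> dict:
    """Split a W3C traceparent header into its four dash-separated fields."""
    version, trace_id, parent_span_id, flags = traceparent.split("-")
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 0x01),  # bit 0 of trace-flags = sampled
    }
```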
Handler-side guidance
- Include `trace_id` and `conversation_id` in every handler log/event.
- If you have an OTel tracer in external code, set these as span attributes.
- Keep one trace per turn/request as the default operational model; use `conversation_id` to stitch multi-turn chats.
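A sketch of a handler-side log helper that always stamps the correlation IDs (the helper name and log shape are illustrative; the `context` argument is the injected custom worker context described above):

```python
import json

def handler_log(event: str, context: dict, **fields) -> str:
    """Emit a JSON log line carrying trace_id and conversation_id from context.trace."""
    trace = context["trace"]
    record = {
        "event": event,
        "trace_id": trace["context"]["trace_id"],
        "conversation_id": trace["tenant"]["conversation_id"],
        **fields,
    }
    return json.dumps(record)
```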
Data flow
```mermaid
flowchart TD
    A[External API Layer] -->|structured trace context + optional raw fields| B[Binding/FFI Entry]
    B --> C[Workflow Runner]
    C --> D[workflow.run span]
    D --> E[workflow.node.execute spans]
    E --> F["handler.invoke spans (mandatory)"]
    F --> G[Workflow Output]
    G --> H[top-level trace_id]
    G --> I[metadata.telemetry.trace_id]
```
Backend fanout
Recommended deployment path:
- SDK/runtime spans -> OpenTelemetry Collector
- Collector -> Jaeger (trace UI + query)
- Jaeger storage -> OpenSearch (retention policy e.g. 30 days)
- Collector can fan out to multiple backends (e.g. Jaeger + Langfuse) when needed.
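A minimal Collector fanout configuration along these lines (endpoints, credentials, and exporter aliases are placeholders for your deployment; exporter availability depends on your Collector distribution and version):

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}

exporters:
  # Jaeger ingests OTLP natively on port 4317
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  otlphttp/langfuse:
    endpoint: https://cloud.langfuse.com/api/public/otel
    headers:
      Authorization: "Basic <base64(public_key:secret_key)>"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger, otlphttp/langfuse]
```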
This keeps vendor interoperability through OpenTelemetry and avoids backend-specific runtime code paths.