Coordinating complex tasks with Directed Acyclic Graphs (DAGs).
Workflows allow you to orchestrate multiple jobs, approvals, and sub-workflows in a structured DAG. They handle dependencies, data flow, and conditional execution.
DAG Concepts
A workflow is defined as a Directed Acyclic Graph (DAG) where:
- Steps: The nodes of the graph.
- Dependencies: The edges of the graph, defined by the depends_on field.
- Fan-in/Fan-out: Multiple steps can depend on one step (fan-out), or one step can depend on many (fan-in).
Validation guardrail: a workflow can include up to 1000 steps.
For implementation-level behavior (scheduler decisions, explainability APIs, critical-path estimates, policy enforcement, and replay controls), see DAG Runtime.
Step Types
| Type | Description |
|---|---|
| job | Executes a standard Job. |
| approval | A human-in-the-loop gate that waits for manual approval or rejection. |
| sub_workflow | Triggers another workflow as a nested step. |
| wait_for_event | Pauses until an external event arrives via API. See Event Triggers. |
| sleep | Durable sleep: pauses for a specified duration without holding goroutines. |
Workflow Versioning
Workflows use a snapshotting mechanism. Every time a workflow's steps are updated, the version increments. The entire DAG definition is copied to a versioned table, ensuring that running workflows are not disrupted by changes to the definition.
Step Conditions
Steps can define a condition (JSON) that must evaluate to true before the step starts.
- step_status: Check the status of a dependency (e.g., completed, failed).
- step_status_in: Check a dependency against a set of statuses.
- not: Invert a nested condition.
- Composite Logic: Use all_of or any_of for complex boolean logic.
If a condition is false, the step is marked as skipped.
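As a sketch built from the operators above (the exact nesting of the condition schema is an assumption), a cleanup step that should run only when either of its parents failed might be declared as:

```json
{
  "step_ref": "cleanup",
  "job_id": "job_cleanup",
  "depends_on": ["extract", "transform"],
  "condition": {
    "any_of": [
      { "step_status": { "step": "extract", "status": "failed" } },
      { "step_status": { "step": "transform", "status": "failed" } }
    ]
  }
}
```

If neither parent fails, the condition evaluates to false and cleanup is marked as skipped.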
Data Flow and Templates
Workflows support dynamic data flow between steps using template variables.
Template Variable Rendering
Use {{var}} syntax with dot notation to access data:
- {{payload.key}}: Access the initial workflow trigger payload.
- {{parent_outputs.step_ref.key}}: Access the output of a specific parent step.
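For example (the step refs and payload keys here are illustrative), a notification step could combine the trigger payload with a parent step's output:

```json
{
  "step_ref": "notify",
  "job_id": "job_notify",
  "depends_on": ["score"],
  "payload": {
    "user_id": "{{payload.user_id}}",
    "risk_level": "{{parent_outputs.score.risk}}"
  }
}
```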
Output Transforms
Steps can define an output_transform using JSONPath (via gjson). This allows a step to export only a specific subset of its result to downstream steps.
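A hypothetical example: if a step's raw result is a large object, a gjson path can export just one subtree to downstream steps. Here only the result.summary subtree would be visible to dependents:

```json
{
  "step_ref": "analyze",
  "job_id": "job_analyze",
  "output_transform": "result.summary"
}
```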
Payload Merging
The payload for a step is constructed by merging three layers:
- Trigger Payload: The data provided when the workflow was started.
- Step Payload: Static data defined in the workflow step configuration.
- Parent Outputs: The results (optionally transformed) of all direct dependencies.
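To make the layering concrete (values are illustrative, and it is assumed here that parent outputs are namespaced under a parent_outputs key, consistent with the template syntax above): given a trigger payload of {"env": "prod"}, a step payload of {"batch_size": 100}, and a parent step extract that output {"rows": 5000}, the step would receive roughly:

```json
{
  "env": "prod",
  "batch_size": 100,
  "parent_outputs": {
    "extract": { "rows": 5000 }
  }
}
```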
Failure Policies
The on_failure field determines the workflow's behavior when a step fails:
- fail_workflow (default): The entire workflow run transitions to failed.
- skip_dependents: The failed step's dependents are marked as skipped, but other branches continue.
- continue: The failure is ignored, and dependents proceed as if it succeeded.
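For instance, a best-effort metrics step whose failure should not block the rest of the run might be declared as (a sketch; field names follow the documentation above):

```json
{
  "step_ref": "emit-metrics",
  "job_id": "job_metrics",
  "depends_on": ["transform"],
  "on_failure": "continue"
}
```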
Concurrency Control
- max_concurrent_runs: Limits how many instances of the workflow can run simultaneously.
- max_parallel_steps: Limits the number of steps that can execute in parallel within a single workflow run.
- concurrency_key (step-level): Serializes steps that share the same key, even if they are otherwise ready to run.
- resource_class (step-level): Assigns a step to a capacity bucket (small, medium, or large). The scheduler respects per-class capacity limits when determining which steps to start.
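A sketch combining workflow-level and step-level controls (field placement is assumed; names follow the list above):

```json
{
  "name": "nightly-etl",
  "max_concurrent_runs": 1,
  "max_parallel_steps": 4,
  "steps": [
    {
      "step_ref": "load-warehouse",
      "job_id": "job_load",
      "concurrency_key": "warehouse-writes",
      "resource_class": "large"
    }
  ]
}
```

Here at most one run of nightly-etl executes at a time, no more than four of its steps run in parallel, and any steps sharing the warehouse-writes key are serialized.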
Sub-workflow Nesting
Workflows can trigger other workflows using the sub_workflow step type.
- Depth Limit: The engine enforces a default maximum nesting depth of 10 to prevent infinite recursion.
- Propagation: When a sub-workflow completes, its outputs are aggregated and returned to the parent step.
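A minimal sketch of a nested step (the sub_workflow_id field name is an assumption):

```json
{
  "step_ref": "provision",
  "type": "sub_workflow",
  "sub_workflow_id": "wf_provision_tenant",
  "depends_on": ["validate"]
}
```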
Workflow Policies
Project-level governance controls that apply to all workflows in a project:
| Policy | Description |
|---|---|
| max_fan_out | Maximum number of direct dependents a single step can have. Prevents explosion of parallelism. |
| max_depth | Maximum DAG depth (longest path from root to leaf). Limits nesting complexity. |
| forbidden_step_types | List of step types that cannot be used in workflows for this project. |
| require_approval_for_deploy | When true, requires an approval step before any deploy-type step. |
Policies are managed via GET /v1/workflow-policies/{projectID} and PUT /v1/workflow-policies/{projectID}. They are enforced at workflow create, update, and trigger time.
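An illustrative PUT /v1/workflow-policies/{projectID} body using the policy fields above (the values are examples, not defaults):

```json
{
  "max_fan_out": 20,
  "max_depth": 15,
  "forbidden_step_types": ["sub_workflow"],
  "require_approval_for_deploy": true
}
```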
Step Overrides
At trigger time, you can provide step_overrides to selectively enable or disable individual steps for that run.
For previewing the resulting execution order (after overrides), use POST /v1/workflows/{workflowID}/plan. This endpoint accepts optional step_overrides, applies them to the current workflow version, validates the resulting DAG, and returns the topological execution order, root steps, and step count.
For validating a DAG structure without persisting changes, use POST /v1/workflows/{workflowID}/dry-run. This accepts an optional steps array and returns whether the DAG is valid.
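As a sketch (the shape of each override object is an assumption), a plan request that previews the run with one step disabled might look like:

```json
{
  "step_overrides": [
    { "step_ref": "load-staging", "enabled": false }
  ]
}
```

POST this body to /v1/workflows/{workflowID}/plan to see the resulting topological order before triggering the run.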
Runtime Introspection and Recovery
Beyond definition and execution, workflow runs expose runtime control APIs:
- GET /v1/workflow-runs/{workflowRunID}/graph: execution graph, runnable set, and critical-path estimate.
- GET /v1/workflow-runs/{workflowRunID}/explain: paginated scheduler/condition decision log.
- POST /v1/workflow-runs/{workflowRunID}/steps/{stepRef}/retry: retry a single terminal step.
- POST /v1/workflow-runs/{workflowRunID}/steps/{stepRef}/replay-subtree: replay a branch from a selected step.
Version analysis APIs:
- GET /v1/workflows/{workflowID}/versions/{fromVersionID}/diff/{toVersionID}
- GET /v1/workflows/{workflowID}/versions/{versionID}/impact
- POST /v1/workflows/{workflowID}/simulate
These endpoints are intended for operations and incident response, not only workflow authoring. For detailed endpoint payloads and operational runbooks, see DAG Runtime and DAG Operations Playbook.
Workflow FSMs
Workflow Run FSM
```
┌─────────┐
│ pending │
└────┬────┘
     │ start
     v
┌─────────┐   pause    ┌────────┐
│ running │ ─────────> │ paused │
│         │ <───────── │        │
└────┬────┘   resume   └────────┘
     │
┌────┴─────┬──────────┬──────────┐
v          v          v          v
completed  failed     timed_out  canceled
```

Step Run FSM
```
               ┌─────────┐
               │ pending │
               └────┬────┘
                    │
     ┌──────────────┼──────────────┐
     v              v              v
┌─────────┐    ┌─────────┐    ┌─────────┐
│ waiting │    │ skipped │    │ canceled│
└────┬────┘    └─────────┘    └─────────┘
     │ start
     v
┌─────────┐
│ running │
└────┬────┘
     │
┌────┴─────┐
v          v
completed  failed
```

Event-Driven Steps
wait_for_event
Pauses the workflow until an external event arrives via API. The step creates an Event Trigger with a globally unique event_key and optional timeout. No goroutines are held — the wait is a database row.
```json
{
  "step_ref": "aml-check",
  "type": "wait_for_event",
  "event_key": "aml:{{payload.user_id}}",
  "timeout_secs": 86400,
  "depends_on": ["extract"]
}
```

When the event arrives (via POST /v1/events/{eventKey}/send), the step completes with the event payload as output, and downstream steps can access it via {{parent_outputs.aml-check.*}}.
sleep
Durable sleep step that pauses for a specified duration:
```json
{
  "step_ref": "cooldown",
  "type": "sleep",
  "sleep_duration": "1h",
  "depends_on": ["notify"]
}
```

Internally, this creates a sleep-type event trigger. The reaper completes it when the duration expires.
Event Chaining
Steps can auto-emit events on completion using event_emit_key:
```json
{
  "step_ref": "process",
  "job_id": "job_process",
  "event_emit_key": "done:{{payload.batch_id}}"
}
```

When this step completes, it auto-resolves any waiting trigger with key done:{batch_id}, enabling cross-workflow coordination.
Step-Level Retries
Each step can have its own retry configuration (retry_max_attempts, retry_backoff), allowing transient failures in one part of the DAG to be resolved without restarting the entire workflow.
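For example (the retry_backoff duration format is an assumption), a step calling a flaky external service might be configured as:

```json
{
  "step_ref": "fetch-remote",
  "job_id": "job_fetch",
  "depends_on": ["prepare"],
  "retry_max_attempts": 3,
  "retry_backoff": "30s"
}
```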