Coordinating complex tasks with Directed Acyclic Graphs (DAGs).
Workflows allow you to orchestrate multiple jobs, approvals, and sub-workflows in a structured DAG. They handle dependencies, data flow, and conditional execution.
DAG Concepts
A workflow is defined as a Directed Acyclic Graph (DAG) where:
- Steps: The nodes of the graph.
- Dependencies: The edges of the graph, defined by the depends_on field.
- Fan-in/Fan-out: Multiple steps can depend on one step (fan-out), or one step can depend on many (fan-in).
Validation guardrail: a workflow can include up to 1000 steps.
For implementation-level behavior (scheduler decisions, explainability APIs, critical-path estimates, policy enforcement, and replay controls), see DAG Runtime.
Step Types
| Type | Description |
|---|---|
| job | Executes a standard Job. |
| approval | A human-in-the-loop gate that waits for manual approval or rejection. |
| sub_workflow | Triggers another workflow as a nested step. |
| wait_for_event | Pauses until an external event arrives via API. See Event Triggers. |
| sleep | Durable sleep: pauses for a specified duration without holding goroutines. |
Workflow Versioning
Workflows use a snapshotting mechanism. Every time a workflow's steps are updated, the version increments. The entire DAG definition is copied to a versioned table, ensuring that running workflows are not disrupted by changes to the definition.
Step Conditions
Steps can define a condition (JSON) that must evaluate to true before the step starts.
- step_status: Check the status of a dependency (e.g., completed, failed).
- step_status_in: Check a dependency against a set of statuses.
- not: Invert a nested condition.
- Composite Logic: Use all_of or any_of for complex boolean logic.
If a condition is false, the step is marked as skipped.
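As a sketch built from the operators above (the exact nesting of the condition schema is an assumption), a cleanup step that should run only when either of its parents failed might be declared as:

```json
{
  "step_ref": "cleanup",
  "job_id": "job_cleanup",
  "depends_on": ["extract", "transform"],
  "condition": {
    "any_of": [
      { "step_status": { "step": "extract", "status": "failed" } },
      { "step_status": { "step": "transform", "status": "failed" } }
    ]
  }
}
```

If neither parent fails, the condition evaluates to false and cleanup is marked as skipped.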
Data Flow and Templates
Workflows support dynamic data flow between steps using template variables.
Template Variable Rendering
Use {{var}} syntax with dot notation to access data:
- {{payload.key}}: Access the initial workflow trigger payload.
- {{parent_outputs.step_ref.key}}: Access the output of a specific parent step.
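For example (the step refs and payload keys here are illustrative), a notification step could combine the trigger payload with a parent step's output:

```json
{
  "step_ref": "notify",
  "job_id": "job_notify",
  "depends_on": ["score"],
  "payload": {
    "user_id": "{{payload.user_id}}",
    "risk_level": "{{parent_outputs.score.risk}}"
  }
}
```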
Output Transforms
Steps can define an output_transform using JSONPath (via gjson). This allows a step to export only a specific subset of its result to downstream steps.
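A hypothetical example: if a step's raw result is a large object, a gjson path can export just one subtree to downstream steps. Here only the result.summary subtree would be visible to dependents:

```json
{
  "step_ref": "analyze",
  "job_id": "job_analyze",
  "output_transform": "result.summary"
}
```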
Payload Merging
The payload for a step is constructed by merging three layers:
- Trigger Payload: The data provided when the workflow was started.
- Step Payload: Static data defined in the workflow step configuration.
- Parent Outputs: The results (optionally transformed) of all direct dependencies.
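To make the layering concrete (values are illustrative, and it is assumed here that parent outputs are namespaced under a parent_outputs key, consistent with the template syntax above): given a trigger payload of {"env": "prod"}, a step payload of {"batch_size": 100}, and a parent step extract that output {"rows": 5000}, the step would receive roughly:

```json
{
  "env": "prod",
  "batch_size": 100,
  "parent_outputs": {
    "extract": { "rows": 5000 }
  }
}
```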
Failure Policies
The on_failure field determines the workflow's behavior when a step fails:
- fail_workflow (default): The entire workflow run transitions to failed.
- skip_dependents: The failed step's dependents are marked as skipped, but other branches continue.
- continue: The failure is ignored, and dependents proceed as if it succeeded.
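For instance, a best-effort metrics step whose failure should not block the rest of the run might be declared as (a sketch; field names follow the documentation above):

```json
{
  "step_ref": "emit-metrics",
  "job_id": "job_metrics",
  "depends_on": ["transform"],
  "on_failure": "continue"
}
```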
Concurrency Control
- max_concurrent_runs: Limits how many instances of the workflow can run simultaneously.
- max_parallel_steps: Limits the number of steps that can execute in parallel within a single workflow run.
- concurrency_key (step-level): Serializes steps that share the same key, even if they are otherwise ready to run.
- resource_class (step-level): Assigns a step to a capacity bucket (small, medium, or large). The scheduler respects per-class capacity limits when determining which steps to start.
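A sketch combining workflow-level and step-level controls (field placement is assumed; names follow the list above):

```json
{
  "name": "nightly-etl",
  "max_concurrent_runs": 1,
  "max_parallel_steps": 4,
  "steps": [
    {
      "step_ref": "load-warehouse",
      "job_id": "job_load",
      "concurrency_key": "warehouse-writes",
      "resource_class": "large"
    }
  ]
}
```

Here at most one run of nightly-etl executes at a time, no more than four of its steps run in parallel, and any steps sharing the warehouse-writes key are serialized.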
Sub-workflow Nesting
Workflows can trigger other workflows using the sub_workflow step type.
- Depth Limit: The engine enforces a default maximum nesting depth of 10 to prevent infinite recursion.
- Propagation: When a sub-workflow completes, its outputs are aggregated and returned to the parent step.
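A minimal sketch of a nested step (the sub_workflow_id field name is an assumption):

```json
{
  "step_ref": "provision",
  "type": "sub_workflow",
  "sub_workflow_id": "wf_provision_tenant",
  "depends_on": ["validate"]
}
```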
Workflow Policies
Project-level governance controls that apply to all workflows in a project:
| Policy | Description |
|---|---|
| max_fan_out | Maximum number of direct dependents a single step can have. Prevents explosion of parallelism. |
| max_depth | Maximum DAG depth (longest path from root to leaf). Limits nesting complexity. |
| forbidden_step_types | List of step types that cannot be used in workflows for this project. |
| require_approval_for_deploy | When true, requires an approval step before any deploy-type step. |
Policies are managed via GET /v1/workflow-policies/{projectID} and PUT /v1/workflow-policies/{projectID}. They are enforced at workflow create, update, and trigger time.
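An illustrative PUT /v1/workflow-policies/{projectID} body using the policy fields above (the values are examples, not defaults):

```json
{
  "max_fan_out": 20,
  "max_depth": 15,
  "forbidden_step_types": ["sub_workflow"],
  "require_approval_for_deploy": true
}
```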
Step Overrides
At trigger time, you can provide step_overrides to selectively enable or disable individual steps for that run.
For previewing the resulting execution order (after overrides), use POST /v1/workflows/{workflowID}/plan. This endpoint accepts optional step_overrides, applies them to the current workflow version, validates the resulting DAG, and returns the topological execution order, root steps, and step count.
For validating a DAG structure without persisting changes, use POST /v1/workflows/{workflowID}/dry-run. This accepts an optional steps array and returns whether the DAG is valid.
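As a sketch (the shape of each override object is an assumption), a plan request that previews the run with one step disabled might look like:

```json
{
  "step_overrides": [
    { "step_ref": "load-staging", "enabled": false }
  ]
}
```

POST this body to /v1/workflows/{workflowID}/plan to see the resulting topological order before triggering the run.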
Runtime Introspection and Recovery
Beyond definition and execution, workflow runs expose runtime control APIs:
- GET /v1/workflow-runs/{workflowRunID}/graph: execution graph, runnable set, and critical-path estimate.
- GET /v1/workflow-runs/{workflowRunID}/explain: paginated scheduler/condition decision log.
- POST /v1/workflow-runs/{workflowRunID}/steps/{stepRef}/retry: retry a single terminal step.
- POST /v1/workflow-runs/{workflowRunID}/steps/{stepRef}/replay-subtree: replay a branch from a selected step.
Version analysis APIs:
- GET /v1/workflows/{workflowID}/versions/{fromVersionID}/diff/{toVersionID}
- GET /v1/workflows/{workflowID}/versions/{versionID}/impact
- POST /v1/workflows/{workflowID}/simulate
These endpoints are intended for operations and incident response, not only workflow authoring. For detailed endpoint payloads and operational runbooks, see DAG Runtime and DAG Operations Playbook.
Workflow FSMs
Workflow Run FSM
```
┌─────────┐
│ pending │
└────┬────┘
     │ start
     v
┌─────────┐   pause    ┌────────┐
│ running │ ─────────> │ paused │
│         │ <───────── │        │
└────┬────┘   resume   └────────┘
     │
┌────┴─────┬──────────┬──────────┐
v          v          v          v
completed  failed     timed_out  canceled
```

Step Run FSM
```
               ┌─────────┐
               │ pending │
               └────┬────┘
                    │
     ┌──────────────┼──────────────┐
     v              v              v
┌─────────┐    ┌─────────┐    ┌─────────┐
│ waiting │    │ skipped │    │ canceled│
└────┬────┘    └─────────┘    └─────────┘
     │ start
     v
┌─────────┐
│ running │
└────┬────┘
     │
┌────┴─────┐
v          v
completed  failed
```

Event-Driven Steps
wait_for_event
Pauses the workflow until an external event arrives via API. The step creates an Event Trigger with a globally unique event_key and optional timeout. No goroutines are held — the wait is a database row.
```json
{
  "step_ref": "aml-check",
  "type": "wait_for_event",
  "event_key": "aml:{{payload.user_id}}",
  "timeout_secs": 86400,
  "depends_on": ["extract"]
}
```

When the event arrives (via POST /v1/events/{eventKey}/send), the step completes with the event payload as output, and downstream steps can access it via {{parent_outputs.aml-check.*}}.
sleep
Durable sleep step that pauses for a specified duration:
```json
{
  "step_ref": "cooldown",
  "type": "sleep",
  "sleep_duration": "1h",
  "depends_on": ["notify"]
}
```

Internally, this creates a sleep-type event trigger. The reaper completes it when the duration expires.
Event Chaining
Steps can auto-emit events on completion using event_emit_key:
```json
{
  "step_ref": "process",
  "job_id": "job_process",
  "event_emit_key": "done:{{payload.batch_id}}"
}
```

When this step completes, it auto-resolves any waiting trigger with key done:{batch_id}, enabling cross-workflow coordination.
Step-Level Retries
Each step can have its own retry configuration (retry_max_attempts, retry_backoff), allowing transient failures in one part of the DAG to be resolved without restarting the entire workflow.
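For example (the retry_backoff duration format is an assumption), a step calling a flaky external service might be configured as:

```json
{
  "step_ref": "fetch-remote",
  "job_id": "job_fetch",
  "depends_on": ["prepare"],
  "retry_max_attempts": 3,
  "retry_backoff": "30s"
}
```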