Strait is designed as a distributed job orchestration service that uses PostgreSQL as its primary state store and message queue.
System Overview
Strait ships as a single binary that runs in one of three modes:
- api: Handles HTTP requests, job management, and triggering.
- worker: Runs the executor, scheduler, and background maintenance tasks.
- all: Runs both API and worker components in a single process.
All modes coordinate shutdown through errgroup and signal handling to ensure in-flight jobs are completed and resources are released cleanly.
Component Architecture
The core logic resides in the internal/ directory, organized into specialized packages.
api
Implements Chi HTTP router, middleware chain, and authentication (Internal Secret + Run Token). Handles SSRF validation, health scoring, debug bundles, and DLQ management.
worker
Handles pond/v2 executor pool with bounded queue backpressure, HTTP dispatch, smart retries, and state transitions. Exposes OTel metrics for pool observability.
workflow
Manages DAG execution, Kahn’s algorithm validation, atomic fan-in logic, step conditions, and sub-workflow nesting.
queue
Implements a Postgres-backed queue using SELECT FOR UPDATE SKIP LOCKED for atomic job claiming and batch dequeue operations.
store
Provides raw SQL access via pgx/v5 with instrumented OTel spans, optimistic locking, and interface segregation.
scheduler
Background components for Cron, delayed polling, stale run reaping, retention, and cost budget enforcement.
compute
Container runtime abstraction for managed execution: Fly Machines and Docker backends, a warm machine pool, machine lifecycle (Create, Start, Wait, Stop, Destroy), and cost estimation.
clickhouse
Optional analytics backend. Async batch export of run events, analytics summaries, and compute usage to ClickHouse for time-series reporting.
webhook
Async webhook delivery with retry policies, circuit breaker, HMAC-SHA256 signing, and dead letter tracking.
cdc
Change data capture via Sequin. Polls database changes for real-time event streaming to external consumers.
Technology Stack
| Component | Technology | Role |
|---|---|---|
| Runtime | Go 1.26 | Single binary, native concurrency, instant cold start |
| Database | PostgreSQL 18 | State store + message queue via SELECT FOR UPDATE SKIP LOCKED |
| Cache/PubSub | Redis 8 | Pub/sub for SSE streaming, CDC event distribution |
| SQL Driver | pgx/v5 | Raw SQL with connection pooling and prepared statements |
| HTTP Router | Chi/v5 | Composable middleware, low-allocation request handling |
| Concurrency | sourcegraph/conc | Panic-safe goroutine pools with context-aware cancellation |
| Worker Pool | alitto/pond/v2 | Bounded queue backpressure with Prometheus metrics |
| CDC | Sequin | Postgres WAL streaming through Redis channels |
| Observability | OpenTelemetry | Vendor-neutral tracing (Jaeger/Tempo) and Prometheus metrics |
| Auth Tokens | golang-jwt/v5 | HS256 JWT for SDK run tokens (60s expiry) |
| Scheduling | robfig/cron/v3 | 5-field cron expressions with timezone support |
| Migrations | golang-migrate/v4 | Embedded SQL migrations via go:embed |
| Utilities | samber/lo | Type-safe generic collection operations |
| Testing | google/go-cmp, testcontainers-go | Structural diffs, real Postgres/Redis in integration tests |
Queue Mechanics
Strait uses PostgreSQL as a message queue to minimize operational complexity and ensure transactional consistency.
SKIP LOCKED Dequeue
Multiple workers can poll the same job_runs table concurrently. SELECT FOR UPDATE SKIP LOCKED locks only the rows that will be processed by this worker and skips rows already locked by other workers. This provides:
- Lock-free concurrency: Workers don’t block each other. Each worker claims rows it can process immediately.
- Exactly-once semantics: A row locked by one worker cannot be claimed by another, preventing duplicate job execution.
- ACID guarantees: The dequeue and status update happen in a single transaction. If the worker crashes after claiming but before processing, the transaction rolls back and the run returns to queued state.
Batch Dequeue (DequeueN)
Using a Common Table Expression (CTE), workers can claim up to N rows in a single database round-trip. The CTE selects rows with SKIP LOCKED and returns their IDs, then the outer UPDATE transitions them to dequeued. This amortizes network latency across multiple jobs (see the sketch after the gating rules below).
Priority and Ordering
The queue respects two ordering dimensions:
- Priority: Jobs with higher priority values are dequeued first (ORDER BY priority DESC).
- FIFO within priority: Jobs with the same priority follow first-in-first-out ordering (ORDER BY created_at ASC).
Delayed and Retry Gating
The dequeue query includes filters on the scheduled_at and next_retry_at columns. Jobs are only dequeued when:
- scheduled_at IS NULL OR scheduled_at <= NOW(): Delayed jobs wait until their scheduled time.
- next_retry_at IS NULL OR next_retry_at <= NOW(): Retrying jobs wait until their backoff period expires.
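Putting these pieces together, a batch dequeue might look like the following sketch. The table and column names mirror those described above, but the exact query is illustrative, not Strait's source.

```go
package queue

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// Illustrative CTE-based batch dequeue: claim up to n runs, honoring
// priority ordering and the scheduled_at/next_retry_at gates, in one
// round-trip.
const dequeueNSQL = `
WITH claimed AS (
    SELECT id
    FROM job_runs
    WHERE status = 'queued'
      AND (scheduled_at IS NULL OR scheduled_at <= NOW())
      AND (next_retry_at IS NULL OR next_retry_at <= NOW())
    ORDER BY priority DESC, created_at ASC
    LIMIT $1
    FOR UPDATE SKIP LOCKED
)
UPDATE job_runs r
SET status = 'dequeued'
FROM claimed c
WHERE r.id = c.id
RETURNING r.id`

// DequeueN claims up to n runs in a single statement.
func DequeueN(ctx context.Context, db *pgxpool.Pool, n int) ([]string, error) {
	rows, err := db.Query(ctx, dequeueNSQL, n)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```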
Visibility Timeout
The transition from queued to dequeued acts as the visibility timeout mechanism. If a worker crashes after claiming but before processing, the stale reaper identifies runs stuck in the dequeued state with no recent heartbeat and transitions them back to queued for reprocessing.
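A requeue pass of this kind can be expressed as a single bounded statement; the 60-second cutoff, batch size, and column names below are illustrative assumptions.

```go
package scheduler

// Sketch of the stale reaper's requeue. Bounded via a CTE because a
// Postgres UPDATE has no LIMIT clause of its own.
const requeueStaleSQL = `
WITH stale AS (
    SELECT id
    FROM job_runs
    WHERE status = 'dequeued'
      AND heartbeat_at < NOW() - INTERVAL '60 seconds'
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
)
UPDATE job_runs r
SET status = 'queued'
FROM stale s
WHERE r.id = s.id`
```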
Finite State Machine (FSM)
The lifecycle of a job run is managed by an FSM with 13 possible states.
Valid Transitions
delayed → queued, canceled, expired
Delayed jobs move to queued when scheduled time arrives. Can be canceled or expired before execution.
queued → dequeued, canceled, expired
Dequeued by worker via SKIP LOCKED, or manually canceled, or expired if TTL elapses.
dequeued → executing, queued, canceled, system_failed
Starts execution, returns to the queue if the worker fails, or transitions to system_failed on a system failure during dispatch.
executing → completed, failed, timed_out, crashed, canceled, waiting, queued, system_failed, dead_letter
Terminal states indicate job outcome. Waiting state for SDK checkpoints. Queued for retry if attempt < max_attempts. Dead letter if all retries exhausted.
waiting → executing, completed, failed, canceled, timed_out
Waiting state for SDK-controlled execution (checkpoints, continuation). Returns to executing or terminal.
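One compact way to encode this table is a map keyed by source state. The snippet below is an illustration built from the transitions listed above, not Strait's source.

```go
package fsm

// An illustrative encoding of the transition table above.
var validTransitions = map[string][]string{
	"delayed":   {"queued", "canceled", "expired"},
	"queued":    {"dequeued", "canceled", "expired"},
	"dequeued":  {"executing", "queued", "canceled", "system_failed"},
	"executing": {"completed", "failed", "timed_out", "crashed", "canceled", "waiting", "queued", "system_failed", "dead_letter"},
	"waiting":   {"executing", "completed", "failed", "canceled", "timed_out"},
}

// canTransition is the guard an FSM might run before issuing the UPDATE.
func canTransition(from, to string) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```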
Optimistic Locking
State transitions use optimistic locking with a WHERE status = $from clause in the UPDATE statement. Multiple workers can attempt the same transition simultaneously, but only one will match the row and succeed. The others receive a 0-row update result and must retry the entire transition. This prevents race conditions without distributed locks or version numbers.
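A minimal sketch of such a guarded transition with pgx, assuming illustrative column names:

```go
package store

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// Guarded transition: WHERE status = $from makes concurrent attempts race
// safely; zero rows affected means another worker won and the caller retries.
const transitionSQL = `
UPDATE job_runs
SET status = $3
WHERE id = $1 AND status = $2`

func transition(ctx context.Context, db *pgxpool.Pool, runID, from, to string) (bool, error) {
	tag, err := db.Exec(ctx, transitionSQL, runID, from, to)
	if err != nil {
		return false, err
	}
	return tag.RowsAffected() == 1, nil
}
```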
Authentication
The system implements two distinct authentication schemes:
- Internal Secret: Used for the management API (/v1/*). Requires an Authorization: Bearer <INTERNAL_SECRET> header. Simple string comparison provides fast auth for job CRUD, triggering, and run management.
- Run Token: Short-lived HS256 JWT (60-second expiry) generated per run and passed to the SDK via the X-SDK-Token header.
Execution Lifecycle
Trigger
API receives POST to /v1/jobs/{id}/trigger. A cost budget check is performed, a job_run is created with status=queued or delayed, and a JWT run token is generated.
Dequeue
Worker calls DequeueN, claiming runs via SKIP LOCKED and updating their status to dequeued. Optimistic locking ensures only one worker claims each run.
Execute
Execution is submitted to the pond/v2 worker pool using context.WithoutCancel, ensuring in-flight jobs complete even if the Strait process shuts down. A background goroutine sends heartbeats to job_runs.heartbeat_at.
Resolve Environment
If the job has an environment_id, the executor resolves environment variables, checks for an ENDPOINT_URL override, and validates it with SSRF protection (blocks private/loopback IPs).
Dispatch
The executor sends an HTTP POST to the resolved endpoint URL. Headers include X-Run-ID, X-Job-ID, X-Attempt, and X-SDK-Token. The body includes the job payload and metadata.
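A sketch of that request is shown below; the header names come from this page, while the function shape is an assumption.

```go
package worker

import (
	"bytes"
	"context"
	"net/http"
	"strconv"
)

// dispatch sends the job payload to the resolved endpoint with the headers
// documented above.
func dispatch(ctx context.Context, client *http.Client, endpoint, runID, jobID, token string, attempt int, body []byte) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Run-ID", runID)
	req.Header.Set("X-Job-ID", jobID)
	req.Header.Set("X-Attempt", strconv.Itoa(attempt))
	req.Header.Set("X-SDK-Token", token)
	return client.Do(req)
}
```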
Execution Tracing
A timing breakdown is captured (queue_wait_ms, dequeue_ms, connect_ms, ttfb_ms, transfer_ms, total_ms) and stored as JSONB in the execution_trace column.
Result Handling
2xx response → status=completed, store result. Non-2xx → schedule retry using job’s retry_strategy or mark failed. Timeout → retry or timed_out. Adaptive timeout adjustment applied.
DLQ
If attempt >= max_attempts, the run transitions to dead_letter instead of failed. Dead-lettered runs are excluded from normal listings.
Webhook
Upon reaching a terminal state, a signed POST is sent to the job’s webhook_url. Job-run webhooks are retried up to 3 times with exponential backoff (1s, 5s delays). Event-trigger webhooks use persistent delivery via the webhook_deliveries table with up to 5 retries and exponential backoff, surviving process restarts. See Webhooks for the full delivery lifecycle.
Workflow Engine
The workflow engine manages execution of Directed Acyclic Graphs (DAGs) where each node represents a job execution, human approval, or nested sub-workflow.
DAG Validation
Before execution, the workflow definition is validated using Kahn’s algorithm for cycle detection. The algorithm:
- Identifies all steps with no dependencies (roots).
- Removes roots from the graph and identifies newly exposed roots.
- Repeats until all steps are processed.
If any steps remain unprocessed, they sit on a cycle and the definition is rejected.
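A compact sketch of that check, with steps keyed by ref and deps naming parent refs (shapes assumed):

```go
package workflow

// hasCycle runs Kahn's algorithm: peel off roots layer by layer; any step
// never exposed as a root is part of a cycle.
func hasCycle(deps map[string][]string) bool {
	indegree := map[string]int{}
	children := map[string][]string{}
	for step, parents := range deps {
		if _, ok := indegree[step]; !ok {
			indegree[step] = 0
		}
		for _, p := range parents {
			indegree[step]++
			children[p] = append(children[p], step)
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
		}
	}
	// Seed the queue with roots (steps that have no dependencies).
	var queue []string
	for step, d := range indegree {
		if d == 0 {
			queue = append(queue, step)
		}
	}
	processed := 0
	for len(queue) > 0 {
		step := queue[0]
		queue = queue[1:]
		processed++
		// Removing a root exposes children whose last dependency it was.
		for _, child := range children[step] {
			indegree[child]--
			if indegree[child] == 0 {
				queue = append(queue, child)
			}
		}
	}
	return processed != len(indegree)
}
```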
Atomic Fan-In
When multiple parent steps complete concurrently, their child step must start only when all dependencies are satisfied. The engine uses a deps_completed counter on each workflow_step_run. As each parent completes, an UPDATE ... RETURNING query atomically increments the counter and identifies runs where deps_completed == deps_required. This ensures exactly one parent triggers the child, even with concurrent completions.
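The atomic increment can be done in one statement; this sketch assumes illustrative column names:

```go
package workflow

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// The UPDATE ... RETURNING runs as a single statement, so exactly one
// parent observes the final increment and starts the child.
const fanInSQL = `
UPDATE workflow_step_runs
SET deps_completed = deps_completed + 1
WHERE id = $1
RETURNING deps_completed, deps_required`

func onParentCompleted(ctx context.Context, db *pgxpool.Pool, stepRunID string) (ready bool, err error) {
	var completed, required int
	if err := db.QueryRow(ctx, fanInSQL, stepRunID).Scan(&completed, &required); err != nil {
		return false, err
	}
	return completed == required, nil
}
```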
Step-level concurrency groups (concurrency_key) are enforced during scheduling. Steps that share the same key are serialized even when dependency fan-in says they are ready.
Payload Merging
Step payloads are constructed by merging three layers:
- Trigger payload: Initial payload provided when the workflow is triggered.
- Step payload: Static payload defined on the step in the workflow definition.
- Parent outputs: Outputs from all direct dependency steps, keyed by step_ref.
Template placeholders like {{parent_outputs.stepRef.field}} allow accessing nested fields. The engine preserves native JSON types—arrays, objects, strings—so no manual serialization is needed.
Template Rendering
The template engine renders {{variable}} placeholders using dot notation. Example: {{parent_outputs.extractData.id}} accesses the id field of the extractData object from a parent step’s output. This provides flexible data flow while maintaining type safety.
Output Transform
Steps can define an output_transform using JSONPath expressions. Before persisting a step’s output, the engine extracts the matching subset of the job result. This reduces storage overhead and simplifies downstream steps that only need specific fields. Example: $.data.items[*].id extracts all IDs from an array.
Step Types
- job: Executes a regular job. The workflow creates a new job_run and waits for it to reach a terminal state before marking the step as completed.
- approval
- wait_for_event
- sleep
- sub_workflow
Workflow FSM
The workflow engine manages two FSMs—one for workflow runs, one for step runs.
- Workflow Run States: pending → running → paused → (completed | failed | timed_out | canceled)
- Step Run States: pending → (waiting | skipped | canceled) → running → (completed | failed)
The callback handler OnJobRunTerminal drives workflow progression when any job run completes. This single entry point handles all terminal states (completed, failed, timed_out, crashed, canceled, system_failed, dead_letter) and determines the next workflow actions—start ready children, apply failure policies, propagate sub-workflow outputs. Event-driven callbacks OnEventReceived, OnStepCompleted, and OnStepFailed handle event trigger resolution, sleep completion, and timeout/cancellation failures respectively. Event chaining is handled by tryEmitEvent, which auto-resolves waiting triggers when steps with event_emit_key complete.
Policy Enforcement and Governance
Workflow governance is managed with per-project policies stored in workflow_policies and exposed via GET|PUT /v1/workflow-policies/{projectID}.
Policy checks run at workflow lifecycle boundaries:
- Create (POST /v1/workflows)
- Update (PATCH /v1/workflows/{workflowID})
- Trigger (POST /v1/workflows/{workflowID}/trigger)
Supported policies:
- max_fan_out: maximum number of direct dependents per step
- max_depth: maximum DAG depth (computed from dependency chains)
- forbidden_step_types: denylisted step types for the project
- require_approval_for_deploy: requires at least one approval step for deploy/release-like DAGs
Scheduling Decisions and Explainability
The progression scheduler records key routing decisions in workflow_step_decisions for post-mortem and operator visibility. Decisions include:
- scheduler: blocked by workflow max_parallel_steps
- concurrency: blocked by concurrency_key serialization
- resource: blocked by resource_class quota
- condition: skipped because the condition evaluated to false
Decisions are queryable via GET /v1/workflow-runs/{workflowRunID}/explain with optional step_ref and decision_type filters.
Runtime Graph and Critical Path Estimation
GET /v1/workflow-runs/{workflowRunID}/graph exposes the live DAG execution graph for a run, including:
- node execution state (pending|waiting|running|completed|failed|...)
- dependency edges and roots
- runnable set (deps satisfied but not running)
- critical-path estimate fields: critical_path, critical_path_estimate_ms, critical_path_remaining_ms
Runtime Recovery Controls
Operators can recover or replay portions of a DAG without retriggering the entire workflow:
- POST /v1/workflow-runs/{workflowRunID}/steps/{stepRef}/retry resets a terminal step to pending and resumes progression.
- POST /v1/workflow-runs/{workflowRunID}/steps/{stepRef}/replay-subtree resets the selected step and all descendants to pending, then resumes progression.
Data Model
The system uses PostgreSQL with the following primary tables (66 migrations):
- jobs
- job_runs
- workflows & workflow_runs
- workflow_steps & workflow_step_runs
- workflow governance and explainability tables
- event_triggers
Key Indexes
Partial and composite indexes ensure query performance at scale:
- idx_runs_queue: (status, next_retry_at, priority DESC, created_at ASC) WHERE status = 'queued' — composite dequeue lookup covering priority ordering and retry gating
- idx_runs_priority: (priority DESC, created_at ASC) WHERE status = 'queued' — priority-ordered FIFO fallback
- idx_runs_idempotency: UNIQUE (job_id, idempotency_key) WHERE idempotency_key IS NOT NULL — deduplication
- idx_runs_heartbeat: (heartbeat_at) WHERE status = 'executing' — stale run detection
- idx_runs_retry: (next_retry_at) WHERE status = 'queued' AND next_retry_at IS NOT NULL — retry gating
- idx_runs_dead_letter: (status) WHERE status = 'dead_letter' — DLQ queries
- idx_runs_continuation: (continuation_of) — lineage tree traversal
- idx_runs_debug: (debug_mode) WHERE debug_mode = TRUE — debug mode queries
- idx_event_triggers_status_expires: (status, expires_at) WHERE status = 'waiting' — reaper timeout lookup
- idx_event_triggers_project_status: (project_id, status) — project-scoped list queries
- idx_event_triggers_prefix: (event_key text_pattern_ops) — prefix matching for batch resolution
- idx_workflow_runs_status_expires: (status, expires_at) WHERE expires_at IS NOT NULL — reaper timeout lookup for workflow runs
- idx_workflow_runs_status_finished: (status, finished_at) WHERE finished_at IS NOT NULL — retention deletion queries
- idx_step_runs_workflow_run_status: (workflow_run_id, status) — step run queries filtered by run and status
- idx_step_decisions_step_run_id: (step_run_id) — CASCADE deletes on workflow step decisions
- idx_job_runs_active_by_job: (job_id) WHERE status IN ('queued', 'dequeued', 'executing') — active run counting at dequeue
Design Decisions
UUIDv7 Primary Keys
UUIDv7 combines a timestamp (first 48 bits) with random bits. This provides:
- Natural sort order: Rows are approximately ordered by creation time without a separate index column
- No sequence contention: Auto-increment integers create hotspots during concurrent inserts
- Distributed-friendly: Can be generated without coordination across multiple nodes
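For illustration, generating a UUIDv7 key in Go might look like this, assuming the github.com/google/uuid package (v1.6+); Strait's actual generator may differ.

```go
package store

import "github.com/google/uuid"

// newRunID returns a UUIDv7: a 48-bit millisecond timestamp prefix plus
// random bits, giving near-chronological ordering.
func newRunID() (string, error) {
	id, err := uuid.NewV7()
	if err != nil {
		return "", err
	}
	return id.String(), nil
}
```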
Raw SQL Over ORM
Hand-written SQL via pgx/v5 provides complete control over query execution plans. For operations like SELECT FOR UPDATE SKIP LOCKED and CTE-based batch dequeue, the exact query structure matters. ORMs may generate suboptimal queries or fail to support these PostgreSQL-specific features efficiently.
The trade-off is verbosity—SQL queries require manual maintenance—but the benefits include predictable performance, transparent optimization, and the ability to use database-specific features without fighting the abstraction layer.
Embedded Migrations
Using go:embed, migration files are compiled into the binary. This eliminates the need to bundle migration files separately or mount them in containers. The binary is truly self-contained—no external file dependencies at startup.
Migrations run automatically on startup, checking the schema_migrations table for the current version and applying pending migrations in a transaction. This ensures the database schema always matches the application code.
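A minimal sketch of this wiring with go:embed and golang-migrate's iofs source (paths illustrative):

```go
package db

import (
	"embed"
	"errors"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/postgres"
	"github.com/golang-migrate/migrate/v4/source/iofs"
)

//go:embed migrations/*.sql
var migrationsFS embed.FS

// Migrate applies any pending embedded migrations on startup.
func Migrate(databaseURL string) error {
	src, err := iofs.New(migrationsFS, "migrations")
	if err != nil {
		return err
	}
	m, err := migrate.NewWithSourceInstance("iofs", src, databaseURL)
	if err != nil {
		return err
	}
	if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
		return err
	}
	return nil
}
```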
Single Binary, Three Modes
Strait compiles to a single executable that can run in three modes:
- api: HTTP server only, for scaling API endpoints independently
- worker: Executor and scheduler only, for scaling job processing independently
- all: Combined mode for development or small deployments
context.WithoutCancel for In-flight Jobs
When a job run is dispatched, it uses a child context created with context.WithoutCancel(parent). This means if the parent context is canceled (Strait shutdown), the goroutine continues execution until the HTTP request completes. Combined with errgroup, this ensures graceful shutdown—in-flight jobs finish before the process exits.
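The core of the pattern is small; a sketch, assuming Go 1.21+ for context.WithoutCancel:

```go
package worker

import "context"

// dispatchAsync gives the job a context that drops parent cancellation but
// keeps parent values (trace metadata, etc.), so the job keeps running
// after the shutdown context is canceled.
func dispatchAsync(parent context.Context, run func(context.Context)) {
	jobCtx := context.WithoutCancel(parent)
	go run(jobCtx)
}
```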
Engine Capabilities
Engine capabilities described in this document are built in and available by default.
Micro-USD Cost Tracking
AI costs are tracked in micro-USD (1/1,000,000 USD) stored as BIGINT. This avoids floating-point precision issues that accumulate with repeated additions. Integer arithmetic provides exact financial calculations down to $0.000001 resolution.
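As a worked example (the rate below is made up for illustration):

```go
package cost

// Micro-USD arithmetic: exact integer math down to $0.000001.
const microUSDPerUSD int64 = 1_000_000

// $0.0025 per 1K tokens = 2,500 micro-USD per 1K tokens (illustrative rate).
// 80,000 tokens at that rate yields 200,000 micro-USD ($0.20), no float drift.
func tokenCostMicroUSD(tokens, microUSDPer1K int64) int64 {
	return tokens * microUSDPer1K / 1_000
}
```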
Dead Letter as FSM State
The DLQ is modeled as a first-class FSM state (dead_letter) rather than a separate table. This design choice:
- Allows standard run queries to include or exclude dead-lettered runs via status filter
- Enables DLQ replay using the same state transition mechanism as retries
- Maintains FSM invariants—all state transitions go through the same validation logic
Transactional Event Sends
When an event resolves a workflow step trigger, the trigger status update and step completion are wrapped in a single database transaction via runInTx. If either operation fails, both roll back atomically. This prevents the inconsistency where a trigger is marked received but the step remains waiting. The same pattern wraps workflow create/update/clone, SDK wait-for-event, and workflow run retry operations.
Fan-in progression (starting downstream steps) runs outside the transaction because it involves queue operations and multiple tables. The reconciliation reaper acts as a safety net for edge cases where progression fails.
Distributed Reaper with Advisory Locks
In multi-instance deployments, the reaper uses pg_try_advisory_lock to ensure single-leader execution. Each reaper cycle:
- Attempts to acquire the session-level advisory lock 0x5374726169745265 (“StraitRe”)
- If acquired: runs all reaper passes, then releases the lock
- If not acquired: skips the cycle entirely
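A sketch of the lock acquisition; note that session-level advisory locks are held by a connection, so this should run on a dedicated connection rather than a pooled one.

```go
package scheduler

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// "StraitRe" in ASCII, used as the bigint advisory lock key.
const reaperLockKey int64 = 0x5374726169745265

// tryLead reports whether this instance won leadership for the cycle.
func tryLead(ctx context.Context, conn *pgx.Conn) (bool, error) {
	var acquired bool
	err := conn.QueryRow(ctx, "SELECT pg_try_advisory_lock($1)", reaperLockKey).Scan(&acquired)
	return acquired, err
}

// release gives the lock back at the end of a successful cycle.
func release(ctx context.Context, conn *pgx.Conn) error {
	_, err := conn.Exec(ctx, "SELECT pg_advisory_unlock($1)", reaperLockKey)
	return err
}
```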
Interface Segregation
Store packages define small, focused interfaces (JobStore, RunStore, WorkflowStore, etc.) that only include methods needed by callers. This improves:
- Testability: Mocks only implement required methods, not entire store
- Decoupling: Changes to unrelated store methods don’t require recompiling callers
- Encapsulation: Internal store implementation details hidden from consumers
The WithTx helper accepts any DBTX interface (connection pool or transaction), enabling callers to write transaction-aware code without hard-coded transaction handling.
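For illustration, the shape of such interfaces might be as follows; the method sets are assumptions based on the operations described in this document.

```go
package store

import "context"

type Run struct{ ID string }
type Job struct{ ID string }

// Narrow, caller-focused interfaces instead of one giant store interface.
type RunStore interface {
	DequeueN(ctx context.Context, n int) ([]Run, error)
	Transition(ctx context.Context, runID, from, to string) (bool, error)
}

type JobStore interface {
	GetJob(ctx context.Context, jobID string) (Job, error)
}
```

A caller that only dequeues depends on RunStore alone, so its tests mock two methods instead of an entire store implementation.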
SSRF Validation in Both Layers
Endpoint URL validation happens in both the API server (job creation/update) and the worker (environment override resolution). This defense-in-depth prevents bypassing SSRF checks via environment variable injection. Blocked address ranges include:
- Private IPv4 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Loopback addresses (127.0.0.0/8, ::1/128)
- Link-local addresses (169.254.0.0/16)
- CGNAT addresses (100.64.0.0/10)
- IPv6 Unique Local Addresses (fc00::/7)
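A minimal validator over exactly those ranges can be built with net/netip. This sketch omits DNS resolution and redirect re-checking, which a production validator also needs.

```go
package ssrf

import (
	"fmt"
	"net/netip"
)

// The blocked ranges listed above.
var blocked = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("::1/128"),
	netip.MustParsePrefix("169.254.0.0/16"),
	netip.MustParsePrefix("100.64.0.0/10"),
	netip.MustParsePrefix("fc00::/7"),
}

// Validate rejects addresses that fall in any blocked range.
func Validate(addr netip.Addr) error {
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 as 10.0.0.1
	for _, p := range blocked {
		if p.Contains(addr) {
			return fmt.Errorf("address %s is in blocked range %s", addr, p)
		}
	}
	return nil
}
```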
Performance Optimizations
Workflow Context Caching
During step callback processing, the workflow engine loads the workflow run, step definitions, and step-by-ref index. These are immutable for the lifetime of a single callback. The wfCtx struct caches this data after the first load, eliminating 10-15 redundant database queries per step completion in multi-step workflows.
Transaction Safety via runInTx
Multi-step write operations (workflow creation, event trigger registration, retry runs) use a runInTx abstraction that wraps store calls in a database transaction. In production, this delegates to store.WithTx; in tests, it passes through to the mock store directly. This provides atomicity without requiring mock implementations of pgx.Tx (see the sketch after the list below).
Key operations wrapped in transactions:
- Workflow create/update/clone (version snapshot + steps)
- SDK wait-for-event (run status + event trigger creation)
- Workflow run retry (new run + step runs)
- Event send (trigger resolution + step completion)
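A sketch of the runInTx shape under those assumptions; none of this is Strait's exact source.

```go
package engine

import "context"

// runInTxFn abstracts transactional execution so callers never touch pgx.Tx.
type runInTxFn func(ctx context.Context, fn func(context.Context) error) error

// passthrough is the test wiring: no transaction, mocks are called directly.
func passthrough(ctx context.Context, fn func(context.Context) error) error {
	return fn(ctx)
}

// createWorkflow shows the call shape; production injects store.WithTx,
// which commits on nil and rolls back on error.
func createWorkflow(ctx context.Context, runInTx runInTxFn) error {
	return runInTx(ctx, func(ctx context.Context) error {
		// insert the version snapshot and its steps atomically here
		return nil
	})
}
```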
Bounded Reaper Queries
All reaper and poller queries include a LIMIT clause (default 1000) to prevent unbounded result sets during incident recovery or backlog processing. The retention worker uses CTE-based batch deletes with FOR UPDATE SKIP LOCKED (batch size 5000) instead of unbounded DELETE statements.
Reaper N+1 Elimination
The workflow timeout reaper previously issued 2+2N database calls per timed-out workflow run (fetching step runs individually, then canceling each). This was replaced with 4 bulk calls per run using the existing CancelNonTerminalStepRuns and CancelJobRunsByWorkflowRun methods.
Bulk Operation Batching
Bulk cancel (POST /v1/runs/bulk-cancel) was rewritten from 4N queries to 3 queries: batch fetch via GetRunsByIDs, batch cancel via BulkCancelRuns (single UPDATE with status filter), and batch child cancel via CancelChildRunsByParentIDs.
Bulk trigger (POST /v1/jobs/{jobID}/trigger/bulk) pre-computes project quota counts once before the loop instead of per-item, eliminating 2 redundant queries per item.
Graceful Shutdown
The shutdown sequence protects in-flight work:
Signal Capture
signal.NotifyContext captures SIGINT and SIGTERM. Context cancellation propagates through errgroup.
Worker Pool Shutdown
The worker polling loop exits on context cancellation. pool.Shutdown() blocks until all in-flight goroutines complete.
HTTP Server Shutdown
The server shuts down with a 10-second grace period to drain in-flight HTTP requests.
context.WithoutCancel for job dispatch ensures that jobs already executing when shutdown begins are allowed to complete and record their results. This prevents partial execution and data loss during deployments.
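Put together, the sequence might look like this sketch (the component bodies are stubs standing in for the real server and worker loops):

```go
package main

import (
	"context"
	"os/signal"
	"syscall"

	"golang.org/x/sync/errgroup"
)

// run wires shutdown: SIGINT/SIGTERM cancel the root context, and errgroup
// waits for every component to drain before the process exits.
func run() error {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error { return serveHTTP(ctx) })  // http.Server.Shutdown with grace period
	g.Go(func() error { return runWorkers(ctx) }) // poll loop exit, then pool shutdown
	return g.Wait()
}

func serveHTTP(ctx context.Context) error  { <-ctx.Done(); return nil }
func runWorkers(ctx context.Context) error { <-ctx.Done(); return nil }
```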
Observability
Distributed Tracing
OpenTelemetry spans cover critical operations:
- HTTP requests: otelchi middleware traces request duration, path, and status code
- Database queries: Each store method is instrumented with span attributes (query type, table)
- Queue operations: Dequeue, enqueue, and retry operations traced
- Workflow execution: Step completion, fan-in triggers, sub-workflow propagation traced
Prometheus Metrics
Metrics exposed at GET /metrics:
- strait_run_transitions_total: Counter for each FSM state transition (labeled by from_status, to_status)
- strait_dispatch_duration_seconds: Histogram of HTTP dispatch latency (labeled by job_id, outcome)
- strait_dequeue_duration_seconds: Histogram of queue polling latency
- strait_worker_pool_active: Gauge of currently active worker goroutines
- strait_worker_pool_queued: Gauge of tasks waiting in the pool buffer
Structured Logging
Structured JSON logging via log/slog captures events with consistent fields:
- run_id: Job run identifier for correlation
- job_id: Job identifier for filtering
- status: FSM state transition
- error: Error details with stack trace
- duration_ms: Operation timing
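For example, a transition log line with those fields might be emitted as follows (the message and helper are illustrative):

```go
package obs

import "log/slog"

// logTransition emits one structured line with the fields listed above.
func logTransition(runID, jobID, from, to string, durMS int64) {
	slog.Info("run transition",
		slog.String("run_id", runID),
		slog.String("job_id", jobID),
		slog.String("status", from+"->"+to),
		slog.Int64("duration_ms", durMS),
	)
}
```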
Concurrency Model
Multiple worker processes can run concurrently, each polling the same PostgreSQL table. Coordination is handled entirely by SELECT FOR UPDATE SKIP LOCKED:
- No distributed coordination needed: Workers don’t know about each other. Database locks provide coordination.
- Scalability: Add worker processes to increase dequeue throughput. No leader election or worker registry.
- Safety: Optimistic locking prevents race conditions during state transitions. Workers retry on conflict.
Within each process, job execution runs on the alitto/pond/v2 worker pool, which provides:
- Bounded queue: Buffer channel limits task submissions, preventing unbounded memory growth
- Backpressure: When queue is full, task submission blocks, signaling workers are at capacity
- Graceful shutdown: Pool waits for all in-flight tasks before returning
- Metrics: Exposes pool metrics for capacity planning
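A sketch of the pool wiring based on pond/v2's documented API; treat the option names as assumptions rather than Strait's exact configuration.

```go
package worker

import "github.com/alitto/pond/v2"

// newExecutorPool creates 32 concurrent workers over a bounded 1024-task
// queue: Submit blocks when the queue is full, which is the backpressure
// signal described above.
func newExecutorPool() pond.Pool {
	return pond.NewPool(32, pond.WithQueueSize(1024))
}

// drain waits for queued and in-flight tasks before returning.
func drain(p pond.Pool) {
	p.StopAndWait()
}
```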
Scheduler Background Tasks
The scheduler package runs several background goroutines:
- Cron Scheduler
- Delayed Poller
- Stale Reaper
- Event Trigger Reaper
- Retention Worker
The cron scheduler uses robfig/cron/v3 to maintain job schedules. Every second, it checks for jobs whose next execution time has arrived and enqueues new runs with triggered_by = 'cron'. Supports cron expressions, timezones, and execution windows.
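Schedule registration with robfig/cron/v3 might look like this sketch; the enqueue callback is illustrative.

```go
package scheduler

import "github.com/robfig/cron/v3"

// startCron registers a 5-field schedule and starts the cron runner;
// cron.WithLocation(...) can be passed to New for timezone support.
func startCron(spec string, enqueue func()) (*cron.Cron, error) {
	c := cron.New()
	if _, err := c.AddFunc(spec, enqueue); err != nil {
		return nil, err
	}
	c.Start()
	return c, nil
}
```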
Health Endpoints
Strait exposes health and readiness endpoints for infrastructure orchestration:
- GET /health (liveness): Returns 200 if the process is alive. Used by Kubernetes liveness probes.
- GET /health/ready (readiness): Returns 200 when all dependencies are ready (database, Redis, worker pool). Returns 503 with a JSON body detailing which component is not ready. Used by Kubernetes readiness probes to prevent routing traffic to uninitialized instances.
Readiness checks are registered in a health check registry (apps/strait/internal/health/registry.go) where each subsystem registers its health check function. This enables extensible readiness verification without modifying the health endpoint handler.
Resilience Patterns
Strait implements several resilience patterns to protect against cascading failures:
- Circuit Breaker: A per-endpoint circuit breaker tracks consecutive failures. When a threshold is reached, the circuit opens and subsequent dispatches are short-circuited to prevent overwhelming a failing endpoint. State is persisted in the endpoint_circuit_state PostgreSQL table.
- Concurrency Limiting: A per-job max_concurrency cap is enforced at dequeue time via a COUNT subquery in the SKIP LOCKED query.
- Rate Limiting: Per-job rate_limit_max and rate_limit_window_secs control dispatch throughput.
- Retry Strategies: Configurable per-job with exponential, linear, fixed, or custom per-attempt delays. All strategies apply +/-20% jitter to prevent thundering herd (see the sketch below).
Managed Container Execution
In addition to HTTP dispatch, Strait supports managed execution where jobs run inside ephemeral Fly Machines. This is designed for long-running tasks, custom runtimes, GPU workloads, and isolated execution environments.
Dispatch Priority
When the executor picks up a managed run, it resolves a machine through a three-tier priority system: a preserved machine_id from a paused run, a warm pool machine for the same image and region, and finally a cold create. It then blocks on Wait(machineID, timeout) for completion.
Machine Lifecycle
Managed machines are created with auto_destroy=false so Fly does not garbage-collect them on stop. This enables reuse across the warm pool and pause/resume paths. The Start() method performs a three-step restart: GET current config, PUT updated environment variables, POST start.
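Both the Fly Machines and Docker backends implement the same lifecycle; an illustrative interface shape follows (signatures are assumptions, not Strait's API).

```go
package compute

import (
	"context"
	"time"
)

// Backend captures the machine lifecycle named in this document:
// Create, Start, Wait, Stop, Destroy.
type Backend interface {
	Create(ctx context.Context, image, region string, env map[string]string) (machineID string, err error)
	Start(ctx context.Context, machineID string, env map[string]string) error
	Wait(ctx context.Context, machineID string, timeout time.Duration) (exitCode int, err error)
	Stop(ctx context.Context, machineID string) error
	Destroy(ctx context.Context, machineID string) error
}
```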
Warm Machine Pool
The MachinePool is an in-memory cache keyed by "{imageURI}:{region}". After a clean exit (exit code 0 and SDK-reported completion), the machine is returned to the pool instead of being destroyed. Subsequent dispatches for the same image+region can acquire a stopped machine and restart it in 1-2 seconds instead of the 5-15 second cold create time.
Pool operations:
- Acquire: Pops the oldest machine for a given key (FIFO).
- Release: Pushes a machine; evicts the oldest if at capacity (configurable maxPer, default 3).
- Prune: A background goroutine runs every 5 minutes and removes machines idle for more than 10 minutes.
- Drain: On shutdown, all pooled machines are destroyed.
Pause and Resume
The pause/resume flow preserves the machine_id on the run row so that the stopped Fly Machine can be restarted on resume instead of cold-creating a new one:
- Pause: API transitions executing → paused and calls Stop() on the machine. The machine_id is preserved.
- Resume: API transitions paused → queued without clearing machine_id.
- Re-dispatch: Executor sees run.MachineID != "" and calls Start(machineID, freshEnv).
- Fallback: If the machine was deleted (auto-destroy, timeout, manual), falls back to cold create.
Error Handling
- ErrMachineGone (404): Machine was deleted. Caller falls back to Create.
- Retryable errors (429, 500, 503, connection refused): Run is snoozed with backoff.
- Fatal errors (422): Run transitions to system_failed.
- Snooze path: Machine is stopped before snoozing to prevent orphaned running containers.
- Cancel race: If Stop fails during cancel detection, Destroy is called as a fallback.
Compute Cost Tracking
Every managed run records wall-clock compute usage in the run_compute_usage table (run ID, project ID, machine preset, duration in seconds, cost in micro-USD). Daily budget enforcement checks the project quota before creating a machine. The budget monitor fires a warning webhook at 80% utilization.
For the full managed execution reference, see Managed Execution.
ClickHouse Analytics
Strait supports an optional ClickHouse backend for high-performance analytics over run metrics, cost data, and event timelines.
Architecture
PostgreSQL remains the source of truth for all state. ClickHouse receives denormalized event data via an async batch pipeline. This separation keeps the hot path (queue, FSM transitions) on Postgres while offloading analytical queries to a column-oriented store optimized for aggregation.
Tables
| Table | Purpose | Engine | TTL | Partition |
|---|---|---|---|---|
| run_events | Run lifecycle events with timing | MergeTree | 365 days | Monthly |
| run_analytics | Denormalized run summaries | MergeTree | 365 days | Monthly |
| compute_usage | Managed execution cost records | MergeTree | 90 days | Monthly |
Data Pipeline
The clickhouse.Engine maintains an in-memory buffer that flushes to ClickHouse on two triggers: batch size threshold (default 100) or flush interval (default 10 seconds). Backpressure is handled by capping the buffer at 10x batch size and dropping the oldest entries. Graceful shutdown flushes remaining entries.
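The flush loop reduces to a size-or-interval select; a sketch with illustrative types follows (flush is called synchronously, so the buffer can be reused after it returns).

```go
package clickhouse

import "time"

// Event stands in for the real denormalized export records.
type Event struct{ Payload string }

// runExporter flushes on batch-size or interval, whichever comes first, and
// flushes the remainder when the channel closes at shutdown.
func runExporter(events <-chan Event, batchSize int, interval time.Duration, flush func([]Event)) {
	buf := make([]Event, 0, batchSize)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case ev, ok := <-events:
			if !ok {
				flush(buf) // graceful shutdown: flush what remains
				return
			}
			buf = append(buf, ev)
			if len(buf) >= batchSize {
				flush(buf)
				buf = buf[:0]
			}
		case <-ticker.C:
			if len(buf) > 0 {
				flush(buf)
				buf = buf[:0]
			}
		}
	}
}
```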
Configuration
| Variable | Default | Description |
|---|---|---|
| CLICKHOUSE_ENABLED | false | Enable ClickHouse export |
| CLICKHOUSE_URL | — | ClickHouse connection string |
| CLICKHOUSE_DATABASE | strait | Target database |
| CLICKHOUSE_BATCH_SIZE | 100 | Flush threshold |
| CLICKHOUSE_FLUSH_INTERVAL | 10s | Max time between flushes |
Related Documentation
- Introduction — Product overview and key features
- Quick Start — Getting started guide
- Jobs — Job definitions and configuration
- Runs — Run lifecycle and FSM
- Workflows — DAG orchestration details
- DAG Runtime — Detailed DAG scheduler behavior, explainability, policy enforcement, and recovery controls
- DAG Operations Playbook — On-call triage, recovery, and hardening runbook for DAG incidents
- Managed Execution — Fly Machines container runtime, warm pool, pause/resume
- ClickHouse Analytics — Optional analytics backend for run metrics and cost tracking
- Event Triggers — Durable external event waits
- CDC — Real-time change data capture
- Resilience — Circuit breaker, rate limiting, and graceful degradation
- Security — SSRF protection, HMAC signing, and access control
Performance Characteristics
Queue Throughput
Strait’s queue is backed by PostgreSQL SELECT FOR UPDATE SKIP LOCKED, which provides lock-free concurrent dequeue. Under typical workloads, a single instance can sustain 1000+ dequeue operations per second with 32 concurrent workers.
Analytics Query Cost
The analytics endpoint scans the job_runs table for a configurable time window (1-720 hours). Query cost scales linearly with the number of runs in the window. For datasets exceeding 1M runs in the analysis period, consider reducing period_hours or implementing materialized-view pre-aggregation.