What is Strait?

Strait is an open-source platform that runs background jobs, orchestrates multi-step workflows, and manages AI agent pipelines. You define what needs to happen; Strait handles retries, scheduling, dependencies, and monitoring. One Go binary. One PostgreSQL database. No Redis required for queuing. No RabbitMQ. No SQS. Deploy it and start running jobs.

Why teams switch to Strait

Most teams start with a simple queue and a retry loop. Then they need scheduling. Then workflow dependencies. Then approval gates. Then cost tracking for AI agents. Before long, they’re maintaining five different systems that don’t talk to each other. Strait replaces that patchwork with one system:
  • Jobs fail gracefully. Every run follows a 13-state lifecycle. Retries use exponential backoff, linear or fixed delays, or custom per-attempt sequences. Runs that exhaust their retries go to a dead letter queue for review.
  • Workflows run as DAGs. Define step dependencies, approval gates, sub-workflows, event waits, and sleep delays. Strait validates the graph and runs it.
  • Everything is observable. See exactly where every run is, why it failed, how long it took, and what it cost. Real-time streaming, not polling.
  • Five SDKs, same architecture. TypeScript, Python, Go, Ruby, and Rust. Pick your language, define your jobs, run them anywhere.
  • Self-host or use managed. Deploy on your infrastructure with just PostgreSQL, or use the managed platform at app.strait.dev.

Key Capabilities

13-State FSM

Every job run moves through a 13-state finite state machine, including queued, executing, completed, failed, timed_out, and dead_letter, so its status is always tracked explicitly.
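
To make the idea concrete, here is a minimal Go sketch of a guarded state machine. It assumes a hypothetical transition table over only the six states named above; the other seven states and Strait's real transition rules are not shown.

```go
package main

import "fmt"

// RunState is a hypothetical type for illustration. Only the six states named
// in the docs are modeled; Strait's real FSM has 13 states.
type RunState string

const (
	Queued     RunState = "queued"
	Executing  RunState = "executing"
	Completed  RunState = "completed"
	Failed     RunState = "failed"
	TimedOut   RunState = "timed_out"
	DeadLetter RunState = "dead_letter"
)

// allowed is an assumed transition table, not Strait's actual rules.
// States with no entry (completed, dead_letter) are terminal in this sketch.
var allowed = map[RunState][]RunState{
	Queued:    {Executing},
	Executing: {Completed, Failed, TimedOut},
	Failed:    {Queued, DeadLetter}, // retry, or give up after the last attempt
	TimedOut:  {Queued, DeadLetter},
}

// Transition returns the target state if the move is allowed, otherwise the
// unchanged current state and an error.
func Transition(from, to RunState) (RunState, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}
```

Funneling every state change through one check like this is what stops a run from, for example, jumping from completed back to executing.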

Workflow DAGs

Directed Acyclic Graphs with fan-in/fan-out, step conditions, template variables, output transforms, human approval gates, and durable event waits.
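
The architecture overview notes that DAG validation uses Kahn's algorithm. A minimal sketch, assuming a hypothetical map from each step name to the steps it depends on (not Strait's workflow schema), that rejects cycles and returns one valid execution order:

```go
package main

import "errors"

// ValidateDAG runs Kahn's algorithm over deps, which maps each step to the
// steps it depends on. It returns a topological order, or an error if any
// cycle prevents some step from ever becoming ready.
func ValidateDAG(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int)      // unmet dependencies per step
	children := make(map[string][]string) // parent -> dependent steps
	for step, parents := range deps {
		if _, ok := indegree[step]; !ok {
			indegree[step] = 0
		}
		for _, p := range parents {
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
			indegree[step]++
			children[p] = append(children[p], step)
		}
	}

	// Start with every step that has no dependencies.
	var queue []string
	for step, d := range indegree {
		if d == 0 {
			queue = append(queue, step)
		}
	}

	order := make([]string, 0, len(indegree))
	for len(queue) > 0 {
		step := queue[0]
		queue = queue[1:]
		order = append(order, step)
		// Completing step satisfies one dependency of each child.
		for _, c := range children[step] {
			indegree[c]--
			if indegree[c] == 0 {
				queue = append(queue, c)
			}
		}
	}

	// Steps on a cycle never reach in-degree zero, so they are missing from order.
	if len(order) != len(indegree) {
		return nil, errors.New("workflow contains a cycle")
	}
	return order, nil
}
```

A fan-out/fan-in graph such as fetch → {a, b} → merge validates, and merge is always ordered last because both of its parents must finish first.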

Smart Retry

Exponential, linear, fixed, or custom per-attempt delays with ±20% jitter. Jitter keeps many failed runs from retrying at the same instant (the thundering herd problem), so transient failures recover gracefully.

Cost Budgets

Track AI model usage with micro-USD precision. Enforce per-run and daily project limits to control costs.
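
Micro-USD precision means amounts are stored as integer millionths of a dollar, so sums never drift the way floating-point dollars can. A minimal sketch of a pre-execution budget check, where the type, function, and parameter names are hypothetical:

```go
package main

import "fmt"

// MicroUSD is a hypothetical money type: 1 USD = 1_000_000 MicroUSD.
type MicroUSD int64

// CheckBudget reports whether adding cost keeps both the run's total and the
// project's daily total within their limits. Integer math keeps it exact.
func CheckBudget(runSpent, dailySpent, cost, runLimit, dailyLimit MicroUSD) error {
	if runSpent+cost > runLimit {
		return fmt.Errorf("per-run budget exceeded: %d > %d micro-USD", runSpent+cost, runLimit)
	}
	if dailySpent+cost > dailyLimit {
		return fmt.Errorf("daily project budget exceeded: %d > %d micro-USD", dailySpent+cost, dailyLimit)
	}
	return nil
}
```

For example, a $0.0025 model call is 2,500 micro-USD; against a $0.50 per-run limit (500,000) it is accepted at 400,000 spent and rejected at 499,000 spent.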

Event Triggers

Pause execution and wait for external events (approvals, webhooks, third-party callbacks) for days or weeks without holding goroutines. Waits are durable and database-backed, with timeout support.

Real-Time CDC

Postgres WAL change capture via Sequin. No polling required: your applications react as soon as jobs, workflows, or runs change.

SDK Endpoints

Specialized endpoints for job executors—logging, heartbeats, progress updates, checkpoints, continuation, and child job spawning.

Webhooks

HMAC-SHA256 signed webhooks with automatic retries and a dead letter queue for deliveries that keep failing.

Health Scoring

Aggregate metrics over configurable time windows: success rate, timeout rate, crash rate, and latency stability give an at-a-glance view of job reliability.

Architecture Overview

                    ┌────────────────────────────────────┐
                    │             API Server             │
                    │     (Chi router + middleware)      │
                    │                                    │
                    │  /v1/jobs/* ── Job CRUD + Health   │
                    │  /v1/workflows/* ── DAG CRUD       │
                    │  /v1/workflow-runs/* ── Run mgmt   │
                    │  /v1/jobs/{id}/trigger ── Enqueue  │
                    │  /v1/runs/* ── Run mgmt + DLQ      │
                    │  /v1/events/* ── Event triggers    │
                    │  /sdk/v1/* ── SDK (JWT auth)       │
                    │  /metrics ── Prometheus            │
                    └──────────┬─────────────────────────┘
                               │ Enqueue (budget check)
                               v
                    ┌────────────────────────────────────┐
                    │             PostgreSQL             │
                    │                                    │
                    │  jobs ── job definitions           │
                    │  job_runs ── run state + queue     │
                    │  workflows ── DAG definitions      │
                    │  workflow_runs ── workflow state   │
                    │  event_triggers ── durable waits   │
                    │  run_events ── log entries         │
                    │  run_usage ── AI cost tracking     │
                    │  environments ── endpoint config   │
                    │  project_quotas ── budget limits   │
                    │                                    │
                    │  Queue: SELECT FOR UPDATE          │
                    │         SKIP LOCKED                │
                    └──────────┬─────────────────────────┘
                               │ Dequeue
                               v
                    ┌────────────────────────────────────┐
                    │          Worker Executor           │
                    │                                    │
                    │  Poll ─> DequeueN(available)       │
                    │  Workflow Engine:                  │
                    │  - DAG Validation (Kahn's)         │
                    │  - Atomic Fan-in (UPDATE...RET)    │
                    │  - Condition Evaluation            │
                    │  - Template Rendering              │
                    │  - Sub-workflow Nesting            │
                    │                                    │
                    │  Job Execution:                    │
                    │  - Resolve ─> Env override + SSRF  │
                    │  - Execute ─> HTTP POST to endpt   │
                    │  - Retry ─> Smart strategy select  │
                    │  - Trace ─> Execution timing       │
                    │  - DLQ ─> Dead letter on exhaust   │
                    └──────────┬─────────────────────────┘
                               │ Webhook / PubSub
                               v
                    ┌────────────────────┬───────────────┐
                    │  Scheduler         │  Redis        │
                    │  - Cron ticker     │  - PubSub     │
                    │  - Delayed poller  │  - SSE        │
                    │  - Stale reaper    │    streaming  │
                    │  - Retention       │               │
                    └────────────────────┴───────────────┘
Strait runs in three modes:
api: Handles HTTP requests, job management, and triggering. Scale horizontally for API throughput.
worker: Runs executor, scheduler, and background maintenance. Scale horizontally for job processing throughput.
all: Combined mode for development or small deployments. Single binary, single process.
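
The mode split can be pictured as a simple mapping from mode to the subsystems a process starts. This is a sketch only: how the mode is selected (flag or environment variable) and the component names are assumptions.

```go
package main

// Components lists the subsystems one process would start in each mode,
// following the api / worker / all split described above.
func Components(mode string) []string {
	api := []string{"http-api"}
	worker := []string{"executor", "scheduler", "maintenance"}
	switch mode {
	case "api":
		return api
	case "worker":
		return worker
	case "all":
		// Copy so the combined slice never aliases api's backing array.
		return append(append([]string{}, api...), worker...)
	default:
		return nil // unknown mode
	}
}
```

Because every mode runs the same binary against the same PostgreSQL database, scaling API or processing throughput is a matter of running more processes in the relevant mode.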

Why Strait?

1. Zero External Dependencies

No RabbitMQ. No SQS. No Kafka. PostgreSQL handles queuing with SELECT FOR UPDATE SKIP LOCKED: concurrent workers claim different rows without blocking one another, and there is no separate broker to operate. The single binary includes everything, with no runtime dependencies to install.

2. Production-Grade Concurrency

Go goroutines provide parallel job execution without external coordination. A worker pool with bounded backpressure prevents memory exhaustion during traffic spikes. Structured concurrency patterns (sourcegraph/conc) provide panic recovery and graceful shutdown.

3. Built for AI Workloads

SDK endpoints designed for AI agents: logging, heartbeats, progress checkpoints, continuation for long-running workflows, and child job spawning. Cost budgets track token usage with micro-USD precision. Debug bundles aggregate execution data for troubleshooting.

4. Workflow Orchestration

Complex DAGs with step conditions, output transforms, template variables, and human approval gates. Atomic fan-in handles concurrent parent completions safely. Sub-workflows enable arbitrary nesting depth for multi-stage pipelines.
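
The architecture diagram lists atomic fan-in as an UPDATE...RETURNING in PostgreSQL: each completing parent decrements a pending-parent counter and reads the new value in one statement. This in-memory analog (a hypothetical FanIn type using sync/atomic, Go 1.19+) shows why that is safe: exactly one completing parent observes zero and enqueues the child.

```go
package main

import "sync/atomic"

// FanIn counts how many parent steps a join step is still waiting on.
type FanIn struct {
	pending atomic.Int64
}

// NewFanIn creates a join that waits for the given number of parents.
func NewFanIn(parents int64) *FanIn {
	f := &FanIn{}
	f.pending.Store(parents)
	return f
}

// ParentDone atomically decrements the counter and reports whether the caller
// was the last parent, i.e. the one responsible for enqueueing the child step.
// Decrement and read happen in one operation, so no two parents can both see zero.
func (f *FanIn) ParentDone() bool {
	return f.pending.Add(-1) == 0
}
```

A separate read-then-write would let two parents finishing at the same moment both see "one remaining" or both see "zero"; combining the decrement and the read closes that race.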

5. Observability First

OpenTelemetry tracing links job runs across API server, worker, and external endpoints. Prometheus metrics expose queue depth, throughput, and latency. Structured JSON logging enables log aggregation. Real-time SSE streaming via Redis.

6. Developer Experience

Unified CLI with code-first deployment workflows, operational command groups, and shell completion for Bash, Zsh, Fish, and PowerShell.

Use Cases

Strait fits these patterns:
  • Background Jobs: Scheduled data imports, report generation, cache warming, cleanup tasks, and recurring maintenance operations.
  • Webhook Consumers: Process events from external services with retries, a dead letter queue, and delivery guarantees.
  • AI Agent Workflows: Multi-step AI pipelines with human approval gates, conditional execution, sub-workflow nesting, and cost tracking per run and per project.
  • Batch Processing: Bulk job triggering with configurable batch sizes, priority ordering, and idempotency deduplication.
  • Data Pipelines: ETL workflows with fan-out parallel steps, transform stages, and aggregation.
  • Cron Jobs: Standard 5-field cron expressions with timezone support and execution windows.

Getting Started

Quick Start

Get Strait running in minutes. Clone the repository, start the infrastructure with Docker Compose, and trigger your first job.

Architecture

Deep dive into internals. Learn about queue mechanics, FSM states, workflow engine, and technology choices.

SDK Reference

Official SDKs for TypeScript, Python, Go, Ruby, and Rust with full feature parity. Authoring DSL, composition helpers, and typed errors.

CLI Reference

Complete CLI documentation. 48+ commands organized by category with examples and shell completion.

API Reference

REST API endpoints for job management, triggering, workflow orchestration, and SDK interactions.

Concepts

Core domain concepts you’ll encounter:
Jobs define the template for a task: endpoint URL, timeout, retry strategy, cron schedule, and cost budgets. Runs are individual executions of a job.

Guides

Step-by-step guides for common tasks:

Authentication

Internal secret auth for API endpoints and JWT run-token auth for SDK endpoints. API key management with system keychain storage.

Deployment

Docker deployment, Fly.io configuration, horizontal scaling strategies, and production readiness checklist.

Security

SSRF protection, rate limiting, encryption at rest, and secure webhook delivery.

Cost Budgets

Per-run and daily project limits. AI model usage tracking with micro-USD precision. Budget enforcement before execution.

Development

Contributing to Strait or running it locally:

Contributing

Set up the development environment, and follow the code style, commit conventions, and PR guidelines.

Testing

Unit tests, integration tests with testcontainers, E2E tests, fuzz testing, and benchmarks.

Database Schema

Complete table definitions, indexes, and relationships for the PostgreSQL schema.

What’s Next?

Ready to dive deeper? Explore the docs below, or jump straight into the Quick Start Guide and run your first job in minutes.

Explore the Docs

Quick Start

Run your first job in under 10 minutes.

Architecture

Understand how Strait works under the hood.

SDKs

Official client libraries for 5 languages.

API Reference

Complete REST API documentation.