
Detailed reference of the PostgreSQL database schema and migration system.

Strait uses PostgreSQL 18 as its primary data store. All primary keys are UUIDv7 values stored as TEXT. The schema is managed by versioned SQL migrations (currently through the 000089_* series) that run automatically on startup.

Core Tables

| Table | Description |
| --- | --- |
| `jobs` | Job definitions (name, slug, endpoint URL, retry config, cron, TTL, retry strategy, environment link) |
| `job_runs` | Execution instances with 13-state FSM, payload, result, error, execution trace, debug mode, continuation lineage |
| `run_events` | Structured log entries per run (type, level, message, data) |
| `job_versions` | Auto-snapshot of job config on every update |
| `api_keys` | Per-project API keys (SHA-256 hashed, revocable, rotatable with grace windows) |
| `audit_events` | Immutable audit trail for security-sensitive and control-plane actions |
| `project_roles` / `project_member_roles` | RBAC role definitions and project user-role bindings |
| `resource_policies` / `tag_policies` | Fine-grained policy grants by resource and by tag match |
| `known_actors` | Synced actor identities used for attribution and audit context |
| `webhook_deliveries` | Webhook delivery tracking and dead-letter queue (job runs + event triggers) |
| `event_triggers` | Durable event waits for workflow steps and job runs |
| `project_quotas` | Per-project quotas (max jobs, runs, concurrency, cost limits) |
| `job_run_idempotency` | Global deduplication tracking for job runs across partitions |
| `webhook_subscriptions` | Project-level webhook subscriptions for multiple event types |

Workflow Tables

| Table | Description |
| --- | --- |
| `workflows` | Workflow DAG definitions (name, slug, project, version) |
| `workflow_versions` | Immutable workflow snapshots by version number |
| `workflow_version_steps` | Version-pinned step snapshots used by running workflow runs |
| `workflow_steps` | Current mutable step definitions (dependencies, conditions, retries, concurrency/resource hints) |
| `workflow_runs` | Workflow execution instances (status, payload, parent lineage, version pin) |
| `workflow_step_runs` | Per-step runtime state (status, deps counters, output/error, timing) |
| `workflow_step_decisions` | Explainability log for scheduler/condition decisions during progression |
| `workflow_policies` | Project-level DAG governance controls (max_fan_out, max_depth, forbidden types, approval requirement) |
| `workflow_run_labels` | Key-value labels on workflow runs |
| `workflow_step_approvals` | Step approval tracking (approvers, status, timestamps) |

Core Engine Tables

| Table | Description |
| --- | --- |
| `run_usage` | AI model usage tracking per run (provider, model, tokens, cost in micro-USD) |
| `run_checkpoints` | SDK-driven run state checkpoints |
| `run_tool_calls` | Tool call recording with input/output and duration |
| `run_outputs` | Structured outputs with optional schema validation |
| `environments` | Named execution environments with JSONB variable storage |
| `endpoint_circuit_state` | Circuit breaker state per endpoint URL (state, failures, opened_at, last_failure_at) |
| `endpoint_health_scores` | EWMA-based health scoring per endpoint (score, success_rate, timeout_rate, latency) |
| `job_slos` | Service level objectives per job (metric, target, window) |
| `job_slo_evaluations` | Point-in-time SLO evaluation results (current_value, budget_remaining) |
| `pricing_catalog` | Static pricing table for AI model cost calculation |
| `job_secrets` | Encrypted secrets scoped to job and environment |
| `job_groups` | Logical grouping of jobs (full CRUD with name, description, project scope) |
| `job_dependencies` | Inter-job dependency definitions |
| `audit_events` | Immutable audit trail for security-sensitive operations (migration 000057) |
| `batch_operations` | Tracks bulk trigger and bulk cancel batch operation status and results |
| `log_drains` | External log drain endpoint configuration for streaming run events |
| `event_sources` | Inbound event source definitions with type and configuration |
| `event_subscriptions` | Bindings between event sources and jobs with optional filter expressions |

Key Columns

| Table.Column | Type | Description |
| --- | --- | --- |
| `jobs.retry_strategy` | TEXT | Retry strategy: exponential, linear, fixed, custom |
| `jobs.retry_delays_secs` | INT[] | Custom per-attempt delays in seconds (for the custom strategy) |
| `jobs.environment_id` | TEXT | FK to environments table for endpoint override |
| `job_runs.execution_trace` | JSONB | Timing breakdown (queue_wait, dequeue, connect, ttfb, transfer, total) |
| `job_runs.debug_mode` | BOOLEAN | Whether debug diagnostics are enabled for this run |
| `job_runs.continuation_of` | TEXT | FK to parent run for continuation lineage |
| `job_runs.lineage_depth` | INT | Depth in continuation chain |
| `workflow_steps.concurrency_key` | TEXT | Step-level serialization key across runnable steps |
| `workflow_steps.resource_class` | TEXT | Scheduler capacity bucket (small, medium, large) |
| `workflow_runs.parent_step_run_id` | TEXT | FK linking sub-workflow runs back to parent step run |
| `workflow_step_decisions.decision_type` | TEXT | Decision class (scheduler, concurrency, resource, condition) |
| `workflow_step_decisions.details` | JSONB | Machine-readable context used by explain endpoint |
| `workflow_policies.max_fan_out` | INT | Per-step direct dependent cap enforced on create/update/trigger |
| `workflow_policies.max_depth` | INT | Maximum DAG depth enforced by policy checks |
| `project_quotas.max_cost_per_run_microusd` | BIGINT | Per-run cost limit in micro-USD |
| `project_quotas.max_daily_cost_microusd` | BIGINT | Daily project cost limit in micro-USD |
| `webhook_deliveries.workflow_run_id` | TEXT | FK to workflow_runs for workflow-triggered webhooks |
| `webhook_deliveries.event_type` | TEXT | Type of event being delivered (e.g., run.completed) |
| `audit_events.resource_type` | TEXT | Type of resource affected (e.g., job, workflow) |
| `audit_events.changes` | JSONB | Detailed diff of changes made during the event |
| `webhook_subscriptions.event_types` | TEXT[] | Array of event types this subscription listens to |

Relationships

  • Jobs & Runs: A job has many job_runs. Each run is linked to a specific job_version.
  • Workflows: A workflow has many workflow_versions, and each version has many workflow_version_steps used to materialize workflow_step_runs at trigger time.
  • Workflow Governance: workflow_policies is one row per project and is enforced at workflow create/update/trigger boundaries.
  • Workflow Explainability: workflow_step_decisions stores progression decisions per run and powers /v1/workflow-runs/{id}/explain.
  • Environments: jobs and job_secrets are linked to environments for configuration isolation.
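
As a sketch of how these relationships compose, the following query lists recent runs for a job alongside the config snapshot each run was pinned to. The join columns (`job_runs.job_id`, `job_runs.job_version_id`) and the slug value are assumptions inferred from the table descriptions above, not verified schema:

```sql
-- Hypothetical query: recent runs for one job, joined to the
-- job_versions snapshot each run executed against.
-- Column names are inferred from this page, not verified.
SELECT r.id, r.status, v.id AS job_version_id, r.created_at
FROM job_runs r
JOIN jobs j         ON j.id = r.job_id
JOIN job_versions v ON v.id = r.job_version_id
WHERE j.slug = 'nightly-report'
ORDER BY r.created_at DESC
LIMIT 20;
```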

Event Trigger Tables

event_triggers

```sql
id                    TEXT PRIMARY KEY                -- UUIDv7
event_key             TEXT NOT NULL UNIQUE            -- Globally unique caller-controlled key
project_id            TEXT NOT NULL
source_type           TEXT NOT NULL                   -- 'workflow_step' or 'job_run'
trigger_type          TEXT NOT NULL DEFAULT 'event'   -- 'event' or 'sleep'
workflow_run_id       TEXT                            -- FK to workflow_runs
workflow_step_run_id  TEXT                            -- FK to workflow_step_runs
job_run_id            TEXT                            -- FK to job_runs
status                TEXT NOT NULL DEFAULT 'waiting' -- waiting, received, timed_out, canceled
timeout_secs          INT NOT NULL DEFAULT 3600
request_payload       JSONB
response_payload      JSONB
requested_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
received_at           TIMESTAMPTZ
expires_at            TIMESTAMPTZ
error                 TEXT
notify_url            TEXT
notify_status         TEXT
event_emit_key        TEXT
sent_by               TEXT
created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW()
updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW()
```

Key Indexes

  • idx_event_triggers_status_expires: (status, expires_at) WHERE status = 'waiting' -- Reaper timeout lookup
  • idx_event_triggers_project_status: (project_id, status) -- List queries
  • idx_event_triggers_event_key_prefix: (event_key text_pattern_ops) -- Prefix matching for batch resolution
  • idx_event_triggers_step_run: (workflow_step_run_id) WHERE workflow_step_run_id IS NOT NULL -- Step-to-trigger lookup
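
The first index is shaped for the reaper's timeout scan. A minimal sketch of that scan (the actual worker query is an assumption; only the index definition comes from this page):

```sql
-- Served by idx_event_triggers_status_expires:
-- (status, expires_at) WHERE status = 'waiting'
UPDATE event_triggers
SET status = 'timed_out', updated_at = NOW()
WHERE status = 'waiting'
  AND expires_at <= NOW();
```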

Performance Indexes

  • idx_workflow_runs_status_expires: (status, expires_at) WHERE expires_at IS NOT NULL -- Reaper timeout lookup
  • idx_workflow_runs_status_finished: (status, finished_at) WHERE finished_at IS NOT NULL -- Retention deletion
  • idx_step_runs_workflow_run_status: (workflow_run_id, status) -- Step run queries by run and status
  • idx_step_decisions_step_run_id: (step_run_id) -- CASCADE deletes on decisions
  • idx_job_runs_active_by_job: (job_id) WHERE status IN ('queued', 'dequeued', 'executing') -- Active run counting

webhook_deliveries Extensions

The webhook_deliveries table (created in migration 000009, extended in 000061 and 000062) is shared between job-run webhooks and event-trigger webhooks:

```sql
id                TEXT PRIMARY KEY
run_id            TEXT                -- FK to job_runs (nullable)
job_id            TEXT                -- FK to jobs (nullable)
event_trigger_id  TEXT                -- FK to event_triggers (nullable)
workflow_run_id   TEXT                -- FK to workflow_runs (nullable)
subscription_id   TEXT                -- FK to webhook_subscriptions (nullable)
webhook_url       TEXT NOT NULL
status            TEXT NOT NULL DEFAULT 'pending'  -- pending, delivering, delivered, failed, dead
attempts          INT NOT NULL DEFAULT 0
max_attempts      INT NOT NULL DEFAULT 5
last_status_code  INT
last_error        TEXT
next_retry_at     TIMESTAMPTZ
delivered_at      TIMESTAMPTZ
payload           JSONB
event_type        TEXT
sequence          INT NOT NULL DEFAULT 0
created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
```

Key indexes:

  • idx_webhook_deliveries_pending: (next_retry_at ASC) WHERE status = 'pending' -- Delivery worker queue
  • idx_webhook_deliveries_run_seq: (run_id, sequence) WHERE run_id IS NOT NULL -- Ordered delivery per run
  • idx_webhook_deliveries_event_trigger: (event_trigger_id) WHERE event_trigger_id IS NOT NULL
  • idx_webhook_deliveries_status_created: (status, created_at)
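
A hedged sketch of the delivery-worker poll that the pending-deliveries index serves (the exact worker SQL, batch size, and locking strategy are assumptions):

```sql
-- idx_webhook_deliveries_pending covers the filter and sort here.
-- FOR UPDATE SKIP LOCKED lets multiple workers poll without
-- claiming the same rows.
SELECT id, webhook_url, payload
FROM webhook_deliveries
WHERE status = 'pending'
  AND next_retry_at <= NOW()
ORDER BY next_retry_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```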

endpoint_circuit_state

The circuit breaker state table (migration 000022) tracks per-endpoint failure state:

```sql
endpoint_url    TEXT PRIMARY KEY
state           TEXT NOT NULL DEFAULT 'closed'  -- closed, open, half_open
failures        INT NOT NULL DEFAULT 0
opened_at       TIMESTAMPTZ
last_failure_at TIMESTAMPTZ
updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
```
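
One way the breaker could record a failure and trip open is a single upsert. This is a sketch under stated assumptions: the failure threshold of 5 and the upsert shape are illustrative, not the actual implementation:

```sql
-- Hypothetical failure-recording upsert; threshold of 5 is assumed.
INSERT INTO endpoint_circuit_state (endpoint_url, failures, last_failure_at)
VALUES ('https://api.example.com/hook', 1, NOW())
ON CONFLICT (endpoint_url) DO UPDATE
SET failures        = endpoint_circuit_state.failures + 1,
    last_failure_at = NOW(),
    state           = CASE WHEN endpoint_circuit_state.failures + 1 >= 5
                           THEN 'open' ELSE endpoint_circuit_state.state END,
    opened_at       = CASE WHEN endpoint_circuit_state.failures + 1 >= 5
                           THEN NOW() ELSE endpoint_circuit_state.opened_at END,
    updated_at      = NOW();
```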

endpoint_health_scores

The endpoint health scoring table (migration 000120) tracks continuous health metrics per endpoint using EWMA:

```sql
endpoint_url        TEXT PRIMARY KEY
health_score        DOUBLE PRECISION NOT NULL DEFAULT 100.0
success_rate        DOUBLE PRECISION NOT NULL DEFAULT 1.0
timeout_rate        DOUBLE PRECISION NOT NULL DEFAULT 0.0
latency_score       DOUBLE PRECISION NOT NULL DEFAULT 1.0
total_requests      BIGINT NOT NULL DEFAULT 0
last_latency_ms     DOUBLE PRECISION NOT NULL DEFAULT 0.0
updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
```

job_slos and job_slo_evaluations

SLO definitions and evaluation history (migration 000122):

```sql
-- job_slos
id              TEXT PRIMARY KEY
job_id          TEXT NOT NULL REFERENCES jobs(id) ON DELETE CASCADE
project_id      TEXT NOT NULL
metric          TEXT NOT NULL  -- success_rate, p95_latency_secs, p99_latency_secs
target          DOUBLE PRECISION NOT NULL
window_hours    INT NOT NULL   -- 24, 168, 720
UNIQUE(job_id, metric, window_hours)

-- job_slo_evaluations
id              TEXT PRIMARY KEY
slo_id          TEXT NOT NULL REFERENCES job_slos(id) ON DELETE CASCADE
current_value   DOUBLE PRECISION NOT NULL
budget_remaining DOUBLE PRECISION NOT NULL
evaluated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
```
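
An evaluator for a success_rate SLO could be expressed in one statement. This is only a sketch: the terminal status name ('succeeded'), the error-budget formula, and the use of `uuidv7()` (available per migration 000064's pg_uuidv7 extension) are assumptions about how Strait actually computes evaluations:

```sql
-- Hypothetical evaluation of all success_rate SLOs over their windows.
INSERT INTO job_slo_evaluations (id, slo_id, current_value, budget_remaining)
SELECT
  uuidv7()::text,
  s.id,
  AVG((r.status = 'succeeded')::int),
  -- fraction of the error budget (1 - target) still unspent
  1.0 - (1.0 - AVG((r.status = 'succeeded')::int))
        / NULLIF(1.0 - s.target, 0)
FROM job_slos s
JOIN job_runs r ON r.job_id = s.job_id
WHERE s.metric = 'success_rate'
  AND r.created_at >= NOW() - make_interval(hours => s.window_hours)
GROUP BY s.id, s.target;
```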

job_runs Key Indexes

  • idx_runs_queue_covering: (created_at ASC) INCLUDE (job_id, priority, scheduled_at, next_retry_at) WHERE status = 'queued' -- Covering index for dequeue
  • idx_runs_status_project_covering: (project_id, status, created_at DESC) INCLUDE (job_id, priority, scheduled_at, next_retry_at) -- Covering index for list queries
  • idx_runs_heartbeat: (heartbeat_at) WHERE status = 'executing' -- Stale run detection
  • idx_runs_expires: (expires_at) WHERE expires_at IS NOT NULL AND status IN ('delayed', 'queued') -- Expiration reaper
  • idx_job_runs_parent_run_id: (parent_run_id) WHERE parent_run_id IS NOT NULL -- Parent-child lookup
  • idx_runs_metadata: GIN index on metadata (migration 000016) -- JSONB filtering
  • idx_jobs_tags: GIN index on tags (migration 000054) -- Tag-based filtering
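
A sketch of a dequeue query shaped to the covering queue index; the `scheduled_at` filter, batch size, and locking clause are assumptions about the worker, while the index itself is as documented above:

```sql
-- Matches the partial predicate (status = 'queued') and sort order
-- of idx_runs_queue_covering; the INCLUDE columns avoid most heap
-- fetches during the scan.
SELECT id, job_id, priority, scheduled_at, next_retry_at
FROM job_runs
WHERE status = 'queued'
  AND (scheduled_at IS NULL OR scheduled_at <= NOW())
ORDER BY created_at ASC
LIMIT 100
FOR UPDATE SKIP LOCKED;
```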

FK Constraints on job_runs

These constraints matter when reasoning about partitioning:

  • job_runs.parent_run_id -- Self-referential FK dropped in migration 000066 to support range partitioning. Integrity is enforced at the application layer.
  • run_events.run_id -- No ON DELETE clause
  • run_checkpoints.run_id -- ON DELETE CASCADE
  • run_usage.run_id -- ON DELETE CASCADE
  • run_tool_calls.run_id -- ON DELETE CASCADE
  • run_outputs.run_id -- ON DELETE CASCADE
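
Because the parent_run_id FK was dropped, the database no longer prevents orphaned children. A periodic consistency check is one way to compensate; this orphan scan is a hypothetical illustration, not a documented Strait job:

```sql
-- Hypothetical orphan check: child runs whose parent row is gone.
-- With the self-referential FK removed in 000066, only the
-- application layer guards against this state.
SELECT c.id, c.parent_run_id
FROM job_runs c
LEFT JOIN job_runs p ON p.id = c.parent_run_id
WHERE c.parent_run_id IS NOT NULL
  AND p.id IS NULL;
```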

Miscellaneous Indexes

  • idx_idempotency_expires: (expires_at) WHERE expires_at IS NOT NULL on job_run_idempotency
  • idx_audit_events_resource_lookup: (resource_type, resource_id) on audit_events
  • idx_webhook_subscriptions_project_active: (project_id, created_at DESC) WHERE active = TRUE on webhook_subscriptions

Migrations

We use golang-migrate for schema evolution.

Numbering Convention

Migrations follow a sequential 6-digit numbering scheme: 000001_*.up.sql through the latest migration in the repository.

Recent Migrations

| Migration | Description |
| --- | --- |
| `000062_webhook_deliveries_extend` | Extends webhook deliveries with workflow run context, subscriptions, and unified delivery worker columns. |
| `000063_autovacuum_tuning` | Optimizes autovacuum settings for high-churn tables (job_runs, webhook_deliveries, run_events). |
| `000064_uuidv7_defaults` | Adds native uuidv7() defaults for primary keys using the pg_uuidv7 extension. |
| `000065_job_run_idempotency` | Introduces a dedicated table for global job run idempotency tracking across partitions. |
| `000066_job_runs_partitioning` | Converts job_runs to a range-partitioned table by created_at with automated partition management. |
| `000067_add_performance_indexes` | Adds workflow_runs and workflow_step_runs performance indexes. |
| `000068_covering_indexes` | Adds covering indexes (INCLUDE clause) to job_runs and webhook_deliveries for index-only scans. |
| `000069_audit_events_extend` | Extends audit events with resource context, changes, and client metadata (IP, User-Agent). |
| `000070_webhook_subscriptions` | Adds webhook subscriptions table for multi-event project-level webhook management. |
| `000071_performance_indexes_phase2` | Phase 2 performance indexes for job_runs and workflows. |
| `000072_add_webhook_retry_policy` | Adds retry policy configuration to webhook deliveries. |
| `000073_recreate_idempotency_index` | Recreates idempotency index with corrected partial filter. |
| `000074_workflow_parent_step_run_id` | Adds parent_step_run_id to workflow_runs for sub-workflow lineage. |
| `000075_workflow_step_concurrency_key` | Adds concurrency_key to workflow_steps and workflow_version_steps. |
| `000076_workflow_step_decisions` | Adds explainability decision log table for DAG runtime. |
| `000077_workflow_step_resource_class` | Adds resource_class scheduling bucket to workflow step tables. |
| `000078_workflow_policies` | Adds per-project DAG governance policy table. |
| `000079_queue_notify_trigger` | Adds pg_notify trigger to wake queue workers on new job insertion. |
| `000080_dequeue_active_runs_index` | Adds partial index on job_runs for active run counting during dequeue. |
| `000081_step_decisions_index` | Adds indexes on workflow step decisions table. |
| `000082_batch_tracking` | Creates batch_operations table; adds batch_id to job_runs. |
| `000083_job_default_run_metadata` | Adds default_run_metadata JSONB to jobs and job_versions. |
| `000084_payload_gin_index` | Adds GIN index on job_runs.payload for JSONB containment queries. |
| `000086_job_run_concurrency_key` | Adds max_concurrency_per_key to jobs; concurrency_key to job_runs with partial index. |
| `000087_named_rate_limit_keys` | Adds rate_limit_keys JSONB to jobs and job_versions. |
| `000088_log_drains` | Creates log_drains table with auth config and level filtering. |
| `000089_event_sources_v2` | Creates event_sources and event_subscriptions tables. |

Adding New Migrations

  1. Create a new pair of SQL files in the apps/strait/migrations/ directory using the next sequence number:
    • 000090_your_feature.up.sql
    • 000090_your_feature.down.sql
  2. Ensure the migration is idempotent.
  3. The application will automatically detect and apply the new migration on the next startup.
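
An idempotent up/down pair for the hypothetical 000090_your_feature migration above might look like this; the table, column, and index names are placeholders:

```sql
-- 000090_your_feature.up.sql
CREATE TABLE IF NOT EXISTS your_feature (
    id         TEXT PRIMARY KEY,  -- UUIDv7, like other Strait tables
    project_id TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_your_feature_project
    ON your_feature (project_id);

-- 000090_your_feature.down.sql
DROP INDEX IF EXISTS idx_your_feature_project;
DROP TABLE IF EXISTS your_feature;
```

`IF NOT EXISTS` / `IF EXISTS` guards make both directions safe to re-run if a migration is interrupted partway through.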

Never modify an existing migration file that has already been merged. Always create a new migration to apply changes.
