Detailed rationale behind each technology used in Strait.
This page provides the design rationale for each major technology choice in Strait. For a concise overview, see the Architecture page.
Go 1.26
Go is ideal for a distributed job orchestration service because it combines strong runtime performance with operational simplicity. Distribution as a single binary eliminates runtime dependency management -- no Python venvs, no node_modules, no container images required. Fewer moving parts at deploy time means simpler deployments and a smaller attack surface.
Go's concurrency primitives -- goroutines, channels, and the sync package -- provide a robust foundation for parallel job execution without external dependencies. The worker pool can spin up hundreds of concurrent goroutines with minimal overhead, each handling independent job dispatches. Channels enable elegant coordination between the executor, scheduler, and background maintenance tasks without complex distributed coordination systems.
For a job orchestration service, cold start performance matters. Go compiles to native code that starts almost immediately, which is crucial for rapid worker scaling during traffic spikes. Unlike JIT-compiled runtimes, there's no warmup period before code reaches full speed, and Go's garbage collector usually performs well with its default settings, with GOGC and GOMEMLIMIT available when tuning is needed.
The standard library provides a production-ready HTTP server, JSON encoding, and cryptographic primitives, so these core pieces need no third-party dependencies. This reduces the vulnerability surface and supports long-term stability.
PostgreSQL 18 with SELECT FOR UPDATE SKIP LOCKED
Using PostgreSQL as both the job queue and state store eliminates distributed transaction complexity. When a job is triggered, the enqueue operation and state update happen in a single database transaction, so either both commit or neither does. No message broker coordination is required -- no RabbitMQ, SQS, or Kafka to manage, configure, or scale horizontally. This significantly reduces operational overhead and rules out inconsistencies such as a run recorded as triggered but never enqueued.
SELECT FOR UPDATE SKIP LOCKED is the key mechanism that enables PostgreSQL to function as a job queue. Multiple workers can poll the same table concurrently, but each query locks only the rows it claims and skips rows already locked by other workers. This provides non-blocking dequeue with ACID guarantees at the database level: workers never wait on each other's row locks, and two workers never claim the same row at the same time. No distributed locks, no leader election, no coordination service.
Batch dequeue operations use a Common Table Expression (CTE) to claim up to N rows in a single database round-trip. This amortizes per-query round-trip overhead and improves throughput at scale.
PostgreSQL provides excellent support for JSONB columns, enabling flexible job metadata, workflow DAG definitions, and execution traces without a schema migration for every new field. Partial indexes restricted to status = 'queued' and ordered by priority DESC, created_at ASC keep dequeue queries efficient even with millions of queued runs.
Redis 8
Redis provides the pub/sub layer that decouples real-time streaming from the transactional database layer. When a job run completes, state changes are published to Redis channels. Clients can subscribe to these channels via Server-Sent Events (SSE) for real-time dashboards without polling the API. This reduces database load and provides sub-second latency for UI updates.
For Change Data Capture (CDC), Sequin streams Postgres WAL events through Redis channels. This enables real-time notifications when job definitions change, workflows update, or runs complete -- without application-level triggers that run within transactions and could slow down operations under load.
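Redis cluster mode provides horizontal scalability for the pub/sub layer when channels use sharded pub/sub (SSUBSCRIBE/SPUBLISH, available since Redis 7.0). Each channel is then assigned to a shard by hash slot, so as the number of concurrent SSE connections grows, channel subscriptions are distributed across multiple nodes. Classic PUBLISH, by contrast, is broadcast to every node in the cluster.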
pgx/v5 (Raw SQL over ORM)
The pgx/v5 library provides low-level access to PostgreSQL with fine-grained control over connection pooling, prepared statements, and transaction boundaries. For performance-critical operations like SELECT FOR UPDATE SKIP LOCKED, raw SQL guarantees the exact query structure needed for non-blocking dequeue. ORMs introduce abstraction layers that may not support these patterns or may generate suboptimal queries for batch operations.
Connection pooling is configurable and critical for production deployments. The pool manages connection lifecycle, limits concurrent connections to prevent database exhaustion, and provides metrics for monitoring. Multi-query operations, such as enqueue with budget checks, use the WithTx helper so that all of their queries run inside a single ACID transaction.
Prepared statements are automatically used for repeated queries, reducing parsing overhead. This is particularly important for worker dequeue loops that execute the same queries millions of times.
Chi/v5 HTTP Router
Chi is a lightweight, composable HTTP router that provides excellent performance with minimal allocations in the hot request path. For a job orchestration service API handling thousands of requests per second, memory allocation efficiency translates directly to better throughput and lower GC pressure.
The middleware chain composes independently -- RequestID generation, real IP extraction, OTel tracing, request logging, panic recovery, rate limiting, and authentication can each be developed, tested, and debugged separately. This modularity enables adding cross-cutting concerns without modifying route handlers.
sourcegraph/conc (Structured Concurrency)
The sourcegraph/conc library provides panic-safe goroutine pools and context-aware execution patterns. In a job orchestration service, background goroutines handle scheduling, heartbeat monitoring, webhook dispatch, and retention cleanup. If any of these goroutines panics, conc catches the panic and propagates it, along with the stack trace of the original panic, to the caller of Wait, so it can be recovered and logged in one place rather than crashing the entire process from an unsupervised goroutine.
Context-aware pools pass a shared context to every task, and that context is canceled when the parent context is canceled. This is crucial for graceful shutdown: when Strait receives SIGTERM, the canceled context tells in-flight operations to wind down, and Wait blocks until they have returned before the process exits. Pools also enforce size limits, preventing unbounded goroutine spawning that could exhaust process memory under error conditions.
alitto/pond/v2 (Worker Pool)
The alitto/pond/v2 library implements a production-grade worker pool with bounded queue backpressure. Tasks are submitted to a buffered channel, and a configurable number of worker goroutines consume from this channel. The buffer size provides backpressure: when the queue is full, task submission blocks rather than letting memory usage grow without bound.
The pool exposes runtime statistics such as running workers and waiting tasks, which are exported as Prometheus metrics alongside task completion time. These metrics enable horizontal scaling decisions, such as watching queue depth to add workers proactively before latency increases. Graceful shutdown waits for all in-flight tasks to complete, ensuring no job work is lost during deployments.
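Backpressure, queue-depth metrics, and drain-on-shutdown can be illustrated with a buffered channel. This is a hypothetical stdlib sketch, not pond's API or implementation. `Submit` blocks once the buffer is full, `TrySubmit` reports a full queue instead of blocking, `Waiting` is the kind of queue-depth gauge one would export to Prometheus, and `StopAndWait` runs every queued task before returning.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// BoundedPool runs tasks on a fixed number of workers fed by a buffered
// channel; the buffer size is the backpressure limit.
type BoundedPool struct {
	tasks     chan func()
	wg        sync.WaitGroup
	completed atomic.Int64
}

func NewBoundedPool(workers, queueSize int) *BoundedPool {
	p := &BoundedPool{tasks: make(chan func(), queueSize)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks { // drains remaining tasks after close
				task()
				p.completed.Add(1)
			}
		}()
	}
	return p
}

// Submit blocks while the queue is full.
func (p *BoundedPool) Submit(task func()) { p.tasks <- task }

// TrySubmit returns false instead of blocking when the queue is full.
func (p *BoundedPool) TrySubmit(task func()) bool {
	select {
	case p.tasks <- task:
		return true
	default:
		return false
	}
}

// Waiting reports queue depth, a natural gauge to export as a metric.
func (p *BoundedPool) Waiting() int { return len(p.tasks) }

// Completed counts finished tasks.
func (p *BoundedPool) Completed() int64 { return p.completed.Load() }

// StopAndWait stops accepting tasks and waits until queued ones finish.
// Submitting after this call panics (send on closed channel).
func (p *BoundedPool) StopAndWait() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	p := NewBoundedPool(2, 4)
	for i := 0; i < 10; i++ {
		p.Submit(func() {}) // blocks whenever 4 tasks are already queued
	}
	p.StopAndWait()
	fmt.Println(p.Completed()) // 10
}
```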
Sequin (Change Data Capture)
Sequin captures Postgres Write-Ahead Log (WAL) events in real-time and streams them through Redis. This enables applications to react to data changes without polling. For a job orchestration service, this means real-time dashboards, audit trails, and trigger-based workflows without adding application-level triggers that run within transactions.
A logical replication slot provides the CDC stream with little impact on transaction performance. PostgreSQL allows only one active consumer per slot, so Sequin reads the slot and fans the stream out to multiple consumers for different purposes -- SSE streaming, cache invalidation, and external event forwarding.
OpenTelemetry (Distributed Tracing & Metrics)
OpenTelemetry provides vendor-neutral observability that works with Prometheus, Jaeger, Tempo, or any OTLP-compatible backend. The otelchi middleware automatically traces every HTTP request, recording trace and span IDs, parent/child relationships, and timings.
Spans cover database queries, queue operations, HTTP dispatches, and workflow step execution. When debugging production issues, distributed tracing links a slow job run across the API server, worker, and external endpoint to pinpoint latency sources. Metrics expose Prometheus counters and histograms for run transitions, dequeue duration, and HTTP dispatch latency, enabling alerting and capacity planning.
golang-jwt/v5
The golang-jwt/v5 library generates HS256 JWTs signed with JWT_SIGNING_KEY for SDK authentication. Tokens include the run ID as the subject, an expiration timestamp, and an issued-at claim. This allows job executors to interact with the /sdk/v1/* endpoints for a specific run without managing session state or long-lived credentials.
JWTs enable stateless authentication -- the server validates the signature and claims on each request without a database lookup. The 60-second default expiration limits how long a leaked or stale token remains usable, including after the run has completed.
robfig/cron/v3
The robfig/cron/v3 library implements standard 5-field cron expressions for job scheduling. This enables time-based job triggering -- hourly, daily, weekly, or custom schedules -- without building custom scheduling logic.
The cron scheduler maintains job IDs and schedules in memory, triggering a tick when the next execution time arrives. Timezone support ensures jobs run at the correct time regardless of server location.
golang-migrate/v4 with go:embed
The golang-migrate/v4 library manages SQL schema migrations with up/down files. Using go:embed embeds migration files directly in the Go binary, eliminating the need to bundle migration files separately or mount them in containers.
Migrations run automatically on startup, checking the schema_migrations table for the current version and applying any pending migrations. This keeps the database schema synchronized with application code, preventing deployment failures due to mismatched schema versions.
samber/lo (Type-Safe Utilities)
The samber/lo library builds on Go generics (added in Go 1.18) to provide type-safe collection utilities -- Map, Filter, Reduce, GroupBy, and more. Before generics, these operations required verbose hand-written loops or unsafe type assertions on interface{} values. lo keeps data transformations concise and readable without sacrificing type safety.
Map operations compose naturally, enabling pipeline-style processing of job payloads and workflow step outputs. Because type parameters are inferred from the arguments, call sites stay shorter than the equivalent hand-written loops.
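The composition described above can be shown with two small generic helpers. These are local sketches rather than the lo library itself, though they mirror lo's `(item, index)` callback shape. The `stepOutput` type and its values are hypothetical. Note that no type arguments are written at the call sites; the compiler infers them.

```go
package main

import "fmt"

// Map applies fn to each element and returns the results in order.
func Map[T, R any](items []T, fn func(item T, index int) R) []R {
	out := make([]R, len(items))
	for i, it := range items {
		out[i] = fn(it, i)
	}
	return out
}

// Filter keeps the elements for which keep returns true.
func Filter[T any](items []T, keep func(item T, index int) bool) []T {
	var out []T
	for i, it := range items {
		if keep(it, i) {
			out = append(out, it)
		}
	}
	return out
}

// stepOutput is a hypothetical workflow step result.
type stepOutput struct {
	Step   string
	Status string
}

func main() {
	outputs := []stepOutput{{"fetch", "completed"}, {"transform", "failed"}, {"load", "completed"}}
	// Pipeline: keep completed steps, then project their names.
	done := Map(
		Filter(outputs, func(o stepOutput, _ int) bool { return o.Status == "completed" }),
		func(o stepOutput, _ int) string { return o.Step },
	)
	fmt.Println(done) // [fetch load]
}
```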
google/go-cmp
The google/go-cmp library provides structural comparison with human-readable diffs for test assertions. When testing job run transitions, expected vs. actual JSONB payloads can be compared with deep equality, not just pointer equality. The diff output shows exactly which fields differ, speeding up test debugging.
testcontainers-go
The testcontainers-go library spins up real Postgres and Redis containers for integration tests. Tests execute against actual database behavior -- including SELECT FOR UPDATE SKIP LOCKED semantics and transaction isolation -- rather than mocks that may not match production.
Container ports are mapped to randomly assigned host ports that tests look up at runtime, so parallel runs do not collide, and containers are automatically cleaned up after tests complete. This enables CI pipelines to run integration tests without external infrastructure dependencies.