Operations
Runbook for authentication and authorization incidents.
Common Incident Types
- sudden 401 spike (token/key validation issues)
- sudden 403 spike (RBAC/policy misconfiguration)
- audit event write failures
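A 401 spike from token/key validation usually comes down to a few claim checks failing for most requests at once: an issuer or audience mismatch after a config change, or an unknown `kid` after a key rotation. The sketch below classifies why an already-decoded token would be rejected. It is a hypothetical helper, not part of any specific service. It assumes the header and claims have already been decoded and it does **not** verify signatures. `TokenCheckConfig`, `diagnose_token`, and the example issuer and key IDs are illustrative names only.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class TokenCheckConfig:
    # Hypothetical expected values; substitute your OIDC provider's settings.
    issuer: str
    audience: str
    known_kids: set[str]  # key IDs currently published by the provider
    leeway_seconds: int = 60  # tolerated clock skew for "exp"


def diagnose_token(header: dict, claims: dict, cfg: TokenCheckConfig,
                   now: float | None = None) -> list[str]:
    """Return reasons a decoded token would fail validation (empty = claims look OK).

    Signature verification is deliberately NOT performed; this only inspects
    already-decoded header/claims to help classify 401s during triage.
    """
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != cfg.issuer:
        problems.append("issuer_mismatch")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # "aud" may be a string or a list
    if cfg.audience not in audiences:
        problems.append("audience_mismatch")
    if header.get("kid") not in cfg.known_kids:
        problems.append("unknown_kid")  # typical symptom of a key rotation mismatch
    exp = claims.get("exp")
    if exp is None or now > exp + cfg.leeway_seconds:
        problems.append("expired")
    return problems


cfg = TokenCheckConfig(issuer="https://idp.example.com", audience="api",
                       known_kids={"key-2024"})
print(diagnose_token({"kid": "key-2023"},
                     {"iss": "https://idp.example.com", "aud": "api", "exp": 2000},
                     cfg, now=1000))
```

Running this over a sample of rejected tokens and counting the reasons tends to point at a single root cause (for example, mostly `unknown_kid` right after a rotation).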
Triage Checklist
- Check recent auth/RBAC deployments and config changes.
- Inspect 401/403 rates and top failing routes.
- Confirm OIDC issuer/audience and key material.
- Validate permission-cache behavior and invalidation events.
- Verify DB health for audit event writes.
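For the 401/403 step in the checklist, a quick summary of failure rates and the routes producing them is often enough to separate a global auth outage (failures spread across every route) from a narrow policy regression (failures concentrated on a few routes). The sketch below assumes access-log records have already been reduced to hypothetical `(route, http_status)` pairs; `auth_failure_summary` is an illustrative name, not an existing tool.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def auth_failure_summary(records: Iterable[tuple[str, int]], top_n: int = 5) -> dict:
    """Summarize 401/403 rates and the routes producing the most of each.

    `records` is assumed to be (route, http_status) pairs taken from access logs.
    """
    total = 0
    status_counts: Counter = Counter()
    failing_routes = {401: Counter(), 403: Counter()}
    for route, status in records:
        total += 1
        status_counts[status] += 1
        if status in failing_routes:
            failing_routes[status][route] += 1

    def rate(code: int) -> float:
        # Guard against an empty window to avoid division by zero.
        return status_counts[code] / total if total else 0.0

    return {
        "total": total,
        "rate_401": rate(401),
        "rate_403": rate(403),
        "top_401_routes": failing_routes[401].most_common(top_n),
        "top_403_routes": failing_routes[403].most_common(top_n),
    }


records = [("/api/orders", 200), ("/api/orders", 401), ("/api/admin", 403),
           ("/api/admin", 403), ("/api/orders", 401), ("/api/health", 200)]
summary = auth_failure_summary(records, top_n=2)
print(summary["top_401_routes"], summary["top_403_routes"])
```

Comparing the same summary for a window before and after a suspect deployment makes the change in rates easier to attribute.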
Mitigation
- Temporarily roll back recent OIDC/RBAC changes.
- Reapply known-good role/policy snapshots.
- If needed, use the internal-secret path for emergency admin actions.
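Before reapplying a known-good role/policy snapshot, it can help to preview exactly which permissions would be added or removed per role, so the rollback does not silently revoke access that was granted on purpose. The sketch below models roles as a hypothetical mapping of role name to a set of permission strings; real RBAC systems may also have bindings, conditions, or inheritance that this simplification ignores. `policy_diff` is an illustrative name.

```python
from __future__ import annotations


def policy_diff(current: dict[str, set[str]],
                snapshot: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Per role, the permissions that reapplying `snapshot` would add or remove.

    Roles present in only one side are treated as having no permissions on the other.
    Roles with no differences are omitted from the result.
    """
    changes = {}
    for role in current.keys() | snapshot.keys():
        have = current.get(role, set())
        want = snapshot.get(role, set())
        to_add, to_remove = want - have, have - want
        if to_add or to_remove:
            changes[role] = {"add": to_add, "remove": to_remove}
    return changes


current = {"admin": {"users:read", "users:write"}, "viewer": set(), "temp": {"debug"}}
snapshot = {"admin": {"users:read", "users:write"}, "viewer": {"users:read"}}
print(policy_diff(current, snapshot))
```

An empty result means the live policy already matches the snapshot, which suggests the incident lies elsewhere (for example, stale permission-cache entries).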