limits_concurrency bypassed after non-graceful shutdown

## Summary

When a worker is force-killed during shutdown (shutdown timeout exceeded), jobs protected by limits_concurrency can run concurrently after restart.

## Steps to reproduce

```ruby
class SlowJob < ActiveJob::Base
  limits_concurrency key: "slow_job", duration: 5.minutes

  def perform
    sleep 1.hour
  end
end
```

1. Enqueue 3 SlowJob instances
2. Start SolidQueue supervisor in fork mode (worker thread_pool_size: 3)
3. Wait for the first job to start (semaphore acquired, others blocked)
4. Send SIGTERM to the supervisor
5. Wait for SolidQueue.shutdown_timeout (default 5s) to expire, at which point the supervisor force-kills the worker
6. Start a new supervisor
7. Observe that two or more jobs log "Performing" at the same time, violating the concurrency limit of 1

## Expected behavior

Only one SlowJob runs at a time after restart, same as before the shutdown.

## Actual behavior

Multiple jobs with the same concurrency key run simultaneously after restart.

## Root cause

Supervisor#start calls start_processes (line 39), which starts the dispatcher and the workers concurrently. The dispatcher's ConcurrencyMaintenance is initialized with Concurrent::TimerTask.new(run_now: true), so expire_semaphores and unblock_blocked_executions do run at boot, but on a background thread. Nothing makes the worker wait for that first pass: it starts polling immediately and can claim multiple jobs before the maintenance thread completes.
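To illustrate the ordering problem in isolation, here is a minimal sketch that uses plain Ruby threads as stand-ins for the maintenance task and the worker's polling loop. It does not use Solid Queue or concurrent-ruby; the method name, the event names, and the gate that simulates a slow maintenance pass are all hypothetical.

```ruby
# Toy illustration only: plain threads stand in for the dispatcher's
# maintenance task and the worker's polling loop. No Solid Queue code is used.
def simulate_background_maintenance
  events = Queue.new # thread-safe record of what happened, in order
  gate   = Queue.new # holds maintenance back, like a slow first pass

  # "Run now" maintenance: started immediately, but on its own thread.
  maintenance = Thread.new do
    gate.pop # blocks until the caller releases the gate
    events << :maintenance_finished
  end

  # The caller is not blocked by the thread above, so "polling" proceeds.
  2.times { |i| events << :"worker_claimed_job_#{i + 1}" }

  gate << :go # only now is maintenance allowed to complete
  maintenance.join

  order = []
  order << events.pop until events.empty?
  order
end

p simulate_background_maintenance
# => [:worker_claimed_job_1, :worker_claimed_job_2, :maintenance_finished]
```

The gate makes the interleaving deterministic for the sake of the example. In a real boot the order depends on scheduling and query latency, which would explain why the problem only shows up some of the time.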

## The sequence:

1. Old worker is force-killed mid-job, leaving a stale semaphore in solid_queue_semaphores
2. The "Release claimed jobs" step runs, putting the interrupted job back in the ready queue
3. New supervisor starts — dispatcher and workers boot concurrently
4. Dispatcher's maintenance starts in a background thread (Concurrent::TimerTask)
5. Worker starts polling (every 0.1s), claims multiple ready jobs before maintenance has expired the stale semaphore and unblocked blocked executions
6. Concurrency limit is violated

## Observed in production logs

14:38:39 Supervisor wasn't terminated gracefully - shutdown timeout exceeded (5018.5ms)
14:38:39 Release claimed jobs (90.1ms)  size: 1
...
14:51:47 ==> Your service is live
14:51:50 [Job ff2291c7] Performing RefreshDataJob (az4n-8mr2)
14:51:50 [Job b1ddfa0c] Performing RefreshDataJob (6sqe-dvqs)

Both jobs use limits_concurrency key: self (limit 1), yet both started in the same second after a deploy that triggered a non-graceful shutdown.

## Possible fix

Run ConcurrencyMaintenance#expire_semaphores and #unblock_blocked_executions synchronously during dispatcher boot, before workers start polling. This would ensure stale semaphores from dead processes are cleaned up before any jobs are claimed.
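A rough sketch of the boot ordering this proposal asks for, written with hypothetical stand-in classes (ToyConcurrencyMaintenance, ToyWorker, run_once) rather than Solid Queue's real internals. It only shows the intended sequencing: one maintenance pass on the calling thread, then worker start. It is not a patch and has not been validated against the gem.

```ruby
# Hypothetical stand-ins; these are NOT Solid Queue's real classes.
class ToyConcurrencyMaintenance
  def initialize(log)
    @log = log
  end

  def expire_semaphores
    @log << :expire_semaphores
  end

  def unblock_blocked_executions
    @log << :unblock_blocked_executions
  end

  # Run one full maintenance pass on the calling thread.
  def run_once
    expire_semaphores
    unblock_blocked_executions
  end
end

class ToyWorker
  def initialize(log)
    @log = log
  end

  # Polling happens on its own thread, as a real worker's would.
  def start
    Thread.new { @log << :worker_polling }
  end
end

def boot_with_synchronous_maintenance
  log = Queue.new
  ToyConcurrencyMaintenance.new(log).run_once # returns only when cleanup is done
  ToyWorker.new(log).start.join               # workers start only afterwards
  Array.new(log.size) { log.pop }
end

p boot_with_synchronous_maintenance
# => [:expire_semaphores, :unblock_blocked_executions, :worker_polling]
```

In a fork-mode supervisor the dispatcher and workers are separate processes, so a real change would presumably need the supervisor itself (or some cross-process signal) to enforce this ordering. That part is left open here.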

## Environment
- solid_queue 1.4.0
- Rails 8.1
- Ruby 3.4.7
- PostgreSQL 16
- Fork mode supervisor
