* [PATCH v2 0/4] workqueue: Shrink the lock time
@ 2026-06-03 13:40 Breno Leitao
  2026-06-03 13:40 ` [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Breno Leitao @ 2026-06-03 13:40 UTC
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team

The goal of this patchset is to decrease the time spent under the
workqueue pool->lock.

Currently the worker is woken up while pool->lock is held.  The wakeup
ends in wake_up_process(), which takes the target task's rq->lock, so
rq->lock nests under pool->lock on the two hottest paths of a contended
unbound workqueue (the __queue_work() enqueue and the process_one_work()
chain kick).  On some architectures the wakeup is even more expensive:
on arm64, waking an idle CPU (one sitting in wfi) issues an IPI.

Doing all of that while holding pool->lock lengthens the locked region
and hurts throughput on contended unbound pools.

This series shortens the locked region by selecting and claiming the
worker to wake under pool->lock, but issuing the actual wakeup after the
lock is dropped, using the wake_q machinery (wake_q_add() under the
lock, wake_up_q() after).
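
As a rough userspace illustration of the ordering described above (not
the kernel code), the sketch below claims an idle worker under the pool
lock and only signals it after the lock is dropped.  Every name in it
(struct worker, struct pool, wake_worker(), queue_work_deferred()) is a
hypothetical stand-in: the per-worker mutex plays the role of rq->lock,
and the local wake_list plays the role of a wake_q.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct worker {
	pthread_mutex_t lock;	/* plays the role of the target's rq->lock */
	pthread_cond_t cond;
	bool woken;
	struct worker *next;	/* link on the idle list or on a wake list */
};

struct pool {
	pthread_mutex_t lock;	/* plays the role of pool->lock */
	struct worker *idle;	/* LIFO list of idle workers */
	int nr_pending;
};

static void worker_init(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->woken = false;
	w->next = NULL;
}

static void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->idle = NULL;
	p->nr_pending = 0;
}

static void pool_add_idle(struct pool *p, struct worker *w)
{
	pthread_mutex_lock(&p->lock);
	w->next = p->idle;
	p->idle = w;
	pthread_mutex_unlock(&p->lock);
}

/* The expensive part: takes the worker's own lock and signals it. */
static void wake_worker(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->woken = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/*
 * Enqueue one item.  The worker is selected and claimed under the pool
 * lock, but woken only after the lock is released, so the worker lock
 * never nests under the pool lock.  Returns the woken worker or NULL.
 */
static struct worker *queue_work_deferred(struct pool *p)
{
	struct worker *wake_list = NULL, *w;

	pthread_mutex_lock(&p->lock);
	p->nr_pending++;
	w = p->idle;
	if (w) {
		p->idle = w->next;	/* claimed: no longer on the idle list */
		w->next = wake_list;
		wake_list = w;
	}
	pthread_mutex_unlock(&p->lock);

	/* Deferred wakeups, issued with the pool lock already dropped. */
	for (w = wake_list; w; ) {
		struct worker *next = w->next;

		w->next = NULL;
		wake_worker(w);
		w = next;
	}
	return wake_list;
}
```

A caller would set up a pool with pool_init() and pool_add_idle(), then
call queue_work_deferred() per item.  The wake list holds at most one
worker here; it is kept as a list only to mirror the "collect under the
lock, wake after" shape.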

Because the win is a shorter pool->lock hold time, it shows up most
clearly as lower enqueue latency under contention.  Measured with the
in-tree test_workqueue microbenchmark (lib/test_workqueue.c, 8 producers
x 200000 items on a WQ_UNBOUND workqueue, x86 8-vCPU VM, medians of five
boots), p95 enqueue latency drops ~40% and throughput improves ~20% on
the contended affinity scopes (cache_shard, cache, numa, system).  The
uncontended per-CPU scopes (cpu, smt) are unaffected.
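
For readers unfamiliar with the statistics quoted above, the following
is a minimal sketch of a nearest-rank percentile, which yields a p95 for
pct = 95 and the median for pct = 50.  It is an assumption that this
definition is close enough for illustration; it is not taken from
lib/test_workqueue.c, and percentile() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_ull(const void *a, const void *b)
{
	unsigned long long x = *(const unsigned long long *)a;
	unsigned long long y = *(const unsigned long long *)b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile: the smallest sample such that at least
 * pct percent of all samples are <= it.  Sorts v in place; n must be
 * non-zero and pct must be in 1..100.
 */
static unsigned long long percentile(unsigned long long *v, size_t n,
				     unsigned int pct)
{
	size_t rank;

	qsort(v, n, sizeof(*v), cmp_ull);
	rank = (n * pct + 99) / 100;	/* ceil(n * pct / 100) */
	if (rank == 0)
		rank = 1;
	return v[rank - 1];
}
```

With five per-boot results, percentile(v, 5, 50) picks the third
smallest value, i.e. the median of five boots.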

While reworking this, Hillf Danton pointed out -- and a closer look
confirmed -- a latent race: kick_pool() wakes a worker but leaves it
WORKER_IDLE on pool->idle_list until it schedules in, so a concurrent
idle_cull_fn() (which only checks WORKER_IDLE, not the task state) can
reap it before it consumes the work, stranding the just-enqueued item.

The window is narrow today, but deferring the wakeup widens it, so the
first patch closes it by moving the kicked worker onto a new
pool->kicked_list under pool->lock, out of reach of the cull, which
walks idle_list only.
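
The effect of parking the kicked worker can be shown with a toy,
single-threaded model.  This is only a sketch: the structs and helpers
below are illustrative and are not the patch's code, and the cull here
reaps every worker on idle_list unconditionally, which is more
aggressive than the real idle_cull_fn().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model; names are illustrative, not taken from kernel/workqueue.c. */
struct worker {
	struct worker *next;
	bool reaped;
};

struct pool {
	struct worker *idle_list;
	struct worker *kicked_list;
};

static void push(struct worker **head, struct worker *w)
{
	w->next = *head;
	*head = w;
}

static struct worker *pop(struct worker **head)
{
	struct worker *w = *head;

	if (w) {
		*head = w->next;
		w->next = NULL;
	}
	return w;
}

/* Racy variant: the chosen worker stays on idle_list after the kick. */
static struct worker *kick_leave_idle(struct pool *p)
{
	return p->idle_list;
}

/* Parking variant: the chosen worker moves to kicked_list. */
static struct worker *kick_and_park(struct pool *p)
{
	struct worker *w = pop(&p->idle_list);

	if (w)
		push(&p->kicked_list, w);
	return w;
}

/* The cull walks idle_list only; returns how many workers it reaped. */
static int cull_idle(struct pool *p)
{
	struct worker *w;
	int n = 0;

	while ((w = pop(&p->idle_list)) != NULL) {
		w->reaped = true;
		n++;
	}
	return n;
}
```

In this model, a cull that runs right after kick_leave_idle() reaps the
worker that was just chosen to run the new item, while after
kick_and_park() the same cull cannot reach it.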

Patch 1 fixes the cull race and is a standalone correctness fix.
Patch 2 is a pure refactor introducing kick_pool_pick().
Patch 3 defers the wakeup on the enqueue path (__queue_work()).
Patch 4 defers the wakeup on the per-work chain-kick path
(process_one_work()).

Changes in v2:
- Close the idle_cull_fn() vs kicked-worker race by parking the kicked
  worker on a new pool->kicked_list under pool->lock (new patch 1).
  Reported by Hillf Danton.
- Use the wake_q machinery (wake_q_add() / wake_up_q() via
  raw_spin_unlock_wake()) instead of plumbing a task_struct out of the
  helper by hand.  Suggested by Sebastian Andrzej Siewior.
- Link to v1: https://lore.kernel.org/r/20260526-fastwake-v1-0-e69ad86923e6@debian.org

Signed-off-by: Breno Leitao <leitao@debian.org>
---
To: Tejun Heo <tj@kernel.org>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org

---
Breno Leitao (4):
      workqueue: park kicked worker on pool->kicked_list
      workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
      workqueue: defer the worker wakeup outside pool->lock in __queue_work()
      workqueue: defer the worker wakeup outside pool->lock in process_one_work()

 kernel/workqueue.c | 65 +++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 10 deletions(-)
---
base-commit: c1ecb239fa3456529a32255359fc78b69eb9d847
change-id: 20260526-fastwake-02982fd66312

Best regards,
-- 
Breno Leitao <leitao@debian.org>



Thread overview: 8+ messages
2026-06-03 13:40 [PATCH v2 0/4] workqueue: Shrink the lock time Breno Leitao
2026-06-03 13:40 ` [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list Breno Leitao
2026-06-04  8:50   ` Tejun Heo
2026-06-03 13:40 ` [PATCH v2 2/4] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-03 13:40 ` [PATCH v2 3/4] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
2026-06-03 13:40 ` [PATCH v2 4/4] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-04  8:50   ` Tejun Heo
2026-06-04 15:29     ` Breno Leitao
