The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH v4 0/3] workqueue: Shrink the lock time
@ 2026-06-24 11:47 Breno Leitao
  2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, bigeasy
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

The goal of this patchset is to decrease the time spent under the
workqueue pool->lock.

Currently the worker process is woken up inside pool->lock. The wakeup
ends in wake_up_process(), which takes the target task's rq->lock, so
rq->lock nests under pool->lock on the two hottest paths of a contended
unbound workqueue (__queue_work() enqueue and process_one_work() chain
kick). On some architectures the wakeup is even more expensive: on
arm64 waking a CPU that is idle (in wfi) issues an IPI.

Doing all of that while holding pool->lock lengthens the locked region
and hurts throughput on contended unbound pools.

This series shortens the locked region by selecting and claiming the
worker to wake under pool->lock, but issuing the actual wakeup after the
lock is dropped, using the wake_q machinery (wake_q_add() under the
lock, wake_up_q() after).

Because the win is a shorter pool->lock hold time, it shows up most
clearly as lower enqueue latency under contention.

Performance numbers (based on in-kernel workqueue microbenchmark)

VMs and arm64 (Grace) is where this series is meant to pay off -- waking
an idle CPU sitting in wfi costs an IPI (on arm; similar type of
operation on VMs), so doing it under pool->lock lengthens the critical
section.

The arm64 bare-metal numbers match what the x86-or-arm64 VM showed:

    affinity_scope    baseline    patched    tput     p95
                     (items/s)  (items/s)    gain    drop
    --------------   ---------  ---------  ------  ------
    cpu              2,569,880  3,029,740  +17.9%  -13.6%
    smt              2,586,485  3,044,788  +17.7%  -14.0%
    cache_shard        572,055    797,621  +39.4%  -37.1%
    cache              538,132    724,997  +34.7%  -30.1%
    numa               528,673    658,215  +24.5%  -20.5%
    system             524,287    614,486  +17.2%  -21.1%

(p95 drop = change in p95 enqueue latency; negative is better.)
(tput gain = number of requests enqueued per sec; bigger is better.)

Patch 1 is a pure refactor introducing kick_pool_pick().
Patch 2 defers the wakeup on the enqueue path (__queue_work()).
Patch 3 defers the wakeup on the per-work chain-kick path
(process_one_work()).

Signed-off-by: Breno Leitao <leitao@debian.org>

Changes in v4:
- replace raw_spin_unlock_wake() with a standard 
  raw_spin_unlock() + wake_up_q() (Sebastian Andrzej Siewior)
- Link to v3: https://lore.kernel.org/r/20260616-fastwake-v3-0-79da19fcd08f@debian.org

Changes in v3:
- Drop the "park kicked worker on pool->kicked_list" patch (v2 1/4).
  * That is a fix that is independent of this patch, in case we want to
    revamp it, it can be sent separately.
- Link to v2: https://lore.kernel.org/r/20260603-fastwake-v2-0-2977512fe7fa@debian.org

Changes in v2:
- Close the idle_cull_fn() vs kicked-worker race by parking the kicked
  worker on a new pool->kicked_list under pool->lock (new patch 1).
  Reported by Hillf Danton.
- Use the wake_q machinery (wake_q_add() / wake_up_q() via
  raw_spin_unlock_wake()) instead of plumbing a task_struct out of the
  helper by hand. Suggested by Sebastian Andrzej Siewior.
- Link to v1: https://lore.kernel.org/r/20260526-fastwake-v1-0-e69ad86923e6@debian.org

---
Breno Leitao (3):
      workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
      workqueue: defer the worker wakeup outside pool->lock in __queue_work()
      workqueue: defer the worker wakeup outside pool->lock in process_one_work()

 kernel/workqueue.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)
---
base-commit: 8d6dbbbe3ba62de0a63e962ee004afb848c8e3ac
change-id: 20260526-fastwake-02982fd66312

Best regards,
-- 
Breno Leitao <leitao@debian.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
  2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
@ 2026-06-24 11:47 ` Breno Leitao
  2026-06-24 18:46   ` Tejun Heo
  2026-06-24 11:47 ` [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, bigeasy
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

Factor the worker selection out of kick_pool() into kick_pool_pick(),
which picks and claims the worker under pool->lock but, instead of waking
it, queues it on a caller-provided wake_q via wake_q_add(). The caller
issues the wakeup later with wake_up_q(). wake_q_add() is safe under the
lock (cmpxchg + get_task_struct()); only wake_up_q() takes rq->lock.

kick_pool() becomes a thin wrapper using a local wake_q, so all existing
callers keep waking under pool->lock.

Pure refactor, no functional change.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..fd3b5bc78df9e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -52,6 +52,7 @@
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
 #include <linux/nmi.h>
 #include <linux/kvm_para.h>
 #include <linux/delay.h>
@@ -1258,13 +1259,17 @@ static void kick_bh_pool(struct worker_pool *pool)
 }
 
 /**
- * kick_pool - wake up an idle worker if necessary
+ * kick_pool_pick - select and claim an idle worker, deferring the wakeup
  * @pool: pool to kick
+ * @wakeq: wake_q to queue the selected worker on
  *
- * @pool may have pending work items. Wake up worker if necessary. Returns
- * whether a worker was woken up.
+ * Like kick_pool() but queues the picked worker on @wakeq (wake_q_add())
+ * instead of waking it, so the caller can wake_up_q(@wakeq) after dropping
+ * pool->lock. Returns whether a worker was selected.
+ *
+ * Must be called with @pool->lock held.
  */
-static bool kick_pool(struct worker_pool *pool)
+static bool kick_pool_pick(struct worker_pool *pool, struct wake_q_head *wakeq)
 {
 	struct worker *worker = first_idle_worker(pool);
 	struct task_struct *p;
@@ -1310,10 +1315,27 @@ static bool kick_pool(struct worker_pool *pool)
 		}
 	}
 #endif
-	wake_up_process(p);
+	wake_q_add(wakeq, p);
 	return true;
 }
 
+/**
+ * kick_pool - wake up an idle worker if necessary
+ * @pool: pool to kick
+ *
+ * @pool may have pending work items. Wake up worker if necessary. Returns
+ * whether a worker was woken up.
+ */
+static bool kick_pool(struct worker_pool *pool)
+{
+	DEFINE_WAKE_Q(wakeq);
+	bool kicked;
+
+	kicked = kick_pool_pick(pool, &wakeq);
+	wake_up_q(&wakeq);
+	return kicked;
+}
+
 #ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
 
 /*

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()
  2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
  2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
@ 2026-06-24 11:47 ` Breno Leitao
  2026-06-24 11:47 ` [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
  2026-06-24 15:30 ` [PATCH v4 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior
  3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, bigeasy
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

__queue_work() is the enqueue hot path: it inserts the work item and
calls kick_pool() while holding pool->lock. kick_pool() ends in a
wakeup, which takes the target task's rq->lock, so rq->lock nests under
pool->lock on every enqueue that wakes a worker on a contended unbound
pool.

Use kick_pool_pick() to select and claim the worker under pool->lock,
queue it on an on-stack wake_q, and issue the wakeup with wake_up_q()
right after dropping the lock. Worker selection, wake_cpu setup and
claiming the worker off pool->idle_list still happen under the lock;
only the rq->lock acquisition moves out.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fd3b5bc78df9e..972f783f98281 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2299,6 +2299,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 {
 	struct pool_workqueue *pwq;
 	struct worker_pool *last_pool, *pool;
+	DEFINE_WAKE_Q(wakeq);
 	unsigned int work_flags;
 	unsigned int req_cpu = cpu;
 
@@ -2421,7 +2422,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 		trace_workqueue_activate_work(work);
 		insert_work(pwq, work, &pool->worklist, work_flags);
-		kick_pool(pool);
+		kick_pool_pick(pool, &wakeq);
 	} else {
 		work_flags |= WORK_STRUCT_INACTIVE;
 		insert_work(pwq, work, &pwq->inactive_works, work_flags);
@@ -2429,6 +2430,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 out:
 	raw_spin_unlock(&pool->lock);
+	wake_up_q(&wakeq);
 	rcu_read_unlock();
 }
 

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work()
  2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
  2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
  2026-06-24 11:47 ` [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
@ 2026-06-24 11:47 ` Breno Leitao
  2026-06-24 15:30 ` [PATCH v4 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior
  3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, bigeasy
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

process_one_work() kicks the pool to chain execution of the remaining
work items on WORKER_NOT_RUNNING pools (the UNBOUND and CPU_INTENSIVE
ones), calling kick_pool() while holding pool->lock. As in the enqueue
path, the wakeup pulls the target rq->lock in under pool->lock.

Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_q() after the lock is dropped.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 972f783f98281..ab62af99852ce 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3251,6 +3251,7 @@ __acquires(&pool->lock)
 {
 	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct worker_pool *pool = worker->pool;
+	DEFINE_WAKE_Q(wakeq);
 	unsigned long work_data;
 	int lockdep_start_depth, rcu_start_depth;
 	bool bh_draining = pool->flags & POOL_BH_DRAINING;
@@ -3305,7 +3306,7 @@ __acquires(&pool->lock)
 	 * chain execution of the pending work items for WORKER_NOT_RUNNING
 	 * workers such as the UNBOUND and CPU_INTENSIVE ones.
 	 */
-	kick_pool(pool);
+	kick_pool_pick(pool, &wakeq);
 
 	/*
 	 * Record the last pool and clear PENDING which should be the last
@@ -3317,6 +3318,7 @@ __acquires(&pool->lock)
 
 	pwq->stats[PWQ_STAT_STARTED]++;
 	raw_spin_unlock_irq(&pool->lock);
+	wake_up_q(&wakeq);
 
 	rcu_start_depth = rcu_preempt_depth();
 	lockdep_start_depth = lockdep_depth(current);

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 0/3] workqueue: Shrink the lock time
  2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
                   ` (2 preceding siblings ...)
  2026-06-24 11:47 ` [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
@ 2026-06-24 15:30 ` Sebastian Andrzej Siewior
  3 siblings, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-24 15:30 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Tejun Heo, Lai Jiangshan, linux-kernel, marco.crivellari,
	frederic, Hillf Danton, kernel-team, kmagar, psuriset, david.dai

On 2026-06-24 04:47:38 [-0700], Breno Leitao wrote:
> The goal of this patchset is to decrease the time spent under the
> workqueue pool->lock.
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
  2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
@ 2026-06-24 18:46   ` Tejun Heo
  2026-06-25 16:51     ` Breno Leitao
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2026-06-24 18:46 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Lai Jiangshan, bigeasy, linux-kernel, marco.crivellari, frederic,
	Hillf Danton, kernel-team, kmagar, psuriset, david.dai

On Wed, Jun 24, 2026 at 04:47:39AM -0700, Breno Leitao wrote:
> +	wake_q_add(wakeq, p);

This is two extra atomic ops for every work item scheduling. This isn't
necessarily a deal braker but is this the only way to do this? Can't you
just stash the task pointer, extend irq disabled region and wake under rcu
protection?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
  2026-06-24 18:46   ` Tejun Heo
@ 2026-06-25 16:51     ` Breno Leitao
  0 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-25 16:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, bigeasy, linux-kernel, marco.crivellari, frederic,
	Hillf Danton, kernel-team, kmagar, psuriset, david.dai

On Wed, Jun 24, 2026 at 08:46:59AM -1000, Tejun Heo wrote:
> On Wed, Jun 24, 2026 at 04:47:39AM -0700, Breno Leitao wrote:
> > +	wake_q_add(wakeq, p);
>
> This is two extra atomic ops for every work item scheduling. This isn't
> necessarily a deal braker but is this the only way to do this?
>
> Can't you just stash the task pointer, extend irq disabled region and wake under rcu
> protection?

Agreed, that's a better approach. Thanks for the suggestion.

I have a working PoC implementing this. I'll clean it up and post an
updated series.

Thanks,
--breno

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-25 16:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-24 18:46   ` Tejun Heo
2026-06-25 16:51     ` Breno Leitao
2026-06-24 11:47 ` [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
2026-06-24 11:47 ` [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-24 15:30 ` [PATCH v4 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox