* [PATCH v4 0/3] workqueue: Shrink the lock time
@ 2026-06-24 11:47 Breno Leitao
2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
The goal of this patchset is to decrease the time spent under the
workqueue pool->lock.
Currently the worker process is woken up inside pool->lock. The wakeup
ends in wake_up_process(), which takes the target task's rq->lock, so
rq->lock nests under pool->lock on the two hottest paths of a contended
unbound workqueue (__queue_work() enqueue and process_one_work() chain
kick). On some architectures the wakeup is even more expensive: on
arm64 waking a CPU that is idle (in wfi) issues an IPI.
Doing all of that while holding pool->lock lengthens the locked region
and hurts throughput on contended unbound pools.
This series shortens the locked region by selecting and claiming the
worker to wake under pool->lock, but issuing the actual wakeup after the
lock is dropped, using the wake_q machinery (wake_q_add() under the
lock, wake_up_q() after).
Because the win is a shorter pool->lock hold time, it shows up most
clearly as lower enqueue latency under contention.
Performance numbers (based on in-kernel workqueue microbenchmark)
VMs and arm64 (Grace) is where this series is meant to pay off -- waking
an idle CPU sitting in wfi costs an IPI (on arm; similar type of
operation on VMs), so doing it under pool->lock lengthens the critical
section.
The arm64 bare-metal numbers match what the x86-or-arm64 VM showed:
affinity_scope baseline patched tput p95
(items/s) (items/s) gain drop
-------------- --------- --------- ------ ------
cpu 2,569,880 3,029,740 +17.9% -13.6%
smt 2,586,485 3,044,788 +17.7% -14.0%
cache_shard 572,055 797,621 +39.4% -37.1%
cache 538,132 724,997 +34.7% -30.1%
numa 528,673 658,215 +24.5% -20.5%
system 524,287 614,486 +17.2% -21.1%
(p95 drop = change in p95 enqueue latency; negative is better.)
(tput gain = number of requests enqueued per sec; bigger is better.)
Patch 1 is a pure refactor introducing kick_pool_pick().
Patch 2 defers the wakeup on the enqueue path (__queue_work()).
Patch 3 defers the wakeup on the per-work chain-kick path
(process_one_work()).
Signed-off-by: Breno Leitao <leitao@debian.org>
Changes in v4:
- replace raw_spin_unlock_wake() with a standard
raw_spin_unlock() + wake_up_q() (Sebastian Andrzej Siewior)
- Link to v3: https://lore.kernel.org/r/20260616-fastwake-v3-0-79da19fcd08f@debian.org
Changes in v3:
- Drop the "park kicked worker on pool->kicked_list" patch (v2 1/4).
* That is a fix that is independent of this patch, in case we want to
revamp it, it can be sent separately.
- Link to v2: https://lore.kernel.org/r/20260603-fastwake-v2-0-2977512fe7fa@debian.org
Changes in v2:
- Close the idle_cull_fn() vs kicked-worker race by parking the kicked
worker on a new pool->kicked_list under pool->lock (new patch 1).
Reported by Hillf Danton.
- Use the wake_q machinery (wake_q_add() / wake_up_q() via
raw_spin_unlock_wake()) instead of plumbing a task_struct out of the
helper by hand. Suggested by Sebastian Andrzej Siewior.
- Link to v1: https://lore.kernel.org/r/20260526-fastwake-v1-0-e69ad86923e6@debian.org
---
Breno Leitao (3):
workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
workqueue: defer the worker wakeup outside pool->lock in __queue_work()
workqueue: defer the worker wakeup outside pool->lock in process_one_work()
kernel/workqueue.c | 40 +++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
---
base-commit: 8d6dbbbe3ba62de0a63e962ee004afb848c8e3ac
change-id: 20260526-fastwake-02982fd66312
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
@ 2026-06-24 11:47 ` Breno Leitao
2026-06-24 18:46 ` Tejun Heo
2026-06-24 11:47 ` [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
Factor the worker selection out of kick_pool() into kick_pool_pick(),
which picks and claims the worker under pool->lock but, instead of waking
it, queues it on a caller-provided wake_q via wake_q_add(). The caller
issues the wakeup later with wake_up_q(). wake_q_add() is safe under the
lock (cmpxchg + get_task_struct()); only wake_up_q() takes rq->lock.
kick_pool() becomes a thin wrapper using a local wake_q, so all existing
callers keep waking under pool->lock.
Pure refactor, no functional change.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..fd3b5bc78df9e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -52,6 +52,7 @@
#include <linux/uaccess.h>
#include <linux/sched/isolation.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/nmi.h>
#include <linux/kvm_para.h>
#include <linux/delay.h>
@@ -1258,13 +1259,17 @@ static void kick_bh_pool(struct worker_pool *pool)
}
/**
- * kick_pool - wake up an idle worker if necessary
+ * kick_pool_pick - select and claim an idle worker, deferring the wakeup
* @pool: pool to kick
+ * @wakeq: wake_q to queue the selected worker on
*
- * @pool may have pending work items. Wake up worker if necessary. Returns
- * whether a worker was woken up.
+ * Like kick_pool() but queues the picked worker on @wakeq (wake_q_add())
+ * instead of waking it, so the caller can wake_up_q(@wakeq) after dropping
+ * pool->lock. Returns whether a worker was selected.
+ *
+ * Must be called with @pool->lock held.
*/
-static bool kick_pool(struct worker_pool *pool)
+static bool kick_pool_pick(struct worker_pool *pool, struct wake_q_head *wakeq)
{
struct worker *worker = first_idle_worker(pool);
struct task_struct *p;
@@ -1310,10 +1315,27 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
- wake_up_process(p);
+ wake_q_add(wakeq, p);
return true;
}
+/**
+ * kick_pool - wake up an idle worker if necessary
+ * @pool: pool to kick
+ *
+ * @pool may have pending work items. Wake up worker if necessary. Returns
+ * whether a worker was woken up.
+ */
+static bool kick_pool(struct worker_pool *pool)
+{
+ DEFINE_WAKE_Q(wakeq);
+ bool kicked;
+
+ kicked = kick_pool_pick(pool, &wakeq);
+ wake_up_q(&wakeq);
+ return kicked;
+}
+
#ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
/*
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()
2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
@ 2026-06-24 11:47 ` Breno Leitao
2026-06-24 11:47 ` [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-24 15:30 ` [PATCH v4 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior
3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
__queue_work() is the enqueue hot path: it inserts the work item and
calls kick_pool() while holding pool->lock. kick_pool() ends in a
wakeup, which takes the target task's rq->lock, so rq->lock nests under
pool->lock on every enqueue that wakes a worker on a contended unbound
pool.
Use kick_pool_pick() to select and claim the worker under pool->lock,
queue it on an on-stack wake_q, and issue the wakeup with wake_up_q()
right after dropping the lock. Worker selection, wake_cpu setup and
claiming the worker off pool->idle_list still happen under the lock;
only the rq->lock acquisition moves out.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fd3b5bc78df9e..972f783f98281 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2299,6 +2299,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
{
struct pool_workqueue *pwq;
struct worker_pool *last_pool, *pool;
+ DEFINE_WAKE_Q(wakeq);
unsigned int work_flags;
unsigned int req_cpu = cpu;
@@ -2421,7 +2422,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
trace_workqueue_activate_work(work);
insert_work(pwq, work, &pool->worklist, work_flags);
- kick_pool(pool);
+ kick_pool_pick(pool, &wakeq);
} else {
work_flags |= WORK_STRUCT_INACTIVE;
insert_work(pwq, work, &pwq->inactive_works, work_flags);
@@ -2429,6 +2430,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
out:
raw_spin_unlock(&pool->lock);
+ wake_up_q(&wakeq);
rcu_read_unlock();
}
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work()
2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-24 11:47 ` [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
@ 2026-06-24 11:47 ` Breno Leitao
2026-06-24 15:30 ` [PATCH v4 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior
3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-24 11:47 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
process_one_work() kicks the pool to chain execution of the remaining
work items on WORKER_NOT_RUNNING pools (the UNBOUND and CPU_INTENSIVE
ones), calling kick_pool() while holding pool->lock. As in the enqueue
path, the wakeup pulls the target rq->lock in under pool->lock.
Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_q() after the lock is dropped.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 972f783f98281..ab62af99852ce 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3251,6 +3251,7 @@ __acquires(&pool->lock)
{
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
+ DEFINE_WAKE_Q(wakeq);
unsigned long work_data;
int lockdep_start_depth, rcu_start_depth;
bool bh_draining = pool->flags & POOL_BH_DRAINING;
@@ -3305,7 +3306,7 @@ __acquires(&pool->lock)
* chain execution of the pending work items for WORKER_NOT_RUNNING
* workers such as the UNBOUND and CPU_INTENSIVE ones.
*/
- kick_pool(pool);
+ kick_pool_pick(pool, &wakeq);
/*
* Record the last pool and clear PENDING which should be the last
@@ -3317,6 +3318,7 @@ __acquires(&pool->lock)
pwq->stats[PWQ_STAT_STARTED]++;
raw_spin_unlock_irq(&pool->lock);
+ wake_up_q(&wakeq);
rcu_start_depth = rcu_preempt_depth();
lockdep_start_depth = lockdep_depth(current);
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 0/3] workqueue: Shrink the lock time
2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
` (2 preceding siblings ...)
2026-06-24 11:47 ` [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
@ 2026-06-24 15:30 ` Sebastian Andrzej Siewior
3 siblings, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-24 15:30 UTC (permalink / raw)
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, linux-kernel, marco.crivellari,
frederic, Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On 2026-06-24 04:47:38 [-0700], Breno Leitao wrote:
> The goal of this patchset is to decrease the time spent under the
> workqueue pool->lock.
…
> Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sebastian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
@ 2026-06-24 18:46 ` Tejun Heo
2026-06-25 16:51 ` Breno Leitao
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2026-06-24 18:46 UTC (permalink / raw)
To: Breno Leitao
Cc: Lai Jiangshan, bigeasy, linux-kernel, marco.crivellari, frederic,
Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On Wed, Jun 24, 2026 at 04:47:39AM -0700, Breno Leitao wrote:
> + wake_q_add(wakeq, p);
This is two extra atomic ops for every work item scheduling. This isn't
necessarily a deal braker but is this the only way to do this? Can't you
just stash the task pointer, extend irq disabled region and wake under rcu
protection?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
2026-06-24 18:46 ` Tejun Heo
@ 2026-06-25 16:51 ` Breno Leitao
0 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-25 16:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Lai Jiangshan, bigeasy, linux-kernel, marco.crivellari, frederic,
Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On Wed, Jun 24, 2026 at 08:46:59AM -1000, Tejun Heo wrote:
> On Wed, Jun 24, 2026 at 04:47:39AM -0700, Breno Leitao wrote:
> > + wake_q_add(wakeq, p);
>
> This is two extra atomic ops for every work item scheduling. This isn't
> necessarily a deal braker but is this the only way to do this?
>
> Can't you just stash the task pointer, extend irq disabled region and wake under rcu
> protection?
Agreed, that's a better approach. Thanks for the suggestion.
I have a working PoC implementing this. I'll clean it up and post an
updated series.
Thanks,
--breno
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-25 16:51 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 11:47 [PATCH v4 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-24 11:47 ` [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-24 18:46 ` Tejun Heo
2026-06-25 16:51 ` Breno Leitao
2026-06-24 11:47 ` [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
2026-06-24 11:47 ` [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-24 15:30 ` [PATCH v4 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox