* [PATCH v4 0/3] workqueue: Shrink the lock time
From: Breno Leitao @ 2026-06-24 11:47 UTC
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
The goal of this patchset is to decrease the time spent under the
workqueue pool->lock.
Currently the worker is woken up while pool->lock is held. The wakeup
ends in wake_up_process(), which takes the target task's rq->lock, so
rq->lock nests under pool->lock on the two hottest paths of a contended
unbound workqueue (__queue_work() enqueue and process_one_work() chain
kick). On some architectures the wakeup is even more expensive: on
arm64 waking a CPU that is idle (in wfi) issues an IPI.
Doing all of that while holding pool->lock lengthens the locked region
and hurts throughput on contended unbound pools.
This series shortens the locked region by selecting and claiming the
worker to wake under pool->lock, but issuing the actual wakeup after the
lock is dropped, using the wake_q machinery (wake_q_add() under the
lock, wake_up_q() after).
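The ordering change can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: atomic_flag spinlocks stand in for pool->lock and the target rq->lock, a one-slot array stands in for the on-stack wake_q, and struct task, wake_task(), kick_under_lock() and kick_deferred() are hypothetical names. The model counts how often the rq lock is taken while the pool lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: atomic_flag spinlocks stand in for raw spinlocks. */
struct spinlock { atomic_flag flag; };

static void spin_init(struct spinlock *l) { atomic_flag_clear(&l->flag); }

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		;	/* spin */
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

struct task { bool runnable; };

struct pool {
	struct spinlock lock;
	bool locked;		/* tracked so the model can detect nesting */
	struct task *idle;	/* one idle worker, NULL if none */
};

static struct spinlock rq_lock = { ATOMIC_FLAG_INIT };	/* models rq->lock */
static int rq_lock_under_pool_lock;	/* times rq_lock nested in pool->lock */

static void pool_init(struct pool *pool, struct task *idle)
{
	spin_init(&pool->lock);
	pool->locked = false;
	pool->idle = idle;
}

static void pool_lock(struct pool *pool)
{
	spin_lock(&pool->lock);
	pool->locked = true;
}

static void pool_unlock(struct pool *pool)
{
	pool->locked = false;
	spin_unlock(&pool->lock);
}

/* Models wake_up_process(): marks the task runnable under rq_lock. */
static void wake_task(const struct pool *pool, struct task *t)
{
	if (pool->locked)
		rq_lock_under_pool_lock++;
	spin_lock(&rq_lock);
	t->runnable = true;
	spin_unlock(&rq_lock);
}

/* Before the series: pick, claim and wake, all under pool->lock. */
static void kick_under_lock(struct pool *pool)
{
	pool_lock(pool);
	struct task *t = pool->idle;
	pool->idle = NULL;		/* claim the worker */
	if (t)
		wake_task(pool, t);
	pool_unlock(pool);
}

/* After: pick and claim under pool->lock, wake once it is dropped. */
static void kick_deferred(struct pool *pool)
{
	struct task *to_wake[1];	/* stands in for an on-stack wake_q */
	size_t n = 0;

	pool_lock(pool);
	if (pool->idle) {
		to_wake[n++] = pool->idle;
		pool->idle = NULL;	/* claim the worker */
	}
	pool_unlock(pool);

	for (size_t i = 0; i < n; i++)
		wake_task(pool, to_wake[i]);
}
```

With kick_under_lock() the nesting counter goes up on every wakeup; with kick_deferred() it does not, while selection and claiming still happen under the pool lock.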
Because the win is a shorter pool->lock hold time, it shows up most
clearly as lower enqueue latency under contention.
Performance numbers (from an in-kernel workqueue microbenchmark):
VMs and arm64 (Grace) are where this series is meant to pay off: waking
an idle CPU sitting in wfi costs an IPI on arm64 (and a comparable
operation on VMs), so doing it under pool->lock lengthens the critical
section.
The arm64 bare-metal numbers match what the VM runs (x86 or arm64) showed:
affinity_scope   baseline    patched    tput     p95
                (items/s)  (items/s)    gain    drop
--------------  ---------  ---------  ------  ------
cpu             2,569,880  3,029,740  +17.9%  -13.6%
smt             2,586,485  3,044,788  +17.7%  -14.0%
cache_shard       572,055    797,621  +39.4%  -37.1%
cache             538,132    724,997  +34.7%  -30.1%
numa              528,673    658,215  +24.5%  -20.5%
system            524,287    614,486  +17.2%  -21.1%
(p95 drop = relative change in p95 enqueue latency; negative is better.)
(tput gain = relative change in work items enqueued per second; bigger is better.)
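For reference, the "tput gain" column is the relative change in throughput, (patched / baseline - 1) * 100. The helper below is hypothetical, added only to reproduce the table's figures; it returns the gain in tenths of a percent so the rounded values compare exactly.

```c
#include <assert.h>

/* Relative throughput change, in tenths of a percent, rounded to nearest. */
static long tput_gain_tenths(double baseline, double patched)
{
	assert(baseline > 0.0);
	double tenths = (patched / baseline - 1.0) * 1000.0;

	return (long)(tenths >= 0.0 ? tenths + 0.5 : tenths - 0.5);
}
```

For example, tput_gain_tenths(2569880, 3029740) yields 179, i.e. the +17.9% reported for the cpu scope.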
Patch 1 is a pure refactor introducing kick_pool_pick().
Patch 2 defers the wakeup on the enqueue path (__queue_work()).
Patch 3 defers the wakeup on the per-work chain-kick path
(process_one_work()).
Signed-off-by: Breno Leitao <leitao@debian.org>
Changes in v4:
- replace raw_spin_unlock_wake() with a standard
raw_spin_unlock() + wake_up_q() (Sebastian Andrzej Siewior)
- Link to v3: https://lore.kernel.org/r/20260616-fastwake-v3-0-79da19fcd08f@debian.org
Changes in v3:
- Drop the "park kicked worker on pool->kicked_list" patch (v2 1/4).
* That fix is independent of this series; if we want to revive it, it
can be sent separately.
- Link to v2: https://lore.kernel.org/r/20260603-fastwake-v2-0-2977512fe7fa@debian.org
Changes in v2:
- Close the idle_cull_fn() vs kicked-worker race by parking the kicked
worker on a new pool->kicked_list under pool->lock (new patch 1).
Reported by Hillf Danton.
- Use the wake_q machinery (wake_q_add() / wake_up_q() via
raw_spin_unlock_wake()) instead of plumbing a task_struct out of the
helper by hand. Suggested by Sebastian Andrzej Siewior.
- Link to v1: https://lore.kernel.org/r/20260526-fastwake-v1-0-e69ad86923e6@debian.org
---
Breno Leitao (3):
workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
workqueue: defer the worker wakeup outside pool->lock in __queue_work()
workqueue: defer the worker wakeup outside pool->lock in process_one_work()
kernel/workqueue.c | 40 +++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
---
base-commit: 8d6dbbbe3ba62de0a63e962ee004afb848c8e3ac
change-id: 20260526-fastwake-02982fd66312
Best regards,
--
Breno Leitao <leitao@debian.org>
* [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
From: Breno Leitao @ 2026-06-24 11:47 UTC
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
Factor the worker selection out of kick_pool() into kick_pool_pick(),
which picks and claims the worker under pool->lock but, instead of waking
it, queues it on a caller-provided wake_q via wake_q_add(). The caller
issues the wakeup later with wake_up_q(). wake_q_add() is safe under the
lock (cmpxchg + get_task_struct()); only wake_up_q() takes rq->lock.
kick_pool() becomes a thin wrapper using a local wake_q, so all existing
callers keep waking under pool->lock.
Pure refactor, no functional change.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..fd3b5bc78df9e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -52,6 +52,7 @@
#include <linux/uaccess.h>
#include <linux/sched/isolation.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/nmi.h>
#include <linux/kvm_para.h>
#include <linux/delay.h>
@@ -1258,13 +1259,17 @@ static void kick_bh_pool(struct worker_pool *pool)
}
/**
- * kick_pool - wake up an idle worker if necessary
+ * kick_pool_pick - select and claim an idle worker, deferring the wakeup
* @pool: pool to kick
+ * @wakeq: wake_q to queue the selected worker on
*
- * @pool may have pending work items. Wake up worker if necessary. Returns
- * whether a worker was woken up.
+ * Like kick_pool() but queues the picked worker on @wakeq (wake_q_add())
+ * instead of waking it, so the caller can wake_up_q(@wakeq) after dropping
+ * pool->lock. Returns whether a worker was selected.
+ *
+ * Must be called with @pool->lock held.
*/
-static bool kick_pool(struct worker_pool *pool)
+static bool kick_pool_pick(struct worker_pool *pool, struct wake_q_head *wakeq)
{
struct worker *worker = first_idle_worker(pool);
struct task_struct *p;
@@ -1310,10 +1315,27 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
- wake_up_process(p);
+ wake_q_add(wakeq, p);
return true;
}
+/**
+ * kick_pool - wake up an idle worker if necessary
+ * @pool: pool to kick
+ *
+ * @pool may have pending work items. Wake up worker if necessary. Returns
+ * whether a worker was woken up.
+ */
+static bool kick_pool(struct worker_pool *pool)
+{
+ DEFINE_WAKE_Q(wakeq);
+ bool kicked;
+
+ kicked = kick_pool_pick(pool, &wakeq);
+ wake_up_q(&wakeq);
+ return kicked;
+}
+
#ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
/*
--
2.53.0-Meta
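The contract patch 1 relies on (claim under the lock, wake later) can be illustrated with a simplified, single-threaded userspace model of the wake_q shape. It is not the kernel implementation: the sim_* helpers, struct task and the sentinel node are hypothetical stand-ins, wake_up_process() and put_task_struct() are reduced to counters, and memory ordering is left at the C11 defaults. It shows why wake_q_add() is cheap enough to call under pool->lock (a compare-and-exchange to claim the task's single embedded node, then a reference increment) and that a task already on a wake queue is not queued twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct wake_q_node {
	_Atomic(struct wake_q_node *) next;	/* NULL: not on any wake queue */
};

struct task {
	atomic_int refcount;		/* models the task_struct usage count */
	int wakeups;			/* counts modelled wake_up_process() calls */
	struct wake_q_node wake_q;	/* one embedded node per task */
};

struct wake_q_head {
	_Atomic(struct wake_q_node *) first;
	_Atomic(struct wake_q_node *) *lastp;	/* where the next node is linked */
};

static struct wake_q_node sim_tail;	/* end-of-list sentinel */
#define SIM_WAKE_Q_TAIL (&sim_tail)

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void task_init(struct task *t)
{
	atomic_init(&t->refcount, 1);
	t->wakeups = 0;
	atomic_init(&t->wake_q.next, NULL);
}

static void sim_wake_q_init(struct wake_q_head *head)
{
	atomic_init(&head->first, SIM_WAKE_Q_TAIL);
	head->lastp = &head->first;
}

/* Fine under a spinlock: no sleeping, no other locks, at most two atomics. */
static void sim_wake_q_add(struct wake_q_head *head, struct task *t)
{
	struct wake_q_node *expected = NULL;

	/* Atomic 1: claim the node. If it is already queued somewhere,
	 * that queue's owner will wake the task, so there is nothing to do. */
	if (!atomic_compare_exchange_strong(&t->wake_q.next, &expected,
					    SIM_WAKE_Q_TAIL))
		return;
	/* Atomic 2: keep the task alive until the deferred wakeup. */
	atomic_fetch_add(&t->refcount, 1);
	atomic_store(head->lastp, &t->wake_q);
	head->lastp = &t->wake_q.next;
}

/* Runs after the lock is dropped; only this step would take rq->lock. */
static void sim_wake_up_q(struct wake_q_head *head)
{
	struct wake_q_node *node = atomic_load(&head->first);

	while (node != SIM_WAKE_Q_TAIL) {
		struct task *t = container_of(node, struct task, wake_q);

		node = atomic_load(&t->wake_q.next);
		atomic_store(&t->wake_q.next, NULL);	/* may be queued again */
		t->wakeups++;				/* wake_up_process() */
		atomic_fetch_sub(&t->refcount, 1);	/* put_task_struct() */
	}
}
```

The compare-and-exchange and the reference increment in the add path are most likely the "two extra atomic ops" Tejun points out in his review of this patch.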
* [PATCH v4 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()
From: Breno Leitao @ 2026-06-24 11:47 UTC
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
__queue_work() is the enqueue hot path: it inserts the work item and
calls kick_pool() while holding pool->lock. kick_pool() ends in a
wakeup, which takes the target task's rq->lock, so rq->lock nests under
pool->lock on every enqueue that wakes a worker on a contended unbound
pool.
Use kick_pool_pick() to select and claim the worker under pool->lock,
queue it on an on-stack wake_q, and issue the wakeup with wake_up_q()
right after dropping the lock. Worker selection, wake_cpu setup and
claiming the worker off pool->idle_list still happen under the lock;
only the rq->lock acquisition moves out.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fd3b5bc78df9e..972f783f98281 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2299,6 +2299,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
{
struct pool_workqueue *pwq;
struct worker_pool *last_pool, *pool;
+ DEFINE_WAKE_Q(wakeq);
unsigned int work_flags;
unsigned int req_cpu = cpu;
@@ -2421,7 +2422,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
trace_workqueue_activate_work(work);
insert_work(pwq, work, &pool->worklist, work_flags);
- kick_pool(pool);
+ kick_pool_pick(pool, &wakeq);
} else {
work_flags |= WORK_STRUCT_INACTIVE;
insert_work(pwq, work, &pwq->inactive_works, work_flags);
@@ -2429,6 +2430,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
out:
raw_spin_unlock(&pool->lock);
+ wake_up_q(&wakeq);
rcu_read_unlock();
}
--
2.53.0-Meta
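The control-flow shape of this patch can be sketched as a userspace model (hypothetical names, no real locking: a bool stands in for holding pool->lock and counters stand in for wakeups). The wake queue is declared at the top, only the branch that activates the work item fills it, and the single wakeup after the unlock runs on every path that reaches the unlock, which is harmless when nothing was claimed because waking an empty queue does nothing.

```c
#include <assert.h>
#include <stdbool.h>

struct pool {
	bool locked;		/* stands in for holding pool->lock */
	int idle_workers;
	int nr_active;
	int nr_inactive;
};

struct wake_list {
	int pending;		/* workers claimed but not yet woken */
};

static int wakeups_total;
static int wakeups_under_lock;

/* Like kick_pool_pick(): select and claim under the lock, do not wake. */
static void pick(struct pool *pool, struct wake_list *wl)
{
	assert(pool->locked);
	if (pool->idle_workers > 0) {
		pool->idle_workers--;
		wl->pending++;
	}
}

/* Like wake_up_q(): a no-op when nothing was claimed. */
static void wake_all(const struct pool *pool, struct wake_list *wl)
{
	for (; wl->pending > 0; wl->pending--) {
		if (pool->locked)
			wakeups_under_lock++;
		wakeups_total++;
	}
}

static void queue_item(struct pool *pool, bool activate)
{
	struct wake_list wl = { 0 };	/* DEFINE_WAKE_Q(wakeq) */

	pool->locked = true;		/* raw_spin_lock(&pool->lock) */
	if (activate) {
		pool->nr_active++;	/* insert_work() on pool->worklist */
		pick(pool, &wl);
	} else {
		pool->nr_inactive++;	/* insert_work() on pwq->inactive_works */
	}
	pool->locked = false;		/* raw_spin_unlock(&pool->lock) */
	wake_all(pool, &wl);		/* wake_up_q(&wakeq) */
}
```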
* [PATCH v4 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work()
From: Breno Leitao @ 2026-06-24 11:47 UTC
To: Tejun Heo, Lai Jiangshan, bigeasy
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
process_one_work() kicks the pool to chain execution of the remaining
work items for WORKER_NOT_RUNNING workers (such as the UNBOUND and
CPU_INTENSIVE ones), calling kick_pool() while holding pool->lock. As in
the enqueue path, the wakeup takes the target task's rq->lock while
pool->lock is held.
Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_q() after the lock is dropped.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 972f783f98281..ab62af99852ce 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3251,6 +3251,7 @@ __acquires(&pool->lock)
{
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
+ DEFINE_WAKE_Q(wakeq);
unsigned long work_data;
int lockdep_start_depth, rcu_start_depth;
bool bh_draining = pool->flags & POOL_BH_DRAINING;
@@ -3305,7 +3306,7 @@ __acquires(&pool->lock)
* chain execution of the pending work items for WORKER_NOT_RUNNING
* workers such as the UNBOUND and CPU_INTENSIVE ones.
*/
- kick_pool(pool);
+ kick_pool_pick(pool, &wakeq);
/*
* Record the last pool and clear PENDING which should be the last
@@ -3317,6 +3318,7 @@ __acquires(&pool->lock)
pwq->stats[PWQ_STAT_STARTED]++;
raw_spin_unlock_irq(&pool->lock);
+ wake_up_q(&wakeq);
rcu_start_depth = rcu_preempt_depth();
lockdep_start_depth = lockdep_depth(current);
--
2.53.0-Meta
* Re: [PATCH v4 0/3] workqueue: Shrink the lock time
From: Sebastian Andrzej Siewior @ 2026-06-24 15:30 UTC
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, linux-kernel, marco.crivellari,
frederic, Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On 2026-06-24 04:47:38 [-0700], Breno Leitao wrote:
> The goal of this patchset is to decrease the time spent under the
> workqueue pool->lock.
…
> Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sebastian
* Re: [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
From: Tejun Heo @ 2026-06-24 18:46 UTC
To: Breno Leitao
Cc: Lai Jiangshan, bigeasy, linux-kernel, marco.crivellari, frederic,
Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On Wed, Jun 24, 2026 at 04:47:39AM -0700, Breno Leitao wrote:
> + wake_q_add(wakeq, p);
This is two extra atomic ops for every work item scheduling. This isn't
necessarily a deal breaker but is this the only way to do this? Can't you
just stash the task pointer, extend irq disabled region and wake under rcu
protection?
Thanks.
--
tejun
* Re: [PATCH v4 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q()
From: Breno Leitao @ 2026-06-25 16:51 UTC
To: Tejun Heo
Cc: Lai Jiangshan, bigeasy, linux-kernel, marco.crivellari, frederic,
Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On Wed, Jun 24, 2026 at 08:46:59AM -1000, Tejun Heo wrote:
> On Wed, Jun 24, 2026 at 04:47:39AM -0700, Breno Leitao wrote:
> > + wake_q_add(wakeq, p);
>
> This is two extra atomic ops for every work item scheduling. This isn't
> necessarily a deal breaker but is this the only way to do this?
>
> Can't you just stash the task pointer, extend irq disabled region and wake under rcu
> protection?
Agreed, that's a better approach. Thanks for the suggestion.
I have a working PoC implementing this. I'll clean it up and post an
updated series.
Thanks,
--breno