* [PATCH v5 0/3] workqueue: Shrink the lock time
@ 2026-06-26 9:57 Breno Leitao
2026-06-26 9:57 ` [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick() Breno Leitao
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 9:57 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
The goal of this patchset is to decrease the time spent under the
workqueue pool->lock.
Currently the worker process is woken up inside pool->lock. The wakeup
ends in wake_up_process(), which takes the target task's rq->lock, so
rq->lock nests under pool->lock on the two hottest paths of a contended
unbound workqueue (__queue_work() enqueue and process_one_work() chain
kick). On some architectures the wakeup is even more expensive: on
arm64 waking a CPU that is idle (in wfi) issues an IPI.
Doing all of that while holding pool->lock lengthens the locked region
and hurts throughput on contended unbound pools.
This series shortens the locked region by selecting and claiming the
worker to wake under pool->lock, but issuing the actual wakeup with
wake_up_process() after the lock is dropped.
Because the win is a shorter pool->lock hold time, it shows up most
clearly as lower enqueue latency under contention.
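
The "pick under the lock, wake after the unlock" idea can be sketched in
userspace. This is only an analogy, not the kernel code: a pthread mutex
stands in for pool->lock, a per-worker condition variable stands in for
wake_up_process(), and every demo_* name is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct demo_worker {
	pthread_mutex_t lock;	/* protects 'kicked' */
	pthread_cond_t wake;	/* stands in for the scheduler wakeup */
	bool idle;		/* protected by demo_pool.lock */
	bool kicked;		/* protected by demo_worker.lock */
};

struct demo_pool {
	pthread_mutex_t lock;	/* stands in for pool->lock */
	int nr_pending;		/* queued items, protected by 'lock' */
	struct demo_worker *workers;
	size_t nr_workers;
};

/* Pick and claim the first idle worker. Caller must hold pool->lock. */
static struct demo_worker *demo_pick(struct demo_pool *pool)
{
	for (size_t i = 0; i < pool->nr_workers; i++) {
		if (pool->workers[i].idle) {
			pool->workers[i].idle = false;	/* claim it */
			return &pool->workers[i];
		}
	}
	return NULL;
}

/*
 * Enqueue one item. Selection happens under the pool lock; the wakeup,
 * which takes a second lock, is issued only after the pool lock is dropped,
 * so the two locks never nest.
 */
static struct demo_worker *demo_queue(struct demo_pool *pool)
{
	struct demo_worker *w;

	pthread_mutex_lock(&pool->lock);
	pool->nr_pending++;
	w = demo_pick(pool);
	pthread_mutex_unlock(&pool->lock);

	if (w) {
		pthread_mutex_lock(&w->lock);
		w->kicked = true;
		pthread_cond_signal(&w->wake);
		pthread_mutex_unlock(&w->lock);
	}
	return w;
}
```

Because the worker is claimed (marked non-idle) before the pool lock is
released, a second enqueuer cannot pick the same worker even though the
wakeup itself has not happened yet.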
Performance numbers (based on in-kernel workqueue microbenchmark)
VMs and arm64 (Grace) are where this series is meant to pay off -- waking
an idle CPU sitting in wfi costs an IPI (on arm64; VMs pay for a similar
type of operation), so doing it under pool->lock lengthens the critical
section.
Latest numbers (from v5) on a Grace arm64 host:
  affinity_scope    baseline     patched     tput      p95
                   (items/s)   (items/s)     gain     drop
  --------------   ---------   ---------   ------   ------
  cpu              3,580,440   3,486,014    -2.6%    +3.5%
  smt              3,545,763   3,512,633    -0.9%    +2.8%
  cache_shard      3,397,678   3,651,063    +7.5%    -4.2%
  cache              720,368     797,914   +10.8%    -9.8%
  numa               719,794     794,049   +10.3%   -10.3%
  system             721,058     798,010   +10.7%   -10.0%
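
The "tput gain" column appears to be the relative change of the patched
throughput over the baseline. A minimal sketch of that arithmetic follows;
pct_change() is a hypothetical helper, not part of the benchmark.

```c
/*
 * Relative change of 'patched' over 'baseline', in percent.
 * Positive means the patched kernel processed more items per second.
 */
static double pct_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}
```

For the "cache" row, pct_change(720368, 797914) is roughly 10.8, and for
the "cpu" row, pct_change(3580440, 3486014) is roughly -2.6, matching the
table.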
Signed-off-by: Breno Leitao <leitao@debian.org>
Changes in v5:
- Use wake_up_process() instead of the fancy wake_q_add(), as raised by
Tejun.
- Dropped the Reviewed-by from Sebastian, given the code changed.
- Link to v4: https://lore.kernel.org/r/20260624-fastwake-v4-0-7b6d7b494a44@debian.org
Changes in v4:
- replace raw_spin_unlock_wake() with a standard
raw_spin_unlock() + wake_up_q() (Sebastian Andrzej Siewior)
- Link to v3: https://lore.kernel.org/r/20260616-fastwake-v3-0-79da19fcd08f@debian.org
Changes in v3:
- Drop the "park kicked worker on pool->kicked_list" patch (v2 1/4).
* That fix is independent of this series; if we want to revamp it, it
can be sent separately.
- Link to v2: https://lore.kernel.org/r/20260603-fastwake-v2-0-2977512fe7fa@debian.org
Changes in v2:
- Close the idle_cull_fn() vs kicked-worker race by parking the kicked
worker on a new pool->kicked_list under pool->lock (new patch 1).
Reported by Hillf Danton.
- Use the wake_q machinery (wake_q_add() / wake_up_q() via
raw_spin_unlock_wake()) instead of plumbing a task_struct out of the
helper by hand. Suggested by Sebastian Andrzej Siewior.
- Link to v1: https://lore.kernel.org/r/20260526-fastwake-v1-0-e69ad86923e6@debian.org
---
Breno Leitao (3):
workqueue: split kick_pool() into kick_pool_pick()
workqueue: defer the worker wakeup outside pool->lock in __queue_work()
workqueue: defer the worker wakeup outside pool->lock in process_one_work()
kernel/workqueue.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 44 insertions(+), 7 deletions(-)
---
base-commit: 8d6dbbbe3ba62de0a63e962ee004afb848c8e3ac
change-id: 20260526-fastwake-02982fd66312
Best regards,
--
Breno Leitao <leitao@debian.org>
* [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick()
2026-06-26 9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
@ 2026-06-26 9:57 ` Breno Leitao
2026-06-26 9:57 ` [PATCH v5 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 9:57 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
Factor the worker selection out of kick_pool() into kick_pool_pick(),
which picks and claims the worker under pool->lock but, instead of waking
it, returns the worker's task via an out-param so the caller can issue
the wakeup after dropping pool->lock. BH kicks and wake_cpu setup still
happen under the lock.
kick_pool() becomes a thin wrapper that wakes the returned task, so all
existing callers keep waking under pool->lock.
Pure refactor, no functional change.
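
The shape of the split can be illustrated with a standalone mock. The
mock_* types and helpers are hypothetical stand-ins for struct worker_pool,
struct task_struct and wake_up_process(); only the out-param pattern mirrors
the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types. */
struct mock_task { int wakeups; };
struct mock_pool {
	struct mock_task *idle;	/* the single idle worker, or NULL */
	bool need_worker;	/* stands in for need_more_worker() */
};

/* Stands in for wake_up_process(): just counts the wakeups. */
static void mock_wake(struct mock_task *t)
{
	t->wakeups++;
}

/*
 * Select and claim a worker but do not wake it. *wakep is always written,
 * so the caller can test it unconditionally after dropping its lock.
 */
static bool mock_kick_pick(struct mock_pool *pool, struct mock_task **wakep)
{
	*wakep = NULL;

	if (!pool->need_worker || !pool->idle)
		return false;

	*wakep = pool->idle;
	pool->idle = NULL;	/* claimed: no longer idle */
	return true;
}

/* Thin wrapper that keeps the old "pick and wake in one call" behaviour. */
static bool mock_kick(struct mock_pool *pool)
{
	struct mock_task *t;
	bool kicked = mock_kick_pick(pool, &t);

	if (t)
		mock_wake(t);
	return kicked;
}
```

Callers that want the deferred wakeup call the _pick variant and wake the
returned task themselves; everyone else keeps calling the wrapper.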
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..e855d15c1fb7b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1258,19 +1258,27 @@ static void kick_bh_pool(struct worker_pool *pool)
}
/**
- * kick_pool - wake up an idle worker if necessary
+ * kick_pool_pick - select an idle worker to kick, deferring the wakeup
* @pool: pool to kick
+ * @wakep: out-param, set to the task to wake after pool->lock is dropped
*
- * @pool may have pending work items. Wake up worker if necessary. Returns
- * whether a worker was woken up.
+ * Like kick_pool() but, for a regular (non-BH) pool, returns the picked
+ * worker's task via @wakep instead of waking it, so the caller can issue the
+ * wakeup after dropping pool->lock (the wakeup takes rq->lock). Worker
+ * selection, wake_cpu setup and the BH kick still happen under the lock.
+ * Returns whether a worker was selected or kicked.
+ *
+ * Must be called with @pool->lock held.
*/
-static bool kick_pool(struct worker_pool *pool)
+static bool kick_pool_pick(struct worker_pool *pool, struct task_struct **wakep)
{
struct worker *worker = first_idle_worker(pool);
struct task_struct *p;
lockdep_assert_held(&pool->lock);
+ *wakep = NULL;
+
if (!need_more_worker(pool) || !worker)
return false;
@@ -1310,10 +1318,27 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
- wake_up_process(p);
+ *wakep = p;
return true;
}
+/**
+ * kick_pool - wake up an idle worker if necessary
+ * @pool: pool to kick
+ *
+ * @pool may have pending work items. Wake up worker if necessary. Returns
+ * whether a worker was woken up.
+ */
+static bool kick_pool(struct worker_pool *pool)
+{
+ struct task_struct *p;
+ bool kicked = kick_pool_pick(pool, &p);
+
+ if (p)
+ wake_up_process(p);
+ return kicked;
+}
+
#ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
/*
--
2.53.0-Meta
* [PATCH v5 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()
2026-06-26 9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-26 9:57 ` [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick() Breno Leitao
@ 2026-06-26 9:57 ` Breno Leitao
2026-06-26 9:57 ` [PATCH v5 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-29 18:24 ` [PATCH v5 0/3] workqueue: Shrink the lock time Tejun Heo
3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 9:57 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
__queue_work() is the enqueue hot path: it inserts the work item and
calls kick_pool() while holding pool->lock. kick_pool() ends in a
wakeup, which takes the target task's rq->lock, so rq->lock nests under
pool->lock on every enqueue that wakes a worker on a contended unbound
pool.
Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_process() right after dropping the lock.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e855d15c1fb7b..594592768ef10 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2302,6 +2302,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
{
struct pool_workqueue *pwq;
struct worker_pool *last_pool, *pool;
+ struct task_struct *wake_task = NULL;
unsigned int work_flags;
unsigned int req_cpu = cpu;
@@ -2424,7 +2425,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
trace_workqueue_activate_work(work);
insert_work(pwq, work, &pool->worklist, work_flags);
- kick_pool(pool);
+ kick_pool_pick(pool, &wake_task);
} else {
work_flags |= WORK_STRUCT_INACTIVE;
insert_work(pwq, work, &pwq->inactive_works, work_flags);
@@ -2432,6 +2433,8 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
out:
raw_spin_unlock(&pool->lock);
+ if (wake_task)
+ wake_up_process(wake_task);
rcu_read_unlock();
}
--
2.53.0-Meta
* [PATCH v5 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work()
2026-06-26 9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-26 9:57 ` [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick() Breno Leitao
2026-06-26 9:57 ` [PATCH v5 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
@ 2026-06-26 9:57 ` Breno Leitao
2026-06-29 18:24 ` [PATCH v5 0/3] workqueue: Shrink the lock time Tejun Heo
3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 9:57 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
Breno Leitao, kernel-team, kmagar, psuriset, david.dai
Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_process() after the lock is dropped.
Unlike __queue_work(), this path has no surrounding RCU section, so take
rcu_read_lock() before dropping pool->lock to keep the picked worker's
task_struct valid across the wakeup.
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Krishna Magar <kmagar@redhat.com>
---
kernel/workqueue.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 594592768ef10..640590d270ce5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3255,6 +3255,7 @@ __acquires(&pool->lock)
{
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
+ struct task_struct *wake_task = NULL;
unsigned long work_data;
int lockdep_start_depth, rcu_start_depth;
bool bh_draining = pool->flags & POOL_BH_DRAINING;
@@ -3308,8 +3309,11 @@ __acquires(&pool->lock)
* since nr_running would always be >= 1 at this point. This is used to
* chain execution of the pending work items for WORKER_NOT_RUNNING
* workers such as the UNBOUND and CPU_INTENSIVE ones.
+ *
+ * Select the worker under pool->lock; the wakeup is deferred until
+ * after the lock is dropped, guarded by the rcu_read_lock() below.
*/
- kick_pool(pool);
+ kick_pool_pick(pool, &wake_task);
/*
* Record the last pool and clear PENDING which should be the last
@@ -3320,7 +3324,12 @@ __acquires(&pool->lock)
set_work_pool_and_clear_pending(work, pool->id, pool_offq_flags(pool));
pwq->stats[PWQ_STAT_STARTED]++;
+
+ rcu_read_lock();
raw_spin_unlock_irq(&pool->lock);
+ if (wake_task)
+ wake_up_process(wake_task);
+ rcu_read_unlock();
rcu_start_depth = rcu_preempt_depth();
lockdep_start_depth = lockdep_depth(current);
--
2.53.0-Meta
* Re: [PATCH v5 0/3] workqueue: Shrink the lock time
2026-06-26 9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
` (2 preceding siblings ...)
2026-06-26 9:57 ` [PATCH v5 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
@ 2026-06-29 18:24 ` Tejun Heo
2026-06-29 19:31 ` Breno Leitao
3 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2026-06-29 18:24 UTC (permalink / raw)
To: Breno Leitao, Lai Jiangshan
Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
kernel-team, kmagar, psuriset, david.dai
Hello,
> Breno Leitao (3):
> workqueue: split kick_pool() into kick_pool_pick()
> workqueue: defer the worker wakeup outside pool->lock in __queue_work()
> workqueue: defer the worker wakeup outside pool->lock in process_one_work()
Applied 1-3 to wq/for-7.3.
Thanks.
--
tejun
* Re: [PATCH v5 0/3] workqueue: Shrink the lock time
2026-06-29 18:24 ` [PATCH v5 0/3] workqueue: Shrink the lock time Tejun Heo
@ 2026-06-29 19:31 ` Breno Leitao
2026-06-29 20:05 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-29 19:31 UTC (permalink / raw)
To: Tejun Heo
Cc: Lai Jiangshan, linux-kernel, marco.crivellari, frederic, bigeasy,
Hillf Danton, kernel-team, kmagar, psuriset, david.dai
On Mon, Jun 29, 2026 at 08:24:26AM -1000, Tejun Heo wrote:
> Hello,
>
> > Breno Leitao (3):
> > workqueue: split kick_pool() into kick_pool_pick()
> > workqueue: defer the worker wakeup outside pool->lock in __queue_work()
> > workqueue: defer the worker wakeup outside pool->lock in process_one_work()
>
> Applied 1-3 to wq/for-7.3.
Thanks Tejun,
If you don't have any objection, I will prepare and send the proper
patches for the RFC I've sent about the workqueue stalls dump.
https://lore.kernel.org/all/20260616-wq_dump_petr-v1-0-b57473ca6d18@debian.org/
* Re: [PATCH v5 0/3] workqueue: Shrink the lock time
2026-06-29 19:31 ` Breno Leitao
@ 2026-06-29 20:05 ` Tejun Heo
0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2026-06-29 20:05 UTC (permalink / raw)
To: Breno Leitao
Cc: Lai Jiangshan, linux-kernel, marco.crivellari, frederic, bigeasy,
Hillf Danton, kernel-team, kmagar, psuriset, david.dai
Hello, Breno.
Please go ahead and send the non-RFC version.
Thanks.