* [PATCH v5 0/3] workqueue: Shrink the lock time
@ 2026-06-26  9:57 Breno Leitao
  2026-06-26  9:57 ` [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick() Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26  9:57 UTC
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

The goal of this patchset is to decrease the time spent under the
workqueue pool->lock.

Currently the worker process is woken up inside pool->lock. The wakeup
ends in wake_up_process(), which takes the target task's rq->lock, so
rq->lock nests under pool->lock on the two hottest paths of a contended
unbound workqueue (__queue_work() enqueue and process_one_work() chain
kick). On some architectures the wakeup is even more expensive: on
arm64 waking a CPU that is idle (in wfi) issues an IPI.

Doing all of that while holding pool->lock lengthens the locked region
and hurts throughput on contended unbound pools.

This series shortens the locked region by selecting and claiming the
worker to wake under pool->lock, but issuing the actual wakeup with
wake_up_process() after the lock is dropped.

Because the win is a shorter pool->lock hold time, it shows up most
clearly as lower enqueue latency under contention.
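
As a rough user-space illustration of the ordering described above (pick
under the lock, wake after the unlock), here is a minimal sketch. Every
name in it (toy_pool, toy_task, toy_pick(), toy_queue(), toy_wake()) is
hypothetical and is not kernel code: the lock is a C11 atomic_flag
spinlock, "claiming" is modelled by clearing the single idle slot, and
toy_wake() merely records whether the lock was held when it ran.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not the kernel's types. */
struct toy_task {
	bool woken;             /* set by toy_wake() */
	bool woken_under_lock;  /* was the pool lock held during the wakeup? */
};

struct toy_pool {
	atomic_flag lock;       /* stand-in for pool->lock */
	struct toy_task *idle;  /* a single idle worker, for brevity */
	int nr_pending;         /* queued work items */
};

static void toy_lock(struct toy_pool *pool)
{
	while (atomic_flag_test_and_set_explicit(&pool->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_pool *pool)
{
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/* Stand-in for wake_up_process(): probes the lock and records its state. */
static void toy_wake(struct toy_pool *pool, struct toy_task *t)
{
	bool held;

	assert(t != NULL);
	held = atomic_flag_test_and_set(&pool->lock);
	if (!held)
		atomic_flag_clear(&pool->lock); /* the probe took it; release */
	t->woken_under_lock = held;
	t->woken = true;
}

/* Pick and claim the idle worker. The caller must hold the lock. */
static struct toy_task *toy_pick(struct toy_pool *pool)
{
	struct toy_task *t = pool->idle;

	if (!pool->nr_pending || !t)
		return NULL;
	pool->idle = NULL; /* claimed: a concurrent picker cannot select it */
	return t;
}

/* Enqueue one item; the wakeup is issued only after the unlock. */
static void toy_queue(struct toy_pool *pool)
{
	struct toy_task *wake_task;

	toy_lock(pool);
	pool->nr_pending++;
	wake_task = toy_pick(pool);
	toy_unlock(pool);

	if (wake_task)
		toy_wake(pool, wake_task);
}

/* Returns 1 if the worker was claimed and woken with the lock released. */
static int toy_demo(void)
{
	struct toy_task t = { false, false };
	struct toy_pool pool = { ATOMIC_FLAG_INIT, &t, 0 };

	toy_queue(&pool);
	return t.woken && !t.woken_under_lock && pool.idle == NULL;
}
```

The point of the sketch is only the shape of toy_queue(): the decision
about whom to wake is made while the lock is held, and the potentially
expensive wakeup is paid for after the lock is released.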

Performance numbers (based on the in-kernel workqueue microbenchmark)

VMs and arm64 (Grace) are where this series is meant to pay off: waking
an idle CPU sitting in wfi costs an IPI on arm64 (VMs pay for a similar
kind of operation), so doing it under pool->lock lengthens the critical
section.

Latest numbers (from v5) on a Grace arm64 host:

      affinity_scope    baseline    patched    tput     p95
                       (items/s)  (items/s)    gain  change
      --------------   ---------  ---------  ------  ------
      cpu              3,580,440  3,486,014   -2.6%   +3.5%
      smt              3,545,763  3,512,633   -0.9%   +2.8%
      cache_shard      3,397,678  3,651,063   +7.5%   -4.2%
      cache              720,368    797,914  +10.8%   -9.8%
      numa               719,794    794,049  +10.3%  -10.3%
      system             721,058    798,010  +10.7%  -10.0%
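
The throughput column appears to be the plain relative change,
(patched - baseline) / baseline. That formula is an assumption, since it
is not stated above, but it reproduces every row of the table. A small
sketch of the arithmetic, using a hypothetical helper gain_tenths() that
works in tenths of a percent to avoid floating point:

```c
#include <assert.h>

/*
 * Relative change from baseline to patched, in tenths of a percent,
 * rounded to nearest (e.g. 108 means +10.8%, -26 means -2.6%).
 * Hypothetical helper, not part of the benchmark.
 */
static long gain_tenths(long baseline, long patched)
{
	long long num, half;

	assert(baseline > 0);
	num = 1000LL * (patched - baseline);
	half = baseline / 2;

	/* Add or subtract half the divisor so truncation rounds to nearest. */
	return (long)((num >= 0 ? num + half : num - half) / baseline);
}
```

For the "cache" row, gain_tenths(720368, 797914) gives 108, i.e. +10.8%,
and for the "cpu" row, gain_tenths(3580440, 3486014) gives -26, i.e.
-2.6%.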

Signed-off-by: Breno Leitao <leitao@debian.org>

Changes in v5:
- Use wake_up_process() instead of the fancy wake_q_add(), as raised by
  Tejun.
- Dropped the Reviewed-by from Sebastian, given the code changed.
- Link to v4: https://lore.kernel.org/r/20260624-fastwake-v4-0-7b6d7b494a44@debian.org

Changes in v4:
- Replace raw_spin_unlock_wake() with a standard
  raw_spin_unlock() + wake_up_q() (Sebastian Andrzej Siewior)
- Link to v3: https://lore.kernel.org/r/20260616-fastwake-v3-0-79da19fcd08f@debian.org

Changes in v3:
- Drop the "park kicked worker on pool->kicked_list" patch (v2 1/4).
  * That fix is independent of this series; if we want to revamp it,
    it can be sent separately.
- Link to v2: https://lore.kernel.org/r/20260603-fastwake-v2-0-2977512fe7fa@debian.org

Changes in v2:
- Close the idle_cull_fn() vs kicked-worker race by parking the kicked
  worker on a new pool->kicked_list under pool->lock (new patch 1).
  Reported by Hillf Danton.
- Use the wake_q machinery (wake_q_add() / wake_up_q() via
  raw_spin_unlock_wake()) instead of plumbing a task_struct out of the
  helper by hand. Suggested by Sebastian Andrzej Siewior.
- Link to v1: https://lore.kernel.org/r/20260526-fastwake-v1-0-e69ad86923e6@debian.org

---
Breno Leitao (3):
      workqueue: split kick_pool() into kick_pool_pick()
      workqueue: defer the worker wakeup outside pool->lock in __queue_work()
      workqueue: defer the worker wakeup outside pool->lock in process_one_work()

 kernel/workqueue.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)
---
base-commit: 8d6dbbbe3ba62de0a63e962ee004afb848c8e3ac
change-id: 20260526-fastwake-02982fd66312

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick()
  2026-06-26  9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
@ 2026-06-26  9:57 ` Breno Leitao
  2026-06-26  9:57 ` [PATCH v5 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26  9:57 UTC
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

Factor the worker selection out of kick_pool() into kick_pool_pick(),
which picks and claims the worker under pool->lock but, instead of waking
it, returns the worker's task via an out-param so the caller can issue
the wakeup after dropping pool->lock. BH kicks and wake_cpu setup still
happen under the lock.

kick_pool() becomes a thin wrapper that wakes the returned task, so all
existing callers keep waking under pool->lock.

Pure refactor, no functional change.
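
For readers unfamiliar with the pattern, the split can be sketched in
plain C as below. This is a toy model under stated assumptions, not the
kernel code: toy_pool, toy_task, the is_bh flag and the counters are all
hypothetical, and the wakeup is replaced by a counter increment. It shows
why the wrapper tests the returned task rather than the boolean: in this
model a BH pool reports "kicked" while leaving nothing for the caller to
wake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not the kernel's types. */
struct toy_task {
	int wakeups;            /* how many times this task was "woken" */
};

struct toy_pool {
	bool is_bh;             /* toy flag: kicked in place, no task to wake */
	int bh_kicks;           /* counts in-place BH kicks */
	int nr_pending;         /* queued work items */
	struct toy_task *idle;  /* a single idle worker, for brevity */
};

/*
 * Select a task to wake without waking it. Returns true if something was
 * selected or kicked; *wakep is non-NULL only when the caller still owes
 * a wakeup.
 */
static bool toy_kick_pick(struct toy_pool *pool, struct toy_task **wakep)
{
	*wakep = NULL;          /* callers can test *wakep unconditionally */

	if (!pool->nr_pending || !pool->idle)
		return false;

	if (pool->is_bh) {
		pool->bh_kicks++;   /* handled here, nothing is deferred */
		return true;
	}

	*wakep = pool->idle;
	return true;
}

/* Thin wrapper that keeps the old "pick and wake in one call" behaviour. */
static bool toy_kick(struct toy_pool *pool)
{
	struct toy_task *p;
	bool kicked = toy_kick_pick(pool, &p);

	if (p)                  /* test p, not kicked: BH returns true, p == NULL */
		p->wakeups++;       /* stand-in for wake_up_process(p) */
	return kicked;
}

/* Returns the number of wakeups delivered by a single toy_kick() call. */
static int toy_demo(bool is_bh)
{
	struct toy_task t = { 0 };
	struct toy_pool pool = { is_bh, 0, 1, &t };
	bool kicked = toy_kick(&pool);

	assert(kicked);
	return t.wakeups;
}
```

In this model toy_demo(false) delivers one wakeup and toy_demo(true)
delivers none, while both report a successful kick.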

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..e855d15c1fb7b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1258,19 +1258,27 @@ static void kick_bh_pool(struct worker_pool *pool)
 }
 
 /**
- * kick_pool - wake up an idle worker if necessary
+ * kick_pool_pick - select an idle worker to kick, deferring the wakeup
  * @pool: pool to kick
+ * @wakep: out-param, set to the task to wake after pool->lock is dropped
  *
- * @pool may have pending work items. Wake up worker if necessary. Returns
- * whether a worker was woken up.
+ * Like kick_pool() but, for a regular (non-BH) pool, returns the picked
+ * worker's task via @wakep instead of waking it, so the caller can issue the
+ * wakeup after dropping pool->lock (the wakeup takes rq->lock). Worker
+ * selection, wake_cpu setup and the BH kick still happen under the lock.
+ * Returns whether a worker was selected or kicked.
+ *
+ * Must be called with @pool->lock held.
  */
-static bool kick_pool(struct worker_pool *pool)
+static bool kick_pool_pick(struct worker_pool *pool, struct task_struct **wakep)
 {
 	struct worker *worker = first_idle_worker(pool);
 	struct task_struct *p;
 
 	lockdep_assert_held(&pool->lock);
 
+	*wakep = NULL;
+
 	if (!need_more_worker(pool) || !worker)
 		return false;
 
@@ -1310,10 +1318,27 @@ static bool kick_pool(struct worker_pool *pool)
 		}
 	}
 #endif
-	wake_up_process(p);
+	*wakep = p;
 	return true;
 }
 
+/**
+ * kick_pool - wake up an idle worker if necessary
+ * @pool: pool to kick
+ *
+ * @pool may have pending work items. Wake up worker if necessary. Returns
+ * whether a worker was woken up.
+ */
+static bool kick_pool(struct worker_pool *pool)
+{
+	struct task_struct *p;
+	bool kicked = kick_pool_pick(pool, &p);
+
+	if (p)
+		wake_up_process(p);
+	return kicked;
+}
+
 #ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
 
 /*

-- 
2.53.0-Meta



* [PATCH v5 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()
  2026-06-26  9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
  2026-06-26  9:57 ` [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick() Breno Leitao
@ 2026-06-26  9:57 ` Breno Leitao
  2026-06-26  9:57 ` [PATCH v5 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
  2026-06-29 18:24 ` [PATCH v5 0/3] workqueue: Shrink the lock time Tejun Heo
  3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26  9:57 UTC
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

__queue_work() is the enqueue hot path: it inserts the work item and
calls kick_pool() while holding pool->lock. kick_pool() ends in a
wakeup, which takes the target task's rq->lock, so rq->lock nests under
pool->lock on every enqueue that wakes a worker on a contended unbound
pool.

Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_process() right after dropping the lock.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e855d15c1fb7b..594592768ef10 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2302,6 +2302,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 {
 	struct pool_workqueue *pwq;
 	struct worker_pool *last_pool, *pool;
+	struct task_struct *wake_task = NULL;
 	unsigned int work_flags;
 	unsigned int req_cpu = cpu;
 
@@ -2424,7 +2425,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 		trace_workqueue_activate_work(work);
 		insert_work(pwq, work, &pool->worklist, work_flags);
-		kick_pool(pool);
+		kick_pool_pick(pool, &wake_task);
 	} else {
 		work_flags |= WORK_STRUCT_INACTIVE;
 		insert_work(pwq, work, &pwq->inactive_works, work_flags);
@@ -2432,6 +2433,8 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 out:
 	raw_spin_unlock(&pool->lock);
+	if (wake_task)
+		wake_up_process(wake_task);
 	rcu_read_unlock();
 }
 

-- 
2.53.0-Meta



* [PATCH v5 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work()
  2026-06-26  9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
  2026-06-26  9:57 ` [PATCH v5 1/3] workqueue: split kick_pool() into kick_pool_pick() Breno Leitao
  2026-06-26  9:57 ` [PATCH v5 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
@ 2026-06-26  9:57 ` Breno Leitao
  2026-06-29 18:24 ` [PATCH v5 0/3] workqueue: Shrink the lock time Tejun Heo
  3 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26  9:57 UTC
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	Breno Leitao, kernel-team, kmagar, psuriset, david.dai

Use kick_pool_pick() to select and claim the worker under pool->lock and
issue the wakeup with wake_up_process() after the lock is dropped.

Unlike __queue_work(), this path has no surrounding RCU section, so take
rcu_read_lock() before dropping pool->lock to keep the picked worker's
task_struct valid across the wakeup.

Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Krishna Magar <kmagar@redhat.com>
---
 kernel/workqueue.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 594592768ef10..640590d270ce5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3255,6 +3255,7 @@ __acquires(&pool->lock)
 {
 	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct worker_pool *pool = worker->pool;
+	struct task_struct *wake_task = NULL;
 	unsigned long work_data;
 	int lockdep_start_depth, rcu_start_depth;
 	bool bh_draining = pool->flags & POOL_BH_DRAINING;
@@ -3308,8 +3309,11 @@ __acquires(&pool->lock)
 	 * since nr_running would always be >= 1 at this point. This is used to
 	 * chain execution of the pending work items for WORKER_NOT_RUNNING
 	 * workers such as the UNBOUND and CPU_INTENSIVE ones.
+	 *
+	 * Select the worker under pool->lock; the wakeup is deferred until
+	 * after the lock is dropped, guarded by the rcu_read_lock() below.
 	 */
-	kick_pool(pool);
+	kick_pool_pick(pool, &wake_task);
 
 	/*
 	 * Record the last pool and clear PENDING which should be the last
@@ -3320,7 +3324,12 @@ __acquires(&pool->lock)
 	set_work_pool_and_clear_pending(work, pool->id, pool_offq_flags(pool));
 
 	pwq->stats[PWQ_STAT_STARTED]++;
+
+	rcu_read_lock();
 	raw_spin_unlock_irq(&pool->lock);
+	if (wake_task)
+		wake_up_process(wake_task);
+	rcu_read_unlock();
 
 	rcu_start_depth = rcu_preempt_depth();
 	lockdep_start_depth = lockdep_depth(current);

-- 
2.53.0-Meta



* Re: [PATCH v5 0/3] workqueue: Shrink the lock time
  2026-06-26  9:57 [PATCH v5 0/3] workqueue: Shrink the lock time Breno Leitao
                   ` (2 preceding siblings ...)
  2026-06-26  9:57 ` [PATCH v5 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
@ 2026-06-29 18:24 ` Tejun Heo
  2026-06-29 19:31   ` Breno Leitao
  3 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2026-06-29 18:24 UTC
  To: Breno Leitao, Lai Jiangshan
  Cc: linux-kernel, marco.crivellari, frederic, bigeasy, Hillf Danton,
	kernel-team, kmagar, psuriset, david.dai

Hello,

> Breno Leitao (3):
>       workqueue: split kick_pool() into kick_pool_pick()
>       workqueue: defer the worker wakeup outside pool->lock in __queue_work()
>       workqueue: defer the worker wakeup outside pool->lock in process_one_work()

Applied 1-3 to wq/for-7.3.

Thanks.
-- 
tejun


* Re: [PATCH v5 0/3] workqueue: Shrink the lock time
  2026-06-29 18:24 ` [PATCH v5 0/3] workqueue: Shrink the lock time Tejun Heo
@ 2026-06-29 19:31   ` Breno Leitao
  2026-06-29 20:05     ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-29 19:31 UTC
  To: Tejun Heo
  Cc: Lai Jiangshan, linux-kernel, marco.crivellari, frederic, bigeasy,
	Hillf Danton, kernel-team, kmagar, psuriset, david.dai

On Mon, Jun 29, 2026 at 08:24:26AM -1000, Tejun Heo wrote:
> Hello,
> 
> > Breno Leitao (3):
> >       workqueue: split kick_pool() into kick_pool_pick()
> >       workqueue: defer the worker wakeup outside pool->lock in __queue_work()
> >       workqueue: defer the worker wakeup outside pool->lock in process_one_work()
> 
> Applied 1-3 to wq/for-7.3.

Thanks Tejun,

If you don't have any objection, I will prepare and send the proper
patches for the RFC I sent about the workqueue stalls dump.

https://lore.kernel.org/all/20260616-wq_dump_petr-v1-0-b57473ca6d18@debian.org/


* Re: [PATCH v5 0/3] workqueue: Shrink the lock time
  2026-06-29 19:31   ` Breno Leitao
@ 2026-06-29 20:05     ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2026-06-29 20:05 UTC
  To: Breno Leitao
  Cc: Lai Jiangshan, linux-kernel, marco.crivellari, frederic, bigeasy,
	Hillf Danton, kernel-team, kmagar, psuriset, david.dai

Hello, Breno.

Please go ahead and send the non-RFC version.

Thanks.

