From: Breno Leitao <leitao@debian.org>
To: Hillf Danton <hdanton@sina.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
linux-kernel@vger.kernel.org, marco.crivellari@suse.com,
frederic@kernel.org, bigeasy@linutronix.de
Subject: Re: [PATCH 2/2] workqueue: defer wake_up_process() outside pool->lock on hot paths
Date: Wed, 27 May 2026 07:51:17 -0700 [thread overview]
Message-ID: <ahcEcR6ApWr7E4ID@gmail.com> (raw)
In-Reply-To: <aha6ENW-K-4F4AHi@gmail.com>
On Wed, May 27, 2026 at 10:48:12AM +0100, Breno Leitao wrote:
> I am still unclear how to fix it, but I am thinking about it.
I would say the best approach is to take the worker from the idle list
before waking it up.
worker_leave_idle(worker);
wake_up_process(p);
Not later when the worker is scheduled (worker_thread()).
commit f0a524a156b7563b9b37242aa42c822f7a8256e2
Author: Breno Leitao <leitao@debian.org>
Date: Wed May 27 07:39:37 2026 -0700
workqueue: claim picked idle worker under pool->lock in kick_pool()
kick_pool() picks an idle worker via first_idle_worker(), optionally
adjusts the wake_cpu and repatriates it, then wakes it via
wake_up_process() -- but it does not remove the picked worker from
pool->idle_list and does not clear WORKER_IDLE. Those transitions
happen later in worker_thread:woke_up: when the woken kthread
reacquires pool->lock and calls worker_leave_idle().
idle_cull_fn() walks pool->idle_list from the tail and calls
set_worker_dying() on entries whose last_active + IDLE_WORKER_TIMEOUT
has elapsed. set_worker_dying() only checks WORKER_IDLE; it does
not inspect task->__state. A worker that has been kicked but has
not yet rerun worker_leave_idle() therefore remains a valid cull
victim. If cull races in between the kick and the wakee reacquiring
pool->lock, the worker gets WORKER_DIE set and is reaped by
kthread_stop_put(). On wakeup it sees WORKER_DIE in worker_thread()
and returns 0 without consuming pool->worklist, stranding the
just-enqueued work item until the next pool event kicks somebody
else.
CPU A (kicker) CPU B (cull)
---------------- ----------------
raw_spin_lock(&pool->lock)
pick worker W (idle_list head)
wake_up_process(W->task)
raw_spin_unlock(&pool->lock)
raw_spin_lock(&pool->lock)
too_many_workers() == true
W is list_last_entry(idle_list)
set_worker_dying(W, cull_list)
raw_spin_unlock(&pool->lock)
reap_dying_workers()
kthread_stop_put(W->task)
(W finally schedules in)
worker_thread():
raw_spin_lock(&pool->lock)
WORKER_DIE set -> return 0
Today the window is bounded by scheduler latency between the kicker
dropping pool->lock and the wakee reacquiring it, plus the 5 minute
IDLE_WORKER_TIMEOUT precondition. Deferring wake_up_process()
outside pool->lock on the hot enqueue / process_one_work() paths
widens it considerably (IRQs re-enabled, fully preemptible).
Close the window by claiming the picked worker under pool->lock:
call worker_leave_idle() in kick_pool() before wake_up_process().
That removes the worker from pool->idle_list and clears WORKER_IDLE
while we still hold the lock, so set_worker_dying()'s WORKER_IDLE
precondition is no longer satisfied and idle_cull_fn()'s tail walk
cannot see the worker. Gate the existing worker_leave_idle() call
in worker_thread:woke_up: behind WORKER_IDLE, so first-time wakeups
from create_worker() (which do not go through kick_pool()) still
transition correctly.
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8df671066dd1..321df19f9fd8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1310,6 +1310,18 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
+ /*
+ * Claim @worker under pool->lock so it cannot race idle_cull_fn():
+ * once @worker is off pool->idle_list and no longer WORKER_IDLE,
+ * set_worker_dying() will skip it and the cull walk cannot reach
+ * it. Otherwise, the woken worker remains WORKER_IDLE on
+ * pool->idle_list until it actually schedules in and runs
+ * worker_leave_idle() in worker_thread:woke_up:, leaving a window
+ * where a concurrent idle_cull_fn() can flag it WORKER_DIE and
+ * kthread_stop_put() it before it consumes pool->worklist,
+ * stranding the just-enqueued work.
+ */
+ worker_leave_idle(worker);
wake_up_process(p);
return true;
}
@@ -3447,7 +3459,13 @@ static int worker_thread(void *__worker)
return 0;
}
- worker_leave_idle(worker);
+ /*
+ * Kicked workers have already been removed from pool->idle_list
+ * by kick_pool(); only first-time wakeups (via create_worker())
+ * still arrive with WORKER_IDLE set.
+ */
+ if (worker->flags & WORKER_IDLE)
+ worker_leave_idle(worker);
recheck:
/* no more worker necessary? */
if (!need_more_worker(pool))
next prev parent reply other threads:[~2026-05-27 14:51 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 18:08 [PATCH 0/2] workqueue: Shrink the lock time Breno Leitao
2026-05-26 18:08 ` [PATCH 1/2] workqueue: split kick_pool() into kick_pool_pick() + wake_up_process() Breno Leitao
2026-05-26 18:08 ` [PATCH 2/2] workqueue: defer wake_up_process() outside pool->lock on hot paths Breno Leitao
2026-05-26 21:23 ` Hillf Danton
2026-05-27 9:48 ` Breno Leitao
2026-05-27 14:51 ` Breno Leitao [this message]
2026-05-27 15:35 ` Sebastian Andrzej Siewior
2026-05-28 14:35 ` Breno Leitao
2026-06-01 17:26 ` Breno Leitao
2026-05-27 15:22 ` Sebastian Andrzej Siewior
2026-05-28 13:41 ` Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahcEcR6ApWr7E4ID@gmail.com \
--to=leitao@debian.org \
--cc=bigeasy@linutronix.de \
--cc=frederic@kernel.org \
--cc=hdanton@sina.com \
--cc=jiangshanlai@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marco.crivellari@suse.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.