From: Breno Leitao <leitao@debian.org>
To: Hillf Danton <hdanton@sina.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
linux-kernel@vger.kernel.org, marco.crivellari@suse.com,
frederic@kernel.org, bigeasy@linutronix.de
Subject: Re: [PATCH 2/2] workqueue: defer wake_up_process() outside pool->lock on hot paths
Date: Wed, 27 May 2026 10:48:12 +0100 [thread overview]
Message-ID: <aha6ENW-K-4F4AHi@gmail.com> (raw)
In-Reply-To: <20260526212346.1100-1-hdanton@sina.com>
On Wed, May 27, 2026 at 05:23:40AM +0000, Hillf Danton wrote:
> On Tue, 26 May 2026 14:08:06 -0400 Breno Leitao wrote:
> > @@ -2423,6 +2424,15 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
> >
> > out:
> > raw_spin_unlock(&pool->lock);
> > + /*
> > + * Issue the wakeup after dropping pool->lock to shorten the
> > + * locked region on this hot enqueue path. kick_pool_pick() did all
> > + * of the work that required the lock (worker selection and
> > + * wake_cpu setup); the wake_up_process() itself only needs to
> > + * take the target rq->lock.
> > + */
> > + if (wake_p)
> > + wake_up_process(wake_p);
> > rcu_read_unlock();
> > }
> >
> I suspect this works without the pool lock held, go and see workers culled
> in idle_cull_fn() for example.
Ack, you're right that wake_up_process() itself works without pool->lock
held — idle_cull_fn() is a good precedent. But sashiko's point made me
look again, and I think there's a latent bug that should be fixed before
(or as part of) this patch.
When kick_pool() picks a worker and wakes it, the worker is flipped to
TASK_RUNNING by the scheduler, but from the workqueue's point of view it
is still WORKER_IDLE and still on pool->idle_list. The transition off
the idle list only happens later, when >the woken kthread actually
schedules in, reacquires pool->lock, and runs worker_leave_idle().
That matters because idle_cull_fn() never looks at the task's scheduler
state. It doesn't read task->__state or anything equivalent — its "is
this worker idle?" check is purely "is it on pool->idle_list, and is
last_active + IDLE_WORKER_TIMEOUT in the past?". So a worker that has
been woken but hasn't yet run worker_leave_idle() is, as far as cull is
concerned, a perfectly valid victim.
The race is then:
CPU A (kicker) CPU B (cull)
---------------- ----------------
raw_spin_lock(&pool->lock)
pick worker W (idle_list head)
wake_up_process(W->task)
raw_spin_unlock(&pool->lock)
raw_spin_lock(&pool->lock)
too_many_workers() == true
W is list_last_entry(idle_list)
set_worker_dying(W, cull_list)
raw_spin_unlock(&pool->lock)
reap_dying_workers()
kthread_stop_put(W->task)
(W finally schedules in)
worker_thread():
raw_spin_lock(&pool->lock)
WORKER_DIE set -> return 0
W exits without ever running worker_leave_idle() or touching
pool->worklist, and the work item we just enqueued is stranded until the
next pool event kicks somebody else.
This window is narrow in mainline today (bounded by scheduler latency
between the kicker dropping pool->lock and the wakee reacquiring it,
plus the 5 s IDLE_WORKER_TIMEOUT precondition), but my patch widens it
considerably — process_one_work()'s deferred-wake site re-enables IRQs
and is fully preemptible across the gap.
I am still unclear how to fix it, but I am thinking about it.
--
pw-bot: cr
next prev parent reply other threads:[~2026-05-27 9:48 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 18:08 [PATCH 0/2] workqueue: Shrink the lock time Breno Leitao
2026-05-26 18:08 ` [PATCH 1/2] workqueue: split kick_pool() into kick_pool_pick() + wake_up_process() Breno Leitao
2026-05-26 18:08 ` [PATCH 2/2] workqueue: defer wake_up_process() outside pool->lock on hot paths Breno Leitao
2026-05-26 21:23 ` Hillf Danton
2026-05-27 9:48 ` Breno Leitao [this message]
2026-05-27 14:51 ` Breno Leitao
2026-05-27 15:35 ` Sebastian Andrzej Siewior
2026-05-28 14:35 ` Breno Leitao
2026-06-01 17:26 ` Breno Leitao
2026-05-27 15:22 ` Sebastian Andrzej Siewior
2026-05-28 13:41 ` Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aha6ENW-K-4F4AHi@gmail.com \
--to=leitao@debian.org \
--cc=bigeasy@linutronix.de \
--cc=frederic@kernel.org \
--cc=hdanton@sina.com \
--cc=jiangshanlai@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marco.crivellari@suse.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.