From: Breno Leitao <leitao@debian.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
linux-kernel@vger.kernel.org, marco.crivellari@suse.com,
frederic@kernel.org, Hillf Danton <hdanton@sina.com>,
kernel-team@meta.com, kmagar@redhat.com, psuriset@redhat.com
Subject: Re: [PATCH v3 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()
Date: Wed, 24 Jun 2026 04:19:02 -0700 [thread overview]
Message-ID: <aju8GR04O-haKOTE@gmail.com> (raw)
In-Reply-To: <20260624084756.X2i4QiPm@linutronix.de>
Hello Sebastian,
On Wed, Jun 24, 2026 at 10:47:56AM +0200, Sebastian Andrzej Siewior wrote:
> On 2026-06-16 06:33:32 [-0700], Breno Leitao wrote:
> > __queue_work() is the enqueue hot path: it inserts the work item and
> > calls kick_pool() while holding pool->lock. kick_pool() ends in a
> > wakeup, which takes the target task's rq->lock, so rq->lock nests under
> > pool->lock on every enqueue that wakes a worker on a contended unbound
> > pool.
> >
> > Use kick_pool_pick() to select and claim the worker under pool->lock,
> > queue it on an on-stack wake_q, and issue the wakeup with wake_up_q()
> > right after dropping the lock via raw_spin_unlock_wake(). Worker
> > selection, wake_cpu setup and claiming the worker off pool->idle_list
> > still happen under the lock; only the rq->lock acquisition moves out.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> …
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> …
> > out:
> > - raw_spin_unlock(&pool->lock);
> > + raw_spin_unlock_wake(&pool->lock, &wakeq);
> > rcu_read_unlock();
> > }
>
> It is not wrong but I am not sure if this is really needed here. The
> pattern
> preempt_disable();
> raw_spin_unlock();
> wake_up_q()
> preempt_enable();
>
> is used to prevent task preemption after the unlock operation. The
> futex/ locking code needs to wake a task but before the unlock operation
> the task priority might have been lowered as result of dropping the
> lock. This means it might not be the task with the highest priority in
> the system and the task with highest is not yet active and we schedule a
> task in middle in the instead.
> To form this easier: say we have Task A prio=1, B prio=2 and C prio=3
> with higher number higher priority.
> A owns a lock and is preempted by B. C gets on the CPU preempts B. C
> wants A's lock so it passes its priority to A (PI-boost) and goes idle.
> A gets on the CPU and unlocks.
> Now: As part of the unlock operation (before the raw_spin_unlock()) A
> goes back to its initial priority and C is not yet woken up meaning B is
> the task with highest priority and the held raw_spinlock_t is the only
> thing preventing scheduling. So we disable preemption, unlock and then
> perform the wake-up so C becomes the next candidate (as it should be) to
> occupy the CPU.
> Otherwise it would be B which means per definition a task with lower
> priority runs before a task with higher priority.
>
> I don't think workqueue has this requirements here. Worst case something
> else gets on the CPU and worker wakeup is delayed until the task is
> scheduled again. It could be fine since the preemption could happen
> before queue-work().
> It probably does not lead to a huge performance
> regression but who knows.
>
> Your goal was to lower the contention on the pool lock during the wake
> up which you achieved. The __queue_work() remains still not preemptible
> until after the wake up which might be fine.
You're absolutely right — the preemption dance isn't needed here.
I used raw_spin_unlock_wake() because the helper seemed to match what I
was doing, but this preempt_disable/enable pattern (while cheap) is
unnecessary for this use case.
I'll update the series to use the standard approach:
raw_spin_unlock_irq(&pool->lock);
wake_up_q(&wakeq);
Thanks,
--breno
next prev parent reply other threads:[~2026-06-24 11:19 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 13:33 [PATCH v3 0/3] workqueue: Shrink the lock time Breno Leitao
2026-06-16 13:33 ` [PATCH v3 1/3] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-16 13:33 ` [PATCH v3 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
2026-06-24 8:47 ` Sebastian Andrzej Siewior
2026-06-24 11:19 ` Breno Leitao [this message]
2026-06-16 13:33 ` [PATCH v3 3/3] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-24 8:54 ` [PATCH v3 0/3] workqueue: Shrink the lock time Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aju8GR04O-haKOTE@gmail.com \
--to=leitao@debian.org \
--cc=bigeasy@linutronix.de \
--cc=frederic@kernel.org \
--cc=hdanton@sina.com \
--cc=jiangshanlai@gmail.com \
--cc=kernel-team@meta.com \
--cc=kmagar@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marco.crivellari@suse.com \
--cc=psuriset@redhat.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox