From: Matthew Brost <matthew.brost@intel.com>
To: Waiman Long <longman@redhat.com>
Cc: <intel-xe@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
Carlos Santa <carlos.santa@intel.com>,
"Ryan Neph" <ryanneph@google.com>, <stable@vger.kernel.org>,
Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>
Subject: Re: [PATCH v2] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works
Date: Wed, 1 Apr 2026 08:40:01 -0700
Message-ID: <ac08UdszEeEI2iJj@gsse-cloud1.jf.intel.com>
In-Reply-To: <8eaf9c5e-70fc-4d68-a919-df371bb38283@redhat.com>
On Wed, Apr 01, 2026 at 10:44:55AM -0400, Waiman Long wrote:
> On 3/31/26 9:07 PM, Matthew Brost wrote:
> > In unplug_oldest_pwq(), the first inactive work item on the
> > pool_workqueue is activated correctly. However, if multiple inactive
> > works exist on the same pool_workqueue, subsequent works fail to
> > activate because wq_node_nr_active.pending_pwqs is empty — the list
> > insertion is skipped when the pool_workqueue is plugged.
> >
> > Fix this by checking for additional inactive works in
> > unplug_oldest_pwq() and updating wq_node_nr_active.pending_pwqs
> > accordingly.
> >
> > v2:
> > - Use pwq_activate_first_inactive(pwq, false) rather than open coding
> > list operations (Tejun)
> >
> > Cc: Carlos Santa <carlos.santa@intel.com>
> > Cc: Ryan Neph <ryanneph@google.com>
> > Cc: stable@vger.kernel.org
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: linux-kernel@vger.kernel.org
> > Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >
> > ---
> >
> > This bug was first reported by Google, where the Xe driver appeared to
> > hang due to a fencing signal not completing. We traced the issue to work
> > items not being scheduled, and it can be trivially reproduced on drm-tip
> > with the following commands:
> >
> > shell0:
> > for i in {1..100}; do echo "Run $i"; xe_exec_threads --r \
> > threads-rebind-bindexecqueue; done
> >
> > shell1:
> > for i in {1..1000}; do echo "toggle $i"; echo f > \
> > /sys/devices/virtual/workqueue/cpumask; echo ff > \
> > /sys/devices/virtual/workqueue/cpumask; echo fff > \
> > /sys/devices/virtual/workqueue/cpumask ; echo ffff > \
> > /sys/devices/virtual/workqueue/cpumask; sleep .1; done
> > ---
> > kernel/workqueue.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index b77119d71641..bee3f37fffde 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1849,8 +1849,17 @@ static void unplug_oldest_pwq(struct workqueue_struct *wq)
> >  	raw_spin_lock_irq(&pwq->pool->lock);
> >  	if (pwq->plugged) {
> >  		pwq->plugged = false;
> > -		if (pwq_activate_first_inactive(pwq, true))
> > +		if (pwq_activate_first_inactive(pwq, true)) {
> > +			/*
> > +			 * pwq is unbound. Additional inactive work_items need
> > +			 * to reinsert the pwq into nna->pending_pwqs, which
> > +			 * was skipped while pwq->plugged was true. See
> > +			 * pwq_tryinc_nr_active() for additional details.
> > +			 */
> > +			pwq_activate_first_inactive(pwq, false);
> > +
> >  			kick_pool(pwq->pool);
> > +		}
> >  	}
> >  	raw_spin_unlock_irq(&pwq->pool->lock);
> >  }
>
> Thanks for fixing this bug. However, calling pwq_activate_first_inactive

No problem. I think this one has been lurking for a while, and we've
just papered over it in Xe for a couple of years.

> twice can be a bit hard to understand. Will modifying pwq_tryinc_nr_active()

I actually think it makes quite a bit of sense, as it matches what
__queue_work() does when two items are queued back-to-back on an ordered
workqueue: the first one takes the nr_active count and is activated, and
the second one links the pwq onto nna->pending_pwqs.

> like the following works?
>
My initial thought was that your snippet should work. In fact, it does
work for a while (unpatched drm-tip hangs almost immediately), but
eventually my reproducer still hangs, whereas with this patch it doesn't.
I can't say exactly why. Maybe it's because node_activate_pending_pwq()
can now find a plugged pwq, but that's just a guess.
Matt
> Thanks,
> Longman
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index b77119d71641..b35e6e62e474 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1738,9 +1738,6 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
>  		goto out;
>  	}
>
> -	if (unlikely(pwq->plugged))
> -		return false;
> -
>  	/*
>  	 * Unbound workqueue uses per-node shared nr_active $nna. If @pwq is
>  	 * already waiting on $nna, pwq_dec_nr_active() will maintain the
> @@ -1749,13 +1746,19 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
>  	 * We need to ignore the pending test after max_active has increased as
>  	 * pwq_dec_nr_active() can only maintain the concurrency level but not
>  	 * increase it. This is indicated by @fill.
> +	 *
> +	 * If @pwq is plugged, we need to make sure that it is linked to a
> +	 * pending_pwqs of a $nna.
> +	 *
>  	 */
> -	if (!list_empty(&pwq->pending_node) && likely(!fill))
> +	if (!list_empty(&pwq->pending_node) && likely(!fill || pwq->plugged))
>  		goto out;
>
> -	obtained = tryinc_node_nr_active(nna);
> -	if (obtained)
> -		goto out;
> +	if (likely(!pwq->plugged)) {
> +		obtained = tryinc_node_nr_active(nna);
> +		if (obtained)
> +			goto out;
> +	}
>
>  	/*
>  	 * Lockless acquisition failed. Lock, add ourself to $nna->pending_pwqs
> @@ -1773,7 +1776,8 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
>
>  	smp_mb();
>
> -	obtained = tryinc_node_nr_active(nna);
> +	if (likely(!pwq->plugged))
> +		obtained = tryinc_node_nr_active(nna);
>
>  	/*
>  	 * If @fill, @pwq might have already been pending. Being spuriously
>
Thread overview: 6+ messages
2026-04-01 1:07 [PATCH v2] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works Matthew Brost
2026-04-01 14:44 ` Waiman Long
2026-04-01 15:40 ` Matthew Brost [this message]
2026-04-01 18:04 ` Waiman Long
2026-04-01 20:20 ` Tejun Heo
2026-04-02 4:18 ` Matthew Brost