From: Breno Leitao <leitao@debian.org>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Song Liu <song@kernel.org>,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools
Date: Mon, 22 Jun 2026 04:14:04 -0700
Message-ID: <ajkJueret31nlymY@gmail.com>
In-Reply-To: <ajVHYOh3DywrzB-h@pathway.suse.cz>

Hello Petr,

On Fri, Jun 19, 2026 at 03:42:56PM +0200, Petr Mladek wrote:
> It makes some sense. wq_watchdog_timer_fn() checks either
> 'per_cpu(wq_watchdog_touched_cpu)' or the global 'wq_watchdog_touched'
> depending whether pool->cpu is set or not. And it seems to be wrong
> for disassociated pools.
>
> But this seems to be an existing problem which should be fixed
> separately.

Good observation. When a CPU goes offline, its per-CPU pools become
disassociated: the workers' CPU affinity changes, but pool->cpu stays set
and still points to the now-offline CPU.

Later, when wq_watchdog_timer_fn() checks whether the pool has stalled:

	if (pool->cpu >= 0)
		touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
	else
		touched = READ_ONCE(wq_watchdog_touched);

This reads wq_watchdog_touched_cpu for the offline CPU. That slot is still
written by wq_watchdog_reset_touched(), which iterates with
for_each_possible_cpu() and therefore covers offline CPUs too. So whether
the CPU is online or offline, wq_watchdog_reset_touched() marks it as
touched.

The real problem is that pool->cpu now names an offline CPU:

- the per-CPU "touched" heartbeat we consult is the wrong one. The pool's
  work now runs on online CPUs (the pool behaves like an unbound one), so
  the global wq_watchdog_touched is the correct grace signal;
- the same pool->cpu >= 0 test sets pool->cpu_stall and aims the new
  single-CPU backtrace at the offline CPU.
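
To make the first point concrete, here is a minimal userspace model of the
current lookup. It is not kernel code: NR_CPUS, the POOL_DISASSOCIATED bit
value, the plain arrays standing in for the per-CPU and global "touched"
stamps, and the reset_touched()/touch()/pool_stalled() helpers are all
illustrative assumptions. The kernel's rate limiting of the global store
in wq_watchdog_touch() is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4
#define POOL_DISASSOCIATED (1u << 2)	/* assumed bit value, illustration only */

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

struct worker_pool {
	int cpu;			/* -1 for unbound pools */
	unsigned int flags;
	unsigned long watchdog_ts;	/* last forward progress */
};

static unsigned long touched_global;		/* models wq_watchdog_touched */
static unsigned long touched_cpu[NR_CPUS];	/* models wq_watchdog_touched_cpu */

/* Models wq_watchdog_reset_touched(): every possible CPU, online or not. */
static void reset_touched(unsigned long now)
{
	touched_global = now;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		touched_cpu[cpu] = now;
}

/*
 * Simplified wq_watchdog_touch(): refreshes the caller's own slot and,
 * unlike the kernel, the global stamp on every call.
 */
static void touch(int cpu, unsigned long now)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	touched_cpu[cpu] = now;
	touched_global = now;
}

/* Current logic: any pool with cpu >= 0 consults that CPU's slot. */
static bool pool_stalled(const struct worker_pool *pool,
			 unsigned long now, unsigned long thresh)
{
	unsigned long touched, ts;

	if (pool->cpu >= 0)
		touched = touched_cpu[pool->cpu];
	else
		touched = touched_global;

	/* Take the later of the pool's progress and the heartbeat. */
	ts = time_after(pool->watchdog_ts, touched) ? pool->watchdog_ts : touched;
	return time_after(now, ts + thresh);
}
```

With reset_touched(0), a disassociated pool whose cpu is 3, and a touch
from CPU 0 at t=140, pool_stalled() at t=150 with thresh=30 reports a
stall because it reads the stale slot for CPU 3, while an otherwise
identical pool with cpu = -1 sees the fresh global stamp and does not.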

So I see a few options:

1) Set pool->cpu to -1 at disassociation time. But that would lose the
   CPU needed to rebind the pool later, since the online path matches
   pools by it, so we would have to save pool->cpu somewhere before
   clearing it:

	int workqueue_online_cpu(unsigned int cpu)
	{
		...
		if (pool->cpu == cpu)
			rebind_workers(pool);
		...
	}

2) Treat a pool as CPU-less while it is disassociated:

	static int pool_watchdog_cpu(struct worker_pool *pool)
	{
		if (pool->cpu < 0 || (pool->flags & POOL_DISASSOCIATED))
			return -1;
		return pool->cpu;
	}

   and replace the pool->cpu reads in the stall code path with
   pool_watchdog_cpu().

I lean towards 2).
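
A rough userspace sketch of how option 2) would change the outcome, using
the pool_watchdog_cpu() helper from 2) with a const qualifier added. The
stall check and pool_cpu_stall() are hypothetical simplifications of
wq_watchdog_timer_fn() (POOL_BH handling and the rate-limited global touch
are omitted), and NR_CPUS, the POOL_DISASSOCIATED value and the "touched"
arrays are assumed stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4
#define POOL_DISASSOCIATED (1u << 2)	/* assumed bit value, illustration only */
#define time_after(a, b) ((long)((b) - (a)) < 0)

struct worker_pool {
	int cpu;			/* -1 for unbound pools */
	unsigned int flags;
	unsigned long watchdog_ts;	/* last forward progress */
};

static unsigned long touched_global;
static unsigned long touched_cpu[NR_CPUS];

/* Option 2: a disassociated pool is treated like an unbound one. */
static int pool_watchdog_cpu(const struct worker_pool *pool)
{
	if (pool->cpu < 0 || (pool->flags & POOL_DISASSOCIATED))
		return -1;
	return pool->cpu;
}

/* Stall check routed through pool_watchdog_cpu() instead of pool->cpu. */
static bool pool_stalled(const struct worker_pool *pool,
			 unsigned long now, unsigned long thresh)
{
	int cpu = pool_watchdog_cpu(pool);
	unsigned long touched, ts;

	if (cpu >= 0) {
		assert(cpu < NR_CPUS);
		touched = touched_cpu[cpu];
	} else {
		touched = touched_global;	/* global grace signal */
	}
	ts = time_after(pool->watchdog_ts, touched) ? pool->watchdog_ts : touched;
	return time_after(now, ts + thresh);
}

/* Per-CPU stall (and thus the single-CPU backtrace) only for bound pools. */
static bool pool_cpu_stall(const struct worker_pool *pool, bool stalled)
{
	return stalled && pool_watchdog_cpu(pool) >= 0;
}
```

A disassociated pool now reads the global stamp and is never flagged as a
per-CPU stall, while a still-associated bound pool keeps the per-CPU
behaviour unchanged.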

Either way, this is unrelated to this patchset, so my suggestion is:

1) I respin this RFC with your Reviewed-by plus a cpu_online() check
   before triggering the backtrace:

	if (!found_running && cpu_online(cpu))
		trigger_single_cpu_backtrace(cpu);

2) We continue the disassociated-pool discussion separately, so it does
   not block this series.
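
For illustration, a minimal userspace sketch of the guard in 1). Here
cpu_online() and trigger_single_cpu_backtrace() are local stubs that only
imitate the kernel helpers of the same name; maybe_backtrace() and the
online[]/backtraces[] arrays are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Stand-ins for the kernel's online mask and a per-CPU backtrace counter. */
static bool online[NR_CPUS];
static int backtraces[NR_CPUS];

/* Stub: reports whether the CPU is marked online in the model. */
static bool cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return online[cpu];
}

/* Stub: records that a backtrace was requested for this CPU. */
static bool trigger_single_cpu_backtrace(int cpu)
{
	backtraces[cpu]++;
	return true;
}

/*
 * Hypothetical helper: request a backtrace only when no running worker
 * was found and the target CPU is online, so an offline CPU is skipped.
 */
static void maybe_backtrace(int cpu, bool found_running)
{
	if (!found_running && cpu_online(cpu))
		trigger_single_cpu_backtrace(cpu);
}
```

With CPU 0 online and CPU 2 offline, only the call for CPU 0 with no
running worker records a backtrace.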
Thanks,
--breno

Thread overview: 8+ messages
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
2026-06-19 12:58 ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
2026-06-19 13:42 ` Petr Mladek
2026-06-22 11:14 ` Breno Leitao [this message]
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2026-06-19 15:40 ` Petr Mladek