From: Breno Leitao <leitao@debian.org>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Song Liu <song@kernel.org>,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools
Date: Mon, 22 Jun 2026 04:14:04 -0700
Message-ID: <ajkJueret31nlymY@gmail.com>
In-Reply-To: <ajVHYOh3DywrzB-h@pathway.suse.cz>
Hello Petr,
On Fri, Jun 19, 2026 at 03:42:56PM +0200, Petr Mladek wrote:
> It makes some sense. wq_watchdog_timer_fn() checks either
> 'per_cpu(wq_watchdog_touched_cpu)' or the global 'wq_watchdog_touched'
> depending whether pool->cpu is set or not. And it seems to be wrong
> for disassociated pools.
>
> But this seems to be an existing problem which should be fixed
> separately.
Good observation. For a disassociated pool (one whose CPU has been offlined),
pool->cpu remains set and still points to the now-offline CPU; only the
workers' CPU affinity changes.
Later in wq_watchdog_timer_fn(), when checking the stalled pool:
	if (pool->cpu >= 0)
		touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
This reads wq_watchdog_touched_cpu for the offline CPU, which is still being
updated by wq_watchdog_reset_touched() via for_each_possible_cpu()
(which walks every possible CPU, including offlined ones).
So wq_watchdog_reset_touched() marks the CPU as touched regardless of
whether it is online or offline.
The real problem is that pool->cpu now names an offline CPU:
- the per-cpu "touched" heartbeat we consult is the wrong one. The pool's
work now runs on online CPUs (it behaves like an unbound pool), so the
global wq_watchdog_touched is the correct grace signal
- the same pool->cpu >= 0 test marks the pool cpu_stall and aims the new
single-CPU backtrace at the offline CPU.
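
To make the two points above concrete, here is a minimal userspace sketch of
the selection logic. It is only a model under stated assumptions: the names
touched_global, touched_cpu, cpu_is_online, struct pool_model and the helper
functions are hypothetical stand-ins I made up for illustration, not the
kernel's variables, and the real code has more going on than this.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical number of possible CPUs */

/* Userspace stand-ins, not the kernel's real variables. */
static unsigned long touched_global;			/* models wq_watchdog_touched */
static unsigned long touched_cpu[NR_CPUS_MODEL];	/* models wq_watchdog_touched_cpu */
static bool cpu_is_online[NR_CPUS_MODEL];

struct pool_model {
	int cpu;	/* -1 for an unbound pool */
};

/* Models wq_watchdog_reset_touched(): refreshes every possible CPU. */
static void reset_touched(unsigned long now)
{
	touched_global = now;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		touched_cpu[cpu] = now;
}

/* Models a touch issued from code running on @cpu (-1: no specific CPU). */
static void touch(int cpu, unsigned long now)
{
	if (cpu >= 0)
		touched_cpu[cpu] = now;
	touched_global = now;
}

/* Models the heartbeat selection quoted above: keyed on pool->cpu only. */
static unsigned long pick_touched(const struct pool_model *pool)
{
	if (pool->cpu >= 0)
		return touched_cpu[pool->cpu];
	return touched_global;
}

/* True when a backtrace aimed at pool->cpu would target an offline CPU. */
static bool backtrace_would_hit_offline_cpu(const struct pool_model *pool)
{
	return pool->cpu >= 0 && !cpu_is_online[pool->cpu];
}
```

In this model, with CPU 2 offline and the pool's work running on CPU 0, a
touch from CPU 0 advances the global timestamp but pick_touched() keeps
returning CPU 2's slot, and the backtrace target is the offline CPU.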
So, I suppose we have a few options:
1) Set pool->cpu to -1 at disassociation time. But that would lose the
CPU needed to rebind later, so we would need to back up pool->cpu if we
decide to unset it.
	int workqueue_online_cpu(unsigned int cpu) {
		...
		if (pool->cpu == cpu)
2) Treat the pool as CPU-less if it is disassociated.
	static int pool_watchdog_cpu(struct worker_pool *pool)
	{
		if (pool->cpu < 0 || (pool->flags & POOL_DISASSOCIATED))
			return -1;
		return pool->cpu;
	}
and replace every pool->cpu read in the stall code path with
pool_watchdog_cpu(). I lean towards 2).
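
A rough userspace sketch of how option 2) could behave, assuming the stall
path picks its heartbeat through the helper. struct worker_pool_model,
POOL_DISASSOCIATED_MODEL (with an arbitrary bit value) and pool_touched()
are hypothetical names for this illustration only; the actual call sites
would need to be checked one by one.

```c
/* Hypothetical flag value for this sketch; the kernel defines its own. */
#define POOL_DISASSOCIATED_MODEL (1u << 0)

/* Stand-in for the two struct worker_pool fields the helper looks at. */
struct worker_pool_model {
	int cpu;		/* -1 for an unbound pool */
	unsigned int flags;
};

/* Same logic as the helper proposed above, on the model struct. */
static int pool_watchdog_cpu(const struct worker_pool_model *pool)
{
	if (pool->cpu < 0 || (pool->flags & POOL_DISASSOCIATED_MODEL))
		return -1;
	return pool->cpu;
}

/*
 * Hypothetical stall-path user: fall back to the global timestamp
 * whenever the helper reports that no usable CPU is attached.
 */
static unsigned long pool_touched(const struct worker_pool_model *pool,
				  const unsigned long *touched_cpu,
				  unsigned long touched_global)
{
	int cpu = pool_watchdog_cpu(pool);

	return cpu >= 0 ? touched_cpu[cpu] : touched_global;
}
```

The point of this shape is that pool->cpu itself is never modified, so the
rebind check in workqueue_online_cpu() keeps working, while a disassociated
pool is handled like an unbound one for watchdog purposes.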
Either way this is unrelated to this patchset, so my suggestion is:
1) I respin this RFC with your Reviewed-by + a cpu_online() check before
triggering the backtrace:
	if (!found_running && cpu_online(cpu))
		trigger_single_cpu_backtrace(cpu);
2) we continue the disassociated-pool discussion separately, so it does not
block this series.
Thanks,
--breno