The Linux Kernel Mailing List
From: Breno Leitao <leitao@debian.org>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	 Song Liu <song@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools
Date: Mon, 22 Jun 2026 04:14:04 -0700	[thread overview]
Message-ID: <ajkJueret31nlymY@gmail.com> (raw)
In-Reply-To: <ajVHYOh3DywrzB-h@pathway.suse.cz>

Hello Petr,

On Fri, Jun 19, 2026 at 03:42:56PM +0200, Petr Mladek wrote:
> It makes some sense. wq_watchdog_timer_fn() checks either
> 'per_cpu(wq_watchdog_touched_cpu)' or the global 'wq_watchdog_touched'
> depending whether pool->cpu is set or not. And it seems to be wrong
> for disassociated pools.
>
> But this seems to be an existing problem which should be fixed
> separately.

Good observation. When a CPU goes offline, its pool becomes disassociated,
but pool->cpu stays set and still points to the now-offline CPU. Only the
workers' CPU affinity changes.

Later in wq_watchdog_timer_fn(), when checking the stalled pool:

	if (pool->cpu >= 0)
		touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));

This reads wq_watchdog_touched_cpu for the offline CPU, which is still being
updated: wq_watchdog_reset_touched() walks for_each_possible_cpu(), so it
marks every possible CPU as touched, whether it is online or offline.
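
As a rough user-space sketch of that behaviour (not kernel code: the model_
names and the fixed 4-CPU arrays are made up for illustration), the reset
walks every slot and never looks at the online state:

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Hypothetical stand-ins for the online mask and the touched timestamps. */
static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };
static unsigned long model_touched_global;
static unsigned long model_touched_cpu[MODEL_NR_CPUS];

/*
 * Models a walk over all possible CPUs: every slot is stamped with "now",
 * and model_cpu_online[] is deliberately not consulted.
 */
static void model_reset_touched(unsigned long now)
{
	model_touched_global = now;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_touched_cpu[cpu] = now;
}
```

With CPU 2 marked offline in model_cpu_online[], model_reset_touched(100)
still leaves model_touched_cpu[2] at 100, so the per-CPU heartbeat of an
offline CPU keeps looking fresh.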

The real problem is that pool->cpu now names an offline CPU:

  - the per-cpu "touched" heartbeat we consult is the wrong one. The pool's
    work now runs on online CPUs (it behaves like an unbound pool), so the
    global wq_watchdog_touched is the correct grace signal.

  - the same pool->cpu >= 0 test marks the pool as cpu_stall and aims the new
    single-CPU backtrace at the offline CPU.

So, I suppose we have a few options:

1) Set pool->cpu to -1 at disassociation time. But that would lose the CPU
number needed to rebind later, so we would have to back up pool->cpu
somewhere before unsetting it:

	int workqueue_online_cpu(unsigned int cpu) {
		...
		if (pool->cpu == cpu)

2) Treat a pool as CPU-less while it is disassociated.

	static int pool_watchdog_cpu(struct worker_pool *pool)
	{
		if (pool->cpu < 0 || (pool->flags & POOL_DISASSOCIATED))
			return -1;
		return pool->cpu;
	}

and replace the pool->cpu reads with pool_watchdog_cpu() everywhere in the
stall code path. I lean towards 2).
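
To illustrate 2), here is a user-space sketch of how the heartbeat selection
would change. It is a model under assumptions, not the kernel code: struct
model_pool, MODEL_POOL_DISASSOCIATED (its bit value is arbitrary) and
model_pick_touched() are hypothetical stand-ins.

```c
/* Hypothetical flag bit; the value is arbitrary in this model. */
#define MODEL_POOL_DISASSOCIATED (1u << 2)

/* Hypothetical, trimmed-down stand-in for struct worker_pool. */
struct model_pool {
	int cpu;		/* >= 0 for per-CPU pools, -1 for unbound pools */
	unsigned int flags;
};

/* Same logic as the helper proposed above, on the model type. */
static int model_pool_watchdog_cpu(const struct model_pool *pool)
{
	if (pool->cpu < 0 || (pool->flags & MODEL_POOL_DISASSOCIATED))
		return -1;
	return pool->cpu;
}

/*
 * Pick the heartbeat the watchdog should compare against: the per-CPU
 * value for an associated per-CPU pool, the global value otherwise.
 */
static unsigned long model_pick_touched(const struct model_pool *pool,
					unsigned long global_touched,
					const unsigned long *percpu_touched)
{
	int cpu = model_pool_watchdog_cpu(pool);

	return cpu >= 0 ? percpu_touched[cpu] : global_touched;
}
```

In this model, a pool with cpu == 3 uses percpu_touched[3] until the
disassociated flag is set, and the global heartbeat afterwards, without
pool->cpu ever being overwritten.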


Either way this is unrelated to this patchset, so my suggestion is:

  1) I respin this RFC with your Reviewed-by + a cpu_online() check before
     triggering the backtrace:

	if (!found_running && cpu_online(cpu))
		trigger_single_cpu_backtrace(cpu);

  2) we continue the disassociated-pool discussion separately, so it does not
     block this series.
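
For reference, the effect of the guard in 1) can be modelled in user space
as below. This is only a sketch: model_backtrace_target() and the online[]
array are hypothetical, and the range check on cpu is an extra safety net
of the model rather than part of the proposed change.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Returns the CPU to backtrace, or -1 when the backtrace is skipped. */
static int model_backtrace_target(bool found_running, int cpu,
				  const bool online[MODEL_NR_CPUS])
{
	/* A running worker was already reported, nothing more to dump. */
	if (found_running)
		return -1;
	/* Never aim the backtrace at an invalid or offline CPU. */
	if (cpu < 0 || cpu >= MODEL_NR_CPUS || !online[cpu])
		return -1;
	return cpu;
}
```

With CPU 2 offline, a stalled pool on CPU 1 still gets its backtrace while
one whose pool->cpu is 2 is skipped.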

Thanks,
--breno


Thread overview: 8+ messages
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
2026-06-19 12:58   ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
2026-06-19 13:42   ` Petr Mladek
2026-06-22 11:14     ` Breno Leitao [this message]
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2026-06-19 15:40   ` Petr Mladek
