From: Breno Leitao <leitao@debian.org>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Omar Sandoval <osandov@osandov.com>, Song Liu <song@kernel.org>,
	Danielle Costantino <dcostantino@meta.com>,
	kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
Date: Fri, 13 Mar 2026 05:57:59 -0700
Message-ID: <abQJY3EBElumYpCj@gmail.com>
In-Reply-To: <abLxx2cFdBFUQx5V@pathway.suse.cz>

On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > show_cpu_pool_hog() only prints workers whose task is currently running
> > on the CPU (task_is_running()). This misses workers that are busy
> > processing a work item but are sleeping or blocked — for example, a
> > worker that clears PF_WQ_WORKER and enters wait_event_idle().
>
> IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> when they are going to die. They never do so when going to sleep.
>
> > Such a
> > worker still occupies a pool slot and prevents progress, yet produces
> > an empty backtrace section in the watchdog output.
> >
> > This is happening on real arm64 systems, where
> > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > lacks NMI), causing workqueue stalls that show empty backtraces because
> > toggle_allocation_gate() is sleeping in wait_event_idle().
>
> The wait_event_idle() called in toggle_allocation_gate() should not
> cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> and wake up another idle worker. It should guarantee the progress.
>
> > Remove the task_is_running() filter so every in-flight worker in the
> > pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> > which is already held.
>
> As I explained in reply to the cover letter, sleeping workers should
> not block forward progress. It seems that in this case, the system was
> not able to wake up the other idle worker or it was the last idle
> worker and was not able to fork a new one.
>
> IMHO, we should warn about this when there is no running worker.
> It might be more useful than printing backtraces of the sleeping
> workers because they likely did not cause the problem.
>
> I believe that the problem, in this particular situation, is that
> the system can't schedule or fork new processes. It might help
> to warn about it and maybe show backtrace of the currently
> running process on the stalled CPU.

Do you mean checking if pool->busy_hash is empty, and then warning?

commit fc36ad49ce7160907bcbe4f05c226595611ac293
Author: Breno Leitao <leitao@debian.org>
Date:   Fri Mar 13 05:35:02 2026 -0700

    workqueue: warn when stalled pool has no running workers

    When the workqueue watchdog detects a pool stall and the pool's
    busy_hash is empty (no workers executing any work item), print a
    diagnostic warning with the pool state and trigger a backtrace of
    the currently running task on the stalled CPU.

    Signed-off-by: Breno Leitao <leitao@debian.org>
    Suggested-by: Petr Mladek <pmladek@suse.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6ee52ba9b14f7..d538067754123 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
+	if (hash_empty(pool->busy_hash)) {
+		raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+		pr_info("pool %d: no running workers, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+			pool->id, pool->cpu,
+			idle_cpu(pool->cpu) ? "idle" : "busy",
+			pool->nr_workers, pool->nr_idle);
+		trigger_single_cpu_backtrace(pool->cpu);
+		return;
+	}
+
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
 		if (task_is_running(worker->task)) {
 			/*
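
A note on the check itself: an empty busy_hash and "no running worker"
are not necessarily the same condition. Below is a minimal userspace
sketch (not kernel code) of the three cases the watchdog could
distinguish. struct pool_model, struct busy_worker_model, enum stall_kind
and classify_stalled_pool() are hypothetical names used only for
illustration; task_running stands in for task_is_running(worker->task),
and the array stands in for pool->busy_hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
struct busy_worker_model {
	bool task_running;	/* stands in for task_is_running(worker->task) */
};

struct pool_model {
	const struct busy_worker_model *busy;	/* stands in for pool->busy_hash */
	size_t nr_busy;				/* number of entries in busy[] */
};

enum stall_kind {
	STALL_NO_BUSY_WORKERS,		/* busy_hash is empty */
	STALL_ALL_BUSY_SLEEPING,	/* busy workers exist, none is running */
	STALL_HAS_RUNNING_WORKER,	/* at least one busy worker is running */
};

/* Classify a stalled pool by the state of its in-flight workers. */
static enum stall_kind classify_stalled_pool(const struct pool_model *pool)
{
	size_t i;

	assert(pool != NULL);

	if (pool->nr_busy == 0)
		return STALL_NO_BUSY_WORKERS;

	for (i = 0; i < pool->nr_busy; i++) {
		if (pool->busy[i].task_running)
			return STALL_HAS_RUNNING_WORKER;
	}

	return STALL_ALL_BUSY_SLEEPING;
}
```

In these terms, the hash_empty() check in the hunk above only covers
STALL_NO_BUSY_WORKERS. A pool whose busy workers are all sleeping would
be STALL_ALL_BUSY_SLEEPING and would still fall through to the
hash_for_each() loop.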