public inbox for linux-kernel@vger.kernel.org
From: Petr Mladek <pmladek@suse.com>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
	Song Liu <song@kernel.org>,
	Danielle Costantino <dcostantino@meta.com>,
	kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
Date: Fri, 13 Mar 2026 17:27:40 +0100
Message-ID: <abQ6_FsxtfH8nXka@pathway>
In-Reply-To: <abQJY3EBElumYpCj@gmail.com>

On Fri 2026-03-13 05:57:59, Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > > show_cpu_pool_hog() only prints workers whose task is currently running
> > > on the CPU (task_is_running()).  This misses workers that are busy
> > > processing a work item but are sleeping or blocked — for example, a
> > > worker that clears PF_WQ_WORKER and enters wait_event_idle().
> > 
> > IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> > when they are going to die. They never do so when going to sleep.
> > 
> > > Such a
> > > worker still occupies a pool slot and prevents progress, yet produces
> > > an empty backtrace section in the watchdog output.
> > > 
> > > This is happening on real arm64 systems, where
> > > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > > lacks NMI), causing workqueue stalls that show empty backtraces because
> > > toggle_allocation_gate() is sleeping in wait_event_idle().
> > 
> > The wait_event_idle() called in toggle_allocation_gate() should not
> > cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> > and wake up another idle worker. It should guarantee the progress.
> > 
> > > Remove the task_is_running() filter so every in-flight worker in the
> > > pool's busy_hash is dumped.  The busy_hash is protected by pool->lock,
> > > which is already held.
> > 
> > As I explained in reply to the cover letter, sleeping workers should
> > not block forward progress. It seems that in this case, the system was
> > not able to wake up the other idle worker or it was the last idle
> > worker and was not able to fork a new one.
> > 
> > IMHO, we should warn about this when there is no running worker.
> > It might be more useful than printing backtraces of the sleeping
> > workers because they likely did not cause the problem.
> > 
> > I believe that the problem, in this particular situation, is that
> > the system can't schedule or fork new processes. It might help
> > to warn about it and maybe show backtrace of the currently
> > running process on the stalled CPU.
> 
> Do you mean checking if pool->busy_hash is empty, and then warning?
> 
> Commit fc36ad49ce7160907bcbe4f05c226595611ac293
> Author: Breno Leitao <leitao@debian.org>
> Date:   Fri Mar 13 05:35:02 2026 -0700
> 
>     workqueue: warn when stalled pool has no running workers
> 
>     When the workqueue watchdog detects a pool stall and the pool's
>     busy_hash is empty (no workers executing any work item), print a
>     diagnostic warning with the pool state and trigger a backtrace of
>     the currently running task on the stalled CPU.
> 
>     Signed-off-by: Breno Leitao <leitao@debian.org>
>     Suggested-by: Petr Mladek <pmladek@suse.com>
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6ee52ba9b14f7..d538067754123 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
> 
>         raw_spin_lock_irqsave(&pool->lock, irq_flags);
> 
> +       if (hash_empty(pool->busy_hash)) {

This would print it only when there is no in-flight work.

But I think that the problem is when there is no worker in
the running state. There should always be one to guarantee
forward progress.

I took inspiration from your patch. This is what comes to mind
on top of the current master (which prints only running workers):

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c..a044c7e42139 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7588,12 +7588,15 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 {
 	struct worker *worker;
 	unsigned long irq_flags;
+	bool found_running;
 	int bkt;
 
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
+	found_running = false;
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
 		if (task_is_running(worker->task)) {
+			found_running = true;
 			/*
 			 * Defer printing to avoid deadlocks in console
 			 * drivers that queue work while holding locks
@@ -7609,6 +7612,19 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 	}
 
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+	if (!found_running) {
+		pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+			pool->id, pool->cpu,
+			idle_cpu(pool->cpu) ? "idle" : "busy",
+			pool->nr_workers, pool->nr_idle);
+		pr_info("The pool might have trouble waking up another idle worker.\n");
+		if (pool->manager) {
+			pr_info("Backtrace of the pool manager:\n");
+			sched_show_task(pool->manager->task);
+		}
+		trigger_single_cpu_backtrace(pool->cpu);
+	}
 }
 
 static void show_cpu_pools_hogs(void)


Warning: The code is not safe. We would need to add some synchronization
	 for the pool->manager pointer.

	It might be even better to print the state and backtrace of the
	process that was woken by kick_pool() when the last running
	worker went to sleep.

Motivation: AFAIK, if there is pending work in a CPU-bound workqueue,
	then at least one worker in the related worker pool should be
	in the task_is_running() state to guarantee forward progress.

	If we find a running worker, it is likely the culprit.
	Either it runs for too long, or it is the last idle
	worker and fails to create a new one.

	If there is no worker in the running state, there is likely
	a problem in the core workqueue code, or some work item shot
	the workqueue in the foot. Either way, we might need to print
	many more details to nail it down.

Best Regards,
Petr


Thread overview: 24+ messages
2026-03-05 16:15 [PATCH v2 0/5] workqueue: Detect stalled in-flight workers Breno Leitao
2026-03-05 16:15 ` [PATCH v2 1/5] workqueue: Use POOL_BH instead of WQ_BH when checking pool flags Breno Leitao
2026-03-05 17:13   ` Song Liu
2026-03-05 16:15 ` [PATCH v2 2/5] workqueue: Rename pool->watchdog_ts to pool->last_progress_ts Breno Leitao
2026-03-05 17:16   ` Song Liu
2026-03-05 16:15 ` [PATCH v2 3/5] workqueue: Show in-flight work item duration in stall diagnostics Breno Leitao
2026-03-05 17:17   ` Song Liu
2026-03-05 16:15 ` [PATCH v2 4/5] workqueue: Show all busy workers " Breno Leitao
2026-03-05 17:17   ` Song Liu
2026-03-12 17:03   ` Petr Mladek
2026-03-13 12:57     ` Breno Leitao
2026-03-13 16:27       ` Petr Mladek [this message]
2026-03-18 11:31         ` Breno Leitao
2026-03-18 15:11           ` Petr Mladek
2026-03-20 10:41             ` Breno Leitao
2026-03-05 16:15 ` [PATCH v2 5/5] workqueue: Add stall detector sample module Breno Leitao
2026-03-05 17:25   ` Song Liu
2026-03-05 17:39 ` [PATCH v2 0/5] workqueue: Improve stall diagnostics Tejun Heo
2026-03-12 16:38 ` [PATCH v2 0/5] workqueue: Detect stalled in-flight workers Petr Mladek
2026-03-13 12:24   ` Breno Leitao
2026-03-13 14:38     ` Petr Mladek
2026-03-13 17:36       ` Breno Leitao
2026-03-18 16:46         ` Petr Mladek
2026-03-20 10:44           ` Breno Leitao
