From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Mar 2026 17:27:40 +0100
From: Petr Mladek
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, linux-kernel@vger.kernel.org,
	Omar Sandoval, Song Liu, Danielle Costantino,
	kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
	<20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 2026-03-13 05:57:59, Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > > show_cpu_pool_hog() only prints workers whose task is currently running
> > > on the CPU (task_is_running()). This misses workers that are busy
> > > processing a work item but are sleeping or blocked — for example, a
> > > worker that clears PF_WQ_WORKER and enters wait_event_idle().
> >
> > IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> > when they are going to die. They never do so when going to sleep.
> >
> > > Such a
> > > worker still occupies a pool slot and prevents progress, yet produces
> > > an empty backtrace section in the watchdog output.
> >
> > > This is happening on real arm64 systems, where
> > > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > > lacks NMI), causing workqueue stalls that show empty backtraces because
> > > toggle_allocation_gate() is sleeping in wait_event_idle().
> >
> > The wait_event_idle() called in toggle_allocation_gate() should not
> > cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> > and wake up another idle worker. It should guarantee the progress.
> >
> > > Remove the task_is_running() filter so every in-flight worker in the
> > > pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> > > which is already held.
> >
> > As I explained in reply to the cover letter, sleeping workers should
> > not block forward progress. It seems that in this case, the system was
> > not able to wake up the other idle worker or it was the last idle
> > worker and was not able to fork a new one.
> >
> > IMHO, we should warn about this when there is no running worker.
> > It might be more useful than printing backtraces of the sleeping
> > workers because they likely did not cause the problem.
> >
> > I believe that the problem, in this particular situation, is that
> > the system can't schedule or fork new processes. It might help
> > to warn about it and maybe show backtrace of the currently
> > running process on the stalled CPU.
>
> Do you mean checking if pool->busy_hash is empty, and then warning?
>
> Commit fc36ad49ce7160907bcbe4f05c226595611ac293
> Author: Breno Leitao
> Date: Fri Mar 13 05:35:02 2026 -0700
>
>     workqueue: warn when stalled pool has no running workers
>
>     When the workqueue watchdog detects a pool stall and the pool's
>     busy_hash is empty (no workers executing any work item), print a
>     diagnostic warning with the pool state and trigger a backtrace of
>     the currently running task on the stalled CPU.
>
>     Signed-off-by: Breno Leitao
>     Suggested-by: Petr Mladek
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6ee52ba9b14f7..d538067754123 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
>
>  	raw_spin_lock_irqsave(&pool->lock, irq_flags);
>
> +	if (hash_empty(pool->busy_hash)) {

This would print it only when there is no in-flight work. But I think
that the problem is when there is no worker in the running state.
There should always be one to guarantee forward progress.

I took inspiration from your patch. This is what comes to my mind on
top of the current master (printing only running workers):

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c..a044c7e42139 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7588,12 +7588,15 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 {
 	struct worker *worker;
 	unsigned long irq_flags;
+	bool found_running;
 	int bkt;

 	raw_spin_lock_irqsave(&pool->lock, irq_flags);

+	found_running = false;
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
 		if (task_is_running(worker->task)) {
+			found_running = true;
 			/*
 			 * Defer printing to avoid deadlocks in console
 			 * drivers that queue work while holding locks
@@ -7609,6 +7612,19 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 	}

 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+	if (!found_running) {
+		pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+			pool->id, pool->cpu,
+			idle_cpu(pool->cpu) ? "idle" : "busy",
+			pool->nr_workers, pool->nr_idle);
+		pr_info("The pool might have trouble waking up another idle worker.\n");
+		if (pool->manager) {
+			pr_info("Backtrace of the pool manager:\n");
+			sched_show_task(pool->manager->task);
+		}
+		trigger_single_cpu_backtrace(pool->cpu);
+	}
 }

 static void show_cpu_pools_hogs(void)

Warning: The code is not safe.
We would need to add some synchronization for the pool->manager
pointer. Even better might be to print the state and backtrace of the
process which was woken by kick_pool() when the last running worker
went to sleep.

Motivation: AFAIK, if there is pending work in a CPU-bound workqueue,
then at least one worker in the related worker pool should be in the
"task_is_running()" state to guarantee forward progress.

If we find a running worker then it is likely the culprit. Either it
has been running for too long, or it is the last idle worker and it
failed to create a new one.

If there is no worker in the running state then there is likely a
problem in the core workqueue code, or some work item shot the
workqueue in the foot. Either way, we might need to print many more
details to nail it down.

Best Regards,
Petr