From: Hillf Danton
To: Breno Leitao
Cc: Petr Mladek, Tejun Heo, linux-kernel@vger.kernel.org, Omar Sandoval,
	Danielle Costantino, kasan-dev@googlegroups.com
Subject: Re: [PATCH v2 0/5] workqueue: Detect stalled in-flight workers
Date: Wed, 13 May 2026 16:57:24 +0800
Message-ID: <20260513085725.597-1-hdanton@sina.com>
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>

On Fri, 13 Mar 2026 05:24:54 -0700 Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 05:38:26PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:36, Breno Leitao wrote:
> > > There is a blind spot in the workqueue stall detector (aka
> > > show_cpu_pool_hog()). It only prints workers whose task_is_running()
> > > is true, so a busy worker that is sleeping (e.g. in wait_event_idle())
> > > produces an empty backtrace section even though it is the cause of
> > > the stall.
> > >
> > > Additionally, when the watchdog does report stalled pools, the output
> > > doesn't show how long each in-flight work item has been running,
> > > making it harder to identify which specific worker is stuck.
> > >
> > > Example output from the sample code:
> > >
> > > BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
> > > Showing busy workqueues and worker pools:
> > > workqueue events: flags=0x100
> > >   pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
> > >     in-flight: 178:stall_work1_fn [wq_stall]
> > >     pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
> > > ...
> > > Showing backtraces of running workers in stalled CPU-bound worker pools:
> > >
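[ Aside, for illustration only: a minimal sketch of the kind of sample
  work item that produces the empty backtrace section above. The
  wq_stall_demo_* names are made up here (the actual test module from the
  series is not reproduced); wait_event_idle(), DECLARE_WORK() and
  schedule_work() are the real kernel APIs involved. ]

	#include <linux/wait.h>
	#include <linux/workqueue.h>

	static DECLARE_WAIT_QUEUE_HEAD(wq_stall_demo_wait);
	static bool wq_stall_demo_done;

	/*
	 * Sleep in TASK_IDLE until someone sets wq_stall_demo_done.  If
	 * nobody ever does, the work item stays in the pool's busy_hash
	 * ("in-flight") while its worker task is not running on any CPU,
	 * so the task_is_running() check in show_cpu_pool_hog() skips it.
	 */
	static void wq_stall_demo_fn(struct work_struct *work)
	{
		wait_event_idle(wq_stall_demo_wait,
				READ_ONCE(wq_stall_demo_done));
	}
	static DECLARE_WORK(wq_stall_demo_work, wq_stall_demo_fn);

	/* Queue it on the system workqueue: schedule_work(&wq_stall_demo_work); */
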
> > > I see it happening on real machines, causing stalls that don't have
> > > any backtrace. This is one of the code paths:
> > >
> > > 1) kfence executes toggle_allocation_gate() as a delayed workqueue
> > >    item (kfence_timer) on the system WQ.
> > >
> > > 2) toggle_allocation_gate() enables a static key, which IPIs every
> > >    CPU to patch code:
> > >
> > >         static_branch_enable(&kfence_allocation_key);
> > >
> > > 3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
> > >    kfence allocation to occur:
> > >
> > >         wait_event_idle(allocation_wait,
> > >                         atomic_read(&kfence_allocation_gate) > 0 || ...);
> > >
> > >    This can last indefinitely if no allocation goes through the
> > >    kfence path (or if IPIing all the CPUs takes longer, which is
> > >    common on platforms that do not have NMI).
> > >
> > >    The worker remains in the pool's busy_hash (in-flight) but is no
> > >    longer task_is_running().
> > >
> > > 4) The workqueue watchdog detects the stall and calls
> > >    show_cpu_pool_hog(), which only prints backtraces for workers
> > >    that are actively running on a CPU:
> > >
> > >    static void show_cpu_pool_hog(struct worker_pool *pool)
> > >    {
> > >            ...
> > >            if (task_is_running(worker->task))
> > >                    sched_show_task(worker->task);
> > >    }
> > >
> > > 5) Nothing is printed because the offending worker is in TASK_IDLE
> > >    state. The output shows "Showing backtraces of running workers in
> > >    stalled CPU-bound worker pools:" followed by nothing, effectively
> > >    hiding the actual culprit.
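[ For context, a rough sketch of the kind of relaxation being discussed:
  walk the pool's busy_hash and dump every in-flight worker instead of
  only the ones currently on a CPU. This illustrates the idea and is not
  the actual patch; it is essentially the current show_cpu_pool_hog()
  with the task_is_running() filter dropped. ]

	static void show_cpu_pool_hog(struct worker_pool *pool)
	{
		struct worker *worker;
		unsigned long irq_flags;
		int bkt;

		raw_spin_lock_irqsave(&pool->lock, irq_flags);

		hash_for_each(pool->busy_hash, bkt, worker, hentry) {
			/*
			 * Dump the worker even when it is sleeping (e.g. in
			 * wait_event_idle()); a blocked in-flight worker can
			 * be exactly the task that stalls the pool.
			 */
			printk_deferred_enter();
			sched_show_task(worker->task);
			printk_deferred_exit();
		}

		raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
	}
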
> > I am trying to better understand the situation. There was a reason
> > why only workers in the running state were shown.
> >
> > Normally, a sleeping worker should not cause a stall. The scheduler
> > calls wq_worker_sleeping(), which should wake up another idle worker.
> > There is always at least one idle worker in the pool. It should start
> > processing the next pending work, or fork another worker if it was
> > the last idle one.
>
> Right, but let's look at this case:
>
> BUG: workqueue lockup - pool 55 cpu 13 curr 0 (swapper/13) stack ffff800085640000 cpus=13 node=0 flags=0x0 nice=-20 stuck for 679s!
> work func=blk_mq_timeout_work data=0xffff0000ad7e3a05
> Showing busy workqueues and worker pools:
> workqueue events_unbound: flags=0x2
>   pwq 288: cpus=0-71 flags=0x4 nice=0 active=1 refcnt=2
>     in-flight: 4083734:btrfs_extent_map_shrinker_worker
> workqueue mm_percpu_wq: flags=0x8
>   pwq 14: cpus=3 node=0 flags=0x0 nice=0 active=1 refcnt=2
>     pending: vmstat_update
> pool 288: cpus=0-71 flags=0x4 nice=0 hung=0s workers=17 idle: 3800629 3959700 3554824 3706405 3759881 4065549 4041361 4065548 1715676 4086805 3860852 3587585 4065550 4014041 3944711 3744484
> Showing backtraces of running workers in stalled CPU-bound worker pools:
> # Nothing in here
>
> It seems CPU 13 is idle (curr = 0) and blk_mq_timeout_work has been
> pending for 679s?

An idle CPU failed to process pending work, so the root cause lies
outside the workqueue code, and it is difficult to see why giving Peter
more X-ray scans helps when it is Paul who has a bone stuck in his
throat.
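[ For readers following the numbers above: "stuck for 679s" appears to
  mean the pool has had queued work but made no forward progress for
  that long, while "hung=0s" suggests pool 288 is still making progress.
  Below is a much simplified sketch of that check, assuming the upstream
  wq_watchdog_timer_fn() logic; the helper name wq_watchdog_check_pool()
  is made up, and the real code also handles watchdog touches, rescuers
  and per-CPU touch timestamps. ]

	/*
	 * Each pool records a progress timestamp (pool->watchdog_ts) when a
	 * worker starts draining its worklist.  The watchdog timer fires
	 * periodically and flags any pool that still has queued work but
	 * whose timestamp is older than the threshold.
	 */
	static void wq_watchdog_check_pool(struct worker_pool *pool,
					   unsigned long thresh)
	{
		unsigned long now = jiffies;
		unsigned long ts = READ_ONCE(pool->watchdog_ts);

		if (list_empty(&pool->worklist))
			return;		/* nothing queued, nothing to stall */

		if (time_after(now, ts + thresh))
			pr_emerg("BUG: workqueue lockup - pool stuck for %us!\n",
				 jiffies_to_msecs(now - ts) / 1000);
	}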