From: Hillf Danton
To: Breno Leitao
Cc: Petr Mladek, Tejun Heo, linux-kernel@vger.kernel.org, Omar Sandoval,
	Danielle Costantino, kasan-dev@googlegroups.com
Subject: Re: [PATCH v2 0/5] workqueue: Detect stalled in-flight workers
Date: Wed, 13 May 2026 16:57:24 +0800
Message-ID: <20260513085725.597-1-hdanton@sina.com>
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>

On Fri, 13 Mar 2026 05:24:54 -0700 Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 05:38:26PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:36, Breno Leitao wrote:
> > > There is a blind spot in the workqueue stall detector (aka
> > > show_cpu_pool_hog()). It only prints workers whose task_is_running()
> > > is true, so a busy worker that is sleeping (e.g. in wait_event_idle())
> > > produces an empty backtrace section even though it is the cause of
> > > the stall.
> > >
> > > Additionally, when the watchdog does report stalled pools, the output
> > > doesn't show how long each in-flight work item has been running,
> > > making it harder to identify which specific worker is stuck.
> > >
> > > Example output from the sample code:
> > >
> > > BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
> > > Showing busy workqueues and worker pools:
> > > workqueue events: flags=0x100
> > >   pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
> > >     in-flight: 178:stall_work1_fn [wq_stall]
> > >     pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
> > > ...
> > > Showing backtraces of running workers in stalled CPU-bound worker pools:
> > >
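[ Aside, for illustration only: a minimal sketch of the kind of sample
  work item that produces the empty backtrace section above. The
  wq_stall_demo_* names are made up here (the actual test module from the
  series is not reproduced); wait_event_idle(), DECLARE_WORK() and
  schedule_work() are the real kernel APIs involved. ]

	#include <linux/wait.h>
	#include <linux/workqueue.h>

	static DECLARE_WAIT_QUEUE_HEAD(wq_stall_demo_wait);
	static bool wq_stall_demo_done;

	/*
	 * Sleep in TASK_IDLE until someone sets wq_stall_demo_done.  If
	 * nobody ever does, the work item stays in the pool's busy_hash
	 * ("in-flight") while its worker task is not running on any CPU,
	 * so the task_is_running() check in show_cpu_pool_hog() skips it.
	 */
	static void wq_stall_demo_fn(struct work_struct *work)
	{
		wait_event_idle(wq_stall_demo_wait,
				READ_ONCE(wq_stall_demo_done));
	}
	static DECLARE_WORK(wq_stall_demo_work, wq_stall_demo_fn);

	/* Queue it on the system workqueue: schedule_work(&wq_stall_demo_work); */
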
> > > I see it happening on real machines, causing stalls that don't have
> > > any backtrace. This is one of the code paths:
> > >
> > > 1) kfence executes toggle_allocation_gate() as a delayed workqueue
> > >    item (kfence_timer) on the system WQ.
> > >
> > > 2) toggle_allocation_gate() enables a static key, which IPIs every
> > >    CPU to patch code:
> > >
> > >         static_branch_enable(&kfence_allocation_key);
> > >
> > > 3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
> > >    kfence allocation to occur:
> > >
> > >         wait_event_idle(allocation_wait,
> > >                         atomic_read(&kfence_allocation_gate) > 0 || ...);
> > >
> > >    This can last indefinitely if no allocation goes through the
> > >    kfence path (or if IPIing all the CPUs takes longer, which is
> > >    common on platforms that do not have NMI).
> > >
> > >    The worker remains in the pool's busy_hash (in-flight) but is no
> > >    longer task_is_running().
> > >
> > > 4) The workqueue watchdog detects the stall and calls
> > >    show_cpu_pool_hog(), which only prints backtraces for workers
> > >    that are actively running on a CPU:
> > >
> > >    static void show_cpu_pool_hog(struct worker_pool *pool)
> > >    {
> > >            ...
> > >            if (task_is_running(worker->task))
> > >                    sched_show_task(worker->task);
> > >    }
> > >
> > > 5) Nothing is printed because the offending worker is in TASK_IDLE
> > >    state. The output shows "Showing backtraces of running workers in
> > >    stalled CPU-bound worker pools:" followed by nothing, effectively
> > >    hiding the actual culprit.
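[ For context, a rough sketch of the kind of relaxation being discussed:
  walk the pool's busy_hash and dump every in-flight worker instead of
  only the ones currently on a CPU. This illustrates the idea and is not
  the actual patch; it is essentially the current show_cpu_pool_hog()
  with the task_is_running() filter dropped. ]

	static void show_cpu_pool_hog(struct worker_pool *pool)
	{
		struct worker *worker;
		unsigned long irq_flags;
		int bkt;

		raw_spin_lock_irqsave(&pool->lock, irq_flags);

		hash_for_each(pool->busy_hash, bkt, worker, hentry) {
			/*
			 * Dump the worker even when it is sleeping (e.g. in
			 * wait_event_idle()); a blocked in-flight worker can
			 * be exactly the task that stalls the pool.
			 */
			printk_deferred_enter();
			sched_show_task(worker->task);
			printk_deferred_exit();
		}

		raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
	}
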
> > I am trying to better understand the situation. There was a reason
> > why only workers in the running state were shown.
> >
> > Normally, a sleeping worker should not cause a stall. The scheduler
> > calls wq_worker_sleeping(), which should wake up another idle worker.
> > There is always at least one idle worker in the pool. It should start
> > processing the next pending work, or fork another worker if it was
> > the last idle one.
>
> Right, but let's look at this case:
>
> BUG: workqueue lockup - pool 55 cpu 13 curr 0 (swapper/13) stack ffff800085640000 cpus=13 node=0 flags=0x0 nice=-20 stuck for 679s!
> work func=blk_mq_timeout_work data=0xffff0000ad7e3a05
> Showing busy workqueues and worker pools:
> workqueue events_unbound: flags=0x2
>   pwq 288: cpus=0-71 flags=0x4 nice=0 active=1 refcnt=2
>     in-flight: 4083734:btrfs_extent_map_shrinker_worker
> workqueue mm_percpu_wq: flags=0x8
>   pwq 14: cpus=3 node=0 flags=0x0 nice=0 active=1 refcnt=2
>     pending: vmstat_update
> pool 288: cpus=0-71 flags=0x4 nice=0 hung=0s workers=17 idle: 3800629 3959700 3554824 3706405 3759881 4065549 4041361 4065548 1715676 4086805 3860852 3587585 4065550 4014041 3944711 3744484
> Showing backtraces of running workers in stalled CPU-bound worker pools:
> # Nothing in here
>
> It seems CPU 13 is idle (curr = 0) and blk_mq_timeout_work has been
> pending for 679s?

An idle CPU failed to process pending work, so the root cause lies
outside the workqueue code, and it is difficult to see why giving Peter
more X-ray scans helps when it is Paul who has a bone stuck in his
throat.
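[ For readers following the numbers above: "stuck for 679s" appears to
  mean the pool has had queued work but made no forward progress for
  that long, while "hung=0s" suggests pool 288 is still making progress.
  Below is a much simplified sketch of that check, assuming the upstream
  wq_watchdog_timer_fn() logic; the helper name wq_watchdog_check_pool()
  is made up, and the real code also handles watchdog touches, rescuers
  and per-CPU touch timestamps. ]

	/*
	 * Each pool records a progress timestamp (pool->watchdog_ts) when a
	 * worker starts draining its worklist.  The watchdog timer fires
	 * periodically and flags any pool that still has queued work but
	 * whose timestamp is older than the threshold.
	 */
	static void wq_watchdog_check_pool(struct worker_pool *pool,
					   unsigned long thresh)
	{
		unsigned long now = jiffies;
		unsigned long ts = READ_ONCE(pool->watchdog_ts);

		if (list_empty(&pool->worklist))
			return;		/* nothing queued, nothing to stall */

		if (time_after(now, ts + thresh))
			pr_emerg("BUG: workqueue lockup - pool stuck for %us!\n",
				 jiffies_to_msecs(now - ts) / 1000);
	}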