Date: Thu, 11 Jun 2026 07:50:04 -0700
From: Breno Leitao
To: Petr Mladek
Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, linux-kernel@vger.kernel.org,
	Omar Sandoval, Song Liu, Danielle Costantino,
	kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
	<20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 20, 2026 at 03:41:13AM -0700, Breno Leitao wrote:
> On Wed, Mar 18, 2026 at 04:11:54PM +0100, Petr Mladek wrote:
> > On Wed 2026-03-18 04:31:08, Breno Leitao wrote:
> > Otherwise, I like this patch.
> >
> > I still think what might be the reason that there is no worker
> > in the running state. Let's see if this patch brings some useful info.
> >
> > One more idea. It might be useful to store a timestamp when the last
> > worker was woken. And then print either the timestamp or delta.
> > It would help to make sure that kick_pool() was really called
> > during the reported stall.
>
> Ack, this is the following patch I will deploy in production, let's see
> how useful it is.

I got this running in production (backported to 6.16), and we finally got
the culprit.

05:42:00 BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 115s!
NMI backtrace for cpu 2
CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G O 6.16.1-0_fbk4_0_gb849430a436c #1 NONE
Tainted: [O]=OOT_MODULE
Hardware name:
Workqueue: efi_rts_wq efi_call_rts
pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
pc : 0x4052f10900
lr : 0x4052f10e94
sp : ffff800088cefc90
x29: ffff800088cefc90 x28: 0000000048524641 x27: 0000004052b60000
x26: 0000000000010058 x25: 0000004043ba0000 x24: 0000000001280000
x23: 000000405a02807f x22: 0000000000010080 x21: 0000004053ac0097
x20: 000000405a028080 x19: 0000004053ac0098 x18: 0000000000000000
x17: 0000000000000030 x16: 0000004052eb6de0 x15: 0000004042ba0030
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
x11: 0000000001d00d09 x10: 0000004042ba0028 x9 : ffff800088cefc90
x8 : 0000000001d00cd9 x7 : 0000000000000000 x6 : 0000004043ba0000
x5 : 0000004043bb0000 x4 : 0000004053ac0098 x3 : 000000405a028080
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffffffffffe1e8
Call trace:
 0x4052f10900 (P)
 0x4052f10e94
 0x4052b00ed0
 0x4052b02e38
 0x4052b0175c
 0x4052b517b4
 0x4052a70b84
 0x4052cb11d4
 __efi_rt_asm_wrapper+0x50/0x78
 efi_call_rts+0x178/0x240
 process_scheduled_works+0x17c/0x420
 worker_thread+0x184/0x4d8
 kthread+0xcc/0x1f8
 ret_from_fork+0x10/0x20

05:42:30 BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 145s!
NMI backtrace for cpu 2
CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G O 6.16.1-0_fbk4_0_gb849430a436c #1 NONE
Tainted: [O]=OOT_MODULE
Hardware name:
Workqueue: efi_rts_wq efi_call_rts
pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
pc : 0x4052f11ecc
lr : 0x4052f10b8c
sp : ffff800088cefc30
x29: ffff800088cefc40 x28: 0000000048524641 x27: 0000004052b60000
x26: 0000000000010058 x25: 0000004043fb0000 x24: 0000000001690000
x23: 0000004053ab0040 x22: 0000000000010080 x21: ffff800088cefd00

Rinse and repeat...

Unfortunately, I didn't get the other pr_info() output because of the
console settings, but I can say the following from this issue and the
previous code:

1) In show_cpu_pool_hog(), the found_running variable is set to false.
2) hash_for_each() never found any running task.
3) The following code was triggered and was very helpful:

	if (!found_running)
		trigger_single_cpu_backtrace(cpu);
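
For anyone following along without the patch open, the three points above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: struct sim_worker, sim_show_cpu_pool_hog(), sim_trigger_single_cpu_backtrace() and the counter are hypothetical stand-ins, the array walk takes the place of hash_for_each(), and the real trigger_single_cpu_backtrace() is replaced by a counter so the fallback is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a busy worker; not the kernel's struct worker. */
struct sim_worker {
	int cpu;       /* CPU the worker's task is associated with */
	bool running;  /* models "this task is in the running state" */
};

/* Counts backtrace requests; stands in for the real NMI backtrace. */
int sim_backtraces_requested;

/* Hypothetical replacement for trigger_single_cpu_backtrace(cpu). */
void sim_trigger_single_cpu_backtrace(int cpu)
{
	(void)cpu;
	sim_backtraces_requested++;
}

/*
 * Walk the busy workers of the pool bound to @cpu. Returns true when at
 * least one of them is running. When none is, fall back to a backtrace
 * of the CPU itself, which is the branch that fired in the report above.
 */
bool sim_show_cpu_pool_hog(const struct sim_worker *workers, size_t n, int cpu)
{
	bool found_running = false;
	size_t i;

	assert(workers != NULL || n == 0);

	/* Models the hash_for_each() walk over the pool's busy workers. */
	for (i = 0; i < n; i++) {
		if (workers[i].cpu == cpu && workers[i].running)
			found_running = true;
	}

	if (!found_running)
		sim_trigger_single_cpu_backtrace(cpu);

	return found_running;
}
```

Calling sim_show_cpu_pool_hog() with an array where every entry has running == false increments sim_backtraces_requested once, while a single running entry on the same CPU suppresses the fallback.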