From: Breno Leitao <leitao@debian.org>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	 Omar Sandoval <osandov@osandov.com>, Song Liu <song@kernel.org>,
	 Danielle Costantino <dcostantino@meta.com>,
	kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
Date: Thu, 11 Jun 2026 07:50:04 -0700
Message-ID: <airKG4Hl8R-7sY_x@gmail.com>
In-Reply-To: <ab0kDS01bh5cK4KG@gmail.com>

On Fri, Mar 20, 2026 at 03:41:13AM -0700, Breno Leitao wrote:
> On Wed, Mar 18, 2026 at 04:11:54PM +0100, Petr Mladek wrote:
> > On Wed 2026-03-18 04:31:08, Breno Leitao wrote:
> > Otherwise, I like this patch.
> > 
> > I still think what might be the reason that there is no worker
> > in the running state. Let's see if this patch brings some useful info.
> > 
> > One more idea. It might be useful to store a timestamp when the last
> > worker was woken. And then print either the timestamp or delta.
> > It would help to make sure that kick_pool() was really called
> > during the reported stall.
> 
> Ack, this is the following patch I will deploy in production, let's see
> how useful it is.

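I got this running in production (backported to 6.16), and we finally
caught the culprit: CPU 2 is sitting inside an EFI runtime service call
(efi_call_rts on efi_rts_wq) each time the lockup is reported.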

	05:42:00  BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 115s!
		NMI backtrace for cpu 2
		CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G           O        6.16.1-0_fbk4_0_gb849430a436c #1 NONE
		Tainted: [O]=OOT_MODULE
		Hardware name: <foo>
		Workqueue: efi_rts_wq efi_call_rts
		pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
		pc : 0x4052f10900
		lr : 0x4052f10e94
		sp : ffff800088cefc90
		x29: ffff800088cefc90 x28: 0000000048524641 x27: 0000004052b60000
		x26: 0000000000010058 x25: 0000004043ba0000 x24: 0000000001280000
		x23: 000000405a02807f x22: 0000000000010080 x21: 0000004053ac0097
		x20: 000000405a028080 x19: 0000004053ac0098 x18: 0000000000000000
		x17: 0000000000000030 x16: 0000004052eb6de0 x15: 0000004042ba0030
		x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
		x11: 0000000001d00d09 x10: 0000004042ba0028 x9 : ffff800088cefc90
		x8 : 0000000001d00cd9 x7 : 0000000000000000 x6 : 0000004043ba0000
		x5 : 0000004043bb0000 x4 : 0000004053ac0098 x3 : 000000405a028080
		x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffffffffffe1e8
		Call trace:
		0x4052f10900 (P)
		0x4052f10e94
		0x4052b00ed0
		0x4052b02e38
		0x4052b0175c
		0x4052b517b4
		0x4052a70b84
		0x4052cb11d4
		__efi_rt_asm_wrapper+0x50/0x78
		efi_call_rts+0x178/0x240
		process_scheduled_works+0x17c/0x420
		worker_thread+0x184/0x4d8
		kthread+0xcc/0x1f8
		ret_from_fork+0x10/0x20
	05:42:30  BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 145s!
		NMI backtrace for cpu 2
		CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G           O        6.16.1-0_fbk4_0_gb849430a436c #1 NONE
		Tainted: [O]=OOT_MODULE
		Hardware name: <foo>
		Workqueue: efi_rts_wq efi_call_rts
		pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
		pc : 0x4052f11ecc
		lr : 0x4052f10b8c
		sp : ffff800088cefc30
		x29: ffff800088cefc40 x28: 0000000048524641 x27: 0000004052b60000
		x26: 0000000000010058 x25: 0000004043fb0000 x24: 0000000001690000
		x23: 0000004053ab0040 x22: 0000000000010080 x21: ffff800088cefd00

	rinse and repeat..

Unfortunately I didn't get the output of the other pr_info() calls because
of the console settings, but from this report and the previous code I can
say the following:

1) In show_cpu_pool_hog(), the found_running variable stayed false.
2) hash_for_each() never found any running task.
3) The following code was triggered and was very helpful:

	if (!found_running)
		trigger_single_cpu_backtrace(cpu);
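
For readers following along, the three points above can be sketched as a
small userspace model. This is only an illustration, not the kernel code:
struct fake_worker, its on_cpu flag, fake_trigger_backtrace() and
report_pool_hog() are hypothetical stand-ins I made up for the busy-worker
hash walk and for trigger_single_cpu_backtrace(), and a plain array replaces
the hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a busy worker; not the kernel's struct worker. */
struct fake_worker {
	int pid;
	bool on_cpu;	/* assumption: models "the worker task is running" */
};

/* Counts how many times the fallback backtrace was requested. */
static int backtraces_requested;

/* Stand-in for trigger_single_cpu_backtrace(); it only records the request. */
static void fake_trigger_backtrace(int cpu)
{
	backtraces_requested++;
	printf("would request an NMI backtrace for cpu %d\n", cpu);
}

/*
 * Walk the busy workers of one pool. If none of them is running,
 * fall back to a backtrace of the CPU itself, since something other
 * than this pool's workers must be occupying it.
 * Returns true when the fallback was taken.
 */
static bool report_pool_hog(const struct fake_worker *busy, size_t n, int cpu)
{
	bool found_running = false;

	assert(busy != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (busy[i].on_cpu) {
			found_running = true;
			printf("worker pid %d is running on cpu %d\n",
			       busy[i].pid, cpu);
		}
	}

	if (!found_running)
		fake_trigger_backtrace(cpu);

	return !found_running;
}
```

With a list where no worker has on_cpu set, report_pool_hog() takes the
fallback path, which matches what happened here: no running worker was
found in the pool, so the CPU backtrace was the only thing that showed
where CPU 2 was spending its time.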
