From: Petr Mladek <pmladek@suse.com>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
Song Liu <song@kernel.org>,
Danielle Costantino <dcostantino@meta.com>,
kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
Date: Tue, 16 Jun 2026 14:38:11 +0200
Message-ID: <ajFDs6AIf7YFgRx4@pathway.suse.cz>
In-Reply-To: <airKG4Hl8R-7sY_x@gmail.com>

On Thu 2026-06-11 07:50:04, Breno Leitao wrote:
> On Fri, Mar 20, 2026 at 03:41:13AM -0700, Breno Leitao wrote:
> > On Wed, Mar 18, 2026 at 04:11:54PM +0100, Petr Mladek wrote:
> > > On Wed 2026-03-18 04:31:08, Breno Leitao wrote:
> > > Otherwise, I like this patch.
> > >
> > > I still wonder what might be the reason that there is no worker
> > > in the running state. Let's see if this patch brings some useful info.
> > >
> > > One more idea. It might be useful to store a timestamp when the last
> > > worker was woken. And then print either the timestamp or delta.
> > > It would help to make sure that kick_pool() was really called
> > > during the reported stall.
> >
> > Ack, this is the following patch I will deploy in production. Let's see
> > how useful it is.
>
> I got this running in production (backported to 6.16), and we finally found the culprit.
>
> 05:42:00 BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 115s!
> NMI backtrace for cpu 2
> CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G O 6.16.1-0_fbk4_0_gb849430a436c #1 NONE
> Tainted: [O]=OOT_MODULE
> Hardware name: <foo>
> Workqueue: efi_rts_wq efi_call_rts
> pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
> pc : 0x4052f10900
> lr : 0x4052f10e94
> sp : ffff800088cefc90
> x29: ffff800088cefc90 x28: 0000000048524641 x27: 0000004052b60000
> x26: 0000000000010058 x25: 0000004043ba0000 x24: 0000000001280000
> x23: 000000405a02807f x22: 0000000000010080 x21: 0000004053ac0097
> x20: 000000405a028080 x19: 0000004053ac0098 x18: 0000000000000000
> x17: 0000000000000030 x16: 0000004052eb6de0 x15: 0000004042ba0030
> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
> x11: 0000000001d00d09 x10: 0000004042ba0028 x9 : ffff800088cefc90
> x8 : 0000000001d00cd9 x7 : 0000000000000000 x6 : 0000004043ba0000
> x5 : 0000004043bb0000 x4 : 0000004053ac0098 x3 : 000000405a028080
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffffffffffe1e8
> Call trace:
> 0x4052f10900 (P)
> 0x4052f10e94
> 0x4052b00ed0
> 0x4052b02e38
> 0x4052b0175c
> 0x4052b517b4
> 0x4052a70b84
> 0x4052cb11d4
> __efi_rt_asm_wrapper+0x50/0x78
> efi_call_rts+0x178/0x240
> process_scheduled_works+0x17c/0x420
> worker_thread+0x184/0x4d8
> kthread+0xcc/0x1f8
> ret_from_fork+0x10/0x20
> 05:42:30 BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 145s!
> NMI backtrace for cpu 2
> CPU: 2 UID: 0 PID: 411 Comm: kworker/u288:2 Tainted: G O 6.16.1-0_fbk4_0_gb849430a436c #1 NONE
> Tainted: [O]=OOT_MODULE
> Hardware name: <foo>
> Workqueue: efi_rts_wq efi_call_rts
> pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
> pc : 0x4052f11ecc
> lr : 0x4052f10b8c
> sp : ffff800088cefc30
> x29: ffff800088cefc40 x28: 0000000048524641 x27: 0000004052b60000
> x26: 0000000000010058 x25: 0000004043fb0000 x24: 0000000001690000
> x23: 0000004053ab0040 x22: 0000000000010080 x21: ffff800088cefd00
>
> Rinse and repeat...
>
> Unfortunately, I didn't get the output of the other pr_info() calls because
> of the console settings, but I can say the following from this issue and
> the previous code:
>
> 1) In show_cpu_pool_hog(), the found_running variable stays false.
> 2) hash_for_each() never found any running task.
> 3) The following code was triggered and was very helpful:
>
> 	if (!found_running)
> 		trigger_single_cpu_backtrace(cpu);

Great. So the extra complexity was worth it. Should I clean it up and
send a proper patch, or would you like to do it?

Also, I wonder whether it would make sense to revert commit
8823eaef45da7f ("workqueue: Show all busy workers in stall
diagnostics"). If I understand correctly, printing all busy workers
was not that helpful. In particular, sleeping workers should not
prevent progress.

Best Regards,
Petr