From: Jiri Slaby <jirislaby@kernel.org>
To: Thorsten Leemhuis <regressions@leemhuis.info>,
Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
Song Liu <song@kernel.org>,
Danielle Costantino <dcostantino@meta.com>,
kasan-dev@googlegroups.com, Petr Mladek <pmladek@suse.com>,
kernel-team@meta.com,
Linux kernel regressions list <regressions@lists.linux.dev>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
Date: Wed, 13 May 2026 10:03:56 +0200
Message-ID: <f0351414-41af-4f74-899c-87fb5bb45621@kernel.org>
In-Reply-To: <7e3c68dc-5695-4978-a991-68f6e9d1c4e8@leemhuis.info>

On 13. 05. 26, 9:29, Thorsten Leemhuis wrote:
> On 5/11/26 07:21, Jiri Slaby wrote:
>> We currently have several reports of this, on s390, ppc64, and x86_64.
>
> I stumbled on this by accident and this is not my area of expertise, so
> the following might be bogus:
>
> Is this maybe the same as "Observed Workqueue lockups on offline CPUs.":
> https://lore.kernel.org/lkml/97a7d011-d573-4754-9e5d-68b562c64089@linux.ibm.com/

Thanks, looks like pretty much it. All three reports have:

  rcu: srcu_init: Setting srcu_struct sizes to big.
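
For anyone triaging a similar dmesg, the check above can be sketched as a small script. It looks for the srcu_init line and counts the "workqueue lockup" reports per pool. This is only a minimal sketch: the two message formats are copied from this thread, while the function name triage() and the keys of the returned dict are made up for illustration.

```python
import re
from collections import Counter

# Watchdog line as quoted in this thread, e.g.
# "BUG: workqueue lockup - pool cpus=2 node=0 flags=0x4 nice=0 stuck for 50s!"
LOCKUP_RE = re.compile(
    r"BUG: workqueue lockup - pool cpus=(?P<cpus>\S+) .*?stuck for (?P<secs>\d+)s!"
)

# Boot message that all three reports in this thread have in common.
SRCU_MARKER = "srcu_init: Setting srcu_struct sizes to big"


def triage(dmesg_text):
    """Summarize workqueue lockup reports found in a dmesg dump.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    per_pool = Counter()
    longest = 0
    for match in LOCKUP_RE.finditer(dmesg_text):
        per_pool[match.group("cpus")] += 1
        longest = max(longest, int(match.group("secs")))
    return {
        "srcu_big": SRCU_MARKER in dmesg_text,
        "lockups_per_pool": dict(per_pool),
        "longest_stall_s": longest,
        # 86400 seconds per day
        "longest_stall_days": round(longest / 86400, 2),
    }
```

Calling triage(open("sl.txt").read()) on a saved dmesg would give the per-pool counts. The regex uses search semantics, so a leading dmesg timestamp on each line does not matter, but a lockup message that was wrapped across two lines (as happens in quoted mail) will not match.
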
> Fix is here:
> https://lore.kernel.org/lkml/20260508174353.905746-1-paulmck@kernel.org/

I'm building a kernel with this fix and will hand it to the reporters to test.

> Ciao, Thorsten
>
>> On 07. 05. 26, 15:11, Breno Leitao wrote:
>>> Hi Jiri,
>>>
>>> On Thu, May 07, 2026 at 12:20:33PM +0200, Jiri Slaby wrote:
>>>> On 05. 03. 26, 17:15, Breno Leitao wrote:
>>>>
>>>> BUG: workqueue lockup - pool cpus=144 node=0 flags=0x4 nice=0 stuck for 168224s!
>>>
>>> That's an extremely long stall (~1.95 days).
>>>
>>>> ...
>>>> Showing busy workqueues and worker pools:
>>>> workqueue rcu_gp: flags=0x108
>>>> pwq 578: cpus=144 node=0 flags=0x4 nice=0 active=3 refcnt=4
>>>> in:
>>>> https://bugzilla.suse.com/show_bug.cgi?id=1263947
>>>> ?
>>>>
>>>> Can this (or other patch from the series) cause this? Should there be
>>>> something like cpu_online() instead of task_is_running() somewhere?
>>>
>>> This series only affects stall reporting, not detection. The changes run
>>> after the watchdog has identified a stall, so the detection logic itself
>>> remains unchanged.
>>>
>>> To help diagnose this issue, could you provide some additional
>>> information:
>>>
>>> 1) Was CPU 144 online at any point? If so, when was it taken offline?
>>
>> It was not, it's non-present.
>>
>>> 2) Does this message appear repeatedly? If you bring CPU 144 online, does
>>> the issue resolve?
>>
>> Yes, look at this new x86_64 report's dmesg (I believe it is related to
>> the above report):
>> BUG: workqueue lockup - pool cpus=2 node=0 flags=0x4 nice=0 stuck for 50s!
>> in:
>> https://bugzilla.suse.com/attachment.cgi?id=890229
>>
>> $ grep -c BUG sl.txt
>> 504
>> $ grep -c pwq sl.txt
>> 509
>>
>> It comes from:
>> https://bugzilla.suse.com/show_bug.cgi?id=1264554
>>
>>> 3) Have you run similar tests on earlier kernel versions without seeing
>>> this behavior, or is this a clear regression?
>>
>> It's new in 7.0. Going back to 6.19.12 makes it disappear.
>>
>> thanks,
>
--
js
suse labs
Thread overview: 7+ messages
[not found] <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
[not found] ` <20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
2026-05-07 10:20 ` [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics Jiri Slaby
2026-05-07 13:11 ` Breno Leitao
2026-05-11 5:21 ` Jiri Slaby
2026-05-13 7:29 ` Thorsten Leemhuis
2026-05-13 8:03 ` Jiri Slaby [this message]
2026-05-13 8:53 ` [PATCH v2 0/5] workqueue: Detect stalled in-flight workers Markus Elfring
[not found] ` <abLsAi7_fU5FrYiF@pathway.suse.cz>
[not found] ` <abP8wDhYWwk3ufmA@gmail.com>
2026-05-13 8:57 ` Hillf Danton