Linux Kernel Selftest development
* RCU stalls running KUnit on mainline
@ 2026-02-17 14:10 Mark Brown
  2026-02-18  8:53 ` David Gow
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Brown @ 2026-02-17 14:10 UTC (permalink / raw)
  To: Brendan Higgins, David Gow, Rae Moar; +Cc: linux-kselftest, kunit-dev

Hi,

When running KUnit via qemu on current mainline I'm seeing random
lockups, frequently but not always reporting an RCU stall.
Unfortunately these don't seem to happen in a consistent place, which
makes it hard to figure out exactly what's going on.  They started in
-next at some point shortly before or early in the merge window, but I've
never managed to drill down and investigate them.  I don't imagine
they're due to KUnit specifically, though it seems likely some test is
triggering them.  Has anyone else seen this, or do you have any leads?

Thanks,
Mark


* Re: RCU stalls running KUnit on mainline
  2026-02-17 14:10 RCU stalls running KUnit on mainline Mark Brown
@ 2026-02-18  8:53 ` David Gow
  2026-02-18 11:32   ` Mark Brown
  2026-02-18 19:31   ` Mark Brown
  0 siblings, 2 replies; 6+ messages in thread
From: David Gow @ 2026-02-18  8:53 UTC (permalink / raw)
  To: Mark Brown, Brendan Higgins, Rae Moar, Frederic Weisbecker
  Cc: linux-kselftest, kunit-dev

On 17/02/2026 at 10:10 PM, 'Mark Brown' via KUnit Development wrote:
> Hi,
> 
> When running KUnit via qemu on current mainline I'm seeing random
> lockups, frequently but not always reporting an RCU stall.
> Unfortunately these don't seem to happen in a consistent place, which
> makes it hard to figure out exactly what's going on.  They started in
> -next at some point shortly before or early in the merge window, but I've
> never managed to drill down and investigate them.  I don't imagine
> they're due to KUnit specifically, though it seems likely some test is
> triggering them.  Has anyone else seen this, or do you have any leads?
>

Hmm… I haven't seen this yet on x86_64, but looking at arm64 and 32-bit 
i386, I do see a sporadic panic, often with rcu in the stacktrace. Seems 
to happen more often when the KUnit test kthread is starting/stopping 
(particularly, at least on i386, if it's due to a trapped fault).

I've not been able to reproduce it after reverting the kthread affinity 
series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but 
that could just be due to luck. It's flaky enough that my attempt at 
bisection kept pointing at documentation patches.
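
For reference, the sequence was roughly as follows (just a sketch;
-m1 reverts the merge relative to its first, i.e. mainline, parent):

git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515
# ...then rerun the repro below a few times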

Frederic, any idea if the 7.0 kthread updates could be causing these? My 
most reliable repro command thus far is:
./tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1

— David


* Re: RCU stalls running KUnit on mainline
  2026-02-18  8:53 ` David Gow
@ 2026-02-18 11:32   ` Mark Brown
  2026-02-18 19:31   ` Mark Brown
  1 sibling, 0 replies; 6+ messages in thread
From: Mark Brown @ 2026-02-18 11:32 UTC (permalink / raw)
  To: David Gow
  Cc: Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-kselftest,
	kunit-dev

On Wed, Feb 18, 2026 at 04:53:16PM +0800, David Gow wrote:
> On 17/02/2026 at 10:10 PM, 'Mark Brown' via KUnit Development wrote:

> > When running KUnit via qemu on current mainline I'm seeing random
> > lockups, frequently but not always reporting an RCU stall.
> > Unfortunately these don't seem to happen in a consistent place, which
> > makes it hard to figure out exactly what's going on.  They started in

> Hmm… I haven't seen this yet on x86_64, but looking at arm64 and 32-bit
> i386, I do see a sporadic panic, often with rcu in the stacktrace. Seems to
> happen more often when the KUnit test kthread is starting/stopping
> (particularly, at least on i386, if it's due to a trapped fault).

I did see this on x86_64 FWIW.

> I've not been able to reproduce it after reverting the kthread affinity
> series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but that
> could just be due to luck. It's flaky enough that my attempt at bisection
> kept pointing at documentation patches.

Yeah, it's not reliably reproducible, though it happens pretty often.  I
did try things like retrying N times to see if it fails, but I have a
horrible feeling there's some dependency on the specific build somehow.
Hopefully not.
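
For the record, by retrying I mean something along these lines (untested
sketch; the run count is arbitrary and I'm using David's arm64 repro
command as the example):

# repeat the KUnit run until it fails or we run out of attempts
for i in $(seq 1 20); do
	echo "=== run $i ==="
	./tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 || break
done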


* Re: RCU stalls running KUnit on mainline
  2026-02-18  8:53 ` David Gow
  2026-02-18 11:32   ` Mark Brown
@ 2026-02-18 19:31   ` Mark Brown
  2026-02-20 15:30     ` Guillaume Tucker
  1 sibling, 1 reply; 6+ messages in thread
From: Mark Brown @ 2026-02-18 19:31 UTC (permalink / raw)
  To: David Gow
  Cc: Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-kselftest,
	kunit-dev

On Wed, Feb 18, 2026 at 04:53:16PM +0800, David Gow wrote:
> On 17/02/2026 at 10:10 PM, 'Mark Brown' via KUnit Development wrote:
> > When running KUnit via qemu on current mainline I'm seeing random
> > lockups, frequently but not always reporting an RCU stall.
> > Unfortunately these don't seem to happen in a consistent place, which
> > makes it hard to figure out exactly what's going on.  They started in
> > -next at some point shortly before or early in the merge window, but I've
> > never managed to drill down and investigate them.  I don't imagine
> > they're due to KUnit specifically, though it seems likely some test is
> > triggering them.  Has anyone else seen this, or do you have any leads?

> I've not been able to reproduce it after reverting the kthread affinity
> series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but that
> could just be due to luck. It's flaky enough that my attempt at bisection
> kept pointing at documentation patches.

One other data point is that there's some range of commits which
generates an actual failure in the runtime PM tests:

[19:26:28] [PASSED] pm_runtime_disabled_test
[19:26:28] Unable to handle kernel execute from non-executable memory at virtual address fff000000145f358
...
[19:26:28] Call trace:
[19:26:28]  0xfff000000145f358 (P)
[19:26:28]  rpm_callback+0x74/0x80
[19:26:28]  rpm_resume+0x3cc/0x6a0
[19:26:28]  __pm_runtime_resume+0x50/0x9c
[19:26:28]  device_release_driver_internal+0xd0/0x224
[19:26:28]  device_release_driver+0x18/0x24
[19:26:28]  bus_remove_device+0xd0/0x114
[19:26:28]  device_del+0x14c/0x408
[19:26:28]  device_unregister+0x18/0x38
[19:26:28]  device_unregister_wrapper+0x10/0x20
[19:26:28]  __kunit_action_free+0x14/0x20
...
[19:26:28] [FAILED] pm_runtime_error_test

which might upset bisections.
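
If anyone does try to bisect through that range, a wrapper along these
lines for "git bisect run" might help (untested sketch; the script name
is made up).  Exiting 125 tells bisect to skip a commit, so runs that
fail only with the known PM failure don't get marked bad:

#!/bin/sh
# kunit-bisect.sh (hypothetical name), for use as: git bisect run ./kunit-bisect.sh
log=$(mktemp)
timeout 20m ./tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 >"$log" 2>&1
status=$?
# A timeout (exit 124) stays "bad", since a hang is the bug we're chasing.
if [ "$status" -ne 0 ] && [ "$status" -ne 124 ] && grep -qF '[FAILED] pm_runtime_error_test' "$log"; then
	exit 125	# only the unrelated runtime PM failure: skip this commit
fi
exit "$status"		# 0 = good, anything else = bad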


* Re: RCU stalls running KUnit on mainline
  2026-02-18 19:31   ` Mark Brown
@ 2026-02-20 15:30     ` Guillaume Tucker
  2026-02-21 14:10       ` Mark Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Guillaume Tucker @ 2026-02-20 15:30 UTC (permalink / raw)
  To: Mark Brown, David Gow
  Cc: Brendan Higgins, Rae Moar, Frederic Weisbecker, linux-kselftest,
	kunit-dev

(Sorry, resending, this time not from gmail.)

Hello,

On 18/02/2026 20:31, 'Mark Brown' via KUnit Development wrote:
> On Wed, Feb 18, 2026 at 04:53:16PM +0800, David Gow wrote:
>> On 17/02/2026 at 10:10 PM, 'Mark Brown' via KUnit Development wrote:
>>> When running KUnit via qemu on current mainline I'm seeing random
>>> lockups, frequently but not always reporting an RCU stall.
>>> Unfortunately these don't seem to happen in a consistent place, which
>>> makes it hard to figure out exactly what's going on.  They started in
>>> -next at some point shortly before or early in the merge window, but I've
>>> never managed to drill down and investigate them.  I don't imagine
>>> they're due to KUnit specifically, though it seems likely some test is
>>> triggering them.  Has anyone else seen this, or do you have any leads?
> 
>> I've not been able to reproduce it after reverting the kthread affinity
>> series (git revert -m1 d16738a4e79e55b2c3c9ff4fb7b74a4a24723515), but that
>> could just be due to luck. It's flaky enough that my attempt at bisection
>> kept pointing at documentation patches.
> 
> One other data point is that there's some range of commits which
> generates an actual failure in the runtime PM tests:
> 
> [19:26:28] [PASSED] pm_runtime_disabled_test
> [19:26:28] Unable to handle kernel execute from non-executable memory at virtual address fff000000145f358
> ...
> [19:26:28] Call trace:
> [19:26:28]  0xfff000000145f358 (P)
> [19:26:28]  rpm_callback+0x74/0x80
> [19:26:28]  rpm_resume+0x3cc/0x6a0
> [19:26:28]  __pm_runtime_resume+0x50/0x9c
> [19:26:28]  device_release_driver_internal+0xd0/0x224
> [19:26:28]  device_release_driver+0x18/0x24
> [19:26:28]  bus_remove_device+0xd0/0x114
> [19:26:28]  device_del+0x14c/0x408
> [19:26:28]  device_unregister+0x18/0x38
> [19:26:28]  device_unregister_wrapper+0x10/0x20
> [19:26:28]  __kunit_action_free+0x14/0x20
> ...
> [19:26:28] [FAILED] pm_runtime_error_test
> 
> which might upset bisections.

Yes, although I've run some automated VIXI bisections and found one
reliable panic in rcu.  I had to do it in two steps as the first
bisection landed on a merge commit.  After a bit more investigation I
reported what I found here:

    https://lore.kernel.org/all/0150e237-41d2-40ae-a857-4f97ca664468@gtucker.io/

I can bisect the other KUnit issues separately too if that helps, now
that I have a quick workaround to avoid this panic (see that email).

Cheers,
Guillaume


* Re: RCU stalls running KUnit on mainline
  2026-02-20 15:30     ` Guillaume Tucker
@ 2026-02-21 14:10       ` Mark Brown
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Brown @ 2026-02-21 14:10 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: David Gow, Brendan Higgins, Rae Moar, Frederic Weisbecker,
	linux-kselftest, kunit-dev

On Fri, Feb 20, 2026 at 04:30:19PM +0100, Guillaume Tucker wrote:

> Yes, although I've run some automated VIXI bisections and found one
> reliable panic in rcu.  I had to do it in two steps as the first
> bisection landed on a merge commit.  After a bit more investigation I
> reported what I found here:

>     https://lore.kernel.org/all/0150e237-41d2-40ae-a857-4f97ca664468@gtucker.io/

> I can bisect the other KUnit issues separately too if that helps now
> that I have a quick workaround to avoid this panic (see email).

Thanks for tracking that down and reporting!  Hopefully it gets fixed
soon and we can turn KUnit testing back on for -next; one flaw with that
check for intermittent bugs is that it's not so easy to hold back the
tree that introduced the problem.


Thread overview: 6 messages
2026-02-17 14:10 RCU stalls running KUnit on mainline Mark Brown
2026-02-18  8:53 ` David Gow
2026-02-18 11:32   ` Mark Brown
2026-02-18 19:31   ` Mark Brown
2026-02-20 15:30     ` Guillaume Tucker
2026-02-21 14:10       ` Mark Brown
