public inbox for linux-kernel@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: Alice Ryhl <aliceryhl@google.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Danilo Krummrich <dakr@kernel.org>,
	driver-core@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>,
	Maxime Ripard <mripard@kernel.org>,
	David Gow <davidgow@google.com>, Stephen Boyd <sboyd@kernel.org>,
	Brendan Higgins <brendanhiggins@google.com>,
	Rae Moar <rmoar@google.com>,
	linux-kselftest@vger.kernel.org, kunit-dev@googlegroups.com
Subject: Re: [CRASH] kunit failures in platform-device-devm
Date: Mon,  2 Mar 2026 07:31:24 -0500
Message-ID: <20260302123125.2282292-1-sashal@kernel.org>
In-Reply-To: <aaRH-aXYKntYyjRS@google.com>

This response was AI-generated by bug-bot. The analysis may contain errors - please verify independently.

Hi Alice,

Thanks for the detailed report. Here is my analysis.

___

1. Bug Summary

The platform-device-devm kunit test suite crashes with a general
protection fault in queued_spin_lock_slowpath() during device
registration. Cascading failures follow, including sysfs duplicate
filename errors. The root issue is test isolation: earlier kunit tests
(including the intentional NULL dereference in kunit_test_null_dereference)
can leave kernel state corrupted, and the platform-device-devm tests use
raw platform device APIs without kunit-managed cleanup. As a result, a
test that crashes leaves its device registered and the following tests
fail in turn. This is a test infrastructure issue, not a driver core bug.

2. Stack Trace Analysis

First crash (Oops #3; two earlier oopses had already occurred):

  Oops: general protection fault, probably for non-canonical address 0xb4c3c33fcc9f57f6: 0000 [#3] SMP PTI
  CPU: 0 UID: 0 PID: 2500 Comm: kunit_try_catch Tainted: G      D  W        N  7.0.0-rc1-00138-g0c21570fbd3d-dirty #3 PREEMPT(lazy)
  Tainted: [D]=DIE, [W]=WARN, [N]=TEST
  RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
  RAX: b4c3c340405a5a26 RBX: ffffb222800e3ce8 RCX: 0000000000050000
  RDX: ffffa0a4fec1ddd0 RSI: 0000000000000010 RDI: ffffa0a4c2b43340
  Call Trace:
   <TASK>
   klist_iter_exit+0x2c/0x70
   ? __pfx___device_attach_driver+0x10/0x10
   bus_for_each_drv+0x12a/0x160
   __device_attach+0xbf/0x160
   device_initial_probe+0x2f/0x50
   bus_probe_device+0x8f/0x110
   device_add+0x23f/0x3d0
   platform_device_add+0x137/0x1d0
   platform_device_devm_register_unregister_test+0x6c/0x2e0
   kunit_try_run_case+0x8f/0x190
   kunit_generic_run_threadfn_adapter+0x1d/0x40
   kthread+0x142/0x160
   ret_from_fork+0xc7/0x1f0
   ret_from_fork_asm+0x1a/0x30
   </TASK>

The crash point is in queued_spin_lock_slowpath() in
kernel/locking/qspinlock.c, called from klist_iter_exit() at
lib/klist.c:311. Both the faulting address (0xb4c3c33fcc9f57f6) and
RAX (0xb4c3c340405a5a26) are non-canonical, indicating corrupted klist
data. The call chain runs in process context:
platform_device_devm_register_unregister_test() calls
platform_device_add() -> device_add() -> bus_probe_device() ->
device_initial_probe() -> __device_attach() -> bus_for_each_drv()
(drivers/base/bus.c:420), which iterates the bus's klist_drivers.
klist_iter_exit() drops the iterator's reference on the current node,
which takes the klist's spinlock, and that lock acquisition is where
the corrupted memory is hit.
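Why these values point at corruption rather than at a merely wrong
pointer can be checked against the x86-64 canonical-address rule. The
following userspace sketch is illustrative only: is_canonical() is a
hypothetical helper, not a kernel API, and it assumes either 48-bit
(4-level paging) or 57-bit (5-level paging) virtual addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API). On x86-64 a virtual address is
 * canonical when bits 63..(va_bits - 1) are all 0 or all 1, i.e. the
 * upper bits sign-extend the highest implemented bit. va_bits is 48 with
 * 4-level paging and 57 with 5-level paging.
 */
static bool is_canonical(uint64_t addr, int va_bits)
{
	assert(va_bits == 48 || va_bits == 57);

	/* The top (65 - va_bits) bits must all be equal. */
	uint64_t top = addr >> (va_bits - 1);
	uint64_t all_ones = (UINT64_C(1) << (65 - va_bits)) - 1;

	return top == 0 || top == all_ones;
}
```

Both values from the Oops (0xb4c3c33fcc9f57f6 and RAX 0xb4c3c340405a5a26)
fail this check under either paging mode, while ordinary kernel pointers
from the same register dump, such as RDX (0xffffa0a4fec1ddd0), pass it.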

Second failure (duplicate sysfs entry):

  sysfs: cannot create duplicate filename '/devices/platform/test'
  Call Trace:
   <TASK>
   dump_stack_lvl+0x2d/0x70
   sysfs_create_dir_ns+0xe8/0x130
   kobject_add_internal+0x1dd/0x360
   kobject_add+0x88/0xf0
   device_add+0x171/0x3d0
   platform_device_add+0x137/0x1d0
   platform_device_devm_register_get_unregister_with_devm_test+0x6c/0x2f0
   kunit_try_run_case+0x8f/0x190
   kunit_generic_run_threadfn_adapter+0x1d/0x40
   kthread+0x142/0x160
   ret_from_fork+0xc7/0x1f0
   ret_from_fork_asm+0x1a/0x30
   </TASK>

The assertion at drivers/base/test/platform-device-test.c:97 fails
with ret == -17 (-EEXIST) because the first test crashed without
unregistering its device, leaving "/devices/platform/test" in sysfs.
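The cascade can be reproduced with a userspace toy model: a fixed-size
table stands in for the sysfs namespace, and every "test" registers the
same fixed name, as the devm tests do with PLATFORM_DEVID_NONE. All
toy_* names are hypothetical; this is a sketch of the failure pattern,
not of the driver core.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8

/* Toy stand-in for the names under /sys/devices/platform. */
static const char *registered[MAX_DEVS];

static int toy_register(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < MAX_DEVS; i++)
		if (registered[i] && strcmp(registered[i], name) == 0)
			return -EEXIST;	/* the error the test saw (ret == -17) */
	for (int i = 0; i < MAX_DEVS; i++) {
		if (!registered[i]) {
			registered[i] = name;
			return 0;
		}
	}
	return -ENOMEM;
}

static void toy_unregister(const char *name)
{
	for (int i = 0; i < MAX_DEVS; i++)
		if (registered[i] && strcmp(registered[i], name) == 0)
			registered[i] = NULL;
}

/*
 * One toy devm test using the fixed name "test". If crash_before_cleanup
 * is set, the test stops before its unregister call, as the crashed
 * test did, so the name stays registered.
 */
static int toy_devm_test(bool crash_before_cleanup)
{
	int ret = toy_register("test");

	if (ret)
		return ret;
	if (crash_before_cleanup)
		return 0;
	toy_unregister("test");
	return 0;
}
```

Once one call leaks its registration, every later call fails with
-EEXIST, regardless of whether that later test would itself have been
correct.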

3. Root Cause Analysis

This is a test isolation problem, not a driver core bug. Two issues
combine to cause the failures:

(a) Corrupted kernel state from earlier oopses. The "[#3]" in the Oops
header means this is the third kernel oops since boot. The taint flags
are consistent with that history: [D]=DIE records that the kernel has
already oopsed, and [W]=WARN records that at least one warning was
issued earlier. The kunit_test_null_dereference() function in
lib/kunit/kunit-test.c:117 intentionally dereferences NULL to test
kunit's fault handling. After multiple oopses, kernel data structures
(including the platform bus klist) may be left corrupted, which would
explain the non-canonical address (0xb4c3c33fcc9f57f6) seen during
spinlock acquisition.
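The taint letters map to fixed bits, as listed in
Documentation/admin-guide/tainted-kernels.rst. This small userspace
sketch decodes only the three letters seen in this Oops; the table and
the taint_mask() helper are illustrative, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the documented taint flags: letter, bit number, meaning. */
static const struct {
	char letter;
	int bit;
	const char *meaning;
} taint_flags[] = {
	{ 'D', 7,  "kernel died recently (OOPS or BUG)" },
	{ 'W', 9,  "a warning was previously issued" },
	{ 'N', 18, "an in-kernel test such as KUnit has run" },
};

/*
 * Turn the letters of a "Tainted:" line into the bits those flags
 * contribute to /proc/sys/kernel/tainted. 'G', spaces and letters
 * outside the table above are ignored.
 */
static unsigned long taint_mask(const char *letters)
{
	unsigned long mask = 0;

	assert(letters != NULL);
	for (const char *p = letters; *p; p++)
		for (size_t i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++)
			if (taint_flags[i].letter == *p)
				mask |= 1UL << taint_flags[i].bit;
	return mask;
}
```

For the "Tainted: G D W N" line in the trace this gives
128 + 512 + 262144 = 262784. Only D marks a prior oops; W on its own
records a warning, which need not have been fatal.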

(b) Missing test-managed cleanup. The four tests in
platform_device_devm_test_suite all use the raw kernel APIs
platform_device_alloc() and platform_device_add() directly, and all
use the same hardcoded name "test" with PLATFORM_DEVID_NONE
(drivers/base/test/platform-device-test.c:62-77). If a test crashes
before reaching platform_device_unregister(), the device remains
registered and subsequent tests cannot register a device with the
same name.

By contrast, platform_device_find_by_null_test() in the same file
already uses the kunit-managed helpers kunit_platform_device_alloc()
and kunit_platform_device_add() from lib/kunit/platform.c (added in
commit 5ac79730324c "platform: Add test managed platform_device/driver
APIs"). These automatically unregister the device when the test exits,
even if the test case crashes.
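The difference comes down to who owns the cleanup: with the
kunit-managed helpers the framework, not the test body, runs it, so it
still happens when the body dies. The userspace toy model below
illustrates only that ownership idea. All toy_* names are hypothetical,
and setjmp()/longjmp() merely stands in for the oops; KUnit itself runs
each case in a kunit_try_catch kthread (visible in the traces above)
and performs cleanup separately, rather than using longjmp().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define MAX_ACTIONS 4

typedef void (*cleanup_fn)(void *arg);

/* Toy test context: a LIFO list of deferred cleanup actions. */
struct toy_test {
	cleanup_fn fn[MAX_ACTIONS];
	void *arg[MAX_ACTIONS];
	int n;
	jmp_buf crash;		/* longjmp() target standing in for an oops */
};

static void toy_add_action(struct toy_test *t, cleanup_fn fn, void *arg)
{
	assert(t->n < MAX_ACTIONS);
	t->fn[t->n] = fn;
	t->arg[t->n] = arg;
	t->n++;
}

/* Run deferred actions in reverse order of registration. */
static void toy_run_cleanup(struct toy_test *t)
{
	while (t->n > 0) {
		t->n--;
		t->fn[t->n](t->arg[t->n]);
	}
}

static void toy_mark_unregistered(void *arg)
{
	*(bool *)arg = false;
}

/* Test body: "register" a device, defer its removal, maybe crash. */
static void toy_case(struct toy_test *t, bool *registered, bool crash)
{
	*registered = true;
	toy_add_action(t, toy_mark_unregistered, registered);
	if (crash)
		longjmp(t->crash, 1);
	/* a normal return also leaves the cleanup to the runner */
}

/* Runner: cleanup happens whether the body returned or "crashed". */
static bool toy_run(bool crash)
{
	/*
	 * Static storage: automatic non-volatile objects modified between
	 * setjmp() and longjmp() have indeterminate values afterwards.
	 */
	static struct toy_test t;
	static bool registered;

	t.n = 0;
	registered = false;
	if (setjmp(t.crash) == 0)
		toy_case(&t, &registered, crash);
	toy_run_cleanup(&t);
	return registered;	/* false: the device did not leak */
}
```

In both the normal and the "crashed" run the device ends up
unregistered, which is the property the raw platform_device_alloc()/
platform_device_add() pattern lacks.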

4. Affected Versions

The platform-device-devm tests were introduced in commit b4cc44301b9d
("drivers: base: Add basic devm tests for platform devices") by
Maxime Ripard, which predates the kunit-managed platform device helpers
from commit 5ac79730324c. All kernel versions containing these tests
are affected by this test isolation issue. This is not a regression in
the driver core itself.

5. Relevant Commits and Fixes

- b4cc44301b9d ("drivers: base: Add basic devm tests for platform devices")
  Introduced the test suite with raw platform device APIs.

- 699fb50d99039 ("drivers: base: Free devm resources when unregistering a device")
  Fixed devm resource release for unprobed devices; updated test expectations.

- 5ac79730324c ("platform: Add test managed platform_device/driver APIs")
  Added kunit_platform_device_alloc()/kunit_platform_device_add()
  helpers that automatically clean up on test exit.

- 86a5f32ed8813 ("drivers: base: test: Add ...find_device_by...(... NULL) tests")
  Added platform_device_find_by_null_test which already uses the
  kunit-managed helpers correctly.

No fix for this specific test isolation issue exists in mainline yet.

6. Suggested Actions

The devm tests should be updated to use the kunit-managed platform
device helpers (kunit_platform_device_alloc/kunit_platform_device_add)
from include/kunit/platform_device.h, similar to how
platform_device_find_by_null_test() already does. This would ensure
proper cleanup even when tests crash.

One subtlety: the devm tests specifically check that
platform_device_unregister() releases devm resources, so
kunit_platform_device_add(), which unregisters the device automatically
at test exit, needs to be used carefully. The kunit cleanup should be
removed or disabled before the explicit unregister call, to avoid
unregistering the device twice. Alternatively, each test could register
its device with PLATFORM_DEVID_AUTO, which gives every device a unique
name (of the form "test.N.auto"), to at least prevent the cascading
sysfs duplicate filename errors.

For the probed tests (probed_platform_device_devm_register_unregister_test
and probed_platform_device_devm_register_get_unregister_with_devm_test),
there is a similar need to register/unregister the fake_driver with
kunit-managed helpers like kunit_platform_driver_register().

In the short term, you can work around this by running the
platform-device-devm suite in isolation:

  ./tools/testing/kunit/kunit.py run --make_options LLVM=1 \
    --arch x86_64 --kconfig_add CONFIG_RUST=y \
    --kconfig_add CONFIG_PCI=y platform-device-devm

This avoids the corrupted state from earlier intentional-crash tests.

