Hello Danilo, [expanded Cc: a bit for the affected driver] On Wed, Apr 15, 2026 at 01:04:47AM +0200, Danilo Krummrich wrote: > On Tue Apr 14, 2026 at 8:39 PM CEST, Uwe Kleine-König wrote: > > does that mean that there is a driver involved that somehow violates driver > > core assumptions and should be fixed even without the consistent locking? > > Most likely. There are two known cases where interactions with this commit are > expected. > > (1) One of the drivers probed on your machine gets stuck within probe() (or > any other place where the device lock is held, e.g. bus callbacks) for > some reason, e.g. due to a deadlock. In this case this commit would > potentially cause other tasks to get stuck in driver_attach() when they > attempt to register a driver for the same bus the bad one sits on. > > This is also the main reason why we eventually reverted this commit, i.e. > despite not being the root cause of an issue, it makes an already bad > situation worse. > > (2) If there is a driver probed on your machine that registers another driver > from within its probe() function for the same bus it results in a > deadlock. Note that this is transitive -- if a driver is probed on bus A, > which e.g. deploys devices on bus B that are subsequently probed, and then > in one of the probe() calls on bus B a driver is registered for bus A, > that is a deadlock as well. > > For instance, this could happen when a platform driver that runs a PCIe > root complex deploys the corresponding PCI devices and one of the > corresponding PCI drivers registers a platform driver from probe(). > > Anyways, for the underlying problem this reveals, the exact constellation > doesn't matter. The anti-pattern it reveals is that drivers shouldn't be > registered from another driver's probe() function in the first place. > > I fixed a few drivers having this anti-pattern and all of them had other > (lifetime) issues due to this and I think there are other potential > deadlock scenarios as well. > > > Hints about how to approach the issue (if there is any) welcome. > > For (1) I think it's obvious, and I think it wouldn't have gone unnoticed if any > of the drivers were bad to the point that they're getting stuck in probe() or > any other place where the device lock is held. > > As for (2) I think the best way to catch it is lockdep. Unfortunately, lockdep > won't be very helpful without some additional tricks, since the driver core > calls lockdep_set_novalidate_class() for the device lock to avoid false > positives. Thanks for the patch, indeed it creates a lockdep splat on my machine: [ 2.151192] optee: probing for conduit method. [ 2.195336] optee: revision 4.9 [ 2.203597] optee: Asynchronous notifications enabled [ 2.203937] optee: dynamic shared memory is enabled [ 2.218444] [ 2.218466] ============================================ [ 2.218474] WARNING: possible recursive locking detected [ 2.218484] 7.0.0-dirty #32 Tainted: G W [ 2.218496] -------------------------------------------- [ 2.218500] swapper/0/1 is trying to acquire lock: [ 2.218510] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x0/0x270 [ 2.218565] [ 2.218565] but task is already holding lock: [ 2.218570] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270 [ 2.218601] [ 2.218601] other info that might help us debug this: [ 2.218607] Possible unsafe locking scenario: [ 2.218607] [ 2.218611] CPU0 [ 2.218614] ---- [ 2.218617] lock(dev->mutex#3); [ 2.218631] lock(dev->mutex#3); [ 2.218643] [ 2.218643] *** DEADLOCK *** [ 2.218643] [ 2.218647] May be due to missing lock nesting notation [ 2.218647] [ 2.218651] 2 locks held by swapper/0/1: [ 2.218659] #0: c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270 [ 2.218693] #1: c2690cb4 (dev->mutex#59){+.+.}-{4:4}, at: __device_attach+0x34/0x200 [ 2.218728] [ 2.218728] stack backtrace: [ 2.218738] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 7.0.0-dirty #32 PREEMPT [ 2.218757] Tainted: [W]=WARN [ 2.218762] Hardware name: STM32 (Device Tree Support) [ 2.218770] Call trace: [ 2.218780] unwind_backtrace from show_stack+0x18/0x1c [ 2.218814] show_stack from dump_stack_lvl+0x68/0x88 [ 2.218843] dump_stack_lvl from print_deadlock_bug+0x370/0x380 [ 2.218871] print_deadlock_bug from __lock_acquire+0x1498/0x1f38 [ 2.218895] __lock_acquire from lock_acquire+0x138/0x40c [ 2.218918] lock_acquire from __driver_attach+0x40/0x270 [ 2.218939] __driver_attach from bus_for_each_dev+0x78/0xc8 [ 2.218966] bus_for_each_dev from bus_add_driver+0xe8/0x238 [ 2.218996] bus_add_driver from driver_register+0x8c/0x140 [ 2.219022] driver_register from scmi_optee_service_probe+0x150/0x1f0 [ 2.219053] scmi_optee_service_probe from really_probe+0xe8/0x424 [ 2.219079] really_probe from __driver_probe_device+0xa4/0x1fc [ 2.219097] __driver_probe_device from driver_probe_device+0x3c/0xd8 [ 2.219117] driver_probe_device from __device_attach_driver+0xbc/0x174 [ 2.219136] __device_attach_driver from bus_for_each_drv+0x8c/0xe0 [ 2.219160] bus_for_each_drv from __device_attach+0xb0/0x200 [ 2.219184] __device_attach from device_initial_probe+0x50/0x6c [ 2.219203] device_initial_probe from bus_probe_device+0x2c/0x84 [ 2.219228] bus_probe_device from device_add+0x618/0x87c [ 2.219257] device_add from optee_enumerate_devices+0x210/0x2cc [ 2.219286] optee_enumerate_devices from optee_probe+0x8a0/0xa14 [ 2.219311] optee_probe from platform_probe+0x64/0x98 [ 2.219335] platform_probe from really_probe+0xe8/0x424 [ 2.219355] really_probe from __driver_probe_device+0xa4/0x1fc [ 2.219374] __driver_probe_device from driver_probe_device+0x3c/0xd8 [ 2.219393] driver_probe_device from __driver_attach+0x13c/0x270 [ 2.219412] __driver_attach from bus_for_each_dev+0x78/0xc8 [ 2.219436] bus_for_each_dev from bus_add_driver+0xe8/0x238 [ 2.219465] bus_add_driver from driver_register+0x8c/0x140 [ 2.219490] driver_register from optee_core_init+0x18/0x3c [ 2.219519] optee_core_init from do_one_initcall+0x74/0x424 [ 2.219548] do_one_initcall from kernel_init_freeable+0x2a8/0x328 [ 2.219574] kernel_init_freeable from kernel_init+0x1c/0x138 [ 2.219599] kernel_init from ret_from_fork+0x14/0x28 [ 2.219620] Exception stack(0xdd811fb0 to 0xdd811ff8) [ 2.219634] 1fa0: 00000000 00000000 00000000 00000000 [ 2.219648] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 2.219659] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 2.221255] arm-scmi arm-scmi.0.auto: Using scmi_optee_transport [ 2.221289] arm-scmi arm-scmi.0.auto: SCMI max-rx-timeout: 30ms / max-msg-size: 104bytes / max-msg: 20 The anti-pattern here is that scmi_optee_service_probe() calls platform_driver_register(&scmi_optee_driver), see drivers/firmware/arm_scmi/transports/optee.c. Best regards Uwe