All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Uwe Kleine-König" <u.kleine-koenig@baylibre.com>
To: Danilo Krummrich <dakr@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	 Saravana Kannan <saravanak@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 driver-core@lists.linux.dev, rust-for-linux@vger.kernel.org,
	 "linux-kernel@vger.kernel.org Sudeep Holla"
	<sudeep.holla@kernel.org>,
	Cristian Marussi <cristian.marussi@arm.com>,
	 arm-scmi@vger.kernel.org
Subject: Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1]
Date: Wed, 15 Apr 2026 11:48:15 +0200	[thread overview]
Message-ID: <ad9cglZCwtsVsGmq@monoceros> (raw)
In-Reply-To: <DHT95ARBC9W8.37Z54YM70UCT2@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 7411 bytes --]

Hello Danilo,

[expanded Cc: a bit for the affected driver]

On Wed, Apr 15, 2026 at 01:04:47AM +0200, Danilo Krummrich wrote:
> On Tue Apr 14, 2026 at 8:39 PM CEST, Uwe Kleine-König wrote:
> > does that mean that there is a driver involved that somehow violates driver
> > core assumptions and should be fixed even without the consistent locking?
> 
> Most likely. There are two known cases where interactions with this commit are
> expected.
> 
>   (1) One of the drivers probed on your machine gets stuck within probe() (or
>       any other place where the device lock is held, e.g. bus callbacks) for
>       some reason, e.g. due to a deadlock.  In this case this commit would
>       potentially cause other tasks to get stuck in driver_attach() when they
>       attempt to register a driver for the same bus the bad one sits on.
> 
>       This is also the main reason why we eventually reverted this commit, i.e.
>       despite not being the root cause of an issue, it makes an already bad
>       situation worse.
> 
>   (2) If there is a driver probed on your machine that registers another driver
>       from within its probe() function for the same bus it results in a
>       deadlock.  Note that this is transitive -- if a driver is probed on bus A,
>       which e.g. deploys devices on bus B that are subsequently probed, and then
>       in one of the probe() calls on bus B a driver is registered for bus A,
>       that is a deadlock as well.
> 
>       For instance, this could happen when a platform driver that runs a PCIe
>       root complex deploys the corresponding PCI devices and one of the
>       corresponding PCI drivers registers a platform driver from probe().
> 
>       Anyways, for the underlying problem this reveals, the exact constellation
>       doesn't matter.  The anti-pattern it reveals is that drivers shouldn't be
>       registered from another driver's probe() function in the first place.
> 
>       I fixed a few drivers having this anti-pattern and all of them had other
>       (lifetime) issues due to this and I think there are other potential
>       deadlock scenarios as well.
> 
> > Hints about how to approach the issue (if there is any) welcome.
> 
> For (1) I think it's obvious, and I think it wouldn't have gone unnoticed if any
> of the drivers were bad to the point that they're getting stuck in probe() or
> any other place where the device lock is held.
> 
> As for (2) I think the best way to catch it is lockdep. Unfortunately, lockdep
> won't be very helpful without some additional tricks, since the driver core
> calls lockdep_set_novalidate_class() for the device lock to avoid false
> positives.

Thanks for the patch, indeed it creates a lockdep splat on my machine:

[    2.151192] optee: probing for conduit method.
[    2.195336] optee: revision 4.9
[    2.203597] optee: Asynchronous notifications enabled
[    2.203937] optee: dynamic shared memory is enabled
[    2.218444]
[    2.218466] ============================================
[    2.218474] WARNING: possible recursive locking detected
[    2.218484] 7.0.0-dirty #32 Tainted: G        W
[    2.218496] --------------------------------------------
[    2.218500] swapper/0/1 is trying to acquire lock:
[    2.218510] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x0/0x270
[    2.218565]
[    2.218565] but task is already holding lock:
[    2.218570] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270
[    2.218601]
[    2.218601] other info that might help us debug this:
[    2.218607]  Possible unsafe locking scenario:
[    2.218607]
[    2.218611]        CPU0
[    2.218614]        ----
[    2.218617]   lock(dev->mutex#3);
[    2.218631]   lock(dev->mutex#3);
[    2.218643]
[    2.218643]  *** DEADLOCK ***
[    2.218643]
[    2.218647]  May be due to missing lock nesting notation
[    2.218647]
[    2.218651] 2 locks held by swapper/0/1:
[    2.218659]  #0: c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270
[    2.218693]  #1: c2690cb4 (dev->mutex#59){+.+.}-{4:4}, at: __device_attach+0x34/0x200
[    2.218728]
[    2.218728] stack backtrace:
[    2.218738] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W           7.0.0-dirty #32 PREEMPT
[    2.218757] Tainted: [W]=WARN
[    2.218762] Hardware name: STM32 (Device Tree Support)
[    2.218770] Call trace:
[    2.218780]  unwind_backtrace from show_stack+0x18/0x1c
[    2.218814]  show_stack from dump_stack_lvl+0x68/0x88
[    2.218843]  dump_stack_lvl from print_deadlock_bug+0x370/0x380
[    2.218871]  print_deadlock_bug from __lock_acquire+0x1498/0x1f38
[    2.218895]  __lock_acquire from lock_acquire+0x138/0x40c
[    2.218918]  lock_acquire from __driver_attach+0x40/0x270
[    2.218939]  __driver_attach from bus_for_each_dev+0x78/0xc8
[    2.218966]  bus_for_each_dev from bus_add_driver+0xe8/0x238
[    2.218996]  bus_add_driver from driver_register+0x8c/0x140
[    2.219022]  driver_register from scmi_optee_service_probe+0x150/0x1f0
[    2.219053]  scmi_optee_service_probe from really_probe+0xe8/0x424
[    2.219079]  really_probe from __driver_probe_device+0xa4/0x1fc
[    2.219097]  __driver_probe_device from driver_probe_device+0x3c/0xd8
[    2.219117]  driver_probe_device from __device_attach_driver+0xbc/0x174
[    2.219136]  __device_attach_driver from bus_for_each_drv+0x8c/0xe0
[    2.219160]  bus_for_each_drv from __device_attach+0xb0/0x200
[    2.219184]  __device_attach from device_initial_probe+0x50/0x6c
[    2.219203]  device_initial_probe from bus_probe_device+0x2c/0x84
[    2.219228]  bus_probe_device from device_add+0x618/0x87c
[    2.219257]  device_add from optee_enumerate_devices+0x210/0x2cc
[    2.219286]  optee_enumerate_devices from optee_probe+0x8a0/0xa14
[    2.219311]  optee_probe from platform_probe+0x64/0x98
[    2.219335]  platform_probe from really_probe+0xe8/0x424
[    2.219355]  really_probe from __driver_probe_device+0xa4/0x1fc
[    2.219374]  __driver_probe_device from driver_probe_device+0x3c/0xd8
[    2.219393]  driver_probe_device from __driver_attach+0x13c/0x270
[    2.219412]  __driver_attach from bus_for_each_dev+0x78/0xc8
[    2.219436]  bus_for_each_dev from bus_add_driver+0xe8/0x238
[    2.219465]  bus_add_driver from driver_register+0x8c/0x140
[    2.219490]  driver_register from optee_core_init+0x18/0x3c
[    2.219519]  optee_core_init from do_one_initcall+0x74/0x424
[    2.219548]  do_one_initcall from kernel_init_freeable+0x2a8/0x328
[    2.219574]  kernel_init_freeable from kernel_init+0x1c/0x138
[    2.219599]  kernel_init from ret_from_fork+0x14/0x28
[    2.219620] Exception stack(0xdd811fb0 to 0xdd811ff8)
[    2.219634] 1fa0:                                     00000000 00000000 00000000 00000000
[    2.219648] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.219659] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    2.221255] arm-scmi arm-scmi.0.auto: Using scmi_optee_transport
[    2.221289] arm-scmi arm-scmi.0.auto: SCMI max-rx-timeout: 30ms / max-msg-size: 104bytes / max-msg: 20

The anti-pattern here is that scmi_optee_service_probe() calls
platform_driver_register(&scmi_optee_driver), see
drivers/firmware/arm_scmi/transports/optee.c.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2026-04-15  9:48 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-11 23:04 [GIT PULL] Driver core changes for 7.0-rc1 Danilo Krummrich
2026-02-12  3:58 ` pr-tracker-bot
2026-03-01  7:44 ` Linus Torvalds
2026-03-01  7:56   ` Linus Torvalds
2026-03-01 13:01   ` Gary Guo
2026-03-01 13:04   ` Danilo Krummrich
2026-03-01 18:17     ` Linus Torvalds
2026-03-01 20:21       ` Danilo Krummrich
2026-03-01 21:01         ` Linus Torvalds
2026-03-02 19:19           ` Danilo Krummrich
2026-03-01 18:20     ` Danilo Krummrich
2026-04-14 18:39   ` Uwe Kleine-König
2026-04-14 23:04     ` Danilo Krummrich
2026-04-15  9:48       ` Uwe Kleine-König [this message]
2026-04-15 14:16         ` Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1] Cristian Marussi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad9cglZCwtsVsGmq@monoceros \
    --to=u.kleine-koenig@baylibre.com \
    --cc=akpm@linux-foundation.org \
    --cc=arm-scmi@vger.kernel.org \
    --cc=cristian.marussi@arm.com \
    --cc=dakr@kernel.org \
    --cc=driver-core@lists.linux.dev \
    --cc=gregkh@linuxfoundation.org \
    --cc=rafael@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=saravanak@kernel.org \
    --cc=sudeep.holla@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.