public inbox for driver-core@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH] Revert "driver core: enforce device_lock for driver_match_device()"
@ 2026-03-02  0:25 Danilo Krummrich
  2026-03-02  0:32 ` Danilo Krummrich
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Danilo Krummrich @ 2026-03-02  0:25 UTC (permalink / raw)
  To: gregkh, rafael, hanguidong02
  Cc: driver-core, linux-kernel, Danilo Krummrich, Linus Torvalds

This reverts commit dc23806a7c47 ("driver core: enforce device_lock for
driver_match_device()") and commit 289b14592cef ("driver core: fix
inverted "locked" suffix of driver_match_device()").

While technically correct, there is a major downside to this approach:

When a device is already present in the system and a driver is
registered on the same bus, we iterate over all devices registered on
this bus to see if one of them matches. If we come across an already
bound one where the corresponding driver crashed while holding the
device lock (e.g. in probe()) we can't make any progress anymore.

However, drivers are typically the least tested code in the kernel and
hence it is a case that is likely to happen regularly. Besides hurting
developer ergonomics, it potentially decreases chances of shutting
things down cleanly and obtaining logs in production environments as
well. [1]

This came up in the context of a firewire bug, which only in combination
with the reverted commit, caused the machine to hang. [2]

Thus, revert commit dc23806a7c47 ("driver core: enforce device_lock for
driver_match_device()") and add a brief note clarifying that an
implementer of struct bus_type must not expect match() to be called with
the device lock held.

Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Link: https://lore.kernel.org/all/67f655bb-4d81-4609-b008-68d200255dd2@davidgow.net/ [2]
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/driver-core/CAHk-=wgJ_L1C=HjcYJotg_zrZEmiLFJaoic+PWthjuQrutrfJw@mail.gmail.com/
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
 drivers/base/base.h        | 11 +----------
 drivers/base/dd.c          |  2 +-
 include/linux/device/bus.h |  2 ++
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 79d031d2d845..1af95ac68b77 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -179,19 +179,10 @@ void device_release_driver_internal(struct device *dev, const struct device_driv
 void driver_detach(const struct device_driver *drv);
 void driver_deferred_probe_del(struct device *dev);
 void device_set_deferred_probe_reason(const struct device *dev, struct va_format *vaf);
-static inline int driver_match_device_locked(const struct device_driver *drv,
-					     struct device *dev)
-{
-	device_lock_assert(dev);
-
-	return drv->bus->match ? drv->bus->match(dev, drv) : 1;
-}
-
 static inline int driver_match_device(const struct device_driver *drv,
 				      struct device *dev)
 {
-	guard(device)(dev);
-	return driver_match_device_locked(drv, dev);
+	return drv->bus->match ? drv->bus->match(dev, drv) : 1;
 }
 
 static inline void dev_sync_state(struct device *dev)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 0354f209529c..bea8da5f8a3a 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -928,7 +928,7 @@ static int __device_attach_driver(struct device_driver *drv, void *_data)
 	bool async_allowed;
 	int ret;
 
-	ret = driver_match_device_locked(drv, dev);
+	ret = driver_match_device(drv, dev);
 	if (ret == 0) {
 		/* no match */
 		return 0;
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index 99c3c83ea520..63de5f053c33 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -35,6 +35,8 @@ struct fwnode_handle;
  *		otherwise. It may also return error code if determining that
  *		the driver supports the device is not possible. In case of
  *		-EPROBE_DEFER it will queue the device for deferred probing.
+ *		Note: This callback may be invoked with or without the device
+ *		lock held.
  * @uevent:	Called when a device is added, removed, or a few other things
  *		that generate uevents to add the environment variables.
  * @probe:	Called when a new device or driver add to this bus, and callback

base-commit: 78437ab3b769f80526416570f60173c89858dd84
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Revert "driver core: enforce device_lock for driver_match_device()"
  2026-03-02  0:25 [PATCH] Revert "driver core: enforce device_lock for driver_match_device()" Danilo Krummrich
@ 2026-03-02  0:32 ` Danilo Krummrich
  2026-03-02  1:55 ` Gui-Dong Han
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Danilo Krummrich @ 2026-03-02  0:32 UTC (permalink / raw)
  To: gregkh, rafael, hanguidong02
  Cc: driver-core, linux-kernel, Danilo Krummrich, Linus Torvalds

On Mon Mar 2, 2026 at 1:25 AM CET, Danilo Krummrich wrote:
> This reverts commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and commit 289b14592cef ("driver core: fix
> inverted "locked" suffix of driver_match_device()").
>
> While technically correct, there is a major downside to this approach:
>
> When a device is already present in the system and a driver is
> registered on the same bus, we iterate over all devices registered on
> this bus to see if one of them matches. If we come across an already
> bound one where the corresponding driver crashed while holding the
> device lock (e.g. in probe()) we can't make any progress anymore.
>
> However, drivers are typically the least tested code in the kernel and
> hence it is a case that is likely to happen regularly. Besides hurting
> developer ergonomics, it potentially decreases chances of shutting
> things down cleanly and obtaining logs in production environments as
> well. [1]
>
> This came up in the context of a firewire bug, which only in combination
> with the reverted commit, caused the machine to hang. [2]
>
> Thus, revert commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and add a brief note clarifying that an
> implementer of struct bus_type must not expect match() to be called with
> the device lock held.
>
> Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
> Link: https://lore.kernel.org/all/67f655bb-4d81-4609-b008-68d200255dd2@davidgow.net/ [2]
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Closes: https://lore.kernel.org/driver-core/CAHk-=wgJ_L1C=HjcYJotg_zrZEmiLFJaoic+PWthjuQrutrfJw@mail.gmail.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Follow-up regarding driver_override: https://lore.kernel.org/driver-core/20260302002729.19438-1-dakr@kernel.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Revert "driver core: enforce device_lock for driver_match_device()"
  2026-03-02  0:25 [PATCH] Revert "driver core: enforce device_lock for driver_match_device()" Danilo Krummrich
  2026-03-02  0:32 ` Danilo Krummrich
@ 2026-03-02  1:55 ` Gui-Dong Han
  2026-03-03  1:49 ` Greg KH
  2026-03-03 12:18 ` Danilo Krummrich
  3 siblings, 0 replies; 5+ messages in thread
From: Gui-Dong Han @ 2026-03-02  1:55 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, driver-core, linux-kernel, Linus Torvalds

On Mon, Mar 2, 2026 at 8:25 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> This reverts commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and commit 289b14592cef ("driver core: fix
> inverted "locked" suffix of driver_match_device()").
>
> While technically correct, there is a major downside to this approach:
>
> When a device is already present in the system and a driver is
> registered on the same bus, we iterate over all devices registered on
> this bus to see if one of them matches. If we come across an already
> bound one where the corresponding driver crashed while holding the
> device lock (e.g. in probe()) we can't make any progress anymore.
>
> However, drivers are typically the least tested code in the kernel and
> hence it is a case that is likely to happen regularly. Besides hurting
> developer ergonomics, it potentially decreases chances of shutting
> things down cleanly and obtaining logs in production environments as
> well. [1]
>
> This came up in the context of a firewire bug, which only in combination
> with the reverted commit, caused the machine to hang. [2]
>
> Thus, revert commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and add a brief note clarifying that an
> implementer of struct bus_type must not expect match() to be called with
> the device lock held.
>
> Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
> Link: https://lore.kernel.org/all/67f655bb-4d81-4609-b008-68d200255dd2@davidgow.net/ [2]
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Closes: https://lore.kernel.org/driver-core/CAHk-=wgJ_L1C=HjcYJotg_zrZEmiLFJaoic+PWthjuQrutrfJw@mail.gmail.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

This aligns with our previous analysis:
https://lore.kernel.org/lkml/CALbr=LZ4v7N=tO1vgOsyj9AS+XuNbn6kG-QcF+PacdMjSo0iyw@mail.gmail.com/

Enforcing the device_lock here amplifies the impact of an oops during
probe, so moving to a separate lock for driver_override is definitely
a better approach.

The revert looks good to me.

As a minor note, it might be worth adding a tag for the initial Tegra
issue where this lock stall was first observed. No big deal if you
decide to leave it as is, though.

Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com>

> ---
>  drivers/base/base.h        | 11 +----------
>  drivers/base/dd.c          |  2 +-
>  include/linux/device/bus.h |  2 ++
>  3 files changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 79d031d2d845..1af95ac68b77 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -179,19 +179,10 @@ void device_release_driver_internal(struct device *dev, const struct device_driv
>  void driver_detach(const struct device_driver *drv);
>  void driver_deferred_probe_del(struct device *dev);
>  void device_set_deferred_probe_reason(const struct device *dev, struct va_format *vaf);
> -static inline int driver_match_device_locked(const struct device_driver *drv,
> -                                            struct device *dev)
> -{
> -       device_lock_assert(dev);
> -
> -       return drv->bus->match ? drv->bus->match(dev, drv) : 1;
> -}
> -
>  static inline int driver_match_device(const struct device_driver *drv,
>                                       struct device *dev)
>  {
> -       guard(device)(dev);
> -       return driver_match_device_locked(drv, dev);
> +       return drv->bus->match ? drv->bus->match(dev, drv) : 1;
>  }
>
>  static inline void dev_sync_state(struct device *dev)
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 0354f209529c..bea8da5f8a3a 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -928,7 +928,7 @@ static int __device_attach_driver(struct device_driver *drv, void *_data)
>         bool async_allowed;
>         int ret;
>
> -       ret = driver_match_device_locked(drv, dev);
> +       ret = driver_match_device(drv, dev);
>         if (ret == 0) {
>                 /* no match */
>                 return 0;
> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> index 99c3c83ea520..63de5f053c33 100644
> --- a/include/linux/device/bus.h
> +++ b/include/linux/device/bus.h
> @@ -35,6 +35,8 @@ struct fwnode_handle;
>   *             otherwise. It may also return error code if determining that
>   *             the driver supports the device is not possible. In case of
>   *             -EPROBE_DEFER it will queue the device for deferred probing.
> + *             Note: This callback may be invoked with or without the device
> + *             lock held.
>   * @uevent:    Called when a device is added, removed, or a few other things
>   *             that generate uevents to add the environment variables.
>   * @probe:     Called when a new device or driver add to this bus, and callback
>
> base-commit: 78437ab3b769f80526416570f60173c89858dd84
> --
> 2.53.0
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Revert "driver core: enforce device_lock for driver_match_device()"
  2026-03-02  0:25 [PATCH] Revert "driver core: enforce device_lock for driver_match_device()" Danilo Krummrich
  2026-03-02  0:32 ` Danilo Krummrich
  2026-03-02  1:55 ` Gui-Dong Han
@ 2026-03-03  1:49 ` Greg KH
  2026-03-03 12:18 ` Danilo Krummrich
  3 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2026-03-03  1:49 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: rafael, hanguidong02, driver-core, linux-kernel, Linus Torvalds

On Mon, Mar 02, 2026 at 01:25:44AM +0100, Danilo Krummrich wrote:
> This reverts commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and commit 289b14592cef ("driver core: fix
> inverted "locked" suffix of driver_match_device()").
> 
> While technically correct, there is a major downside to this approach:
> 
> When a device is already present in the system and a driver is
> registered on the same bus, we iterate over all devices registered on
> this bus to see if one of them matches. If we come across an already
> bound one where the corresponding driver crashed while holding the
> device lock (e.g. in probe()) we can't make any progress anymore.
> 
> However, drivers are typically the least tested code in the kernel and
> hence it is a case that is likely to happen regularly. Besides hurting
> developer ergonomics, it potentially decreases chances of shutting
> things down cleanly and obtaining logs in production environments as
> well. [1]
> 
> This came up in the context of a firewire bug, which only in combination
> with the reverted commit, caused the machine to hang. [2]
> 
> Thus, revert commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and add a brief note clarifying that an
> implementer of struct bus_type must not expect match() to be called with
> the device lock held.
> 
> Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
> Link: https://lore.kernel.org/all/67f655bb-4d81-4609-b008-68d200255dd2@davidgow.net/ [2]
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Closes: https://lore.kernel.org/driver-core/CAHk-=wgJ_L1C=HjcYJotg_zrZEmiLFJaoic+PWthjuQrutrfJw@mail.gmail.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> ---
>  drivers/base/base.h        | 11 +----------
>  drivers/base/dd.c          |  2 +-
>  include/linux/device/bus.h |  2 ++
>  3 files changed, 4 insertions(+), 11 deletions(-)


Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Revert "driver core: enforce device_lock for driver_match_device()"
  2026-03-02  0:25 [PATCH] Revert "driver core: enforce device_lock for driver_match_device()" Danilo Krummrich
                   ` (2 preceding siblings ...)
  2026-03-03  1:49 ` Greg KH
@ 2026-03-03 12:18 ` Danilo Krummrich
  3 siblings, 0 replies; 5+ messages in thread
From: Danilo Krummrich @ 2026-03-03 12:18 UTC (permalink / raw)
  To: gregkh, rafael, hanguidong02
  Cc: driver-core, linux-kernel, Danilo Krummrich, Linus Torvalds

On Mon Mar 2, 2026 at 1:25 AM CET, Danilo Krummrich wrote:
> This reverts commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and commit 289b14592cef ("driver core: fix
> inverted "locked" suffix of driver_match_device()").
>
> While technically correct, there is a major downside to this approach:
>
> When a device is already present in the system and a driver is
> registered on the same bus, we iterate over all devices registered on
> this bus to see if one of them matches. If we come across an already
> bound one where the corresponding driver crashed while holding the
> device lock (e.g. in probe()) we can't make any progress anymore.
>
> However, drivers are typically the least tested code in the kernel and
> hence it is a case that is likely to happen regularly. Besides hurting
> developer ergonomics, it potentially decreases chances of shutting
> things down cleanly and obtaining logs in production environments as
> well. [1]
>
> This came up in the context of a firewire bug, which only in combination
> with the reverted commit, caused the machine to hang. [2]
>
> Thus, revert commit dc23806a7c47 ("driver core: enforce device_lock for
> driver_match_device()") and add a brief note clarifying that an
> implementer of struct bus_type must not expect match() to be called with
> the device lock held.
>
> Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
> Link: https://lore.kernel.org/all/67f655bb-4d81-4609-b008-68d200255dd2@davidgow.net/ [2]
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Closes: https://lore.kernel.org/driver-core/CAHk-=wgJ_L1C=HjcYJotg_zrZEmiLFJaoic+PWthjuQrutrfJw@mail.gmail.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Applied to driver-core-linus, thanks!

    [ Add additional Link: reference. - Danilo ]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-03 12:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02  0:25 [PATCH] Revert "driver core: enforce device_lock for driver_match_device()" Danilo Krummrich
2026-03-02  0:32 ` Danilo Krummrich
2026-03-02  1:55 ` Gui-Dong Han
2026-03-03  1:49 ` Greg KH
2026-03-03 12:18 ` Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox