public inbox for rust-for-linux@vger.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] Driver core changes for 7.0-rc1
@ 2026-02-11 23:04 Danilo Krummrich
  2026-02-12  3:58 ` pr-tracker-bot
  2026-03-01  7:44 ` Linus Torvalds
  0 siblings, 2 replies; 15+ messages in thread
From: Danilo Krummrich @ 2026-02-11 23:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

Hi Linus,

Please pull these driver-core changes.

All commits have been in linux-next for a couple rounds; no conflicts expected.

In the -rc7 PR [1] I held back a driver-core fix (now included), as it
uncovered potential deadlocks that required a few driver fixes.

You should have received one of those for -rc8 through the gpio tree [2];
another one is included in this PR [3], and the latest one [4] should reach you
via a -fixes PR from the clk tree.

I spent significant effort catching those cases, but they are a bit tricky to
spot. However, all known ones have been addressed.

[1] https://lore.kernel.org/driver-core/DFWVL46MM928.V9LOBRWI8BLZ@kernel.org/
[2] https://lore.kernel.org/all/20260127201725.35883-1-dakr@kernel.org/
[3] https://lore.kernel.org/all/20260121141215.29658-1-dakr@kernel.org/
[4] https://lore.kernel.org/all/20260211142321.55404-1-dakr@kernel.org/

The following changes since commit 63804fed149a6750ffd28610c5c1c98cce6bd377:

  Linux 6.19-rc7 (2026-01-25 14:11:24 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git tags/driver-core-7.0-rc1

for you to fetch changes up to ba268514ea14b44570030e8ed2aef92a38679e85:

  rust: devres: fix race condition due to nesting (2026-02-07 01:03:49 +0100)

----------------------------------------------------------------
Driver core changes for 7.0-rc1

- Bus:
  - Ensure bus->match() is consistently called with the device lock held
  - Improve type safety of bus_find_device_by_acpi_dev()

- Devtmpfs:
  - Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
    simple_strtoul()
  - Avoid sparse warning by making devtmpfs_context_ops static

- IOMMU:
  - Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()

- MAINTAINERS:
  - Add the new driver-core mailing list (driver-core@lists.linux.dev)
    to all relevant entries
  - Add missing tree location for "FIRMWARE LOADER (request_firmware)"
  - Add driver-model documentation to the "DRIVER CORE" entry
  - Add missing driver-core maintainers to the "AUXILIARY BUS" entry

- Misc:
  - Change return type of attribute_container_register() to void; it has
    always been infallible
  - Do not export sysfs_change_owner(), sysfs_file_change_owner() and
    device_change_owner()
  - Move devres_for_each_res() from the public devres header to
    drivers/base/base.h
  - Do not use a static struct device for the faux bus; allocate it
    dynamically

- Revocable:
  - Patches for the revocable synchronization primitive have been
    scheduled for v7.0-rc1, but have been reverted as they need some
    more refinement

- Rust:
  - Device:
    - Support dev_printk on all device types, not just the core Device
      struct; remove now-redundant .as_ref() calls in dev_* print calls

  - Devres:
    - Introduce an internal reference count in Devres<T> to avoid a
      deadlock condition in case of (indirect) nesting

  - DMA:
    - Allow drivers to tune the maximum DMA segment size via
      dma_set_max_seg_size()

  - I/O:
    - Introduce the concept of generic I/O backends to handle different
      kinds of device shared memory through a common interface.

      This enables higher-level concepts such as register abstractions,
      I/O slices, and field projections to be built generically on top.

      In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
      trait hierarchy for sharing a common interface supporting offset
      validation and bound-checking logic between I/O backends.

    - Refactor MMIO to use the common I/O backend infrastructure

  - Misc:
    - Add __rust_helper annotations to C helpers for inlining into Rust
      code
    - Use "kernel vertical" style for imports
    - Replace kernel::c_str! with C string literals
    - Update ARef imports to use sync::aref
    - Use pin_init::zeroed() for struct auxiliary_device_id and debugfs
      file_operations initialization
    - Use LKMM atomic types in debugfs doc-tests
    - Various minor comment and documentation fixes

  - PCI:
    - Implement PCI configuration space accessors using the common I/O
      backend infrastructure
    - Document pci::Bar device endianness assumptions

  - SoC:
    - Abstractions for struct soc_device and struct soc_device_attribute
    - Sample driver for soc::Device

----------------------------------------------------------------
Alexandre Courbot (1):
      rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize

Alice Ryhl (9):
      rust: auxiliary: add __rust_helper to helpers
      rust: device: add __rust_helper to helpers
      rust: dma: add __rust_helper to helpers
      rust: io: add __rust_helper to helpers
      rust: irq: add __rust_helper to helpers
      rust: pci: add __rust_helper to helpers
      rust: platform: add __rust_helper to helpers
      rust: property: add __rust_helper to helpers
      rust: scatterlist: add __rust_helper to helpers

Alok Tiwari (2):
      rust: auxiliary: fix remove_callback invariant comment
      rust: platform: fix remove_callback invariant comment

Andy Shevchenko (1):
      driver core: make bus_find_device_by_acpi_dev() stub prototype aligned

Atharv Dubey (1):
      rust: auxiliary: use `pin_init::zeroed()` for device ID

Beata Michalska (1):
      rust: dma: allow drivers to tune max segment size

Ben Dooks (1):
      devtmpfs: make 'devtmpfs_context_ops' static

Daniel Gomez (1):
      driver core: attribute_container: change return type to void

Danilo Krummrich (18):
      rust: debugfs: use "kernel vertical" style for imports
      Merge tag 'v6.19-rc3' into driver-core-next
      rust: auxiliary: use "kernel vertical" style for imports
      rust: platform: use "kernel vertical" style for imports
      rust: driver-core: use "kernel vertical" style for imports
      rust: faux: use "kernel vertical" style for imports
      Merge tag 'v6.19-rc5' into driver-core-next
      MAINTAINERS: update auxiliary bus entry
      MAINTAINERS: driver-core: add driver-model documentation
      revocable: fix missing module license and description
      iommu/arm-smmu-qcom: do not register driver in probe()
      driver-core: move devres_for_each_res() to base.h
      MAINTAINERS: Add driver-core mailing list
      MAINTAINERS: Add missing T: entry for FIRMWARE LOADER
      Merge tag 'v6.19-rc7' into driver-core-next
      Merge tag 'driver-core-6.19-rc7-deferred' into driver-core-next
      driver core: fix inverted "locked" suffix of driver_match_device()
      rust: devres: fix race condition due to nesting

Dirk Behme (2):
      samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
      rust: dma: add missing __rust_helper annotations

FUJITA Tomonori (1):
      rust: debugfs: Use kernel Atomic type in docs example

Gary Guo (5):
      rust: device: support `dev_printk` on all devices
      rust: pci: remove redundant `.as_ref()` for `dev_*` print
      rust: samples: driver-core: remove redundant `.as_ref()` for `dev_*` print
      rust: samples: dma: remove redundant `.as_ref()` for `dev_*` print
      gpu: tyr: remove redundant `.as_ref()` for `dev_*` print

Greg Kroah-Hartman (4):
      driver core: faux: stop using static struct device
      driver core: disable revocable code from build
      sysfs: remove exports of sysfs_*change_owner()
      driver core: remove device_change_owner() export

Gui-Dong Han (1):
      driver core: enforce device_lock for driver_match_device()

Johan Hovold (3):
      Revert "selftests: revocable: Add kselftest cases"
      Revert "revocable: Add Kunit test cases"
      Revert "revocable: Revocable resource management"

Ke Sun (1):
      rust: debugfs: use pin_init::zeroed() for file_operations

Marko Turk (2):
      rust: pci: document Bar's endianness conversion
      rust: io: remove square brackets from pci::Bar reference

Matthew Maurer (2):
      rust: Add soc_device support
      rust: Add SoC Driver Sample

Shankari Anand (3):
      rust: device: Update ARef and AlwaysRefCounted imports from sync::aref
      rust: scatterlist: Update ARef imports to use sync::aref
      samples: rust: debugfs: Update ARef imports to use sync::aref

Tamir Duberstein (9):
      rust: auxiliary: replace `kernel::c_str!` with C-Strings
      rust: device: replace `kernel::c_str!` with C-Strings
      rust: platform: replace `kernel::c_str!` with C-Strings
      rust: io: replace `kernel::c_str!` with C-Strings
      rust: irq: replace `kernel::c_str!` with C-Strings
      rust: debugfs: replace `kernel::c_str!` with C-Strings
      samples: rust: debugfs: replace `kernel::c_str!` with C-Strings
      samples: rust: pci: replace `kernel::c_str!` with C-Strings
      samples: rust: faux: replace `kernel::c_str!` with C-Strings

Thorsten Blum (1):
      devtmpfs: Replace simple_strtoul with kstrtoint in mount_param

Tzung-Bi Shih (8):
      revocable: Revocable resource management
      revocable: Add Kunit test cases
      selftests: revocable: Add kselftest cases
      revocable: Remove redundant synchronize_srcu() call
      revocable: Fix races in revocable_alloc() using RCU
      revocable: Add KUnit test for provider lifetime races
      revocable: fix SRCU index corruption by requiring caller-provided storage
      revocable: Add KUnit test for concurrent access

Zhi Wang (5):
      rust: devres: style for imports
      rust: io: separate generic I/O helpers from MMIO implementation
      rust: io: factor out MMIO read/write macros
      rust: pci: add config space read/write support
      sample: rust: pci: add tests for config space routines

Zijing Zhang (1):
      rust: pci: re-export ConfigSpace

 MAINTAINERS                                |  15 +++-
 drivers/base/attribute_container.c         |   4 +-
 drivers/base/base.h                        |  15 +++-
 drivers/base/core.c                        |   1 -
 drivers/base/dd.c                          |   2 +-
 drivers/base/devtmpfs.c                    |   5 +-
 drivers/base/faux.c                        |  18 +++--
 drivers/base/transport_class.c             |   8 +-
 drivers/gpu/drm/tyr/driver.rs              |   2 +-
 drivers/gpu/drm/tyr/gpu.rs                 |   6 +-
 drivers/gpu/drm/tyr/regs.rs                |   1 +
 drivers/gpu/nova-core/gsp/sequencer.rs     |   5 +-
 drivers/gpu/nova-core/regs/macros.rs       |  90 ++++++++++++---------
 drivers/gpu/nova-core/vbios.rs             |   1 +
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  14 ++++
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  14 +++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      |  24 +++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |   5 ++
 drivers/pwm/pwm_th1520.rs                  |   5 +-
 drivers/scsi/scsi_transport_spi.c          |   2 +-
 fs/sysfs/file.c                            |   2 -
 include/linux/attribute_container.h        |   2 +-
 include/linux/device/bus.h                 |   4 +-
 include/linux/device/devres.h              |   4 -
 include/linux/transport_class.h            |   6 +-
 rust/bindings/bindings_helper.h            |   1 +
 rust/helpers/auxiliary.c                   |   6 +-
 rust/helpers/device.c                      |  16 ++--
 rust/helpers/dma.c                         |  31 ++++---
 rust/helpers/io.c                          |  64 ++++++++-------
 rust/helpers/irq.c                         |   6 +-
 rust/helpers/pci.c                         |  23 +++---
 rust/helpers/platform.c                    |   2 +-
 rust/helpers/property.c                    |   2 +-
 rust/helpers/scatterlist.c                 |  12 +--
 rust/kernel/auxiliary.rs                   |  30 ++++---
 rust/kernel/debugfs.rs                     |  86 +++++++++++++-------
 rust/kernel/debugfs/callback_adapters.rs   |  21 +++--
 rust/kernel/debugfs/entry.rs               |  14 +++-
 rust/kernel/debugfs/file_ops.rs            |  43 +++++-----
 rust/kernel/debugfs/traits.rs              |  43 +++++++---
 rust/kernel/device.rs                      |  32 +++++---
 rust/kernel/device/property.rs             |  11 +--
 rust/kernel/devres.rs                      | 197 ++++++++++++++++++---------------------------
 rust/kernel/dma.rs                         |  17 ++++
 rust/kernel/driver.rs                      |  12 ++-
 rust/kernel/faux.rs                        |  13 ++-
 rust/kernel/io.rs                          | 479 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
 rust/kernel/io/mem.rs                      |  33 +++++---
 rust/kernel/io/poll.rs                     |  16 +++-
 rust/kernel/irq/request.rs                 |   6 +-
 rust/kernel/lib.rs                         |   2 +
 rust/kernel/pci.rs                         |  11 ++-
 rust/kernel/pci/id.rs                      |   2 +-
 rust/kernel/pci/io.rs                      | 210 ++++++++++++++++++++++++++++++++++++++++++++++--
 rust/kernel/platform.rs                    |  46 ++++++++---
 rust/kernel/scatterlist.rs                 |   3 +-
 rust/kernel/soc.rs                         | 135 +++++++++++++++++++++++++++++++
 samples/rust/Kconfig                       |  11 +++
 samples/rust/Makefile                      |   1 +
 samples/rust/rust_debugfs.rs               |  46 +++++++----
 samples/rust/rust_debugfs_scoped.rs        |  38 +++++----
 samples/rust/rust_dma.rs                   |  13 +--
 samples/rust/rust_driver_auxiliary.rs      |  14 ++--
 samples/rust/rust_driver_faux.rs           |  10 ++-
 samples/rust/rust_driver_pci.rs            |  43 ++++++++--
 samples/rust/rust_driver_platform.rs       |  42 +++++-----
 samples/rust/rust_soc.rs                   |  79 ++++++++++++++++++
 68 files changed, 1604 insertions(+), 573 deletions(-)
 create mode 100644 rust/kernel/soc.rs
 create mode 100644 samples/rust/rust_soc.rs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-02-11 23:04 [GIT PULL] Driver core changes for 7.0-rc1 Danilo Krummrich
@ 2026-02-12  3:58 ` pr-tracker-bot
  2026-03-01  7:44 ` Linus Torvalds
  1 sibling, 0 replies; 15+ messages in thread
From: pr-tracker-bot @ 2026-02-12  3:58 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Linus Torvalds, Greg Kroah-Hartman, Rafael J. Wysocki,
	Saravana Kannan, Andrew Morton, driver-core, rust-for-linux,
	linux-kernel

The pull request you sent on Thu, 12 Feb 2026 00:04:46 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git tags/driver-core-7.0-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c6e62d002b7f0613f02d8707c80f2a7bd66808a0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-02-11 23:04 [GIT PULL] Driver core changes for 7.0-rc1 Danilo Krummrich
  2026-02-12  3:58 ` pr-tracker-bot
@ 2026-03-01  7:44 ` Linus Torvalds
  2026-03-01  7:56   ` Linus Torvalds
                     ` (3 more replies)
  1 sibling, 4 replies; 15+ messages in thread
From: Linus Torvalds @ 2026-03-01  7:44 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Wed, 11 Feb 2026 at 15:04, Danilo Krummrich <dakr@kernel.org> wrote:
>
> Driver core changes for 7.0-rc1
>
> - Bus:
>   - Ensure bus->match() is consistently called with the device lock held

So I'm coming back to this, because it turns out this sounds like a
horrible mistake in the end.

You document it as being about consistent locking, but it appears this
change is what made the "firewire oops at driver attach" turn an oops
into just a silently dead machine.

In other words, it makes fragile drivers go from "you get an oops" to
something much worse. The oops becomes unrecoverable - with typically
a black screen at boot - because the probe is holding a lock that then
makes everything else come to a grinding halt when the driver fails.

And yes, this obviously only happens for buggy driver and doesn't
matter for _correct_ code, but about half of the kernel code is
drivers, and that half of the kernel code is also the typically the
most badly tested and often questionably implemented half.

No, not all drivers a bad, but there are a lot of drivers, and some of
them have problems.

So if a driver problem causes problems for the whole machine, the
driver core design is bad.

I really think this should be re-thought. Perhaps just reverted
outright. Instead of saying

  "This inconsistency means that
    bus match() callbacks are not guaranteed to be called with the lock
    held"

as if it's automatically a bad thing, just don't depend on the device
match having to be called with a lock held if that lock has this
problem.

It's not clear why anybody should *care* about the lock at driver
attach time, when nothing else can access the device that hasn't been
brought up yet.

Put another way: the downsides seem worse than the upsides.
"Consistency" is not an upside if it causes problems.

             Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01  7:44 ` Linus Torvalds
@ 2026-03-01  7:56   ` Linus Torvalds
  2026-03-01 13:01   ` Gary Guo
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2026-03-01  7:56 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sat, 28 Feb 2026 at 23:44, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In other words, it makes fragile drivers go from "you get an oops" to
> something much worse. The oops becomes unrecoverable - with typically
> a black screen at boot - because the probe is holding a lock that then
> makes everything else come to a grinding halt when the driver fails.

See this message:

    https://lore.kernel.org/all/67f655bb-4d81-4609-b008-68d200255dd2@davidgow.net/

on the interaction with the driver core merge with the firewire oops bug.

In this case we were actually somewhat lucky, in that the hardware is
fairly common, but seldom actually used - a combination that meant tht
wer had several people who caught it fairly quickly and were willing
and able to bisect it. I had the first report the day after -rc1 was
released.

IOW, it could have been *so* much worse. Imagine some random driver
bug on hardware that isn't common and just causes inexplicable boot
failures for the people who see it and are likely using distro
kernels.

              Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01  7:44 ` Linus Torvalds
  2026-03-01  7:56   ` Linus Torvalds
@ 2026-03-01 13:01   ` Gary Guo
  2026-03-01 13:04   ` Danilo Krummrich
  2026-04-14 18:39   ` Uwe Kleine-König
  3 siblings, 0 replies; 15+ messages in thread
From: Gary Guo @ 2026-03-01 13:01 UTC (permalink / raw)
  To: Linus Torvalds, Danilo Krummrich
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun Mar 1, 2026 at 7:44 AM GMT, Linus Torvalds wrote:
> On Wed, 11 Feb 2026 at 15:04, Danilo Krummrich <dakr@kernel.org> wrote:
>>
>> Driver core changes for 7.0-rc1
>>
>> - Bus:
>>   - Ensure bus->match() is consistently called with the device lock held
>
> So I'm coming back to this, because it turns out this sounds like a
> horrible mistake in the end.
>
> You document it as being about consistent locking, but it appears this
> change is what made the "firewire oops at driver attach" turn an oops
> into just a silently dead machine.
>
> In other words, it makes fragile drivers go from "you get an oops" to
> something much worse. The oops becomes unrecoverable - with typically
> a black screen at boot - because the probe is holding a lock that then
> makes everything else come to a grinding halt when the driver fails.
>
> And yes, this obviously only happens for buggy driver and doesn't
> matter for _correct_ code, but about half of the kernel code is
> drivers, and that half of the kernel code is also the typically the
> most badly tested and often questionably implemented half.
>
> No, not all drivers a bad, but there are a lot of drivers, and some of
> them have problems.
>
> So if a driver problem causes problems for the whole machine, the
> driver core design is bad.
>
> I really think this should be re-thought. Perhaps just reverted
> outright. Instead of saying
>
>   "This inconsistency means that
>     bus match() callbacks are not guaranteed to be called with the lock
>     held"
>
> as if it's automatically a bad thing, just don't depend on the device
> match having to be called with a lock held if that lock has this
> problem.

Note that taking lock on match() fixes a real bug where data race can lead to
use-after-free https://bugzilla.kernel.org/show_bug.cgi?id=220789. It is
mentioned in the patch
https://lore.kernel.org/lkml/20260113162843.12712-1-hanguidong02@gmail.com/.

>
> It's not clear why anybody should *care* about the lock at driver
> attach time, when nothing else can access the device that hasn't been
> brought up yet.

We have always been taking the device lock when probing. This is needed as
obviously you don't want to have two drivers attaching to the same device at the
same time. When probing oops, the device lock is never going to be unlocked
again.

However, before matching starts to take the lock, we're "fine" in a sense that,
everything else keeps working as unless a device is matched and would actually
require probing, the device lock is not touched.

Perhaps what we should do is to defend against drivers oopsing inside probe and
have a mechanism so that device locks are unlocked even when probe oops. Another
option is to have `driver_override` protected by a different lock so match()
takes that lock instead of the device lock.

Best,
Gary

>
> Put another way: the downsides seem worse than the upsides.
> "Consistency" is not an upside if it causes problems.
>
>              Linus


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01  7:44 ` Linus Torvalds
  2026-03-01  7:56   ` Linus Torvalds
  2026-03-01 13:01   ` Gary Guo
@ 2026-03-01 13:04   ` Danilo Krummrich
  2026-03-01 18:17     ` Linus Torvalds
  2026-03-01 18:20     ` Danilo Krummrich
  2026-04-14 18:39   ` Uwe Kleine-König
  3 siblings, 2 replies; 15+ messages in thread
From: Danilo Krummrich @ 2026-03-01 13:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun Mar 1, 2026 at 8:44 AM CET, Linus Torvalds wrote:
> So I'm coming back to this, because it turns out this sounds like a
> horrible mistake in the end.

I came to the same conclusion following the discussion around the firewire oops.

> You document it as being about consistent locking

It happens that quite a few busses rely on this, and there is a possible race
condition that can lead to UAF bugs in the context of driver_override.

I think it is rather unlikely to happen though, as it would require a user to
change a device's driver_override field through sysfs while the device is
matched with a driver.

In any case, this can easily be solved with a separate lock.

> In other words, it makes fragile drivers go from "you get an oops" to
> something much worse. The oops becomes unrecoverable - with typically
> a black screen at boot - because the probe is holding a lock that then
> makes everything else come to a grinding halt when the driver fails.

Yes, the problem is that when a device is already present in the system and a
driver is registered on the same bus, we iterate over all devices registered on
this bus to see if one of them matches. If we come across an already bound one
where the corresponding driver crashed while holding the device lock (e.g. in
probe()) we can't make any progress anymore.

Obviously, this is not an issue the other way around, i.e. when the driver is
present in the system first and the device is added subsequently.

> And yes, this obviously only happens for buggy driver and doesn't
> matter for _correct_ code, but about half of the kernel code is
> drivers, and that half of the kernel code is also the typically the
> most badly tested and often questionably implemented half.

I agree, it is a case that will happen regularly, and besides hurting developer
ergonomics, it potentially decreases chances of shutting things down cleanly and
obtaining logs in a production environment as well.

> I really think this should be re-thought. Perhaps just reverted
> outright.

Yes, I agree and in fact I already have a few local changes to move
driver_override to struct device, provide corresponding accessors for busses and
handle locking with a separate lock.

(Technically, the "move driver_override to struct device" part is orthogonal,
but doing it right away results in less (and much cleaner) changes.)

I do not consider those changes to be complicated and risky, but I'm not sure
you want to see those for one of the upcoming -rc releases (probably -rc4/5).

Independently, I can send a revert for -rc3.

Thanks,
Danilo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01 13:04   ` Danilo Krummrich
@ 2026-03-01 18:17     ` Linus Torvalds
  2026-03-01 20:21       ` Danilo Krummrich
  2026-03-01 18:20     ` Danilo Krummrich
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2026-03-01 18:17 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun, 1 Mar 2026 at 05:04, Danilo Krummrich <dakr@kernel.org> wrote:
>
> It happens that quite a few busses rely on this, and there is a possible race
> condition that can lead to UAF bugs in the context of driver_override.
>
> I think it is rather unlikely to happen though, as it would require a user to
> change a device's driver_override field through sysfs while the device is
> matched with a driver.
>
> In any case, this can easily be solved with a separate lock.

Yes, if it's literally just about driver_override, please just fix the locking.

Use some really simple local spinlock lock to just copy the string
into a local copy when accessing it - it's not like it's even some
arbitrarily long string afaik (how long can driver names be?)

Don't use a huge sleeping lock that has other semantics for something
trivial like this.

(Or is there some other driver_override thing I'm not aware of?)

             Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01 13:04   ` Danilo Krummrich
  2026-03-01 18:17     ` Linus Torvalds
@ 2026-03-01 18:20     ` Danilo Krummrich
  1 sibling, 0 replies; 15+ messages in thread
From: Danilo Krummrich @ 2026-03-01 18:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun Mar 1, 2026 at 2:04 PM CET, Danilo Krummrich wrote:
> Yes, I agree and in fact I already have a few local changes to move
> driver_override to struct device, provide corresponding accessors for busses and
> handle locking with a separate lock.
>
> (Technically, the "move driver_override to struct device" part is orthogonal,
> but doing it right away results in less (and much cleaner) changes.)
>
> I do not consider those changes to be complicated and risky, but I'm not sure
> you want to see those for one of the upcoming -rc releases (probably -rc4/5).

This is roughly what I have in mind [1] (partially compile and runtime tested).

Given that we agree that the driver_match_device() change should be reverted, it
may make sense to land the first patch in an upcoming -rc, split up the treewide
one, and let subsystems follow-up with those fixes individually.

Please let me know what you prefer.

- Danilo

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=driver_override

> Independently, I can send a revert for -rc3.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01 18:17     ` Linus Torvalds
@ 2026-03-01 20:21       ` Danilo Krummrich
  2026-03-01 21:01         ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Danilo Krummrich @ 2026-03-01 20:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun Mar 1, 2026 at 7:17 PM CET, Linus Torvalds wrote:
> Use some really simple local spinlock lock to just copy the string
> into a local copy when accessing it - it's not like it's even some
> arbitrarily long string afaik (how long can driver names be?)

Yes, that's what my code in [1] already does. Actually, I think we don't even
need a local copy for accessing the string. We should be good with something
like

	int device_match_driver_override(struct device *dev,
					 const struct device_driver *drv)

which internally compares the strings holding the spinlock.

Otherwise, since you asked, the string length is currently limited to
PAGE_SIZE - 1 as it will be copied with sysfs_emit(). But as mentioned, we
shouldn't need a copy anyways.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=driver_override

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01 20:21       ` Danilo Krummrich
@ 2026-03-01 21:01         ` Linus Torvalds
  2026-03-02 19:19           ` Danilo Krummrich
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2026-03-01 21:01 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun, 1 Mar 2026 at 12:21, Danilo Krummrich <dakr@kernel.org> wrote:
>
> Otherwise, since you asked, the string length is currently limited to
> PAGE_SIZE - 1 as it will be copied with sysfs_emit().

Well, the string length from /proc itself might be long, but do we *care*?

It's only meaningful when it matches a driver name, so anything longer
than the longest driver name is irrelevant - it's not going to match.

So the thing that would matter is the longest actual real driver name.
Aren't those typically just a few bytes (eg "regulator-bus-drv" or
"__typec_altmode_driver" being long ones I find with a bad grep
pattern that might miss millions of other cases)

IOW, it looks like it would be fine to just say "use just a 32-byte
buffer" if a buffer is needed.

Of course, if no buffer is needed that's even better.

             Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01 21:01         ` Linus Torvalds
@ 2026-03-02 19:19           ` Danilo Krummrich
  0 siblings, 0 replies; 15+ messages in thread
From: Danilo Krummrich @ 2026-03-02 19:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

On Sun Mar 1, 2026 at 10:01 PM CET, Linus Torvalds wrote:
> So the thing that would matter is the longest actual real driver name.
> Aren't those typically just a few bytes (eg "regulator-bus-drv" or
> "__typec_altmode_driver" being long ones I find with a bad grep
> pattern that might miss millions of other cases)

Yes, generally they should be pretty short.

OOC, I asked an LLM to figure it out (so take this with the necessary grain of
salt):

	Out of ~7,560 unique driver names:
	
	┌────────────┬───────────┐
	│ Threshold  │   Count   │
	├────────────┼───────────┤
	│ > 32 chars │ 6 (0.08%) │
	├────────────┼───────────┤
	│ > 48 chars │ 1 (0.01%) │
	├────────────┼───────────┤
	│ > 64 chars │ 0         │
	└────────────┴───────────┘
	
	Length distribution
	
	 1-10 chars:  3,513  (46.5%)
	11-16 chars:  3,111  (41.1%)
	17-24 chars:    880  (11.6%)
	25-32 chars:     50  ( 0.7%)
	33-49 chars:      6  ( 0.1%)

> IOW, it looks like it would be fine to just say "use just a 32-byte
> buffer" if a buffer is needed.

Assuming the above is roughly correct, you are pretty close. :)

> Of course, if no buffer is needed that's even better.

Yes, I don't think we ever need to obtain a copy other than through
sysfs_emit().

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-03-01  7:44 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2026-03-01 13:04   ` Danilo Krummrich
@ 2026-04-14 18:39   ` Uwe Kleine-König
  2026-04-14 23:04     ` Danilo Krummrich
  3 siblings, 1 reply; 15+ messages in thread
From: Uwe Kleine-König @ 2026-04-14 18:39 UTC (permalink / raw)
  To: Danilo Krummrich, Linus Torvalds
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Saravana Kannan,
	Andrew Morton, driver-core, rust-for-linux, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

Hello,

On Sat, Feb 28, 2026 at 11:44:25PM -0800, Linus Torvalds wrote:
> On Wed, 11 Feb 2026 at 15:04, Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > Driver core changes for 7.0-rc1
> >
> > - Bus:
> >   - Ensure bus->match() is consistently called with the device lock held
> 
> So I'm coming back to this, because it turns out this sounds like a
> horrible mistake in the end.
> 
> You document it as being about consistent locking, but it appears this
> change is what made the "firewire oops at driver attach" turn an oops
> into just a silently dead machine.
> 
> In other words, it makes fragile drivers go from "you get an oops" to
> something much worse. The oops becomes unrecoverable - with typically
> a black screen at boot - because the probe is holding a lock that then
> makes everything else come to a grinding halt when the driver fails.
> 
> And yes, this obviously only happens for buggy driver and doesn't
> matter for _correct_ code, but about half of the kernel code is
> drivers, and that half of the kernel code is also the typically the
> most badly tested and often questionably implemented half.

I have a machine here (stm32mp135f-dk, ARCH=arm) that fails to boot with
dc23806a7c47 ("driver core: enforce device_lock for
driver_match_device()"), but doesn't oops on dc23806a7c47^.
(Fails to boot = no kernel messages on console.)

I didn't try to debug that yet, but I wonder if that is an understood
problem of said commit. I know that the commit was reverted in the
meantime (and the machine boots fine on 9de68394a615 ("Revert "driver
core: enforce device_lock for driver_match_device()""), but does that
mean that there is a driver involved that somehow violates driver core
assumptions and should be fixed even without the consistent locking?
Hints about how to approach the issue (if there is any) welcome.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] Driver core changes for 7.0-rc1
  2026-04-14 18:39   ` Uwe Kleine-König
@ 2026-04-14 23:04     ` Danilo Krummrich
  2026-04-15  9:48       ` Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1] Uwe Kleine-König
  0 siblings, 1 reply; 15+ messages in thread
From: Danilo Krummrich @ 2026-04-14 23:04 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Linus Torvalds, Greg Kroah-Hartman, Rafael J. Wysocki,
	Saravana Kannan, Andrew Morton, driver-core, rust-for-linux,
	linux-kernel

On Tue Apr 14, 2026 at 8:39 PM CEST, Uwe Kleine-König wrote:
> does that mean that there is a driver involved that somehow violates driver
> core assumptions and should be fixed even without the consistent locking?

Most likely. There are two known cases where interactions with this commit are
expected.

  (1) One of the drivers probed on your machine gets stuck within probe() (or
      any other place where the device lock is held, e.g. bus callbacks) for
      some reason, e.g. due to a deadlock.  In this case this commit would
      potentially cause other tasks to get stuck in driver_attach() when they
      attempt to register a driver for the same bus the bad one sits on.

      This is also the main reason why we eventually reverted this commit, i.e.
      despite not being the root cause of an issue, it makes an already bad
      situation worse.

  (2) If there is a driver probed on your machine that registers another driver
      from within its probe() function for the same bus it results in a
      deadlock.  Note that this is transitive -- if a driver is probed on bus A,
      which e.g. deploys devices on bus B that are subsequently probed, and then
      in one of the probe() calls on bus B a driver is registered for bus A,
      that is a deadlock as well.

      For instance, this could happen when a platform driver that runs a PCIe
      root complex deploys the corresponding PCI devices and one of the
      corresponding PCI drivers registers a platform driver from probe().

      Anyways, for the underlying problem this reveals, the exact constellation
      doesn't matter.  The anti-pattern it reveals is that drivers shouldn't be
      registered from another driver's probe() function in the first place.

      I fixed a few drivers having this anti-pattern and all of them had other
      (lifetime) issues due to this and I think there are other potential
      deadlock scenarios as well.

> Hints about how to approach the issue (if there is any) welcome.

For (1) I think it's obvious, and I think it wouldn't have gone unnoticed if any
of the drivers were bad to the point that they're getting stuck in probe() or
any other place where the device lock is held.

As for (2) I think the best way to catch it is lockdep. Unfortunately, lockdep
won't be very helpful without some additional tricks, since the driver core
calls lockdep_set_novalidate_class() for the device lock to avoid false
positives.

However, we can work around this by registering a dynamic lock class key for
every struct device individually [1] and fake taking the device lock with
mutex_acquire() and mutex_release() in __driver_attach().

This way your box should still boot properly, and in case it got stuck due to
(2), print a proper lockdep splat.

I hope this helps!

- Danilo

[1]

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 763e17e9f148..6770eba83fbd 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2555,6 +2555,7 @@ static void device_release(struct kobject *kobj)
         */
        devres_release_all(dev);

+       lockdep_unregister_key(&dev->mutex_key);
        kfree(dev->dma_range_map);
        kfree(dev->driver_override.name);

@@ -3160,9 +3161,9 @@ void device_initialize(struct device *dev)
        dev->kobj.kset = devices_kset;
        kobject_init(&dev->kobj, &device_ktype);
        INIT_LIST_HEAD(&dev->dma_pools);
-       mutex_init(&dev->mutex);
+       lockdep_register_key(&dev->mutex_key);
+       __mutex_init(&dev->mutex, "dev->mutex", &dev->mutex_key);
        spin_lock_init(&dev->driver_override.lock);
-       lockdep_set_novalidate_class(&dev->mutex);
        spin_lock_init(&dev->devres_lock);
        INIT_LIST_HEAD(&dev->devres_head);
        device_pm_init(dev);
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index cb5046f0634d..a81a4ec2284c 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1228,7 +1228,9 @@ static int __driver_attach(struct device *dev, void *data)
         * is an error.
         */

+       mutex_acquire(&dev->mutex.dep_map, 0, 0, _THIS_IP_);
        ret = driver_match_device(drv, dev);
+       mutex_release(&dev->mutex.dep_map, _THIS_IP_);
        if (ret == 0) {
                /* no match */
                return 0;
diff --git a/include/linux/device.h b/include/linux/device.h
index f0d52e1a6e07..2185d50f1c1d 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -585,6 +585,7 @@ struct device {
        struct mutex            mutex;  /* mutex to synchronize calls to
                                         * its driver.
                                         */
+       struct lock_class_key   mutex_key;

        struct dev_links_info   links;
        struct dev_pm_info      power;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1]
  2026-04-14 23:04     ` Danilo Krummrich
@ 2026-04-15  9:48       ` Uwe Kleine-König
  2026-04-15 14:16         ` Cristian Marussi
  0 siblings, 1 reply; 15+ messages in thread
From: Uwe Kleine-König @ 2026-04-15  9:48 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Linus Torvalds, Greg Kroah-Hartman, Rafael J. Wysocki,
	Saravana Kannan, Andrew Morton, driver-core, rust-for-linux,
	linux-kernel@vger.kernel.org Sudeep Holla, Cristian Marussi,
	arm-scmi

[-- Attachment #1: Type: text/plain, Size: 7411 bytes --]

Hello Danilo,

[expanded Cc: a bit for the affected driver]

On Wed, Apr 15, 2026 at 01:04:47AM +0200, Danilo Krummrich wrote:
> On Tue Apr 14, 2026 at 8:39 PM CEST, Uwe Kleine-König wrote:
> > does that mean that there is a driver involved that somehow violates driver
> > core assumptions and should be fixed even without the consistent locking?
> 
> Most likely. There are two known cases where interactions with this commit are
> expected.
> 
>   (1) One of the drivers probed on your machine gets stuck within probe() (or
>       any other place where the device lock is held, e.g. bus callbacks) for
>       some reason, e.g. due to a deadlock.  In this case this commit would
>       potentially cause other tasks to get stuck in driver_attach() when they
>       attempt to register a driver for the same bus the bad one sits on.
> 
>       This is also the main reason why we eventually reverted this commit, i.e.
>       despite not being the root cause of an issue, it makes an already bad
>       situation worse.
> 
>   (2) If there is a driver probed on your machine that registers another driver
>       from within its probe() function for the same bus it results in a
>       deadlock.  Note that this is transitive -- if a driver is probed on bus A,
>       which e.g. deploys devices on bus B that are subsequently probed, and then
>       in one of the probe() calls on bus B a driver is registered for bus A,
>       that is a deadlock as well.
> 
>       For instance, this could happen when a platform driver that runs a PCIe
>       root complex deploys the corresponding PCI devices and one of the
>       corresponding PCI drivers registers a platform driver from probe().
> 
>       Anyways, for the underlying problem this reveals, the exact constellation
>       doesn't matter.  The anti-pattern it reveals is that drivers shouldn't be
>       registered from another driver's probe() function in the first place.
> 
>       I fixed a few drivers having this anti-pattern and all of them had other
>       (lifetime) issues due to this and I think there are other potential
>       deadlock scenarios as well.
> 
> > Hints about how to approach the issue (if there is any) welcome.
> 
> For (1) I think it's obvious, and I think it wouldn't have gone unnoticed if any
> of the drivers were bad to the point that they're getting stuck in probe() or
> any other place where the device lock is held.
> 
> As for (2) I think the best way to catch it is lockdep. Unfortunately, lockdep
> won't be very helpful without some additional tricks, since the driver core
> calls lockdep_set_novalidate_class() for the device lock to avoid false
> positives.

Thanks for the patch, indeed it creates a lockdep splat on my machine:

[    2.151192] optee: probing for conduit method.
[    2.195336] optee: revision 4.9
[    2.203597] optee: Asynchronous notifications enabled
[    2.203937] optee: dynamic shared memory is enabled
[    2.218444]
[    2.218466] ============================================
[    2.218474] WARNING: possible recursive locking detected
[    2.218484] 7.0.0-dirty #32 Tainted: G        W
[    2.218496] --------------------------------------------
[    2.218500] swapper/0/1 is trying to acquire lock:
[    2.218510] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x0/0x270
[    2.218565]
[    2.218565] but task is already holding lock:
[    2.218570] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270
[    2.218601]
[    2.218601] other info that might help us debug this:
[    2.218607]  Possible unsafe locking scenario:
[    2.218607]
[    2.218611]        CPU0
[    2.218614]        ----
[    2.218617]   lock(dev->mutex#3);
[    2.218631]   lock(dev->mutex#3);
[    2.218643]
[    2.218643]  *** DEADLOCK ***
[    2.218643]
[    2.218647]  May be due to missing lock nesting notation
[    2.218647]
[    2.218651] 2 locks held by swapper/0/1:
[    2.218659]  #0: c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270
[    2.218693]  #1: c2690cb4 (dev->mutex#59){+.+.}-{4:4}, at: __device_attach+0x34/0x200
[    2.218728]
[    2.218728] stack backtrace:
[    2.218738] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W           7.0.0-dirty #32 PREEMPT
[    2.218757] Tainted: [W]=WARN
[    2.218762] Hardware name: STM32 (Device Tree Support)
[    2.218770] Call trace:
[    2.218780]  unwind_backtrace from show_stack+0x18/0x1c
[    2.218814]  show_stack from dump_stack_lvl+0x68/0x88
[    2.218843]  dump_stack_lvl from print_deadlock_bug+0x370/0x380
[    2.218871]  print_deadlock_bug from __lock_acquire+0x1498/0x1f38
[    2.218895]  __lock_acquire from lock_acquire+0x138/0x40c
[    2.218918]  lock_acquire from __driver_attach+0x40/0x270
[    2.218939]  __driver_attach from bus_for_each_dev+0x78/0xc8
[    2.218966]  bus_for_each_dev from bus_add_driver+0xe8/0x238
[    2.218996]  bus_add_driver from driver_register+0x8c/0x140
[    2.219022]  driver_register from scmi_optee_service_probe+0x150/0x1f0
[    2.219053]  scmi_optee_service_probe from really_probe+0xe8/0x424
[    2.219079]  really_probe from __driver_probe_device+0xa4/0x1fc
[    2.219097]  __driver_probe_device from driver_probe_device+0x3c/0xd8
[    2.219117]  driver_probe_device from __device_attach_driver+0xbc/0x174
[    2.219136]  __device_attach_driver from bus_for_each_drv+0x8c/0xe0
[    2.219160]  bus_for_each_drv from __device_attach+0xb0/0x200
[    2.219184]  __device_attach from device_initial_probe+0x50/0x6c
[    2.219203]  device_initial_probe from bus_probe_device+0x2c/0x84
[    2.219228]  bus_probe_device from device_add+0x618/0x87c
[    2.219257]  device_add from optee_enumerate_devices+0x210/0x2cc
[    2.219286]  optee_enumerate_devices from optee_probe+0x8a0/0xa14
[    2.219311]  optee_probe from platform_probe+0x64/0x98
[    2.219335]  platform_probe from really_probe+0xe8/0x424
[    2.219355]  really_probe from __driver_probe_device+0xa4/0x1fc
[    2.219374]  __driver_probe_device from driver_probe_device+0x3c/0xd8
[    2.219393]  driver_probe_device from __driver_attach+0x13c/0x270
[    2.219412]  __driver_attach from bus_for_each_dev+0x78/0xc8
[    2.219436]  bus_for_each_dev from bus_add_driver+0xe8/0x238
[    2.219465]  bus_add_driver from driver_register+0x8c/0x140
[    2.219490]  driver_register from optee_core_init+0x18/0x3c
[    2.219519]  optee_core_init from do_one_initcall+0x74/0x424
[    2.219548]  do_one_initcall from kernel_init_freeable+0x2a8/0x328
[    2.219574]  kernel_init_freeable from kernel_init+0x1c/0x138
[    2.219599]  kernel_init from ret_from_fork+0x14/0x28
[    2.219620] Exception stack(0xdd811fb0 to 0xdd811ff8)
[    2.219634] 1fa0:                                     00000000 00000000 00000000 00000000
[    2.219648] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.219659] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    2.221255] arm-scmi arm-scmi.0.auto: Using scmi_optee_transport
[    2.221289] arm-scmi arm-scmi.0.auto: SCMI max-rx-timeout: 30ms / max-msg-size: 104bytes / max-msg: 20

The anti-pattern here is that scmi_optee_service_probe() calls
platform_driver_register(&scmi_optee_driver), see
drivers/firmware/arm_scmi/transports/optee.c.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1]
  2026-04-15  9:48       ` Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1] Uwe Kleine-König
@ 2026-04-15 14:16         ` Cristian Marussi
  0 siblings, 0 replies; 15+ messages in thread
From: Cristian Marussi @ 2026-04-15 14:16 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Danilo Krummrich, Linus Torvalds, Greg Kroah-Hartman,
	Rafael J. Wysocki, Saravana Kannan, Andrew Morton, driver-core,
	rust-for-linux, linux-kernel@vger.kernel.org Sudeep Holla,
	Cristian Marussi, arm-scmi

On Wed, Apr 15, 2026 at 11:48:15AM +0200, Uwe Kleine-König wrote:
> Hello Danilo,
> 
> [expanded Cc: a bit for the affected driver]
> 

Hi,

> On Wed, Apr 15, 2026 at 01:04:47AM +0200, Danilo Krummrich wrote:
> > On Tue Apr 14, 2026 at 8:39 PM CEST, Uwe Kleine-König wrote:
> > > does that mean that there is a driver involved that somehow violates driver
> > > core assumptions and should be fixed even without the consistent locking?
> > 

[snip]
 
> Thanks for the patch, indeed it creates a lockdep splat on my machine:
> 
> [    2.151192] optee: probing for conduit method.
> [    2.195336] optee: revision 4.9
> [    2.203597] optee: Asynchronous notifications enabled
> [    2.203937] optee: dynamic shared memory is enabled
> [    2.218444]
> [    2.218466] ============================================
> [    2.218474] WARNING: possible recursive locking detected
> [    2.218484] 7.0.0-dirty #32 Tainted: G        W
> [    2.218496] --------------------------------------------
> [    2.218500] swapper/0/1 is trying to acquire lock:
> [    2.218510] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x0/0x270
> [    2.218565]
> [    2.218565] but task is already holding lock:
> [    2.218570] c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270
> [    2.218601]
> [    2.218601] other info that might help us debug this:
> [    2.218607]  Possible unsafe locking scenario:
> [    2.218607]
> [    2.218611]        CPU0
> [    2.218614]        ----
> [    2.218617]   lock(dev->mutex#3);
> [    2.218631]   lock(dev->mutex#3);
> [    2.218643]
> [    2.218643]  *** DEADLOCK ***
> [    2.218643]
> [    2.218647]  May be due to missing lock nesting notation
> [    2.218647]
> [    2.218651] 2 locks held by swapper/0/1:
> [    2.218659]  #0: c2035cb4 (dev->mutex#3){+.+.}-{4:4}, at: __driver_attach+0x130/0x270
> [    2.218693]  #1: c2690cb4 (dev->mutex#59){+.+.}-{4:4}, at: __device_attach+0x34/0x200
> [    2.218728]
> [    2.218728] stack backtrace:
> [    2.218738] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W           7.0.0-dirty #32 PREEMPT
> [    2.218757] Tainted: [W]=WARN
> [    2.218762] Hardware name: STM32 (Device Tree Support)
> [    2.218770] Call trace:
> [    2.218780]  unwind_backtrace from show_stack+0x18/0x1c
> [    2.218814]  show_stack from dump_stack_lvl+0x68/0x88
> [    2.218843]  dump_stack_lvl from print_deadlock_bug+0x370/0x380
> [    2.218871]  print_deadlock_bug from __lock_acquire+0x1498/0x1f38
> [    2.218895]  __lock_acquire from lock_acquire+0x138/0x40c
> [    2.218918]  lock_acquire from __driver_attach+0x40/0x270
> [    2.218939]  __driver_attach from bus_for_each_dev+0x78/0xc8
> [    2.218966]  bus_for_each_dev from bus_add_driver+0xe8/0x238
> [    2.218996]  bus_add_driver from driver_register+0x8c/0x140
> [    2.219022]  driver_register from scmi_optee_service_probe+0x150/0x1f0
> [    2.219053]  scmi_optee_service_probe from really_probe+0xe8/0x424
> [    2.219079]  really_probe from __driver_probe_device+0xa4/0x1fc
> [    2.219097]  __driver_probe_device from driver_probe_device+0x3c/0xd8
> [    2.219117]  driver_probe_device from __device_attach_driver+0xbc/0x174
> [    2.219136]  __device_attach_driver from bus_for_each_drv+0x8c/0xe0
> [    2.219160]  bus_for_each_drv from __device_attach+0xb0/0x200
> [    2.219184]  __device_attach from device_initial_probe+0x50/0x6c
> [    2.219203]  device_initial_probe from bus_probe_device+0x2c/0x84
> [    2.219228]  bus_probe_device from device_add+0x618/0x87c
> [    2.219257]  device_add from optee_enumerate_devices+0x210/0x2cc
> [    2.219286]  optee_enumerate_devices from optee_probe+0x8a0/0xa14
> [    2.219311]  optee_probe from platform_probe+0x64/0x98
> [    2.219335]  platform_probe from really_probe+0xe8/0x424
> [    2.219355]  really_probe from __driver_probe_device+0xa4/0x1fc
> [    2.219374]  __driver_probe_device from driver_probe_device+0x3c/0xd8
> [    2.219393]  driver_probe_device from __driver_attach+0x13c/0x270
> [    2.219412]  __driver_attach from bus_for_each_dev+0x78/0xc8
> [    2.219436]  bus_for_each_dev from bus_add_driver+0xe8/0x238
> [    2.219465]  bus_add_driver from driver_register+0x8c/0x140
> [    2.219490]  driver_register from optee_core_init+0x18/0x3c
> [    2.219519]  optee_core_init from do_one_initcall+0x74/0x424
> [    2.219548]  do_one_initcall from kernel_init_freeable+0x2a8/0x328
> [    2.219574]  kernel_init_freeable from kernel_init+0x1c/0x138
> [    2.219599]  kernel_init from ret_from_fork+0x14/0x28
> [    2.219620] Exception stack(0xdd811fb0 to 0xdd811ff8)
> [    2.219634] 1fa0:                                     00000000 00000000 00000000 00000000
> [    2.219648] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    2.219659] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [    2.221255] arm-scmi arm-scmi.0.auto: Using scmi_optee_transport
> [    2.221289] arm-scmi arm-scmi.0.auto: SCMI max-rx-timeout: 30ms / max-msg-size: 104bytes / max-msg: 20
> 
> The anti-pattern here is that scmi_optee_service_probe() calls
> platform_driver_register(&scmi_optee_driver), see
> drivers/firmware/arm_scmi/transports/optee.c.
> 

Yes, this was reported a few weeks ago, and it affects both SCMI Optee AND
Virtio transport drivers, indeed.

It was a bad idea implemented while reworking the SCMI transport layer a
couple of years back.

To address this, I posted an initial RFC a few weeks ago [1], which aimed
at solving the problem above by removing the ugly driver registration
@probe_time machinery, while also generalizing the probe sequence logic
enough to allow for multiple instance probing also for Optee/Virtio
transports, thing that proves to be useful in some testing setup.

While the series at [1] solves/removes the register_at_probe_time issue, it
is still experimental (i.e. buggy/crappy) at generalizing the probing logic
to allow for multiple instances...

Then I was dragged away by something else and the series remains stalled...

I will see if I can improve that RFC series in a sensible way by the end
of the merge window, if not, I will instead post a distinct series to
address/remove the register_at_probe_time issue, by simply re-introducing
the -EPROBE_DEFER logic.

Thanks,
Cristian

[1] https://lore.kernel.org/arm-scmi/abviCkBeX9uzsgxy@pluto/T/#mcfe17ffaac86fb84ef8917782d5e34b8789f4d3f

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-04-15 14:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-11 23:04 [GIT PULL] Driver core changes for 7.0-rc1 Danilo Krummrich
2026-02-12  3:58 ` pr-tracker-bot
2026-03-01  7:44 ` Linus Torvalds
2026-03-01  7:56   ` Linus Torvalds
2026-03-01 13:01   ` Gary Guo
2026-03-01 13:04   ` Danilo Krummrich
2026-03-01 18:17     ` Linus Torvalds
2026-03-01 20:21       ` Danilo Krummrich
2026-03-01 21:01         ` Linus Torvalds
2026-03-02 19:19           ` Danilo Krummrich
2026-03-01 18:20     ` Danilo Krummrich
2026-04-14 18:39   ` Uwe Kleine-König
2026-04-14 23:04     ` Danilo Krummrich
2026-04-15  9:48       ` Probe function registering another driver [Was: Re: [GIT PULL] Driver core changes for 7.0-rc1] Uwe Kleine-König
2026-04-15 14:16         ` Cristian Marussi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox