rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
@ 2025-02-17 14:04 Alexandre Courbot
  2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
                   ` (5 more replies)
  0 siblings, 6 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-17 14:04 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Hi everyone,

This short RFC is based on top of Danilo's initial driver stub series
[1] and has for goal to initiate discussions and hopefully some design
decisions using the simplest subdevice of the GPU (the timer) as an
example, before implementing more devices allowing the GPU
initialization sequence to progress (Falcon being the logical next step
so we can get the GSP rolling).

It is kept simple and short for that purpose, and to avoid bumping into
a wall with much more device code because my assumptions were incorrect.

This is my first time trying to write Rust kernel code, and some of my
questions below are probably due to me not understanding yet how to use
the core kernel interfaces. So before going further I thought it would
make sense to raise the most obvious questions that came to my mind
while writing this draft:

- Where and how to store subdevices. The timer device is currently a
  direct member of the GPU structure. It might work for GSP devices
  which are IIUC supposed to have at least a few fixed devices required
  to bring the GSP up ; but as a general rule this probably won't scale
  as not all subdevices are present on all GPU variants, or in the same
  numbers. So we will probably need to find an equivalent to the
  `subdev` linked list in Nouveau.

- BAR sharing between subdevices. Right now each subdevice gets access
  to the full BAR range. I am wondering whether we could not split it
  into the relevant slices for each-subdevice, and transfer ownership of
  each slice to the device that is supposed to use it. That way each
  register would have a single owner, which is arguably safer - but
  maybe not as flexible as we will need down the road?

- On a related note, since the BAR is behind a Devres its availability
  must first be secured before any hardware access using try_access().
  Doing this on a per-register or per-operation basis looks overkill, so
  all methods that access the BAR take a reference to it, allowing to
  call try_access() from the highest-level caller and thus reducing the
  number of times this needs to be performed. Doing so comes at the cost
  of an extra argument to most subdevice methods ; but also with the
  benefit that we don't need to put the BAR behind another Arc and share
  it across all subdevices. I don't know which design is better here,
  and input would be very welcome.

- We will probably need sometime like a `Subdevice` trait or something
  down the road, but I'll wait until we have more than one subdevice to
  think about it.

The first 2 patches are small additions to the core Rust modules, that
the following patches make use of and which might be useful for other
drivers as well. The last patch is the naive implementation of the timer
device. I don't expect it to stay this way at all, so please point out
all the deficiencies in this very early code! :)

[1] https://lore.kernel.org/nouveau/20250209173048.17398-1-dakr@kernel.org/

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Alexandre Courbot (3):
      rust: add useful ops for u64
      rust: make ETIMEDOUT error available
      gpu: nova-core: add basic timer device

 drivers/gpu/nova-core/driver.rs    |  4 +-
 drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
 drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
 rust/kernel/error.rs               |  1 +
 rust/kernel/lib.rs                 |  1 +
 rust/kernel/num.rs                 | 32 ++++++++++++++
 8 files changed, 206 insertions(+), 2 deletions(-)
---
base-commit: 6484e46f33eac8dd42aa36fa56b51d8daa5ae1c1
change-id: 20250216-nova_timer-c69430184f54

Best regards,
-- 
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
@ 2025-02-17 14:04 ` Alexandre Courbot
  2025-02-17 20:47   ` Sergio González Collado
                     ` (2 more replies)
  2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-17 14:04 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

It is common to build a u64 from its high and low parts obtained from
two 32-bit registers. Conversely, it is also common to split a u64 into
two u32s to write them into registers. Add an extension trait for u64
that implement these methods in a new `num` module.

It is expected that this trait will be extended with other useful
operations, and similar extension traits implemented for other types.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -59,6 +59,7 @@
 pub mod miscdevice;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod num;
 pub mod of;
 pub mod page;
 #[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// Useful operations for `u64`.
+pub trait U64Ext {
+    /// Build a `u64` by combining its `high` and `low` parts.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
+    /// ```
+    fn from_u32s(high: u32, low: u32) -> Self;
+
+    /// Returns the `(high, low)` u32s that constitute `self`.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
+    /// ```
+    fn into_u32s(self) -> (u32, u32);
+}
+
+impl U64Ext for u64 {
+    fn from_u32s(high: u32, low: u32) -> Self {
+        ((high as u64) << u32::BITS) | low as u64
+    }
+
+    fn into_u32s(self) -> (u32, u32) {
+        ((self >> u32::BITS) as u32, self as u32)
+    }
+}

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH RFC 2/3] rust: make ETIMEDOUT error available
  2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
  2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
@ 2025-02-17 14:04 ` Alexandre Courbot
  2025-02-17 21:15   ` Daniel Almeida
  2025-02-17 14:04 ` [PATCH RFC 3/3] gpu: nova-core: add basic timer device Alexandre Courbot
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-17 14:04 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index f6ecf09cb65f4ebe9b88da68b3830ae79aa4f182..8858eb13b3df674b54572d2a371b8ec1303492dd 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -64,6 +64,7 @@ macro_rules! declare_err {
     declare_err!(EPIPE, "Broken pipe.");
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH RFC 3/3] gpu: nova-core: add basic timer device
  2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
  2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
  2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
@ 2025-02-17 14:04 ` Alexandre Courbot
  2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-17 14:04 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add a basic timer device and exercise it during device probing. This
first draft is probably very questionable.

One point in particular which should IMHO receive attention: the generic
wait_on() method aims at providing similar functionality to Nouveau's
nvkm_[num]sec() macros. Since this method will be heavily used with
different conditions to test, I'd like to avoid monomorphizing it
entirely with each instance ; that's something that is achieved in
nvkm_xsec() using functions that the macros invoke.

I have tried achieving the same result in Rust using closures (kept
as-is in the current code), but they seem to be monomorphized as well.
Calling extra functions could work better, but looks also less elegant
to me, so I am really open to suggestions here.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs    |  4 +-
 drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
 drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x9500;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
@@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
             GFP_KERNEL,
         )?;
 
+        let _ = this.gpu.test_timer();
+
         Ok(this)
     }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index e7da6a2fa29d82e9624ba8baa2c7281f38cb3133..2fbf4041f6d421583636c7bede449c3416272301 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,12 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use kernel::device::Device;
+use kernel::types::ARef;
 use kernel::{
     device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::CString,
 };
 
 use crate::driver::Bar0;
 use crate::regs;
+use crate::timer::Timer;
 use core::fmt;
+use core::time::Duration;
 
 /// Enum representation of the GPU chipset.
 #[derive(fmt::Debug)]
@@ -165,10 +169,12 @@ fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
 /// Structure holding the resources required to operate the GPU.
 #[pin_data]
 pub(crate) struct Gpu {
+    dev: ARef<Device>,
     spec: Spec,
     /// MMIO mapping of PCI BAR 0
     bar: Devres<Bar0>,
     fw: Firmware,
+    timer: Timer,
 }
 
 impl Gpu {
@@ -184,6 +190,33 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
             spec.revision
         );
 
-        Ok(pin_init!(Self { spec, bar, fw }))
+        let dev = pdev.as_ref().into();
+        let timer = Timer::new();
+
+        Ok(pin_init!(Self {
+            dev,
+            spec,
+            bar,
+            fw,
+            timer,
+        }))
+    }
+
+    pub(crate) fn test_timer(&self) -> Result<()> {
+        let bar = self.bar.try_access().ok_or(ENXIO)?;
+
+        dev_info!(&self.dev, "testing timer subdev\n");
+        assert!(matches!(
+            self.timer
+                .wait_on(&bar, Duration::from_millis(10), || Some(())),
+            Ok(())
+        ));
+        assert_eq!(
+            self.timer
+                .wait_on(&bar, Duration::from_millis(10), || Option::<()>::None),
+            Err(ETIMEDOUT)
+        );
+
+        Ok(())
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 5d0230042793dae97026146e94f3cdb31ba20642..94b165a340ddfffd448f87cd82200391de075806 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -5,6 +5,7 @@
 mod driver;
 mod gpu;
 mod regs;
+mod timer;
 
 kernel::module_pci_driver! {
     type: driver::NovaCore,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index f2766f95e9d1eeab6734b18525fe504e1e7ea587..5127cc3454c047d64b7aaf599d8bf5f63a08bdfe 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -54,3 +54,46 @@ pub(crate) fn major_rev(&self) -> u8 {
         ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
     }
 }
+
+const PTIMER_TIME_0: usize = 0x00009400;
+const PTIMER_TIME_1: usize = 0x00009410;
+
+#[derive(Copy, Clone, PartialEq, Eq)]
+pub(crate) struct PtimerTime0(u32);
+
+impl PtimerTime0 {
+    #[inline]
+    pub(crate) fn read(bar: &Bar0) -> Self {
+        Self(bar.readl(PTIMER_TIME_0))
+    }
+
+    #[inline]
+    pub(crate) fn write(bar: &Bar0, val: u32) {
+        bar.writel(val, PTIMER_TIME_0)
+    }
+
+    #[inline]
+    pub(crate) fn lo(&self) -> u32 {
+        self.0
+    }
+}
+
+#[derive(Copy, Clone, PartialEq, Eq)]
+pub(crate) struct PtimerTime1(u32);
+
+impl PtimerTime1 {
+    #[inline]
+    pub(crate) fn read(bar: &Bar0) -> Self {
+        Self(bar.readl(PTIMER_TIME_1))
+    }
+
+    #[inline]
+    pub(crate) fn write(bar: &Bar0, val: u32) {
+        bar.writel(val, PTIMER_TIME_1)
+    }
+
+    #[inline]
+    pub(crate) fn hi(&self) -> u32 {
+        self.0
+    }
+}
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
new file mode 100644
index 0000000000000000000000000000000000000000..f6a787d4fbdb90b3dc13e322d50da1c7f64818df
--- /dev/null
+++ b/drivers/gpu/nova-core/timer.rs
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Nova Core Timer subdevice
+
+use core::time::Duration;
+
+use kernel::num::U64Ext;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+pub(crate) struct Timer {}
+
+impl Timer {
+    pub(crate) fn new() -> Self {
+        Self {}
+    }
+
+    pub(crate) fn read(bar: &Bar0) -> u64 {
+        loop {
+            let hi = regs::PtimerTime1::read(bar);
+            let lo = regs::PtimerTime0::read(bar);
+
+            if hi == regs::PtimerTime1::read(bar) {
+                return u64::from_u32s(hi.hi(), lo.lo());
+            }
+        }
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn time(bar: &Bar0, time: u64) {
+        let (hi, lo) = time.into_u32s();
+
+        regs::PtimerTime1::write(bar, hi);
+        regs::PtimerTime0::write(bar, lo);
+    }
+
+    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
+    ///
+    /// When `cond` evaluates to `Some`, its return value is returned.
+    ///
+    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+    /// `Some`, or if the timer device is stuck for some reason.
+    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
+        &self,
+        bar: &Bar0,
+        timeout: Duration,
+        cond: F,
+    ) -> Result<R> {
+        // Number of consecutive time reads after which we consider the timer frozen if it hasn't
+        // moved forward.
+        const MAX_STALLED_READS: usize = 16;
+
+        let (mut cur_time, mut prev_time, deadline) = (|| {
+            let cur_time = Timer::read(bar);
+            let deadline =
+                cur_time.saturating_add(u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX));
+
+            (cur_time, cur_time, deadline)
+        })();
+        let mut num_reads = 0;
+
+        loop {
+            if let Some(ret) = cond() {
+                return Ok(ret);
+            }
+
+            (|| {
+                cur_time = Timer::read(bar);
+
+                /* Check if the timer is frozen for some reason. */
+                if cur_time == prev_time {
+                    if num_reads >= MAX_STALLED_READS {
+                        return Err(ETIMEDOUT);
+                    }
+                    num_reads += 1;
+                } else {
+                    if cur_time >= deadline {
+                        return Err(ETIMEDOUT);
+                    }
+
+                    num_reads = 0;
+                    prev_time = cur_time;
+                }
+
+                Ok(())
+            })()?;
+        }
+    }
+}

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
                   ` (2 preceding siblings ...)
  2025-02-17 14:04 ` [PATCH RFC 3/3] gpu: nova-core: add basic timer device Alexandre Courbot
@ 2025-02-17 15:48 ` Simona Vetter
  2025-02-18  8:07   ` Greg KH
  2025-02-17 21:33 ` Danilo Krummrich
  2025-02-18  1:42 ` Dave Airlie
  5 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-02-17 15:48 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
> Hi everyone,
> 
> This short RFC is based on top of Danilo's initial driver stub series
> [1] and has for goal to initiate discussions and hopefully some design
> decisions using the simplest subdevice of the GPU (the timer) as an
> example, before implementing more devices allowing the GPU
> initialization sequence to progress (Falcon being the logical next step
> so we can get the GSP rolling).
> 
> It is kept simple and short for that purpose, and to avoid bumping into
> a wall with much more device code because my assumptions were incorrect.
> 
> This is my first time trying to write Rust kernel code, and some of my
> questions below are probably due to me not understanding yet how to use
> the core kernel interfaces. So before going further I thought it would
> make sense to raise the most obvious questions that came to my mind
> while writing this draft:
> 
> - Where and how to store subdevices. The timer device is currently a
>   direct member of the GPU structure. It might work for GSP devices
>   which are IIUC supposed to have at least a few fixed devices required
>   to bring the GSP up ; but as a general rule this probably won't scale
>   as not all subdevices are present on all GPU variants, or in the same
>   numbers. So we will probably need to find an equivalent to the
>   `subdev` linked list in Nouveau.
> 
> - BAR sharing between subdevices. Right now each subdevice gets access
>   to the full BAR range. I am wondering whether we could not split it
>   into the relevant slices for each-subdevice, and transfer ownership of
>   each slice to the device that is supposed to use it. That way each
>   register would have a single owner, which is arguably safer - but
>   maybe not as flexible as we will need down the road?
> 
> - On a related note, since the BAR is behind a Devres its availability
>   must first be secured before any hardware access using try_access().
>   Doing this on a per-register or per-operation basis looks overkill, so
>   all methods that access the BAR take a reference to it, allowing to
>   call try_access() from the highest-level caller and thus reducing the
>   number of times this needs to be performed. Doing so comes at the cost
>   of an extra argument to most subdevice methods ; but also with the
>   benefit that we don't need to put the BAR behind another Arc and share
>   it across all subdevices. I don't know which design is better here,
>   and input would be very welcome.
> 
> - We will probably need sometime like a `Subdevice` trait or something
>   down the road, but I'll wait until we have more than one subdevice to
>   think about it.

It might make sense to go with a full-blown aux bus. Gives you a lot of
structures and answers to these questions, but also might be way too much.

So just throwing this a consideration in here.
-Sima

> 
> The first 2 patches are small additions to the core Rust modules, that
> the following patches make use of and which might be useful for other
> drivers as well. The last patch is the naive implementation of the timer
> device. I don't expect it to stay this way at all, so please point out
> all the deficiencies in this very early code! :)
> 
> [1] https://lore.kernel.org/nouveau/20250209173048.17398-1-dakr@kernel.org/
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> Alexandre Courbot (3):
>       rust: add useful ops for u64
>       rust: make ETIMEDOUT error available
>       gpu: nova-core: add basic timer device
> 
>  drivers/gpu/nova-core/driver.rs    |  4 +-
>  drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
>  drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
>  rust/kernel/error.rs               |  1 +
>  rust/kernel/lib.rs                 |  1 +
>  rust/kernel/num.rs                 | 32 ++++++++++++++
>  8 files changed, 206 insertions(+), 2 deletions(-)
> ---
> base-commit: 6484e46f33eac8dd42aa36fa56b51d8daa5ae1c1
> change-id: 20250216-nova_timer-c69430184f54
> 
> Best regards,
> -- 
> Alexandre Courbot <acourbot@nvidia.com>
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
@ 2025-02-17 20:47   ` Sergio González Collado
  2025-02-17 21:10   ` Daniel Almeida
  2025-02-18 10:07   ` Dirk Behme
  2 siblings, 0 replies; 104+ messages in thread
From: Sergio González Collado @ 2025-02-17 20:47 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Mon, 17 Feb 2025 at 15:07, Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> It is common to build a u64 from its high and low parts obtained from
> two 32-bit registers. Conversely, it is also common to split a u64 into
> two u32s to write them into registers. Add an extension trait for u64
> that implement these methods in a new `num` module.
>
> It is expected that this trait will be extended with other useful
> operations, and similar extension traits implemented for other types.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  rust/kernel/lib.rs |  1 +
>  rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
>
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -59,6 +59,7 @@
>  pub mod miscdevice;
>  #[cfg(CONFIG_NET)]
>  pub mod net;
> +pub mod num;
>  pub mod of;
>  pub mod page;
>  #[cfg(CONFIG_PCI)]
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
> --- /dev/null
> +++ b/rust/kernel/num.rs
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Numerical and binary utilities for primitive types.
> +
> +/// Useful operations for `u64`.
> +pub trait U64Ext {
> +    /// Build a `u64` by combining its `high` and `low` parts.
> +    ///
> +    /// ```
> +    /// use kernel::num::U64Ext;
> +    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
> +    /// ```
> +    fn from_u32s(high: u32, low: u32) -> Self;
> +
> +    /// Returns the `(high, low)` u32s that constitute `self`.
> +    ///
> +    /// ```
> +    /// use kernel::num::U64Ext;
> +    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
> +    /// ```
> +    fn into_u32s(self) -> (u32, u32);
> +}
> +
> +impl U64Ext for u64 {
> +    fn from_u32s(high: u32, low: u32) -> Self {
> +        ((high as u64) << u32::BITS) | low as u64
> +    }
> +
> +    fn into_u32s(self) -> (u32, u32) {
> +        ((self >> u32::BITS) as u32, self as u32)
> +    }
> +}
>
> --
> 2.48.1
>
>
Looks good :)
Reviewed-by: Sergio González Collado <sergio.collado@gmail.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
  2025-02-17 20:47   ` Sergio González Collado
@ 2025-02-17 21:10   ` Daniel Almeida
  2025-02-18 13:16     ` Alexandre Courbot
  2025-02-18 10:07   ` Dirk Behme
  2 siblings, 1 reply; 104+ messages in thread
From: Daniel Almeida @ 2025-02-17 21:10 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Alex, 

> On 17 Feb 2025, at 11:04, Alexandre Courbot <acourbot@nvidia.com> wrote:
> 
> It is common to build a u64 from its high and low parts obtained from
> two 32-bit registers. Conversely, it is also common to split a u64 into
> two u32s to write them into registers. Add an extension trait for u64
> that implement these methods in a new `num` module.

Thank you for working on that. I find myself doing this manually extremely often indeed.


> 
> It is expected that this trait will be extended with other useful
> operations, and similar extension traits implemented for other types.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> rust/kernel/lib.rs |  1 +
> rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
> 2 files changed, 33 insertions(+)
> 
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -59,6 +59,7 @@
> pub mod miscdevice;
> #[cfg(CONFIG_NET)]
> pub mod net;
> +pub mod num;
> pub mod of;
> pub mod page;
> #[cfg(CONFIG_PCI)]
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
> --- /dev/null
> +++ b/rust/kernel/num.rs
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Numerical and binary utilities for primitive types.
> +
> +/// Useful operations for `u64`.
> +pub trait U64Ext {
> +    /// Build a `u64` by combining its `high` and `low` parts.
> +    ///
> +    /// ```
> +    /// use kernel::num::U64Ext;
> +    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
> +    /// ```
> +    fn from_u32s(high: u32, low: u32) -> Self;
> +
> +    /// Returns the `(high, low)` u32s that constitute `self`.
> +    ///
> +    /// ```
> +    /// use kernel::num::U64Ext;
> +    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
> +    /// ```
> +    fn into_u32s(self) -> (u32, u32);
> +}
> +
> +impl U64Ext for u64 {
> +    fn from_u32s(high: u32, low: u32) -> Self {
> +        ((high as u64) << u32::BITS) | low as u64
> +    }
> +
> +    fn into_u32s(self) -> (u32, u32) {

I wonder if a struct would make more sense here.

Just recently I had to debug an issue where I forgot the
right order for code I had just written. Something like:

let (pgcount, pgsize) = foo(); where the function actually
returned (pgsize, pgcount).

A proper struct with `high` and `low` might be more verbose, but
it rules out this issue.

> +        ((self >> u32::BITS) as u32, self as u32)
> +    }
> +}
> 
> -- 
> 2.48.1
> 

— Daniel
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 2/3] rust: make ETIMEDOUT error available
  2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
@ 2025-02-17 21:15   ` Daniel Almeida
  0 siblings, 0 replies; 104+ messages in thread
From: Daniel Almeida @ 2025-02-17 21:15 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Alex,

> On 17 Feb 2025, at 11:04, Alexandre Courbot <acourbot@nvidia.com> wrote:
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> rust/kernel/error.rs | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
> index f6ecf09cb65f4ebe9b88da68b3830ae79aa4f182..8858eb13b3df674b54572d2a371b8ec1303492dd 100644
> --- a/rust/kernel/error.rs
> +++ b/rust/kernel/error.rs
> @@ -64,6 +64,7 @@ macro_rules! declare_err {
>     declare_err!(EPIPE, "Broken pipe.");
>     declare_err!(EDOM, "Math argument out of domain of func.");
>     declare_err!(ERANGE, "Math result not representable.");
> +    declare_err!(ETIMEDOUT, "Connection timed out.");
>     declare_err!(ERESTARTSYS, "Restart the system call.");
>     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
>     declare_err!(ERESTARTNOHAND, "Restart if no handler.");
> 
> -- 
> 2.48.1
> 
> 

FYI this is a conflict with https://lore.kernel.org/rust-for-linux/20250207132623.168854-8-fujita.tomonori@gmail.com/

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
                   ` (3 preceding siblings ...)
  2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
@ 2025-02-17 21:33 ` Danilo Krummrich
  2025-02-18  1:46   ` Dave Airlie
  2025-02-18 13:35   ` Alexandre Courbot
  2025-02-18  1:42 ` Dave Airlie
  5 siblings, 2 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-17 21:33 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: David Airlie, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel

Hi Alex,

On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
> Hi everyone,
> 
> This short RFC is based on top of Danilo's initial driver stub series
> [1] and has for goal to initiate discussions and hopefully some design
> decisions using the simplest subdevice of the GPU (the timer) as an
> example, before implementing more devices allowing the GPU
> initialization sequence to progress (Falcon being the logical next step
> so we can get the GSP rolling).
> 
> It is kept simple and short for that purpose, and to avoid bumping into
> a wall with much more device code because my assumptions were incorrect.
> 
> This is my first time trying to write Rust kernel code, and some of my
> questions below are probably due to me not understanding yet how to use
> the core kernel interfaces. So before going further I thought it would
> make sense to raise the most obvious questions that came to my mind
> while writing this draft:

Thanks for sending this RFC, that makes a lot of sense.

It's great to see you picking up work on Nova and Rust in the kernel in general!

One nit: For the future, please make sure to copy in the folks listed under the
RUST entry in the maintainers file explicitly.

> 
> - Where and how to store subdevices. The timer device is currently a
>   direct member of the GPU structure. It might work for GSP devices
>   which are IIUC supposed to have at least a few fixed devices required
>   to bring the GSP up ; but as a general rule this probably won't scale
>   as not all subdevices are present on all GPU variants, or in the same
>   numbers. So we will probably need to find an equivalent to the
>   `subdev` linked list in Nouveau.

Hm...I think a Vec should probably do the job for this. Once we know the
chipset, we know the exact topology of subdevices too.

> 
> - BAR sharing between subdevices. Right now each subdevice gets access
>   to the full BAR range. I am wondering whether we could not split it
>   into the relevant slices for each-subdevice, and transfer ownership of
>   each slice to the device that is supposed to use it. That way each
>   register would have a single owner, which is arguably safer - but
>   maybe not as flexible as we will need down the road?

I think for self-contained subdevices we can easily add an abstraction for
pci_iomap_range() to pci::Device. I considered doing that from the get-go, but
then decided to wait until we have some actual use for that.

For where we have to share a mapping of the same set of registers between
multiple structures, I think we have to embedd in into an Arc (unfortunately,
we can't re-use the inner Arc of Devres for that).

An alternative would be to request a whole new mapping, i.e. Devres<pci::Bar>
instance, but that includes an inner Arc anyways and, hence, is more costly.

> 
> - On a related note, since the BAR is behind a Devres its availability
>   must first be secured before any hardware access using try_access().
>   Doing this on a per-register or per-operation basis looks overkill, so
>   all methods that access the BAR take a reference to it, allowing to
>   call try_access() from the highest-level caller and thus reducing the
>   number of times this needs to be performed. Doing so comes at the cost
>   of an extra argument to most subdevice methods ; but also with the
>   benefit that we don't need to put the BAR behind another Arc and share
>   it across all subdevices. I don't know which design is better here,
>   and input would be very welcome.

I'm not sure I understand you correctly, because what you describe here seem to
be two different things to me.

1. How to avoid unnecessary calls to try_access().

This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
think we can just call try_access() once and then propage the guard through the
callchain, where necessary.

2. Share the MMIO mapping between subdevices.

This is where I can follow. How does 1. help with that? How are 1. and 2.
related?

> 
> - We will probably need sometime like a `Subdevice` trait or something
>   down the road, but I'll wait until we have more than one subdevice to
>   think about it.

Yeah, that sounds reasonable.

> 
> The first 2 patches are small additions to the core Rust modules, that
> the following patches make use of and which might be useful for other
> drivers as well. The last patch is the naive implementation of the timer
> device. I don't expect it to stay this way at all, so please point out
> all the deficiencies in this very early code! :)
> 
> [1] https://lore.kernel.org/nouveau/20250209173048.17398-1-dakr@kernel.org/
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> Alexandre Courbot (3):
>       rust: add useful ops for u64
>       rust: make ETIMEDOUT error available
>       gpu: nova-core: add basic timer device
> 
>  drivers/gpu/nova-core/driver.rs    |  4 +-
>  drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
>  drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
>  rust/kernel/error.rs               |  1 +
>  rust/kernel/lib.rs                 |  1 +
>  rust/kernel/num.rs                 | 32 ++++++++++++++
>  8 files changed, 206 insertions(+), 2 deletions(-)
> ---
> base-commit: 6484e46f33eac8dd42aa36fa56b51d8daa5ae1c1
> change-id: 20250216-nova_timer-c69430184f54
> 
> Best regards,
> -- 
> Alexandre Courbot <acourbot@nvidia.com>
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
                   ` (4 preceding siblings ...)
  2025-02-17 21:33 ` Danilo Krummrich
@ 2025-02-18  1:42 ` Dave Airlie
  2025-02-18 13:47   ` Alexandre Courbot
  5 siblings, 1 reply; 104+ messages in thread
From: Dave Airlie @ 2025-02-18  1:42 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel

On Tue, 18 Feb 2025 at 00:04, Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> Hi everyone,
>
> This short RFC is based on top of Danilo's initial driver stub series
> [1] and has for goal to initiate discussions and hopefully some design
> decisions using the simplest subdevice of the GPU (the timer) as an
> example, before implementing more devices allowing the GPU
> initialization sequence to progress (Falcon being the logical next step
> so we can get the GSP rolling).
>
> It is kept simple and short for that purpose, and to avoid bumping into
> a wall with much more device code because my assumptions were incorrect.
>
> This is my first time trying to write Rust kernel code, and some of my
> questions below are probably due to me not understanding yet how to use
> the core kernel interfaces. So before going further I thought it would
> make sense to raise the most obvious questions that came to my mind
> while writing this draft:
>
> - Where and how to store subdevices. The timer device is currently a
>   direct member of the GPU structure. It might work for GSP devices
>   which are IIUC supposed to have at least a few fixed devices required
>   to bring the GSP up ; but as a general rule this probably won't scale
>   as not all subdevices are present on all GPU variants, or in the same
>   numbers. So we will probably need to find an equivalent to the
>   `subdev` linked list in Nouveau.

I deliberately avoided doing this.

My reasoning is this isn't like nouveau, where we control a bunch of
devices, we have one mission, bring up GSP, if that entails a bunch of
fixed function blocks being setup in a particular order then let's
just deal with that.

If things become optional later we can move to Option<> or just have a
completely new path. But in those cases I'd make the Option
<TuringGSPBootDevices> rather than Option<Sec2>, Option<NVDEC>, etc,
but I think we need to look at the boot requirements on other GSP
devices to know.

I just don't see any case where we need to iterate over the subdevices
in any form of list that makes sense and doesn't lead to the nouveau
design which is a pain in the ass to tease out any sense of ordering
or hierarchy.

Just be explicit, boot the devices you need in the order you need to
get GSP up and running.

>
> - BAR sharing between subdevices. Right now each subdevice gets access
>   to the full BAR range. I am wondering whether we could not split it
>   into the relevant slices for each-subdevice, and transfer ownership of
>   each slice to the device that is supposed to use it. That way each
>   register would have a single owner, which is arguably safer - but
>   maybe not as flexible as we will need down the road?

This could be useful, again the mission is mostly not to be hitting
registers since GSP will deal with it, the only case I know that it
won't work is, the GSP CPU sequencer code gets a script from the
device, the script tells you to directly hit registers, with no real
bounds checking, so I don't know if this is practical.

>
> - On a related note, since the BAR is behind a Devres its availability
>   must first be secured before any hardware access using try_access().
>   Doing this on a per-register or per-operation basis looks overkill, so
>   all methods that access the BAR take a reference to it, allowing to
>   call try_access() from the highest-level caller and thus reducing the
>   number of times this needs to be performed. Doing so comes at the cost
>   of an extra argument to most subdevice methods ; but also with the
>   benefit that we don't need to put the BAR behind another Arc and share
>   it across all subdevices. I don't know which design is better here,
>   and input would be very welcome.

We can't pass down the bar, because it takes a devres lock and it
interferes with lockdep ordering, even hanging onto devres too long
caused me lockdep issues.

We should only be doing try access on registers that are runtime
sized, is this a lot of them? Do we expect to be hitting a lot of
registers in an actual fast path?

> - We will probably need sometime like a `Subdevice` trait or something
>   down the road, but I'll wait until we have more than one subdevice to
>   think about it.

Again I'm kinda opposed to this idea, I don't think it buys anything,
with GSP we just want to boot, after that we never touch most of the
subdevices ever again.

Dave.

>
> The first 2 patches are small additions to the core Rust modules, that
> the following patches make use of and which might be useful for other
> drivers as well. The last patch is the naive implementation of the timer
> device. I don't expect it to stay this way at all, so please point out
> all the deficiencies in this very early code! :)
>
> [1] https://lore.kernel.org/nouveau/20250209173048.17398-1-dakr@kernel.org/
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> Alexandre Courbot (3):
>       rust: add useful ops for u64
>       rust: make ETIMEDOUT error available
>       gpu: nova-core: add basic timer device
>
>  drivers/gpu/nova-core/driver.rs    |  4 +-
>  drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
>  drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
>  rust/kernel/error.rs               |  1 +
>  rust/kernel/lib.rs                 |  1 +
>  rust/kernel/num.rs                 | 32 ++++++++++++++
>  8 files changed, 206 insertions(+), 2 deletions(-)
> ---
> base-commit: 6484e46f33eac8dd42aa36fa56b51d8daa5ae1c1
> change-id: 20250216-nova_timer-c69430184f54
>
> Best regards,
> --
> Alexandre Courbot <acourbot@nvidia.com>
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-17 21:33 ` Danilo Krummrich
@ 2025-02-18  1:46   ` Dave Airlie
  2025-02-18 10:26     ` Danilo Krummrich
  2025-02-24  1:40     ` Alexandre Courbot
  2025-02-18 13:35   ` Alexandre Courbot
  1 sibling, 2 replies; 104+ messages in thread
From: Dave Airlie @ 2025-02-18  1:46 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel

> 1. How to avoid unnecessary calls to try_access().
>
> This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
> think we can just call try_access() once and then propage the guard through the
> callchain, where necessary.

Nope, you can't do that, RevocableGuard holds a lock and things
explode badly in lockdep if you do.

[ 39.960247] =============================
[ 39.960265] [ BUG: Invalid wait context ]
[ 39.960282] 6.12.0-rc2+ #151 Not tainted
[ 39.960298] -----------------------------
[ 39.960316] modprobe/2006 is trying to lock:
[ 39.960335] ffffa08dd7783a68
(drivers/gpu/nova-core/gsp/sharedq.rs:259){....}-{3:3}, at:
_RNvMs0_NtNtCs6v51TV2h8sK_6nova_c3gsp7sharedqNtB5_26GSPSharedQueuesr535_113_018rpc_push+0x34/0x4c0
[nova_core]
[ 39.960413] other info that might help us debug this:
[ 39.960434] context-{4:4}
[ 39.960447] 2 locks held by modprobe/2006:
[ 39.960465] #0: ffffa08dc27581b0 (&dev->mutex){....}-{3:3}, at:
__driver_attach+0x111/0x260
[ 39.960505] #1: ffffffffad55ac10 (rcu_read_lock){....}-{1:2}, at:
rust_helper_rcu_read_lock+0x11/0x80
[ 39.960545] stack backtrace:
[ 39.960559] CPU: 8 UID: 0 PID: 2006 Comm: modprobe Not tainted 6.12.0-rc2+ #151
[ 39.960586] Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 6231 08/31/2024
[ 39.960618] Call Trace:
[ 39.960632] <TASK>

was one time I didn't drop a revocable before proceeding to do other things,

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
@ 2025-02-18  8:07   ` Greg KH
  2025-02-18 13:23     ` Alexandre Courbot
  0 siblings, 1 reply; 104+ messages in thread
From: Greg KH @ 2025-02-18  8:07 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, David Airlie, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel

On Mon, Feb 17, 2025 at 04:48:13PM +0100, Simona Vetter wrote:
> On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
> > Hi everyone,
> > 
> > This short RFC is based on top of Danilo's initial driver stub series
> > [1] and has for goal to initiate discussions and hopefully some design
> > decisions using the simplest subdevice of the GPU (the timer) as an
> > example, before implementing more devices allowing the GPU
> > initialization sequence to progress (Falcon being the logical next step
> > so we can get the GSP rolling).
> > 
> > It is kept simple and short for that purpose, and to avoid bumping into
> > a wall with much more device code because my assumptions were incorrect.
> > 
> > This is my first time trying to write Rust kernel code, and some of my
> > questions below are probably due to me not understanding yet how to use
> > the core kernel interfaces. So before going further I thought it would
> > make sense to raise the most obvious questions that came to my mind
> > while writing this draft:
> > 
> > - Where and how to store subdevices. The timer device is currently a
> >   direct member of the GPU structure. It might work for GSP devices
> >   which are IIUC supposed to have at least a few fixed devices required
> >   to bring the GSP up ; but as a general rule this probably won't scale
> >   as not all subdevices are present on all GPU variants, or in the same
> >   numbers. So we will probably need to find an equivalent to the
> >   `subdev` linked list in Nouveau.
> > 
> > - BAR sharing between subdevices. Right now each subdevice gets access
> >   to the full BAR range. I am wondering whether we could not split it
> >   into the relevant slices for each-subdevice, and transfer ownership of
> >   each slice to the device that is supposed to use it. That way each
> >   register would have a single owner, which is arguably safer - but
> >   maybe not as flexible as we will need down the road?
> > 
> > - On a related note, since the BAR is behind a Devres its availability
> >   must first be secured before any hardware access using try_access().
> >   Doing this on a per-register or per-operation basis looks overkill, so
> >   all methods that access the BAR take a reference to it, allowing to
> >   call try_access() from the highest-level caller and thus reducing the
> >   number of times this needs to be performed. Doing so comes at the cost
> >   of an extra argument to most subdevice methods ; but also with the
> >   benefit that we don't need to put the BAR behind another Arc and share
> >   it across all subdevices. I don't know which design is better here,
> >   and input would be very welcome.
> > 
> > - We will probably need sometime like a `Subdevice` trait or something
> >   down the road, but I'll wait until we have more than one subdevice to
> >   think about it.
> 
> It might make sense to go with a full-blown aux bus. Gives you a lot of
> structures and answers to these questions, but also might be way too much.

No, it's not too much, that's exactly what the auxbus code is for
(splitting a real device into child ones where they all share the same
physical resources.)  So good suggestion.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
  2025-02-17 20:47   ` Sergio González Collado
  2025-02-17 21:10   ` Daniel Almeida
@ 2025-02-18 10:07   ` Dirk Behme
  2025-02-18 13:07     ` Alexandre Courbot
  2 siblings, 1 reply; 104+ messages in thread
From: Dirk Behme @ 2025-02-18 10:07 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, David Airlie, John Hubbard,
	Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel

On 17/02/2025 15:04, Alexandre Courbot wrote:
> It is common to build a u64 from its high and low parts obtained from
> two 32-bit registers. Conversely, it is also common to split a u64 into
> two u32s to write them into registers. Add an extension trait for u64
> that implement these methods in a new `num` module.
> 
> It is expected that this trait will be extended with other useful
> operations, and similar extension traits implemented for other types.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  rust/kernel/lib.rs |  1 +
>  rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -59,6 +59,7 @@
>  pub mod miscdevice;
>  #[cfg(CONFIG_NET)]
>  pub mod net;
> +pub mod num;
>  pub mod of;
>  pub mod page;
>  #[cfg(CONFIG_PCI)]
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
> --- /dev/null
> +++ b/rust/kernel/num.rs
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Numerical and binary utilities for primitive types.
> +
> +/// Useful operations for `u64`.
> +pub trait U64Ext {
> +    /// Build a `u64` by combining its `high` and `low` parts.
> +    ///
> +    /// ```
> +    /// use kernel::num::U64Ext;
> +    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
> +    /// ```
> +    fn from_u32s(high: u32, low: u32) -> Self;
> +
> +    /// Returns the `(high, low)` u32s that constitute `self`.
> +    ///
> +    /// ```
> +    /// use kernel::num::U64Ext;
> +    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
> +    /// ```
> +    fn into_u32s(self) -> (u32, u32);
> +}
> +
> +impl U64Ext for u64 {
> +    fn from_u32s(high: u32, low: u32) -> Self {
> +        ((high as u64) << u32::BITS) | low as u64
> +    }
> +
> +    fn into_u32s(self) -> (u32, u32) {
> +        ((self >> u32::BITS) as u32, self as u32)
> +    }
> +}
Just as a question: Would it make sense to make this more generic?

For example

u64 -> u32, u32 / u32, u32 -> u64 (as done here)
u32 -> u16, u16 / u16, u16 -> u32
u16 -> u8, u8 / u8, u8 -> u16

Additionally, I wonder if this might be combined with the Integer trait
[1]? But the usize and signed ones might not make sense here...

Dirk

[1] E.g.

https://github.com/senekor/linux/commit/7291dcc98e8ab74e34c1600784ec9ff3e2fa32d0


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-18  1:46   ` Dave Airlie
@ 2025-02-18 10:26     ` Danilo Krummrich
  2025-02-19 12:58       ` Simona Vetter
  2025-02-24  1:40     ` Alexandre Courbot
  1 sibling, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-18 10:26 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Alexandre Courbot, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel

On Tue, Feb 18, 2025 at 11:46:26AM +1000, Dave Airlie wrote:
> > 1. How to avoid unnecessary calls to try_access().
> >
> > This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
> > think we can just call try_access() once and then propage the guard through the
> > callchain, where necessary.
> 
> Nope, you can't do that, RevocableGuard holds a lock and things
> explode badly in lockdep if you do.

Yes, try_access() marks the begin of an RCU read side critical section. Hence,
sections holding the guard should be kept as short as possible.

What I meant is that for a series of I/O operations we can still pass the guard
to subsequent functions doing the actual I/O ops.

More generally, I also thought about whether we should also provide an SRCU
variant of Revocable and hence Devres. Maybe we even want to replace it with
SRCU entirely to ensure that drivers can't stall the RCU grace period for too
long by accident.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-18 10:07   ` Dirk Behme
@ 2025-02-18 13:07     ` Alexandre Courbot
  2025-02-20  6:23       ` Dirk Behme
  0 siblings, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-18 13:07 UTC (permalink / raw)
  To: Dirk Behme, Danilo Krummrich, David Airlie, John Hubbard,
	Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue Feb 18, 2025 at 7:07 PM JST, Dirk Behme wrote:
> On 17/02/2025 15:04, Alexandre Courbot wrote:
>> It is common to build a u64 from its high and low parts obtained from
>> two 32-bit registers. Conversely, it is also common to split a u64 into
>> two u32s to write them into registers. Add an extension trait for u64
>> that implement these methods in a new `num` module.
>> 
>> It is expected that this trait will be extended with other useful
>> operations, and similar extension traits implemented for other types.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  rust/kernel/lib.rs |  1 +
>>  rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
>>  2 files changed, 33 insertions(+)
>> 
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -59,6 +59,7 @@
>>  pub mod miscdevice;
>>  #[cfg(CONFIG_NET)]
>>  pub mod net;
>> +pub mod num;
>>  pub mod of;
>>  pub mod page;
>>  #[cfg(CONFIG_PCI)]
>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
>> --- /dev/null
>> +++ b/rust/kernel/num.rs
>> @@ -0,0 +1,32 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Numerical and binary utilities for primitive types.
>> +
>> +/// Useful operations for `u64`.
>> +pub trait U64Ext {
>> +    /// Build a `u64` by combining its `high` and `low` parts.
>> +    ///
>> +    /// ```
>> +    /// use kernel::num::U64Ext;
>> +    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
>> +    /// ```
>> +    fn from_u32s(high: u32, low: u32) -> Self;
>> +
>> +    /// Returns the `(high, low)` u32s that constitute `self`.
>> +    ///
>> +    /// ```
>> +    /// use kernel::num::U64Ext;
>> +    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
>> +    /// ```
>> +    fn into_u32s(self) -> (u32, u32);
>> +}
>> +
>> +impl U64Ext for u64 {
>> +    fn from_u32s(high: u32, low: u32) -> Self {
>> +        ((high as u64) << u32::BITS) | low as u64
>> +    }
>> +
>> +    fn into_u32s(self) -> (u32, u32) {
>> +        ((self >> u32::BITS) as u32, self as u32)
>> +    }
>> +}
> Just as a question: Would it make sense to make this more generic?
>
> For example
>
> u64 -> u32, u32 / u32, u32 -> u64 (as done here)
> u32 -> u16, u16 / u16, u16 -> u32
> u16 -> u8, u8 / u8, u8 -> u16
>
> Additionally, I wonder if this might be combined with the Integer trait
> [1]? But the usize and signed ones might not make sense here...
>
> Dirk
>
> [1] E.g.
>
> https://github.com/senekor/linux/commit/7291dcc98e8ab74e34c1600784ec9ff3e2fa32d0

I agree something more generic would be nice. One drawback I see though
is that it would have to use more generic (and lengthy) method names -
i.e. `from_components(u32, u32)` instead of `from_u32s`.

I quickly tried to write a completely generic trait where the methods
are auto-implemented from constants and associated types, but got stuck
by the impossibility to use `as` in that context without a macro.

Regardless, I was looking for an already existing trait/module to
leverage instead of introducing a whole new one, maybe the one you
linked is what I was looking for?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-17 21:10   ` Daniel Almeida
@ 2025-02-18 13:16     ` Alexandre Courbot
  2025-02-18 20:51       ` Timur Tabi
  0 siblings, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-18 13:16 UTC (permalink / raw)
  To: Daniel Almeida
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Daniel!

On Tue Feb 18, 2025 at 6:10 AM JST, Daniel Almeida wrote:
> Hi Alex, 
>
>> On 17 Feb 2025, at 11:04, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> 
>> It is common to build a u64 from its high and low parts obtained from
>> two 32-bit registers. Conversely, it is also common to split a u64 into
>> two u32s to write them into registers. Add an extension trait for u64
>> that implement these methods in a new `num` module.
>
> Thank you for working on that. I find myself doing this manually extremely often indeed.

Are you aware of existing upstream code that could benefit from this?
This would allow me to split that patch out of this series.

>
>
>> 
>> It is expected that this trait will be extended with other useful
>> operations, and similar extension traits implemented for other types.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> rust/kernel/lib.rs |  1 +
>> rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
>> 2 files changed, 33 insertions(+)
>> 
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -59,6 +59,7 @@
>> pub mod miscdevice;
>> #[cfg(CONFIG_NET)]
>> pub mod net;
>> +pub mod num;
>> pub mod of;
>> pub mod page;
>> #[cfg(CONFIG_PCI)]
>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
>> --- /dev/null
>> +++ b/rust/kernel/num.rs
>> @@ -0,0 +1,32 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Numerical and binary utilities for primitive types.
>> +
>> +/// Useful operations for `u64`.
>> +pub trait U64Ext {
>> +    /// Build a `u64` by combining its `high` and `low` parts.
>> +    ///
>> +    /// ```
>> +    /// use kernel::num::U64Ext;
>> +    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
>> +    /// ```
>> +    fn from_u32s(high: u32, low: u32) -> Self;
>> +
>> +    /// Returns the `(high, low)` u32s that constitute `self`.
>> +    ///
>> +    /// ```
>> +    /// use kernel::num::U64Ext;
>> +    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
>> +    /// ```
>> +    fn into_u32s(self) -> (u32, u32);
>> +}
>> +
>> +impl U64Ext for u64 {
>> +    fn from_u32s(high: u32, low: u32) -> Self {
>> +        ((high as u64) << u32::BITS) | low as u64
>> +    }
>> +
>> +    fn into_u32s(self) -> (u32, u32) {
>
> I wonder if a struct would make more sense here.
>
> Just recently I had to debug an issue where I forgot the
> right order for code I had just written. Something like:
>
> let (pgcount, pgsize) = foo(); where the function actually
> returned (pgsize, pgcount).
>
> A proper struct with `high` and `low` might be more verbose, but
> it rules out this issue.

Mmm indeed, so we would have client code looking like:

  let SplitU64 { high, low } = some_u64.into_u32();

instead of 

  let (high, low) = some_u64.into_u32();

which is correct, and

  let (low, high) = some_u64.into_u32();

which is incorrect, but is likely to not be caught.

Since the point of these methods is to avoid potential errors in what
otherwise appears to be trivial code, I agree it would be better to
avoid introducing a new trap because the elements of the returned tuple
are not clearly named. Not sure which is more idiomatic here.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-18  8:07   ` Greg KH
@ 2025-02-18 13:23     ` Alexandre Courbot
  0 siblings, 0 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Greg KH, Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue Feb 18, 2025 at 5:07 PM JST, Greg KH wrote:
> On Mon, Feb 17, 2025 at 04:48:13PM +0100, Simona Vetter wrote:
>> On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
>> > Hi everyone,
>> > 
>> > This short RFC is based on top of Danilo's initial driver stub series
>> > [1] and has for goal to initiate discussions and hopefully some design
>> > decisions using the simplest subdevice of the GPU (the timer) as an
>> > example, before implementing more devices allowing the GPU
>> > initialization sequence to progress (Falcon being the logical next step
>> > so we can get the GSP rolling).
>> > 
>> > It is kept simple and short for that purpose, and to avoid bumping into
>> > a wall with much more device code because my assumptions were incorrect.
>> > 
>> > This is my first time trying to write Rust kernel code, and some of my
>> > questions below are probably due to me not understanding yet how to use
>> > the core kernel interfaces. So before going further I thought it would
>> > make sense to raise the most obvious questions that came to my mind
>> > while writing this draft:
>> > 
>> > - Where and how to store subdevices. The timer device is currently a
>> >   direct member of the GPU structure. It might work for GSP devices
>> >   which are IIUC supposed to have at least a few fixed devices required
>> >   to bring the GSP up ; but as a general rule this probably won't scale
>> >   as not all subdevices are present on all GPU variants, or in the same
>> >   numbers. So we will probably need to find an equivalent to the
>> >   `subdev` linked list in Nouveau.
>> > 
>> > - BAR sharing between subdevices. Right now each subdevice gets access
>> >   to the full BAR range. I am wondering whether we could not split it
>> >   into the relevant slices for each-subdevice, and transfer ownership of
>> >   each slice to the device that is supposed to use it. That way each
>> >   register would have a single owner, which is arguably safer - but
>> >   maybe not as flexible as we will need down the road?
>> > 
>> > - On a related note, since the BAR is behind a Devres its availability
>> >   must first be secured before any hardware access using try_access().
>> >   Doing this on a per-register or per-operation basis looks overkill, so
>> >   all methods that access the BAR take a reference to it, allowing to
>> >   call try_access() from the highest-level caller and thus reducing the
>> >   number of times this needs to be performed. Doing so comes at the cost
>> >   of an extra argument to most subdevice methods ; but also with the
>> >   benefit that we don't need to put the BAR behind another Arc and share
>> >   it across all subdevices. I don't know which design is better here,
>> >   and input would be very welcome.
>> > 
>> > - We will probably need sometime like a `Subdevice` trait or something
>> >   down the road, but I'll wait until we have more than one subdevice to
>> >   think about it.
>> 
>> It might make sense to go with a full-blown aux bus. Gives you a lot of
>> structures and answers to these questions, but also might be way too much.
>
> No, it's not too much, that's exactly what the auxbus code is for
> (splitting a real device into child ones where they all share the same
> physical resources.)  So good suggestion.

Dave's comments have somehow convinced me that we probably won't need to
do something as complex as I initially planned, so hopefully it won't
come to that. :) But thanks for the suggestion, I'll keep it in mind
just in case.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-17 21:33 ` Danilo Krummrich
  2025-02-18  1:46   ` Dave Airlie
@ 2025-02-18 13:35   ` Alexandre Courbot
  1 sibling, 0 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-18 13:35 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: David Airlie, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel

Hi Danilo,

On Tue Feb 18, 2025 at 6:33 AM JST, Danilo Krummrich wrote:
> Hi Alex,
>
> On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
>> Hi everyone,
>> 
>> This short RFC is based on top of Danilo's initial driver stub series
>> [1] and has for goal to initiate discussions and hopefully some design
>> decisions using the simplest subdevice of the GPU (the timer) as an
>> example, before implementing more devices allowing the GPU
>> initialization sequence to progress (Falcon being the logical next step
>> so we can get the GSP rolling).
>> 
>> It is kept simple and short for that purpose, and to avoid bumping into
>> a wall with much more device code because my assumptions were incorrect.
>> 
>> This is my first time trying to write Rust kernel code, and some of my
>> questions below are probably due to me not understanding yet how to use
>> the core kernel interfaces. So before going further I thought it would
>> make sense to raise the most obvious questions that came to my mind
>> while writing this draft:
>
> Thanks for sending this RFC, that makes a lot of sense.
>
> It's great to see you picking up work on Nova and Rust in the kernel in general!
>
> One nit: For the future, please make sure to copy in the folks listed under the
> RUST entry in the maintainers file explicitly.

Ack. I tend to get nervous as the list of recipients increases and
reduce it to the people I believe will be strictly interested, but will
refrain from doing that in the future.

>
>> 
>> - Where and how to store subdevices. The timer device is currently a
>>   direct member of the GPU structure. It might work for GSP devices
>>   which are IIUC supposed to have at least a few fixed devices required
>>   to bring the GSP up ; but as a general rule this probably won't scale
>>   as not all subdevices are present on all GPU variants, or in the same
>>   numbers. So we will probably need to find an equivalent to the
>>   `subdev` linked list in Nouveau.
>
> Hm...I think a Vec should probably do the job for this. Once we know the
> chipset, we know the exact topology of subdevices too.
>
>> 
>> - BAR sharing between subdevices. Right now each subdevice gets access
>>   to the full BAR range. I am wondering whether we could not split it
>>   into the relevant slices for each-subdevice, and transfer ownership of
>>   each slice to the device that is supposed to use it. That way each
>>   register would have a single owner, which is arguably safer - but
>>   maybe not as flexible as we will need down the road?
>
> I think for self-contained subdevices we can easily add an abstraction for
> pci_iomap_range() to pci::Device. I considered doing that from the get-go, but
> then decided to wait until we have some actual use for that.

Yeah, my comment was just to check whether this makes sense at all, we
can definitely live without it for now. Would be a nice safety addition
though IMHO.

>
> For where we have to share a mapping of the same set of registers between
> multiple structures, I think we have to embedd in into an Arc (unfortunately,
> we can't re-use the inner Arc of Devres for that).
>
> An alternative would be to request a whole new mapping, i.e. Devres<pci::Bar>
> instance, but that includes an inner Arc anyways and, hence, is more costly.

Another way could be to make the owning subdevice itself implement the
required functionality through a method that other devices could call as
needed.

>
>> 
>> - On a related note, since the BAR is behind a Devres its availability
>>   must first be secured before any hardware access using try_access().
>>   Doing this on a per-register or per-operation basis looks overkill, so
>>   all methods that access the BAR take a reference to it, allowing to
>>   call try_access() from the highest-level caller and thus reducing the
>>   number of times this needs to be performed. Doing so comes at the cost
>>   of an extra argument to most subdevice methods ; but also with the
>>   benefit that we don't need to put the BAR behind another Arc and share
>>   it across all subdevices. I don't know which design is better here,
>>   and input would be very welcome.
>
> I'm not sure I understand you correctly, because what you describe here seem to
> be two different things to me.
>
> 1. How to avoid unnecessary calls to try_access().
>
> This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
> think we can just call try_access() once and then propage the guard through the
> callchain, where necessary.
>
> 2. Share the MMIO mapping between subdevices.
>
> This is where I can follow. How does 1. help with that? How are 1. and 2.
> related?

The idea was that by having the Gpu device secure access to the Bar and
pass a reference to the guard (or to the Bar itself since the guard
implements Deref, as I mentioned in [1]), we can avoid the repeated
calls to try_access() AND "share" the Bar between all the subdevices
through an argument instead of e.g. wrapping it into another Arc that
each subdevice would store.

But as Dave pointed out, it looks like this won't work in practice
anyway, so we can probably drop that idea...

[1] https://lore.kernel.org/nouveau/D7Q79WJZSFEK.R9BX1V85SV1Z@nvidia.com/

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-18  1:42 ` Dave Airlie
@ 2025-02-18 13:47   ` Alexandre Courbot
  0 siblings, 0 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-18 13:47 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Danilo Krummrich, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel

Hi Dave,

On Tue Feb 18, 2025 at 10:42 AM JST, Dave Airlie wrote:
> On Tue, 18 Feb 2025 at 00:04, Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> Hi everyone,
>>
>> This short RFC is based on top of Danilo's initial driver stub series
>> [1] and has for goal to initiate discussions and hopefully some design
>> decisions using the simplest subdevice of the GPU (the timer) as an
>> example, before implementing more devices allowing the GPU
>> initialization sequence to progress (Falcon being the logical next step
>> so we can get the GSP rolling).
>>
>> It is kept simple and short for that purpose, and to avoid bumping into
>> a wall with much more device code because my assumptions were incorrect.
>>
>> This is my first time trying to write Rust kernel code, and some of my
>> questions below are probably due to me not understanding yet how to use
>> the core kernel interfaces. So before going further I thought it would
>> make sense to raise the most obvious questions that came to my mind
>> while writing this draft:
>>
>> - Where and how to store subdevices. The timer device is currently a
>>   direct member of the GPU structure. It might work for GSP devices
>>   which are IIUC supposed to have at least a few fixed devices required
>>   to bring the GSP up ; but as a general rule this probably won't scale
>>   as not all subdevices are present on all GPU variants, or in the same
>>   numbers. So we will probably need to find an equivalent to the
>>   `subdev` linked list in Nouveau.
>
> I deliberately avoided doing this.
>
> My reasoning is this isn't like nouveau, where we control a bunch of
> devices, we have one mission, bring up GSP, if that entails a bunch of
> fixed function blocks being setup in a particular order then let's
> just deal with that.
>
> If things become optional later we can move to Option<> or just have a
> completely new path. But in those cases I'd make the Option
> <TuringGSPBootDevices> rather than Option<Sec2>, Option<NVDEC>, etc,
> but I think we need to look at the boot requirements on other GSP
> devices to know.
>
> I just don't see any case where we need to iterate over the subdevices
> in any form of list that makes sense and doesn't lead to the nouveau
> design which is a pain in the ass to tease out any sense of ordering
> or hierarchy.
>
> Just be explicit, boot the devices you need in the order you need to
> get GSP up and running.

Right, I was looking too far ahead and lost track of the fact that our
main purpose is to get the GSP to run. For that a fixed set of devices
should do the job just fine.

I still suspect that at some point we will need to keep some kernel-side
state for the functions supported by the GSP thus will have to introduce
more flexibility, but let's think about it when we arrive there. Core
GSP boot + communication is fixed.

>
>>
>> - BAR sharing between subdevices. Right now each subdevice gets access
>>   to the full BAR range. I am wondering whether we could not split it
>>   into the relevant slices for each-subdevice, and transfer ownership of
>>   each slice to the device that is supposed to use it. That way each
>>   register would have a single owner, which is arguably safer - but
>>   maybe not as flexible as we will need down the road?
>
> This could be useful, again the mission is mostly not to be hitting
> registers since GSP will deal with it, the only case I know that it
> won't work is, the GSP CPU sequencer code gets a script from the
> device, the script tells you to directly hit registers, with no real
> bounds checking, so I don't know if this is practical.
>
>>
>> - On a related note, since the BAR is behind a Devres its availability
>>   must first be secured before any hardware access using try_access().
>>   Doing this on a per-register or per-operation basis looks overkill, so
>>   all methods that access the BAR take a reference to it, allowing to
>>   call try_access() from the highest-level caller and thus reducing the
>>   number of times this needs to be performed. Doing so comes at the cost
>>   of an extra argument to most subdevice methods ; but also with the
>>   benefit that we don't need to put the BAR behind another Arc and share
>>   it across all subdevices. I don't know which design is better here,
>>   and input would be very welcome.
>
> We can't pass down the bar, because it takes a devres lock and it
> interferes with lockdep ordering, even hanging onto devres too long
> caused me lockdep issues.
>
> We should only be doing try access on registers that are runtime
> sized, is this a lot of them? Do we expect to be hitting a lot of
> registers in an actual fast path?

For this particular driver, I don't think we do. But other drivers will
probably also have this issue, and the challenge for them will to be
find the right granularity at which to invoke try_access(). My worry
being that this can explode down the road without warning if invoked at
the wrong place, which is the kind of behavior we are trying to avoid by
introducing Rust.

>
>> - We will probably need sometime like a `Subdevice` trait or something
>>   down the road, but I'll wait until we have more than one subdevice to
>>   think about it.
>
> Again I'm kinda opposed to this idea, I don't think it buys anything,
> with GSP we just want to boot, after that we never touch most of the
> subdevices ever again.

Yep, it's definitely not the thing we need to worry at the moment.
Minimal static set of devices, and let's get the GSP running. :)

Thanks,
Alex.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-18 13:16     ` Alexandre Courbot
@ 2025-02-18 20:51       ` Timur Tabi
  2025-02-19  1:21         ` Alexandre Courbot
  0 siblings, 1 reply; 104+ messages in thread
From: Timur Tabi @ 2025-02-18 20:51 UTC (permalink / raw)
  To: Alexandre Courbot, daniel.almeida@collabora.com
  Cc: John Hubbard, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org,
	nouveau@lists.freedesktop.org, dakr@kernel.org, airlied@gmail.com,
	Ben Skeggs

On Tue, 2025-02-18 at 22:16 +0900, Alexandre Courbot wrote:
> > A proper struct with `high` and `low` might be more verbose, but
> > it rules out this issue.
> 
> Mmm indeed, so we would have client code looking like:
> 
>   let SplitU64 { high, low } = some_u64.into_u32();
> 
> instead of 
> 
>   let (high, low) = some_u64.into_u32();
> 
> which is correct, and
> 
>   let (low, high) = some_u64.into_u32();
> 
> which is incorrect, but is likely to not be caught.

I'm new to Rust, so let me see if I get this right.

struct SplitU64 {
  high: u32,
  low: u32
}

So if you want to extract the upper 32 bits of a u64, you have to do this:

let split = some_u64.into_u32s();
let some_u32 = split.high;

as opposed to your original design:

let (some_u32, _) = some_u64.into_u32s();

Personally, I prefer the latter.  The other advantage is that into_u32s and
from_u32s are reciprocal:

assert_eq!(u64::from_u32s(u64::into_u32s(some_u64)), some_u64);

(or something like that)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-18 20:51       ` Timur Tabi
@ 2025-02-19  1:21         ` Alexandre Courbot
  2025-02-19  3:24           ` John Hubbard
  2025-02-19 20:11           ` Sergio González Collado
  0 siblings, 2 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-19  1:21 UTC (permalink / raw)
  To: Timur Tabi, Alexandre Courbot, daniel.almeida@collabora.com
  Cc: John Hubbard, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org,
	nouveau@lists.freedesktop.org, dakr@kernel.org, airlied@gmail.com,
	Ben Skeggs

On Wed Feb 19, 2025 at 5:51 AM JST, Timur Tabi wrote:
> On Tue, 2025-02-18 at 22:16 +0900, Alexandre Courbot wrote:
>> > A proper struct with `high` and `low` might be more verbose, but
>> > it rules out this issue.
>> 
>> Mmm indeed, so we would have client code looking like:
>> 
>>   let SplitU64 { high, low } = some_u64.into_u32();
>> 
>> instead of 
>> 
>>   let (high, low) = some_u64.into_u32();
>> 
>> which is correct, and
>> 
>>   let (low, high) = some_u64.into_u32();
>> 
>> which is incorrect, but is likely to not be caught.
>
> I'm new to Rust, so let me see if I get this right.
>
> struct SplitU64 {
>   high: u32,
>   low: u32
> }
>
> So if you want to extract the upper 32 bits of a u64, you have to do this:
>
> let split = some_u64.into_u32s();
> let some_u32 = split.high;

More likely this would be something like:

  let SplitU64 { high: some_u32, .. } = some_u64;

Which is still a bit verbose, but a single-liner.

Actually. How about adding methods to this trait that return either
component?

  let some_u32 = some_u64.high_half();
  let another_u32 = some_u64.low_half();

These should be used most of the times, and using destructuring/tuple
would only be useful for a few select cases.

>
> as opposed to your original design:
>
> let (some_u32, _) = some_u64.into_u32s();
>
> Personally, I prefer the latter.  The other advantage is that into_u32s and
> from_u32s are reciprocal:
>
> assert_eq!(u64::from_u32s(u64::into_u32s(some_u64)), some_u64);
>
> (or something like that)

Yeah, having symmetry is definitely nice. OTOH there are no safeguards
against mixing up the order and the high and low components, so a
compromise will have to be made one way or the other. But if we also add
the methods I proposed above, that question should matter less.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19  1:21         ` Alexandre Courbot
@ 2025-02-19  3:24           ` John Hubbard
  2025-02-19 12:51             ` Alexandre Courbot
  2025-02-19 20:11           ` Sergio González Collado
  1 sibling, 1 reply; 104+ messages in thread
From: John Hubbard @ 2025-02-19  3:24 UTC (permalink / raw)
  To: Alexandre Courbot, Timur Tabi, daniel.almeida@collabora.com
  Cc: dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	dakr@kernel.org, airlied@gmail.com, Ben Skeggs

On 2/18/25 5:21 PM, Alexandre Courbot wrote:
> On Wed Feb 19, 2025 at 5:51 AM JST, Timur Tabi wrote:
>> On Tue, 2025-02-18 at 22:16 +0900, Alexandre Courbot wrote:
...
> More likely this would be something like:
> 
>    let SplitU64 { high: some_u32, .. } = some_u64;
> 
> Which is still a bit verbose, but a single-liner.
> 
> Actually. How about adding methods to this trait that return either
> component?
> 
>    let some_u32 = some_u64.high_half();
>    let another_u32 = some_u64.low_half();
> 
> These should be used most of the times, and using destructuring/tuple
> would only be useful for a few select cases.

I think I like this approach best so far, because that is actually how
drivers tend to use these values: one or the other 32 bits at a time.
Registers are often grouped into 32-bit named registers, and driver code
wants to refer to them one at a time (before breaking some of them down
into smaller named fields)>

The .high_half() and .low_half() approach matches that very closely.
And it's simpler to read than the SplitU64 API, without losing anything
we need, right?

  
thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19  3:24           ` John Hubbard
@ 2025-02-19 12:51             ` Alexandre Courbot
  2025-02-19 20:22               ` John Hubbard
  0 siblings, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-19 12:51 UTC (permalink / raw)
  To: John Hubbard, Alexandre Courbot, Timur Tabi,
	daniel.almeida@collabora.com
  Cc: dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	dakr@kernel.org, airlied@gmail.com, Ben Skeggs

On Wed Feb 19, 2025 at 12:24 PM JST, John Hubbard wrote:
> On 2/18/25 5:21 PM, Alexandre Courbot wrote:
>> On Wed Feb 19, 2025 at 5:51 AM JST, Timur Tabi wrote:
>>> On Tue, 2025-02-18 at 22:16 +0900, Alexandre Courbot wrote:
> ...
>> More likely this would be something like:
>> 
>>    let SplitU64 { high: some_u32, .. } = some_u64;
>> 
>> Which is still a bit verbose, but a single-liner.
>> 
>> Actually. How about adding methods to this trait that return either
>> component?
>> 
>>    let some_u32 = some_u64.high_half();
>>    let another_u32 = some_u64.low_half();
>> 
>> These should be used most of the times, and using destructuring/tuple
>> would only be useful for a few select cases.
>
> I think I like this approach best so far, because that is actually how
> drivers tend to use these values: one or the other 32 bits at a time.
> Registers are often grouped into 32-bit named registers, and driver code
> wants to refer to them one at a time (before breaking some of them down
> into smaller named fields)>
>
> The .high_half() and .low_half() approach matches that very closely.
> And it's simpler to read than the SplitU64 API, without losing anything
> we need, right?

Yes, that looks like the optimal way to do this actually. It also
doesn't introduce any overhead as the destructuring was doing both
high_half() and low_half() in sequence, so in some cases it might
even be more efficient.

I'd just like to find a better naming. high() and low() might be enough?
Or are there other suggestions?


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-18 10:26     ` Danilo Krummrich
@ 2025-02-19 12:58       ` Simona Vetter
  0 siblings, 0 replies; 104+ messages in thread
From: Simona Vetter @ 2025-02-19 12:58 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Dave Airlie, Alexandre Courbot, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue, Feb 18, 2025 at 11:26:01AM +0100, Danilo Krummrich wrote:
> On Tue, Feb 18, 2025 at 11:46:26AM +1000, Dave Airlie wrote:
> > > 1. How to avoid unnecessary calls to try_access().
> > >
> > > This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
> > > think we can just call try_access() once and then propage the guard through the
> > > callchain, where necessary.
> > 
> > Nope, you can't do that, RevocableGuard holds a lock and things
> > explode badly in lockdep if you do.
> 
> Yes, try_access() marks the begin of an RCU read side critical section. Hence,
> sections holding the guard should be kept as short as possible.
> 
> What I meant is that for a series of I/O operations we can still pass the guard
> to subsequent functions doing the actual I/O ops.
> 
> More generally, I also thought about whether we should also provide an SRCU
> variant of Revocable and hence Devres. Maybe we even want to replace it with
> SRCU entirely to ensure that drivers can't stall the RCU grace period for too
> long by accident.

The issue with srcu is that the revocation on driver unload or hotunplug
becomes unbound. Which is very, very uncool, and the fundamental issue
that also drm_dev_enter/exit() struggles with. So unless the kernel has a
concept of "bound-time mutex only, but not unbounded sleeps of any sort" I
think we should try really, really hard to introduce srcu revocables on
the rust side. And at least for mmio I don't think any driver needs more
than maybe some retry loops while holding a spinlock, which is fine under
standard rcu. It does mean that drivers need to have fairly fine-grained
fallible paths for dealing with revocable resources, but I think that's
also a good thing - mmio to an unplugged device times out and is really
slow, so doing too many of those isn't a great idea either.

Ofc on the C side of things the balance shits a lot, and we just want to
at least make "no uaf on driver hotunplug" something achievable and hence
are much more ok with the risk that it's just stuck forever or driver
calls take an undue amount of time until they've finished timing out
everything.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19  1:21         ` Alexandre Courbot
  2025-02-19  3:24           ` John Hubbard
@ 2025-02-19 20:11           ` Sergio González Collado
  1 sibling, 0 replies; 104+ messages in thread
From: Sergio González Collado @ 2025-02-19 20:11 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Timur Tabi, daniel.almeida@collabora.com, John Hubbard,
	dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	dakr@kernel.org, airlied@gmail.com, Ben Skeggs

> Actually. How about adding methods to this trait that return either
> component?
>
>   let some_u32 = some_u64.high_half();
>   let another_u32 = some_u64.low_half();
>
> These should be used most of the times, and using destructuring/tuple
> would only be useful for a few select cases.
>

Indeed very nice!

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19 12:51             ` Alexandre Courbot
@ 2025-02-19 20:22               ` John Hubbard
  2025-02-19 20:23                 ` Dave Airlie
  0 siblings, 1 reply; 104+ messages in thread
From: John Hubbard @ 2025-02-19 20:22 UTC (permalink / raw)
  To: Alexandre Courbot, Timur Tabi, daniel.almeida@collabora.com
  Cc: dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	dakr@kernel.org, airlied@gmail.com, Ben Skeggs

On 2/19/25 4:51 AM, Alexandre Courbot wrote:
> Yes, that looks like the optimal way to do this actually. It also
> doesn't introduce any overhead as the destructuring was doing both
> high_half() and low_half() in sequence, so in some cases it might
> even be more efficient.
> 
> I'd just like to find a better naming. high() and low() might be enough?
> Or are there other suggestions?
> 

Maybe use "32" instead of "half":

     .high_32()  / .low_32()
     .upper_32() / .lower_32()


thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19 20:22               ` John Hubbard
@ 2025-02-19 20:23                 ` Dave Airlie
  2025-02-19 23:13                   ` Daniel Almeida
  0 siblings, 1 reply; 104+ messages in thread
From: Dave Airlie @ 2025-02-19 20:23 UTC (permalink / raw)
  To: John Hubbard
  Cc: Alexandre Courbot, Timur Tabi, daniel.almeida@collabora.com,
	dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	dakr@kernel.org, Ben Skeggs

On Thu, 20 Feb 2025 at 06:22, John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 2/19/25 4:51 AM, Alexandre Courbot wrote:
> > Yes, that looks like the optimal way to do this actually. It also
> > doesn't introduce any overhead as the destructuring was doing both
> > high_half() and low_half() in sequence, so in some cases it might
> > even be more efficient.
> >
> > I'd just like to find a better naming. high() and low() might be enough?
> > Or are there other suggestions?
> >
>
> Maybe use "32" instead of "half":
>
>      .high_32()  / .low_32()
>      .upper_32() / .lower_32()
>

The C code currently does upper_32_bits and lower_32_bits, do we want
to align or diverge here?

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19 20:23                 ` Dave Airlie
@ 2025-02-19 23:13                   ` Daniel Almeida
  2025-02-20  0:14                     ` John Hubbard
  0 siblings, 1 reply; 104+ messages in thread
From: Daniel Almeida @ 2025-02-19 23:13 UTC (permalink / raw)
  To: Dave Airlie
  Cc: John Hubbard, Alexandre Courbot, Timur Tabi,
	dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	dakr@kernel.org, Ben Skeggs



> On 19 Feb 2025, at 17:23, Dave Airlie <airlied@gmail.com> wrote:
> 
> On Thu, 20 Feb 2025 at 06:22, John Hubbard <jhubbard@nvidia.com> wrote:
>> 
>> On 2/19/25 4:51 AM, Alexandre Courbot wrote:
>>> Yes, that looks like the optimal way to do this actually. It also
>>> doesn't introduce any overhead as the destructuring was doing both
>>> high_half() and low_half() in sequence, so in some cases it might
>>> even be more efficient.
>>> 
>>> I'd just like to find a better naming. high() and low() might be enough?
>>> Or are there other suggestions?
>>> 
>> 
>> Maybe use "32" instead of "half":
>> 
>>     .high_32()  / .low_32()
>>     .upper_32() / .lower_32()
>> 
> 
> The C code currently does upper_32_bits and lower_32_bits, do we want
> to align or diverge here?
> 
> Dave.


My humble suggestion here is to use the same nomenclature. `upper_32_bits` and
`lower_32_bits` immediately and succinctly informs the reader of what is going on.

— Daniel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-19 23:13                   ` Daniel Almeida
@ 2025-02-20  0:14                     ` John Hubbard
  2025-02-21 11:35                       ` Alexandre Courbot
  0 siblings, 1 reply; 104+ messages in thread
From: John Hubbard @ 2025-02-20  0:14 UTC (permalink / raw)
  To: Daniel Almeida, Dave Airlie
  Cc: Alexandre Courbot, Timur Tabi, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org,
	nouveau@lists.freedesktop.org, dakr@kernel.org, Ben Skeggs

On 2/19/25 3:13 PM, Daniel Almeida wrote:
>> On 19 Feb 2025, at 17:23, Dave Airlie <airlied@gmail.com> wrote:
>> On Thu, 20 Feb 2025 at 06:22, John Hubbard <jhubbard@nvidia.com> wrote:
>>> On 2/19/25 4:51 AM, Alexandre Courbot wrote:
>>>> Yes, that looks like the optimal way to do this actually. It also
>>>> doesn't introduce any overhead as the destructuring was doing both
>>>> high_half() and low_half() in sequence, so in some cases it might
>>>> even be more efficient.
>>>>
>>>> I'd just like to find a better naming. high() and low() might be enough?
>>>> Or are there other suggestions?
>>>>
>>>
>>> Maybe use "32" instead of "half":
>>>
>>>      .high_32()  / .low_32()
>>>      .upper_32() / .lower_32()
>>>
>>
>> The C code currently does upper_32_bits and lower_32_bits, do we want
>> to align or diverge here?

This sounds like a trick question, so I'm going to go with..."align". haha :)

>>
>> Dave.
> 
> 
> My humble suggestion here is to use the same nomenclature. `upper_32_bits` and
> `lower_32_bits` immediately and succinctly informs the reader of what is going on.
> 

Yes. I missed the pre-existing naming in C, but since we have it and it's
well-named as well, definitely this is the way to go.

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-18 13:07     ` Alexandre Courbot
@ 2025-02-20  6:23       ` Dirk Behme
  0 siblings, 0 replies; 104+ messages in thread
From: Dirk Behme @ 2025-02-20  6:23 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, David Airlie, John Hubbard,
	Ben Skeggs
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel

On 18/02/2025 14:07, Alexandre Courbot wrote:
> On Tue Feb 18, 2025 at 7:07 PM JST, Dirk Behme wrote:
>> On 17/02/2025 15:04, Alexandre Courbot wrote:
>>> It is common to build a u64 from its high and low parts obtained from
>>> two 32-bit registers. Conversely, it is also common to split a u64 into
>>> two u32s to write them into registers. Add an extension trait for u64
>>> that implement these methods in a new `num` module.
>>>
>>> It is expected that this trait will be extended with other useful
>>> operations, and similar extension traits implemented for other types.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>>  rust/kernel/lib.rs |  1 +
>>>  rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
>>>  2 files changed, 33 insertions(+)
>>>
>>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>>> index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
>>> --- a/rust/kernel/lib.rs
>>> +++ b/rust/kernel/lib.rs
>>> @@ -59,6 +59,7 @@
>>>  pub mod miscdevice;
>>>  #[cfg(CONFIG_NET)]
>>>  pub mod net;
>>> +pub mod num;
>>>  pub mod of;
>>>  pub mod page;
>>>  #[cfg(CONFIG_PCI)]
>>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
>>> --- /dev/null
>>> +++ b/rust/kernel/num.rs
>>> @@ -0,0 +1,32 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Numerical and binary utilities for primitive types.
>>> +
>>> +/// Useful operations for `u64`.
>>> +pub trait U64Ext {
>>> +    /// Build a `u64` by combining its `high` and `low` parts.
>>> +    ///
>>> +    /// ```
>>> +    /// use kernel::num::U64Ext;
>>> +    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
>>> +    /// ```
>>> +    fn from_u32s(high: u32, low: u32) -> Self;
>>> +
>>> +    /// Returns the `(high, low)` u32s that constitute `self`.
>>> +    ///
>>> +    /// ```
>>> +    /// use kernel::num::U64Ext;
>>> +    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
>>> +    /// ```
>>> +    fn into_u32s(self) -> (u32, u32);
>>> +}
>>> +
>>> +impl U64Ext for u64 {
>>> +    fn from_u32s(high: u32, low: u32) -> Self {
>>> +        ((high as u64) << u32::BITS) | low as u64
>>> +    }
>>> +
>>> +    fn into_u32s(self) -> (u32, u32) {
>>> +        ((self >> u32::BITS) as u32, self as u32)
>>> +    }
>>> +}
>> Just as a question: Would it make sense to make this more generic?
>>
>> For example
>>
>> u64 -> u32, u32 / u32, u32 -> u64 (as done here)
>> u32 -> u16, u16 / u16, u16 -> u32
>> u16 -> u8, u8 / u8, u8 -> u16
>>
>> Additionally, I wonder if this might be combined with the Integer trait
>> [1]? But the usize and signed ones might not make sense here...
>>
>> Dirk
>>
>> [1] E.g.
>>
>> https://github.com/senekor/linux/commit/7291dcc98e8ab74e34c1600784ec9ff3e2fa32d0
> 
> I agree something more generic would be nice. One drawback I see though
> is that it would have to use more generic (and lengthy) method names -
> i.e. `from_components(u32, u32)` instead of `from_u32s`.
> 
> I quickly tried to write a completely generic trait where the methods
> are auto-implemented from constants and associated types, but got stuck
> by the impossibility to use `as` in that context without a macro.


Being inspired by the Integer trait example [1] above, just as an idea,
I wonder if anything like

impl_split_merge! {
    (u64, u32),
    (u32, u16),
    (u16, u8),
}

would be implementable?


> Regardless, I was looking for an already existing trait/module to
> leverage instead of introducing a whole new one, maybe the one you
> linked is what I was looking for?
Cheers,

Dirk

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-20  0:14                     ` John Hubbard
@ 2025-02-21 11:35                       ` Alexandre Courbot
  2025-02-21 12:31                         ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-21 11:35 UTC (permalink / raw)
  To: John Hubbard, Daniel Almeida, Dave Airlie
  Cc: Timur Tabi, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org,
	nouveau@lists.freedesktop.org, dakr@kernel.org, Ben Skeggs,
	Nouveau

On Thu Feb 20, 2025 at 9:14 AM JST, John Hubbard wrote:
> On 2/19/25 3:13 PM, Daniel Almeida wrote:
>>> On 19 Feb 2025, at 17:23, Dave Airlie <airlied@gmail.com> wrote:
>>> On Thu, 20 Feb 2025 at 06:22, John Hubbard <jhubbard@nvidia.com> wrote:
>>>> On 2/19/25 4:51 AM, Alexandre Courbot wrote:
>>>>> Yes, that looks like the optimal way to do this actually. It also
>>>>> doesn't introduce any overhead as the destructuring was doing both
>>>>> high_half() and low_half() in sequence, so in some cases it might
>>>>> even be more efficient.
>>>>>
>>>>> I'd just like to find a better naming. high() and low() might be enough?
>>>>> Or are there other suggestions?
>>>>>
>>>>
>>>> Maybe use "32" instead of "half":
>>>>
>>>>      .high_32()  / .low_32()
>>>>      .upper_32() / .lower_32()
>>>>
>>>
>>> The C code currently does upper_32_bits and lower_32_bits, do we want
>>> to align or diverge here?
>
> This sounds like a trick question, so I'm going to go with..."align". haha :)
>
>>>
>>> Dave.
>> 
>> 
>> My humble suggestion here is to use the same nomenclature. `upper_32_bits` and
>> `lower_32_bits` immediately and succinctly informs the reader of what is going on.
>> 
>
> Yes. I missed the pre-existing naming in C, but since we have it and it's
> well-named as well, definitely this is the way to go.

Agreed, I wasn't aware of the C equivalents either, but since they exist
we should definitely use the same naming scheme.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH RFC 1/3] rust: add useful ops for u64
  2025-02-21 11:35                       ` Alexandre Courbot
@ 2025-02-21 12:31                         ` Danilo Krummrich
  0 siblings, 0 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-21 12:31 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: John Hubbard, Daniel Almeida, Dave Airlie, Timur Tabi,
	dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org,
	Ben Skeggs, Nouveau

On Fri, Feb 21, 2025 at 08:35:54PM +0900, Alexandre Courbot wrote:
> On Thu Feb 20, 2025 at 9:14 AM JST, John Hubbard wrote:
> > On 2/19/25 3:13 PM, Daniel Almeida wrote:
> >>> On 19 Feb 2025, at 17:23, Dave Airlie <airlied@gmail.com> wrote:
> >>> On Thu, 20 Feb 2025 at 06:22, John Hubbard <jhubbard@nvidia.com> wrote:
> >>>> On 2/19/25 4:51 AM, Alexandre Courbot wrote:
> >>>>> Yes, that looks like the optimal way to do this actually. It also
> >>>>> doesn't introduce any overhead as the destructuring was doing both
> >>>>> high_half() and low_half() in sequence, so in some cases it might
> >>>>> even be more efficient.
> >>>>>
> >>>>> I'd just like to find a better naming. high() and low() might be enough?
> >>>>> Or are there other suggestions?
> >>>>>
> >>>>
> >>>> Maybe use "32" instead of "half":
> >>>>
> >>>>      .high_32()  / .low_32()
> >>>>      .upper_32() / .lower_32()
> >>>>
> >>>
> >>> The C code currently does upper_32_bits and lower_32_bits, do we want
> >>> to align or diverge here?
> >
> > This sounds like a trick question, so I'm going to go with..."align". haha :)
> >
> >>>
> >>> Dave.
> >> 
> >> 
> >> My humble suggestion here is to use the same nomenclature. `upper_32_bits` and
> >> `lower_32_bits` immediately and succinctly informs the reader of what is going on.
> >> 
> >
> > Yes. I missed the pre-existing naming in C, but since we have it and it's
> > well-named as well, definitely this is the way to go.
> 
> Agreed, I wasn't aware of the C equivalents either, but since they exist
> we should definitely use the same naming scheme.

IIUC, we're still talking about extending the u64 primitive type.

Hence, I think there is no necessity to do align with the corresponding C
nameing scheme. I think this would only be the case if we'd write an abstraction
for the C API.

In this case though we extend an existing Rust type, so we should do something
that aligns with the corresponding Rust type.

In this specific case I think it goes hand in hand though.

- Danilo

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-18  1:46   ` Dave Airlie
  2025-02-18 10:26     ` Danilo Krummrich
@ 2025-02-24  1:40     ` Alexandre Courbot
  2025-02-24 12:07       ` Danilo Krummrich
  1 sibling, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-24  1:40 UTC (permalink / raw)
  To: Dave Airlie, Danilo Krummrich, Joel Fernandes, Boqun Feng
  Cc: John Hubbard, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel

On Tue Feb 18, 2025 at 10:46 AM JST, Dave Airlie wrote:
>> 1. How to avoid unnecessary calls to try_access().
>>
>> This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
>> think we can just call try_access() once and then propage the guard through the
>> callchain, where necessary.
>
> Nope, you can't do that, RevocableGuard holds a lock and things
> explode badly in lockdep if you do.
>
> [ 39.960247] =============================
> [ 39.960265] [ BUG: Invalid wait context ]
> [ 39.960282] 6.12.0-rc2+ #151 Not tainted
> [ 39.960298] -----------------------------
> [ 39.960316] modprobe/2006 is trying to lock:
> [ 39.960335] ffffa08dd7783a68
> (drivers/gpu/nova-core/gsp/sharedq.rs:259){....}-{3:3}, at:
> _RNvMs0_NtNtCs6v51TV2h8sK_6nova_c3gsp7sharedqNtB5_26GSPSharedQueuesr535_113_018rpc_push+0x34/0x4c0
> [nova_core]
> [ 39.960413] other info that might help us debug this:
> [ 39.960434] context-{4:4}
> [ 39.960447] 2 locks held by modprobe/2006:
> [ 39.960465] #0: ffffa08dc27581b0 (&dev->mutex){....}-{3:3}, at:
> __driver_attach+0x111/0x260
> [ 39.960505] #1: ffffffffad55ac10 (rcu_read_lock){....}-{1:2}, at:
> rust_helper_rcu_read_lock+0x11/0x80
> [ 39.960545] stack backtrace:
> [ 39.960559] CPU: 8 UID: 0 PID: 2006 Comm: modprobe Not tainted 6.12.0-rc2+ #151
> [ 39.960586] Hardware name: System manufacturer System Product
> Name/PRIME X370-PRO, BIOS 6231 08/31/2024
> [ 39.960618] Call Trace:
> [ 39.960632] <TASK>
>
> was one time I didn't drop a revocable before proceeding to do other things,

This inability to sleep while we are accessing registers seems very
constraining to me, if not dangerous. It is pretty common to have
functions intermingle hardware accesses with other operations that might
sleep, and this constraint means that in such cases the caller would
need to perform guard lifetime management manually:

  let bar_guard = bar.try_access()?;
  /* do something non-sleeping with bar_guard */
  drop(bar_guard);

  /* do something that might sleep */

  let bar_guard = bar.try_access()?;
  /* do something non-sleeping with bar_guard */
  drop(bar_guard);

  ...

Failure to drop the guard potentially introduces a race condition, which
will receive no compile-time warning and potentialy not even a runtime
one unless lockdep is enabled. This problem does not exist with the
equivalent C code AFAICT, which makes the Rust version actually more
error-prone and dangerous, the opposite of what we are trying to achieve
with Rust. Or am I missing something?


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-24  1:40     ` Alexandre Courbot
@ 2025-02-24 12:07       ` Danilo Krummrich
  2025-02-24 12:11         ` Danilo Krummrich
  2025-02-25 14:11         ` Alexandre Courbot
  0 siblings, 2 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-24 12:07 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel

CC: Gary

On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> This inability to sleep while we are accessing registers seems very
> constraining to me, if not dangerous. It is pretty common to have
> functions intermingle hardware accesses with other operations that might
> sleep, and this constraint means that in such cases the caller would
> need to perform guard lifetime management manually:
> 
>   let bar_guard = bar.try_access()?;
>   /* do something non-sleeping with bar_guard */
>   drop(bar_guard);
> 
>   /* do something that might sleep */
> 
>   let bar_guard = bar.try_access()?;
>   /* do something non-sleeping with bar_guard */
>   drop(bar_guard);
> 
>   ...
> 
> Failure to drop the guard potentially introduces a race condition, which
> will receive no compile-time warning and potentialy not even a runtime
> one unless lockdep is enabled. This problem does not exist with the
> equivalent C code AFAICT, which makes the Rust version actually more
> error-prone and dangerous, the opposite of what we are trying to achieve
> with Rust. Or am I missing something?

Generally you are right, but you have to see it from a different perspective.

What you describe is not an issue that comes from the design of the API, but is
a limitation of Rust in the kernel. People are aware of the issue and with klint
[1] there are solutions for that in the pipeline, see also [2] and [3].

[1] https://rust-for-linux.com/klint
[2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
[3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-24 12:07       ` Danilo Krummrich
@ 2025-02-24 12:11         ` Danilo Krummrich
  2025-02-24 18:45           ` Joel Fernandes
  2025-02-25 14:11         ` Alexandre Courbot
  1 sibling, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-24 12:11 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel

On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> CC: Gary
> 
> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > This inability to sleep while we are accessing registers seems very
> > constraining to me, if not dangerous. It is pretty common to have
> > functions intermingle hardware accesses with other operations that might
> > sleep, and this constraint means that in such cases the caller would
> > need to perform guard lifetime management manually:
> > 
> >   let bar_guard = bar.try_access()?;
> >   /* do something non-sleeping with bar_guard */
> >   drop(bar_guard);
> > 
> >   /* do something that might sleep */
> > 
> >   let bar_guard = bar.try_access()?;
> >   /* do something non-sleeping with bar_guard */
> >   drop(bar_guard);
> > 
> >   ...
> > 
> > Failure to drop the guard potentially introduces a race condition, which
> > will receive no compile-time warning and potentialy not even a runtime
> > one unless lockdep is enabled. This problem does not exist with the
> > equivalent C code AFAICT

Without klint [1] it is exactly the same as in C, where I have to remember to
not call into something that might sleep from atomic context.

> > which makes the Rust version actually more
> > error-prone and dangerous, the opposite of what we are trying to achieve
> > with Rust. Or am I missing something?
> 
> Generally you are right, but you have to see it from a different perspective.
> 
> What you describe is not an issue that comes from the design of the API, but is
> a limitation of Rust in the kernel. People are aware of the issue and with klint
> [1] there are solutions for that in the pipeline, see also [2] and [3].
> 
> [1] https://rust-for-linux.com/klint
> [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-24 12:11         ` Danilo Krummrich
@ 2025-02-24 18:45           ` Joel Fernandes
  2025-02-24 23:44             ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Joel Fernandes @ 2025-02-24 18:45 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

Hi Danilo,

On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> > CC: Gary
> > 
> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > > This inability to sleep while we are accessing registers seems very
> > > constraining to me, if not dangerous. It is pretty common to have
> > > functions intermingle hardware accesses with other operations that might
> > > sleep, and this constraint means that in such cases the caller would
> > > need to perform guard lifetime management manually:
> > > 
> > >   let bar_guard = bar.try_access()?;
> > >   /* do something non-sleeping with bar_guard */
> > >   drop(bar_guard);
> > > 
> > >   /* do something that might sleep */
> > > 
> > >   let bar_guard = bar.try_access()?;
> > >   /* do something non-sleeping with bar_guard */
> > >   drop(bar_guard);
> > > 
> > >   ...
> > > 
> > > Failure to drop the guard potentially introduces a race condition, which
> > > will receive no compile-time warning and potentialy not even a runtime
> > > one unless lockdep is enabled. This problem does not exist with the
> > > equivalent C code AFAICT
> 
> Without klint [1] it is exactly the same as in C, where I have to remember to
> not call into something that might sleep from atomic context.
>

Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
not sleeping?

I am fairly new to rust, could you help elaborate more about why these MMIO
accesses need to have RevocableGuard in Rust? What problem are we trying to
solve that C has but Rust doesn't with the aid of a RCU read-side section? I
vaguely understand we are trying to "wait for an MMIO access" using
synchronize here, but it is just a guest.

+Paul as well.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-24 18:45           ` Joel Fernandes
@ 2025-02-24 23:44             ` Danilo Krummrich
  2025-02-25 15:52               ` Joel Fernandes
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-24 23:44 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> Hi Danilo,
> 
> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> > On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> > > CC: Gary
> > > 
> > > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > > > This inability to sleep while we are accessing registers seems very
> > > > constraining to me, if not dangerous. It is pretty common to have
> > > > functions intermingle hardware accesses with other operations that might
> > > > sleep, and this constraint means that in such cases the caller would
> > > > need to perform guard lifetime management manually:
> > > > 
> > > >   let bar_guard = bar.try_access()?;
> > > >   /* do something non-sleeping with bar_guard */
> > > >   drop(bar_guard);
> > > > 
> > > >   /* do something that might sleep */
> > > > 
> > > >   let bar_guard = bar.try_access()?;
> > > >   /* do something non-sleeping with bar_guard */
> > > >   drop(bar_guard);
> > > > 
> > > >   ...
> > > > 
> > > > Failure to drop the guard potentially introduces a race condition, which
> > > > will receive no compile-time warning and potentialy not even a runtime
> > > > one unless lockdep is enabled. This problem does not exist with the
> > > > equivalent C code AFAICT
> > 
> > Without klint [1] it is exactly the same as in C, where I have to remember to
> > not call into something that might sleep from atomic context.
> >
> 
> Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
> not sleeping?

It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
just that the synchronization mechanism (RCU) used for the Revocable type
implies that.

In C we have something that is pretty similar with drm_dev_enter() /
drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.

In DRM this is used to prevent accesses to device resources after the device has
been unplugged.

> 
> I am fairly new to rust, could you help elaborate more about why these MMIO
> accesses need to have RevocableGuard in Rust? What problem are we trying to
> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
> vaguely understand we are trying to "wait for an MMIO access" using
> synchronize here, but it is just a guest.

Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
to unplugged devices.

The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
pci::Bar is placed in a Revocable. The Revocable is revoked when the device
is detached from the driver (for instance because it has been unplugged).

By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
unmapped; a subsequent call to try_access() would fail.

But yes, if the device is unplugged while holding the RCU guard, one is on their
own; that's also why keeping the critical sections short is desirable.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-24 12:07       ` Danilo Krummrich
  2025-02-24 12:11         ` Danilo Krummrich
@ 2025-02-25 14:11         ` Alexandre Courbot
  2025-02-25 15:06           ` Danilo Krummrich
  2025-02-27 21:37           ` Dave Airlie
  1 sibling, 2 replies; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-25 14:11 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Nouveau

On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> CC: Gary
>
> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>> This inability to sleep while we are accessing registers seems very
>> constraining to me, if not dangerous. It is pretty common to have
>> functions intermingle hardware accesses with other operations that might
>> sleep, and this constraint means that in such cases the caller would
>> need to perform guard lifetime management manually:
>> 
>>   let bar_guard = bar.try_access()?;
>>   /* do something non-sleeping with bar_guard */
>>   drop(bar_guard);
>> 
>>   /* do something that might sleep */
>> 
>>   let bar_guard = bar.try_access()?;
>>   /* do something non-sleeping with bar_guard */
>>   drop(bar_guard);
>> 
>>   ...
>> 
>> Failure to drop the guard potentially introduces a race condition, which
>> will receive no compile-time warning and potentialy not even a runtime
>> one unless lockdep is enabled. This problem does not exist with the
>> equivalent C code AFAICT, which makes the Rust version actually more
>> error-prone and dangerous, the opposite of what we are trying to achieve
>> with Rust. Or am I missing something?
>
> Generally you are right, but you have to see it from a different perspective.
>
> What you describe is not an issue that comes from the design of the API, but is
> a limitation of Rust in the kernel. People are aware of the issue and with klint
> [1] there are solutions for that in the pipeline, see also [2] and [3].
>
> [1] https://rust-for-linux.com/klint
> [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/

Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
its own admission. But even if the ignore the safety issue, the other one
(ergonomics) is still there.

Basically this way of accessing registers imposes quite a mental burden on its
users. It requires a very different (and harsher) discipline than when writing
the same code in C, and the correct granularity to use is unclear to me.

For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
particular register in a busy loop, should I call try_access() once before the
loop? Or every time before accessing the register? I'm afraid having to check
that the resource is still alive before accessing any register is going to
become tedious very quickly.

I understand that we want to protect against accessing the IO region of an
unplugged device ; but still there is no guarantee that the device won't be
unplugged in the middle of a critical section, however short. Thus the driver
code should be able to recognize that the device has fallen off the bus when it
e.g. gets a bunch of 0xff instead of a valid value. So do we really need to
extra protection that AFAICT isn't used in C?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 14:11         ` Alexandre Courbot
@ 2025-02-25 15:06           ` Danilo Krummrich
  2025-02-25 15:23             ` Alexandre Courbot
  2025-02-27 21:37           ` Dave Airlie
  1 sibling, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-25 15:06 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Nouveau

On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> > CC: Gary
> >
> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> This inability to sleep while we are accessing registers seems very
> >> constraining to me, if not dangerous. It is pretty common to have
> >> functions intermingle hardware accesses with other operations that might
> >> sleep, and this constraint means that in such cases the caller would
> >> need to perform guard lifetime management manually:
> >> 
> >>   let bar_guard = bar.try_access()?;
> >>   /* do something non-sleeping with bar_guard */
> >>   drop(bar_guard);
> >> 
> >>   /* do something that might sleep */
> >> 
> >>   let bar_guard = bar.try_access()?;
> >>   /* do something non-sleeping with bar_guard */
> >>   drop(bar_guard);
> >> 
> >>   ...
> >> 
> >> Failure to drop the guard potentially introduces a race condition, which
> >> will receive no compile-time warning and potentialy not even a runtime
> >> one unless lockdep is enabled. This problem does not exist with the
> >> equivalent C code AFAICT, which makes the Rust version actually more
> >> error-prone and dangerous, the opposite of what we are trying to achieve
> >> with Rust. Or am I missing something?
> >
> > Generally you are right, but you have to see it from a different perspective.
> >
> > What you describe is not an issue that comes from the design of the API, but is
> > a limitation of Rust in the kernel. People are aware of the issue and with klint
> > [1] there are solutions for that in the pipeline, see also [2] and [3].
> >
> > [1] https://rust-for-linux.com/klint
> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
> 
> Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
> its own admission. But even if the ignore the safety issue, the other one
> (ergonomics) is still there.
> 
> Basically this way of accessing registers imposes quite a mental burden on its
> users. It requires a very different (and harsher) discipline than when writing
> the same code in C

We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
drm_dev_unplug().

> and the correct granularity to use is unclear to me.
> 
> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
> particular register in a busy loop, should I call try_access() once before the
> loop? Or every time before accessing the register?

I think we should re-acquire the guard in each iteration and drop it before the
delay. I think a simple closure would work very well for this pattern?

> I'm afraid having to check
> that the resource is still alive before accessing any register is going to
> become tedious very quickly.
> 
> I understand that we want to protect against accessing the IO region of an
> unplugged device ; but still there is no guarantee that the device won't be
> unplugged in the middle of a critical section, however short. Thus the driver
> code should be able to recognize that the device has fallen off the bus when it
> e.g. gets a bunch of 0xff instead of a valid value. So do we really need to
> extra protection that AFAICT isn't used in C?

As mentioned above, we already do similar things in C.

Also, think about what's the alternative. If we remove the Devres wrapper of
pci::Bar, we lose the control over the lifetime of the bar mapping and it can
easily out-live the device / driver binding. This makes the API unsound.

With this drivers would be able to keep resources acquired. What if after a
hotplug the physical address region is re-used and to be mapped by another
driver?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 15:06           ` Danilo Krummrich
@ 2025-02-25 15:23             ` Alexandre Courbot
  2025-02-25 15:53               ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Alexandre Courbot @ 2025-02-25 15:23 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Nouveau

On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
>> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
>> > CC: Gary
>> >
>> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>> >> This inability to sleep while we are accessing registers seems very
>> >> constraining to me, if not dangerous. It is pretty common to have
>> >> functions intermingle hardware accesses with other operations that might
>> >> sleep, and this constraint means that in such cases the caller would
>> >> need to perform guard lifetime management manually:
>> >> 
>> >>   let bar_guard = bar.try_access()?;
>> >>   /* do something non-sleeping with bar_guard */
>> >>   drop(bar_guard);
>> >> 
>> >>   /* do something that might sleep */
>> >> 
>> >>   let bar_guard = bar.try_access()?;
>> >>   /* do something non-sleeping with bar_guard */
>> >>   drop(bar_guard);
>> >> 
>> >>   ...
>> >> 
>> >> Failure to drop the guard potentially introduces a race condition, which
>> >> will receive no compile-time warning and potentialy not even a runtime
>> >> one unless lockdep is enabled. This problem does not exist with the
>> >> equivalent C code AFAICT, which makes the Rust version actually more
>> >> error-prone and dangerous, the opposite of what we are trying to achieve
>> >> with Rust. Or am I missing something?
>> >
>> > Generally you are right, but you have to see it from a different perspective.
>> >
>> > What you describe is not an issue that comes from the design of the API, but is
>> > a limitation of Rust in the kernel. People are aware of the issue and with klint
>> > [1] there are solutions for that in the pipeline, see also [2] and [3].
>> >
>> > [1] https://rust-for-linux.com/klint
>> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
>> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>> 
>> Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
>> its own admission. But even if the ignore the safety issue, the other one
>> (ergonomics) is still there.
>> 
>> Basically this way of accessing registers imposes quite a mental burden on its
>> users. It requires a very different (and harsher) discipline than when writing
>> the same code in C
>
> We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
> drm_dev_unplug().

Granted, but the use of these is much more coarsed-grained than what is
expected of IO resources, right?

>
>> and the correct granularity to use is unclear to me.
>> 
>> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
>> particular register in a busy loop, should I call try_access() once before the
>> loop? Or every time before accessing the register?
>
> I think we should re-acquire the guard in each iteration and drop it before the
> delay. I think a simple closure would work very well for this pattern?
>
>> I'm afraid having to check
>> that the resource is still alive before accessing any register is going to
>> become tedious very quickly.
>> 
>> I understand that we want to protect against accessing the IO region of an
>> unplugged device ; but still there is no guarantee that the device won't be
>> unplugged in the middle of a critical section, however short. Thus the driver
>> code should be able to recognize that the device has fallen off the bus when it
>> e.g. gets a bunch of 0xff instead of a valid value. So do we really need to
>> extra protection that AFAICT isn't used in C?
>
> As mentioned above, we already do similar things in C.
>
> Also, think about what's the alternative. If we remove the Devres wrapper of
> pci::Bar, we lose the control over the lifetime of the bar mapping and it can
> easily out-live the device / driver binding. This makes the API unsound.

Oh my issue is not with the Devres wrapper, I think it makes sense -
it's more the use of RCU to control access to the resource that I find
too constraining. And I'm pretty sure there will be more users of the
same opinion as more drivers using it get written.

>
> With this drivers would be able to keep resources acquired. What if after a
> hotplug the physical address region is re-used and to be mapped by another
> driver?

Actually - wouldn't that issue also be addressed by a PCI equivalent to
drm_dev_enter() and friends that ensures the device (and thus its
devres resources) stay in place?

Using Rust, I can imagine (but not picture precisely yet) some method of
the device that returns a reference to an inner structure containing its
resources, available with immediate access. Since it would be
coarser-grained, it could rely on something less constraining than RCU
without a noticeable performance penalty.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-24 23:44             ` Danilo Krummrich
@ 2025-02-25 15:52               ` Joel Fernandes
  2025-02-25 16:09                 ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Joel Fernandes @ 2025-02-25 15:52 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck



On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
>> Hi Danilo,
>>
>> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
>>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
>>>> CC: Gary
>>>>
>>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>>>>> This inability to sleep while we are accessing registers seems very
>>>>> constraining to me, if not dangerous. It is pretty common to have
>>>>> functions intermingle hardware accesses with other operations that might
>>>>> sleep, and this constraint means that in such cases the caller would
>>>>> need to perform guard lifetime management manually:
>>>>>
>>>>>   let bar_guard = bar.try_access()?;
>>>>>   /* do something non-sleeping with bar_guard */
>>>>>   drop(bar_guard);
>>>>>
>>>>>   /* do something that might sleep */
>>>>>
>>>>>   let bar_guard = bar.try_access()?;
>>>>>   /* do something non-sleeping with bar_guard */
>>>>>   drop(bar_guard);
>>>>>
>>>>>   ...
>>>>>
>>>>> Failure to drop the guard potentially introduces a race condition, which
>>>>> will receive no compile-time warning and potentialy not even a runtime
>>>>> one unless lockdep is enabled. This problem does not exist with the
>>>>> equivalent C code AFAICT
>>>
>>> Without klint [1] it is exactly the same as in C, where I have to remember to
>>> not call into something that might sleep from atomic context.
>>>
>>
>> Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
>> not sleeping?
> 
> It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
> just that the synchronization mechanism (RCU) used for the Revocable type
> implies that.
> 
> In C we have something that is pretty similar with drm_dev_enter() /
> drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> 
> In DRM this is used to prevent accesses to device resources after the device has
> been unplugged.

Thanks a lot for the response. Might it make more sense to use SRCU then? The
use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.

Another colleague told me RDMA also uses SRCU for a similar purpose as well.

>> I am fairly new to rust, could you help elaborate more about why these MMIO
>> accesses need to have RevocableGuard in Rust? What problem are we trying to
>> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
>> vaguely understand we are trying to "wait for an MMIO access" using
>> synchronize here, but it is just a guest.
> 
> Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> to unplugged devices.
> 
> The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> is detached from the driver (for instance because it has been unplugged).

I guess the Devres concept of revoking resources on driver detach is not a rust
thing (even for PCI)... but correct me if I'm wrong.

> By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> unmapped; a subsequent call to try_access() would fail.
> 
> But yes, if the device is unplugged while holding the RCU guard, one is on their
> own; that's also why keeping the critical sections short is desirable.

I have heard some concern around whether Rust is changing the driver model when
it comes to driver detach / driver remove.  Can you elaborate may be a bit about
how Rust changes that mechanism versus C, when it comes to that?  Ideally we
would not want Rust drivers to have races with user space accesses when they are
detached/remove. But we also don't want accesses to be non-sleepable sections
where this guard is held, it seems restrictive (though to your point the
sections are expected to be small).

thanks,

 - Joel





^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 15:23             ` Alexandre Courbot
@ 2025-02-25 15:53               ` Danilo Krummrich
  0 siblings, 0 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-25 15:53 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Nouveau

On Wed, Feb 26, 2025 at 12:23:40AM +0900, Alexandre Courbot wrote:
> On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> > On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> >> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> >> > CC: Gary
> >> >
> >> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> >> This inability to sleep while we are accessing registers seems very
> >> >> constraining to me, if not dangerous. It is pretty common to have
> >> >> functions intermingle hardware accesses with other operations that might
> >> >> sleep, and this constraint means that in such cases the caller would
> >> >> need to perform guard lifetime management manually:
> >> >> 
> >> >>   let bar_guard = bar.try_access()?;
> >> >>   /* do something non-sleeping with bar_guard */
> >> >>   drop(bar_guard);
> >> >> 
> >> >>   /* do something that might sleep */
> >> >> 
> >> >>   let bar_guard = bar.try_access()?;
> >> >>   /* do something non-sleeping with bar_guard */
> >> >>   drop(bar_guard);
> >> >> 
> >> >>   ...
> >> >> 
> >> >> Failure to drop the guard potentially introduces a race condition, which
> >> >> will receive no compile-time warning and potentialy not even a runtime
> >> >> one unless lockdep is enabled. This problem does not exist with the
> >> >> equivalent C code AFAICT, which makes the Rust version actually more
> >> >> error-prone and dangerous, the opposite of what we are trying to achieve
> >> >> with Rust. Or am I missing something?
> >> >
> >> > Generally you are right, but you have to see it from a different perspective.
> >> >
> >> > What you describe is not an issue that comes from the design of the API, but is
> >> > a limitation of Rust in the kernel. People are aware of the issue and with klint
> >> > [1] there are solutions for that in the pipeline, see also [2] and [3].
> >> >
> >> > [1] https://rust-for-linux.com/klint
> >> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> >> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
> >> 
> >> Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
> >> its own admission. But even if the ignore the safety issue, the other one
> >> (ergonomics) is still there.
> >> 
> >> Basically this way of accessing registers imposes quite a mental burden on its
> >> users. It requires a very different (and harsher) discipline than when writing
> >> the same code in C
> >
> > We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
> > drm_dev_unplug().
> 
> Granted, but the use of these is much more coarsed-grained than what is
> expected of IO resources, right?

Potentially, yes. But exactly this characteristic has been criticised [1].

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

> 
> >
> >> and the correct granularity to use is unclear to me.
> >> 
> >> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
> >> particular register in a busy loop, should I call try_access() once before the
> >> loop? Or every time before accessing the register?
> >
> > I think we should re-acquire the guard in each iteration and drop it before the
> > delay. I think a simple closure would work very well for this pattern?
> >
> >> I'm afraid having to check
> >> that the resource is still alive before accessing any register is going to
> >> become tedious very quickly.
> >> 
> >> I understand that we want to protect against accessing the IO region of an
> >> unplugged device ; but still there is no guarantee that the device won't be
> >> unplugged in the middle of a critical section, however short. Thus the driver
> >> code should be able to recognize that the device has fallen off the bus when it
> >> e.g. gets a bunch of 0xff instead of a valid value. So do we really need to
> >> extra protection that AFAICT isn't used in C?
> >
> > As mentioned above, we already do similar things in C.
> >
> > Also, think about what's the alternative. If we remove the Devres wrapper of
> > pci::Bar, we lose the control over the lifetime of the bar mapping and it can
> > easily out-live the device / driver binding. This makes the API unsound.
> 
> Oh my issue is not with the Devres wrapper, I think it makes sense -
> it's more the use of RCU to control access to the resource that I find
> too constraining. And I'm pretty sure there will be more users of the
> same opinion as more drivers using it get written.

What do you suggest?

> 
> >
> > With this drivers would be able to keep resources acquired. What if after a
> > hotplug the physical address region is re-used and to be mapped by another
> > driver?
> 
> Actually - wouldn't that issue also be addressed by a PCI equivalent to
> drm_dev_enter() and friends that ensures the device (and thus its
> devres resources) stay in place?

I'm not sure I get the idea, but we can *not* have the device resources stay in
place once the device is unbound (e.g. keep the resource region acquired by the
driver).

Consequently, we have to have a way to revoke access to the corresponding
pci::Bar.

> 
> Using Rust, I can imagine (but not picture precisely yet) some method of
> the device that returns a reference to an inner structure containing its
> resources, available with immediate access. Since it would be
> coarser-grained, it could rely on something less constraining than RCU
> without a noticeable performance penalty.

We had similar attempts when we designed this API, i.e. a common Revocable in
the driver private data of a device. But it had some chicken-egg issues on
initialization in probe(). Besides that, it wouldn't get you rid of the
Revocable, since the corresponding resource are only valid while the driver is
bound to a device, not for the entire lifetime of the device.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 15:52               ` Joel Fernandes
@ 2025-02-25 16:09                 ` Danilo Krummrich
  2025-02-25 21:02                   ` Joel Fernandes
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-25 16:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> 
> 
> On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> >> Hi Danilo,
> >>
> >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> >>>> CC: Gary
> >>>>
> >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >>>>> This inability to sleep while we are accessing registers seems very
> >>>>> constraining to me, if not dangerous. It is pretty common to have
> >>>>> functions intermingle hardware accesses with other operations that might
> >>>>> sleep, and this constraint means that in such cases the caller would
> >>>>> need to perform guard lifetime management manually:
> >>>>>
> >>>>>   let bar_guard = bar.try_access()?;
> >>>>>   /* do something non-sleeping with bar_guard */
> >>>>>   drop(bar_guard);
> >>>>>
> >>>>>   /* do something that might sleep */
> >>>>>
> >>>>>   let bar_guard = bar.try_access()?;
> >>>>>   /* do something non-sleeping with bar_guard */
> >>>>>   drop(bar_guard);
> >>>>>
> >>>>>   ...
> >>>>>
> >>>>> Failure to drop the guard potentially introduces a race condition, which
> >>>>> will receive no compile-time warning and potentialy not even a runtime
> >>>>> one unless lockdep is enabled. This problem does not exist with the
> >>>>> equivalent C code AFAICT
> >>>
> >>> Without klint [1] it is exactly the same as in C, where I have to remember to
> >>> not call into something that might sleep from atomic context.
> >>>
> >>
> >> Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
> >> not sleeping?
> > 
> > It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
> > just that the synchronization mechanism (RCU) used for the Revocable type
> > implies that.
> > 
> > In C we have something that is pretty similar with drm_dev_enter() /
> > drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> > 
> > In DRM this is used to prevent accesses to device resources after the device has
> > been unplugged.
> 
> Thanks a lot for the response. Might it make more sense to use SRCU then? The
> use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.

Allowing to hold on to the guard for too long is a bit contradictive to the goal
of detecting hotunplug I guess.

Besides that I don't really see why we can't just re-acquire it after we sleep?
Rust provides good options to implement it ergonimcally I think.

> 
> Another colleague told me RDMA also uses SRCU for a similar purpose as well.

See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

> 
> >> I am fairly new to rust, could you help elaborate more about why these MMIO
> >> accesses need to have RevocableGuard in Rust? What problem are we trying to
> >> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
> >> vaguely understand we are trying to "wait for an MMIO access" using
> >> synchronize here, but it is just a guest.
> > 
> > Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> > to unplugged devices.
> > 
> > The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> > pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> > is detached from the driver (for instance because it has been unplugged).
> 
> I guess the Devres concept of revoking resources on driver detach is not a rust
> thing (even for PCI)... but correct me if I'm wrong.

I'm not sure what you mean with that, can you expand a bit?

> 
> > By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> > unmapped; a subsequent call to try_access() would fail.
> > 
> > But yes, if the device is unplugged while holding the RCU guard, one is on their
> > own; that's also why keeping the critical sections short is desirable.
> 
> I have heard some concern around whether Rust is changing the driver model when
> it comes to driver detach / driver remove.  Can you elaborate may be a bit about
> how Rust changes that mechanism versus C, when it comes to that?

I think that one is simple, Rust does *not* change the driver model.

What makes you think so?

> Ideally we
> would not want Rust drivers to have races with user space accesses when they are
> detached/remove. But we also don't want accesses to be non-sleepable sections
> where this guard is held, it seems restrictive (though to your point the
> sections are expected to be small).

In the very extreme case, nothing prevents you from implementing a wrapper like:

	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) -> Result<u32> {
		let bar = bar.try_access()?;
		bar.read32(offset);
	}

Which limits the RCU read side critical section to my_write32().

Similarly you can have custom functions for short sequences of I/O ops, or use
closures. I don't understand the concern.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 16:09                 ` Danilo Krummrich
@ 2025-02-25 21:02                   ` Joel Fernandes
  2025-02-25 22:02                     ` Danilo Krummrich
  2025-02-25 22:57                     ` Jason Gunthorpe
  0 siblings, 2 replies; 104+ messages in thread
From: Joel Fernandes @ 2025-02-25 21:02 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck, Jason Gunthorpe

On Tue, Feb 25, 2025 at 05:09:35PM +0100, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> > 
> > 
> > On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> > >> Hi Danilo,
> > >>
> > >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> > >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> > >>>> CC: Gary
> > >>>>
> > >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > >>>>> This inability to sleep while we are accessing registers seems very
> > >>>>> constraining to me, if not dangerous. It is pretty common to have
> > >>>>> functions intermingle hardware accesses with other operations that might
> > >>>>> sleep, and this constraint means that in such cases the caller would
> > >>>>> need to perform guard lifetime management manually:
> > >>>>>
> > >>>>>   let bar_guard = bar.try_access()?;
> > >>>>>   /* do something non-sleeping with bar_guard */
> > >>>>>   drop(bar_guard);
> > >>>>>
> > >>>>>   /* do something that might sleep */
> > >>>>>
> > >>>>>   let bar_guard = bar.try_access()?;
> > >>>>>   /* do something non-sleeping with bar_guard */
> > >>>>>   drop(bar_guard);
> > >>>>>
> > >>>>>   ...
> > >>>>>
> > >>>>> Failure to drop the guard potentially introduces a race condition, which
> > >>>>> will receive no compile-time warning and potentialy not even a runtime
> > >>>>> one unless lockdep is enabled. This problem does not exist with the
> > >>>>> equivalent C code AFAICT
> > >>>
> > >>> Without klint [1] it is exactly the same as in C, where I have to remember to
> > >>> not call into something that might sleep from atomic context.
> > >>>
> > >>
> > >> Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
> > >> not sleeping?
> > > 
> > > It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
> > > just that the synchronization mechanism (RCU) used for the Revocable type
> > > implies that.
> > > 
> > > In C we have something that is pretty similar with drm_dev_enter() /
> > > drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> > > 
> > > In DRM this is used to prevent accesses to device resources after the device has
> > > been unplugged.
> > 
> > Thanks a lot for the response. Might it make more sense to use SRCU then? The
> > use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.
> 
> Allowing to hold on to the guard for too long is a bit contradictive to the goal
> of detecting hotunplug I guess.
> 
> Besides that I don't really see why we can't just re-acquire it after we sleep?
> Rust provides good options to implement it ergonimcally I think.
> 
> > 
> > Another colleague told me RDMA also uses SRCU for a similar purpose as well.
> 
> See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?
> 
> [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

Hmm, so you're saying SRCU sections blocking indefinitely is a concern as per
that thread. But I think SRCU GPs should not be stalled in normal operation.
If it is, that is a bug anyway. Stalling SRCU grace periods is not really a
good thing anyway, you could run out of memory (even though stalling RCU is
even more dangerous).

For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason, correct
me if I'm wrong about the RDMA user but this is what I recollect discussing
with you.

> > 
> > >> I am fairly new to rust, could you help elaborate more about why these MMIO
> > >> accesses need to have RevocableGuard in Rust? What problem are we trying to
> > >> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
> > >> vaguely understand we are trying to "wait for an MMIO access" using
> > >> synchronize here, but it is just a guest.
> > > 
> > > Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> > > to unplugged devices.
> > > 
> > > The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> > > pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> > > is detached from the driver (for instance because it has been unplugged).
> > 
> > I guess the Devres concept of revoking resources on driver detach is not a rust
> > thing (even for PCI)... but correct me if I'm wrong.
> 
> I'm not sure what you mean with that, can you expand a bit?

I was reading the devres documentation earlier. It mentios that one of its
use is to clean up resources. Maybe I mixed up the meaning of "clean up" and
"revoke" as I was reading it.

Honestly, I am still confused a bit by the difference between "revoking" and
"cleaning up".

> > 
> > > By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> > > unmapped; a subsequent call to try_access() would fail.
> > > 
> > > But yes, if the device is unplugged while holding the RCU guard, one is on their
> > > own; that's also why keeping the critical sections short is desirable.
> > 
> > I have heard some concern around whether Rust is changing the driver model when
> > it comes to driver detach / driver remove.  Can you elaborate may be a bit about
> > how Rust changes that mechanism versus C, when it comes to that?
> 
> I think that one is simple, Rust does *not* change the driver model.
> 
> What makes you think so?

Well, the revocable concept for one is rust-only right?

It is also possibly just some paranoia based on discussions, but I'm not sure
at the moment.

> > Ideally we
> > would not want Rust drivers to have races with user space accesses when they are
> > detached/remove. But we also don't want accesses to be non-sleepable sections
> > where this guard is held, it seems restrictive (though to your point the
> > sections are expected to be small).
> 
> In the very extreme case, nothing prevents you from implementing a wrapper like:
> 
> 	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) -> Result<u32> {
> 		let bar = bar.try_access()?;
> 		bar.read32(offset);
> 	}
> 
> Which limits the RCU read side critical section to my_write32().
> 
> Similarly you can have custom functions for short sequences of I/O ops, or use
> closures. I don't understand the concern.

Yeah, this is certainly possible. I think one concern is similar to what you
raised on the other thread you shared [1]:
"Maybe we even want to replace it with SRCU entirely to ensure that drivers
can't stall the RCU grace period for too long by accident."

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 21:02                   ` Joel Fernandes
@ 2025-02-25 22:02                     ` Danilo Krummrich
  2025-02-25 22:42                       ` Dave Airlie
  2025-02-25 22:57                     ` Jason Gunthorpe
  1 sibling, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-25 22:02 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck, Jason Gunthorpe

On Tue, Feb 25, 2025 at 04:02:28PM -0500, Joel Fernandes wrote:
> On Tue, Feb 25, 2025 at 05:09:35PM +0100, Danilo Krummrich wrote:
> > On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> > > 
> > > 
> > > On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > > > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> > > >> Hi Danilo,
> > > >>
> > > >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> > > >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> > > >>>> CC: Gary
> > > >>>>
> > > >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > > >>>>> This inability to sleep while we are accessing registers seems very
> > > >>>>> constraining to me, if not dangerous. It is pretty common to have
> > > >>>>> functions intermingle hardware accesses with other operations that might
> > > >>>>> sleep, and this constraint means that in such cases the caller would
> > > >>>>> need to perform guard lifetime management manually:
> > > >>>>>
> > > >>>>>   let bar_guard = bar.try_access()?;
> > > >>>>>   /* do something non-sleeping with bar_guard */
> > > >>>>>   drop(bar_guard);
> > > >>>>>
> > > >>>>>   /* do something that might sleep */
> > > >>>>>
> > > >>>>>   let bar_guard = bar.try_access()?;
> > > >>>>>   /* do something non-sleeping with bar_guard */
> > > >>>>>   drop(bar_guard);
> > > >>>>>
> > > >>>>>   ...
> > > >>>>>
> > > >>>>> Failure to drop the guard potentially introduces a race condition, which
> > > >>>>> will receive no compile-time warning and potentialy not even a runtime
> > > >>>>> one unless lockdep is enabled. This problem does not exist with the
> > > >>>>> equivalent C code AFAICT
> > > >>>
> > > >>> Without klint [1] it is exactly the same as in C, where I have to remember to
> > > >>> not call into something that might sleep from atomic context.
> > > >>>
> > > >>
> > > >> Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
> > > >> not sleeping?
> > > > 
> > > > It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
> > > > just that the synchronization mechanism (RCU) used for the Revocable type
> > > > implies that.
> > > > 
> > > > In C we have something that is pretty similar with drm_dev_enter() /
> > > > drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> > > > 
> > > > In DRM this is used to prevent accesses to device resources after the device has
> > > > been unplugged.
> > > 
> > > Thanks a lot for the response. Might it make more sense to use SRCU then? The
> > > use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.
> > 
> > Allowing to hold on to the guard for too long is a bit contradictive to the goal
> > of detecting hotunplug I guess.
> > 
> > Besides that I don't really see why we can't just re-acquire it after we sleep?
> > Rust provides good options to implement it ergonimcally I think.
> > 
> > > 
> > > Another colleague told me RDMA also uses SRCU for a similar purpose as well.
> > 
> > See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?
> > 
> > [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
> 
> Hmm, so you're saying SRCU sections blocking indefinitely is a concern as per
> that thread. But I think SRCU GPs should not be stalled in normal operation.
> If it is, that is a bug anyway. Stalling SRCU grace periods is not really a
> good thing anyway, you could run out of memory (even though stalling RCU is
> even more dangerous).

I'm saying that extending the time of critical sections is a concern, because
it's more likely to miss the unplug event and it's just not necessary. You grab
the guard, do a few I/O ops and drop it -- simple.

If you want to sleep in between just re-acquire it when you're done sleeping.
You can easily avoid explicit drop(guard) calls by moving critical sections to
their own function or closures.

I still don't understand why you're thinking that it's crucial to sleep while
holding the RevocableGuard?

> 
> For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason, correct
> me if I'm wrong about the RDMA user but this is what I recollect discussing
> with you.
> 
> > > 
> > > >> I am fairly new to rust, could you help elaborate more about why these MMIO
> > > >> accesses need to have RevocableGuard in Rust? What problem are we trying to
> > > >> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
> > > >> vaguely understand we are trying to "wait for an MMIO access" using
> > > >> synchronize here, but it is just a guest.
> > > > 
> > > > Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> > > > to unplugged devices.
> > > > 
> > > > The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> > > > pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> > > > is detached from the driver (for instance because it has been unplugged).
> > > 
> > > I guess the Devres concept of revoking resources on driver detach is not a rust
> > > thing (even for PCI)... but correct me if I'm wrong.
> > 
> > I'm not sure what you mean with that, can you expand a bit?
> 
> I was reading the devres documentation earlier. It mentios that one of its
> use is to clean up resources. Maybe I mixed up the meaning of "clean up" and
> "revoke" as I was reading it.
> 
> Honestly, I am still confused a bit by the difference between "revoking" and
> "cleaning up".

The Devres [1] implementation implements the devres callback that is called when the
device is unbound from the driver.

Once that happens, it revokes the underlying resource (e.g. the PCI bar mapping)
by using a Revocable [2] internally. Once the resource is revoked, try_access()
returns None and the resource (e.g. pci::Bar is dropped). By dropping the
pci::Bar the mapping is unmapped and the resource region is removed (which is
typically called cleanup).

[1] https://rust.docs.kernel.org/kernel/devres/struct.Devres.html
[2] https://rust.docs.kernel.org/kernel/revocable/struct.Revocable.html

> 
> > > 
> > > > By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> > > > unmapped; a subsequent call to try_access() would fail.
> > > > 
> > > > But yes, if the device is unplugged while holding the RCU guard, one is on their
> > > > own; that's also why keeping the critical sections short is desirable.
> > > 
> > > I have heard some concern around whether Rust is changing the driver model when
> > > it comes to driver detach / driver remove.  Can you elaborate may be a bit about
> > > how Rust changes that mechanism versus C, when it comes to that?
> > 
> > I think that one is simple, Rust does *not* change the driver model.
> > 
> > What makes you think so?
> 
> Well, the revocable concept for one is rust-only right?

Yes, but that has nothing to do with changing the driver model. It is just an
additional implementation detail to ensure safety.

IIRC there are also have been efforts for a similar mechanism in C.

> 
> It is also possibly just some paranoia based on discussions, but I'm not sure
> at the moment.

Again there is nothing different to C, except one additional step to ensure
safety. For instance, let's take devm_kzalloc(). Once the device is detached
from the driver the memory allocated with this function is freed automatically.

The additional step in Rust is, that we'd not only free the memory, but also
revoke the access to the pointer that has been returned by devm_kzalloc() for
the driver, such that it can't be used by accident anymore.

Besides that, I'd be interested to what kind of discussion you're referring to.

> 
> > > Ideally we
> > > would not want Rust drivers to have races with user space accesses when they are
> > > detached/remove. But we also don't want accesses to be non-sleepable sections
> > > where this guard is held, it seems restrictive (though to your point the
> > > sections are expected to be small).
> > 
> > In the very extreme case, nothing prevents you from implementing a wrapper like:
> > 
> > 	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) -> Result<u32> {
> > 		let bar = bar.try_access()?;
> > 		bar.read32(offset);
> > 	}
> > 
> > Which limits the RCU read side critical section to my_write32().
> > 
> > Similarly you can have custom functions for short sequences of I/O ops, or use
> > closures. I don't understand the concern.
> 
> Yeah, this is certainly possible. I think one concern is similar to what you
> raised on the other thread you shared [1]:
> "Maybe we even want to replace it with SRCU entirely to ensure that drivers
> can't stall the RCU grace period for too long by accident."

Yeah, I was just thinking out loud, but I think it wasn't a good idea -- we
really do want to keep the critical sections short, so RCU is fine. Prohibit
drivers to use RCU, just because they could mess up, wasn't a good reason.

> 
> [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
> 
> thanks,
> 
>  - Joel
> 
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 22:02                     ` Danilo Krummrich
@ 2025-02-25 22:42                       ` Dave Airlie
  0 siblings, 0 replies; 104+ messages in thread
From: Dave Airlie @ 2025-02-25 22:42 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Joel Fernandes, Alexandre Courbot, Gary Guo, Joel Fernandes,
	Boqun Feng, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck, Jason Gunthorpe

>
> I'm saying that extending the time of critical sections is a concern, because
> it's more likely to miss the unplug event and it's just not necessary. You grab
> the guard, do a few I/O ops and drop it -- simple.

At least for nova-core I've realised I got this partly wrong,

https://gitlab.freedesktop.org/nouvelles/kernel/-/blob/nova-core-experiments/drivers/gpu/nova-core/falcon.rs?ref_type=heads#L305

However in this case I expect the sleeps small enough to end up in
udelay perhaps instead of actual sleeps,

but I wouldn't be too worried about the overhead of adding a bit of
extra code in the wake up from sleep path, the sleep is going to take
the time, a few extra instructions in the poll won't be noticeable.

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 21:02                   ` Joel Fernandes
  2025-02-25 22:02                     ` Danilo Krummrich
@ 2025-02-25 22:57                     ` Jason Gunthorpe
  2025-02-25 23:26                       ` Danilo Krummrich
  2025-02-25 23:45                       ` Danilo Krummrich
  1 sibling, 2 replies; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 22:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Danilo Krummrich, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Tue, Feb 25, 2025 at 04:02:28PM -0500, Joel Fernandes wrote:
> > Besides that I don't really see why we can't just re-acquire it after we sleep?
> > Rust provides good options to implement it ergonimcally I think.
> > 
> > > 
> > > Another colleague told me RDMA also uses SRCU for a similar purpose as well.
> > 
> > See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?
> > 
> > [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

> For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason, correct
> me if I'm wrong about the RDMA user but this is what I recollect discussing
> with you.

In RDMA SRCU is not unbounded. It is limited to a system call
duration, and we don't have system calls that become time unbounded
inside drivers.

The motivation for RDMA was not really hotplug, but to support kernel
module upgrade. Some data center HA users were very interested in
this. To achieve it the driver module itself cannot have an elevated
module refcount. This requires swapping the module refcount for a
sleepable RW lock like SRCU or rwsem protecting all driver
callbacks. [1]

To be very clear, in RDMA you can open /dev/infiniband/uverbsX, run a
ioctl on the FD and then successfully rmmod the driver module while
the FD is open and while the ioctl is running. Any driver op will
complete, future ioctls will fail, and the module will complete.

So, from my perspective, this revocable idea would completely destroy
the actual production purpose we built the fast hot-plug machinery
for. It does not guarentee that driver threads are fenced prior to
completing remove. Intead it must rely on the FD itself to hold the
module refcount on the driver to keep the .text alive while driver
callbacks continue to be called. Making the module refcount linked to
userspace closing a FD renders the module unloadable in practice.

The common driver shutdown process in the kernel, that is well tested
and copied, makes the driver single threaded during the remove()
callback. Effectively instead of trying to micro-revoke individual
resources we revoke all concurrency threads and then it is completely
safe to destroy all the resources. This also guarentees that after
completing remove there is no Execute After Free risk to the driver
code.

SRCU/rwsem across all driver ops function pointer calls is part of
this scheme, but also things like cancling/flushing work queues,
blocking new work submission, preventing interrupts, removing syfs
files (they block concurrent threads internally), synchronizing any
possibly outstanding RCU callbacks, and more.

So, I'd suggest that if you have system calls that wait, the typical
expected solution would be to shoot down the waits during a remove
event so they can become time bounded.

> > > I have heard some concern around whether Rust is changing the driver model when
> > > it comes to driver detach / driver remove.  Can you elaborate may be a bit about
> > > how Rust changes that mechanism versus C, when it comes to that?
> > 
> > I think that one is simple, Rust does *not* change the driver model.

I think this resource-revoke idea is deviating from the normal
expected driver model by allowing driver code to continue to run in
other threads once remove completes. That is definitely abnormal at
least.

It is not necessarily *wrong*, but it sure is weird and as I explained
above it has bad system level properties.

Further, it seems to me there is a very unique DRM specific issue at
work "time unbounded driver callbacks". A weird solution to this
should not be baked into the common core kernel rust bindings and
break the working model of all other subsystems that don't have that
problem.

> > Similarly you can have custom functions for short sequences of I/O ops, or use
> > closures. I don't understand the concern.
> 
> Yeah, this is certainly possible. I think one concern is similar to what you
> raised on the other thread you shared [1]:
> "Maybe we even want to replace it with SRCU entirely to ensure that drivers
> can't stall the RCU grace period for too long by accident."

I'd be worried about introducing a whole bunch more untestable failure
paths in drivers. Especially in areas like work queue submit that are
designed not to fail [2]. Non-failing work queues is a critical property
that I've relied on countless times. I'm not sure you even *can* recover
from this correctly in all cases.

Then in the other email did you say that even some memory allocations
go into this scheme? Yikes!

Further, hiding a synchronize_rcu in a devm destructor [3], once per
revocable object is awful. If you imagine having a rcu around each of
your revocable objects, how many synchronize_rcu()s is devm going to
call post-remove()?

On a busy server it is known to take a long time. So it is easy to
imagine driver remove times going into the many 10's of seconds for no
good reason. Maybe even multiple minutes if the driver ends up with
many of these objects.

[1] - Module .text is not unplugged from the kernel until all probed
drivers affiliated with that module have completed their remove
operations.

[2] - It is important that drivers shut down all their concurrency in
workqueues during remove because work queues do not hold the module
refcount. The only way .text lifecyle works is for drivers using work
queues is to rely on [1] to protect against Execute after Free.

[3] - Personally I agree with Laurent's points and I strongly dislike
devm. I'm really surprised to see Rust using it, I imagined Rust has
sufficiently strong object lifecycle management that it would not be
needed :(

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 22:57                     ` Jason Gunthorpe
@ 2025-02-25 23:26                       ` Danilo Krummrich
  2025-02-25 23:45                       ` Danilo Krummrich
  1 sibling, 0 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-25 23:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Tue, Feb 25, 2025 at 06:57:56PM -0400, Jason Gunthorpe wrote:
> I think this resource-revoke idea is deviating from the normal
> expected driver model by allowing driver code to continue to run in
> other threads once remove completes. That is definitely abnormal at
> least.

No, it simply guarantees that once remove() completed the pointer to the
resource can't be accessed anymore and the resource can't be kept alive
(which includes the actual memory mapping as well as the allocated resource
region).

It also solves the unplug problem, where ioctls can't access the resource
anymore after remove(). This is indeed a problem that does not affect all
subsystems.

> 
> It is not necessarily *wrong*, but it sure is weird and as I explained
> above it has bad system level properties.
> 
> Further, it seems to me there is a very unique DRM specific issue at
> work "time unbounded driver callbacks". A weird solution to this
> should not be baked into the common core kernel rust bindings and
> break the working model of all other subsystems that don't have that
> problem.
> 
> > > Similarly you can have custom functions for short sequences of I/O ops, or use
> > > closures. I don't understand the concern.
> > 
> > Yeah, this is certainly possible. I think one concern is similar to what you
> > raised on the other thread you shared [1]:
> > "Maybe we even want to replace it with SRCU entirely to ensure that drivers
> > can't stall the RCU grace period for too long by accident."
> 
> I'd be worried about introducing a whole bunch more untestable failure
> paths in drivers. Especially in areas like work queue submit that are
> designed not to fail [2]. Non-failing work queues is a critical property
> that I've relied on countless times. I'm not sure you even *can* recover
> from this correctly in all cases.
> 
> Then in the other email did you say that even some memory allocations
> go into this scheme? Yikes!

"For instance, let's take devm_kzalloc(). Once the device is detached
from the driver the memory allocated with this function is freed automatically.

The additional step in Rust is, that we'd not only free the memory, but also
revoke the access to the pointer that has been returned by devm_kzalloc() for
the driver, such that it can't be used by accident anymore."

This was just an analogy to explain what we're doing here. Obviously, memory
allocations can be managed by Rust's object lifetime management.

The reason we have Devres for device resources is that the lifetime of a
pci::Bar is *not* bound to the object lifetime directly, but to the lifetime of
the binding between a device and a driver. That's why it needs to be revoked
(which forcefully drops the object) when the device is unbound *not* when the
pci::Bar object is dropped regularly.

That's all the magic we're doing here. And again, this is not a change to the
device / driver model. It is making use of the device / driver model to ensure
safety.

> 
> Further, hiding a synchronize_rcu in a devm destructor [3], once per
> revocable object is awful. If you imagine having a rcu around each of
> your revocable objects, how many synchronize_rcu()s is devm going to
> call post-remove()?

As many as you have MMIO mappings in your driver. But we can probably optimize
this to just a single synchronize_rcu().

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 22:57                     ` Jason Gunthorpe
  2025-02-25 23:26                       ` Danilo Krummrich
@ 2025-02-25 23:45                       ` Danilo Krummrich
  2025-02-26  0:49                         ` Jason Gunthorpe
  1 sibling, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-25 23:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Tue, Feb 25, 2025 at 06:57:56PM -0400, Jason Gunthorpe wrote:
> The common driver shutdown process in the kernel, that is well tested
> and copied, makes the driver single threaded during the remove()
> callback.

All devres callbacks run in the same callchain, __device_release_driver() first
calls remove() and then all the devres callbacks, where we revoke the pci::Bar,
by which gets dropped and hence the bar is unmapped and resource regions are
free. It's not different to C drivers. Except that in C you don't lose access to
the void pointer that still points to the (unmapped) MMIO address.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 23:45                       ` Danilo Krummrich
@ 2025-02-26  0:49                         ` Jason Gunthorpe
  2025-02-26  1:16                           ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-26  0:49 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 12:45:45AM +0100, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 06:57:56PM -0400, Jason Gunthorpe wrote:
> > The common driver shutdown process in the kernel, that is well tested
> > and copied, makes the driver single threaded during the remove()
> > callback.
> 
> All devres callbacks run in the same callchain, __device_release_driver() first
> calls remove() and then all the devres callbacks, where we revoke the pci::Bar,
> by which gets dropped and hence the bar is unmapped and resource regions are
> free. It's not different to C drivers. Except that in C you don't lose access to
> the void pointer that still points to the (unmapped) MMIO address.

I understand how devm works.

I'm pointing out the fundamental different in approachs. The typical
widely used pattern results in __device_release_driver() completing
with no concurrent driver code running.

DRM achieves this, in part, by using drm_dev_unplug().

The Rust approach ends up with __device_release_driver() completing
and leaving driver code still running in other threads.

This is a significant different outcome that must be mitigated somehow
to prevent an Execute After Free failure for Rust. DRM happens to be
safe because it ends up linking the driver module refcount to FD
lifetime. This also prevents unloading the driver (bad!!).

However, for instance, you can't rely on the module reference count
with work queues, so this scheme is not generally applicable.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26  0:49                         ` Jason Gunthorpe
@ 2025-02-26  1:16                           ` Danilo Krummrich
  2025-02-26 17:21                             ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-26  1:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Tue, Feb 25, 2025 at 08:49:16PM -0400, Jason Gunthorpe wrote:
> I'm pointing out the fundamental different in approachs. The typical
> widely used pattern results in __device_release_driver() completing
> with no concurrent driver code running.

Typically yes, but there are exceptions, such as DRM.

> 
> DRM achieves this, in part, by using drm_dev_unplug().

No, DRM can have concurrent driver code running, which is why drm_dev_enter()
returns whether the device is unplugged already, such that subsequent
operations, (e.g. I/O) can be omitted.

> 
> The Rust approach ends up with __device_release_driver() completing
> and leaving driver code still running in other threads.

No, this has nothing to do with Rust device / driver or I/O abstractions.

It entirely depends on the driver you implement. If you register a DRM device,
then yes, there may be concurrent driver code running after
__device_release_driver() completes. But this is specific to the DRM
implementation, *not* to Rust.

Again, the reason a pci::Bar needs to be revocable in Rust is that we can't have
the driver potentially keep the pci::Bar alive (or even access it) after the
device is unbound.

A driver can also be unbound without the module being removed, and if the driver
would be able to keep the pci::Bar alive, it would mean that the resource region
is not freed and the MMIO mapping is not unmapped, because the resource region
and the MMIO mapping is bound to the lifetime of the pci::Bar object. This would
not be acceptable for a Rust driver.

That this also comes in handy for subsystems like DRM, where we could have
attempts to access to the pci::Bar object after the device is unbound by design,
can be seen as a nice side effect.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26  1:16                           ` Danilo Krummrich
@ 2025-02-26 17:21                             ` Jason Gunthorpe
  2025-02-26 21:31                               ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 17:21 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 02:16:58AM +0100, Danilo Krummrich wrote:
> > DRM achieves this, in part, by using drm_dev_unplug().
> 
> No, DRM can have concurrent driver code running, which is why drm_dev_enter()
> returns whether the device is unplugged already, such that subsequent
> operations, (e.g. I/O) can be omitted.

Ah, I did notice that the driver was the one providing the
file_operations struct so of course the core code can't protect the
driver ops. Yuk :\

> Again, the reason a pci::Bar needs to be revocable in Rust is that we can't have
> the driver potentially keep the pci::Bar alive (or even access it) after the
> device is unbound.

My impression is that nobody has yet come up with a Rust way to
implement the normal kernel design pattern of revoke threads then free
objects in safe rust.

Yes, this is a peculiar lifetime model, but it is pretty important in
the kernel. I'm not convinced you can just fully ignore it in Rust as
a design pattern. We use it pretty much everywhere a function pointer
is involved.

For instance, I'm looking at workqueue.rs and wondering why is it safe
against Execute After Free races. I see none of the C functions I
would expect to be used to prevent those races in the code.

Even the simple example:

//! fn print_later(val: Arc<MyStruct>) {
//!     let _ = workqueue::system().enqueue(val);
//! }

Seems to be missing the EAF prevention ie WorkItem::run() is in .text
of THIS_MODULE and I see nothing is preventing THIS_MODULE from being
unloaded.

The expectation of work queues is to follow the above revoke threads
then free pattern. A module should do that sequence in the driver
remove() or module __exit function.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26 17:21                             ` Jason Gunthorpe
@ 2025-02-26 21:31                               ` Danilo Krummrich
  2025-02-26 23:47                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-26 21:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 01:21:20PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 02:16:58AM +0100, Danilo Krummrich wrote:
> > Again, the reason a pci::Bar needs to be revocable in Rust is that we can't have
> > the driver potentially keep the pci::Bar alive (or even access it) after the
> > device is unbound.
> 
> My impression is that nobody has yet come up with a Rust way to
> implement the normal kernel design pattern of revoke threads then free
> objects in safe rust.

I get where you're coming from (and I agree), but that is a different issue.

Let's take a step back and look again why we have Devres (and Revocable) for
e.g. pci::Bar.

The device / driver model requires that device resources are only held by a
driver, as long as the driver is bound to the device.

For instance, in C we achieve this by calling

	pci_iounmap()
	pci_release_region()

from remove().

We rely on this, we trust drivers to actually do this.

We also trust drivers that they don't access the pointer originally returned by
pci_iomap() after remove(). Typically, drivers do this by shutting down all
asynchronous execution paths, e.g. workqueues. Some other drivers might still
run code after remove() and hence needs some synchronization, like DRM.

In Rust pci_iounmap() and pci_release_region() are called when the pci::Bar
object is dropped. But we don't want to trust the driver to actually do this.
Instead, we want to ensure that the driver can *not* do something that is not
allowed by the device / driver model.

Therefore, we never hand out a raw pci::Bar to driver, but a Devres<pci::Bar>.
With this a driver can't prevent the pci::Bar being dropped once the device is
unbound.

So, the main objective here is to ensure that a driver can't keep the pci::Bar
(and hence the memory mapping) alive arbitrarily.

Now, let's get back to concurrent code that might still attempt to use the
pci::Bar. Surely, we need mechanisms to shut down all asynchronous execution
paths (e.g. workqueues) once the device is unbound. But that's not the job of
Devres<pci::Bar>. The job of Devres<pci::Bar> is to be robust against misuse.

Again, that the revocable characteristic comes in handy for drivers that still
run code after remove() intentionally, is a nice coincidence.

> 
> Yes, this is a peculiar lifetime model, but it is pretty important in
> the kernel. I'm not convinced you can just fully ignore it in Rust as
> a design pattern. We use it pretty much everywhere a function pointer
> is involved.
> 
> For instance, I'm looking at workqueue.rs and wondering why is it safe
> against Execute After Free races. I see none of the C functions I
> would expect to be used to prevent those races in the code.
> 
> Even the simple example:
> 
> //! fn print_later(val: Arc<MyStruct>) {
> //!     let _ = workqueue::system().enqueue(val);
> //! }
> 
> Seems to be missing the EAF prevention ie WorkItem::run() is in .text
> of THIS_MODULE and I see nothing is preventing THIS_MODULE from being
> unloaded.
> 
> The expectation of work queues is to follow the above revoke threads
> then free pattern. A module should do that sequence in the driver
> remove() or module __exit function.

Fully agree with that.

I guess you're referring to cancel_work_sync() and friends as well as
destroy_workqueue(), etc.

They're indeed missing, this is because the workqueue work originates from the
Rust binder efforts and binder is only used builtin, so there was no need so
far.

But yes, once people start using workqueues for other modules, we surely need to
extend the abstraction accordingly.

Other abstractions do consider this though, e.g. the upcoming hrtimer work. [1]

In terms of IOCTLs it depends on the particular subsystem, but this is (or will
be) also reflected by the corresponding abstraction. Dropping a
MiscDeviceRegistration [2] on module_exit() for instance will ensure that there
are no concurrent IOCTLs, just like the corresponding C code.

[1] https://lore.kernel.org/rust-for-linux/20250224-hrtimer-v3-v6-12-rc2-v9-0-5bd3bf0ce6cc@kernel.org/
[2] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/rust/kernel/miscdevice.rs#n50

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26 21:31                               ` Danilo Krummrich
@ 2025-02-26 23:47                                 ` Jason Gunthorpe
  2025-02-27  0:41                                   ` Boqun Feng
                                                     ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 23:47 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> Let's take a step back and look again why we have Devres (and Revocable) for
> e.g. pci::Bar.
> 
> The device / driver model requires that device resources are only held by a
> driver, as long as the driver is bound to the device.
> 
> For instance, in C we achieve this by calling
> 
> 	pci_iounmap()
> 	pci_release_region()
> 
> from remove().
> 
> We rely on this, we trust drivers to actually do this.

Right, exactly

But it is not just PCI bar. There are a *huge* number of kernel APIs
that have built in to them the same sort of requirement - teardown
MUST run with remove, and once done the resource cannot be used by
another thread.

Basically most things involving function pointers has this sort of
lifecycle requirement because it is a common process that prevents a
EAF of module unload.

This is all incredibly subtle and driver writers never seem to
understand it properly.. See below for my thoughts on hrtimer bindings
having the same EAF issue.

My fear, that is intensifying as we go through this discussion, is
that rust binding authors have not fully comprehended what the kernel
life cycle model and common design pattern actually is, and have not
fully thought through issues like module unload creating a lifetime
cycle for *function pointers*.

This stuff is really hard. C programers rarely understand it. Existing
drivers tend to frequenly have these bug classes. Without an obvious
easy to use Rust framework to, effectively, revoke function pointers
and synchronously destroy objects during remove, I think this will be
a recurring problem going forward.

I assume that Rust philsophy should be quite concerned if it does not
protect against function pointers becoming asynchronously invalid due
to module unload races. That sounds like a safety problem to me??

> We also trust drivers that they don't access the pointer originally
> returned by pci_iomap() after remove().

Yes, I get it, you are trying to use a reference tracking type design
pattern when the C API is giving you a fencing design pattern, they
are not compatible and it is hard to interwork them.

> Now, let's get back to concurrent code that might still attempt to use the
> pci::Bar. Surely, we need mechanisms to shut down all asynchronous execution
> paths (e.g. workqueues) once the device is unbound. But that's not the job of
> Devres<pci::Bar>. The job of Devres<pci::Bar> is to be robust against misuse.

The thing is once you have a mechanism to shutdown all the stuff you
don't need the overhead of this revocable checking on the normal
paths. What you need is a way to bring your pci::Bar into a safety
contract that remove will shootdown concurrency and that directly
denies references to pci::Bar, and the same contract will guarentee it
frees pci::Bar memory.

A more fancy version of devm, if you will.

> I guess you're referring to cancel_work_sync() and friends as well as
> destroy_workqueue(), etc.

Yes, and flush, and you often need to design special locking to avoid
work-self-requeing. It is tricky stuff, again I've seen lots and lots
of bugs in these remove paths here.

Hopefully rust can describe this adequately without limiting work
queue functionality :\

> But yes, once people start using workqueues for other modules, we
> surely need to extend the abstraction accordingly.

You say that like it will be easy, but it is exactly the same type of
lifetime issue as pci_iomap, and that seems to be quite a challenge
here???

> Other abstractions do consider this though, e.g. the upcoming hrtimer work. [1]

Does it??? hrtimer uses function pointers. Any time you take a
function pointer you have to reason about how does the .text lifetime
work relative to the usage of the function pointer.

So how does [1] guarentee that the hrtimer C code will not call the
function pointer after driver remove() completes?

My rust is aweful, but it looks to me like the timer lifetime is
linked to the HrTimerHandle lifetime, but nothing seems to hard link
that to the driver bound, or module lifetime?

This is what I'm talking about, the design pattern you are trying to
fix with revocable is *everywhere* in the C APIs, it is very subtle,
but must be considered. One solution would be to force hrtimer into
a revocable too.

And on and on for basically every kernel API that uses function
pointers.

This does not seem reasonable to me at all, it certainly isn't better
than the standard pattern.

> be) also reflected by the corresponding abstraction. Dropping a
> MiscDeviceRegistration [2] on module_exit() for instance will ensure that there
> are no concurrent IOCTLs, just like the corresponding C code.

The way misc device works you can't unload the module until all the
FDs are closed and the misc code directly handles races with opening
new FDs while modules are unloading. It is quite a different scheme
than discussed in this thread.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26 23:47                                 ` Jason Gunthorpe
@ 2025-02-27  0:41                                   ` Boqun Feng
  2025-02-27 14:46                                     ` Jason Gunthorpe
  2025-02-27  1:02                                   ` Greg KH
  2025-02-27 11:32                                   ` Danilo Krummrich
  2 siblings, 1 reply; 104+ messages in thread
From: Boqun Feng @ 2025-02-27  0:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
[...]
> 
> > Other abstractions do consider this though, e.g. the upcoming hrtimer work. [1]
> 
> Does it??? hrtimer uses function pointers. Any time you take a
> function pointer you have to reason about how does the .text lifetime
> work relative to the usage of the function pointer.
> 
> So how does [1] guarentee that the hrtimer C code will not call the
> function pointer after driver remove() completes?
> 
> My rust is aweful, but it looks to me like the timer lifetime is
> linked to the HrTimerHandle lifetime, but nothing seems to hard link
> that to the driver bound, or module lifetime?
> 

So to write a module, normally you need to have a module struct, e.g.

	struct MyModule { ... }

and if a hrtimer is used by MyModule, you can put an HrTimerHandle in
it:

	struct MyModule {
	    ...
	    handle: Option<HrTimerHandle>
	}

, when module unloaded, every field of MyModule will call their drop()
function, and HrTimerHandle's drop() will cancel the hrtimer, so that
the function pointer won't be referenced by hrtimer core.

And if you don't store the HrTimerHandle anywhere, like you drop() it
right after start a hrtimer, it will immediately stop the timer. Does
this make sense?

Regards,
Boqun

> This is what I'm talking about, the design pattern you are trying to
> fix with revocable is *everywhere* in the C APIs, it is very subtle,
> but must be considered. One solution would be to force hrtimer into
> a revocable too.
> 
> And on and on for basically every kernel API that uses function
> pointers.
> 
> This does not seem reasonable to me at all, it certainly isn't better
> than the standard pattern.
> 
> > be) also reflected by the corresponding abstraction. Dropping a
> > MiscDeviceRegistration [2] on module_exit() for instance will ensure that there
> > are no concurrent IOCTLs, just like the corresponding C code.
> 
> The way misc device works you can't unload the module until all the
> FDs are closed and the misc code directly handles races with opening
> new FDs while modules are unloading. It is quite a different scheme
> than discussed in this thread.
> 
> Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26 23:47                                 ` Jason Gunthorpe
  2025-02-27  0:41                                   ` Boqun Feng
@ 2025-02-27  1:02                                   ` Greg KH
  2025-02-27  1:34                                     ` John Hubbard
  2025-02-27 14:23                                     ` Jason Gunthorpe
  2025-02-27 11:32                                   ` Danilo Krummrich
  2 siblings, 2 replies; 104+ messages in thread
From: Greg KH @ 2025-02-27  1:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> The way misc device works you can't unload the module until all the
> FDs are closed and the misc code directly handles races with opening
> new FDs while modules are unloading. It is quite a different scheme
> than discussed in this thread.

And I would argue that is it the _right_ scheme to be following overall
here.  Removing modules with in-flight devices/drivers is to me is odd,
and only good for developers doing work, not for real systems, right?

Yes, networking did add that functionality to allow modules to be
unloaded with network connections open, and I'm guessing RDMA followed
that, but really, why?

What is the requirement that means that you have to do this for function
pointers?  I can understand the disconnect issue between devices and
drivers and open file handles (or sockets), as that is a normal thing,
but not removing code from the system, that is not normal.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27  1:02                                   ` Greg KH
@ 2025-02-27  1:34                                     ` John Hubbard
  2025-02-27 21:42                                       ` Dave Airlie
  2025-02-28 10:52                                       ` Simona Vetter
  2025-02-27 14:23                                     ` Jason Gunthorpe
  1 sibling, 2 replies; 104+ messages in thread
From: John Hubbard @ 2025-02-27  1:34 UTC (permalink / raw)
  To: Greg KH, Jason Gunthorpe
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Wed Feb 26, 2025 at 5:02 PM PST, Greg KH wrote:
> On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
>> The way misc device works you can't unload the module until all the
>> FDs are closed and the misc code directly handles races with opening
>> new FDs while modules are unloading. It is quite a different scheme
>> than discussed in this thread.
>
> And I would argue that is it the _right_ scheme to be following overall
> here.  Removing modules with in-flight devices/drivers is to me is odd,
> and only good for developers doing work, not for real systems, right?

Right...I think. I'm not experienced with misc, but I do know that the
"run driver code after driver release" is very, very concerning.

I'm quite new to drivers/gpu/drm, so this is the first time I've learned
about this DRM behavior...

>
> Yes, networking did add that functionality to allow modules to be
> unloaded with network connections open, and I'm guessing RDMA followed
> that, but really, why?
>
> What is the requirement that means that you have to do this for function
> pointers?  I can understand the disconnect issue between devices and
> drivers and open file handles (or sockets), as that is a normal thing,
> but not removing code from the system, that is not normal.
>

I really hope that this "run after release" is something that Rust for
Linux drivers, and in particular, the gpu/nova*, gpu/drm/nova* drivers,
can *leave behind*.

DRM may have had ${reasons} for this approach, but this nova effort is
rebuilding from the ground up. So we should avoid just blindly following
this aspect of the original DRM design.

thanks,
--
John Hubbard

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-26 23:47                                 ` Jason Gunthorpe
  2025-02-27  0:41                                   ` Boqun Feng
  2025-02-27  1:02                                   ` Greg KH
@ 2025-02-27 11:32                                   ` Danilo Krummrich
  2025-02-27 15:07                                     ` Jason Gunthorpe
  2 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-27 11:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > Let's take a step back and look again why we have Devres (and Revocable) for
> > e.g. pci::Bar.
> > 
> > The device / driver model requires that device resources are only held by a
> > driver, as long as the driver is bound to the device.
> > 
> > For instance, in C we achieve this by calling
> > 
> > 	pci_iounmap()
> > 	pci_release_region()
> > 
> > from remove().
> > 
> > We rely on this, we trust drivers to actually do this.
> 
> Right, exactly
> 
> But it is not just PCI bar. There are a *huge* number of kernel APIs
> that have built in to them the same sort of requirement - teardown
> MUST run with remove, and once done the resource cannot be used by
> another thread.
> 
> Basically most things involving function pointers has this sort of
> lifecycle requirement because it is a common process that prevents a
> EAF of module unload.

You're still mixing topics, the whole Devres<pci::Bar> thing as about limiting
object lifetime to the point where the driver is unbound.

Shutting down asynchronous execution of things, i.e. workqueues, timers, IOCTLs
to prevent unexpected access to the module .text section is a whole different
topic.

You seem to imply that if we ensure the latter, we don't need to enforce the
former, but that isn't true.

In other words, assuming that we properly enforce that there are no async
execution paths after remove() or module_exit() (not necessarily the same),
we still need to ensure that a pci::Bar object does not outlive remove().

Device resources are a bit special, since their lifetime must be cap'd at device
unbind, *independent* of the object lifetime they reside in. Hence the Devres
container.

For a memory allocation for instance (e.g. KBox<T>), this is different. If it
outlives remove(), e.g. because it's part of the Module structure, that's fine.
It's only important that it's dropped eventually.

> 
> This is all incredibly subtle and driver writers never seem to
> understand it properly.. See below for my thoughts on hrtimer bindings
> having the same EAF issue.

I don't think it has, see Boqun's reply [1].

> 
> My fear, that is intensifying as we go through this discussion, is
> that rust binding authors have not fully comprehended what the kernel
> life cycle model and common design pattern actually is, and have not
> fully thought through issues like module unload creating a lifetime
> cycle for *function pointers*.

I do *not* see where you take the evidence from to make such a generic
statement.

Especially because there aren't a lot of abstractions upstream yet that fall
under this category.

If you think that a particular abstraction has a design issue, you're very
welcome to chime in on the particular mail thread and help improve things.

But implying that no one considers this is not representing reality.

> 
> This stuff is really hard. C programers rarely understand it. Existing
> drivers tend to frequenly have these bug classes. Without an obvious
> easy to use Rust framework to, effectively, revoke function pointers
> and synchronously destroy objects during remove, I think this will be
> a recurring problem going forward.
> 
> I assume that Rust philsophy should be quite concerned if it does not
> protect against function pointers becoming asynchronously invalid due
> to module unload races. That sounds like a safety problem to me??

Yes, it would be a safety problem. That's why HrTimer for instance implements
the corresponding synchronization when dropped.

> 
> > We also trust drivers that they don't access the pointer originally
> > returned by pci_iomap() after remove().
> 
> Yes, I get it, you are trying to use a reference tracking type design
> pattern when the C API is giving you a fencing design pattern, they
> are not compatible and it is hard to interwork them.
> 
> > Now, let's get back to concurrent code that might still attempt to use the
> > pci::Bar. Surely, we need mechanisms to shut down all asynchronous execution
> > paths (e.g. workqueues) once the device is unbound. But that's not the job of
> > Devres<pci::Bar>. The job of Devres<pci::Bar> is to be robust against misuse.
> 
> The thing is once you have a mechanism to shutdown all the stuff you
> don't need the overhead of this revocable checking on the normal
> paths. What you need is a way to bring your pci::Bar into a safety
> contract that remove will shootdown concurrency and that directly
> denies references to pci::Bar, and the same contract will guarentee it
> frees pci::Bar memory.

This contract needs to be technically enforced, not by convention as we do in C.

And with embedding the pci::Bar in Devres, such that it is automatically revoked
when the device is unbound, we do exactly that.

(Again, we don't do it primarily to protect against some concurrent access, we do
it to forcefully ensure that a pci::Bar object can not outlive the device /
driver binding.)

If we don't do that, how else do we do it? How do we prevent a driver from
keeping the pci::Bar (and hence the memory mapping and memory region reservation
alive after the device has been unbound?

> > But yes, once people start using workqueues for other modules, we
> > surely need to extend the abstraction accordingly.
> 
> You say that like it will be easy, but it is exactly the same type of
> lifetime issue as pci_iomap, and that seems to be quite a challenge
> here???

No, the workqueue implementation can so something similar to what Boqun
explained for HrTimer [1].

A workqueue lifetime also does not need a hard cap at device unbind, like device
resources. If it is bound to the module lifetime for instance, that is fine too.

Data that is accessed from a work item can't be freed under the workqueue by
design in Rust.

[1] https://lore.kernel.org/rust-for-linux/Z7-0pOmWO6r_KeQI@boqun-archlinux/

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27  1:02                                   ` Greg KH
  2025-02-27  1:34                                     ` John Hubbard
@ 2025-02-27 14:23                                     ` Jason Gunthorpe
  1 sibling, 0 replies; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 14:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 05:02:23PM -0800, Greg KH wrote:
> On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> > The way misc device works you can't unload the module until all the
> > FDs are closed and the misc code directly handles races with opening
> > new FDs while modules are unloading. It is quite a different scheme
> > than discussed in this thread.
> 
> And I would argue that is it the _right_ scheme to be following overall
> here.  Removing modules with in-flight devices/drivers is to me is odd,
> and only good for developers doing work, not for real systems, right?

There are two issues and I've found these discussions get confused
about two interrelated things:

1) Module lifetime and when modules are refcounted
2) How does device_driver remove() work, especially with hot unplug
   and /sys/../unbind while the module is *still loaded*.

Noting, very explicitly, that you can unbind a device_driver without
unloading the module.

#1 should be strictly based around the needs of function pointers in
the system. Ie stuff like ".owner = THIS_MODULE".

#2 is challenging when the driver has a file descriptor.

AFAIK there are only two broad choices:
 a) wait for all FDs to close in remove() (boo!)
 b) leave the FDs open but disable them and complete remove(). eg
    return -ENODEV to all system calls

I think the kernel community has a strong preference for (b), but rdma
had started with (a) long ago. So we fixed it to (b), netdev does (b),
so do alot of places because (a) is, frankly, awful.

Now.. how does that relate to module unbinding? The drivers are
unbound now because we properly support hotunplug via (b). So when is
it OK to allow a module with no bound drivers to remove while a zombie
FD is still open?

That largely revolves around who owns the struct file_operations. For
misc_dev the driver module would own it, so it is impossible to unload
the driver module even if the device driver was hot unplugged/unbound.

For a subsystem, like rdma, the subsystem can own the
file_operations. Now to allow the driver module to be unloaded we
"simply" require the subsystem to fence all driver callbacks during
device driver remove and subsystem unregister. ie if the subsystem
knows it no longer can call the driver then it no longer needs a
refcount on the driver module.

This fence was necessary anyhow for RDMA because drivers had the
pre-existing assumption that unregister was fencing all driver
callbacks by waiting for the FDs to close. Drivers did not handle UAF
races with something like pci_iounmp() and their concurrent driver
callback threads.

Once the fence was built it was straightforward to also allow driver
module unload since the core code has NULL'd its copy of all the
driver function pointers during unregister.

Further, I'd argue this is the best model for subsystems to
follow. Allowing driver code to continue to run after subsystem
unregister forces the driver to deal with UAF removal races. This is
too hard for drivers to implement correctly, and prevents unloading
the driver module after the drivers have been unbound.

Why do people care? Aside from obvious hot-unplug cases, like physical
PCI hot plug on high-avaibility servers hated (a), there was a strong
desire from folks running software HA schemes to be able to upgrade
the driver module with minimal hits. They want to leave the
application running and it is able to fast-recover when the FD becomes
-ENODEV by opening a new one and keeping most of their internal state
alive.

> What is the requirement that means that you have to do this for function
> pointers? 

I'm just pointing out that function pointers are not guaranteed to be
valid forever in the linux model. Every function pointer is somehow
being protected by a lifecycle that links back to the module
lifecycle.

Most of the time a driver author can ignore function pointer lifecycle
analysis, but not always..

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27  0:41                                   ` Boqun Feng
@ 2025-02-27 14:46                                     ` Jason Gunthorpe
  2025-02-27 15:18                                       ` Boqun Feng
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 14:46 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:
> And if you don't store the HrTimerHandle anywhere, like you drop() it
> right after start a hrtimer, it will immediately stop the timer. Does
> this make sense?

Oh, I understand that, but it is not sufficient in the kernel.

You are making an implicit argument that something external to the
rust universe will hold the module alive until all rust destructors
are run. That is trivialy obvious in your example above.

However, make it more complex. Run the destructor call for your
hrtimer in a workqueue thread. Use workqueue.rs. Now you don't have
this implicit argument anymore, and it will EAF things.

Danilo argues this is a bug in workqueue.rs.

Regardless, it seems like EAF is an overlooked topic in the safety
analysis.

Further, you and Danilo are making opposing correctness arguments:

 1) all rust destructors run before module __exit completes
 2) rust destructors can run after driver removal completes

I understand the technical underpinnings why these are different, but
I feel that if you can make #1 reliably true for __exit then it is
highly desirable to use the same techniques to make it true for
remove() too.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 11:32                                   ` Danilo Krummrich
@ 2025-02-27 15:07                                     ` Jason Gunthorpe
  2025-02-27 16:51                                       ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 15:07 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 12:32:45PM +0100, Danilo Krummrich wrote:
> On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > > Let's take a step back and look again why we have Devres (and Revocable) for
> > > e.g. pci::Bar.
> > > 
> > > The device / driver model requires that device resources are only held by a
> > > driver, as long as the driver is bound to the device.
> > > 
> > > For instance, in C we achieve this by calling
> > > 
> > > 	pci_iounmap()
> > > 	pci_release_region()
> > > 
> > > from remove().
> > > 
> > > We rely on this, we trust drivers to actually do this.
> > 
> > Right, exactly
> > 
> > But it is not just PCI bar. There are a *huge* number of kernel APIs
> > that have built in to them the same sort of requirement - teardown
> > MUST run with remove, and once done the resource cannot be used by
> > another thread.
> > 
> > Basically most things involving function pointers has this sort of
> > lifecycle requirement because it is a common process that prevents a
> > EAF of module unload.
> 
> You're still mixing topics, the whole Devres<pci::Bar> thing as about limiting
> object lifetime to the point where the driver is unbound.
> 
> Shutting down asynchronous execution of things, i.e. workqueues, timers, IOCTLs
> to prevent unexpected access to the module .text section is a whole different
> topic.

Again, the standard kernel design pattern is to put these things
together so that shutdown isolates concurrency which permits free
without UAF.

> In other words, assuming that we properly enforce that there are no async
> execution paths after remove() or module_exit() (not necessarily the same),
> we still need to ensure that a pci::Bar object does not outlive remove().

Yes, you just have to somehow use rust to ensure a call pci_iounmap()
happens during remove, after the isolation.

You are already doing it with devm.  It seems to me the only problem
you have is nobody has invented a way in rust to contract that the devm
won't run until the threads are isolated.

I don't see this as insolvable, you can have some input argument to
any API that creates concurrency that also pushes an ordered
destructor to the struct device lifecycle that ensures it cancels that
concurrency.

> Device resources are a bit special, since their lifetime must be cap'd at device
> unbind, *independent* of the object lifetime they reside in. Hence the Devres
> container.

I'd argue many resources should be limited to device unbind. Memory is
perhaps the only exception.

> > My fear, that is intensifying as we go through this discussion, is
> > that rust binding authors have not fully comprehended what the kernel
> > life cycle model and common design pattern actually is, and have not
> > fully thought through issues like module unload creating a lifetime
> > cycle for *function pointers*.
> 
> I do *not* see where you take the evidence from to make such a generic
> statement.

Well, I take the basic insistance that is OK to leak stuff from driver
scope to module scope is not well designed.

> Especially because there aren't a lot of abstractions upstream yet that fall
> under this category.

And I am thinking forward to other APIs you will need and how they
will interact and not feeling good about this direction.

> > The thing is once you have a mechanism to shutdown all the stuff you
> > don't need the overhead of this revocable checking on the normal
> > paths. What you need is a way to bring your pci::Bar into a safety
> > contract that remove will shootdown concurrency and that directly
> > denies references to pci::Bar, and the same contract will guarentee it
> > frees pci::Bar memory.
> 
> This contract needs to be technically enforced, not by convention as
> we do in C.

People do amazing things with contracts in rust, why is this case so
hard?

> Data that is accessed from a work item can't be freed under the
> workqueue by design in Rust.

What? That's madness, alot of work functions are freeing
something. They are often the terminal point of an object's lifecycle
because you often have to allocate memory to launch the work in the
first place.

Certainly if you restrict workqueues to be very limited then alot of
their challenging problems disappear :\

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 14:46                                     ` Jason Gunthorpe
@ 2025-02-27 15:18                                       ` Boqun Feng
  2025-02-27 16:17                                         ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Boqun Feng @ 2025-02-27 15:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:
> > And if you don't store the HrTimerHandle anywhere, like you drop() it
> > right after start a hrtimer, it will immediately stop the timer. Does
> > this make sense?
> 
> Oh, I understand that, but it is not sufficient in the kernel.
> 
> You are making an implicit argument that something external to the
> rust universe will hold the module alive until all rust destructors
> are run. That is trivialy obvious in your example above.
> 

The question in your previous email is about function pointer of hrtimer
EAF because of module unload, are you moving to a broader topic here?
If no, the for module unload, the argument is not implicit because in
rust/macro/module.rs the module __exit() function is generated by Rust,
and in that function, `assume_init_drop()` will call these destructors.

> However, make it more complex. Run the destructor call for your
> hrtimer in a workqueue thread. Use workqueue.rs. Now you don't have
> this implicit argument anymore, and it will EAF things.
> 

Note that HrTimerHandle holds a "reference" (could be a normal
reference, or an refcounted reference, like Arc) to the hrtimer (and the
struct contains it), therefore as long as HrTimerHandle exists, the
destructor call of the hrtimer won't be call. Hence the argument is not
implicit, it literally is:

* If a HrTimerHandle exists, it means the timer has been started, and
  since the timer has been started, the existence of HrTimerHandle will
  prevent the destructors of the hrtimer.

* drop() on HrTimerHandle will 1) stop the timer and 2) release the
  reference to the hrtimer, so then the destructors could be called.

> Danilo argues this is a bug in workqueue.rs.
> 
> Regardless, it seems like EAF is an overlooked topic in the safety
> analysis.
> 

Well, no. See above.

> Further, you and Danilo are making opposing correctness arguments:
> 
>  1) all rust destructors run before module __exit completes

What do you mean by "all rust destructor"? In my previous email, I was
talking about the particular destructors of fields in module struct,
right?

>  2) rust destructors can run after driver removal completes
> 

I will defer this to Danilo, because I'm not sure that's what he was
talking about.

Regards,
Boqun

> I understand the technical underpinnings why these are different, but
> I feel that if you can make #1 reliably true for __exit then it is
> highly desirable to use the same techniques to make it true for
> remove() too.
> 
> Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 15:18                                       ` Boqun Feng
@ 2025-02-27 16:17                                         ` Jason Gunthorpe
  2025-02-27 16:55                                           ` Boqun Feng
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 16:17 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng wrote:
> On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:
> > > And if you don't store the HrTimerHandle anywhere, like you drop() it
> > > right after start a hrtimer, it will immediately stop the timer. Does
> > > this make sense?
> > 
> > Oh, I understand that, but it is not sufficient in the kernel.
> > 
> > You are making an implicit argument that something external to the
> > rust universe will hold the module alive until all rust destructors
> > are run. That is trivialy obvious in your example above.
> > 
> 
> The question in your previous email is about function pointer of hrtimer
> EAF because of module unload, are you moving to a broader topic
> here?

No

> If no, the for module unload, the argument is not implicit because in
> rust/macro/module.rs the module __exit() function is generated by Rust,
> and in that function, `assume_init_drop()` will call these
> destructors.

That is not what I mean. You can be running code in multiple threads
from multiple functions in the module those are all being protected
implicitly by external C code functions. Rust itself is not managing
module life time.

Then you are making the argument that everything created by a rust
module somehow traces its reference back to the module itself,
regardless of what thread, callback or memory was used to create it.

So all bindings for everything are expected to clean themselves up,
recursively.

That does make sense, but then it still raises questions that things
like workqueue don't seem to have the cleanup.

I still wonder why you couldn't also have these reliable reference
counts rooted on the device driver instead of only on the module.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 15:07                                     ` Jason Gunthorpe
@ 2025-02-27 16:51                                       ` Danilo Krummrich
  0 siblings, 0 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-27 16:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, Boqun Feng, John Hubbard, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 11:07:09AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 12:32:45PM +0100, Danilo Krummrich wrote:
> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > > > Let's take a step back and look again why we have Devres (and Revocable) for
> > > > e.g. pci::Bar.
> > > > 
> > > > The device / driver model requires that device resources are only held by a
> > > > driver, as long as the driver is bound to the device.
> > > > 
> > > > For instance, in C we achieve this by calling
> > > > 
> > > > 	pci_iounmap()
> > > > 	pci_release_region()
> > > > 
> > > > from remove().
> > > > 
> > > > We rely on this, we trust drivers to actually do this.
> > > 
> > > Right, exactly
> > > 
> > > But it is not just PCI bar. There are a *huge* number of kernel APIs
> > > that have built in to them the same sort of requirement - teardown
> > > MUST run with remove, and once done the resource cannot be used by
> > > another thread.
> > > 
> > > Basically most things involving function pointers has this sort of
> > > lifecycle requirement because it is a common process that prevents a
> > > EAF of module unload.
> > 
> > You're still mixing topics, the whole Devres<pci::Bar> thing as about limiting
> > object lifetime to the point where the driver is unbound.
> > 
> > Shutting down asynchronous execution of things, i.e. workqueues, timers, IOCTLs
> > to prevent unexpected access to the module .text section is a whole different
> > topic.
> 
> Again, the standard kernel design pattern is to put these things
> together so that shutdown isolates concurrency which permits free
> without UAF.
> 
> > In other words, assuming that we properly enforce that there are no async
> > execution paths after remove() or module_exit() (not necessarily the same),
> > we still need to ensure that a pci::Bar object does not outlive remove().
> 
> Yes, you just have to somehow use rust to ensure a call pci_iounmap()
> happens during remove, after the isolation.
> 
> You are already doing it with devm.  It seems to me the only problem
> you have is nobody has invented a way in rust to contract that the devm
> won't run until the threads are isolated.

You can do that, pci::Driver::probe() returns a Pin<KBox<Self>>. This object is
dropped when the device is unbound and it runs before the devres callbacks.

Using miscdevice as example, your MiscDeviceRegistration would be a member of
this object and hence dropped on remove() before the devres callbacks revoke
device resources.

> 
> I don't see this as insolvable, you can have some input argument to
> any API that creates concurrency that also pushes an ordered
> destructor to the struct device lifecycle that ensures it cancels that
> concurrency.
> 
> > Device resources are a bit special, since their lifetime must be cap'd at device
> > unbind, *independent* of the object lifetime they reside in. Hence the Devres
> > container.
> 
> I'd argue many resources should be limited to device unbind. Memory is
> perhaps the only exception.

There is a difference between should and must. A driver is fully free to bind
the lifetime of a miscdevice to either to the driver lifetime (object returned
by probe) or the module lifetime, both can be valid. That's a question of
semantics.

A device resource though is only allowed to be held by a driver as long as the
corresponding device is bound to the driver. Hence an API that does not ensure
that the pci::Bar is actually, forcefully dropped on device unbind is unsound.

So, let me ask you again, how do you ensure that a pci::Bar is dropped on device
unbind if we hand it out without the Devres container?

> 
> > > My fear, that is intensifying as we go through this discussion, is
> > > that rust binding authors have not fully comprehended what the kernel
> > > life cycle model and common design pattern actually is, and have not
> > > fully thought through issues like module unload creating a lifetime
> > > cycle for *function pointers*.
> > 
> > I do *not* see where you take the evidence from to make such a generic
> > statement.
> 
> Well, I take the basic insistance that is OK to leak stuff from driver
> scope to module scope is not well designed.
> 

Who insists on leaking stuff from driver scope to module scope is OK?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 16:17                                         ` Jason Gunthorpe
@ 2025-02-27 16:55                                           ` Boqun Feng
  2025-02-27 17:32                                             ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Boqun Feng @ 2025-02-27 16:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng wrote:
> > On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:
> > > > And if you don't store the HrTimerHandle anywhere, like you drop() it
> > > > right after start a hrtimer, it will immediately stop the timer. Does
> > > > this make sense?
> > > 
> > > Oh, I understand that, but it is not sufficient in the kernel.
> > > 
> > > You are making an implicit argument that something external to the
> > > rust universe will hold the module alive until all rust destructors
> > > are run. That is trivialy obvious in your example above.
> > > 
> > 
> > The question in your previous email is about function pointer of hrtimer
> > EAF because of module unload, are you moving to a broader topic
> > here?
> 
> No
> 
> > If no, the for module unload, the argument is not implicit because in
> > rust/macro/module.rs the module __exit() function is generated by Rust,
> > and in that function, `assume_init_drop()` will call these
> > destructors.
> 
> That is not what I mean. You can be running code in multiple threads
> from multiple functions in the module those are all being protected
> implicitly by external C code functions. Rust itself is not managing
> module life time.
> 
> Then you are making the argument that everything created by a rust
> module somehow traces its reference back to the module itself,
> regardless of what thread, callback or memory was used to create it.
> 
> So all bindings for everything are expected to clean themselves up,
> recursively.
> 

Right, that would be the most cases in Rust if you want to control the 
cleanup orderings.

> That does make sense, but then it still raises questions that things
> like workqueue don't seem to have the cleanup.
> 

It was because the existing Workqueue was designed for built-in cases,
and we should fix that. Thank you for spotting that.

> I still wonder why you couldn't also have these reliable reference
> counts rooted on the device driver instead of only on the module.
> 

You could put reliable reference counts anywhere you want, as long as it
reflects the resource dependencies.

Regards,
Boqun

> Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 16:55                                           ` Boqun Feng
@ 2025-02-27 17:32                                             ` Danilo Krummrich
  2025-02-27 19:23                                               ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-27 17:32 UTC (permalink / raw)
  To: Boqun Feng, Jason Gunthorpe
  Cc: Joel Fernandes, Alexandre Courbot, Dave Airlie, Gary Guo,
	Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 08:55:09AM -0800, Boqun Feng wrote:
> On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:
> 
> > I still wonder why you couldn't also have these reliable reference
> > counts rooted on the device driver instead of only on the module.
> > 
> 
> You could put reliable reference counts anywhere you want, as long as it
> reflects the resource dependencies.

Right, as I explained in a different reply, the signature for PCI driver probe()
looks like this:

	fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>>

The returned Pin<KBox<Self>> has the lifetime of the driver being bound to the
device.

Which means a driver can bind things to this lifetime. But, it isn't forced to,
it can also put things into an Arc and share it with the rest of the world.

If something is crucial to be bound to the lifetime of a driver being bound to a
device (i.e. device resources), you have to expose it as Devres<T>.

Subsequently, you can put the Devres<T> in an Arc and do whatever you want, the
result will still be that T is dropped once the device is unbound.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 17:32                                             ` Danilo Krummrich
@ 2025-02-27 19:23                                               ` Jason Gunthorpe
  2025-02-27 21:25                                                 ` Boqun Feng
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 19:23 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Boqun Feng, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 06:32:15PM +0100, Danilo Krummrich wrote:
> On Thu, Feb 27, 2025 at 08:55:09AM -0800, Boqun Feng wrote:
> > On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:
> > 
> > > I still wonder why you couldn't also have these reliable reference
> > > counts rooted on the device driver instead of only on the module.
> > > 
> > 
> > You could put reliable reference counts anywhere you want, as long as it
> > reflects the resource dependencies.
> 
> Right, as I explained in a different reply, the signature for PCI driver probe()
> looks like this:
> 
> 	fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>>
> 
> The returned Pin<KBox<Self>> has the lifetime of the driver being bound to the
> device.
> 
> Which means a driver can bind things to this lifetime. But, it isn't forced to,
> it can also put things into an Arc and share it with the rest of the world.

This statement right here seems to be the fundamental problem.

The design pattern says that 'share it with the rest of the world' is
a bug. A driver following the pattern cannot do that, it must contain
the driver objects within the driver scope and free them. In C we
inspect for this manually, and check for it with kmemleak
progamatically.

It appears to me that the main issue here is that nobody has figured
out how to make rust have rules that can enforce that design pattern.

Have the compiler prevent the driver author from incorrectly extending
the lifetime of a driver-object beyond the driver's inherent scope, ie
that Self object above.

Instead we get this:

> If something is crucial to be bound to the lifetime of a driver being bound to a
> device (i.e. device resources), you have to expose it as Devres<T>.

Which creates a costly way to work around this missing design pattern
by adding runtime checks to every single access of T in all the
operational threads. Failable rcu_lock across every batch of register
access.

The reason the kernel has these design patterns of shutdown then
destroy is to avoid that runtime overhead! We optimize by swapping
fine grained locks for coarse locks that probably already exist. It is
a valid pattern, works well and has alot of APIs designed to support
it.

This giant thread started because people were objecting to the cost
and usability of the runtime checks on the operational paths.

So, I think you can say it can't be done, that the alternative is only
a little worse. Sad, but OK, but let's please acknowledge the
limitation.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 19:23                                               ` Jason Gunthorpe
@ 2025-02-27 21:25                                                 ` Boqun Feng
  2025-02-27 22:00                                                   ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Boqun Feng @ 2025-02-27 21:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 03:23:21PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 06:32:15PM +0100, Danilo Krummrich wrote:
> > On Thu, Feb 27, 2025 at 08:55:09AM -0800, Boqun Feng wrote:
> > > On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:
> > > 
> > > > I still wonder why you couldn't also have these reliable reference
> > > > counts rooted on the device driver instead of only on the module.
> > > > 
> > > 
> > > You could put reliable reference counts anywhere you want, as long as it
> > > reflects the resource dependencies.
> > 
> > Right, as I explained in a different reply, the signature for PCI driver probe()
> > looks like this:
> > 
> > 	fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>>
> > 
> > The returned Pin<KBox<Self>> has the lifetime of the driver being bound to the
> > device.
> > 
> > Which means a driver can bind things to this lifetime. But, it isn't forced to,
> > it can also put things into an Arc and share it with the rest of the world.
> 
> This statement right here seems to be the fundamental problem.
> 
> The design pattern says that 'share it with the rest of the world' is
> a bug. A driver following the pattern cannot do that, it must contain
> the driver objects within the driver scope and free them. In C we

I cannot speak for Danilo, but IIUC, the 'share it with the rest of the
world' things are the ones that drivers can share, for example, I
suppose (not a network expert) a NIC driver can share the packet object
with the upper layer of netowrk.

> inspect for this manually, and check for it with kmemleak

In Rust, it's better (of course, depending on your PoV ;-)), because
your driver or module data structures need to track the things they use
(otherwise they will be cancelled and maybe freed, e.g. the hrtimer
case). So you have that part covered by compiler. But could there be
corner cases? Probably. We will just resolve that case by case.

> progamatically.
> 
> It appears to me that the main issue here is that nobody has figured
> out how to make rust have rules that can enforce that design pattern.
> 

Most of the cases, it should be naturally achieved, because you already
bind the objects into your module or driver, otherwise they would be
already cancelled and freed. Handwavingly, it provides a
"data/type-oriented" resource management instead of "oh I have to
remember to call this function before module unload". Again, I believe
there are and will be corner cases, but happy to look into them.

> Have the compiler prevent the driver author from incorrectly extending
> the lifetime of a driver-object beyond the driver's inherent scope, ie
> that Self object above.
> 

Compilers can help in the cases where they know which objects are belong
to a driver/module.

So I think in Rust you can have the "design pattern", the difference is
instead of putting cancel/free functions carefully in some remove()
function, you will need to (still!) carefully arrange the fields in your
driver/module data structure, and you can have more fine grained control
by writting the drop() function for the driver/module data structure.

> Instead we get this:
> 
> > If something is crucial to be bound to the lifetime of a driver being bound to a
> > device (i.e. device resources), you have to expose it as Devres<T>.
> 

I feel I'm still missing some contexts why Devres<T> is related to the
"design pattern", so I will just skip this part for now... Hope we are
on the same page of the "design pattern" in Rust?

Regards,
Boqun

> Which creates a costly way to work around this missing design pattern
> by adding runtime checks to every single access of T in all the
> operational threads. Failable rcu_lock across every batch of register
> access.
> 
[...]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-25 14:11         ` Alexandre Courbot
  2025-02-25 15:06           ` Danilo Krummrich
@ 2025-02-27 21:37           ` Dave Airlie
  2025-02-28  1:49             ` Timur Tabi
  1 sibling, 1 reply; 104+ messages in thread
From: Dave Airlie @ 2025-02-27 21:37 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, Gary Guo, Joel Fernandes, Boqun Feng,
	John Hubbard, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, Nouveau

On Wed, 26 Feb 2025 at 00:11, Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> > CC: Gary
> >
> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> This inability to sleep while we are accessing registers seems very
> >> constraining to me, if not dangerous. It is pretty common to have
> >> functions intermingle hardware accesses with other operations that might
> >> sleep, and this constraint means that in such cases the caller would
> >> need to perform guard lifetime management manually:
> >>
> >>   let bar_guard = bar.try_access()?;
> >>   /* do something non-sleeping with bar_guard */
> >>   drop(bar_guard);
> >>
> >>   /* do something that might sleep */
> >>
> >>   let bar_guard = bar.try_access()?;
> >>   /* do something non-sleeping with bar_guard */
> >>   drop(bar_guard);
> >>
> >>   ...
> >>
> >> Failure to drop the guard potentially introduces a race condition, which
> >> will receive no compile-time warning and potentialy not even a runtime
> >> one unless lockdep is enabled. This problem does not exist with the
> >> equivalent C code AFAICT, which makes the Rust version actually more
> >> error-prone and dangerous, the opposite of what we are trying to achieve
> >> with Rust. Or am I missing something?
> >
> > Generally you are right, but you have to see it from a different perspective.
> >
> > What you describe is not an issue that comes from the design of the API, but is
> > a limitation of Rust in the kernel. People are aware of the issue and with klint
> > [1] there are solutions for that in the pipeline, see also [2] and [3].
> >
> > [1] https://rust-for-linux.com/klint
> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>
> Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
> its own admission. But even if the ignore the safety issue, the other one
> (ergonomics) is still there.
>
> Basically this way of accessing registers imposes quite a mental burden on its
> users. It requires a very different (and harsher) discipline than when writing
> the same code in C, and the correct granularity to use is unclear to me.
>
> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
> particular register in a busy loop, should I call try_access() once before the
> loop? Or every time before accessing the register? I'm afraid having to check
> that the resource is still alive before accessing any register is going to
> become tedious very quickly.
>
> I understand that we want to protect against accessing the IO region of an
> unplugged device ; but still there is no guarantee that the device won't be
> unplugged in the middle of a critical section, however short. Thus the driver
> code should be able to recognize that the device has fallen off the bus when it
> e.g. gets a bunch of 0xff instead of a valid value. So do we really need to
> extra protection that AFAICT isn't used in C?

Yes.

I've tried to retrofit checking 0xffffffff to drivers a lot, I'd
prefer not to. Drivers getting stuck in wait for clear bits for ever.

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27  1:34                                     ` John Hubbard
@ 2025-02-27 21:42                                       ` Dave Airlie
  2025-02-27 23:06                                         ` John Hubbard
  2025-02-28 10:52                                       ` Simona Vetter
  1 sibling, 1 reply; 104+ messages in thread
From: Dave Airlie @ 2025-02-27 21:42 UTC (permalink / raw)
  To: John Hubbard
  Cc: Greg KH, Jason Gunthorpe, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Gary Guo, Joel Fernandes, Boqun Feng,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	paulmck

On Thu, 27 Feb 2025 at 11:34, John Hubbard <jhubbard@nvidia.com> wrote:
>
> On Wed Feb 26, 2025 at 5:02 PM PST, Greg KH wrote:
> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> >> The way misc device works you can't unload the module until all the
> >> FDs are closed and the misc code directly handles races with opening
> >> new FDs while modules are unloading. It is quite a different scheme
> >> than discussed in this thread.
> >
> > And I would argue that is it the _right_ scheme to be following overall
> > here.  Removing modules with in-flight devices/drivers is to me is odd,
> > and only good for developers doing work, not for real systems, right?
>
> Right...I think. I'm not experienced with misc, but I do know that the
> "run driver code after driver release" is very, very concerning.
>
> I'm quite new to drivers/gpu/drm, so this is the first time I've learned
> about this DRM behavior...

> >
> > Yes, networking did add that functionality to allow modules to be
> > unloaded with network connections open, and I'm guessing RDMA followed
> > that, but really, why?
> >
> > What is the requirement that means that you have to do this for function
> > pointers?  I can understand the disconnect issue between devices and
> > drivers and open file handles (or sockets), as that is a normal thing,
> > but not removing code from the system, that is not normal.
> >
>
> I really hope that this "run after release" is something that Rust for
> Linux drivers, and in particular, the gpu/nova*, gpu/drm/nova* drivers,
> can *leave behind*.
>
> DRM may have had ${reasons} for this approach, but this nova effort is
> rebuilding from the ground up. So we should avoid just blindly following
> this aspect of the original DRM design.
>

nova is just a drm driver, it's not a rewrite of the drm subsystem,
that sort of effort would entail a much larger commitment.

DRM has reasons for doing what drm does, that is a separate discussion
of how a rust driver fits into the DRM. The rust code has to conform
to the C expectations for the subsystems they are fitting into.

The drm has spent years moving things to devm/drmm type constructs,
adding hotplug with the unplug mechanisms, but it's a long journey and
certainly not something nova would want to wait to reconstruct from
scratch.

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 21:25                                                 ` Boqun Feng
@ 2025-02-27 22:00                                                   ` Jason Gunthorpe
  2025-02-27 22:40                                                     ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 22:00 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Danilo Krummrich, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 01:25:10PM -0800, Boqun Feng wrote:
> > The design pattern says that 'share it with the rest of the world' is
> > a bug. A driver following the pattern cannot do that, it must contain
> > the driver objects within the driver scope and free them. In C we
> 
> I cannot speak for Danilo, but IIUC, the 'share it with the rest of the
> world' things are the ones that drivers can share, for example, I
> suppose (not a network expert) a NIC driver can share the packet object
> with the upper layer of netowrk.

I'm having a bit of trouble parsing this sentence..

In your example a NIC driver passing a packet into the network stack
would still be within the world of the driver.

Outside the world would be assigning the packet to a global
variable.

It is a very similar idea to what you explained about the module
lifetime, just instead of module __exit being the terminal point, it
is driver remove().

> > It appears to me that the main issue here is that nobody has figured
> > out how to make rust have rules that can enforce that design pattern.
> 
> Most of the cases, it should be naturally achieved, because you already
> bind the objects into your module or driver, otherwise they would be
> already cancelled and freed. 

I'm getting the feeling you can probably naturally achieve the
required destructors, but I think Danillo is concerned that since it
isn't *mandatory* it isn't safe/sound.

> So I think in Rust you can have the "design pattern", the difference is
> instead of putting cancel/free functions carefully in some remove()
> function, you will need to (still!) carefully arrange the fields in your
> driver/module data structure, and you can have more fine grained control
> by writting the drop() function for the driver/module data structure.

That all makes sense, but again, the challenge seems to be making that
mandatory so it is all safe.

If you can show people a sketch how you think that could work it would
probably help.

> I feel I'm still missing some contexts why Devres<T> is related to the
> "design pattern", so I will just skip this part for now... Hope we are
> on the same page of the "design pattern" in Rust?

There is a requirement that a certain C destructor function be
*guaranteed* to be called during remove().

Ie if I told you that the C functions *required* hrtimer_cancel() be
called in the device_driver remove for correctness, how could you
accomplish this? And avoid an UAF of the hrtimer from other threads?
And do it without adding new locks. And prevent the driver author from
escaping these requirements.

Then you have the design pattern I'm talking about.

Is it clearer?

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 22:00                                                   ` Jason Gunthorpe
@ 2025-02-27 22:40                                                     ` Danilo Krummrich
  2025-02-28 18:55                                                       ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-02-27 22:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boqun Feng, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 06:00:13PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 01:25:10PM -0800, Boqun Feng wrote:
> > 
> > Most of the cases, it should be naturally achieved, because you already
> > bind the objects into your module or driver, otherwise they would be
> > already cancelled and freed. 
> 
> I'm getting the feeling you can probably naturally achieve the
> required destructors, but I think Danillo is concerned that since it
> isn't *mandatory* it isn't safe/sound.

Of course you can "naturally" achieve the required destructors, I even explained
that in [1]. :-)

And yes, for *device resources* it is unsound if we do not ensure that the
device resource is actually dropped at device unbind.

Maybe some example code does help. Look at this example where we assume that
pci::Device::iomap_region() does return a pci::Bar instead of a
Devres<pci::Bar>, which it actually does. (The example won't compile, since,
for readability, it's heavily simplified.)

    fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> {
        let bar: pci::Bar = pdev.iomap_region()?;

        let drvdata = Arc::new(bar, GFP_KERNEL)?;

        let drm = drm::device::Device::new(pdev.as_ref(), drvdata)?;
        let reg = drm::drv::Registration::new(drm)?;

        // Everything in `Self` is dropped on remove(), hence the DRM driver is
        // unregistered on remove().
        Ok(KBox::new(Self(reg), GFP_KERNEL)?)
    }

This is already broken, because now the lifetime of the pci::Bar is bound to the
DRM device and the DRM device is allowed to outlive remove().

Subsequently, pci_iounmap() and pci_release_region() are not called in remove(),
but whenever the DRM device is dropped.

The fact that this is possible with safe code, makes this API unsound.

Now let's assume iomap_region() would return a Devres<pci::Device>. That fixes
the problem, because even if the DRM device keeps the Devres<pci::Device> object
alive, it is still dropped when the driver is unbound, and subsequently
pci_iounmap() and pci_release_region() are called when they're supposed to be
called.

Note that this would not be a problem if pci::Device would not be a device
resource. Other stuff may be perfectly fine to bind to the lifetime of the DRM
device.

[1] https://lore.kernel.org/rust-for-linux/Z8CX__mIlFUFEkIh@cassiopeiae/

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 21:42                                       ` Dave Airlie
@ 2025-02-27 23:06                                         ` John Hubbard
  2025-02-28  4:10                                           ` Dave Airlie
  0 siblings, 1 reply; 104+ messages in thread
From: John Hubbard @ 2025-02-27 23:06 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Greg KH, Jason Gunthorpe, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Gary Guo, Joel Fernandes, Boqun Feng,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	paulmck

On Thu Feb 27, 2025 at 1:42 PM PST, Dave Airlie wrote:
> On Thu, 27 Feb 2025 at 11:34, John Hubbard <jhubbard@nvidia.com> wrote:
>> On Wed Feb 26, 2025 at 5:02 PM PST, Greg KH wrote:
>> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
...
> nova is just a drm driver, it's not a rewrite of the drm subsystem,
> that sort of effort would entail a much larger commitment.

Maybe at this point in the discussion it would help to discern between
nova-core and nova-drm:

    drivers/gpu/nova-core/ (under discussion here)
    drivers/gpu/drm/nova/ (Future)

...keeping in mind that nova-core will be used by other, non-DRM things,
notably VFIO.

>
> DRM has reasons for doing what drm does, that is a separate discussion
> of how a rust driver fits into the DRM. The rust code has to conform
> to the C expectations for the subsystems they are fitting into.
>
> The drm has spent years moving things to devm/drmm type constructs,
> adding hotplug with the unplug mechanisms, but it's a long journey and
> certainly not something nova would want to wait to reconstruct from
> scratch.

ack.

thanks,
John Hubbard



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 21:37           ` Dave Airlie
@ 2025-02-28  1:49             ` Timur Tabi
  2025-02-28  2:24               ` Dave Airlie
  0 siblings, 1 reply; 104+ messages in thread
From: Timur Tabi @ 2025-02-28  1:49 UTC (permalink / raw)
  To: Alexandre Courbot, airlied@gmail.com
  Cc: nouveau-bounces@lists.freedesktop.org, John Hubbard,
	gary@garyguo.net, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	boqun.feng@gmail.com, dakr@kernel.org,
	nouveau@lists.freedesktop.org, joel@joelfernandes.org, Ben Skeggs

On Fri, 2025-02-28 at 07:37 +1000, Dave Airlie wrote:
> I've tried to retrofit checking 0xffffffff to drivers a lot, I'd
> prefer not to. Drivers getting stuck in wait for clear bits for ever.

That's what read_poll_timeout() is for.  I'm surprised Nouveau doesn't use it.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-28  1:49             ` Timur Tabi
@ 2025-02-28  2:24               ` Dave Airlie
  0 siblings, 0 replies; 104+ messages in thread
From: Dave Airlie @ 2025-02-28  2:24 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Alexandre Courbot, nouveau-bounces@lists.freedesktop.org,
	John Hubbard, gary@garyguo.net, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	boqun.feng@gmail.com, dakr@kernel.org,
	nouveau@lists.freedesktop.org, joel@joelfernandes.org, Ben Skeggs

On Fri, 28 Feb 2025 at 11:49, Timur Tabi <ttabi@nvidia.com> wrote:
>
> On Fri, 2025-02-28 at 07:37 +1000, Dave Airlie wrote:
> > I've tried to retrofit checking 0xffffffff to drivers a lot, I'd
> > prefer not to. Drivers getting stuck in wait for clear bits for ever.
>
> That's what read_poll_timeout() is for.  I'm surprised Nouveau doesn't use it.

That doesn't handle the PCIE returns 0xffffffff case at all, which is
the thing we most want to handle, it also uses the CPU timer whereas
nouveau's wait infrastructure uses the GPU timer usually (though that
could be changed).

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 23:06                                         ` John Hubbard
@ 2025-02-28  4:10                                           ` Dave Airlie
  2025-02-28 18:50                                             ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Dave Airlie @ 2025-02-28  4:10 UTC (permalink / raw)
  To: John Hubbard
  Cc: Greg KH, Jason Gunthorpe, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Gary Guo, Joel Fernandes, Boqun Feng,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	paulmck

On Fri, 28 Feb 2025 at 09:07, John Hubbard <jhubbard@nvidia.com> wrote:
>
> On Thu Feb 27, 2025 at 1:42 PM PST, Dave Airlie wrote:
> > On Thu, 27 Feb 2025 at 11:34, John Hubbard <jhubbard@nvidia.com> wrote:
> >> On Wed Feb 26, 2025 at 5:02 PM PST, Greg KH wrote:
> >> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> ...
> > nova is just a drm driver, it's not a rewrite of the drm subsystem,
> > that sort of effort would entail a much larger commitment.
>
> Maybe at this point in the discussion it would help to discern between
> nova-core and nova-drm:
>
>     drivers/gpu/nova-core/ (under discussion here)

nova-core won't be suffering any of the issues Jason is raising,
nova-core isn't going to have userspace facing interfaces or be part
of any subsystem with major lifetime expectations. It has to deal with
the hardware going away due to hot unplugs, and that is what this
devres is for.

nova-core will be a kernel internal pci driver, and vfio and nova-drm
will load on top of it, once those drivers are loaded and talking to
userspace they will keep references on the nova-core driver module
through normal means.

Dave.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27  1:34                                     ` John Hubbard
  2025-02-27 21:42                                       ` Dave Airlie
@ 2025-02-28 10:52                                       ` Simona Vetter
  2025-02-28 18:40                                         ` Jason Gunthorpe
  1 sibling, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-02-28 10:52 UTC (permalink / raw)
  To: John Hubbard
  Cc: Greg KH, Jason Gunthorpe, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Wed, Feb 26, 2025 at 05:34:01PM -0800, John Hubbard wrote:
> On Wed Feb 26, 2025 at 5:02 PM PST, Greg KH wrote:
> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> >> The way misc device works you can't unload the module until all the
> >> FDs are closed and the misc code directly handles races with opening
> >> new FDs while modules are unloading. It is quite a different scheme
> >> than discussed in this thread.
> >
> > And I would argue that is it the _right_ scheme to be following overall
> > here.  Removing modules with in-flight devices/drivers is to me is odd,
> > and only good for developers doing work, not for real systems, right?
> 
> Right...I think. I'm not experienced with misc, but I do know that the
> "run driver code after driver release" is very, very concerning.
> 
> I'm quite new to drivers/gpu/drm, so this is the first time I've learned
> about this DRM behavior...

It's really, really complicated, and not well documented. Probably should
fix that. The issue is that you have at least 4 different lifetimes
involved here, and people mix them up all the time and get confused. I
discuss 3 of those here:

https://lists.freedesktop.org/archives/intel-xe/2024-April/034195.html

The 4th is the lifetime of the module .text section, for which we need
try_module_get. Now the issue with that is that developers much prefer
convenience over correctness on this, and routinely complain _very_ loudly
about "unnecessary" module references. They're not, but to break the cycle
that Jason points out a bit earlier you need 2 steps:

- Nuke the driver binding manually through sysfs with the unbind files.
- Nuke all userspace that might beholding files and other resources open.
- At this point the module refcount should be zero and you can unload it.

Except developers really don't like the manual unbind step, and so we're
missing try_module_get() in a bunch of places where it really should be.

Now wrt why you can't just solve this all at the subsystem level and
guarantee that after drm_dev_unplug no code is running in driver callbacks
anymore:

In really, really simple subsystems like backlight this is doable. In drm
with arbitrary ioctl this isn't, and you get to make a choice:

- You wait until all driver code finishes, which could be never if there's
  ioctl that wait for render to complete and don't handle hotunplug
  correctly. This is a deadlock.

  In my experience this is theorically possible, practically no one gets
  this right and defacto means that actual hotunplug under load has a good
  chance of just hanging forever. Which is why drm doesn't do this.

- You make the revoceable critical sections as small as possible, which
  makes hotunplug much, much less likely do deadlock. But means that after
  revoking hw access you still have arbitrary driver code running in
  callbacks, and you need to deal with.

  This is why I like the rust Revocable so much, because it's a normal rcu
  section, so disallows all sleeping. You might still deadlock on a busy
  loop waiting for hw without having a timeout. But that's generally
  fairly easy to spot, and good drivers have macros/helpers for this so
  that there is always a timeout.

  drm_dev_unplug uses sleepable rcu for practicality reasons and so has a
  much, much higher chance of deadlocks. Note that strictly speaking
  drm_device should hold a module reference on the driver, but see above
  for why we don't have that - developers prefer convenience over
  correctness in this area.

> > Yes, networking did add that functionality to allow modules to be
> > unloaded with network connections open, and I'm guessing RDMA followed
> > that, but really, why?
> >
> > What is the requirement that means that you have to do this for function
> > pointers?  I can understand the disconnect issue between devices and
> > drivers and open file handles (or sockets), as that is a normal thing,
> > but not removing code from the system, that is not normal.
> >
> 
> I really hope that this "run after release" is something that Rust for
> Linux drivers, and in particular, the gpu/nova*, gpu/drm/nova* drivers,
> can *leave behind*.
> 
> DRM may have had ${reasons} for this approach, but this nova effort is
> rebuilding from the ground up. So we should avoid just blindly following
> this aspect of the original DRM design.

We can and should definitely try to make this much better. I think we can
get to full correctness wrt the first 3 lifetime things in rust. I'm not
sure whether handling module unload/.text lifetime is worth the bother,
it's probably only going to upset developers if we try. At least that's
been my experience.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-28 10:52                                       ` Simona Vetter
@ 2025-02-28 18:40                                         ` Jason Gunthorpe
  2025-03-04 16:10                                           ` Simona Vetter
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-28 18:40 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:

> - Nuke the driver binding manually through sysfs with the unbind files.
> - Nuke all userspace that might beholding files and other resources open.
> - At this point the module refcount should be zero and you can unload it.
> 
> Except developers really don't like the manual unbind step, and so we're
> missing try_module_get() in a bunch of places where it really should be.

IMHO they are not missing, we just have a general rule that if a
cleanup function, required to be called prior to module exit, revokes
any .text pointers then you don't need to hold the module refcount.

file_operations doesn't have such a cleanup function which is why it
takes the refcount.

hrtimer does have such a function which is why it doesn't take the
refcount.

> Now wrt why you can't just solve this all at the subsystem level and
> guarantee that after drm_dev_unplug no code is running in driver callbacks
> anymore:
> 
> In really, really simple subsystems like backlight this is doable. In drm
> with arbitrary ioctl this isn't, and you get to make a choice:

It is certainly doable, you list the right way to do it right below
and RDMA implements that successfully.

The subsytem owns all FDs and proxies all file_opertions to the driver
(after improving them :) and that is protected by a rwsem/SRCU that
is safe against the removal path setting all driver ops to NULL.

> - You wait until all driver code finishes, which could be never if there's
>   ioctl that wait for render to complete and don't handle hotunplug
>   correctly. This is a deadlock.

Meh. We waited for all FDs to close for along time. It isn't a
"deadlock" it is just a wait on userspace that extends to module
unload. Very undesirable yes, but not the end of the world, it can
resolve itself if userspace shutsdown.

But, IMHO, the subsystem and driver should shoot down the waits during
remove.

Your infinite waits are all interruptable right? :)

>   In my experience this is theorically possible, practically no one gets
>   this right and defacto means that actual hotunplug under load has a good
>   chance of just hanging forever. Which is why drm doesn't do this.

See, we didn't have this problem as we don't have infinite waits in
driver as part of the API. The API toward the driver is event driven..

I can understand that adding the shootdown logic all over the place
would be hard and you'd get it wrong.

But so is half removing the driver while it is doing *anything* and
trying to mitigate that with a different kind of hard to do locking
fix. *shrug*

>   This is why I like the rust Revocable so much, because it's a normal rcu
>   section, so disallows all sleeping. You might still deadlock on a busy
>   loop waiting for hw without having a timeout. But that's generally
>   fairly easy to spot, and good drivers have macros/helpers for this so
>   that there is always a timeout.

The Recovable version narrows the critical sections to very small
regions, but having critical sections at all is still, IMHO, hacky.

What you should ask Rust to solve for you is the infinite waits! That
is the root cause of your problem. Compiler enforces no waits with out
a revocation option on DRM callbacks!

Wouldn't that be much better??

>   drm_dev_unplug uses sleepable rcu for practicality reasons and so has a
>   much, much higher chance of deadlocks. Note that strictly speaking
>   drm_device should hold a module reference on the driver, but see above
>   for why we don't have that - developers prefer convenience over
>   correctness in this area.

Doesn't DRM have a module reference because the fops is in the driver
and the file core takes the driver module reference during
fops_get()/replace_fops() in drm_stub_open()? Or do I misunderstand
what that stub is for?

Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?

> We can and should definitely try to make this much better. I think we can
> get to full correctness wrt the first 3 lifetime things in rust. I'm not
> sure whether handling module unload/.text lifetime is worth the bother,
> it's probably only going to upset developers if we try. 

It hurts to read a suggestion we should ignore .text lifetime rules :(
DRM can be be like this, but please don't push that mess onto the rest
of the world in the common rust bindings or common rust design
patterns. Especially after places have invested alot to properly and
fully fix these problems without EAF bugs, infinite wait problems or
otherwise.

My suggestion is that new DRM rust drivers should have the file
operations isolation like RDMA does and a design goal to have
revocable sleeps. No EAF issue. You don't have to fix the whole DRM
subsystem to get here, just some fairly small work that only new rust
drivers would use. Start off on a good foot. <shrug>

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-28  4:10                                           ` Dave Airlie
@ 2025-02-28 18:50                                             ` Jason Gunthorpe
  0 siblings, 0 replies; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-28 18:50 UTC (permalink / raw)
  To: Dave Airlie
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Gary Guo, Joel Fernandes, Boqun Feng,
	Ben Skeggs, linux-kernel, rust-for-linux, nouveau, dri-devel,
	paulmck

On Fri, Feb 28, 2025 at 02:10:39PM +1000, Dave Airlie wrote:
> On Fri, 28 Feb 2025 at 09:07, John Hubbard <jhubbard@nvidia.com> wrote:
> >
> > On Thu Feb 27, 2025 at 1:42 PM PST, Dave Airlie wrote:
> > > On Thu, 27 Feb 2025 at 11:34, John Hubbard <jhubbard@nvidia.com> wrote:
> > >> On Wed Feb 26, 2025 at 5:02 PM PST, Greg KH wrote:
> > >> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> > ...
> > > nova is just a drm driver, it's not a rewrite of the drm subsystem,
> > > that sort of effort would entail a much larger commitment.
> >
> > Maybe at this point in the discussion it would help to discern between
> > nova-core and nova-drm:
> >
> >     drivers/gpu/nova-core/ (under discussion here)
> 
> nova-core won't be suffering any of the issues Jason is raising,
> nova-core isn't going to have userspace facing interfaces or be part
> of any subsystem with major lifetime expectations. It has to deal with
> the hardware going away due to hot unplugs, and that is what this
> devres is for.

It will suffer the general problem because it provides interfaces to
the nova DRM module and DRM will hold a reference on it's 'nova core
object' till the DRM file_operations release.

So you end up with nova core remove running and being unable to clean
things because of that DRM reference, even though the nova DRM driver
has completed remove.

You could wrap the nova core reference in a devres, but I think that
would be objectionable. :)

Though I expect no module EAF because you'd be direct linking..

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-27 22:40                                                     ` Danilo Krummrich
@ 2025-02-28 18:55                                                       ` Jason Gunthorpe
  2025-03-03 19:36                                                         ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-02-28 18:55 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Boqun Feng, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Thu, Feb 27, 2025 at 11:40:53PM +0100, Danilo Krummrich wrote:
> On Thu, Feb 27, 2025 at 06:00:13PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 27, 2025 at 01:25:10PM -0800, Boqun Feng wrote:
> > > 
> > > Most of the cases, it should be naturally achieved, because you already
> > > bind the objects into your module or driver, otherwise they would be
> > > already cancelled and freed. 
> > 
> > I'm getting the feeling you can probably naturally achieve the
> > required destructors, but I think Danillo is concerned that since it
> > isn't *mandatory* it isn't safe/sound.
> 
> Of course you can "naturally" achieve the required destructors, I even explained
> that in [1]. :-)
> 
> And yes, for *device resources* it is unsound if we do not ensure that the
> device resource is actually dropped at device unbind.

Why not do a runtime validation then?

It would be easy to have an atomic counting how many devres objects
are still alive.

Have remove() WARN_ON to the atomic and a dumb sleep loop until it is 0.

Properly written drives never hit it. Buggy drivers will throw a
warning and otherwise function safely.

I'm thinking the standard design pattern is a problem that is too
complex for static analysis to solve.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-28 18:55                                                       ` Jason Gunthorpe
@ 2025-03-03 19:36                                                         ` Danilo Krummrich
  2025-03-03 21:50                                                           ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-03-03 19:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boqun Feng, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Feb 28, 2025 at 02:55:34PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 11:40:53PM +0100, Danilo Krummrich wrote:
> > On Thu, Feb 27, 2025 at 06:00:13PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 27, 2025 at 01:25:10PM -0800, Boqun Feng wrote:
> > > > 
> > > > Most of the cases, it should be naturally achieved, because you already
> > > > bind the objects into your module or driver, otherwise they would be
> > > > already cancelled and freed. 
> > > 
> > > I'm getting the feeling you can probably naturally achieve the
> > > required destructors, but I think Danillo is concerned that since it
> > > isn't *mandatory* it isn't safe/sound.
> > 
> > Of course you can "naturally" achieve the required destructors, I even explained
> > that in [1]. :-)
> > 
> > And yes, for *device resources* it is unsound if we do not ensure that the
> > device resource is actually dropped at device unbind.
> 
> Why not do a runtime validation then?
> 
> It would be easy to have an atomic counting how many devres objects
> are still alive.

(1) It would not be easy at all, if not impossible.

A Devres object doesn't know whether it's embedded in an Arc<Devres>, nor does
it know whether it is embedded in subsequent Arc containers, e.g.
Arc<Arc<Devres>>.

It is impossible for a Devres object to have a global view on how many
references keep it alive.

> 
> Have remove() WARN_ON to the atomic and a dumb sleep loop until it is 0.
> 
> Properly written drives never hit it. Buggy drivers will throw a
> warning and otherwise function safely.

Ignoring (1), I think that's exactly the opposite of what we want to achieve.

This would mean that the Rust abstraction does *not avoid* but *only detect* the
problem.

The formal problem: The resulting API would be unsound by definition.

The practical problem: Buggy drivers could (as you propose) stall the
corresponding task forever, never releasing the device resource. Not releasing
the device resource may stall subsequent drivers trying to probe the device, or,
if the physical memory region has been reassigned to another device, prevent
another device from probing. This is *not* what I would call "function safely".

With the current API nothing of that kind is possible at all. And that is what
we want to achieve as good as possible: Make Rust driver APIs robust enough,
such that even buggy drivers can't mess up the whole kernel. Especially for a
monolithic kernel this seems quite desirable.

- Danilo

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-03 19:36                                                         ` Danilo Krummrich
@ 2025-03-03 21:50                                                           ` Jason Gunthorpe
  2025-03-04  9:57                                                             ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-03 21:50 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Boqun Feng, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Mon, Mar 03, 2025 at 08:36:34PM +0100, Danilo Krummrich wrote:
> > > And yes, for *device resources* it is unsound if we do not ensure that the
> > > device resource is actually dropped at device unbind.
> > 
> > Why not do a runtime validation then?
> > 
> > It would be easy to have an atomic counting how many devres objects
> > are still alive.
> 
> (1) It would not be easy at all, if not impossible.
> 
> A Devres object doesn't know whether it's embedded in an Arc<Devres>, nor does
> it know whether it is embedded in subsequent Arc containers, e.g.
> Arc<Arc<Devres>>.

You aren't tracking that. If Rust has a problem, implement it in C:

  devm_rsgc_start(struct device *)
  devm_rsgc_pci_iomap(struct device *,..)
  devm_rsgc_pci_iounmap(struct device *,..)
  devm_rsgc_fence(struct device *)

Surely that isn't a problem for bindings?

> > Properly written drives never hit it. Buggy drivers will throw a
> > warning and otherwise function safely.
> 
> Ignoring (1), I think that's exactly the opposite of what we want to achieve.
> 
> This would mean that the Rust abstraction does *not avoid* but *only detect* the
> problem.

It is preventing memory leaks. We would like to prevent memory leaks
in the kernel. Currently you can't even detect them, so this seems
like an improvement.

I am using the practical kernel definition for "memory leak" of memory
outliving remove() rather than the more philisophical rust version
that memory is someday, eventually, freed.

In an environment like the kernel the distinction is important and
desirable.

> The practical problem: Buggy drivers could (as you propose) stall the
> corresponding task forever, never releasing the device resource. 

Should't be never. Rust never leaks memory? It will eventually be
unstuck when Rust frees the memory in the concurrent context that is
holding it.

> Not releasing the device resource may stall subsequent drivers
> trying to probe the device, or, if the physical memory region has
> been reassigned to another device, prevent another device from
> probing. This is *not* what I would call "function safely".

It didn't create a security problem.

The driver could also write 

 while (true)
    sleep(1)

In the remove function and achieve all of the above bad things, should
you try to statically prevent that too? I've written stuff like the
above in drivers before, it is an easy way to make thing safe when
working with FDs, so don't think nobody would do that.

So I don't see alot of value in arguing you can close of 1 way to
achieve something bad and still leave open all sorts of others. At
least is more philisophical than I prefer to get.

It is better to focus on usability, performance, and bug class
detection, IMHO.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-03 21:50                                                           ` Jason Gunthorpe
@ 2025-03-04  9:57                                                             ` Danilo Krummrich
  0 siblings, 0 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-03-04  9:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boqun Feng, Joel Fernandes, Alexandre Courbot, Dave Airlie,
	Gary Guo, Joel Fernandes, John Hubbard, Ben Skeggs, linux-kernel,
	rust-for-linux, nouveau, dri-devel, paulmck

On Mon, Mar 03, 2025 at 05:50:02PM -0400, Jason Gunthorpe wrote:
> On Mon, Mar 03, 2025 at 08:36:34PM +0100, Danilo Krummrich wrote:
> > > > And yes, for *device resources* it is unsound if we do not ensure that the
> > > > device resource is actually dropped at device unbind.
> > > 
> > > Why not do a runtime validation then?
> > > 
> > > It would be easy to have an atomic counting how many devres objects
> > > are still alive.
> > 
> > (1) It would not be easy at all, if not impossible.
> > 
> > A Devres object doesn't know whether it's embedded in an Arc<Devres>, nor does
> > it know whether it is embedded in subsequent Arc containers, e.g.
> > Arc<Arc<Devres>>.
> 
> You aren't tracking that. If Rust has a problem, implement it in C:

Rust does not have a problem here. Arc is implemented as:

	pub struct Arc<T: ?Sized> {
	    ptr: NonNull<ArcInner<T>>,
	    _p: PhantomData<ArcInner<T>>,
	}

	#[repr(C)]
	struct ArcInner<T: ?Sized> {
	    refcount: Opaque<bindings::refcount_t>,
	    data: T,
	}

Which simply means that 'data' can't access its reference count.

You did identify this as a problem, since you think the correct solution is to
WARN() if an object is held alive across a certain boundary by previously taken
references. Where the actual solution (in Rust) would be to design things in a
way that this isn't possible in the first place.

Let me also point out that the current solution in Rust is perfectly aligned
with what you typically do in C conceptually.

In C you would call

	devm_request_region()
	devm_ioremap()

and you would store the resulting pointer in some driver private structure.

This structure may be reference counted and may outlive remove(). However, if
that's the case it doesn't prevent that the memory region is freed and the
memory mapping is unmapped by the corresponding devres callbacks.

We're following the exact same semantics in Rust, except that we don't leave the
driver private structure with an invalid pointer.

In contrast you keep proposing to turn that into some kind of busy loop in
remove(). If you think that's better, why don't you try to introduce this design
in C first?

> 
>   devm_rsgc_start(struct device *)
>   devm_rsgc_pci_iomap(struct device *,..)
>   devm_rsgc_pci_iounmap(struct device *,..)
>   devm_rsgc_fence(struct device *)
> 
> Surely that isn't a problem for bindings?

Wrapping those functions? Yes, that's not a problem in general.

But obviously I don't know what they're doing, i.e. how is devm_rsgc_pci_iomap()
different from pcim_iomap()?

I also don't know how you would encode them into an abstraction and how this
abstraction fits into the device / driver lifecycle.

Neither can I guess that from four function names, nor can I tell you whether
that results in a safe, sound and reasonable API.

And given that you don't seem to know about at least some Rust fundamentals
(i.e. how Arc works), I highly doubt that you can predict that either without
coding it up entirely, can you?

I explained the reasons for going with the current upstream design in detail,
and I'm happy to explain further if there are more questions.

But, if you just want to change the existing design, this isn't the place.
Instead, please follow the usual process, i.e. write up some patches, explain
how they improve the existing code and then we can discuss them.

> > The practical problem: Buggy drivers could (as you propose) stall the
> > corresponding task forever, never releasing the device resource. 
> 
> Should't be never. Rust never leaks memory? It will eventually be
> unstuck when Rust frees the memory in the concurrent context that is
> holding it.

We're passing Arc objects as ForeignOwnable to C code. If the C code leaks
memory, the corresponding memory allocated in Rust leaks too.

For the definition of "safe" in Rust we assume that C code used underneath is
never buggy, but the reality is different.

> 
> > Not releasing the device resource may stall subsequent drivers
> > trying to probe the device, or, if the physical memory region has
> > been reassigned to another device, prevent another device from
> > probing. This is *not* what I would call "function safely".
> 
> It didn't create a security problem.
> 
> The driver could also write 
> 
>  while (true)
>     sleep(1)

Seriously? Are you really arguing "My potentially infinite loop is not a problem
because a driver could also write the above"?

> In the remove function and achieve all of the above bad things, should
> you try to statically prevent that too?

Absolutely.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-02-28 18:40                                         ` Jason Gunthorpe
@ 2025-03-04 16:10                                           ` Simona Vetter
  2025-03-04 16:42                                             ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-04 16:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Feb 28, 2025 at 02:40:13PM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:
> 
> > - Nuke the driver binding manually through sysfs with the unbind files.
> > - Nuke all userspace that might beholding files and other resources open.
> > - At this point the module refcount should be zero and you can unload it.
> > 
> > Except developers really don't like the manual unbind step, and so we're
> > missing try_module_get() in a bunch of places where it really should be.
> 
> IMHO they are not missing, we just have a general rule that if a
> cleanup function, required to be called prior to module exit, revokes
> any .text pointers then you don't need to hold the module refcount.
> 
> file_operations doesn't have such a cleanup function which is why it
> takes the refcount.
> 
> hrtimer does have such a function which is why it doesn't take the
> refcount.

I was talking about a bunch of other places, where it works like
file_operations, except we don't bother with the module reference count.
I've seen patches fly by where people "fix" these things because module
unload is "broken".

> > Now wrt why you can't just solve this all at the subsystem level and
> > guarantee that after drm_dev_unplug no code is running in driver callbacks
> > anymore:
> > 
> > In really, really simple subsystems like backlight this is doable. In drm
> > with arbitrary ioctl this isn't, and you get to make a choice:
> 
> It is certainly doable, you list the right way to do it right below
> and RDMA implements that successfully.
> 
> The subsytem owns all FDs and proxies all file_opertions to the driver
> (after improving them :) and that is protected by a rwsem/SRCU that
> is safe against the removal path setting all driver ops to NULL.

I'm not saying that any of these approaches are bad. For some cases we
plan to use them in gpu code too even. The above is pretty much the plan
we have for dma_fence.

> > - You wait until all driver code finishes, which could be never if there's
> >   ioctl that wait for render to complete and don't handle hotunplug
> >   correctly. This is a deadlock.
> 
> Meh. We waited for all FDs to close for along time. It isn't a
> "deadlock" it is just a wait on userspace that extends to module
> unload. Very undesirable yes, but not the end of the world, it can
> resolve itself if userspace shutsdown.
> 
> But, IMHO, the subsystem and driver should shoot down the waits during
> remove.
> 
> Your infinite waits are all interruptable right? :)

So yeah userspace you can shoot down with SIGKILL, assuming really good
programming. But there's also all the in-kernel operations between various
work queues and other threads. This can be easily fixed by just rewriting
the entire thing into a strict message passing paradigm. Unfortunately
rust has to interop with the current existing mess.

gpu drivers can hog console_lock (yes we're trying to get away from that
as much as possible), at that point a cavalier attitude of "you can just
wait" isn't very appreciated.

And once you've made sure that really everything can bail out of you've
gotten pretty close to reimplementing revocable resources.

> >   In my experience this is theorically possible, practically no one gets
> >   this right and defacto means that actual hotunplug under load has a good
> >   chance of just hanging forever. Which is why drm doesn't do this.
> 
> See, we didn't have this problem as we don't have infinite waits in
> driver as part of the API. The API toward the driver is event driven..

Yeah rolling everything over to event passing and message queues would
sort this out a lot. It's kinda not where we are though.

> I can understand that adding the shootdown logic all over the place
> would be hard and you'd get it wrong.
> 
> But so is half removing the driver while it is doing *anything* and
> trying to mitigate that with a different kind of hard to do locking
> fix. *shrug*

The thing is that rust helps you enormously with implementing revocable
resources and making sure you're not cheating with all the bail-out paths.

It cannot help you with making sure you have interruptible/abortable
sleeps in all the right places. Yes this is a bit a disappointment, but
fundamentally rust cannot model negative contexts (unlike strictly
functional languages like haskell) where certain operations are not
allowed. But it is much, much better than C at "this could fail, you must
handle it and not screw up".

In some cases you can plug this gap with runtime validation, like fake
lockdep contexts behind the might_alloc_gfp() checks and similar tricks
we're using on the C side too. Given that I'm still struggling with
weeding out design deadlocks at normal operations. For example runtime pm
is an absolute disaster on this, and a lot of drivers fail real bad once
you add lockdep annotations for runtime pm. I'll probably retire before I
get to doing this for driver unload.

> >   This is why I like the rust Revocable so much, because it's a normal rcu
> >   section, so disallows all sleeping. You might still deadlock on a busy
> >   loop waiting for hw without having a timeout. But that's generally
> >   fairly easy to spot, and good drivers have macros/helpers for this so
> >   that there is always a timeout.
> 
> The Recovable version narrows the critical sections to very small
> regions, but having critical sections at all is still, IMHO, hacky.
> 
> What you should ask Rust to solve for you is the infinite waits! That
> is the root cause of your problem. Compiler enforces no waits with out
> a revocation option on DRM callbacks!
> 
> Wouldn't that be much better??

It would indeed be nice. I haven't seen that rust unicorn yet though, and
from my understanding it's just not something rust can give you. Rust
isn't magic, it's just a tool that can do a few fairly specific things a
lot better than C. But otherwise it's still the same mess.

> >   drm_dev_unplug uses sleepable rcu for practicality reasons and so has a
> >   much, much higher chance of deadlocks. Note that strictly speaking
> >   drm_device should hold a module reference on the driver, but see above
> >   for why we don't have that - developers prefer convenience over
> >   correctness in this area.
> 
> Doesn't DRM have a module reference because the fops is in the driver
> and the file core takes the driver module reference during
> fops_get()/replace_fops() in drm_stub_open()? Or do I misunderstand
> what that stub is for?
> 
> Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?

Yeah it's there, except only for the userspace references and not for the
kernel internal ones. Because developers get a bit prickle about adding
those unfortunately due to "it breaks module unload". Maybe we just should
add them, at least for rust.

> > We can and should definitely try to make this much better. I think we can
> > get to full correctness wrt the first 3 lifetime things in rust. I'm not
> > sure whether handling module unload/.text lifetime is worth the bother,
> > it's probably only going to upset developers if we try. 
> 
> It hurts to read a suggestion we should ignore .text lifetime rules :(
> DRM can be be like this, but please don't push that mess onto the rest
> of the world in the common rust bindings or common rust design
> patterns. Especially after places have invested alot to properly and
> fully fix these problems without EAF bugs, infinite wait problems or
> otherwise.
> 
> My suggestion is that new DRM rust drivers should have the file
> operations isolation like RDMA does and a design goal to have
> revocable sleeps. No EAF issue. You don't have to fix the whole DRM
> subsystem to get here, just some fairly small work that only new rust
> drivers would use. Start off on a good foot. <shrug>

You've missed the "it will upset developers part". I've seen people remove
module references that are needed, to "fix" driver unloading.

The other part is that rust isn't magic, the compiler cannot reasons
through every possible correct api. Which means that sometimes it forces a
failure path on you that you know cannot ever happen, but you cannot teach
the compiler how to prove that. You can side-step that by runtime death in
rust aka BUG_ON(). Which isn't popular really either.

The third part is that I'm not aware of anything in rust that would
guarantee that the function pointer and the module reference actually
belong to each another. Which means another runtime check most likely, and
hence another thing that shouldn't fail which kinda can now.

Hence my conclusion that maybe it's just not the top priority to get this
all perfect.

Cheers, Sima

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-04 16:10                                           ` Simona Vetter
@ 2025-03-04 16:42                                             ` Jason Gunthorpe
  2025-03-05  7:30                                               ` Simona Vetter
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-04 16:42 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Tue, Mar 04, 2025 at 05:10:45PM +0100, Simona Vetter wrote:
> On Fri, Feb 28, 2025 at 02:40:13PM -0400, Jason Gunthorpe wrote:
> > On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:
> > 
> > > - Nuke the driver binding manually through sysfs with the unbind files.
> > > - Nuke all userspace that might beholding files and other resources open.
> > > - At this point the module refcount should be zero and you can unload it.
> > > 
> > > Except developers really don't like the manual unbind step, and so we're
> > > missing try_module_get() in a bunch of places where it really should be.
> > 
> > IMHO they are not missing, we just have a general rule that if a
> > cleanup function, required to be called prior to module exit, revokes
> > any .text pointers then you don't need to hold the module refcount.
> > 
> > file_operations doesn't have such a cleanup function which is why it
> > takes the refcount.
> > 
> > hrtimer does have such a function which is why it doesn't take the
> > refcount.
> 
> I was talking about a bunch of other places, where it works like
> file_operations, except we don't bother with the module reference count.
> I've seen patches fly by where people "fix" these things because module
> unload is "broken".

Sure, but there are only two correct API approaches, either you
require the user to make a cancel call that sanitizes the module
references, or you manage them internally.

Hope and pray isn't an option :)

> gpu drivers can hog console_lock (yes we're trying to get away from that
> as much as possible), at that point a cavalier attitude of "you can just
> wait" isn't very appreciated.

What are you trying to solve here? If the system is already stuck
infinitely on the console lock why is module remove even being
considered?

module remove shouldn't be a remedy for a crashed driver...

> > But so is half removing the driver while it is doing *anything* and
> > trying to mitigate that with a different kind of hard to do locking
> > fix. *shrug*
> 
> The thing is that rust helps you enormously with implementing revocable
> resources and making sure you're not cheating with all the bail-out paths.

Assuming a half alive driver with MMIO and interrupts ripped away
doesn't lock up.

Assuming all your interrupt triggered sleeps have gained a shootdown
mechanism.

Assuming all the new extra error paths this creates don't corrupt the
internal state of the driver and cause it to lockup.

Meh. It doesn't seem like such an obvious win to me. Personally I'm
terrified of the idea of a zombie driver half sitting around in a
totally untestable configuration working properly..

> It cannot help you with making sure you have interruptible/abortable
> sleeps in all the right places. 

:(

> > Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?
> 
> Yeah it's there, except only for the userspace references and not for the
> kernel internal ones. Because developers get a bit prickle about adding
> those unfortunately due to "it breaks module unload". Maybe we just should
> add them, at least for rust.

Yeah, I think such obviously wrong things should be pushed back
against. We don't want EAF bugs in the kernel, we want security...

> You've missed the "it will upset developers part". I've seen people remove
> module references that are needed, to "fix" driver unloading.

When done properly the module can be unloaded. Most rdma driver
modules are unloadable, live, while FDs are open.

> The third part is that I'm not aware of anything in rust that would
> guarantee that the function pointer and the module reference actually
> belong to each another. Which means another runtime check most likely, and
> hence another thing that shouldn't fail which kinda can now.

I suspect it has to come from the C code API contracts, which leak
into the binding design.

If the C API handles module refcounting internally then rust is fine
so long as it enforces THIS_MODULE.

If the C API requires cancel then rust is fine so long as the binding
guarantees cancel before module unload.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-04 16:42                                             ` Jason Gunthorpe
@ 2025-03-05  7:30                                               ` Simona Vetter
  2025-03-05 15:10                                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-05  7:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Tue, Mar 04, 2025 at 12:42:01PM -0400, Jason Gunthorpe wrote:
> On Tue, Mar 04, 2025 at 05:10:45PM +0100, Simona Vetter wrote:
> > On Fri, Feb 28, 2025 at 02:40:13PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:
> > > 
> > > > - Nuke the driver binding manually through sysfs with the unbind files.
> > > > - Nuke all userspace that might beholding files and other resources open.
> > > > - At this point the module refcount should be zero and you can unload it.
> > > > 
> > > > Except developers really don't like the manual unbind step, and so we're
> > > > missing try_module_get() in a bunch of places where it really should be.
> > > 
> > > IMHO they are not missing, we just have a general rule that if a
> > > cleanup function, required to be called prior to module exit, revokes
> > > any .text pointers then you don't need to hold the module refcount.
> > > 
> > > file_operations doesn't have such a cleanup function which is why it
> > > takes the refcount.
> > > 
> > > hrtimer does have such a function which is why it doesn't take the
> > > refcount.
> > 
> > I was talking about a bunch of other places, where it works like
> > file_operations, except we don't bother with the module reference count.
> > I've seen patches fly by where people "fix" these things because module
> > unload is "broken".
> 
> Sure, but there are only two correct API approaches, either you
> require the user to make a cancel call that sanitizes the module
> references, or you manage them internally.
> 
> Hope and pray isn't an option :)
> 
> > gpu drivers can hog console_lock (yes we're trying to get away from that
> > as much as possible), at that point a cavalier attitude of "you can just
> > wait" isn't very appreciated.
> 
> What are you trying to solve here? If the system is already stuck
> infinitely on the console lock why is module remove even being
> considered?
> 
> module remove shouldn't be a remedy for a crashed driver...

I mean hotunplug here, and trying to make that correct.

This confusion is is why this is so hard, because there's really two main
users for all this:

- developers who want to quickly test new driver versions without full
  reboot. They're often preferring convenience over correctness, like with
  the removal of module refcounting that's strictly needed but means they
  first have to unbind drivers in sysfs before they can unload the driver.

  Another one is that this use-case prefers that the hw is cleanly shut
  down, so that you can actually load the new driver from a well-known
  state. And it's entirely ok if this all fails occasionally, it's just
  for development and testing.

- hotunplug as an actual use-case. Bugs are not ok. The hw can go away at
  any moment. And it might happen while you're holding console_lock. You
  generally do not remove the actual module here, which is why for the
  actual production use-case getting that part right isn't really
  required. But getting the lifetimes of all the various
  structs/objects/resources perfectly right is required.

So the "stuck on console_lock" is the 2nd case, not the first. Module
unload doesn't even come into play on that one.

> > > But so is half removing the driver while it is doing *anything* and
> > > trying to mitigate that with a different kind of hard to do locking
> > > fix. *shrug*
> > 
> > The thing is that rust helps you enormously with implementing revocable
> > resources and making sure you're not cheating with all the bail-out paths.
> 
> Assuming a half alive driver with MMIO and interrupts ripped away
> doesn't lock up.

Rust's drop takes care of that for you. It's not guaranteed, but it's a
case of "the minimal amount of typing yields correct code", unlike C,
where that just blows up for sure.

> Assuming all your interrupt triggered sleeps have gained a shootdown
> mechanism.

Hence why I want revocable to only be rcu, not srcu.

> Assuming all the new extra error paths this creates don't corrupt the
> internal state of the driver and cause it to lockup.

Yeah this one is a bit scary. Corrupting the state is doable, locking up
is much less likely I think, it seems to be more leaks that you get if
rust goes wrong.

> Meh. It doesn't seem like such an obvious win to me. Personally I'm
> terrified of the idea of a zombie driver half sitting around in a
> totally untestable configuration working properly..

Yeah agreed. I might really badly regret this all. But I'm not sold that
switching to message passing design is really going to be better, while
it's definitely going to be a huge amount of work.

> > It cannot help you with making sure you have interruptible/abortable
> > sleeps in all the right places. 
> 
> :(
> 
> > > Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?
> > 
> > Yeah it's there, except only for the userspace references and not for the
> > kernel internal ones. Because developers get a bit prickle about adding
> > those unfortunately due to "it breaks module unload". Maybe we just should
> > add them, at least for rust.
> 
> Yeah, I think such obviously wrong things should be pushed back
> against. We don't want EAF bugs in the kernel, we want security...

Maybe the two different use-cases above help explain why I'm a bit more
pragmatic here. As long as the hotunplug case does not gain bugs (or gets
some fixed) I'm fairly lax with hacks for the driver developer use-case of
reloading modules.

> > You've missed the "it will upset developers part". I've seen people remove
> > module references that are needed, to "fix" driver unloading.
> 
> When done properly the module can be unloaded. Most rdma driver
> modules are unloadable, live, while FDs are open.
> 
> > The third part is that I'm not aware of anything in rust that would
> > guarantee that the function pointer and the module reference actually
> > belong to each another. Which means another runtime check most likely, and
> > hence another thing that shouldn't fail which kinda can now.
> 
> I suspect it has to come from the C code API contracts, which leak
> into the binding design.
> 
> If the C API handles module refcounting internally then rust is fine
> so long as it enforces THIS_MODULE.

You could do contrived stuff and pass function pointers around, so that
THIS_MODULE doesn't actually match up with the function pointer. Sure it's
really stupid, but the idea with rust is that for memory safety stuff like
this, it's not just stupid, but impossible and the compiler will catch
you. So we need a tad more for rust.

> If the C API requires cancel then rust is fine so long as the binding
> guarantees cancel before module unload.

Yeah this is again where I think rust needs a bit more, because the
compiler can't always nicely proof this for you in all the "obvious"
cases.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-05  7:30                                               ` Simona Vetter
@ 2025-03-05 15:10                                                 ` Jason Gunthorpe
  2025-03-06 10:42                                                   ` Simona Vetter
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-05 15:10 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Wed, Mar 05, 2025 at 08:30:34AM +0100, Simona Vetter wrote:
> - developers who want to quickly test new driver versions without full
>   reboot. They're often preferring convenience over correctness, like with
>   the removal of module refcounting that's strictly needed but means they
>   first have to unbind drivers in sysfs before they can unload the driver.
> 
>   Another one is that this use-case prefers that the hw is cleanly shut
>   down, so that you can actually load the new driver from a well-known
>   state. And it's entirely ok if this all fails occasionally, it's just
>   for development and testing.

I've never catered to this because if you do this one:

> - hotunplug as an actual use-case. Bugs are not ok. The hw can go away at
>   any moment. And it might happen while you're holding console_lock. You
>   generally do not remove the actual module here, which is why for the
>   actual production use-case getting that part right isn't really
>   required. But getting the lifetimes of all the various
>   structs/objects/resources perfectly right is required.

Fully and properly then developers are happy too..

And we were always able to do this one..

> So the "stuck on console_lock" is the 2nd case, not the first. Module
> unload doesn't even come into play on that one.

I don't see reliable hot unplug if the driver can get stuck on a
lock :|

> > Assuming all your interrupt triggered sleeps have gained a shootdown
> > mechanism.
> 
> Hence why I want revocable to only be rcu, not srcu.

Sorry, I was not clear. You also have to make the PCI interrupt(s)
revocable. Just like the MMIO it cannot leak past the remove() as a
matter of driver-model correctness.

So, you end up disabling the interrupt while the driver is still
running and any sleeps in the driver that are waiting for an interrupt
still need to be shot down.

Further, I just remembered, (Danilo please notice!) there is another
related issue here that DMA mappings *may not* outlive remove()
either. netdev had a bug related to this recently and it was all
agreed that it is not allowed. The kernel can crash in a couple of
different ways if you try to do this.

https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c

 > a device with no driver bound should not be passed to the DMA API,
 > much less a dead device that's already been removed from its parent
 > bus.

So now we have a driver design that must have:
 1) Revocable MMIO
 2) Revocable Interrupts
 3) Revocable DMA mappings
 4) Revocable HW DMA - the HW MUST stop doing DMA before the DMA API
    is shut down. Failure is a correctness/UAF/security issue

Somehow the driver has to implement this, not get confused or lock up,
all while Rust doesn't help you guarentee much of any of the important
properties related to #2/#3/#4. And worse all this manual recvocable
stuff is special and unique to hot-unplug. So it will all be untested
and broken.

Looks really hard to me. *Especially* the wild DMA thing.

This has clearly been missed here as with the current suggestion to
just revoke MMIO means the driver can't actually go out and shutdown
it's HW DMA after-the-fact since the MMIO is gone. Thus you are pretty
much guaranteed to fail #4, by design, which is a serious issue.

I'm sorry it has taken so many emails to reach this, I did know it,
but didn't put the pieces coherently together till just now :\

Compare that to how RDMA works, where we do a DMA shutdown by
destroying all the objects just the same as if the user closed a
FD. The normal destruction paths fence the HW DMA and we end up in
remove with cleanly shutdown HW and no DMA API open. The core code
manages all of this. Simple, correct, no buggy hotplug only paths.

> Yeah agreed. I might really badly regret this all. But I'm not sold that
> switching to message passing design is really going to be better, while
> it's definitely going to be a huge amount of work.

Yeah, I'd think from where DRM is now continuing trying to address the
sleeps is more tractable and achievable than a message passing
redesign..

> > If the C API handles module refcounting internally then rust is fine
> > so long as it enforces THIS_MODULE.
> 
> You could do contrived stuff and pass function pointers around, so that
> THIS_MODULE doesn't actually match up with the function pointer.

Ah.. I guess rust would have to validate the function pointers and the
THIS_MODULE are consistent at runtime time before handing them off to
C to prevent this. Seems like a reasonable thing to put under some
CONFIG_DEBUG, also seems a bit hard to implement perhaps..

> > If the C API requires cancel then rust is fine so long as the binding
> > guarantees cancel before module unload.
> 
> Yeah this is again where I think rust needs a bit more, because the
> compiler can't always nicely proof this for you in all the "obvious"
> cases.

But in the discussion about the hrtimer it was asserted that Rust can :)

I believe it could be, so long as rust bindings are pretty restricted
and everything rolls up and cancels when things are destroyed. Nothing
should be able to leak out as a principle of the all the binding
designs.

Seems like a hard design to enforce across all bindings, eg workqeue
is already outside of it. Seems like something that should be written
down in a binding design document..

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-05 15:10                                                 ` Jason Gunthorpe
@ 2025-03-06 10:42                                                   ` Simona Vetter
  2025-03-06 15:32                                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-06 10:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Wed, Mar 05, 2025 at 11:10:12AM -0400, Jason Gunthorpe wrote:
> On Wed, Mar 05, 2025 at 08:30:34AM +0100, Simona Vetter wrote:
> > - developers who want to quickly test new driver versions without full
> >   reboot. They're often preferring convenience over correctness, like with
> >   the removal of module refcounting that's strictly needed but means they
> >   first have to unbind drivers in sysfs before they can unload the driver.
> > 
> >   Another one is that this use-case prefers that the hw is cleanly shut
> >   down, so that you can actually load the new driver from a well-known
> >   state. And it's entirely ok if this all fails occasionally, it's just
> >   for development and testing.
> 
> I've never catered to this because if you do this one:
> 
> > - hotunplug as an actual use-case. Bugs are not ok. The hw can go away at
> >   any moment. And it might happen while you're holding console_lock. You
> >   generally do not remove the actual module here, which is why for the
> >   actual production use-case getting that part right isn't really
> >   required. But getting the lifetimes of all the various
> >   structs/objects/resources perfectly right is required.
> 
> Fully and properly then developers are happy too..
> 
> And we were always able to do this one..
> 
> > So the "stuck on console_lock" is the 2nd case, not the first. Module
> > unload doesn't even come into play on that one.
> 
> I don't see reliable hot unplug if the driver can get stuck on a
> lock :|
> 
> > > Assuming all your interrupt triggered sleeps have gained a shootdown
> > > mechanism.
> > 
> > Hence why I want revocable to only be rcu, not srcu.
> 
> Sorry, I was not clear. You also have to make the PCI interrupt(s)
> revocable. Just like the MMIO it cannot leak past the remove() as a
> matter of driver-model correctness.
> 
> So, you end up disabling the interrupt while the driver is still
> running and any sleeps in the driver that are waiting for an interrupt
> still need to be shot down.
> 
> Further, I just remembered, (Danilo please notice!) there is another
> related issue here that DMA mappings *may not* outlive remove()
> either. netdev had a bug related to this recently and it was all
> agreed that it is not allowed. The kernel can crash in a couple of
> different ways if you try to do this.
> 
> https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c

Hm for the physical dma I thought disabling pci bust master should put a
stop to all this stuff?

For the sw lifecycle stuff I honestly didn't know that was an issue, I
guess that needs to be adressed in the dma-api wrappers or rust can blow
up in funny ways. C drivers just walk all mappings and shoot them.

>  > a device with no driver bound should not be passed to the DMA API,
>  > much less a dead device that's already been removed from its parent
>  > bus.
> 
> So now we have a driver design that must have:
>  1) Revocable MMIO
>  2) Revocable Interrupts
>  3) Revocable DMA mappings
>  4) Revocable HW DMA - the HW MUST stop doing DMA before the DMA API
>     is shut down. Failure is a correctness/UAF/security issue
> 
> Somehow the driver has to implement this, not get confused or lock up,
> all while Rust doesn't help you guarentee much of any of the important
> properties related to #2/#3/#4. And worse all this manual recvocable
> stuff is special and unique to hot-unplug. So it will all be untested
> and broken.

The trouble is that for real hotunplug, you need all this anyway. Because
when you physically hotunplug the interrupts will be dead, the mmio will
be gone any momem (not just at the beginnning of an rcu revocable
section), so real hotunplug is worse than what we're trying to do here.

Which means I think this actually helps you with testing, since it's much
easier to test stuff with pure software than physically yanking hardware.
You could perhaps fake that with mmiotrace-like infrastructure, but that's
not easy either.

So randomly interrupts not happening is something you need to cope with no
matter what.

> Looks really hard to me. *Especially* the wild DMA thing.
> 
> This has clearly been missed here as with the current suggestion to
> just revoke MMIO means the driver can't actually go out and shutdown
> it's HW DMA after-the-fact since the MMIO is gone. Thus you are pretty
> much guaranteed to fail #4, by design, which is a serious issue.
> 
> I'm sorry it has taken so many emails to reach this, I did know it,
> but didn't put the pieces coherently together till just now :\
> 
> Compare that to how RDMA works, where we do a DMA shutdown by
> destroying all the objects just the same as if the user closed a
> FD. The normal destruction paths fence the HW DMA and we end up in
> remove with cleanly shutdown HW and no DMA API open. The core code
> manages all of this. Simple, correct, no buggy hotplug only paths.

This is where it gets really annoying, because with a physical hotunplug
you don't need to worry about dma happening after ->remove, it already
stopped before ->remove even started.

But for a driver unbind you _do_ have to worry about cleanly shutting down
the hardware. For the above reasons and also in general putting hardware
into a well-known (all off usually) state is better for then reloading a
new driver version and binding that. Except that there's no way to tell
whether your ->remove got called for unbinding or hotunplug. And you could
get called for unbinding and then get hotunplugged in the middle to make
this even more messy. At least last time around I chatted with Greg about
this he really didn't like the idea of allowing drivers to know whether a
pci device was physically unplugged or not, and so for developer
convenience most pci drivers go with the "cleanly shut down everything"
approach, which is the wrong thing to do for actual hotunplug.

> > Yeah agreed. I might really badly regret this all. But I'm not sold that
> > switching to message passing design is really going to be better, while
> > it's definitely going to be a huge amount of work.
> 
> Yeah, I'd think from where DRM is now continuing trying to address the
> sleeps is more tractable and achievable than a message passing
> redesign..
> 
> > > If the C API handles module refcounting internally then rust is fine
> > > so long as it enforces THIS_MODULE.
> > 
> > You could do contrived stuff and pass function pointers around, so that
> > THIS_MODULE doesn't actually match up with the function pointer.
> 
> Ah.. I guess rust would have to validate the function pointers and the
> THIS_MODULE are consistent at runtime time before handing them off to
> C to prevent this. Seems like a reasonable thing to put under some
> CONFIG_DEBUG, also seems a bit hard to implement perhaps..

We should know the .text section of a module, so checking whether a
pointer is within that shouldn't be too hard.

> > > If the C API requires cancel then rust is fine so long as the binding
> > > guarantees cancel before module unload.
> > 
> > Yeah this is again where I think rust needs a bit more, because the
> > compiler can't always nicely proof this for you in all the "obvious"
> > cases.
> 
> But in the discussion about the hrtimer it was asserted that Rust can :)
> 
> I believe it could be, so long as rust bindings are pretty restricted
> and everything rolls up and cancels when things are destroyed. Nothing
> should be able to leak out as a principle of the all the binding
> designs.
> 
> Seems like a hard design to enforce across all bindings, eg workqeue
> is already outside of it. Seems like something that should be written
> down in a binding design document..

Yeah ...

I think a big issue is that very often all these things aren't even
documented on the C side, like the dma-api unmapping lifetime I wasn't
aware of at all.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-06 10:42                                                   ` Simona Vetter
@ 2025-03-06 15:32                                                     ` Jason Gunthorpe
  2025-03-07 10:28                                                       ` Simona Vetter
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-06 15:32 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Thu, Mar 06, 2025 at 11:42:38AM +0100, Simona Vetter wrote:
> > Further, I just remembered, (Danilo please notice!) there is another
> > related issue here that DMA mappings *may not* outlive remove()
> > either. netdev had a bug related to this recently and it was all
> > agreed that it is not allowed. The kernel can crash in a couple of
> > different ways if you try to do this.
> > 
> > https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
> 
> Hm for the physical dma I thought disabling pci bust master should put a
> stop to all this stuff?

Not in the general case. Many device classes (eg platform) don't have
something like "PCI bus master". It is also not always possible to
reset a device, even in PCI.

So the way things work today for module reload relies on the driver
duing a full quiet down so that the next driver to attach can safely
start up the device. Otherwise the next driver flips PCI bus master
back on and immediately UAFs memory through rouge DMA.

Relying on PCI Bus master also exposes a weakness we battled with in
kexec. When the new driver boots up it has to gain control of the
device and stop the DMA before flipping "PCI Bus Master" off. Almost
no drivers actually do this, and some HW can't even achieve it without
PCI reset (which is not always available). Meaning you end up with a
likely UAF flow if you rely on this technique.

> For the sw lifecycle stuff I honestly didn't know that was an issue, I
> guess that needs to be adressed in the dma-api wrappers or rust can blow
> up in funny ways. C drivers just walk all mappings and shoot them.

I wonder what people will come up with. DMA API is performance path,
people are not going to accept pointless overheads there.

IMHO whatever path the DMA API takes the MMIO design should follow
it.

> The trouble is that for real hotunplug, you need all this anyway. Because
> when you physically hotunplug the interrupts will be dead, the mmio will
> be gone any momem (not just at the beginnning of an rcu revocable
> section), so real hotunplug is worse than what we're trying to do here.

There are two kinds of real hotunplug, the friendly kind that we see
in physical PCI where you actually plonk a button on the case and wait
for the light to go off. Ie it is interactive and safe with the
OS. Very similar to module reloading.

And the hostile kind, like in thunderbolt, where it just goes away and
dies.

In the server world, other than nvme, we seem to focus on the friendly
kind.

> So randomly interrupts not happening is something you need to cope with no
> matter what.

Yes
 
> But for a driver unbind you _do_ have to worry about cleanly shutting down
> the hardware. For the above reasons and also in general putting hardware
> into a well-known (all off usually) state is better for then reloading a
> new driver version and binding that. Except that there's no way to tell
> whether your ->remove got called for unbinding or hotunplug.

IMHO it doesn't really matter, the driver has to support the most
difficult scenario anyhow. The only practical difference is that the
MMIO might return -1 to all reads and the interrupts are dead. If you
want to detect a gone PCI device then just do a register read and
check for -1, which some drivers like mlx5 are doing as part of their
resiliency strategy.

> pci device was physically unplugged or not, and so for developer
> convenience most pci drivers go with the "cleanly shut down everything"
> approach, which is the wrong thing to do for actual hotunplug.

I wouldn't say it is wrong. It is still the correct thing to do, and
following down the normal cleanup paths is a good way to ensure the
special case doesn't have bugs. The primary difference is you want to
understand the device is dead and stop waiting on it faster. Drivers
need to consider these things anyhow if they want resiliency against
device crashes, PCI link wobbles and so on that don't involve
remove().

Regardless, I think the point is clear that the driver author bears
alot of responsibility to sequence this stuff correctly as part of
their remove() implementation. The idea that Rust can magically make
all this safe against UAF or lockups seems incorrect.

> > Ah.. I guess rust would have to validate the function pointers and the
> > THIS_MODULE are consistent at runtime time before handing them off to
> > C to prevent this. Seems like a reasonable thing to put under some
> > CONFIG_DEBUG, also seems a bit hard to implement perhaps..
> 
> We should know the .text section of a module, so checking whether a
> pointer is within that shouldn't be too hard.

It is legal to pass a pointer to a function in a module that this
module is linked to as well. We do that sometimes.. Eg a fops having a
simple_xx pointer. So you'd need to do some graph analysis.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-06 15:32                                                     ` Jason Gunthorpe
@ 2025-03-07 10:28                                                       ` Simona Vetter
  2025-03-07 12:32                                                         ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-07 10:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Thu, Mar 06, 2025 at 11:32:36AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 06, 2025 at 11:42:38AM +0100, Simona Vetter wrote:
> > > Further, I just remembered, (Danilo please notice!) there is another
> > > related issue here that DMA mappings *may not* outlive remove()
> > > either. netdev had a bug related to this recently and it was all
> > > agreed that it is not allowed. The kernel can crash in a couple of
> > > different ways if you try to do this.
> > > 
> > > https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c
> > 
> > Hm for the physical dma I thought disabling pci bust master should put a
> > stop to all this stuff?
> 
> Not in the general case. Many device classes (eg platform) don't have
> something like "PCI bus master". It is also not always possible to
> reset a device, even in PCI.
> 
> So the way things work today for module reload relies on the driver
> duing a full quiet down so that the next driver to attach can safely
> start up the device. Otherwise the next driver flips PCI bus master
> back on and immediately UAFs memory through rouge DMA.
> 
> Relying on PCI Bus master also exposes a weakness we battled with in
> kexec. When the new driver boots up it has to gain control of the
> device and stop the DMA before flipping "PCI Bus Master" off. Almost
> no drivers actually do this, and some HW can't even achieve it without
> PCI reset (which is not always available). Meaning you end up with a
> likely UAF flow if you rely on this technique.

Yeah this gets really hairy really fast. We might need some pragmatism
here of not being able to be better than C.

And the entire "load driver after previously the linux driver messed with
it already" is a very broad issue, from rebinding to module reload to
kexec. With some hw it's just not possible to do safely, and with a lot
more hw not reliably due to complexity. E.g. drm/i915/display can take
over the gpu if outputs are enabled and fully recover hw state into sw
state. But defacto that only works for configurations the fw/bootloader
leaves behind, and not in full generality. Plus we don't handle
misprogrammed hw at all.

> > For the sw lifecycle stuff I honestly didn't know that was an issue, I
> > guess that needs to be adressed in the dma-api wrappers or rust can blow
> > up in funny ways. C drivers just walk all mappings and shoot them.
> 
> I wonder what people will come up with. DMA API is performance path,
> people are not going to accept pointless overheads there.
> 
> IMHO whatever path the DMA API takes the MMIO design should follow
> it.

I think this needs to be subsystem specific, since very often there's
already data structures to track all mappings, and so easy to add a bit of
glue to nuke them all forcefully. Or at least data structures to track all
pending requests, and so again we can enforce that we stall for them all
to finish.

We'll probably end up with rust bindings being a lot more opinionated
about how a driver should work, which has the risk of going too far into
the midlayer mistake antipattern. I guess we'll see how that all pans out.

> > The trouble is that for real hotunplug, you need all this anyway. Because
> > when you physically hotunplug the interrupts will be dead, the mmio will
> > be gone any momem (not just at the beginnning of an rcu revocable
> > section), so real hotunplug is worse than what we're trying to do here.
> 
> There are two kinds of real hotunplug, the friendly kind that we see
> in physical PCI where you actually plonk a button on the case and wait
> for the light to go off. Ie it is interactive and safe with the
> OS. Very similar to module reloading.
> 
> And the hostile kind, like in thunderbolt, where it just goes away and
> dies.
> 
> In the server world, other than nvme, we seem to focus on the friendly
> kind.

Yeah gpus tend to hang out in external enclosures sometimes, so I'm not
sure we can ignore the hostile kind.

> > So randomly interrupts not happening is something you need to cope with no
> > matter what.
> 
> Yes
>  
> > But for a driver unbind you _do_ have to worry about cleanly shutting down
> > the hardware. For the above reasons and also in general putting hardware
> > into a well-known (all off usually) state is better for then reloading a
> > new driver version and binding that. Except that there's no way to tell
> > whether your ->remove got called for unbinding or hotunplug.
> 
> IMHO it doesn't really matter, the driver has to support the most
> difficult scenario anyhow. The only practical difference is that the
> MMIO might return -1 to all reads and the interrupts are dead. If you
> want to detect a gone PCI device then just do a register read and
> check for -1, which some drivers like mlx5 are doing as part of their
> resiliency strategy.
> 
> > pci device was physically unplugged or not, and so for developer
> > convenience most pci drivers go with the "cleanly shut down everything"
> > approach, which is the wrong thing to do for actual hotunplug.
> 
> I wouldn't say it is wrong. It is still the correct thing to do, and
> following down the normal cleanup paths is a good way to ensure the
> special case doesn't have bugs. The primary difference is you want to
> understand the device is dead and stop waiting on it faster. Drivers
> need to consider these things anyhow if they want resiliency against
> device crashes, PCI link wobbles and so on that don't involve
> remove().

Might need to revisit that discussion, but Greg didn't like when we asked
for a pci helper to check whether the device is physically gone (at least
per the driver model). Hacking that in drivers is doable, but feels icky.

> Regardless, I think the point is clear that the driver author bears
> alot of responsibility to sequence this stuff correctly as part of
> their remove() implementation. The idea that Rust can magically make
> all this safe against UAF or lockups seems incorrect.

Agreed, it's not pure magic. I do think it can help a lot though, or at
least I'm hoping.

> > > Ah.. I guess rust would have to validate the function pointers and the
> > > THIS_MODULE are consistent at runtime time before handing them off to
> > > C to prevent this. Seems like a reasonable thing to put under some
> > > CONFIG_DEBUG, also seems a bit hard to implement perhaps..
> > 
> > We should know the .text section of a module, so checking whether a
> > pointer is within that shouldn't be too hard.
> 
> It is legal to pass a pointer to a function in a module that this
> module is linked to as well. We do that sometimes.. Eg a fops having a
> simple_xx pointer. So you'd need to do some graph analysis.

Hm right, indirect deps are fine too ...

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 10:28                                                       ` Simona Vetter
@ 2025-03-07 12:32                                                         ` Jason Gunthorpe
  2025-03-07 13:09                                                           ` Simona Vetter
  2025-03-07 14:00                                                           ` Greg KH
  0 siblings, 2 replies; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 12:32 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Mar 07, 2025 at 11:28:37AM +0100, Simona Vetter wrote:

> > I wouldn't say it is wrong. It is still the correct thing to do, and
> > following down the normal cleanup paths is a good way to ensure the
> > special case doesn't have bugs. The primary difference is you want to
> > understand the device is dead and stop waiting on it faster. Drivers
> > need to consider these things anyhow if they want resiliency against
> > device crashes, PCI link wobbles and so on that don't involve
> > remove().
> 
> Might need to revisit that discussion, but Greg didn't like when we asked
> for a pci helper to check whether the device is physically gone (at least
> per the driver model). Hacking that in drivers is doable, but feels
> icky.

I think Greg is right here, the driver model has less knowledge than
the driver if the device is alive.

The resiliency/fast-failure issue is not just isolated to having
observed a proper hot-unplug, but there are many classes of failure
that cause the device HW to malfunction that a robust driver can
detect and recover from. mlx5 attempts to do this for instance.

It turns out when you deploy clusters with 800,000 NICs in them there
are weird HW fails constantly and you have to be resilient on the SW
side and try to recover from them when possible.

So I'd say checking for a -1 read return on PCI is a sufficient
technique for the driver to use to understand if it's device is still
present. mlx5 devices further have an interactive register operation
"health check" that proves the device and it's PCI path is alive.

Failing health checks trigger recovery, which shoot downs sleeps,
cleanly destroys stuff, resets the device, and starts running
again. IIRC this is actually done with a rdma hot unplug/plug sequence
autonomously executed inside the driver.

A driver can do a health check immediately in remove() and make a
decision if the device is alive or not to speed up removal in the
hostile hot unplug case.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 12:32                                                         ` Jason Gunthorpe
@ 2025-03-07 13:09                                                           ` Simona Vetter
  2025-03-07 14:55                                                             ` Jason Gunthorpe
  2025-03-07 14:00                                                           ` Greg KH
  1 sibling, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-07 13:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Mar 07, 2025 at 08:32:55AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 11:28:37AM +0100, Simona Vetter wrote:
> 
> > > I wouldn't say it is wrong. It is still the correct thing to do, and
> > > following down the normal cleanup paths is a good way to ensure the
> > > special case doesn't have bugs. The primary difference is you want to
> > > understand the device is dead and stop waiting on it faster. Drivers
> > > need to consider these things anyhow if they want resiliency against
> > > device crashes, PCI link wobbles and so on that don't involve
> > > remove().
> > 
> > Might need to revisit that discussion, but Greg didn't like when we asked
> > for a pci helper to check whether the device is physically gone (at least
> > per the driver model). Hacking that in drivers is doable, but feels
> > icky.
> 
> I think Greg is right here, the driver model has less knowledge than
> the driver if the device is alive.

Maybe I misremember, but iirc he was fairly fundamentally opposed to
trying to guess whether the hw is gone or not in the ->remove callback.
But maybe that's more from the usb world, where all the hotremove race
conditions are handled in the subsystem and you only have to deal with
errno from calling into usb functions and unwind. So much, much easier
situation.

> The resiliency/fast-failure issue is not just isolated to having
> observed a proper hot-unplug, but there are many classes of failure
> that cause the device HW to malfunction that a robust driver can
> detect and recover from. mlx5 attempts to do this for instance.
> 
> It turns out when you deploy clusters with 800,000 NICs in them there
> are weird HW fails constantly and you have to be resilient on the SW
> side and try to recover from them when possible.
> 
> So I'd say checking for a -1 read return on PCI is a sufficient
> technique for the driver to use to understand if it's device is still
> present. mlx5 devices further have an interactive register operation
> "health check" that proves the device and it's PCI path is alive.
> 
> Failing health checks trigger recovery, which shoot downs sleeps,
> cleanly destroys stuff, resets the device, and starts running
> again. IIRC this is actually done with a rdma hot unplug/plug sequence
> autonomously executed inside the driver.
> 
> A driver can do a health check immediately in remove() and make a
> decision if the device is alive or not to speed up removal in the
> hostile hot unplug case.

Hm ... I guess when you get an all -1 read you check with a specific
register to make sure it's not a false positive? Since for some registers
that's a valid value.

But yeah maybe this approach is more solid. The current C approach we have
with an srcu revoceable section is definitely a least worse attempt from a
very, very bad starting point.

I think maybe we should also have two levels here:

- Ideal driver design, probably what you've outlined above. This will need
  some hw/driver specific thought to get the optimal design most likely.
  This part is probably more bus and subsystem specific best practices
  documentation than things we enforce with the rust abstractions.

- The "at least we don't blow up with memory safety issues" bare minimum
  that the rust abstractions should guarantee. So revocable and friends.

And I think the latter safety fallback does not prevent you from doing the
full fancy design, e.g. for revocable resources that only happens after
your explicitly-coded ->remove() callback has finished. Which means you
still have full access to the hw like anywhere else.

Does this sounds like a possible conclusion of this thread, or do we need
to keep digging?

Also now that I look at this problem as a two-level issue, I think drm is
actually a lot better than what I explained. If you clean up driver state
properly in ->remove (or as stack automatic cleanup functions that run
before all the mmio/irq/whatever stuff disappears), then we are largely
there already with being able to fully quiescent driver state enough to
make sure no new requests can sneak in. As an example
drm_atomic_helper_shutdown does a full kernel modesetting commit across
all resources, which guarantees that all preceeding in-flight commits have
finished (or timed out, we should probably be a bit smarter on this so the
timeouts are shorter when the hw is gone for good). And if you do that
after drm_dev_unplug then nothing new should have been able to sneak in I
think, at least conceptually. In practice we might have a bunch of funny
races that are worth plugging I guess.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 12:32                                                         ` Jason Gunthorpe
  2025-03-07 13:09                                                           ` Simona Vetter
@ 2025-03-07 14:00                                                           ` Greg KH
  2025-03-07 14:46                                                             ` Jason Gunthorpe
  1 sibling, 1 reply; 104+ messages in thread
From: Greg KH @ 2025-03-07 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Danilo Krummrich, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 07, 2025 at 08:32:55AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 11:28:37AM +0100, Simona Vetter wrote:
> 
> > > I wouldn't say it is wrong. It is still the correct thing to do, and
> > > following down the normal cleanup paths is a good way to ensure the
> > > special case doesn't have bugs. The primary difference is you want to
> > > understand the device is dead and stop waiting on it faster. Drivers
> > > need to consider these things anyhow if they want resiliency against
> > > device crashes, PCI link wobbles and so on that don't involve
> > > remove().
> > 
> > Might need to revisit that discussion, but Greg didn't like when we asked
> > for a pci helper to check whether the device is physically gone (at least
> > per the driver model). Hacking that in drivers is doable, but feels
> > icky.
> 
> I think Greg is right here, the driver model has less knowledge than
> the driver if the device is alive.

That's not why I don't want this.  Think about this sequence:
	if (!device_is_gone(dev)) {
		// do something
	}
right after you check it, the value can change.  So all you really can
check for is:
	if (device_is_gone(dev)) {
		// clean up
	}
which is going to be racy as well, because you should already be
handling this if you care about it because the device could be gone but
not yet told the driver core / bus yet.

So this type of check can't really work, which is why I don't want
people to even consider it.

> The resiliency/fast-failure issue is not just isolated to having
> observed a proper hot-unplug, but there are many classes of failure
> that cause the device HW to malfunction that a robust driver can
> detect and recover from. mlx5 attempts to do this for instance.
> 
> It turns out when you deploy clusters with 800,000 NICs in them there
> are weird HW fails constantly and you have to be resilient on the SW
> side and try to recover from them when possible.
> 
> So I'd say checking for a -1 read return on PCI is a sufficient
> technique for the driver to use to understand if it's device is still
> present. mlx5 devices further have an interactive register operation
> "health check" that proves the device and it's PCI path is alive.

The -1 read is what PCI says will happen if the device is gone, so all
drivers have to do this if they care about it.  USB does something
different, as does all other busses.  So this is a very driver/bus
specific thing as you say.

> Failing health checks trigger recovery, which shoot downs sleeps,
> cleanly destroys stuff, resets the device, and starts running
> again. IIRC this is actually done with a rdma hot unplug/plug sequence
> autonomously executed inside the driver.
> 
> A driver can do a health check immediately in remove() and make a
> decision if the device is alive or not to speed up removal in the
> hostile hot unplug case.

Agreed.

But really, all these gyrations just to attempt to make it easier for
driver developers, the smallest number of people who will ever interact
with the device in the world, just to prevent rebooting, seems not
really all that important.

Handle the real cases, like you are are saying here, and then all should
be ok.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 14:00                                                           ` Greg KH
@ 2025-03-07 14:46                                                             ` Jason Gunthorpe
  2025-03-07 15:19                                                               ` Greg KH
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 14:46 UTC (permalink / raw)
  To: Greg KH
  Cc: John Hubbard, Danilo Krummrich, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 07, 2025 at 03:00:09PM +0100, Greg KH wrote:
> On Fri, Mar 07, 2025 at 08:32:55AM -0400, Jason Gunthorpe wrote:
> > On Fri, Mar 07, 2025 at 11:28:37AM +0100, Simona Vetter wrote:
> > 
> > > > I wouldn't say it is wrong. It is still the correct thing to do, and
> > > > following down the normal cleanup paths is a good way to ensure the
> > > > special case doesn't have bugs. The primary difference is you want to
> > > > understand the device is dead and stop waiting on it faster. Drivers
> > > > need to consider these things anyhow if they want resiliency against
> > > > device crashes, PCI link wobbles and so on that don't involve
> > > > remove().
> > > 
> > > Might need to revisit that discussion, but Greg didn't like when we asked
> > > for a pci helper to check whether the device is physically gone (at least
> > > per the driver model). Hacking that in drivers is doable, but feels
> > > icky.
> > 
> > I think Greg is right here, the driver model has less knowledge than
> > the driver if the device is alive.
> 
> That's not why I don't want this.  Think about this sequence:
> 	if (!device_is_gone(dev)) {
> 		// do something
> 	}
> right after you check it, the value can change. 

Oh, I imagined this would latch off. For instance if you hotunplug a
PCI struct device then that struct device will be destroyed
eventually. If in the meantime a PCI device is re-discovered at the
same BDF it would have to wait until the prior one is sufficiently
destroyed before creating a new struct device and getting plugged in.

> Handle the real cases, like you are are saying here, and then all should
> be ok.

Yes, if you handle physical device unplug, PCI device unplug, and PCI
device failure recovery then you cover all the actual production use
cases. That is already so comprehesive and hard that driver writers
will be overjoyed with the result anyhow :)

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 13:09                                                           ` Simona Vetter
@ 2025-03-07 14:55                                                             ` Jason Gunthorpe
  2025-03-13 14:32                                                               ` Simona Vetter
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 14:55 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Mar 07, 2025 at 02:09:12PM +0100, Simona Vetter wrote:

> > A driver can do a health check immediately in remove() and make a
> > decision if the device is alive or not to speed up removal in the
> > hostile hot unplug case.
> 
> Hm ... I guess when you get an all -1 read you check with a specific
> register to make sure it's not a false positive? Since for some registers
> that's a valid value.

Yes. mlx5 has HW designed to support this, but I imagine on most
devices you could find an ID register or something that won't be -1.

> - The "at least we don't blow up with memory safety issues" bare minimum
>   that the rust abstractions should guarantee. So revocable and friends.

I still really dislike recovable because it imposes a cost that is
unnecessary.

> And I think the latter safety fallback does not prevent you from doing the
> full fancy design, e.g. for revocable resources that only happens after
> your explicitly-coded ->remove() callback has finished. Which means you
> still have full access to the hw like anywhere else.

Yes, if you use rust bindings with something like RDMA then I would
expect that by the time remove is done everything is cleaned up and
all the revokable stuff was useless and never used.

This is why I dislike revoke so much. It is adding a bunch of garbage
all over the place that is *never used* if the driver is working
correctly.

I believe it is much better to runtime check that the driver is
correct and not burden the API design with this.

Giving people these features will only encourage them to write wrong
drivers.

This is not even a new idea, devm introduces automatic lifetime into
the kernel and I've sat in presentations about how devm has all sorts
of bug classes because of misuse. :\

> Does this sounds like a possible conclusion of this thread, or do we need
> to keep digging?

IDK, I think this should be socialized more. It is important as it
effects all drivers here out, and it is radically different to how the
kernel works today.

> Also now that I look at this problem as a two-level issue, I think drm is
> actually a lot better than what I explained. If you clean up driver state
> properly in ->remove (or as stack automatic cleanup functions that run
> before all the mmio/irq/whatever stuff disappears), then we are largely
> there already with being able to fully quiescent driver state enough to
> make sure no new requests can sneak in. 

That is the typical subsystem design!

Thanks,
Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 14:46                                                             ` Jason Gunthorpe
@ 2025-03-07 15:19                                                               ` Greg KH
  2025-03-07 15:25                                                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Greg KH @ 2025-03-07 15:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Danilo Krummrich, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 07, 2025 at 10:46:29AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 03:00:09PM +0100, Greg KH wrote:
> > On Fri, Mar 07, 2025 at 08:32:55AM -0400, Jason Gunthorpe wrote:
> > > On Fri, Mar 07, 2025 at 11:28:37AM +0100, Simona Vetter wrote:
> > > 
> > > > > I wouldn't say it is wrong. It is still the correct thing to do, and
> > > > > following down the normal cleanup paths is a good way to ensure the
> > > > > special case doesn't have bugs. The primary difference is you want to
> > > > > understand the device is dead and stop waiting on it faster. Drivers
> > > > > need to consider these things anyhow if they want resiliency against
> > > > > device crashes, PCI link wobbles and so on that don't involve
> > > > > remove().
> > > > 
> > > > Might need to revisit that discussion, but Greg didn't like when we asked
> > > > for a pci helper to check whether the device is physically gone (at least
> > > > per the driver model). Hacking that in drivers is doable, but feels
> > > > icky.
> > > 
> > > I think Greg is right here, the driver model has less knowledge than
> > > the driver if the device is alive.
> > 
> > That's not why I don't want this.  Think about this sequence:
> > 	if (!device_is_gone(dev)) {
> > 		// do something
> > 	}
> > right after you check it, the value can change. 
> 
> Oh, I imagined this would latch off. For instance if you hotunplug a
> PCI struct device then that struct device will be destroyed
> eventually.

That is true.

> If in the meantime a PCI device is re-discovered at the
> same BDF it would have to wait until the prior one is sufficiently
> destroyed before creating a new struct device and getting plugged in.

I think we just create a new one and away you go!  :)

Just like other busses, if PCI can't handle this at the core hotplug
layer (i.e. by giving up new resources to new devices) then the bus core
for it should handle this type of locking scheme as really, that feels
wrong.  A new device is a new device, should have nothing to do with any
old previous one ever plugged in.

> > Handle the real cases, like you are are saying here, and then all should
> > be ok.
> 
> Yes, if you handle physical device unplug, PCI device unplug, and PCI
> device failure recovery then you cover all the actual production use
> cases. That is already so comprehesive and hard that driver writers
> will be overjoyed with the result anyhow :)

Kernel programming is hard, let's go shopping :)

greg k-h

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 15:19                                                               ` Greg KH
@ 2025-03-07 15:25                                                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-07 15:25 UTC (permalink / raw)
  To: Greg KH
  Cc: John Hubbard, Danilo Krummrich, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 07, 2025 at 04:19:30PM +0100, Greg KH wrote:

> Just like other busses, if PCI can't handle this at the core hotplug
> layer (i.e. by giving up new resources to new devices) then the bus core
> for it should handle this type of locking scheme as really, that feels
> wrong.  A new device is a new device, should have nothing to do with any
> old previous one ever plugged in.

I think it would break the iommu assumptions to have two struct
devices with the same PCI BDF co-exist in the system at once.

There is only one HW IOMMU table routing BDFs..

Most likely the new device would have it's iommu setup blown up as the
old device completes its shutdown and tries to tear down its iommu
setup that is now actually owned by the new device...

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-07 14:55                                                             ` Jason Gunthorpe
@ 2025-03-13 14:32                                                               ` Simona Vetter
  2025-03-19 17:21                                                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-13 14:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Mar 07, 2025 at 10:55:57AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 02:09:12PM +0100, Simona Vetter wrote:
> 
> > > A driver can do a health check immediately in remove() and make a
> > > decision if the device is alive or not to speed up removal in the
> > > hostile hot unplug case.
> > 
> > Hm ... I guess when you get an all -1 read you check with a specific
> > register to make sure it's not a false positive? Since for some registers
> > that's a valid value.
> 
> Yes. mlx5 has HW designed to support this, but I imagine on most
> devices you could find an ID register or something that won't be -1.
> 
> > - The "at least we don't blow up with memory safety issues" bare minimum
> >   that the rust abstractions should guarantee. So revocable and friends.
> 
> I still really dislike recovable because it imposes a cost that is
> unnecessary.
> 
> > And I think the latter safety fallback does not prevent you from doing the
> > full fancy design, e.g. for revocable resources that only happens after
> > your explicitly-coded ->remove() callback has finished. Which means you
> > still have full access to the hw like anywhere else.
> 
> Yes, if you use rust bindings with something like RDMA then I would
> expect that by the time remove is done everything is cleaned up and
> all the revokable stuff was useless and never used.
> 
> This is why I dislike revoke so much. It is adding a bunch of garbage
> all over the place that is *never used* if the driver is working
> correctly.
> 
> I believe it is much better to runtime check that the driver is
> correct and not burden the API design with this.

You can do that with for example runtime proofs. R4l has that with
Mutex from one structure protecting other structures (like in a tree). But
since the compiler can't prove those you trade in the possibility that you
will hit a runtime BUG if things don't line up.

So subsystems that ensure that driver callbacks never run concurrently
with a revocation could guarantee that revocable resources are always
present.

> Giving people these features will only encourage them to write wrong
> drivers.

So I think you can still achieve that building on top of revocable and a
few more abstractions that are internally unsafe. Or are you thinking of
different runtime checks?

> This is not even a new idea, devm introduces automatic lifetime into
> the kernel and I've sat in presentations about how devm has all sorts
> of bug classes because of misuse. :\

Yeah automatic lifetime is great, until people mix up things with
different lifetimes, then it all goes wrong.

> > Does this sounds like a possible conclusion of this thread, or do we need
> > to keep digging?
> 
> IDK, I think this should be socialized more. It is important as it
> effects all drivers here out, and it is radically different to how the
> kernel works today.
> 
> > Also now that I look at this problem as a two-level issue, I think drm is
> > actually a lot better than what I explained. If you clean up driver state
> > properly in ->remove (or as stack automatic cleanup functions that run
> > before all the mmio/irq/whatever stuff disappears), then we are largely
> > there already with being able to fully quiescent driver state enough to
> > make sure no new requests can sneak in. 
> 
> That is the typical subsystem design!

Yeah maybe we're not that far really. But I'm still not clear how to do
an entirely revoke-less world.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-13 14:32                                                               ` Simona Vetter
@ 2025-03-19 17:21                                                                 ` Jason Gunthorpe
  2025-03-21 10:35                                                                   ` Simona Vetter
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-19 17:21 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Thu, Mar 13, 2025 at 03:32:14PM +0100, Simona Vetter wrote:

> So I think you can still achieve that building on top of revocable and a
> few more abstractions that are internally unsafe. Or are you thinking of
> different runtime checks?

I'm thinking on the access side of the revocable you don't have a
failure path. Instead you get the access or runtime violation if the
driver is buggy. This eliminates all the objectionable failure paths
and costs on the performance paths of the driver.

And perhaps also on the remove path you have runtime checking if
"driver lifetime bound" objects have all been cleaned up.

The point is to try to behave more like the standard fence pattern and
get some level of checking that can make r4l comfortable without
inventing new kernel lifecycle models.

> Yeah maybe we're not that far really. But I'm still not clear how to do
> an entirely revoke-less world.

Not entirely, you end up revoking big things. Like RDMA revokes the
driver ops callbacks using SRCU. It doesn't revoke individual
resources or DMA maps.

I have the same feeling about this micro-revoke direction, I don't
know how to implement this. The DMA API is very challenging,
especially the performance use of DMA API.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-19 17:21                                                                 ` Jason Gunthorpe
@ 2025-03-21 10:35                                                                   ` Simona Vetter
  2025-03-21 12:04                                                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Simona Vetter @ 2025-03-21 10:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Wed, Mar 19, 2025 at 02:21:32PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 13, 2025 at 03:32:14PM +0100, Simona Vetter wrote:
> 
> > So I think you can still achieve that building on top of revocable and a
> > few more abstractions that are internally unsafe. Or are you thinking of
> > different runtime checks?
> 
> I'm thinking on the access side of the revocable you don't have a
> failure path. Instead you get the access or runtime violation if the
> driver is buggy. This eliminates all the objectionable failure paths
> and costs on the performance paths of the driver.
> 
> And perhaps also on the remove path you have runtime checking if
> "driver lifetime bound" objects have all been cleaned up.
> 
> The point is to try to behave more like the standard fence pattern and
> get some level of checking that can make r4l comfortable without
> inventing new kernel lifecycle models.
> 
> > Yeah maybe we're not that far really. But I'm still not clear how to do
> > an entirely revoke-less world.
> 
> Not entirely, you end up revoking big things. Like RDMA revokes the
> driver ops callbacks using SRCU. It doesn't revoke individual
> resources or DMA maps.
> 
> I have the same feeling about this micro-revoke direction, I don't
> know how to implement this. The DMA API is very challenging,
> especially the performance use of DMA API.

Ah I think we're in agreement, I think once we get to big subsystems we
really want subsystem-level revokes like you describe here. And rust
already has this concept of a "having one thing guarantess you access to
another". For example an overall lock to a big datastructure gives you
access to all the invidiual nodes, see LockedBy. So I think we're covered
here.

For me the basic Revocable really is more for all the odd-ball
random pieces that aren't covered by subsystem constructs already. And
maybe drm needs to rethink a bunch of things in this area in general, not
just for rust. So maybe we should extend the rustdoc to explain that bare
Revocable isn't how entire subsystems rust abstractions should be built?

Cheers, Aima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-21 10:35                                                                   ` Simona Vetter
@ 2025-03-21 12:04                                                                     ` Jason Gunthorpe
  2025-03-21 12:12                                                                       ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-21 12:04 UTC (permalink / raw)
  To: John Hubbard, Greg KH, Danilo Krummrich, Joel Fernandes,
	Alexandre Courbot, Dave Airlie, Gary Guo, Joel Fernandes,
	Boqun Feng, Ben Skeggs, linux-kernel, rust-for-linux, nouveau,
	dri-devel, paulmck

On Fri, Mar 21, 2025 at 11:35:40AM +0100, Simona Vetter wrote:
> On Wed, Mar 19, 2025 at 02:21:32PM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 13, 2025 at 03:32:14PM +0100, Simona Vetter wrote:
> > 
> > > So I think you can still achieve that building on top of revocable and a
> > > few more abstractions that are internally unsafe. Or are you thinking of
> > > different runtime checks?
> > 
> > I'm thinking on the access side of the revocable you don't have a
> > failure path. Instead you get the access or runtime violation if the
> > driver is buggy. This eliminates all the objectionable failure paths
> > and costs on the performance paths of the driver.
> > 
> > And perhaps also on the remove path you have runtime checking if
> > "driver lifetime bound" objects have all been cleaned up.
> > 
> > The point is to try to behave more like the standard fence pattern and
> > get some level of checking that can make r4l comfortable without
> > inventing new kernel lifecycle models.
> > 
> > > Yeah maybe we're not that far really. But I'm still not clear how to do
> > > an entirely revoke-less world.
> > 
> > Not entirely, you end up revoking big things. Like RDMA revokes the
> > driver ops callbacks using SRCU. It doesn't revoke individual
> > resources or DMA maps.
> > 
> > I have the same feeling about this micro-revoke direction, I don't
> > know how to implement this. The DMA API is very challenging,
> > especially the performance use of DMA API.
> 
> Ah I think we're in agreement, I think once we get to big subsystems we
> really want subsystem-level revokes like you describe here. And rust
> already has this concept of a "having one thing guarantess you access to
> another". For example an overall lock to a big datastructure gives you
> access to all the invidiual nodes, see LockedBy. So I think we're covered
> here.

Make some sense if Rust can do that.

> For me the basic Revocable really is more for all the odd-ball
> random pieces that aren't covered by subsystem constructs already. And
> maybe drm needs to rethink a bunch of things in this area in general, not
> just for rust. So maybe we should extend the rustdoc to explain that bare
> Revocable isn't how entire subsystems rust abstractions should be built?

Then why provide it? Like why provide revoke for DMA API or MMIO as
mandatory part of the core kernel rust bindings if it isn't supposed
to be used and instead rely on this LockedBy sort of thing?

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-21 12:04                                                                     ` Jason Gunthorpe
@ 2025-03-21 12:12                                                                       ` Danilo Krummrich
  2025-03-21 17:49                                                                         ` Jason Gunthorpe
  0 siblings, 1 reply; 104+ messages in thread
From: Danilo Krummrich @ 2025-03-21 12:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 21, 2025 at 09:04:16AM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 21, 2025 at 11:35:40AM +0100, Simona Vetter wrote:
> > On Wed, Mar 19, 2025 at 02:21:32PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Mar 13, 2025 at 03:32:14PM +0100, Simona Vetter wrote:
> > > 
> > > > So I think you can still achieve that building on top of revocable and a
> > > > few more abstractions that are internally unsafe. Or are you thinking of
> > > > different runtime checks?
> > > 
> > > I'm thinking on the access side of the revocable you don't have a
> > > failure path. Instead you get the access or runtime violation if the
> > > driver is buggy. This eliminates all the objectionable failure paths
> > > and costs on the performance paths of the driver.
> > > 
> > > And perhaps also on the remove path you have runtime checking if
> > > "driver lifetime bound" objects have all been cleaned up.
> > > 
> > > The point is to try to behave more like the standard fence pattern and
> > > get some level of checking that can make r4l comfortable without
> > > inventing new kernel lifecycle models.
> > > 
> > > > Yeah maybe we're not that far really. But I'm still not clear how to do
> > > > an entirely revoke-less world.
> > > 
> > > Not entirely, you end up revoking big things. Like RDMA revokes the
> > > driver ops callbacks using SRCU. It doesn't revoke individual
> > > resources or DMA maps.
> > > 
> > > I have the same feeling about this micro-revoke direction, I don't
> > > know how to implement this. The DMA API is very challenging,
> > > especially the performance use of DMA API.
> > 
> > Ah I think we're in agreement, I think once we get to big subsystems we
> > really want subsystem-level revokes like you describe here. And rust
> > already has this concept of a "having one thing guarantess you access to
> > another". For example an overall lock to a big datastructure gives you
> > access to all the invidiual nodes, see LockedBy. So I think we're covered
> > here.
> 
> Make some sense if Rust can do that.
> 
> > For me the basic Revocable really is more for all the odd-ball
> > random pieces that aren't covered by subsystem constructs already. And
> > maybe drm needs to rethink a bunch of things in this area in general, not
> > just for rust. So maybe we should extend the rustdoc to explain that bare
> > Revocable isn't how entire subsystems rust abstractions should be built?
> 
> Then why provide it? Like why provide revoke for DMA API or MMIO as
> mandatory part of the core kernel rust bindings if it isn't supposed
> to be used and instead rely on this LockedBy sort of thing?

Not all device resources are managed in the context of the subsystem, so
subsystem-level revokes do not apply.

For the DMA coherent allocations, please see my comment in [1]. Revoking the
device resources associated with a DMA coherent allocation should hence never
cause any overhead for accessing DMA memory.

[1] https://github.com/Rust-for-Linux/linux/blob/rust-next/rust/kernel/dma.rs#L120

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-21 12:12                                                                       ` Danilo Krummrich
@ 2025-03-21 17:49                                                                         ` Jason Gunthorpe
  2025-03-21 18:54                                                                           ` Danilo Krummrich
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Gunthorpe @ 2025-03-21 17:49 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: John Hubbard, Greg KH, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 21, 2025 at 01:12:30PM +0100, Danilo Krummrich wrote:

> Not all device resources are managed in the context of the subsystem, so
> subsystem-level revokes do not apply.

They could, you could say that these rust APIs are only safe to use
for device drivers with C code providing a fence semantic, eg through
a subsystem.

> For the DMA coherent allocations, please see my comment in [1]. Revoking the
> device resources associated with a DMA coherent allocation should hence never
> cause any overhead for accessing DMA memory.

> [1] https://github.com/Rust-for-Linux/linux/blob/rust-next/rust/kernel/dma.rs#L120

I don't know what to make of this. You argued so much to support
revocable for rust ideological reasons and in the end the proposal is
to just completely gives up on all of that?

Not even an optional runtime check? :(

And I'm not sure about the comment written:

> // However, it is neither desirable nor necessary to protect the allocated memory of the DMA
> // allocation from surviving device unbind;

There are alot of things on this path that depend on the struct
device, there are two kinds of per-device coherent memory allocators
and swiotlb as well.

It looks like there are cases where the actual memory must not outlive
the driver binding.

So, I'd argue that it is necessary, and changing that on the C side
looks like a big project.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
  2025-03-21 17:49                                                                         ` Jason Gunthorpe
@ 2025-03-21 18:54                                                                           ` Danilo Krummrich
  0 siblings, 0 replies; 104+ messages in thread
From: Danilo Krummrich @ 2025-03-21 18:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Greg KH, Joel Fernandes, Alexandre Courbot,
	Dave Airlie, Gary Guo, Joel Fernandes, Boqun Feng, Ben Skeggs,
	linux-kernel, rust-for-linux, nouveau, dri-devel, paulmck

On Fri, Mar 21, 2025 at 02:49:20PM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 21, 2025 at 01:12:30PM +0100, Danilo Krummrich wrote:
> 
> > Not all device resources are managed in the context of the subsystem, so
> > subsystem-level revokes do not apply.
> 
> They could, you could say that these rust APIs are only safe to use
> for device drivers with C code providing a fence semantic, eg through
> a subsystem.

Not sure how such an implementation would look like. How would you model such a
requirement through the type system? Or do you propose to just use unsafe {}?

> > For the DMA coherent allocations, please see my comment in [1]. Revoking the
> > device resources associated with a DMA coherent allocation should hence never
> > cause any overhead for accessing DMA memory.
> 
> > [1] https://github.com/Rust-for-Linux/linux/blob/rust-next/rust/kernel/dma.rs#L120
> 
> I don't know what to make of this. You argued so much to support
> revocable for rust ideological reasons and in the end the proposal is
> to just completely gives up on all of that?

And I still do so for certain cases, such as I/O memory.

For the CoherentAllocation though, AFAICS, there is no reason to revoke all the
memory, but just the device resources behind.

You gave me the relevant cause for thought on this. I don't see why you would be
unhappy with the idea. We still correspond with the rules of the driver core,
i.e. ensure that device resources do not out-live driver unbind, but still don't
have to deal with a revocable wrapper.

(One side note, please stop continuously accusing me of things, such as having
"rust ideological reasons", not having gone through proper review with the
driver core abstractions, etc. It won't achieve anything and does not come
across very professional.)

> 
> Not even an optional runtime check? :(

What kind of runtime check do you propose in the meantime? Maybe you can send a
patch? That'd be very much appreciated.

> 
> And I'm not sure about the comment written:
> 
> > // However, it is neither desirable nor necessary to protect the allocated memory of the DMA
> > // allocation from surviving device unbind;
> 
> There are alot of things on this path that depend on the struct
> device, there are two kinds of per-device coherent memory allocators
> and swiotlb as well.
> 
> It looks like there are cases where the actual memory must not outlive
> the driver binding.

I don't see why actual memory allocations would ever be required to be freed
immediately when the driver is unbound. It has nothing to do with the device.

Do you have an example where this would be a requirement?

> 
> So, I'd argue that it is necessary, and changing that on the C side
> looks like a big project.

I don't think that the idea is trivial to implement, but ideally the C side
should benefit as well.

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2025-03-21 18:55 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
2025-02-17 20:47   ` Sergio González Collado
2025-02-17 21:10   ` Daniel Almeida
2025-02-18 13:16     ` Alexandre Courbot
2025-02-18 20:51       ` Timur Tabi
2025-02-19  1:21         ` Alexandre Courbot
2025-02-19  3:24           ` John Hubbard
2025-02-19 12:51             ` Alexandre Courbot
2025-02-19 20:22               ` John Hubbard
2025-02-19 20:23                 ` Dave Airlie
2025-02-19 23:13                   ` Daniel Almeida
2025-02-20  0:14                     ` John Hubbard
2025-02-21 11:35                       ` Alexandre Courbot
2025-02-21 12:31                         ` Danilo Krummrich
2025-02-19 20:11           ` Sergio González Collado
2025-02-18 10:07   ` Dirk Behme
2025-02-18 13:07     ` Alexandre Courbot
2025-02-20  6:23       ` Dirk Behme
2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
2025-02-17 21:15   ` Daniel Almeida
2025-02-17 14:04 ` [PATCH RFC 3/3] gpu: nova-core: add basic timer device Alexandre Courbot
2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
2025-02-18  8:07   ` Greg KH
2025-02-18 13:23     ` Alexandre Courbot
2025-02-17 21:33 ` Danilo Krummrich
2025-02-18  1:46   ` Dave Airlie
2025-02-18 10:26     ` Danilo Krummrich
2025-02-19 12:58       ` Simona Vetter
2025-02-24  1:40     ` Alexandre Courbot
2025-02-24 12:07       ` Danilo Krummrich
2025-02-24 12:11         ` Danilo Krummrich
2025-02-24 18:45           ` Joel Fernandes
2025-02-24 23:44             ` Danilo Krummrich
2025-02-25 15:52               ` Joel Fernandes
2025-02-25 16:09                 ` Danilo Krummrich
2025-02-25 21:02                   ` Joel Fernandes
2025-02-25 22:02                     ` Danilo Krummrich
2025-02-25 22:42                       ` Dave Airlie
2025-02-25 22:57                     ` Jason Gunthorpe
2025-02-25 23:26                       ` Danilo Krummrich
2025-02-25 23:45                       ` Danilo Krummrich
2025-02-26  0:49                         ` Jason Gunthorpe
2025-02-26  1:16                           ` Danilo Krummrich
2025-02-26 17:21                             ` Jason Gunthorpe
2025-02-26 21:31                               ` Danilo Krummrich
2025-02-26 23:47                                 ` Jason Gunthorpe
2025-02-27  0:41                                   ` Boqun Feng
2025-02-27 14:46                                     ` Jason Gunthorpe
2025-02-27 15:18                                       ` Boqun Feng
2025-02-27 16:17                                         ` Jason Gunthorpe
2025-02-27 16:55                                           ` Boqun Feng
2025-02-27 17:32                                             ` Danilo Krummrich
2025-02-27 19:23                                               ` Jason Gunthorpe
2025-02-27 21:25                                                 ` Boqun Feng
2025-02-27 22:00                                                   ` Jason Gunthorpe
2025-02-27 22:40                                                     ` Danilo Krummrich
2025-02-28 18:55                                                       ` Jason Gunthorpe
2025-03-03 19:36                                                         ` Danilo Krummrich
2025-03-03 21:50                                                           ` Jason Gunthorpe
2025-03-04  9:57                                                             ` Danilo Krummrich
2025-02-27  1:02                                   ` Greg KH
2025-02-27  1:34                                     ` John Hubbard
2025-02-27 21:42                                       ` Dave Airlie
2025-02-27 23:06                                         ` John Hubbard
2025-02-28  4:10                                           ` Dave Airlie
2025-02-28 18:50                                             ` Jason Gunthorpe
2025-02-28 10:52                                       ` Simona Vetter
2025-02-28 18:40                                         ` Jason Gunthorpe
2025-03-04 16:10                                           ` Simona Vetter
2025-03-04 16:42                                             ` Jason Gunthorpe
2025-03-05  7:30                                               ` Simona Vetter
2025-03-05 15:10                                                 ` Jason Gunthorpe
2025-03-06 10:42                                                   ` Simona Vetter
2025-03-06 15:32                                                     ` Jason Gunthorpe
2025-03-07 10:28                                                       ` Simona Vetter
2025-03-07 12:32                                                         ` Jason Gunthorpe
2025-03-07 13:09                                                           ` Simona Vetter
2025-03-07 14:55                                                             ` Jason Gunthorpe
2025-03-13 14:32                                                               ` Simona Vetter
2025-03-19 17:21                                                                 ` Jason Gunthorpe
2025-03-21 10:35                                                                   ` Simona Vetter
2025-03-21 12:04                                                                     ` Jason Gunthorpe
2025-03-21 12:12                                                                       ` Danilo Krummrich
2025-03-21 17:49                                                                         ` Jason Gunthorpe
2025-03-21 18:54                                                                           ` Danilo Krummrich
2025-03-07 14:00                                                           ` Greg KH
2025-03-07 14:46                                                             ` Jason Gunthorpe
2025-03-07 15:19                                                               ` Greg KH
2025-03-07 15:25                                                                 ` Jason Gunthorpe
2025-02-27 14:23                                     ` Jason Gunthorpe
2025-02-27 11:32                                   ` Danilo Krummrich
2025-02-27 15:07                                     ` Jason Gunthorpe
2025-02-27 16:51                                       ` Danilo Krummrich
2025-02-25 14:11         ` Alexandre Courbot
2025-02-25 15:06           ` Danilo Krummrich
2025-02-25 15:23             ` Alexandre Courbot
2025-02-25 15:53               ` Danilo Krummrich
2025-02-27 21:37           ` Dave Airlie
2025-02-28  1:49             ` Timur Tabi
2025-02-28  2:24               ` Dave Airlie
2025-02-18 13:35   ` Alexandre Courbot
2025-02-18  1:42 ` Dave Airlie
2025-02-18 13:47   ` Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).