public inbox for driver-core@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH v4 0/4] Fixes the stack overflow
@ 2026-03-09 16:34 Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 1/4] rust: ptr: add `KnownSize` trait to support DST size info extraction Tim Kovalenko via B4 Relay
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, Boqun Feng
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
	driver-core, Tim Kovalenko


---
Changes in v4:
- Rebase on top of projection changes
- Use index projection when writing PTEs
- Keep PteArray, as discussed in V2, and move entry calculation there
- Avoid stack allocation by writing PTEs directly to DMA memory

- Link to v3: https://lore.kernel.org/r/20260217-drm-rust-next-v3-1-9e7e95c597dc@proton.me

Changes in v3:
- Addressed the comments and re-instated the PteArray type.
- PteArray now uses `init` instead of `new` where it writes to `self`
  page by page.
- PteArray just needs a pte pointer obtained from the `gsp_mem.as_slice_mut`.

I hope I understood everything in the V2 email chain and implemented it correctly :)

- Link to v2: https://lore.kernel.org/r/20260213-drm-rust-next-v2-1-aa094f78721a@proton.me

Changes in v2:
- Missed a code formatting issue.
- Link to v1: https://lore.kernel.org/r/20260212-drm-rust-next-v1-1-409398b12e61@proton.me

---
Gary Guo (3):
      rust: ptr: add `KnownSize` trait to support DST size info extraction
      rust: ptr: add projection infrastructure
      rust: dma: use pointer projection infra for `dma_{read,write}` macro

Tim Kovalenko (1):
      gpu: nova-core: fix stack overflow in GSP memory allocation

 drivers/gpu/nova-core/gsp.rs      |  48 ++++---
 drivers/gpu/nova-core/gsp/boot.rs |   2 +-
 drivers/gpu/nova-core/gsp/cmdq.rs |  23 ++-
 rust/kernel/dma.rs                | 114 +++++++--------
 rust/kernel/lib.rs                |   4 +
 rust/kernel/ptr.rs                |  30 +++-
 rust/kernel/ptr/projection.rs     | 294 ++++++++++++++++++++++++++++++++++++++
 samples/rust/rust_dma.rs          |  30 ++--
 scripts/Makefile.build            |   4 +-
 9 files changed, 443 insertions(+), 106 deletions(-)
---
base-commit: dd8a93dafe6ef50b49d2a7b44862264d74a7aafa
change-id: 20260212-drm-rust-next-beb92aee9d75

Best regards,
-- 
Tim Kovalenko <tim.kovalenko@proton.me>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/4] rust: ptr: add `KnownSize` trait to support DST size info extraction
  2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
@ 2026-03-09 16:34 ` Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 2/4] rust: ptr: add projection infrastructure Tim Kovalenko via B4 Relay
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, Boqun Feng
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
	driver-core

From: Gary Guo <gary@garyguo.net>

Add a `KnownSize` trait which is used obtain a size from a raw pointer's
metadata. This makes it possible to obtain size information on a raw slice
pointer. This is similar to Rust `core::mem::size_of_val_raw` which is not
yet stable.

Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/ptr.rs | 27 ++++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 3da92f18f4eed16335c3c251e6bff68dcf7e781e..510cc7fe496113f85c34f420b1c4be95596297ad 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -20,6 +20,7 @@
 #![feature(generic_nonzero)]
 #![feature(inline_const)]
 #![feature(pointer_is_aligned)]
+#![feature(slice_ptr_len)]
 //
 // Stable since Rust 1.80.0.
 #![feature(slice_flatten)]
diff --git a/rust/kernel/ptr.rs b/rust/kernel/ptr.rs
index 5b6a382637fef11e9c0ae9122906422a970d7fcd..cf980a103acf19ee3bd17bb1dfdbcadfe30467ae 100644
--- a/rust/kernel/ptr.rs
+++ b/rust/kernel/ptr.rs
@@ -2,7 +2,10 @@
 
 //! Types and functions to work with pointers and addresses.
 
-use core::mem::align_of;
+use core::mem::{
+    align_of,
+    size_of, //
+};
 use core::num::NonZero;
 
 /// Type representing an alignment, which is always a power of two.
@@ -225,3 +228,25 @@ fn align_up(self, alignment: Alignment) -> Option<Self> {
 }
 
 impl_alignable_uint!(u8, u16, u32, u64, usize);
+
+/// Trait to represent compile-time known size information.
+///
+/// This is a generalization of what [`size_of`] which works for dynamically sized types.
+pub trait KnownSize {
+    /// Get the size of an object of this type in bytes, with the metadata of the given pointer.
+    fn size(p: *const Self) -> usize;
+}
+
+impl<T> KnownSize for T {
+    #[inline(always)]
+    fn size(_: *const Self) -> usize {
+        size_of::<T>()
+    }
+}
+
+impl<T> KnownSize for [T] {
+    #[inline(always)]
+    fn size(p: *const Self) -> usize {
+        p.len() * size_of::<T>()
+    }
+}

-- 
2.53.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/4] rust: ptr: add projection infrastructure
  2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 1/4] rust: ptr: add `KnownSize` trait to support DST size info extraction Tim Kovalenko via B4 Relay
@ 2026-03-09 16:34 ` Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 3/4] rust: dma: use pointer projection infra for `dma_{read,write}` macro Tim Kovalenko via B4 Relay
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, Boqun Feng
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
	driver-core

From: Gary Guo <gary@garyguo.net>

Add a generic infrastructure for performing field and index projections on
raw pointers. This will form the basis of performing I/O projections.

Pointers manipulations are intentionally using the safe wrapping variants
instead of the unsafe variants, as the latter requires pointers to be
inside an allocation which is not necessarily true for I/O pointers.

This projection macro protects against rogue `Deref` implementation, which
can causes the projected pointer to be outside the bounds of starting
pointer. This is extremely unlikely and Rust has a lint to catch this, but
is unsoundness regardless. The protection works by inducing type inference
ambiguity when `Deref` is implemented.

This projection macro also stops projecting into unaligned fields (i.e.
fields of `#[repr(packed)]` structs), as misaligned pointers require
special handling. This is implemented by attempting to create reference to
projected field inside a `if false` block. Despite being unreachable, Rust
still checks that they're not unaligned fields.

The projection macro supports both fallible and infallible index
projections. These are described in detail inside the documentation.

Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
---
 rust/kernel/lib.rs            |   3 +
 rust/kernel/ptr.rs            |   3 +
 rust/kernel/ptr/projection.rs | 294 ++++++++++++++++++++++++++++++++++++++++++
 scripts/Makefile.build        |   4 +-
 4 files changed, 303 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 510cc7fe496113f85c34f420b1c4be95596297ad..d93292d47420f1f298a452ade5feefedce5ade86 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -38,6 +38,9 @@
 #![feature(const_ptr_write)]
 #![feature(const_refs_to_cell)]
 //
+// Stable since Rust 1.84.0.
+#![feature(strict_provenance)]
+//
 // Expected to become stable.
 #![feature(arbitrary_self_types)]
 //
diff --git a/rust/kernel/ptr.rs b/rust/kernel/ptr.rs
index cf980a103acf19ee3bd17bb1dfdbcadfe30467ae..930f523a4659b2cd5974e681ebf4c15c892a5e5a 100644
--- a/rust/kernel/ptr.rs
+++ b/rust/kernel/ptr.rs
@@ -2,6 +2,9 @@
 
 //! Types and functions to work with pointers and addresses.
 
+pub mod projection;
+pub use crate::project_pointer as project;
+
 use core::mem::{
     align_of,
     size_of, //
diff --git a/rust/kernel/ptr/projection.rs b/rust/kernel/ptr/projection.rs
new file mode 100644
index 0000000000000000000000000000000000000000..7686d8502095108b90e863e90cbc3a626c65031e
--- /dev/null
+++ b/rust/kernel/ptr/projection.rs
@@ -0,0 +1,294 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Infrastructure for handling projections.
+
+use core::{
+    mem::MaybeUninit,
+    ops::Deref, //
+};
+
+use crate::prelude::*;
+
+/// Error raised when a projection is attempted on array or slices out of bounds.
+pub struct OutOfBound;
+
+impl From<OutOfBound> for Error {
+    #[inline(always)]
+    fn from(_: OutOfBound) -> Self {
+        ERANGE
+    }
+}
+
+/// A helper trait to perform index projection.
+///
+/// This is similar to `core::slice::SliceIndex`, but operate on raw pointers safely and fallibly.
+///
+/// # Safety
+///
+/// The implementation of `index` and `get` (if `Some` is returned) must ensure that, if provided
+/// input pointer `slice` and returned pointer `output`, then:
+/// - `output` has the same provenance as `slice`;
+/// - `output.byte_offset_from(slice)` is between 0 to
+///   `KnownSize::size(slice) - KnownSize::size(output)`.
+///
+/// This means that if the input pointer is valid, then pointer returned by `get` or `index` is
+/// also valid.
+#[diagnostic::on_unimplemented(message = "`{Self}` cannot be used to index `{T}`")]
+#[doc(hidden)]
+pub unsafe trait ProjectIndex<T: ?Sized>: Sized {
+    type Output: ?Sized;
+
+    /// Returns an index-projected pointer, if in bounds.
+    fn get(self, slice: *mut T) -> Option<*mut Self::Output>;
+
+    /// Returns an index-projected pointer; fail the build if it cannot be proved to be in bounds.
+    #[inline(always)]
+    fn index(self, slice: *mut T) -> *mut Self::Output {
+        Self::get(self, slice).unwrap_or_else(|| build_error!())
+    }
+}
+
+// Forward array impl to slice impl.
+// SAFETY: safety requirement guaranteed by the forwarded impl.
+unsafe impl<T, I, const N: usize> ProjectIndex<[T; N]> for I
+where
+    I: ProjectIndex<[T]>,
+{
+    type Output = <I as ProjectIndex<[T]>>::Output;
+
+    #[inline(always)]
+    fn get(self, slice: *mut [T; N]) -> Option<*mut Self::Output> {
+        <I as ProjectIndex<[T]>>::get(self, slice)
+    }
+
+    #[inline(always)]
+    fn index(self, slice: *mut [T; N]) -> *mut Self::Output {
+        <I as ProjectIndex<[T]>>::index(self, slice)
+    }
+}
+
+// SAFETY: `get` returned pointers has same provenance as `slice` and the offset is checked to not
+// exceed the required bound.
+unsafe impl<T> ProjectIndex<[T]> for usize {
+    type Output = T;
+
+    #[inline(always)]
+    fn get(self, slice: *mut [T]) -> Option<*mut T> {
+        if self >= slice.len() {
+            None
+        } else {
+            Some(slice.cast::<T>().wrapping_add(self))
+        }
+    }
+}
+
+// SAFETY: `get` returned pointers has same provenance as `slice` and the offset is checked to not
+// exceed the required bound.
+unsafe impl<T> ProjectIndex<[T]> for core::ops::Range<usize> {
+    type Output = [T];
+
+    #[inline(always)]
+    fn get(self, slice: *mut [T]) -> Option<*mut [T]> {
+        let new_len = self.end.checked_sub(self.start)?;
+        if self.end > slice.len() {
+            return None;
+        }
+        Some(core::ptr::slice_from_raw_parts_mut(
+            slice.cast::<T>().wrapping_add(self.start),
+            new_len,
+        ))
+    }
+}
+
+// SAFETY: safety requirement guaranteed by the forwarded impl.
+unsafe impl<T> ProjectIndex<[T]> for core::ops::RangeTo<usize> {
+    type Output = [T];
+
+    #[inline(always)]
+    fn get(self, slice: *mut [T]) -> Option<*mut [T]> {
+        (0..self.end).get(slice)
+    }
+}
+
+// SAFETY: safety requirement guaranteed by the forwarded impl.
+unsafe impl<T> ProjectIndex<[T]> for core::ops::RangeFrom<usize> {
+    type Output = [T];
+
+    #[inline(always)]
+    fn get(self, slice: *mut [T]) -> Option<*mut [T]> {
+        (self.start..slice.len()).get(slice)
+    }
+}
+
+// SAFETY: `get` returned the pointer as is, so it always have the same provenance and offset of 0.
+unsafe impl<T> ProjectIndex<[T]> for core::ops::RangeFull {
+    type Output = [T];
+
+    #[inline(always)]
+    fn get(self, slice: *mut [T]) -> Option<*mut [T]> {
+        Some(slice)
+    }
+}
+
+/// A helper trait to perform field projection.
+///
+/// This trait has a `DEREF` generic parameter so it can be implemented twice for types that
+/// implement `Deref`. This will cause an ambiguity error and thus block `Deref` types being used
+/// as base of projection, as they can inject unsoundness. Users therefore must not specify `DEREF`
+/// and should always leave it to be inferred.
+///
+/// # Safety
+///
+/// `proj` may only invoke `f` with a valid allocation, as documentation described.
+#[doc(hidden)]
+pub unsafe trait ProjectField<const DEREF: bool> {
+    /// Project a pointer to a type to a pointer of a field.
+    ///
+    /// `f` may only be invoked with a valid allocation so it can safely obtain raw pointers to
+    /// fields using `&raw mut`.
+    ///
+    /// This is needed because `base` might not point to a valid allocation, while `&raw mut`
+    /// requires pointers to be in bounds of a valid allocation.
+    ///
+    /// # Safety
+    ///
+    /// `f` must return a pointer in bounds of the provided pointer.
+    unsafe fn proj<F>(base: *mut Self, f: impl FnOnce(*mut Self) -> *mut F) -> *mut F;
+}
+
+// NOTE: in theory, this API should work for `T: ?Sized` and `F: ?Sized`, too. However we cannot
+// currently support that as we need to obtain a valid allocation that `&raw const` can operate on.
+// SAFETY: `proj` invokes `f` with valid allocation.
+unsafe impl<T> ProjectField<false> for T {
+    #[inline(always)]
+    unsafe fn proj<F>(base: *mut Self, f: impl FnOnce(*mut Self) -> *mut F) -> *mut F {
+        // Create a valid allocation to start projection, as `base` is not necessarily so. The
+        // memory is never actually used so it will be optimized out, so it should work even for
+        // very large `T` (`memoffset` crate also relies on this). To be extra certain, we also
+        // annotate `f` closure with `#[inline(always)]` in the macro.
+        let mut place = MaybeUninit::uninit();
+        let place_base = place.as_mut_ptr();
+        let field = f(place_base);
+        // SAFETY: `field` is in bounds from `base` per safety requirement.
+        let offset = unsafe { field.byte_offset_from(place_base) };
+        // Use `wrapping_byte_offset` as `base` does not need to be of valid allocation.
+        base.wrapping_byte_offset(offset).cast()
+    }
+}
+
+// SAFETY: vacuously satisfied.
+unsafe impl<T: Deref> ProjectField<true> for T {
+    #[inline(always)]
+    unsafe fn proj<F>(_: *mut Self, _: impl FnOnce(*mut Self) -> *mut F) -> *mut F {
+        build_error!("this function is a guard against `Deref` impl and is never invoked");
+    }
+}
+
+/// Create a projection from a raw pointer.
+///
+/// The projected pointer is within the memory region marked by the input pointer. There is no
+/// requirement that the input raw pointer needs to be valid, so this macro may be used for
+/// projecting pointers outside normal address space, e.g. I/O pointers. However, if the input
+/// pointer is valid, the projected pointer is also valid.
+///
+/// Supported projections include field projections and index projections.
+/// It is not allowed to project into types that implement custom `Deref` or `Index`.
+///
+/// The macro has basic syntax of `kernel::ptr::project!(ptr, projection)`, where `ptr` is an
+/// expression that evaluates to a raw pointer which serves as the base of projection. `projection`
+/// can be a projection expression of form `.field` (normally identifer, or numeral in case of
+/// tuple structs) or of form `[index]`.
+///
+/// If mutable pointer is needed, the macro input can be prefixed with `mut` keyword, i.e.
+/// `kernel::ptr::project!(mut ptr, projection)`. By default, a const pointer is created.
+///
+/// `ptr::project!` macro can perform both fallible indexing and build-time checked indexing.
+/// `[index]` form performs build-time bounds checking; if compiler fails to prove `[index]` is in
+/// bounds, compilation will fail. `[index]?` can be used to perform runtime bounds checking;
+/// `OutOfBound` error is raised via `?` if the index is out of bounds.
+///
+/// # Examples
+///
+/// Field projections are performed with `.field_name`:
+/// ```
+/// struct MyStruct { field: u32, }
+/// let ptr: *const MyStruct = core::ptr::dangling();
+/// let field_ptr: *const u32 = kernel::ptr::project!(ptr, .field);
+///
+/// struct MyTupleStruct(u32, u32);
+///
+/// fn proj(ptr: *const MyTupleStruct) {
+///     let field_ptr: *const u32 = kernel::ptr::project!(ptr, .1);
+/// }
+/// ```
+///
+/// Index projections are performed with `[index]`:
+/// ```
+/// fn proj(ptr: *const [u8; 32]) -> Result {
+///     let field_ptr: *const u8 = kernel::ptr::project!(ptr, [1]);
+///     // This will fail the build.
+///     // kernel::ptr::project!(ptr, [128]);
+///     // This will raise an `OutOfBound` error (which is convertable to `ERANGE`).
+///     kernel::ptr::project!(ptr, [128]?);
+///     Ok(())
+/// }
+/// ```
+///
+/// If you need to match on the error instead of propagate, put the invocation inside a closure:
+/// ```
+/// let ptr: *const [u8; 32] = core::ptr::dangling();
+/// let field_ptr: Result<*const u8> = (|| -> Result<_> {
+///     Ok(kernel::ptr::project!(ptr, [128]?))
+/// })();
+/// assert!(field_ptr.is_err());
+/// ```
+///
+/// For mutable pointers, put `mut` as the first token in macro invocation.
+/// ```
+/// let ptr: *mut [(u8, u16); 32] = core::ptr::dangling_mut();
+/// let field_ptr: *mut u16 = kernel::ptr::project!(mut ptr, [1].1);
+/// ```
+#[macro_export]
+macro_rules! project_pointer {
+    (@gen $ptr:ident, ) => {};
+    // Field projection. `$field` needs to be `tt` to support tuple index like `.0`.
+    (@gen $ptr:ident, .$field:tt $($rest:tt)*) => {
+        // SAFETY: the provided closure always return in bounds pointer.
+        let $ptr = unsafe {
+            $crate::ptr::projection::ProjectField::proj($ptr, #[inline(always)] |ptr| {
+                // Check unaligned field. Not all users (e.g. DMA) can handle unaligned
+                // projections.
+                if false {
+                    let _ = &(*ptr).$field;
+                }
+                // SAFETY: `$field` is in bounds, and no implicit `Deref` is possible (if the
+                // type implements `Deref`, Rust cannot infer the generic parameter `DEREF`).
+                &raw mut (*ptr).$field
+            })
+        };
+        $crate::ptr::project!(@gen $ptr, $($rest)*)
+    };
+    // Fallible index projection.
+    (@gen $ptr:ident, [$index:expr]? $($rest:tt)*) => {
+        let $ptr = $crate::ptr::projection::ProjectIndex::get($index, $ptr)
+            .ok_or($crate::ptr::projection::OutOfBound)?;
+        $crate::ptr::project!(@gen $ptr, $($rest)*)
+    };
+    // Build-time checked index projection.
+    (@gen $ptr:ident, [$index:expr] $($rest:tt)*) => {
+        let $ptr = $crate::ptr::projection::ProjectIndex::index($index, $ptr);
+        $crate::ptr::project!(@gen $ptr, $($rest)*)
+    };
+    (mut $ptr:expr, $($proj:tt)*) => {{
+        let ptr: *mut _ = $ptr;
+        $crate::ptr::project!(@gen ptr, $($proj)*);
+        ptr
+    }};
+    ($ptr:expr, $($proj:tt)*) => {{
+        let ptr = <*const _>::cast_mut($ptr);
+        // We currently always project using mutable pointer, as it is not decided whether `&raw
+        // const` allows the resulting pointer to be mutated (see documentation of `addr_of!`).
+        $crate::ptr::project!(@gen ptr, $($proj)*);
+        ptr.cast_const()
+    }};
+}
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 32e209bc7985cbf26ad8af2b986fbd1920aa3bd4..3652b85be54594e3616be5a92c1967035e9d919d 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -310,16 +310,18 @@ $(obj)/%.lst: $(obj)/%.c FORCE
 
 # The features in this list are the ones allowed for non-`rust/` code.
 #
+#   - Stable since Rust 1.79.0: `feature(slice_ptr_len)`.
 #   - Stable since Rust 1.81.0: `feature(lint_reasons)`.
 #   - Stable since Rust 1.82.0: `feature(asm_const)`,
 #     `feature(offset_of_nested)`, `feature(raw_ref_op)`.
+#   - Stable since Rust 1.84.0: `feature(strict_provenance)`.
 #   - Stable since Rust 1.87.0: `feature(asm_goto)`.
 #   - Expected to become stable: `feature(arbitrary_self_types)`.
 #   - To be determined: `feature(used_with_arg)`.
 #
 # Please see https://github.com/Rust-for-Linux/linux/issues/2 for details on
 # the unstable features in use.
-rust_allowed_features := asm_const,asm_goto,arbitrary_self_types,lint_reasons,offset_of_nested,raw_ref_op,used_with_arg
+rust_allowed_features := asm_const,asm_goto,arbitrary_self_types,lint_reasons,offset_of_nested,raw_ref_op,slice_ptr_len,strict_provenance,used_with_arg
 
 # `--out-dir` is required to avoid temporaries being created by `rustc` in the
 # current working directory, which may be not accessible in the out-of-tree

-- 
2.53.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/4] rust: dma: use pointer projection infra for `dma_{read,write}` macro
  2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 1/4] rust: ptr: add `KnownSize` trait to support DST size info extraction Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 2/4] rust: ptr: add projection infrastructure Tim Kovalenko via B4 Relay
@ 2026-03-09 16:34 ` Tim Kovalenko via B4 Relay
  2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
  2026-03-09 17:00 ` [PATCH v4 0/4] Fixes the stack overflow Danilo Krummrich
  4 siblings, 0 replies; 13+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, Boqun Feng
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
	driver-core

From: Gary Guo <gary@garyguo.net>

Current `dma_read!`, `dma_write!` macros also use a custom
`addr_of!()`-based implementation for projecting pointers, which has
soundness issue as it relies on absence of `Deref` implementation on types.
It also has a soundness issue where it does not protect against unaligned
fields (when `#[repr(packed)]` is used) so it can generate misaligned
accesses.

This commit migrates them to use the general pointer projection
infrastructure, which handles these cases correctly.

As part of migration, the macro is updated to have an improved surface
syntax. The current macro have

    dma_read!(a.b.c[d].e.f)

to mean `a.b.c` is a DMA coherent allocation and it should project into it
with `[d].e.f` and do a read, which is confusing as it makes the indexing
operator integral to the macro (so it will break if you have an array of
`CoherentAllocation`, for example).

This also is problematic as we would like to generalize
`CoherentAllocation` from just slices to arbitrary types.

Make the macro expects `dma_read!(path.to.dma, .path.inside.dma)` as the
canonical syntax. The index operator is no longer special and is just one
type of projection (in additional to field projection). Similarly, make
`dma_write!(path.to.dma, .path.inside.dma, value)` become the canonical
syntax for writing.

Another issue of the current macro is that it is always fallible. This
makes sense with existing design of `CoherentAllocation`, but once we
support fixed size arrays with `CoherentAllocation`, it is desirable to
have the ability to perform infallible indexing as well, e.g. doing a `[0]`
index of `[Foo; 2]` is okay and can be checked at build-time, so forcing
falliblity is non-ideal. To capture this, the macro is changed to use
`[idx]` as infallible projection and `[idx]?` as fallible index projection
(those syntax are part of the general projection infra). A benefit of this
is that while individual indexing operation may fail, the overall
read/write operation is not fallible.

Fixes: ad2907b4e308 ("rust: add dma coherent allocator abstraction")
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Gary Guo <gary@garyguo.net>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
---
 drivers/gpu/nova-core/gsp.rs      |  14 ++---
 drivers/gpu/nova-core/gsp/boot.rs |   2 +-
 drivers/gpu/nova-core/gsp/cmdq.rs |  10 +++-
 rust/kernel/dma.rs                | 114 +++++++++++++++++---------------------
 samples/rust/rust_dma.rs          |  30 +++++-----
 5 files changed, 81 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b9269cf35286dec3acc4d60918904..25cd48514c777cb405a2af0acf57196b2e2e7837 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -143,14 +143,14 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
                     // _kgspInitLibosLoggingStructures (allocates memory for buffers)
                     // kgspSetupLibosInitArgs_IMPL (creates pLibosInitArgs[] array)
                     dma_write!(
-                        libos[0] = LibosMemoryRegionInitArgument::new("LOGINIT", &loginit.0)
-                    )?;
+                        libos, [0]?, LibosMemoryRegionInitArgument::new("LOGINIT", &loginit.0)
+                    );
                     dma_write!(
-                        libos[1] = LibosMemoryRegionInitArgument::new("LOGINTR", &logintr.0)
-                    )?;
-                    dma_write!(libos[2] = LibosMemoryRegionInitArgument::new("LOGRM", &logrm.0))?;
-                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(cmdq))?;
-                    dma_write!(libos[3] = LibosMemoryRegionInitArgument::new("RMARGS", rmargs))?;
+                        libos, [1]?, LibosMemoryRegionInitArgument::new("LOGINTR", &logintr.0)
+                    );
+                    dma_write!(libos, [2]?, LibosMemoryRegionInitArgument::new("LOGRM", &logrm.0));
+                    dma_write!(rmargs, [0]?.inner, fw::GspArgumentsCached::new(cmdq));
+                    dma_write!(libos, [3]?, LibosMemoryRegionInitArgument::new("RMARGS", rmargs));
                 },
             }))
         })
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 9a00ddb922ac9d37db67e0abfacfcaa39f9a163d..f033c489d69c9fd8dffe3df0433020da18ff2297 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -166,7 +166,7 @@ pub(crate) fn boot(
 
         let wpr_meta =
             CoherentAllocation::<GspFwWprMeta>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
-        dma_write!(wpr_meta[0] = GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
+        dma_write!(wpr_meta, [0]?, GspFwWprMeta::new(&gsp_fw, &fb_layout));
 
         self.cmdq
             .send_command(bar, commands::SetSystemInfo::new(pdev))?;
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 87dbbd6d1be9d86e7fb45a84f9647265bd63f84e..0056bfbf0a44cfbc5a0ca08d069f881b877e1edc 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -202,9 +202,13 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
 
         let gsp_mem =
             CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
-        dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
-        dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
-        dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
+        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
+        dma_write!(
+            gsp_mem,
+            [0]?.cpuq.tx,
+            MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES)
+        );
+        dma_write!(gsp_mem, [0]?.cpuq.rx, MsgqRxHeader::new());
 
         Ok(Self(gsp_mem))
     }
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 909d56fd5118ee1db3585a8c10a99fe1d091dd00..cd2957b5f260b04c89e0762edba0820f11b064a4 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -461,6 +461,19 @@ pub fn size(&self) -> usize {
         self.count * core::mem::size_of::<T>()
     }
 
+    /// Returns the raw pointer to the allocated region in the CPU's virtual address space.
+    #[inline]
+    pub fn as_ptr(&self) -> *const [T] {
+        core::ptr::slice_from_raw_parts(self.cpu_addr.as_ptr(), self.count)
+    }
+
+    /// Returns the raw pointer to the allocated region in the CPU's virtual address space as
+    /// a mutable pointer.
+    #[inline]
+    pub fn as_mut_ptr(&self) -> *mut [T] {
+        core::ptr::slice_from_raw_parts_mut(self.cpu_addr.as_ptr(), self.count)
+    }
+
     /// Returns the base address to the allocated region in the CPU's virtual address space.
     pub fn start_ptr(&self) -> *const T {
         self.cpu_addr.as_ptr()
@@ -581,23 +594,6 @@ pub unsafe fn write(&mut self, src: &[T], offset: usize) -> Result {
         Ok(())
     }
 
-    /// Returns a pointer to an element from the region with bounds checking. `offset` is in
-    /// units of `T`, not the number of bytes.
-    ///
-    /// Public but hidden since it should only be used from [`dma_read`] and [`dma_write`] macros.
-    #[doc(hidden)]
-    pub fn item_from_index(&self, offset: usize) -> Result<*mut T> {
-        if offset >= self.count {
-            return Err(EINVAL);
-        }
-        // SAFETY:
-        // - The pointer is valid due to type invariant on `CoherentAllocation`
-        // and we've just checked that the range and index is within bounds.
-        // - `offset` can't overflow since it is smaller than `self.count` and we've checked
-        // that `self.count` won't overflow early in the constructor.
-        Ok(unsafe { self.cpu_addr.as_ptr().add(offset) })
-    }
-
     /// Reads the value of `field` and ensures that its type is [`FromBytes`].
     ///
     /// # Safety
@@ -670,6 +666,9 @@ unsafe impl<T: AsBytes + FromBytes + Send> Send for CoherentAllocation<T> {}
 
 /// Reads a field of an item from an allocated region of structs.
 ///
+/// The syntax is of form `kernel::dma_read!(dma, proj)` where `dma` is an expression to an
+/// [`CoherentAllocation`] and `proj` is a [projection specification](kernel::ptr::project!).
+///
 /// # Examples
 ///
 /// ```
@@ -684,36 +683,29 @@ unsafe impl<T: AsBytes + FromBytes + Send> Send for CoherentAllocation<T> {}
 /// unsafe impl kernel::transmute::AsBytes for MyStruct{};
 ///
 /// # fn test(alloc: &kernel::dma::CoherentAllocation<MyStruct>) -> Result {
-/// let whole = kernel::dma_read!(alloc[2]);
-/// let field = kernel::dma_read!(alloc[1].field);
+/// let whole = kernel::dma_read!(alloc, [2]?);
+/// let field = kernel::dma_read!(alloc, [1]?.field);
 /// # Ok::<(), Error>(()) }
 /// ```
 #[macro_export]
 macro_rules! dma_read {
-    ($dma:expr, $idx: expr, $($field:tt)*) => {{
-        (|| -> ::core::result::Result<_, $crate::error::Error> {
-            let item = $crate::dma::CoherentAllocation::item_from_index(&$dma, $idx)?;
-            // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
-            // dereferenced. The compiler also further validates the expression on whether `field`
-            // is a member of `item` when expanded by the macro.
-            unsafe {
-                let ptr_field = ::core::ptr::addr_of!((*item) $($field)*);
-                ::core::result::Result::Ok(
-                    $crate::dma::CoherentAllocation::field_read(&$dma, ptr_field)
-                )
-            }
-        })()
+    ($dma:expr, $($proj:tt)*) => {{
+        let dma = &$dma;
+        let ptr = $crate::ptr::project!(
+            $crate::dma::CoherentAllocation::as_ptr(dma), $($proj)*
+        );
+        // SAFETY: pointer created by projection is within DMA region.
+        unsafe { $crate::dma::CoherentAllocation::field_read(dma, ptr) }
     }};
-    ($dma:ident [ $idx:expr ] $($field:tt)* ) => {
-        $crate::dma_read!($dma, $idx, $($field)*)
-    };
-    ($($dma:ident).* [ $idx:expr ] $($field:tt)* ) => {
-        $crate::dma_read!($($dma).*, $idx, $($field)*)
-    };
 }
 
 /// Writes to a field of an item from an allocated region of structs.
 ///
+/// The syntax is of form `kernel::dma_write!(dma, proj, val)` where `dma` is an expression to an
+/// [`CoherentAllocation`] and `proj` is a [projection specification](kernel::ptr::project!),
+/// and `val` is the value to be written to the projected location.
+///
+///
 /// # Examples
 ///
 /// ```
@@ -728,37 +720,31 @@ macro_rules! dma_read {
 /// unsafe impl kernel::transmute::AsBytes for MyStruct{};
 ///
 /// # fn test(alloc: &kernel::dma::CoherentAllocation<MyStruct>) -> Result {
-/// kernel::dma_write!(alloc[2].member = 0xf);
-/// kernel::dma_write!(alloc[1] = MyStruct { member: 0xf });
+/// kernel::dma_write!(alloc, [2]?.member, 0xf);
+/// kernel::dma_write!(alloc, [1]?, MyStruct { member: 0xf });
 /// # Ok::<(), Error>(()) }
 /// ```
 #[macro_export]
 macro_rules! dma_write {
-    ($dma:ident [ $idx:expr ] $($field:tt)*) => {{
-        $crate::dma_write!($dma, $idx, $($field)*)
-    }};
-    ($($dma:ident).* [ $idx:expr ] $($field:tt)* ) => {{
-        $crate::dma_write!($($dma).*, $idx, $($field)*)
+    (@parse [$dma:expr] [$($proj:tt)*] [, $val:expr]) => {{
+        let dma = &$dma;
+        let ptr = $crate::ptr::project!(
+            mut $crate::dma::CoherentAllocation::as_mut_ptr(dma), $($proj)*
+        );
+        let val = $val;
+        // SAFETY: pointer created by projection is within DMA region.
+        unsafe { $crate::dma::CoherentAllocation::field_write(dma, ptr, val) }
     }};
-    ($dma:expr, $idx: expr, = $val:expr) => {
-        (|| -> ::core::result::Result<_, $crate::error::Error> {
-            let item = $crate::dma::CoherentAllocation::item_from_index(&$dma, $idx)?;
-            // SAFETY: `item_from_index` ensures that `item` is always a valid item.
-            unsafe { $crate::dma::CoherentAllocation::field_write(&$dma, item, $val) }
-            ::core::result::Result::Ok(())
-        })()
+    (@parse [$dma:expr] [$($proj:tt)*] [.$field:tt $($rest:tt)*]) => {
+        $crate::dma_write!(@parse [$dma] [$($proj)* .$field] [$($rest)*])
+    };
+    (@parse [$dma:expr] [$($proj:tt)*] [[$index:expr]? $($rest:tt)*]) => {
+        $crate::dma_write!(@parse [$dma] [$($proj)* [$index]?] [$($rest)*])
+    };
+    (@parse [$dma:expr] [$($proj:tt)*] [[$index:expr] $($rest:tt)*]) => {
+        $crate::dma_write!(@parse [$dma] [$($proj)* [$index]] [$($rest)*])
     };
-    ($dma:expr, $idx: expr, $(.$field:ident)* = $val:expr) => {
-        (|| -> ::core::result::Result<_, $crate::error::Error> {
-            let item = $crate::dma::CoherentAllocation::item_from_index(&$dma, $idx)?;
-            // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
-            // dereferenced. The compiler also further validates the expression on whether `field`
-            // is a member of `item` when expanded by the macro.
-            unsafe {
-                let ptr_field = ::core::ptr::addr_of_mut!((*item) $(.$field)*);
-                $crate::dma::CoherentAllocation::field_write(&$dma, ptr_field, $val)
-            }
-            ::core::result::Result::Ok(())
-        })()
+    ($dma:expr, $($rest:tt)*) => {
+        $crate::dma_write!(@parse [$dma] [] [$($rest)*])
     };
 }
diff --git a/samples/rust/rust_dma.rs b/samples/rust/rust_dma.rs
index 9c45851c876ef33414eb0071c42a2fb4ac3f1e78..ce39b55450978e69f40b20bb2a0479973f2843ad 100644
--- a/samples/rust/rust_dma.rs
+++ b/samples/rust/rust_dma.rs
@@ -68,7 +68,7 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
                 CoherentAllocation::alloc_coherent(pdev.as_ref(), TEST_VALUES.len(), GFP_KERNEL)?;
 
             for (i, value) in TEST_VALUES.into_iter().enumerate() {
-                kernel::dma_write!(ca[i] = MyStruct::new(value.0, value.1))?;
+                kernel::dma_write!(ca, [i]?, MyStruct::new(value.0, value.1));
             }
 
             let size = 4 * page::PAGE_SIZE;
@@ -85,24 +85,26 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
     }
 }
 
+impl DmaSampleDriver {
+    fn check_dma(&self) -> Result {
+        for (i, value) in TEST_VALUES.into_iter().enumerate() {
+            let val0 = kernel::dma_read!(self.ca, [i]?.h);
+            let val1 = kernel::dma_read!(self.ca, [i]?.b);
+
+            assert_eq!(val0, value.0);
+            assert_eq!(val1, value.1);
+        }
+
+        Ok(())
+    }
+}
+
 #[pinned_drop]
 impl PinnedDrop for DmaSampleDriver {
     fn drop(self: Pin<&mut Self>) {
         dev_info!(self.pdev, "Unload DMA test driver.\n");
 
-        for (i, value) in TEST_VALUES.into_iter().enumerate() {
-            let val0 = kernel::dma_read!(self.ca[i].h);
-            let val1 = kernel::dma_read!(self.ca[i].b);
-            assert!(val0.is_ok());
-            assert!(val1.is_ok());
-
-            if let Ok(val0) = val0 {
-                assert_eq!(val0, value.0);
-            }
-            if let Ok(val1) = val1 {
-                assert_eq!(val1, value.1);
-            }
-        }
+        assert!(self.check_dma().is_ok());
 
         for (i, entry) in self.sgt.iter().enumerate() {
             dev_info!(

-- 
2.53.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
                   ` (2 preceding siblings ...)
  2026-03-09 16:34 ` [PATCH v4 3/4] rust: dma: use pointer projection infra for `dma_{read,write}` macro Tim Kovalenko via B4 Relay
@ 2026-03-09 16:34 ` Tim Kovalenko via B4 Relay
  2026-03-09 19:40   ` Danilo Krummrich
                     ` (2 more replies)
  2026-03-09 17:00 ` [PATCH v4 0/4] Fixes the stack overflow Danilo Krummrich
  4 siblings, 3 replies; 13+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, Boqun Feng
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
	driver-core, Tim Kovalenko

From: Tim Kovalenko <tim.kovalenko@proton.me>

The `Cmdq::new` function was allocating a `PteArray` struct on the stack
and was causing a stack overflow with 8216 bytes.

Modify the `PteArray` to calculate and write the Page Table Entries
directly into the coherent DMA buffer one-by-one. This reduces the stack
usage quite a lot.

Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
 drivers/gpu/nova-core/gsp.rs      | 34 +++++++++++++++++++---------------
 drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -47,16 +47,11 @@
 unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
 
 impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
-    /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
-    fn new(start: DmaAddress) -> Result<Self> {
-        let mut ptes = [0u64; NUM_PAGES];
-        for (i, pte) in ptes.iter_mut().enumerate() {
-            *pte = start
-                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
-                .ok_or(EOVERFLOW)?;
-        }
-
-        Ok(Self(ptes))
+    /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
+    fn entry(start: DmaAddress, index: usize) -> Result<u64> {
+        start
+            .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
+            .ok_or(EOVERFLOW)
     }
 }
 
@@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
             NUM_PAGES * GSP_PAGE_SIZE,
             GFP_KERNEL | __GFP_ZERO,
         )?);
-        let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+        let start_addr = obj.0.dma_handle();
 
         // SAFETY: `obj` has just been created and we are its sole user.
-        unsafe {
-            // Copy the self-mapping PTE at the expected location.
+        let pte_region = unsafe {
             obj.0
-                .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
-                .copy_from_slice(ptes.as_bytes())
+                .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
         };
 
+        // This is a  one by one GSP Page write to the memory
+        // to avoid stack overflow when allocating the whole array at once.
+        for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+            let pte_value = start_addr
+                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+                .ok_or(EOVERFLOW)?;
+
+            chunk.copy_from_slice(&pte_value.to_ne_bytes());
+        }
+
         Ok(obj)
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
 
         let gsp_mem =
             CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
-        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
+
+        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+        let start = gsp_mem.dma_handle();
+        // One by one GSP Page write to the memory to avoid stack overflow when allocating
+        // the whole array at once.
+        for i in 0..NUM_PTES {
+            dma_write!(
+                gsp_mem,
+                [0]?.ptes.0[i],
+                PteArray::<NUM_PTES>::entry(start, i)?
+            );
+        }
+
         dma_write!(
             gsp_mem,
             [0]?.cpuq.tx,

-- 
2.53.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/4] Fixes the stack overflow
  2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
                   ` (3 preceding siblings ...)
  2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
@ 2026-03-09 17:00 ` Danilo Krummrich
  4 siblings, 0 replies; 13+ messages in thread
From: Danilo Krummrich @ 2026-03-09 17:00 UTC (permalink / raw)
  To: Tim Kovalenko via B4 Relay
  Cc: tim.kovalenko, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, nouveau, dri-devel, linux-kernel,
	rust-for-linux, linux-kbuild, driver-core

On Mon Mar 9, 2026 at 5:34 PM CET, Tim Kovalenko via B4 Relay wrote:
>
> ---
> Changes in v4:
> - Rebase on top of projection changes

Thanks for the quick follow-up!

There was no need to include the series you rebased onto, i.e. it is usually
enough to just mention that your patch is based on another series and provide a
link.

But don't worry, I just mention this for the next time, there is nothing that
needs to be done on your end.

(One additional thing to note, in case you will actually have to (re-)send
patches of other people in the future, is that such patches still need to have
your Signed-off-by: in addition, since by sending them you are involved in the
handling of those patches.)

Thanks,
Danilo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
@ 2026-03-09 19:40   ` Danilo Krummrich
  2026-03-09 22:40     ` Miguel Ojeda
  2026-03-10  1:40   ` Alexandre Courbot
  2026-03-10 17:40   ` Danilo Krummrich
  2 siblings, 1 reply; 13+ messages in thread
From: Danilo Krummrich @ 2026-03-09 19:40 UTC (permalink / raw)
  To: Tim Kovalenko via B4 Relay
  Cc: tim.kovalenko, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, nouveau, dri-devel, linux-kernel,
	rust-for-linux, linux-kbuild, driver-core

On Mon Mar 9, 2026 at 5:34 PM CET, Tim Kovalenko via B4 Relay wrote:
> From: Tim Kovalenko <tim.kovalenko@proton.me>
>
> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
> and was causing a stack overflow with 8216 bytes.
>
> Modify the `PteArray` to calculate and write the Page Table Entries
> directly into the coherent DMA buffer one-by-one. This reduces the stack
> usage quite a lot.
>

Reported-by: Gary Guo <gary@garyguo.net>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570375549
Fixes: f38b4f105cfc ("gpu: nova-core: Create initial Gsp")

> Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>

A few nits below, but I can fix them up [1] on apply.

> +        for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
> +            let pte_value = start_addr
> +                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> +                .ok_or(EOVERFLOW)?;

This should use PteArray::entry().

It would also be nice to get rid of the unsafe {} and use dma_write!() instead,
but this can be a follow-up patch.

> +
> +            chunk.copy_from_slice(&pte_value.to_ne_bytes());
> +        }
> +
>          Ok(obj)
>      }
>  }
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>  
>          let gsp_mem =
>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> -        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
> +
> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();

We can avoid duplicating this by making it a constant of GspMem.

> +        let start = gsp_mem.dma_handle();
> +        // One by one GSP Page write to the memory to avoid stack overflow when allocating
> +        // the whole array at once.
> +        for i in 0..NUM_PTES {
> +            dma_write!(
> +                gsp_mem,
> +                [0]?.ptes.0[i],
> +                PteArray::<NUM_PTES>::entry(start, i)?
> +            );
> +        }

-- [1] --

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 20170e483e04..b5ea14b7dad7 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -48,6 +48,7 @@ unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
 
 impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
     /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
+    // TODO: Replace with `IoView` projection once available.
     fn entry(start: DmaAddress, index: usize) -> Result<u64> {
         start
             .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
@@ -90,12 +91,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
                 .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
         };
 
-        // This is a  one by one GSP Page write to the memory
-        // to avoid stack overflow when allocating the whole array at once.
+        // Write values one by one to avoid an on-stack instance of `PteArray`.
         for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
-            let pte_value = start_addr
-                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
-                .ok_or(EOVERFLOW)?;
+            let pte_value = PteArray::<NUM_PAGES>::entry(start_addr, i)?;
 
             chunk.copy_from_slice(&pte_value.to_ne_bytes());
         }
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 9107a1473797..aa42d180f0d5 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -159,7 +159,7 @@ struct Msgq {
 #[repr(C)]
 struct GspMem {
     /// Self-mapping page table entries.
-    ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
+    ptes: PteArray<{ Self::PTE_ARRAY_SIZE }>,
     /// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
     /// write and read pointers that the CPU updates.
     ///
@@ -172,6 +172,10 @@ struct GspMem {
     gspq: Msgq,
 }
 
+impl GspMem {
+    const PTE_ARRAY_SIZE: usize = GSP_PAGE_SIZE / size_of::<u64>();
+}
+
 // SAFETY: These structs don't meet the no-padding requirements of AsBytes but
 // that is not a problem because they are not used outside the kernel.
 unsafe impl AsBytes for GspMem {}
@@ -202,16 +206,14 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
         let gsp_mem =
             CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
 
-        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
-
         let start = gsp_mem.dma_handle();
         // One by one GSP Page write to the memory to avoid stack overflow when allocating
         // the whole array at once.
-        for i in 0..NUM_PTES {
+        for i in 0..GspMem::PTE_ARRAY_SIZE {
             dma_write!(
                 gsp_mem,
                 [0]?.ptes.0[i],
-                PteArray::<NUM_PTES>::entry(start, i)?
+                PteArray::<{ GspMem::PTE_ARRAY_SIZE }>::entry(start, i)?
             );
         }
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-09 19:40   ` Danilo Krummrich
@ 2026-03-09 22:40     ` Miguel Ojeda
  0 siblings, 0 replies; 13+ messages in thread
From: Miguel Ojeda @ 2026-03-09 22:40 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Tim Kovalenko via B4 Relay, tim.kovalenko, Alexandre Courbot,
	Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Boqun Feng, Nathan Chancellor, Nicolas Schier,
	Abdiel Janulgue, Daniel Almeida, Robin Murphy, nouveau, dri-devel,
	linux-kernel, rust-for-linux, linux-kbuild, driver-core

On Mon, Mar 9, 2026 at 8:40 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> Reported-by: Gary Guo <gary@garyguo.net>
> Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570375549
> Fixes: f38b4f105cfc ("gpu: nova-core: Create initial Gsp")

`objtool` also complains about it because it doesn't handle certain
instructions, so probably:

Link: https://lore.kernel.org/rust-for-linux/CANiq72mAQxbRJZDnik3Qmd4phvFwPA01O2jwaaXRh_T+2=L-qA@mail.gmail.com/

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
  2026-03-09 19:40   ` Danilo Krummrich
@ 2026-03-10  1:40   ` Alexandre Courbot
  2026-03-10  1:51     ` Gary Guo
  2026-03-10 17:40   ` Danilo Krummrich
  2 siblings, 1 reply; 13+ messages in thread
From: Alexandre Courbot @ 2026-03-10  1:40 UTC (permalink / raw)
  To: Tim Kovalenko via B4 Relay
  Cc: tim.kovalenko, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, nouveau, dri-devel, linux-kernel,
	rust-for-linux, linux-kbuild, driver-core

On Tue Mar 10, 2026 at 1:34 AM JST, Tim Kovalenko via B4 Relay wrote:
> From: Tim Kovalenko <tim.kovalenko@proton.me>
>
> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
> and was causing a stack overflow with 8216 bytes.
>
> Modify the `PteArray` to calculate and write the Page Table Entries
> directly into the coherent DMA buffer one-by-one. This reduces the stack
> usage quite a lot.
>
> Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
> ---
>  drivers/gpu/nova-core/gsp.rs      | 34 +++++++++++++++++++---------------
>  drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
>  2 files changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
> index 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38 100644
> --- a/drivers/gpu/nova-core/gsp.rs
> +++ b/drivers/gpu/nova-core/gsp.rs
> @@ -47,16 +47,11 @@
>  unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
>  
>  impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
> -    /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
> -    fn new(start: DmaAddress) -> Result<Self> {
> -        let mut ptes = [0u64; NUM_PAGES];
> -        for (i, pte) in ptes.iter_mut().enumerate() {
> -            *pte = start
> -                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> -                .ok_or(EOVERFLOW)?;
> -        }
> -
> -        Ok(Self(ptes))
> +    /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
> +    fn entry(start: DmaAddress, index: usize) -> Result<u64> {
> +        start
> +            .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
> +            .ok_or(EOVERFLOW)
>      }
>  }
>  
> @@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>              NUM_PAGES * GSP_PAGE_SIZE,
>              GFP_KERNEL | __GFP_ZERO,
>          )?);
> -        let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
> +
> +        let start_addr = obj.0.dma_handle();
>  
>          // SAFETY: `obj` has just been created and we are its sole user.
> -        unsafe {
> -            // Copy the self-mapping PTE at the expected location.
> +        let pte_region = unsafe {
>              obj.0
> -                .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
> -                .copy_from_slice(ptes.as_bytes())
> +                .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
>          };
>  
> +        // This is a  one by one GSP Page write to the memory
> +        // to avoid stack overflow when allocating the whole array at once.
> +        for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
> +            let pte_value = start_addr
> +                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> +                .ok_or(EOVERFLOW)?;
> +
> +            chunk.copy_from_slice(&pte_value.to_ne_bytes());
> +        }
> +
>          Ok(obj)
>      }
>  }
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>  
>          let gsp_mem =
>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> -        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
> +
> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
> +
> +        let start = gsp_mem.dma_handle();
> +        // One by one GSP Page write to the memory to avoid stack overflow when allocating
> +        // the whole array at once.
> +        for i in 0..NUM_PTES {
> +            dma_write!(
> +                gsp_mem,
> +                [0]?.ptes.0[i],
> +                PteArray::<NUM_PTES>::entry(start, i)?

Does `::<NUM_PTES>` need to be mentioned here, or is the compiler able
to infer it?

In any case, the updated patch

Acked-by: Alexandre Courbot <acourbot@nvidia.com>

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-10  1:40   ` Alexandre Courbot
@ 2026-03-10  1:51     ` Gary Guo
  2026-03-10  2:28       ` Alexandre Courbot
  0 siblings, 1 reply; 13+ messages in thread
From: Gary Guo @ 2026-03-10  1:51 UTC (permalink / raw)
  To: Alexandre Courbot, Tim Kovalenko via B4 Relay
  Cc: tim.kovalenko, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, nouveau, dri-devel, linux-kernel,
	rust-for-linux, linux-kbuild, driver-core

On Tue Mar 10, 2026 at 1:40 AM GMT, Alexandre Courbot wrote:
> On Tue Mar 10, 2026 at 1:34 AM JST, Tim Kovalenko via B4 Relay wrote:
>> From: Tim Kovalenko <tim.kovalenko@proton.me>
>>
>> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
>> and was causing a stack overflow with 8216 bytes.
>>
>> Modify the `PteArray` to calculate and write the Page Table Entries
>> directly into the coherent DMA buffer one-by-one. This reduces the stack
>> usage quite a lot.
>>
>> Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
>> ---
>>  drivers/gpu/nova-core/gsp.rs      | 34 +++++++++++++++++++---------------
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
>>  2 files changed, 33 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
>> index 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38 100644
>> --- a/drivers/gpu/nova-core/gsp.rs
>> +++ b/drivers/gpu/nova-core/gsp.rs
>> @@ -47,16 +47,11 @@
>>  unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
>>  
>>  impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
>> -    /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
>> -    fn new(start: DmaAddress) -> Result<Self> {
>> -        let mut ptes = [0u64; NUM_PAGES];
>> -        for (i, pte) in ptes.iter_mut().enumerate() {
>> -            *pte = start
>> -                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>> -                .ok_or(EOVERFLOW)?;
>> -        }
>> -
>> -        Ok(Self(ptes))
>> +    /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
>> +    fn entry(start: DmaAddress, index: usize) -> Result<u64> {
>> +        start
>> +            .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
>> +            .ok_or(EOVERFLOW)
>>      }
>>  }
>>  
>> @@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>              NUM_PAGES * GSP_PAGE_SIZE,
>>              GFP_KERNEL | __GFP_ZERO,
>>          )?);
>> -        let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
>> +
>> +        let start_addr = obj.0.dma_handle();
>>  
>>          // SAFETY: `obj` has just been created and we are its sole user.
>> -        unsafe {
>> -            // Copy the self-mapping PTE at the expected location.
>> +        let pte_region = unsafe {
>>              obj.0
>> -                .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
>> -                .copy_from_slice(ptes.as_bytes())
>> +                .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
>>          };
>>  
>> +        // This is a  one by one GSP Page write to the memory
>> +        // to avoid stack overflow when allocating the whole array at once.
>> +        for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
>> +            let pte_value = start_addr
>> +                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>> +                .ok_or(EOVERFLOW)?;
>> +
>> +            chunk.copy_from_slice(&pte_value.to_ne_bytes());
>> +        }
>> +
>>          Ok(obj)
>>      }
>>  }
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>  
>>          let gsp_mem =
>>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
>> -        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
>> +
>> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
>> +
>> +        let start = gsp_mem.dma_handle();
>> +        // One by one GSP Page write to the memory to avoid stack overflow when allocating
>> +        // the whole array at once.
>> +        for i in 0..NUM_PTES {
>> +            dma_write!(
>> +                gsp_mem,
>> +                [0]?.ptes.0[i],
>> +                PteArray::<NUM_PTES>::entry(start, i)?
>
> Does `::<NUM_PTES>` need to be mentioned here, or is the compiler able
> to infer it?

The function signature doesn't mention NUM_PTES at all, so no. In fact, perhaps
the `entry` shouldn't be an associated method at all (even if is, it probably
should be of `PteArray::<0>` or something.

Best,
Gary

>
> In any case, the updated patch
>
> Acked-by: Alexandre Courbot <acourbot@nvidia.com>
>
> Thanks!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-10  1:51     ` Gary Guo
@ 2026-03-10  2:28       ` Alexandre Courbot
  2026-03-10 11:20         ` Danilo Krummrich
  0 siblings, 1 reply; 13+ messages in thread
From: Alexandre Courbot @ 2026-03-10  2:28 UTC (permalink / raw)
  To: Gary Guo
  Cc: Tim Kovalenko via B4 Relay, tim.kovalenko, Danilo Krummrich,
	Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Boqun Feng, Nathan Chancellor, Nicolas Schier,
	Abdiel Janulgue, Daniel Almeida, Robin Murphy, nouveau, dri-devel,
	linux-kernel, rust-for-linux, linux-kbuild, driver-core

On Tue Mar 10, 2026 at 10:51 AM JST, Gary Guo wrote:
> On Tue Mar 10, 2026 at 1:40 AM GMT, Alexandre Courbot wrote:
>> On Tue Mar 10, 2026 at 1:34 AM JST, Tim Kovalenko via B4 Relay wrote:
>>> From: Tim Kovalenko <tim.kovalenko@proton.me>
>>>
>>> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
>>> and was causing a stack overflow with 8216 bytes.
>>>
>>> Modify the `PteArray` to calculate and write the Page Table Entries
>>> directly into the coherent DMA buffer one-by-one. This reduces the stack
>>> usage quite a lot.
>>>
>>> Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
>>> ---
>>>  drivers/gpu/nova-core/gsp.rs      | 34 +++++++++++++++++++---------------
>>>  drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
>>>  2 files changed, 33 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
>>> index 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38 100644
>>> --- a/drivers/gpu/nova-core/gsp.rs
>>> +++ b/drivers/gpu/nova-core/gsp.rs
>>> @@ -47,16 +47,11 @@
>>>  unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
>>>  
>>>  impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
>>> -    /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
>>> -    fn new(start: DmaAddress) -> Result<Self> {
>>> -        let mut ptes = [0u64; NUM_PAGES];
>>> -        for (i, pte) in ptes.iter_mut().enumerate() {
>>> -            *pte = start
>>> -                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>>> -                .ok_or(EOVERFLOW)?;
>>> -        }
>>> -
>>> -        Ok(Self(ptes))
>>> +    /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
>>> +    fn entry(start: DmaAddress, index: usize) -> Result<u64> {
>>> +        start
>>> +            .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
>>> +            .ok_or(EOVERFLOW)
>>>      }
>>>  }
>>>  
>>> @@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>>              NUM_PAGES * GSP_PAGE_SIZE,
>>>              GFP_KERNEL | __GFP_ZERO,
>>>          )?);
>>> -        let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
>>> +
>>> +        let start_addr = obj.0.dma_handle();
>>>  
>>>          // SAFETY: `obj` has just been created and we are its sole user.
>>> -        unsafe {
>>> -            // Copy the self-mapping PTE at the expected location.
>>> +        let pte_region = unsafe {
>>>              obj.0
>>> -                .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
>>> -                .copy_from_slice(ptes.as_bytes())
>>> +                .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
>>>          };
>>>  
>>> +        // This is a  one by one GSP Page write to the memory
>>> +        // to avoid stack overflow when allocating the whole array at once.
>>> +        for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
>>> +            let pte_value = start_addr
>>> +                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>>> +                .ok_or(EOVERFLOW)?;
>>> +
>>> +            chunk.copy_from_slice(&pte_value.to_ne_bytes());
>>> +        }
>>> +
>>>          Ok(obj)
>>>      }
>>>  }
>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>>  
>>>          let gsp_mem =
>>>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
>>> -        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
>>> +
>>> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
>>> +
>>> +        let start = gsp_mem.dma_handle();
>>> +        // One by one GSP Page write to the memory to avoid stack overflow when allocating
>>> +        // the whole array at once.
>>> +        for i in 0..NUM_PTES {
>>> +            dma_write!(
>>> +                gsp_mem,
>>> +                [0]?.ptes.0[i],
>>> +                PteArray::<NUM_PTES>::entry(start, i)?
>>
>> Does `::<NUM_PTES>` need to be mentioned here, or is the compiler able
>> to infer it?
>
> The function signature doesn't mention NUM_PTES at all, so no. In fact, perhaps
> the `entry` shouldn't be an associated method at all (even if is, it probably
> should be of `PteArray::<0>` or something.

I had that thought as well - this calls for a redesign of the `PteArray`
business - but also didn't want to interfere too much as this fix is
very much (and quickly) needed. We will probably re-write this once we
have access to the new I/O code anyway.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-10  2:28       ` Alexandre Courbot
@ 2026-03-10 11:20         ` Danilo Krummrich
  0 siblings, 0 replies; 13+ messages in thread
From: Danilo Krummrich @ 2026-03-10 11:20 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Gary Guo, Tim Kovalenko via B4 Relay, tim.kovalenko, Alice Ryhl,
	David Airlie, Simona Vetter, Miguel Ojeda, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, nouveau, dri-devel, linux-kernel,
	rust-for-linux, linux-kbuild, driver-core

On Tue Mar 10, 2026 at 3:28 AM CET, Alexandre Courbot wrote:
> On Tue Mar 10, 2026 at 10:51 AM JST, Gary Guo wrote:
>> On Tue Mar 10, 2026 at 1:40 AM GMT, Alexandre Courbot wrote:
>>> On Tue Mar 10, 2026 at 1:34 AM JST, Tim Kovalenko via B4 Relay wrote:
>>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>>>> index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
>>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>>> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>>>  
>>>>          let gsp_mem =
>>>>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
>>>> -        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
>>>> +
>>>> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
>>>> +
>>>> +        let start = gsp_mem.dma_handle();
>>>> +        // One by one GSP Page write to the memory to avoid stack overflow when allocating
>>>> +        // the whole array at once.
>>>> +        for i in 0..NUM_PTES {
>>>> +            dma_write!(
>>>> +                gsp_mem,
>>>> +                [0]?.ptes.0[i],
>>>> +                PteArray::<NUM_PTES>::entry(start, i)?
>>>
>>> Does `::<NUM_PTES>` need to be mentioned here, or is the compiler able
>>> to infer it?
>>
>> The function signature doesn't mention NUM_PTES at all, so no. In fact, perhaps
>> the `entry` shouldn't be an associated method at all (even if is, it probably
>> should be of `PteArray::<0>` or something.

I think <0> is probably the best choice for this fix for now.

> I had that thought as well - this calls for a redesign of the `PteArray`
> business - but also didn't want to interfere too much as this fix is
> very much (and quickly) needed. We will probably re-write this once we
> have access to the new I/O code anyway.

Not sure it actually needs a redesign, as I think this just goes away with I/O
projections. That's also why I would add the following TODO comment on entry().

	// TODO: Replace with `IoView` projections once available.

I.e. it is just a workaround for now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
  2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
  2026-03-09 19:40   ` Danilo Krummrich
  2026-03-10  1:40   ` Alexandre Courbot
@ 2026-03-10 17:40   ` Danilo Krummrich
  2 siblings, 0 replies; 13+ messages in thread
From: Danilo Krummrich @ 2026-03-10 17:40 UTC (permalink / raw)
  To: Tim Kovalenko via B4 Relay
  Cc: tim.kovalenko, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
	Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
	Daniel Almeida, Robin Murphy, nouveau, dri-devel, linux-kernel,
	rust-for-linux, linux-kbuild, driver-core

On Mon Mar 9, 2026 at 5:34 PM CET, Tim Kovalenko via B4 Relay wrote:
> From: Tim Kovalenko <tim.kovalenko@proton.me>
>
> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
> and was causing a stack overflow with 8216 bytes.
>
> Modify the `PteArray` to calculate and write the Page Table Entries
> directly into the coherent DMA buffer one-by-one. This reduces the stack
> usage quite a lot.
>
> Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>

Applied to drm-rust-fixes, thanks!

--- commit ---

commit c7940c8bf215b9dc6211781c77ce80e76982a723
Author: Tim Kovalenko <tim.kovalenko@proton.me>
Date:   Mon Mar 9 12:34:21 2026 -0400

    gpu: nova-core: fix stack overflow in GSP memory allocation

    The `Cmdq::new` function was allocating a `PteArray` struct on the stack
    and was causing a stack overflow with 8216 bytes.

    Modify the `PteArray` to calculate and write the Page Table Entries
    directly into the coherent DMA buffer one-by-one. This reduces the stack
    usage quite a lot.

    Reported-by: Gary Guo <gary@garyguo.net>
    Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570375549
    Link: https://lore.kernel.org/rust-for-linux/CANiq72mAQxbRJZDnik3Qmd4phvFwPA01O2jwaaXRh_T+2=L-qA@mail.gmail.com/
    Fixes: f38b4f105cfc ("gpu: nova-core: Create initial Gsp")
    Acked-by: Alexandre Courbot <acourbot@nvidia.com>
    Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
    Link: https://patch.msgid.link/20260309-drm-rust-next-v4-4-4ef485b19a4c@proton.me
    [ * Use PteArray::entry() in LogBuffer::new(),
      * Add TODO comment to use IoView projections once available,
      * Add PTE_ARRAY_SIZE constant to avoid duplication.

        - Danilo ]
    Signed-off-by: Danilo Krummrich <dakr@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-03-10 17:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
2026-03-09 16:34 ` [PATCH v4 1/4] rust: ptr: add `KnownSize` trait to support DST size info extraction Tim Kovalenko via B4 Relay
2026-03-09 16:34 ` [PATCH v4 2/4] rust: ptr: add projection infrastructure Tim Kovalenko via B4 Relay
2026-03-09 16:34 ` [PATCH v4 3/4] rust: dma: use pointer projection infra for `dma_{read,write}` macro Tim Kovalenko via B4 Relay
2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-03-09 19:40   ` Danilo Krummrich
2026-03-09 22:40     ` Miguel Ojeda
2026-03-10  1:40   ` Alexandre Courbot
2026-03-10  1:51     ` Gary Guo
2026-03-10  2:28       ` Alexandre Courbot
2026-03-10 11:20         ` Danilo Krummrich
2026-03-10 17:40   ` Danilo Krummrich
2026-03-09 17:00 ` [PATCH v4 0/4] Fixes the stack overflow Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox