* [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
@ 2026-02-13 19:40 Tim Kovalenko via B4 Relay
2026-02-13 21:34 ` Danilo Krummrich
0 siblings, 1 reply; 6+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-02-13 19:40 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Tim Kovalenko
From: Tim Kovalenko <tim.kovalenko@proton.me>
The `Cmdq::new` function allocated a `PteArray` struct on the stack,
consuming 8216 bytes and causing a stack overflow.
Remove the `PteArray` and instead compute each Page Table Entry and
write it directly into the coherent DMA buffer one by one. This reduces
the stack usage significantly.
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
Changes in v2:
- Fixed a code formatting issue missed in v1.
- Link to v1: https://lore.kernel.org/r/20260212-drm-rust-next-v1-1-409398b12e61@proton.me
---
drivers/gpu/nova-core/gsp.rs | 50 ++++++++++++++-------------------------
drivers/gpu/nova-core/gsp/cmdq.rs | 27 ++++++++++++++++++---
2 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b9269cf35286dec3acc4d60918904..316eeaf87ec5ae67422a34426eefa747c9b6502b 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -2,16 +2,14 @@
mod boot;
+use core::iter::Iterator;
+
use kernel::{
device,
- dma::{
- CoherentAllocation,
- DmaAddress, //
- },
+ dma::CoherentAllocation,
dma_write,
pci,
- prelude::*,
- transmute::AsBytes, //
+ prelude::*, //
};
pub(crate) mod cmdq;
@@ -39,27 +37,6 @@
/// Number of GSP pages to use in a RM log buffer.
const RM_LOG_BUFFER_NUM_PAGES: usize = 0x10;
-/// Array of page table entries, as understood by the GSP bootloader.
-#[repr(C)]
-struct PteArray<const NUM_ENTRIES: usize>([u64; NUM_ENTRIES]);
-
-/// SAFETY: arrays of `u64` implement `AsBytes` and we are but a wrapper around one.
-unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
-
-impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
- /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
- fn new(start: DmaAddress) -> Result<Self> {
- let mut ptes = [0u64; NUM_PAGES];
- for (i, pte) in ptes.iter_mut().enumerate() {
- *pte = start
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
- }
-
- Ok(Self(ptes))
- }
-}
-
/// The logging buffers are byte queues that contain encoded printf-like
/// messages from GSP-RM. They need to be decoded by a special application
/// that can parse the buffers.
@@ -86,16 +63,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
NUM_PAGES * GSP_PAGE_SIZE,
GFP_KERNEL | __GFP_ZERO,
)?);
- let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+ let start_addr = obj.0.dma_handle();
// SAFETY: `obj` has just been created and we are its sole user.
- unsafe {
- // Copy the self-mapping PTE at the expected location.
+ let pte_region = unsafe {
obj.0
- .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
- .copy_from_slice(ptes.as_bytes())
+ .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
+ // As in [`DmaGspMem`], write the PTEs into memory one by one to avoid the
+ // stack overflow that allocating the whole array at once would cause.
+ for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+ let pte_value = start_addr
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ chunk.copy_from_slice(&pte_value.to_ne_bytes());
+ }
+
Ok(obj)
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 46819a82a51adc58423502d9d45730923b843656..7a6cb261f4e62ac6210a80f9ecb61213cdb91b15 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -35,7 +35,6 @@
MsgqRxHeader,
MsgqTxHeader, //
},
- PteArray,
GSP_PAGE_SHIFT,
GSP_PAGE_SIZE, //
},
@@ -159,7 +158,7 @@ struct Msgq {
#[repr(C)]
struct GspMem {
/// Self-mapping page table entries.
- ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
+ ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
/// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
/// write and read pointers that the CPU updates.
///
@@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
let gsp_mem =
CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
- dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
+ const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+ // One by one GSP Page write to the memory to avoid stack overflow when allocating
+ // the whole array at once.
+ let item = gsp_mem.item_from_index(0)?;
+ for i in 0..NUM_PAGES {
+ let pte_value = gsp_mem
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+ // dereferenced. The compiler also further validates the expression on whether `field`
+ // is a member of `item` when expanded by the macro.
+ //
+ // Further, this is dma_write! macro expanded and modified to allow for individual
+ // page write.
+ unsafe {
+ let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
+ gsp_mem.field_write(ptr_field, pte_value);
+ }
+ }
+
dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
---
base-commit: cea7b66a80412e2a5b74627b89ae25f1d0110a4b
change-id: 20260212-drm-rust-next-beb92aee9d75
Best regards,
--
Tim Kovalenko <tim.kovalenko@proton.me>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 19:40 [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
@ 2026-02-13 21:34 ` Danilo Krummrich
2026-02-13 23:50 ` gary
0 siblings, 1 reply; 6+ messages in thread
From: Danilo Krummrich @ 2026-02-13 21:34 UTC (permalink / raw)
To: Tim Kovalenko via B4 Relay
Cc: tim.kovalenko, Alexandre Courbot, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, nouveau, dri-devel, linux-kernel, rust-for-linux
On Fri Feb 13, 2026 at 8:40 PM CET, Tim Kovalenko via B4 Relay wrote:
> @@ -159,7 +158,7 @@ struct Msgq {
> #[repr(C)]
> struct GspMem {
> /// Self-mapping page table entries.
> - ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
> + ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
> /// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
> /// write and read pointers that the CPU updates.
> ///
> @@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>
> let gsp_mem =
> CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> - dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
> + const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
> +
> + // One by one GSP Page write to the memory to avoid stack overflow when allocating
> + // the whole array at once.
> + let item = gsp_mem.item_from_index(0)?;
> + for i in 0..NUM_PAGES {
> + let pte_value = gsp_mem
> + .dma_handle()
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
> +
> + // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> + // dereferenced. The compiler also further validates the expression on whether `field`
> + // is a member of `item` when expanded by the macro.
> + //
> + // Further, this is dma_write! macro expanded and modified to allow for individual
> + // page write.
> + unsafe {
Both of the statements below are unsafe and should be within an individual
unsafe block with their corresponding justification.
> + let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
This should use &raw mut instead.
> + gsp_mem.field_write(ptr_field, pte_value);
Formally, we won't be able to justify the safety requirement of this method. :)
The good thing is, we don't have to:
I understand it seems like the problem here is that dma_read!() does not support
index projections. Well, it actually is a problem, which I think will be
resolved by Gary's work. However, I think the real problem here is a different
one:
This code does not need volatile writes in the first place. We just allocated
the DMA memory and haven't published the corresponding address to the device
yet.
So, for such initialization code we shouldn't have to use dma_write!() /
dma_read!() in the first place.
I think the proper solution for this is to provide an API that allows for
initialization with a "normal" reference / slice.
For instance, we could provide a `alloc_coherent_init()` function that takes a
closure which has `&mut [T]` argument, such that the closure can do the
initialization before dma::CoherentAllocation::alloc_coherent() even returns.
Another option would be a new type, e.g. dma::InitCoherentAllocation which has
safe as_slice() and as_slice_mut() methods, but does *not* provide a method to
get the DMA address. Subsequently, it can be converted to a "real"
dma::CoherentAllocation.
With this, I would also keep the PteArray type and change PteArray::new() to
PteArray::init() taking a &mut self.
This way the PteArray init logic remains nicely structured and isolated.
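A userspace sketch of the closure-based shape described above —
`alloc_coherent_init()` and the struct fields are hypothetical, with `Vec` and
a plain integer standing in for the coherent memory and the bus address:

```rust
// Stand-in for dma::CoherentAllocation: the real type wraps coherent DMA
// memory and its bus address.
struct CoherentAllocation<T> {
    buf: Vec<T>,
    dma_handle: u64,
}

// Hypothetical `alloc_coherent_init()`: the closure initializes the buffer
// through a plain `&mut [T]` before the finished allocation -- and thus the
// DMA address -- is handed out, so no volatile accessors are needed.
fn alloc_coherent_init<T: Default + Clone>(
    count: usize,
    init: impl FnOnce(&mut [T]),
) -> CoherentAllocation<T> {
    let mut buf = vec![T::default(); count];
    init(&mut buf);
    CoherentAllocation { buf, dma_handle: 0xdead_0000 }
}

fn main() {
    let alloc = alloc_coherent_init(4, |ptes: &mut [u64]| {
        for (i, pte) in ptes.iter_mut().enumerate() {
            *pte = (i as u64) << 12;
        }
    });
    // The DMA address only becomes observable after initialization.
    let _ = alloc.dma_handle;
    assert_eq!(alloc.buf, vec![0x0, 0x1000, 0x2000, 0x3000]);
}
```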
Thanks,
Danilo
> + }
> + }
> +
> dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
> dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
* Re: [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 21:34 ` Danilo Krummrich
@ 2026-02-13 23:50 ` gary
2026-02-14 0:09 ` Danilo Krummrich
0 siblings, 1 reply; 6+ messages in thread
From: gary @ 2026-02-13 23:50 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Tim Kovalenko via B4 Relay, tim.kovalenko, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, nouveau, dri-devel, linux-kernel, rust-for-linux
(Sorry Danilo, resending as I didn't hit "Reply All". I was travelling
and not on my usual email setup).
On 2026-02-13 21:34, Danilo Krummrich wrote:
> On Fri Feb 13, 2026 at 8:40 PM CET, Tim Kovalenko via B4 Relay wrote:
>> @@ -159,7 +158,7 @@ struct Msgq {
>> #[repr(C)]
>> struct GspMem {
>> /// Self-mapping page table entries.
>> - ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
>> + ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
>> /// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
>> /// write and read pointers that the CPU updates.
>> ///
>> @@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>
>> let gsp_mem =
>> CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
>> - dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
>> + const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
>> +
>> + // One by one GSP Page write to the memory to avoid stack overflow when allocating
>> + // the whole array at once.
>> + let item = gsp_mem.item_from_index(0)?;
>> + for i in 0..NUM_PAGES {
>> + let pte_value = gsp_mem
>> + .dma_handle()
>> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>> + .ok_or(EOVERFLOW)?;
>> +
>> + // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
>> + // dereferenced. The compiler also further validates the expression on whether `field`
>> + // is a member of `item` when expanded by the macro.
>> + //
>> + // Further, this is dma_write! macro expanded and modified to allow for individual
>> + // page write.
>> + unsafe {
>
> Both of the statements below are unsafe and should be within an individual
> unsafe block with their corresponding justification.
>
>> + let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
>
> This should use &raw mut instead.
>
>> + gsp_mem.field_write(ptr_field, pte_value);
>
> Formally, we won't be able to justify the safety requirement of this method. :)
>
> The good thing is, we don't have to:
>
> I understand it seems like the problem here is that dma_read!() does not support
> index projections. Well, it actually is a problem, which I think will be
> resolved by Gary's work. However, I think the real problem here is a different
> one:
Yes, this will be fixed. In fact, I've already got a working version. If you
prefer, I can send out a version with just improved projections without all
the generic I/O improvement stuff.
>
> This code does not need volatile writes in the first place. We just allocated
> the DMA memory and haven't published the corresponding address to the device
> yet.
>
> So, for such initialization code we shouldn't have to use dma_write!() /
> dma_read!() in the first place.
>
> I think the proper solution for this is to provide an API that allows for
> initialization with a "normal" reference / slice.
>
> For instance, we could provide a `alloc_coherent_init()` function that takes a
> closure which has `&mut [T]` argument, such that the closure can do the
> initialization before dma::CoherentAllocation::alloc_coherent() even returns.
I've already suggested that in
https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570456463.
The issue is, however, that the PTE array needs the actual DMA address to
initialize, which Alex and I feel we shouldn't provide to the initializer
when `dma::CoherentAllocation` is not yet constructed.
Best,
Gary
>
> Another option would be a new type, e.g. dma::InitCoherentAllocation which has
> safe as_slice() and as_slice_mut() methods, but does *not* provide a method to
> get the DMA address. Subsequently, it can be converted to a "real"
> dma::CoherentAllocation.
>
> With this, I would also keep the PteArray type and change PteArray::new() to
> PteArray::init() taking a &mut self.
>
> This way the PteArray init logic remains nicely structured and isolated.
>
> Thanks,
> Danilo
>
>> + }
>> + }
>> +
>> dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
>> dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
* Re: [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 23:50 ` gary
@ 2026-02-14 0:09 ` Danilo Krummrich
2026-02-14 0:18 ` Gary Guo
0 siblings, 1 reply; 6+ messages in thread
From: Danilo Krummrich @ 2026-02-14 0:09 UTC (permalink / raw)
To: gary
Cc: Tim Kovalenko via B4 Relay, tim.kovalenko, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, nouveau, dri-devel, linux-kernel, rust-for-linux
On Sat Feb 14, 2026 at 12:50 AM CET, gary wrote:
> (Sorry Danilo, resending as I didn't hit "Reply All". I was travelling
> and not on my usual email setup).
No worries, just noticed it before hitting "send". :)
> If you prefer, I can send out a version with just improved projections without
> all the generic I/O improvement stuff.
Yes, that would be great. Otherwise, we can also use as_slice_mut() to avoid
having to touch field_write() directly, but I much prefer the former.
In any case, I'd like to keep the PteArray type. It can at least provide a
function to calculate the value from the DMA address for a given index.
(Would be nice if we'd have something like a dma::Projection<T> type, such that we
could have PteArray::init(self: dma::Projection<Self>).)
> I've already suggested that in
> https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570456463.
> The issue is that however the PTE array needs the actual DMA address to
> initialize, which Alex and I feel that we shouldn't provide to the initializer
> when `dma::CoherentAllocation` is not yet constructed.
Ah, indeed -- that sucks.
And yes, it would be super pointless to give out the actual DMA address in the
initializer, as it would defeat its whole purpose. :)
* Re: [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-14 0:09 ` Danilo Krummrich
@ 2026-02-14 0:18 ` Gary Guo
2026-02-14 0:29 ` Danilo Krummrich
0 siblings, 1 reply; 6+ messages in thread
From: Gary Guo @ 2026-02-14 0:18 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Tim Kovalenko via B4 Relay, tim.kovalenko, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, nouveau, dri-devel, linux-kernel, rust-for-linux
On 2026-02-14 00:09, Danilo Krummrich wrote:
> On Sat Feb 14, 2026 at 12:50 AM CET, gary wrote:
>> If you prefer, I can send out a version with just improved projections
>> without all the generic I/O improvement stuff.
>
> Yes, that would be great. Otherwise, we can also use as_slice_mut() to avoid
> having to touch field_write() directly, but I much prefer the former.
>
> In any case, I'd like to keep the PteArray type. It can at least provide a
> function to calculate the value from the DMA address for a given index.
>
> (Would be nice if we'd have something like a dma::Projection<T> type, such
> that we could have PteArray::init(self: dma::Projection<Self>).)
I think once I/O projection work is done, this would just be
impl PteArray {
fn init<Base>(self: io::View<'_, Base, Self>, dma: DmaAddress)
}
and the generic I/O projection is used inside.
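On stable Rust, where arbitrary self types are not available, the same idea can
be approximated by making init a method on a view type instead. `View` below is
a hypothetical stand-in for the future io::View; the projection machinery is
modeled by a plain mutable borrow:

```rust
const GSP_PAGE_SHIFT: u32 = 12; // 4 KiB GSP pages, mirrored from the driver

// The PteArray type kept as a thin wrapper, as suggested in the thread.
struct PteArray<const N: usize>([u64; N]);

// Stand-in for io::View<'_, Base, PteArray<N>>: a typed mutable view of the
// projected field within a larger allocation.
struct View<'a, const N: usize>(&'a mut PteArray<N>);

impl<'a, const N: usize> View<'a, N> {
    /// Fills the viewed PteArray from the DMA start address, failing on
    /// address overflow like the kernel's checked_add path.
    fn init(self, dma: u64) -> Result<(), &'static str> {
        let View(arr) = self;
        for (i, pte) in arr.0.iter_mut().enumerate() {
            *pte = dma
                .checked_add((i as u64) << GSP_PAGE_SHIFT)
                .ok_or("EOVERFLOW")?;
        }
        Ok(())
    }
}

fn main() {
    let mut ptes = PteArray([0u64; 4]);
    View(&mut ptes).init(0x4000_0000).unwrap();
    assert_eq!(ptes.0, [0x4000_0000, 0x4000_1000, 0x4000_2000, 0x4000_3000]);
}
```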
Best,
Gary
>
>> I've already suggested that in
>> https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570456463.
>> The issue is, however, that the PTE array needs the actual DMA address to
>> initialize, which Alex and I feel we shouldn't provide to the initializer
>> when `dma::CoherentAllocation` is not yet constructed.
>
> Ah, indeed -- that sucks.
>
> And yes, it would be super pointless to give out the actual DMA address in
> the initializer, as it would defeat its whole purpose. :)
* Re: [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-14 0:18 ` Gary Guo
@ 2026-02-14 0:29 ` Danilo Krummrich
0 siblings, 0 replies; 6+ messages in thread
From: Danilo Krummrich @ 2026-02-14 0:29 UTC (permalink / raw)
To: Gary Guo
Cc: Tim Kovalenko via B4 Relay, tim.kovalenko, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda, Boqun Feng,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, nouveau, dri-devel, linux-kernel, rust-for-linux
On Sat Feb 14, 2026 at 1:18 AM CET, Gary Guo wrote:
> On 2026-02-14 00:09, Danilo Krummrich wrote:
>> On Sat Feb 14, 2026 at 12:50 AM CET, gary wrote:
>>> If you prefer, I can send out a version with just improved projections
>>> without all the generic I/O improvement stuff.
>>
>> Yes, that would be great. Otherwise, we can also use as_slice_mut() to avoid
>> having to touch field_write() directly, but I much prefer the former.
>>
>> In any case, I'd like to keep the PteArray type. It can at least provide a
>> function to calculate the value from the DMA address for a given index.
>>
>> (Would be nice if we'd have something like a dma::Projection<T> type, such
>> that we could have PteArray::init(self: dma::Projection<Self>).)
>
> I think once I/O projection work is done, this would just be
>
> impl PteArray {
> fn init<Base>(self: io::View<'_, Base, Self>>, dma: DmaAddress)
> }
>
> and the generic I/O projection is used inside.
I guess I should have written "Would be nice if we'd have something like a
dma::Projection<T> type *already* [...]", i.e. I meant to say that this way we
could remain close to what the final implementation will look like eventually.
:)