Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation

From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "Tim Kovalenko via B4 Relay"
	<devnull+tim.kovalenko.proton.me@kernel.org>
Cc: tim.kovalenko@proton.me, "Danilo Krummrich" <dakr@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Miguel Ojeda" <ojeda@kernel.org>, "Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Boqun Feng" <boqun@kernel.org>,
	"Nathan Chancellor" <nathan@kernel.org>,
	"Nicolas Schier" <nsc@kernel.org>,
	"Abdiel Janulgue" <abdiel.janulgue@gmail.com>,
	"Daniel Almeida" <daniel.almeida@collabora.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	linux-kbuild@vger.kernel.org, driver-core@lists.linux.dev
Subject: Re: [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
Date: Tue, 10 Mar 2026 10:40:54 +0900
Message-ID: <DGYPX7TT8A4E.3KTO5Z5RS17B4@nvidia.com>
In-Reply-To: <20260309-drm-rust-next-v4-4-4ef485b19a4c@proton.me>

On Tue Mar 10, 2026 at 1:34 AM JST, Tim Kovalenko via B4 Relay wrote:
> From: Tim Kovalenko <tim.kovalenko@proton.me>
>
> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
> and was causing a stack overflow with 8216 bytes.
>
> Modify the `PteArray` to calculate and write the Page Table Entries
> directly into the coherent DMA buffer one-by-one. This reduces the stack
> usage quite a lot.
>
> Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
> ---
>  drivers/gpu/nova-core/gsp.rs      | 34 +++++++++++++++++++---------------
>  drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
>  2 files changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
> index 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38 100644
> --- a/drivers/gpu/nova-core/gsp.rs
> +++ b/drivers/gpu/nova-core/gsp.rs
> @@ -47,16 +47,11 @@
>  unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
>  
>  impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
> -    /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
> -    fn new(start: DmaAddress) -> Result<Self> {
> -        let mut ptes = [0u64; NUM_PAGES];
> -        for (i, pte) in ptes.iter_mut().enumerate() {
> -            *pte = start
> -                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> -                .ok_or(EOVERFLOW)?;
> -        }
> -
> -        Ok(Self(ptes))
> +    /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
> +    fn entry(start: DmaAddress, index: usize) -> Result<u64> {
> +        start
> +            .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
> +            .ok_or(EOVERFLOW)
>      }
>  }
>  
> @@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>              NUM_PAGES * GSP_PAGE_SIZE,
>              GFP_KERNEL | __GFP_ZERO,
>          )?);
> -        let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
> +
> +        let start_addr = obj.0.dma_handle();
>  
>          // SAFETY: `obj` has just been created and we are its sole user.
> -        unsafe {
> -            // Copy the self-mapping PTE at the expected location.
> +        let pte_region = unsafe {
>              obj.0
> -                .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
> -                .copy_from_slice(ptes.as_bytes())
> +                .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
>          };
>  
> +        // This is a  one by one GSP Page write to the memory
> +        // to avoid stack overflow when allocating the whole array at once.
> +        for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
> +            let pte_value = start_addr
> +                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> +                .ok_or(EOVERFLOW)?;
> +
> +            chunk.copy_from_slice(&pte_value.to_ne_bytes());
> +        }
> +
>          Ok(obj)
>      }
>  }
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>  
>          let gsp_mem =
>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> -        dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
> +
> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
> +
> +        let start = gsp_mem.dma_handle();
> +        // One by one GSP Page write to the memory to avoid stack overflow when allocating
> +        // the whole array at once.
> +        for i in 0..NUM_PTES {
> +            dma_write!(
> +                gsp_mem,
> +                [0]?.ptes.0[i],
> +                PteArray::<NUM_PTES>::entry(start, i)?

Does `::<NUM_PTES>` need to be mentioned here, or is the compiler able
to infer it?
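
To make the question concrete, here is a minimal standalone sketch in
plain std Rust (not the kernel crate). `DmaAddress` as a bare `u64`, the
shift value of 12, the `Overflow` error type and the `fill_ptes` helper
are all assumptions made for illustration only. As far as I can tell,
because nothing in the signature of `entry` mentions `NUM_PAGES`, the
compiler has nothing to infer the const parameter from, so the turbofish
appears to be required at the call site.

```rust
// Standalone sketch: simplified stand-ins, not the nova-core definitions.
type DmaAddress = u64;
const GSP_PAGE_SHIFT: u32 = 12; // assumed value, for illustration only

#[derive(Debug, PartialEq)]
struct Overflow; // hypothetical stand-in for the kernel's EOVERFLOW

#[allow(dead_code)]
struct PteArray<const NUM_PAGES: usize>([u64; NUM_PAGES]);

impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
    /// Returns the page table entry for `index`, for a mapping starting at `start`.
    /// Note that neither the arguments nor the return type depend on `NUM_PAGES`.
    fn entry(start: DmaAddress, index: usize) -> Result<u64, Overflow> {
        start
            .checked_add((index as u64) << GSP_PAGE_SHIFT)
            .ok_or(Overflow)
    }
}

/// Hypothetical helper: writes the entries in place, one at a time, so that
/// no `[u64; N]` temporary has to be built on the stack first.
fn fill_ptes<const N: usize>(start: DmaAddress, ptes: &mut [u64; N]) -> Result<(), Overflow> {
    for (i, pte) in ptes.iter_mut().enumerate() {
        // Writing `PteArray::entry(start, i)` here, without `::<N>`, is
        // expected to be rejected because the const parameter cannot be inferred.
        *pte = PteArray::<N>::entry(start, i)?;
    }
    Ok(())
}
```

Calling `fill_ptes(0x1000, &mut ptes)` on a `[0u64; 4]` should leave
consecutive page addresses in `ptes`. If the turbofish is a nuisance,
one option would be to make `entry` a free function, since it does not
use `Self`.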

In any case, for the updated patch:

Acked-by: Alexandre Courbot <acourbot@nvidia.com>

Thanks!
