public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Danilo Krummrich" <dakr@kernel.org>
To: "Alexandre Courbot" <acourbot@nvidia.com>
Cc: "Alice Ryhl" <aliceryhl@google.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Alistair Popple" <apopple@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Timur Tabi" <ttabi@nvidia.com>, "Zhi Wang" <zhiw@nvidia.com>,
	"Eliot Courtney" <ecourtney@nvidia.com>,
	<rust-for-linux@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
Date: Fri, 20 Mar 2026 13:54:24 +0100	[thread overview]
Message-ID: <DH7MIC1ECKRZ.1YN5T4ATB7A2T@kernel.org> (raw)
In-Reply-To: <20260319-cmdq-ub-fix-v1-1-0f9f6e8f3ce3@nvidia.com>

On Thu Mar 19, 2026 at 6:36 AM CET, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read of write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them is undefined behavior.
>
> Fix this by replacing the slice logic with pointer arithmetic and
> creating slices to valid regions only. It relies on unsafe code, but
> should be mostly replaced by `IoView` and `IoSlice` once they land.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Suggested-by: Danilo Krummrich <dakr@kernel.org>

Should be Reported-by:.

> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/

Should be Closes:.

> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>  1 file changed, 100 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index d36a62ba1c60..4200e7986774 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.cpu_write_ptr() as usize;
> -        let rx = self.gsp_read_ptr() as usize;
> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
> +        // Number of pages between `tx` and the end of the command queue.
> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>  
> +        // Pointer to the start of the CPU message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +        // - `self.0` contains exactly one element.
> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>  
> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for writing.
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
> +        //   the GSP.
> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.

I didn't do the math, but can't we just calculate the offset values in the below
if-else-if-else blocks and call from_raw_parts_mut() once with the safety
comment above? I think that'd be much cleaner. Similar for
driver_write_area_size().

>          if rx == 0 {
> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
> -            let last = after_tx.len() - 1;
> -            (&mut after_tx[..last], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
> +                        // PANIC: See PANIC[1].
> +                        after_tx_len - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          } else if rx <= tx {
>              // The area is discontiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
> -            //   `rx <= tx` in this branch.
> -            (after_tx, &mut before_tx[..(rx - 1)])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data,
> +                        // Leave one empty slot before `rx`.
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
> +                        rx - 1,
> +                    )
> +                },
> +            )
>          } else {
>              // The area is contiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> +                        rx - tx - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          }
>      }
>  
> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;
>  
> +        // Pointer to the start of the GSP message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> -        let data = &gsp_mem.gspq.msgq.data;
> +        // - `self.0` contains exactly one element.
> +        // - `gspq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
> +
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
> +        //   the GSP.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>  
> -        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for reading.
> -        // PANIC:
> -        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
> -        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
>          if rx <= tx {
>              // The area is contiguous.
> -            (&data[rx..tx], &[])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                // PANIC:
> +                // - See PANIC[1].
> +                // - Per the branch test, `rx <= tx`.
> +                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
> +                &[],
> +            )
>          } else {
>              // The area is discontiguous.
> -            (&data[rx..], &data[..tx])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                // PANIC: See PANIC[1].
> +                unsafe {
> +                    core::slice::from_raw_parts(
> +                        data.add(rx),
> +                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
> +                    )
> +                },
> +                // SAFETY: See SAFETY[1].
> +                // PANIC: See PANIC[1].
> +                unsafe { core::slice::from_raw_parts(data, tx) },
> +            )
>          }
>      }
>  
>
> ---
> base-commit: a19457958c3018783881c4416f272cd594f13049
> change-id: 20260319-cmdq-ub-fix-d57b09a745b9
>
> Best regards,
> -- 
> Alexandre Courbot <acourbot@nvidia.com>


  parent reply	other threads:[~2026-03-20 12:54 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
2026-03-19  6:41 ` Eliot Courtney
2026-03-19  8:24   ` Alexandre Courbot
2026-03-20 12:54 ` Danilo Krummrich [this message]
2026-03-23  2:58   ` Alexandre Courbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DH7MIC1ECKRZ.1YN5T4ATB7A2T@kernel.org \
    --to=dakr@kernel.org \
    --cc=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apopple@nvidia.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ecourtney@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=simona@ffwll.ch \
    --cc=ttabi@nvidia.com \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox