public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
@ 2026-03-19  5:36 Alexandre Courbot
  2026-03-19  6:41 ` Eliot Courtney
  2026-03-20 12:54 ` Danilo Krummrich
  0 siblings, 2 replies; 5+ messages in thread
From: Alexandre Courbot @ 2026-03-19  5:36 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Alistair Popple
  Cc: John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel,
	Alexandre Courbot

`driver_read_area` and `driver_write_area` are internal methods that
return slices containing the area of the command queue buffer that the
driver has exclusive read of write access, respectively.

While their returned value is correct and safe to use, internally they
temporarily create a reference to the whole command-buffer slice,
including GSP-owned regions. These regions can change without notice,
and thus creating a slice to them is undefined behavior.

Fix this by replacing the slice logic with pointer arithmetic and
creating slices to valid regions only. It relies on unsafe code, but
should be mostly replaced by `IoView` and `IoSlice` once they land.

Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
 1 file changed, 100 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index d36a62ba1c60..4200e7986774 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
     /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
     /// that case the second slice will have a non-zero length.
     fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
-        let tx = self.cpu_write_ptr() as usize;
-        let rx = self.gsp_read_ptr() as usize;
+        let tx = num::u32_as_usize(self.cpu_write_ptr());
+        let rx = num::u32_as_usize(self.gsp_read_ptr());
+        // Number of pages between `tx` and the end of the command queue.
+        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
+        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
 
+        // Pointer to the start of the CPU message queue.
+        //
         // SAFETY:
-        // - The `CoherentAllocation` contains exactly one object.
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
-        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
-        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
+        // - `self.0` contains exactly one element.
+        // - `cpuq.msgq.data[0]` is within the bounds of that element.
+        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
 
-        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for writing.
+        // Safety/Panic comments to be referenced by the code below.
+        //
+        // SAFETY[1]:
+        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
+        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
+        //   the GSP.
+        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
+        //
+        // PANIC[1]:
+        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
+        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
 
         if rx == 0 {
-            // Since `rx` is zero, leave an empty slot at end of the buffer.
-            let last = after_tx.len() - 1;
-            (&mut after_tx[..last], &mut [])
+            (
+                // SAFETY: See SAFETY[1].
+                unsafe {
+                    core::slice::from_raw_parts_mut(
+                        data.add(tx),
+                        // Since `rx` is zero, leave an empty slot at end of the buffer.
+                        // PANIC: See PANIC[1].
+                        after_tx_len - 1,
+                    )
+                },
+                &mut [],
+            )
         } else if rx <= tx {
             // The area is discontiguous and we leave an empty slot before `rx`.
-            // PANIC:
-            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
-            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
-            //   `rx <= tx` in this branch.
-            (after_tx, &mut before_tx[..(rx - 1)])
+            (
+                // SAFETY: See SAFETY[1].
+                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
+                // SAFETY: See SAFETY[1].
+                unsafe {
+                    core::slice::from_raw_parts_mut(
+                        data,
+                        // Leave one empty slot before `rx`.
+                        // PANIC:
+                        // - See PANIC[1].
+                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
+                        rx - 1,
+                    )
+                },
+            )
         } else {
             // The area is contiguous and we leave an empty slot before `rx`.
-            // PANIC:
-            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
-            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
-            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
-            (&mut after_tx[..(rx - tx - 1)], &mut [])
+            (
+                // SAFETY: See SAFETY[1].
+                unsafe {
+                    core::slice::from_raw_parts_mut(
+                        data.add(tx),
+                        // PANIC:
+                        // - See PANIC[1].
+                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
+                        rx - tx - 1,
+                    )
+                },
+                &mut [],
+            )
         }
     }
 
@@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
         let tx = self.gsp_write_ptr() as usize;
         let rx = self.cpu_read_ptr() as usize;
 
+        // Pointer to the start of the GSP message queue.
+        //
         // SAFETY:
-        // - The `CoherentAllocation` contains exactly one object.
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
-        let data = &gsp_mem.gspq.msgq.data;
+        // - `self.0` contains exactly one element.
+        // - `gspq.msgq.data[0]` is within the bounds of that element.
+        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
+
+        // Safety/Panic comments to be referenced by the code below.
+        //
+        // SAFETY[1]:
+        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
+        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
+        //   the GSP.
+        //
+        // PANIC[1]:
+        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
+        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
 
-        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for reading.
-        // PANIC:
-        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
-        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
         if rx <= tx {
             // The area is contiguous.
-            (&data[rx..tx], &[])
+            (
+                // SAFETY: See SAFETY[1].
+                // PANIC:
+                // - See PANIC[1].
+                // - Per the branch test, `rx <= tx`.
+                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
+                &[],
+            )
         } else {
             // The area is discontiguous.
-            (&data[rx..], &data[..tx])
+            (
+                // SAFETY: See SAFETY[1].
+                // PANIC: See PANIC[1].
+                unsafe {
+                    core::slice::from_raw_parts(
+                        data.add(rx),
+                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
+                    )
+                },
+                // SAFETY: See SAFETY[1].
+                // PANIC: See PANIC[1].
+                unsafe { core::slice::from_raw_parts(data, tx) },
+            )
         }
     }
 

---
base-commit: a19457958c3018783881c4416f272cd594f13049
change-id: 20260319-cmdq-ub-fix-d57b09a745b9

Best regards,
-- 
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
@ 2026-03-19  6:41 ` Eliot Courtney
  2026-03-19  8:24   ` Alexandre Courbot
  2026-03-20 12:54 ` Danilo Krummrich
  1 sibling, 1 reply; 5+ messages in thread
From: Eliot Courtney @ 2026-03-19  6:41 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Alistair Popple
  Cc: John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel

On Thu Mar 19, 2026 at 2:36 PM JST, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read of write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them is undefined behavior.
>
> Fix this by replacing the slice logic with pointer arithmetic and
> creating slices to valid regions only. It relies on unsafe code, but
> should be mostly replaced by `IoView` and `IoSlice` once they land.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>  1 file changed, 100 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index d36a62ba1c60..4200e7986774 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.cpu_write_ptr() as usize;
> -        let rx = self.gsp_read_ptr() as usize;
> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
> +        // Number of pages between `tx` and the end of the command queue.
> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>  
> +        // Pointer to the start of the CPU message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +        // - `self.0` contains exactly one element.
> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>  
> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for writing.
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
> +        //   the GSP.
> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
>  
>          if rx == 0 {
> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
> -            let last = after_tx.len() - 1;
> -            (&mut after_tx[..last], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
> +                        // PANIC: See PANIC[1].
> +                        after_tx_len - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          } else if rx <= tx {
>              // The area is discontiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
> -            //   `rx <= tx` in this branch.
> -            (after_tx, &mut before_tx[..(rx - 1)])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data,
> +                        // Leave one empty slot before `rx`.
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
> +                        rx - 1,
> +                    )
> +                },
> +            )
>          } else {
>              // The area is contiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> +                        rx - tx - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          }
>      }
>  
> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;

Should we use u32_as_usize here too?

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  6:41 ` Eliot Courtney
@ 2026-03-19  8:24   ` Alexandre Courbot
  0 siblings, 0 replies; 5+ messages in thread
From: Alexandre Courbot @ 2026-03-19  8:24 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Alistair Popple, John Hubbard, Joel Fernandes, Timur Tabi,
	Zhi Wang, rust-for-linux, dri-devel, linux-kernel

On Thu Mar 19, 2026 at 3:41 PM JST, Eliot Courtney wrote:
> On Thu Mar 19, 2026 at 2:36 PM JST, Alexandre Courbot wrote:
>> `driver_read_area` and `driver_write_area` are internal methods that
>> return slices containing the area of the command queue buffer that the
>> driver has exclusive read of write access, respectively.
>>
>> While their returned value is correct and safe to use, internally they
>> temporarily create a reference to the whole command-buffer slice,
>> including GSP-owned regions. These regions can change without notice,
>> and thus creating a slice to them is undefined behavior.
>>
>> Fix this by replacing the slice logic with pointer arithmetic and
>> creating slices to valid regions only. It relies on unsafe code, but
>> should be mostly replaced by `IoView` and `IoSlice` once they land.
>>
>> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
>> Suggested-by: Danilo Krummrich <dakr@kernel.org>
>> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>>  1 file changed, 100 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index d36a62ba1c60..4200e7986774 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>>      /// that case the second slice will have a non-zero length.
>>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
>> -        let tx = self.cpu_write_ptr() as usize;
>> -        let rx = self.gsp_read_ptr() as usize;
>> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
>> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
>> +        // Number of pages between `tx` and the end of the command queue.
>> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>>  
>> +        // Pointer to the start of the CPU message queue.
>> +        //
>>          // SAFETY:
>> -        // - The `CoherentAllocation` contains exactly one object.
>> -        // - We will only access the driver-owned part of the shared memory.
>> -        // - Per the safety statement of the function, no concurrent access will be performed.
>> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
>> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
>> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
>> +        // - `self.0` contains exactly one element.
>> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
>> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>>  
>> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
>> -        // belongs to the driver for writing.
>> +        // Safety/Panic comments to be referenced by the code below.
>> +        //
>> +        // SAFETY[1]:
>> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
>> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
>> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
>> +        //   the GSP.
>> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
>> +        //
>> +        // PANIC[1]:
>> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
>>  
>>          if rx == 0 {
>> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
>> -            let last = after_tx.len() - 1;
>> -            (&mut after_tx[..last], &mut [])
>> +            (
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe {
>> +                    core::slice::from_raw_parts_mut(
>> +                        data.add(tx),
>> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
>> +                        // PANIC: See PANIC[1].
>> +                        after_tx_len - 1,
>> +                    )
>> +                },
>> +                &mut [],
>> +            )
>>          } else if rx <= tx {
>>              // The area is discontiguous and we leave an empty slot before `rx`.
>> -            // PANIC:
>> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
>> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
>> -            //   `rx <= tx` in this branch.
>> -            (after_tx, &mut before_tx[..(rx - 1)])
>> +            (
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe {
>> +                    core::slice::from_raw_parts_mut(
>> +                        data,
>> +                        // Leave one empty slot before `rx`.
>> +                        // PANIC:
>> +                        // - See PANIC[1].
>> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
>> +                        rx - 1,
>> +                    )
>> +                },
>> +            )
>>          } else {
>>              // The area is contiguous and we leave an empty slot before `rx`.
>> -            // PANIC:
>> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
>> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
>> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
>> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
>> +            (
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe {
>> +                    core::slice::from_raw_parts_mut(
>> +                        data.add(tx),
>> +                        // PANIC:
>> +                        // - See PANIC[1].
>> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
>> +                        rx - tx - 1,
>> +                    )
>> +                },
>> +                &mut [],
>> +            )
>>          }
>>      }
>>  
>> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>>          let tx = self.gsp_write_ptr() as usize;
>>          let rx = self.cpu_read_ptr() as usize;
>
> Should we use u32_as_usize here too?

Absolutely - I forgot to update that part (although arguably that could
be a separate patch).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
  2026-03-19  6:41 ` Eliot Courtney
@ 2026-03-20 12:54 ` Danilo Krummrich
  2026-03-23  2:58   ` Alexandre Courbot
  1 sibling, 1 reply; 5+ messages in thread
From: Danilo Krummrich @ 2026-03-20 12:54 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Alice Ryhl, David Airlie, Simona Vetter, Alistair Popple,
	John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel

On Thu Mar 19, 2026 at 6:36 AM CET, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read of write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them is undefined behavior.
>
> Fix this by replacing the slice logic with pointer arithmetic and
> creating slices to valid regions only. It relies on unsafe code, but
> should be mostly replaced by `IoView` and `IoSlice` once they land.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Suggested-by: Danilo Krummrich <dakr@kernel.org>

Should be Reported-by:.

> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/

Should be Closes:.

> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>  1 file changed, 100 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index d36a62ba1c60..4200e7986774 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.cpu_write_ptr() as usize;
> -        let rx = self.gsp_read_ptr() as usize;
> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
> +        // Number of pages between `tx` and the end of the command queue.
> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>  
> +        // Pointer to the start of the CPU message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +        // - `self.0` contains exactly one element.
> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>  
> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for writing.
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
> +        //   the GSP.
> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.

I didn't do the math, but can't we just calculate the offset values in the below
if-else-if-else blocks and call from_raw_parts_mut() once with the safety
comment above? I think that'd be much cleaner. Similar for
driver_write_area_size().

>          if rx == 0 {
> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
> -            let last = after_tx.len() - 1;
> -            (&mut after_tx[..last], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
> +                        // PANIC: See PANIC[1].
> +                        after_tx_len - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          } else if rx <= tx {
>              // The area is discontiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
> -            //   `rx <= tx` in this branch.
> -            (after_tx, &mut before_tx[..(rx - 1)])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data,
> +                        // Leave one empty slot before `rx`.
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
> +                        rx - 1,
> +                    )
> +                },
> +            )
>          } else {
>              // The area is contiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> +                        rx - tx - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          }
>      }
>  
> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;
>  
> +        // Pointer to the start of the GSP message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> -        let data = &gsp_mem.gspq.msgq.data;
> +        // - `self.0` contains exactly one element.
> +        // - `gspq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
> +
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
> +        //   the GSP.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>  
> -        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for reading.
> -        // PANIC:
> -        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
> -        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
>          if rx <= tx {
>              // The area is contiguous.
> -            (&data[rx..tx], &[])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                // PANIC:
> +                // - See PANIC[1].
> +                // - Per the branch test, `rx <= tx`.
> +                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
> +                &[],
> +            )
>          } else {
>              // The area is discontiguous.
> -            (&data[rx..], &data[..tx])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                // PANIC: See PANIC[1].
> +                unsafe {
> +                    core::slice::from_raw_parts(
> +                        data.add(rx),
> +                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
> +                    )
> +                },
> +                // SAFETY: See SAFETY[1].
> +                // PANIC: See PANIC[1].
> +                unsafe { core::slice::from_raw_parts(data, tx) },
> +            )
>          }
>      }
>  
>
> ---
> base-commit: a19457958c3018783881c4416f272cd594f13049
> change-id: 20260319-cmdq-ub-fix-d57b09a745b9
>
> Best regards,
> -- 
> Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-20 12:54 ` Danilo Krummrich
@ 2026-03-23  2:58   ` Alexandre Courbot
  0 siblings, 0 replies; 5+ messages in thread
From: Alexandre Courbot @ 2026-03-23  2:58 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alice Ryhl, David Airlie, Simona Vetter, Alistair Popple,
	John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel

On Fri Mar 20, 2026 at 9:54 PM JST, Danilo Krummrich wrote:
> On Thu Mar 19, 2026 at 6:36 AM CET, Alexandre Courbot wrote:
>> `driver_read_area` and `driver_write_area` are internal methods that
>> return slices containing the area of the command queue buffer that the
>> driver has exclusive read of write access, respectively.
>>
>> While their returned value is correct and safe to use, internally they
>> temporarily create a reference to the whole command-buffer slice,
>> including GSP-owned regions. These regions can change without notice,
>> and thus creating a slice to them is undefined behavior.
>>
>> Fix this by replacing the slice logic with pointer arithmetic and
>> creating slices to valid regions only. It relies on unsafe code, but
>> should be mostly replaced by `IoView` and `IoSlice` once they land.
>>
>> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
>> Suggested-by: Danilo Krummrich <dakr@kernel.org>
>
> Should be Reported-by:.
>
>> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
>
> Should be Closes:.
>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>>  1 file changed, 100 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index d36a62ba1c60..4200e7986774 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>>      /// that case the second slice will have a non-zero length.
>>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
>> -        let tx = self.cpu_write_ptr() as usize;
>> -        let rx = self.gsp_read_ptr() as usize;
>> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
>> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
>> +        // Number of pages between `tx` and the end of the command queue.
>> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>>  
>> +        // Pointer to the start of the CPU message queue.
>> +        //
>>          // SAFETY:
>> -        // - The `CoherentAllocation` contains exactly one object.
>> -        // - We will only access the driver-owned part of the shared memory.
>> -        // - Per the safety statement of the function, no concurrent access will be performed.
>> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
>> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
>> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
>> +        // - `self.0` contains exactly one element.
>> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
>> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>>  
>> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
>> -        // belongs to the driver for writing.
>> +        // Safety/Panic comments to be referenced by the code below.
>> +        //
>> +        // SAFETY[1]:
>> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
>> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
>> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
>> +        //   the GSP.
>> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
>> +        //
>> +        // PANIC[1]:
>> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
>
> I didn't do the math, but can't we just calculate the offset values in the below
> if-else-if-else blocks and call from_raw_parts_mut() once with the safety
> comment above? I think that'd be much cleaner. Similar for
> driver_write_area_size().

Much cleaner and much simpler - great suggestion, thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-23  2:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
2026-03-19  6:41 ` Eliot Courtney
2026-03-19  8:24   ` Alexandre Courbot
2026-03-20 12:54 ` Danilo Krummrich
2026-03-23  2:58   ` Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox