* [PATCH 1/4] gpu: nova-core: gsp: fix incorrect advancing of write pointer
2026-01-22 2:59 [PATCH 0/4] gpu: nova-core: gsp: fix command queue ring buffer bugs Eliot Courtney
@ 2026-01-22 2:59 ` Eliot Courtney
2026-01-23 18:28 ` Gary Guo
2026-01-22 2:59 ` [PATCH 2/4] gpu: nova-core: gsp: clarify comments about invariants and pointer roles Eliot Courtney
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Eliot Courtney @ 2026-01-22 2:59 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel, Eliot Courtney
We should modulo not bitwise-and here. The current code could, for
example, set wptr to MSGQ_NUM_PAGES which is not valid.
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
drivers/gpu/nova-core/gsp/cmdq.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 46819a82a51a..f139aad7af3f 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -384,7 +384,7 @@ fn cpu_write_ptr(&self) -> u32 {
// Informs the GSP that it can process `elem_count` new pages from the command queue.
fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
- let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES;
+ let wptr = self.cpu_write_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
let gsp_mem = self.0.start_ptr_mut();
// SAFETY:
--
2.52.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH 1/4] gpu: nova-core: gsp: fix incorrect advancing of write pointer
2026-01-22 2:59 ` [PATCH 1/4] gpu: nova-core: gsp: fix incorrect advancing of write pointer Eliot Courtney
@ 2026-01-23 18:28 ` Gary Guo
2026-01-26 4:02 ` Eliot Courtney
0 siblings, 1 reply; 13+ messages in thread
From: Gary Guo @ 2026-01-23 18:28 UTC (permalink / raw)
To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
David Airlie, Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel
On Thu Jan 22, 2026 at 2:59 AM GMT, Eliot Courtney wrote:
> We should modulo not bitwise-and here. The current code could, for
> example, set wptr to MSGQ_NUM_PAGES which is not valid.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
> drivers/gpu/nova-core/gsp/cmdq.rs | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 46819a82a51a..f139aad7af3f 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -384,7 +384,7 @@ fn cpu_write_ptr(&self) -> u32 {
>
> // Informs the GSP that it can process `elem_count` new pages from the command queue.
> fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
> - let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES;
> + let wptr = self.cpu_write_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
So the actual number of pages is indeed 0x3F, not 0x40? That's interesting...
Best,
Gary
> let gsp_mem = self.0.start_ptr_mut();
>
> // SAFETY:
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 1/4] gpu: nova-core: gsp: fix incorrect advancing of write pointer
2026-01-23 18:28 ` Gary Guo
@ 2026-01-26 4:02 ` Eliot Courtney
0 siblings, 0 replies; 13+ messages in thread
From: Eliot Courtney @ 2026-01-26 4:02 UTC (permalink / raw)
To: Gary Guo, Eliot Courtney, Danilo Krummrich, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel
On Sat Jan 24, 2026 at 3:28 AM JST, Gary Guo wrote:
>> - let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES;
>> + let wptr = self.cpu_write_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
>
> So the actual number of pages is indeed 0x3F, not 0x40? That's interesting...
Yes, that's right. I thought it was weird as well, but the reason is
that the first page stores the rx/tx pointers (see `Msgq`) and the
remaining 63 are used for the actual ring buffer.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/4] gpu: nova-core: gsp: clarify comments about invariants and pointer roles
2026-01-22 2:59 [PATCH 0/4] gpu: nova-core: gsp: fix command queue ring buffer bugs Eliot Courtney
2026-01-22 2:59 ` [PATCH 1/4] gpu: nova-core: gsp: fix incorrect advancing of write pointer Eliot Courtney
@ 2026-01-22 2:59 ` Eliot Courtney
2026-01-22 2:59 ` [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq Eliot Courtney
2026-01-22 2:59 ` [PATCH 4/4] gpu: nova-core: gsp: fix improper indexing in driver_read_area Eliot Courtney
3 siblings, 0 replies; 13+ messages in thread
From: Eliot Courtney @ 2026-01-22 2:59 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel, Eliot Courtney
Disambiguate a few things in comments in cmdq.rs.
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
drivers/gpu/nova-core/gsp/cmdq.rs | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index f139aad7af3f..09c28eeb6f12 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -161,12 +161,14 @@ struct GspMem {
/// Self-mapping page table entries.
ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
/// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
- /// write and read pointers that the CPU updates.
+ /// write and read pointers that the CPU updates. This means that the read pointer here is an
+ /// index into the GSP queue.
///
/// This member is read-only for the GSP.
cpuq: Msgq,
/// GSP queue: the GSP writes messages here, and the driver reads them. It also contains the
- /// write and read pointers that the GSP updates.
+ /// write and read pointers that the GSP updates. This means that the read pointer here is an
+ /// index into the CPU queue.
///
/// This member is read-only for the driver.
gspq: Msgq,
@@ -222,7 +224,7 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
// - We will only access the driver-owned part of the shared memory.
// - Per the safety statement of the function, no concurrent access will be performed.
let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
- // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `<= MSGQ_NUM_PAGES`.
+ // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
if rx <= tx {
@@ -257,7 +259,7 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
// - We will only access the driver-owned part of the shared memory.
// - Per the safety statement of the function, no concurrent access will be performed.
let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
- // PANIC: per the invariant of `cpu_read_ptr`, `xx` is `<= MSGQ_NUM_PAGES`.
+ // PANIC: per the invariant of `cpu_read_ptr`, `rx` is `< MSGQ_NUM_PAGES`.
let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);
match tx.cmp(&rx) {
@@ -315,7 +317,7 @@ fn allocate_command(&mut self, size: usize) -> Result<GspCommand<'_>> {
//
// # Invariants
//
- // - The returned value is between `0` and `MSGQ_NUM_PAGES`.
+ // - The returned value is between `0` and `MSGQ_NUM_PAGES - 1`, inclusive.
fn gsp_write_ptr(&self) -> u32 {
let gsp_mem = self.0.start_ptr();
@@ -329,7 +331,7 @@ fn gsp_write_ptr(&self) -> u32 {
//
// # Invariants
//
- // - The returned value is between `0` and `MSGQ_NUM_PAGES`.
+ // - The returned value is between `0` and `MSGQ_NUM_PAGES - 1`, inclusive.
fn gsp_read_ptr(&self) -> u32 {
let gsp_mem = self.0.start_ptr();
@@ -343,7 +345,7 @@ fn gsp_read_ptr(&self) -> u32 {
//
// # Invariants
//
- // - The returned value is between `0` and `MSGQ_NUM_PAGES`.
+ // - The returned value is between `0` and `MSGQ_NUM_PAGES - 1`, inclusive.
fn cpu_read_ptr(&self) -> u32 {
let gsp_mem = self.0.start_ptr();
@@ -372,7 +374,7 @@ fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
//
// # Invariants
//
- // - The returned value is between `0` and `MSGQ_NUM_PAGES`.
+ // - The returned value is between `0` and `MSGQ_NUM_PAGES - 1`, inclusive.
fn cpu_write_ptr(&self) -> u32 {
let gsp_mem = self.0.start_ptr();
--
2.52.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-22 2:59 [PATCH 0/4] gpu: nova-core: gsp: fix command queue ring buffer bugs Eliot Courtney
2026-01-22 2:59 ` [PATCH 1/4] gpu: nova-core: gsp: fix incorrect advancing of write pointer Eliot Courtney
2026-01-22 2:59 ` [PATCH 2/4] gpu: nova-core: gsp: clarify comments about invariants and pointer roles Eliot Courtney
@ 2026-01-22 2:59 ` Eliot Courtney
2026-01-22 3:26 ` John Hubbard
2026-01-23 18:31 ` Gary Guo
2026-01-22 2:59 ` [PATCH 4/4] gpu: nova-core: gsp: fix improper indexing in driver_read_area Eliot Courtney
3 siblings, 2 replies; 13+ messages in thread
From: Eliot Courtney @ 2026-01-22 2:59 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel, Eliot Courtney
The current code hands out buffers that go all the way up to and
including `rx - 1`, but we need to maintain an empty slot to prevent the
ring buffer from wrapping around into having 'tx == rx', which means
empty.
Also add more rigorous no-panic proofs.
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
drivers/gpu/nova-core/gsp/cmdq.rs | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 09c28eeb6f12..b6d6093e3ac0 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -227,21 +227,24 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
// PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
- if rx <= tx {
- // The area from `tx` up to the end of the ring, and from the beginning of the ring up
- // to `rx`, minus one unit, belongs to the driver.
- if rx == 0 {
- let last = after_tx.len() - 1;
- (&mut after_tx[..last], &mut before_tx[0..0])
- } else {
- (after_tx, &mut before_tx[..rx])
- }
+ // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
+ // belongs to the driver for writing.
+ if rx == 0 {
+ // Since `rx` is zero, leave an empty slot at end of the buffer.
+ let last = after_tx.len() - 1;
+ (&mut after_tx[..last], &mut before_tx[0..0])
+ } else if rx > tx {
+ // The area is contiguous and we leave an empty slot before `rx`.
+ // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
+ // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
+ // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
+ // -> `rx - tx - 1 <= after_tx.len()`
+ (&mut after_tx[..(rx - tx - 1)], &mut before_tx[0..0])
} else {
- // The area from `tx` to `rx`, minus one unit, belongs to the driver.
- //
- // PANIC: per the invariants of `cpu_write_ptr` and `gsp_read_ptr`, `rx` and `tx` are
- // `<= MSGQ_NUM_PAGES`, and the test above ensured that `rx > tx`.
- (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
+ // The area is discontiguous and we leave an empty slot before `rx`.
+ // PANIC: since `rx != 0 && rx is unsigned` we have `rx - 1 >= 0`
+ // PANIC: since `rx <= tx && before_tx.len() == tx` we have `rx - 1 <= before_tx.len()`
+ (after_tx, &mut before_tx[..(rx - 1)])
}
}
--
2.52.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-22 2:59 ` [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq Eliot Courtney
@ 2026-01-22 3:26 ` John Hubbard
2026-01-22 5:07 ` Eliot Courtney
2026-01-23 18:31 ` Gary Guo
1 sibling, 1 reply; 13+ messages in thread
From: John Hubbard @ 2026-01-22 3:26 UTC (permalink / raw)
To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
David Airlie, Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel
On 1/21/26 6:59 PM, Eliot Courtney wrote:
> + // The area is contiguous and we leave an empty slot before `rx`.
> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
> + // -> `rx - tx - 1 <= after_tx.len()`
Hi Eliot,
Documentation nit: the proofs are great, but the above just does
not go into my head easily, because it's a proof, rather than a
sentence.
Can you please reword these PANIC comments so that they are complete
sentences, along the lines of:
// PANIC: a > b and therefore c cannot overflow, therefore this
// cannot panic.
And please also use words "and", "therefore", "because", instead of
symbols such as "&&".
Same for the other patch with PANIC comments.
I did a quick search in Rust for Linux, before writing this, in order to
ensure that what I'm recommending is How It Is Done. (Not to claim that
Nova in particular is fully correct, though.)
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-22 3:26 ` John Hubbard
@ 2026-01-22 5:07 ` Eliot Courtney
2026-01-22 9:27 ` Alexandre Courbot
0 siblings, 1 reply; 13+ messages in thread
From: Eliot Courtney @ 2026-01-22 5:07 UTC (permalink / raw)
To: John Hubbard, Eliot Courtney, Danilo Krummrich, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel, dri-devel
On Thu Jan 22, 2026 at 12:26 PM JST, John Hubbard wrote:
> On 1/21/26 6:59 PM, Eliot Courtney wrote:
>> + // The area is contiguous and we leave an empty slot before `rx`.
>> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
>> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
>> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
>> + // -> `rx - tx - 1 <= after_tx.len()`
>
> Hi Eliot,
>
> Documentation nit: the proofs are great, but the above just does
> not go into my head easily, because it's a proof, rather than a
> sentence.
I had a look now and I agree that it looks like plain English is the
defacto standard for the PANIC comments, so I will update them.
But, I wonder what people think about this. IMO it makes sense to have
SAFETY and PANIC comments as rigorous proofs (where practical and
possible) to match the level of work the compiler does for us in the
infalliable areas of the code - if an issue occurs, unsafe or panicking
code is often the root cause IMO. Writing these in plain English is
easier to read but also harder to verify that the proof is correct and
harder to verify if there are any implicit assumptions.
I see there are some guidelines about SAFETY: comments but not about
PANIC: comments in Documentation/rust/coding-guidelines.rst.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-22 5:07 ` Eliot Courtney
@ 2026-01-22 9:27 ` Alexandre Courbot
2026-01-23 14:19 ` Miguel Ojeda
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre Courbot @ 2026-01-22 9:27 UTC (permalink / raw)
To: Eliot Courtney, Miguel Ojeda, Alice Ryhl
Cc: John Hubbard, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Alistair Popple, nouveau, rust-for-linux,
dri-devel, linux-kernel, dri-devel
On Thu Jan 22, 2026 at 2:07 PM JST, Eliot Courtney wrote:
> On Thu Jan 22, 2026 at 12:26 PM JST, John Hubbard wrote:
>> On 1/21/26 6:59 PM, Eliot Courtney wrote:
>>> + // The area is contiguous and we leave an empty slot before `rx`.
>>> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
>>> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
>>> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
>>> + // -> `rx - tx - 1 <= after_tx.len()`
>>
>> Hi Eliot,
>>
>> Documentation nit: the proofs are great, but the above just does
>> not go into my head easily, because it's a proof, rather than a
>> sentence.
> I had a look now and I agree that it looks like plain English is the
> defacto standard for the PANIC comments, so I will update them.
>
> But, I wonder what people think about this. IMO it makes sense to have
> SAFETY and PANIC comments as rigorous proofs (where practical and
> possible) to match the level of work the compiler does for us in the
> infalliable areas of the code - if an issue occurs, unsafe or panicking
> code is often the root cause IMO. Writing these in plain English is
> easier to read but also harder to verify that the proof is correct and
> harder to verify if there are any implicit assumptions.
>
> I see there are some guidelines about SAFETY: comments but not about
> PANIC: comments in Documentation/rust/coding-guidelines.rst.
Would be interesting to have Miguel/Alice and the core team's opinion
on this IMHO.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-22 9:27 ` Alexandre Courbot
@ 2026-01-23 14:19 ` Miguel Ojeda
0 siblings, 0 replies; 13+ messages in thread
From: Miguel Ojeda @ 2026-01-23 14:19 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Eliot Courtney, Miguel Ojeda, Alice Ryhl, John Hubbard,
Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple,
nouveau, rust-for-linux, dri-devel, linux-kernel, dri-devel
On Thu, Jan 22, 2026 at 10:29 AM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> Would be interesting to have Miguel/Alice and the core team's opinion
> on this IMHO.
[ Re-replying to this so that it gets saved in the list. ]
This is something that we have discussed back and forth, including
making it way more formal, even using special notation, etc.
Please take a look at Benno’s proposal at:
https://lore.kernel.org/all/20240717221133.459589-1-benno.lossin@proton.me/
(He may have a newer link)
Cheers,
Miguel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-22 2:59 ` [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq Eliot Courtney
2026-01-22 3:26 ` John Hubbard
@ 2026-01-23 18:31 ` Gary Guo
2026-01-26 4:17 ` Eliot Courtney
1 sibling, 1 reply; 13+ messages in thread
From: Gary Guo @ 2026-01-23 18:31 UTC (permalink / raw)
To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
David Airlie, Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel
On Thu Jan 22, 2026 at 2:59 AM GMT, Eliot Courtney wrote:
> The current code hands out buffers that go all the way up to and
> including `rx - 1`, but we need to maintain an empty slot to prevent the
> ring buffer from wrapping around into having 'tx == rx', which means
> empty.
Doesn't this mean that you're effectively wasting a slot? You can easily
implementing a ring buffer that allows you to disambiguate empty/full while
still using all slots.
A common approach is to only do modulo/masking operation before accessing the
slot. Then `write_ptr.wrapping_sub(read_ptr)` will give you the accurate length of
things inside the ring buffer.
Best,
Gary
>
> Also add more rigorous no-panic proofs.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
> drivers/gpu/nova-core/gsp/cmdq.rs | 31 +++++++++++++++++--------------
> 1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 09c28eeb6f12..b6d6093e3ac0 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -227,21 +227,24 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
> // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
>
> - if rx <= tx {
> - // The area from `tx` up to the end of the ring, and from the beginning of the ring up
> - // to `rx`, minus one unit, belongs to the driver.
> - if rx == 0 {
> - let last = after_tx.len() - 1;
> - (&mut after_tx[..last], &mut before_tx[0..0])
> - } else {
> - (after_tx, &mut before_tx[..rx])
> - }
> + // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> + // belongs to the driver for writing.
> + if rx == 0 {
> + // Since `rx` is zero, leave an empty slot at end of the buffer.
> + let last = after_tx.len() - 1;
> + (&mut after_tx[..last], &mut before_tx[0..0])
> + } else if rx > tx {
> + // The area is contiguous and we leave an empty slot before `rx`.
> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
> + // -> `rx - tx - 1 <= after_tx.len()`
> + (&mut after_tx[..(rx - tx - 1)], &mut before_tx[0..0])
> } else {
> - // The area from `tx` to `rx`, minus one unit, belongs to the driver.
> - //
> - // PANIC: per the invariants of `cpu_write_ptr` and `gsp_read_ptr`, `rx` and `tx` are
> - // `<= MSGQ_NUM_PAGES`, and the test above ensured that `rx > tx`.
> - (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
> + // The area is discontiguous and we leave an empty slot before `rx`.
> + // PANIC: since `rx != 0 && rx is unsigned` we have `rx - 1 >= 0`
> + // PANIC: since `rx <= tx && before_tx.len() == tx` we have `rx - 1 <= before_tx.len()`
> + (after_tx, &mut before_tx[..(rx - 1)])
> }
> }
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq
2026-01-23 18:31 ` Gary Guo
@ 2026-01-26 4:17 ` Eliot Courtney
0 siblings, 0 replies; 13+ messages in thread
From: Eliot Courtney @ 2026-01-26 4:17 UTC (permalink / raw)
To: Gary Guo, Eliot Courtney, Danilo Krummrich, Alexandre Courbot,
Alice Ryhl, David Airlie, Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel, dri-devel
On Sat Jan 24, 2026 at 3:31 AM JST, Gary Guo wrote:
> On Thu Jan 22, 2026 at 2:59 AM GMT, Eliot Courtney wrote:
>> The current code hands out buffers that go all the way up to and
>> including `rx - 1`, but we need to maintain an empty slot to prevent the
>> ring buffer from wrapping around into having 'tx == rx', which means
>> empty.
>
> Doesn't this mean that you're effectively wasting a slot? You can easily
> implementing a ring buffer that allows you to disambiguate empty/full while
> still using all slots.
>
> A common approach is to only do modulo/masking operation before accessing the
> slot. Then `write_ptr.wrapping_sub(read_ptr)` will give you the accurate length of
> things inside the ring buffer.
Yes, we are wasting a slot (the current code also does this). But,
this is the protocol between the CPU and GSP, so the GSP actually
expects this and we can't change the semantics easily, IIUC.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] gpu: nova-core: gsp: fix improper indexing in driver_read_area
2026-01-22 2:59 [PATCH 0/4] gpu: nova-core: gsp: fix command queue ring buffer bugs Eliot Courtney
` (2 preceding siblings ...)
2026-01-22 2:59 ` [PATCH 3/4] gpu: nova-core: gsp: fix improper handling of empty slot in cmdq Eliot Courtney
@ 2026-01-22 2:59 ` Eliot Courtney
3 siblings, 0 replies; 13+ messages in thread
From: Eliot Courtney @ 2026-01-22 2:59 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
Simona Vetter, Alistair Popple
Cc: nouveau, rust-for-linux, dri-devel, linux-kernel, Eliot Courtney
The current code indexes into `after_rx` using `tx` which is an index
for the whole buffer, not the split buffer `after_rx`.
Also add more rigorous no-panic proofs.
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
drivers/gpu/nova-core/gsp/cmdq.rs | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index b6d6093e3ac0..5ca1d757d4a3 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
use core::{
- cmp,
mem,
sync::atomic::{
fence,
@@ -265,10 +264,18 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
// PANIC: per the invariant of `cpu_read_ptr`, `rx` is `< MSGQ_NUM_PAGES`.
let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);
- match tx.cmp(&rx) {
- cmp::Ordering::Equal => (&after_rx[0..0], &after_rx[0..0]),
- cmp::Ordering::Greater => (&after_rx[..tx], &before_rx[0..0]),
- cmp::Ordering::Less => (after_rx, &before_rx[..tx]),
+ // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
+ // belongs to the driver for reading.
+ if rx <= tx {
+ // The area is contiguous.
+ // PANIC: since `rx <= tx` we have `tx - rx >= 0`
+ // PANIC: since `tx < MSGQ_NUM_PAGES && after_rx.len() == MSGQ_NUM_PAGES - rx`:
+ // `tx - rx <= MSGQ_NUM_PAGES - rx` -> `tx - rx <= after_rx.len()`
+ (&after_rx[..(tx - rx)], &after_rx[0..0])
+ } else {
+ // The area is discontiguous.
+ // PANIC: since `tx < rx && before_rx.len() == rx` we have `tx <= before_rx.len()`
+ (after_rx, &before_rx[..tx])
}
}
--
2.52.0
^ permalink raw reply related [flat|nested] 13+ messages in thread