* [PATCH 0/2] gpu: nova-core: use GSP RPC sequence numbers properly
@ 2026-02-11 0:04 John Hubbard
2026-02-11 0:04 ` [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM John Hubbard
2026-02-11 0:04 ` [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification John Hubbard
0 siblings, 2 replies; 6+ messages in thread
From: John Hubbard @ 2026-02-11 0:04 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot
Cc: Joel Fernandes, Timur Tabi, Alistair Popple, Eliot Courtney,
Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, nouveau, rust-for-linux, LKML, John Hubbard,
Maneet Singh
There are two sequence number fields per GSP RPC message: an outer
transport-level seqNum in the GSP_MSG_QUEUE_ELEMENT header, and an
inner rpc.sequence in the rpc_message_header_v. Nova was leaving the
inner field at zero for all sends and had no way to tell async events
apart from command responses on the receive side.
This led to really confusing log messages and code as well.
Patch 1 fixes the wire-level sequence numbers to match Open RM.
Patch 2 updates the debug logging to show the message classification.
This is "GSP future friendly", in that it is written with GSP seq number
handling plans in mind. I'm including Maneet Singh to help poke holes in
that claim.
Example output:
GSP RPC: async send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
GSP RPC: async send: seq# 1, function=SET_REGISTRY, length=0xc5
GSP RPC: async received: seq# 0, function=LOCKDOWN_NOTICE, length=0x51
GSP RPC: async received: seq# 17, function=INIT_DONE, length=0x50
GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
GSP RPC: response received: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
With the inner rpc.sequence now set correctly, GSP echoes back the
CPU's sequence number in command responses, so the response seq#
matches the send seq#.
This is based on the Feb 5, 2026 linux-next: commit 9845cf73f7db ("Add
linux-next specific files for 20260205").
Also, this should be applied on top of Eliot Courtney's command queue
ring buffer fix series [1]. Without it, I was seeing slice bounds panics
in driver_read_area().
And finally, because I expect this to go in likely after Blackwell (or
not?), I've based it on that. So putting all that together, there is
a branch with [1] and Blackwell and this series, here for convenience:
https://github.com/johnhubbard/linux/tree/nova-core-gsp-rpc-seq-numbers-v0
[1] https://lore.kernel.org/r/20260129-nova-core-cmdq1-v3-0-2ede85493a27@nvidia.com
Cc: Maneet Singh <mmaneetsingh@nvidia.com>
John Hubbard (2):
gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM
gpu: nova-core: improve GSP RPC debug logging with message
classification
drivers/gpu/nova-core/gsp/boot.rs | 6 +-
drivers/gpu/nova-core/gsp/cmdq.rs | 84 +++++++++++++++++++++------
drivers/gpu/nova-core/gsp/commands.rs | 2 +
drivers/gpu/nova-core/gsp/fw.rs | 35 +++++++++--
4 files changed, 104 insertions(+), 23 deletions(-)
base-commit: 6738f143db92177013a500fbea5edd771c173bf9
--
2.53.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM
2026-02-11 0:04 [PATCH 0/2] gpu: nova-core: use GSP RPC sequence numbers properly John Hubbard
@ 2026-02-11 0:04 ` John Hubbard
2026-02-17 8:15 ` Alexandre Courbot
2026-02-11 0:04 ` [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification John Hubbard
1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2026-02-11 0:04 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot
Cc: Joel Fernandes, Timur Tabi, Alistair Popple, Eliot Courtney,
Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, nouveau, rust-for-linux, LKML, John Hubbard,
Maneet Singh
Nova was setting the outer GSP_MSG_QUEUE_ELEMENT.seqNum to an
incrementing counter for all sends, but leaving the inner
rpc_message_header_v.sequence at zero for all sends. Open RM sets the
inner sequence to the counter for sync (command/response) calls and to
zero for async (fire-and-forget) calls.
Split GspMsgElement::init() into separate transport_seq (outer) and
rpc_seq (inner) parameters. The outer seqNum always increments as a
unique transport-level ID for every message. The inner rpc.sequence is
set to 0 for async commands and to the transport counter for sync
commands, matching Open RM behavior.
Add an IS_ASYNC const to the CommandToGsp trait and set it to true for
SetSystemInfo and SetRegistry, the two fire-and-forget RPCs.
Add MsgFunction::is_event() to classify received messages as
GSP-initiated async events vs command responses. This will be used by
the next commit to improve debug logging.
GSP does not yet include SetSystemInfo and SetRegistry in its sequence
counting, but a future GSP firmware update will fold them in. A comment
is added to note this.
Cc: Maneet Singh <mmaneetsingh@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
drivers/gpu/nova-core/gsp/boot.rs | 6 ++++-
drivers/gpu/nova-core/gsp/cmdq.rs | 17 +++++++++++--
drivers/gpu/nova-core/gsp/commands.rs | 2 ++
drivers/gpu/nova-core/gsp/fw.rs | 36 +++++++++++++++++++++++----
4 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 02eec2961b5f..f769e234dae6 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -403,7 +403,11 @@ pub(crate) fn boot(
dev_dbg!(dev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar));
- // Now that GSP is active, send system info and registry
+ // Now that GSP is active, send system info and registry.
+ //
+ // These are async (fire-and-forget) RPCs: no response comes back from GSP.
+ // GSP does not include them in its sequence number counting today, but a
+ // future GSP firmware update will fold them into the normal sequence space.
self.cmdq
.send_command(bar, commands::SetSystemInfo::new(pdev, chipset))?;
self.cmdq.send_command(bar, commands::SetRegistry::new())?;
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 16895f5281b7..7d6d7d81287c 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -58,6 +58,13 @@ pub(crate) trait CommandToGsp {
/// Function identifying this command to the GSP.
const FUNCTION: MsgFunction;
+ /// Whether this command is async (fire-and-forget), meaning no response is expected from GSP.
+ ///
+ /// Async commands get inner `rpc.sequence` set to 0. Sync commands get inner `rpc.sequence`
+ /// set to the transport counter, matching Open RM. The outer `seqNum` always increments
+ /// regardless.
+ const IS_ASYNC: bool = false;
+
/// Type generated by [`CommandToGsp::init`], to be written into the command queue buffer.
type Command: FromBytes + AsBytes;
@@ -439,7 +446,8 @@ struct GspMessage<'a> {
pub(crate) struct Cmdq {
/// Device this command queue belongs to.
dev: ARef<device::Device>,
- /// Current command sequence number.
+ /// Transport-level sequence number, incremented for every send. Used for the outer
+ /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.
seq: u32,
/// Memory area shared with the GSP for communicating commands and messages.
gsp_mem: DmaGspMem,
@@ -514,8 +522,13 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
// Extract area for the command itself.
let (cmd, payload_1) = M::Command::from_bytes_mut_prefix(dst.contents.0).ok_or(EIO)?;
+ // The outer seqNum always increments (transport-level, unique per message).
+ // The inner rpc.sequence is 0 for async (fire-and-forget) commands, or the
+ // sync counter for command/response pairs, matching Open RM behavior.
+ let rpc_seq = if M::IS_ASYNC { 0 } else { self.seq };
+
// Fill the header and command in-place.
- let msg_element = GspMsgElement::init(self.seq, command_size, M::FUNCTION);
+ let msg_element = GspMsgElement::init(self.seq, rpc_seq, command_size, M::FUNCTION);
// SAFETY: `msg_header` and `cmd` are valid references, and not touched if the initializer
// fails.
unsafe {
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index e6a9a1fc6296..c8a73bd30051 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -50,6 +50,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, chipset: Chipset) -> Sel
impl<'a> CommandToGsp for SetSystemInfo<'a> {
const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
+ const IS_ASYNC: bool = true;
type Command = GspSetSystemInfo;
type InitError = Error;
@@ -101,6 +102,7 @@ pub(crate) fn new() -> Self {
impl CommandToGsp for SetRegistry {
const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
+ const IS_ASYNC: bool = true;
type Command = PackedRegistryTable;
type InitError = Infallible;
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 927bcee6a5a5..e417ed58419f 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -260,6 +260,26 @@ pub(crate) enum MsgFunction {
UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
}
+impl MsgFunction {
+ /// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
+ /// a command response (NV_VGPU_MSG_FUNCTION_*).
+ #[expect(dead_code)]
+ pub(crate) fn is_event(&self) -> bool {
+ matches!(
+ self,
+ Self::GspInitDone
+ | Self::GspRunCpuSequencer
+ | Self::PostEvent
+ | Self::RcTriggered
+ | Self::MmuFaultQueued
+ | Self::OsErrorLog
+ | Self::GspPostNoCat
+ | Self::GspLockdownNotice
+ | Self::UcodeLibOsPrint //
+ )
+ }
+}
+
impl fmt::Display for MsgFunction {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
@@ -816,7 +836,7 @@ fn new() -> Self {
}
impl bindings::rpc_message_header_v {
- fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
+ fn init(cmd_size: usize, function: MsgFunction, sequence: u32) -> impl Init<Self, Error> {
type RpcMessageHeader = bindings::rpc_message_header_v;
try_init!(RpcMessageHeader {
@@ -829,6 +849,7 @@ fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
.and_then(|v| v.try_into().map_err(|_| EINVAL))?,
rpc_result: 0xffffffff,
rpc_result_private: 0xffffffff,
+ sequence,
..Zeroable::init_zeroed()
})
}
@@ -847,26 +868,31 @@ impl GspMsgElement {
///
/// # Arguments
///
- /// * `sequence` - Sequence number of the message.
+ /// * `transport_seq` - Transport-level sequence number for the outer message header
+ /// (`GSP_MSG_QUEUE_ELEMENT.seqNum`). Must be unique per message.
+ /// * `rpc_seq` - RPC-level sequence number for the inner RPC header
+ /// (`rpc_message_header_v.sequence`). Set to 0 for async (fire-and-forget) commands,
+ /// or to the sync counter for command/response pairs.
/// * `cmd_size` - Size of the command (not including the message element), in bytes.
/// * `function` - Function of the message.
#[allow(non_snake_case)]
pub(crate) fn init(
- sequence: u32,
+ transport_seq: u32,
+ rpc_seq: u32,
cmd_size: usize,
function: MsgFunction,
) -> impl Init<Self, Error> {
type RpcMessageHeader = bindings::rpc_message_header_v;
type InnerGspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT;
let init_inner = try_init!(InnerGspMsgElement {
- seqNum: sequence,
+ seqNum: transport_seq,
elemCount: size_of::<Self>()
.checked_add(cmd_size)
.ok_or(EOVERFLOW)?
.div_ceil(GSP_PAGE_SIZE)
.try_into()
.map_err(|_| EOVERFLOW)?,
- rpc <- RpcMessageHeader::init(cmd_size, function),
+ rpc <- RpcMessageHeader::init(cmd_size, function, rpc_seq),
..Zeroable::init_zeroed()
});
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification
2026-02-11 0:04 [PATCH 0/2] gpu: nova-core: use GSP RPC sequence numbers properly John Hubbard
2026-02-11 0:04 ` [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM John Hubbard
@ 2026-02-11 0:04 ` John Hubbard
2026-02-17 11:04 ` Alexandre Courbot
1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2026-02-11 0:04 UTC (permalink / raw)
To: Danilo Krummrich, Alexandre Courbot
Cc: Joel Fernandes, Timur Tabi, Alistair Popple, Eliot Courtney,
Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, nouveau, rust-for-linux, LKML, John Hubbard,
Maneet Singh
The debug logging printed a flat "send: seq#" and "receive: seq#" for
all GSP RPC messages, with no distinction between async events from GSP
(like GspLockdownNotice or GspInitDone) and command responses (like
GetGspStaticInfo).
Add driver-side tx_async_seq and rx_event_seq counters to independently
track async sends and async events. Move the receive debug log from
wait_for_msg() into receive_msg() where the message function is known.
Label all four message directions:
GSP RPC: async send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
GSP RPC: async send: seq# 1, function=SET_REGISTRY, length=0xc5
GSP RPC: async received: seq# 0, function=LOCKDOWN_NOTICE, length=0x51
GSP RPC: async received: seq# 17, function=INIT_DONE, length=0x50
GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
GSP RPC: response received: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
The async received seq# values are driver-counted for now. For command
responses, GSP echoes back the inner rpc.sequence that the CPU sent, so
the response seq# matches the send seq#.
Cc: Maneet Singh <mmaneetsingh@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
drivers/gpu/nova-core/gsp/cmdq.rs | 67 ++++++++++++++++++++++++-------
drivers/gpu/nova-core/gsp/fw.rs | 1 -
2 files changed, 52 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 7d6d7d81287c..295e1a80d64d 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -449,6 +449,11 @@ pub(crate) struct Cmdq {
/// Transport-level sequence number, incremented for every send. Used for the outer
/// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.
seq: u32,
+ /// Async (fire-and-forget) send sequence number, for debug logging.
+ tx_async_seq: u32,
+ /// Async event receive sequence number, for debug logging. GSP does not populate
+ /// rpc.sequence for async events today, so the driver counts them itself.
+ rx_event_seq: u32,
/// Memory area shared with the GSP for communicating commands and messages.
gsp_mem: DmaGspMem,
}
@@ -477,6 +482,8 @@ pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<Cmdq> {
Ok(Cmdq {
dev: dev.into(),
seq: 0,
+ tx_async_seq: 0,
+ rx_event_seq: 0,
gsp_mem,
})
}
@@ -555,13 +562,24 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
dst.contents.1,
])));
- dev_dbg!(
- &self.dev,
- "GSP RPC: send: seq# {}, function={}, length=0x{:x}\n",
- self.seq,
- M::FUNCTION,
- dst.header.length(),
- );
+ if M::IS_ASYNC {
+ dev_dbg!(
+ &self.dev,
+ "GSP RPC: async send: seq# {}, function={}, length=0x{:x}\n",
+ self.tx_async_seq,
+ M::FUNCTION,
+ dst.header.length(),
+ );
+ self.tx_async_seq += 1;
+ } else {
+ dev_dbg!(
+ &self.dev,
+ "GSP RPC: send: seq# {}, function={}, length=0x{:x}\n",
+ self.seq,
+ M::FUNCTION,
+ dst.header.length(),
+ );
+ }
// All set - update the write pointer and inform the GSP of the new command.
let elem_count = dst.header.element_count();
@@ -606,14 +624,6 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
// Extract the `GspMsgElement`.
let (header, slice_1) = GspMsgElement::from_bytes_prefix(slice_1).ok_or(EIO)?;
- dev_dbg!(
- self.dev,
- "GSP RPC: receive: seq# {}, function={:?}, length=0x{:x}\n",
- header.sequence(),
- header.function(),
- header.length(),
- );
-
let payload_length = header.payload_length();
// Check that the driver read area is large enough for the message.
@@ -680,6 +690,27 @@ pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Resul
{
let message = self.wait_for_msg(timeout)?;
let function = message.header.function().map_err(|_| EINVAL)?;
+ let is_event = function.is_event();
+
+ if is_event {
+ dev_dbg!(
+ &self.dev,
+ "GSP RPC: async received: seq# {}, function={}, length=0x{:x}\n",
+ self.rx_event_seq,
+ function,
+ message.header.length(),
+ );
+ } else {
+ // GSP echoes back the inner rpc.sequence that the CPU sent with the
+ // corresponding command, so this should match the send seq#.
+ dev_dbg!(
+ &self.dev,
+ "GSP RPC: response received: seq# {}, function={}, length=0x{:x}\n",
+ message.header.sequence(),
+ function,
+ message.header.length(),
+ );
+ }
// Extract the message. Store the result as we want to advance the read pointer even in
// case of failure.
@@ -697,6 +728,12 @@ pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Resul
message.header.length().div_ceil(GSP_PAGE_SIZE),
)?);
+ // Deferred past message consumption to satisfy the borrow checker: message
+ // holds a reference into self.gsp_mem, so we can't mutate self until it's dropped.
+ if is_event {
+ self.rx_event_seq += 1;
+ }
+
result
}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index e417ed58419f..1535969c3ba9 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -263,7 +263,6 @@ pub(crate) enum MsgFunction {
impl MsgFunction {
/// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
/// a command response (NV_VGPU_MSG_FUNCTION_*).
- #[expect(dead_code)]
pub(crate) fn is_event(&self) -> bool {
matches!(
self,
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM
2026-02-11 0:04 ` [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM John Hubbard
@ 2026-02-17 8:15 ` Alexandre Courbot
2026-02-17 11:52 ` Danilo Krummrich
0 siblings, 1 reply; 6+ messages in thread
From: Alexandre Courbot @ 2026-02-17 8:15 UTC (permalink / raw)
To: John Hubbard
Cc: Danilo Krummrich, Joel Fernandes, Alistair Popple, Eliot Courtney,
Zhi Wang, Simona Vetter, Bjorn Helgaas, Miguel Ojeda, Alex Gaynor,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau,
rust-for-linux, LKML, Maneet Singh
On Wed Feb 11, 2026 at 9:04 AM JST, John Hubbard wrote:
> Nova was setting the outer GSP_MSG_QUEUE_ELEMENT.seqNum to an
> incrementing counter for all sends, but leaving the inner
> rpc_message_header_v.sequence at zero for all sends. Open RM sets the
> inner sequence to the counter for sync (command/response) calls and to
> zero for async (fire-and-forget) calls.
>
> Split GspMsgElement::init() into separate transport_seq (outer) and
> rpc_seq (inner) parameters. The outer seqNum always increments as a
> unique transport-level ID for every message. The inner rpc.sequence is
> set to 0 for async commands and to the transport counter for sync
> commands, matching Open RM behavior.
>
> Add an IS_ASYNC const to the CommandToGsp trait and set it to true for
> SetSystemInfo and SetRegistry, the two fire-and-forget RPCs.
>
> Add MsgFunction::is_event() to classify received messages as
> GSP-initiated async events vs command responses. This will be used by
> the next commit to improve debug logging.
>
> GSP does not yet include SetSystemInfo and SetRegistry in its sequence
> counting, but a future GSP firmware update will fold them in. A comment
> is added to note this.
>
> Cc: Maneet Singh <mmaneetsingh@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> drivers/gpu/nova-core/gsp/boot.rs | 6 ++++-
> drivers/gpu/nova-core/gsp/cmdq.rs | 17 +++++++++++--
> drivers/gpu/nova-core/gsp/commands.rs | 2 ++
> drivers/gpu/nova-core/gsp/fw.rs | 36 +++++++++++++++++++++++----
> 4 files changed, 53 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index 02eec2961b5f..f769e234dae6 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -403,7 +403,11 @@ pub(crate) fn boot(
>
> dev_dbg!(dev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar));
>
> - // Now that GSP is active, send system info and registry
> + // Now that GSP is active, send system info and registry.
> + //
> + // These are async (fire-and-forget) RPCs: no response comes back from GSP.
> + // GSP does not include them in its sequence number counting today, but a
> + // future GSP firmware update will fold them into the normal sequence space.
From the point of view of the caller this is a superfluous
implementation detail. This comment is actually confusing as it mentions
a sequence number that the caller does not need to provide, so I'd just
remove it altogether.
> self.cmdq
> .send_command(bar, commands::SetSystemInfo::new(pdev, chipset))?;
> self.cmdq.send_command(bar, commands::SetRegistry::new())?;
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 16895f5281b7..7d6d7d81287c 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -58,6 +58,13 @@ pub(crate) trait CommandToGsp {
> /// Function identifying this command to the GSP.
> const FUNCTION: MsgFunction;
>
> + /// Whether this command is async (fire-and-forget), meaning no response is expected from GSP.
> + ///
> + /// Async commands get inner `rpc.sequence` set to 0. Sync commands get inner `rpc.sequence`
> + /// set to the transport counter, matching Open RM. The outer `seqNum` always increments
> + /// regardless.
This here is too much detail as well IMHO, even more if you follow the
last suggestion in this review. :)
> + const IS_ASYNC: bool = false;
> +
> /// Type generated by [`CommandToGsp::init`], to be written into the command queue buffer.
> type Command: FromBytes + AsBytes;
>
> @@ -439,7 +446,8 @@ struct GspMessage<'a> {
> pub(crate) struct Cmdq {
> /// Device this command queue belongs to.
> dev: ARef<device::Device>,
> - /// Current command sequence number.
> + /// Transport-level sequence number, incremented for every send. Used for the outer
> + /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.
Note that these types (especially `GSP_MSG_QUEUE_ELEMENT`) are never
visible in this module, so mentioning them here is assuming context that
the reader can't easily get.
Instead, how about just "Used for [`GspMsgElement`]'s sequence counter"?
> seq: u32,
> /// Memory area shared with the GSP for communicating commands and messages.
> gsp_mem: DmaGspMem,
> @@ -514,8 +522,13 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
> // Extract area for the command itself.
> let (cmd, payload_1) = M::Command::from_bytes_mut_prefix(dst.contents.0).ok_or(EIO)?;
>
> + // The outer seqNum always increments (transport-level, unique per message).
> + // The inner rpc.sequence is 0 for async (fire-and-forget) commands, or the
> + // sync counter for command/response pairs, matching Open RM behavior.
> + let rpc_seq = if M::IS_ASYNC { 0 } else { self.seq };
> +
> // Fill the header and command in-place.
> - let msg_element = GspMsgElement::init(self.seq, command_size, M::FUNCTION);
> + let msg_element = GspMsgElement::init(self.seq, rpc_seq, command_size, M::FUNCTION);
> // SAFETY: `msg_header` and `cmd` are valid references, and not touched if the initializer
> // fails.
> unsafe {
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index e6a9a1fc6296..c8a73bd30051 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -50,6 +50,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, chipset: Chipset) -> Sel
>
> impl<'a> CommandToGsp for SetSystemInfo<'a> {
> const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
> + const IS_ASYNC: bool = true;
> type Command = GspSetSystemInfo;
> type InitError = Error;
>
> @@ -101,6 +102,7 @@ pub(crate) fn new() -> Self {
>
> impl CommandToGsp for SetRegistry {
> const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
> + const IS_ASYNC: bool = true;
> type Command = PackedRegistryTable;
> type InitError = Infallible;
>
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 927bcee6a5a5..e417ed58419f 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -260,6 +260,26 @@ pub(crate) enum MsgFunction {
> UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
> }
>
> +impl MsgFunction {
> + /// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
> + /// a command response (NV_VGPU_MSG_FUNCTION_*).
> + #[expect(dead_code)]
> + pub(crate) fn is_event(&self) -> bool {
> + matches!(
> + self,
> + Self::GspInitDone
> + | Self::GspRunCpuSequencer
> + | Self::PostEvent
> + | Self::RcTriggered
> + | Self::MmuFaultQueued
> + | Self::OsErrorLog
> + | Self::GspPostNoCat
> + | Self::GspLockdownNotice
> + | Self::UcodeLibOsPrint //
> + )
> + }
Mmm using a method for this is quite fragile, as we are always at risk
of forgetting to handle a newly-added function. I wanted to eventually
split `MsgFunction` between its command and events constituants, maybe
that would be a good opportunity to do so.
Also, this is marked with `dead_code`? Looks like it belongs to the next
patch in this case.
> +}
> +
> impl fmt::Display for MsgFunction {
> fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> match self {
> @@ -816,7 +836,7 @@ fn new() -> Self {
> }
>
> impl bindings::rpc_message_header_v {
> - fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
> + fn init(cmd_size: usize, function: MsgFunction, sequence: u32) -> impl Init<Self, Error> {
> type RpcMessageHeader = bindings::rpc_message_header_v;
>
> try_init!(RpcMessageHeader {
> @@ -829,6 +849,7 @@ fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
> .and_then(|v| v.try_into().map_err(|_| EINVAL))?,
> rpc_result: 0xffffffff,
> rpc_result_private: 0xffffffff,
> + sequence,
> ..Zeroable::init_zeroed()
> })
> }
> @@ -847,26 +868,31 @@ impl GspMsgElement {
> ///
> /// # Arguments
> ///
> - /// * `sequence` - Sequence number of the message.
> + /// * `transport_seq` - Transport-level sequence number for the outer message header
> + /// (`GSP_MSG_QUEUE_ELEMENT.seqNum`). Must be unique per message.
> + /// * `rpc_seq` - RPC-level sequence number for the inner RPC header
> + /// (`rpc_message_header_v.sequence`). Set to 0 for async (fire-and-forget) commands,
> + /// or to the sync counter for command/response pairs.
> /// * `cmd_size` - Size of the command (not including the message element), in bytes.
> /// * `function` - Function of the message.
> #[allow(non_snake_case)]
> pub(crate) fn init(
> - sequence: u32,
> + transport_seq: u32,
> + rpc_seq: u32,
The caller site of `init` mentions that `rpc_seq` will always either be
equal to `transport_seq`, or be 0, depending on whether this is an async
command or not, yet this constructor allows any value to be used. I'd
like to enforce this invariant by making it impossible to build invalid
combinations of sequences.
As a matter of fact we already have this information in
`CommandToGsp::IS_ASYNC`, so the simpler way would be to pass *that* as
an extra argument to `init`, and let it handle the sequencing
internally. It also has the benefit of not making this message-layer
detail leak into the command queue code, making the discussion about
this detail even more irrelevant outside of the `fw` module.
An even better solution would be to to pass the `CommandToGsp` as a
generic argument - that way `init` could extract both the `FUNCTION` and
`IS_ASYNC`. Maybe use a public generic proxy method that calls into a
non-generic private one to limit monomorphization.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification
2026-02-11 0:04 ` [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification John Hubbard
@ 2026-02-17 11:04 ` Alexandre Courbot
0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Courbot @ 2026-02-17 11:04 UTC (permalink / raw)
To: John Hubbard
Cc: Danilo Krummrich, Joel Fernandes, Alistair Popple, Eliot Courtney,
Zhi Wang, Simona Vetter, Bjorn Helgaas, Miguel Ojeda, Alex Gaynor,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau,
rust-for-linux, LKML, Maneet Singh
On Wed Feb 11, 2026 at 9:04 AM JST, John Hubbard wrote:
> The debug logging printed a flat "send: seq#" and "receive: seq#" for
> all GSP RPC messages, with no distinction between async events from GSP
> (like GspLockdownNotice or GspInitDone) and command responses (like
> GetGspStaticInfo).
>
> Add driver-side tx_async_seq and rx_event_seq counters to independently
> track async sends and async events. Move the receive debug log from
> wait_for_msg() into receive_msg() where the message function is known.
> Label all four message directions:
>
> GSP RPC: async send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
> GSP RPC: async send: seq# 1, function=SET_REGISTRY, length=0xc5
> GSP RPC: async received: seq# 0, function=LOCKDOWN_NOTICE, length=0x51
> GSP RPC: async received: seq# 17, function=INIT_DONE, length=0x50
> GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
> GSP RPC: response received: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
>
> The async received seq# values are driver-counted for now. For command
> responses, GSP echoes back the inner rpc.sequence that the CPU sent, so
> the response seq# matches the send seq#.
>
> Cc: Maneet Singh <mmaneetsingh@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
I took a good look at it, but still don't understand the point of this
patch. On top of the two sequence numbers we already have, this adds two
more that are purely driver-internal and have no relation to the GSP.
Async messages and events have no sequence numbers and we cannot refer
the new counters against anything, so what do they help with?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM
2026-02-17 8:15 ` Alexandre Courbot
@ 2026-02-17 11:52 ` Danilo Krummrich
0 siblings, 0 replies; 6+ messages in thread
From: Danilo Krummrich @ 2026-02-17 11:52 UTC (permalink / raw)
To: Alexandre Courbot
Cc: John Hubbard, Joel Fernandes, Alistair Popple, Eliot Courtney,
Zhi Wang, Simona Vetter, Bjorn Helgaas, Miguel Ojeda, Alex Gaynor,
Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau,
rust-for-linux, LKML, Maneet Singh
On Tue Feb 17, 2026 at 9:15 AM CET, Alexandre Courbot wrote:
>> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
>> index 02eec2961b5f..f769e234dae6 100644
>> --- a/drivers/gpu/nova-core/gsp/boot.rs
>> +++ b/drivers/gpu/nova-core/gsp/boot.rs
>> @@ -403,7 +403,11 @@ pub(crate) fn boot(
>>
>> dev_dbg!(dev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar));
>>
>> - // Now that GSP is active, send system info and registry
>> + // Now that GSP is active, send system info and registry.
>> + //
>> + // These are async (fire-and-forget) RPCs: no response comes back from GSP.
>> + // GSP does not include them in its sequence number counting today, but a
>> + // future GSP firmware update will fold them into the normal sequence space.
>
> From the point of view of the caller this is a superfluous
> implementation detail. This comment is actually confusing as it mentions
> a sequence number that the caller does not need to provide, so I'd just
> remove it altogether.
The comment is not a doc-comment, hence it can more be seen as a comment for the
audience of nova-core developers in general, rather than a description of a
specific API. So, I think it is OK to talk about an implementation detail here.
To me personally it seems useful and I'd like to keep it.
>
>> self.cmdq
>> .send_command(bar, commands::SetSystemInfo::new(pdev, chipset))?;
>> self.cmdq.send_command(bar, commands::SetRegistry::new())?;
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index 16895f5281b7..7d6d7d81287c 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -58,6 +58,13 @@ pub(crate) trait CommandToGsp {
>> /// Function identifying this command to the GSP.
>> const FUNCTION: MsgFunction;
>>
>> + /// Whether this command is async (fire-and-forget), meaning no response is expected from GSP.
>> + ///
>> + /// Async commands get inner `rpc.sequence` set to 0. Sync commands get inner `rpc.sequence`
>> + /// set to the transport counter, matching Open RM. The outer `seqNum` always increments
>> + /// regardless.
>
> This here is too much detail as well IMHO, even more if you follow the
> last suggestion in this review. :)
As this one is a doc-comment, agreed.
>
>> + const IS_ASYNC: bool = false;
>> +
>> /// Type generated by [`CommandToGsp::init`], to be written into the command queue buffer.
>> type Command: FromBytes + AsBytes;
>>
>> @@ -439,7 +446,8 @@ struct GspMessage<'a> {
>> pub(crate) struct Cmdq {
>> /// Device this command queue belongs to.
>> dev: ARef<device::Device>,
>> - /// Current command sequence number.
>> + /// Transport-level sequence number, incremented for every send. Used for the outer
>> + /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.
>
> Note that these types (especially `GSP_MSG_QUEUE_ELEMENT`) are never
> visible in this module, so mentioning them here is assuming context that
> the reader can't easily get.
>
> Instead, how about just "Used for [`GspMsgElement`]'s sequence counter"?
Same here.
>> seq: u32,
>> /// Memory area shared with the GSP for communicating commands and messages.
>> gsp_mem: DmaGspMem,
>> @@ -514,8 +522,13 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>> // Extract area for the command itself.
>> let (cmd, payload_1) = M::Command::from_bytes_mut_prefix(dst.contents.0).ok_or(EIO)?;
>>
>> + // The outer seqNum always increments (transport-level, unique per message).
>> + // The inner rpc.sequence is 0 for async (fire-and-forget) commands, or the
>> + // sync counter for command/response pairs, matching Open RM behavior.
>> + let rpc_seq = if M::IS_ASYNC { 0 } else { self.seq };
>> +
>> // Fill the header and command in-place.
>> - let msg_element = GspMsgElement::init(self.seq, command_size, M::FUNCTION);
>> + let msg_element = GspMsgElement::init(self.seq, rpc_seq, command_size, M::FUNCTION);
>> // SAFETY: `msg_header` and `cmd` are valid references, and not touched if the initializer
>> // fails.
>> unsafe {
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index e6a9a1fc6296..c8a73bd30051 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -50,6 +50,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, chipset: Chipset) -> Sel
>>
>> impl<'a> CommandToGsp for SetSystemInfo<'a> {
>> const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
>> + const IS_ASYNC: bool = true;
>> type Command = GspSetSystemInfo;
>> type InitError = Error;
>>
>> @@ -101,6 +102,7 @@ pub(crate) fn new() -> Self {
>>
>> impl CommandToGsp for SetRegistry {
>> const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
>> + const IS_ASYNC: bool = true;
>> type Command = PackedRegistryTable;
>> type InitError = Infallible;
>>
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 927bcee6a5a5..e417ed58419f 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -260,6 +260,26 @@ pub(crate) enum MsgFunction {
>> UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
>> }
>>
>> +impl MsgFunction {
>> + /// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
>> + /// a command response (NV_VGPU_MSG_FUNCTION_*).
>> + #[expect(dead_code)]
>> + pub(crate) fn is_event(&self) -> bool {
>> + matches!(
>> + self,
>> + Self::GspInitDone
>> + | Self::GspRunCpuSequencer
>> + | Self::PostEvent
>> + | Self::RcTriggered
>> + | Self::MmuFaultQueued
>> + | Self::OsErrorLog
>> + | Self::GspPostNoCat
>> + | Self::GspLockdownNotice
>> + | Self::UcodeLibOsPrint //
>> + )
>> + }
>
> Mmm using a method for this is quite fragile, as we are always at risk
> of forgetting to handle a newly-added function. I wanted to eventually
> split `MsgFunction` between its command and events constituants, maybe
> that would be a good opportunity to do so.
I also think we should leverage the type system for such things.
> Also, this is marked with `dead_code`? Looks like it belongs to the next
> patch in this case.
>
>> +}
>> +
>> impl fmt::Display for MsgFunction {
>> fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>> match self {
>> @@ -816,7 +836,7 @@ fn new() -> Self {
>> }
>>
>> impl bindings::rpc_message_header_v {
>> - fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
>> + fn init(cmd_size: usize, function: MsgFunction, sequence: u32) -> impl Init<Self, Error> {
>> type RpcMessageHeader = bindings::rpc_message_header_v;
>>
>> try_init!(RpcMessageHeader {
>> @@ -829,6 +849,7 @@ fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
>> .and_then(|v| v.try_into().map_err(|_| EINVAL))?,
>> rpc_result: 0xffffffff,
>> rpc_result_private: 0xffffffff,
>> + sequence,
>> ..Zeroable::init_zeroed()
>> })
>> }
>> @@ -847,26 +868,31 @@ impl GspMsgElement {
>> ///
>> /// # Arguments
>> ///
>> - /// * `sequence` - Sequence number of the message.
>> + /// * `transport_seq` - Transport-level sequence number for the outer message header
>> + /// (`GSP_MSG_QUEUE_ELEMENT.seqNum`). Must be unique per message.
>> + /// * `rpc_seq` - RPC-level sequence number for the inner RPC header
>> + /// (`rpc_message_header_v.sequence`). Set to 0 for async (fire-and-forget) commands,
>> + /// or to the sync counter for command/response pairs.
>> /// * `cmd_size` - Size of the command (not including the message element), in bytes.
>> /// * `function` - Function of the message.
>> #[allow(non_snake_case)]
>> pub(crate) fn init(
>> - sequence: u32,
>> + transport_seq: u32,
>> + rpc_seq: u32,
>
> The caller site of `init` mentions that `rpc_seq` will always either be
> equal to `transport_seq`, or be 0, depending on whether this is an async
> command or not, yet this constructor allows any value to be used. I'd
> like to enforce this invariant by making it impossible to build invalid
> combinations of sequences.
>
> As a matter of fact we already have this information in
> `CommandToGsp::IS_ASYNC`, so the simpler way would be to pass *that* as
> an extra argument to `init`, and let it handle the sequencing
> internally. It also has the benefit of not making this message-layer
> detail leak into the command queue code, making the discussion about
> this detail even more irrelevant outside of the `fw` module.
>
> An even better solution would be to to pass the `CommandToGsp` as a
> generic argument - that way `init` could extract both the `FUNCTION` and
> `IS_ASYNC`. Maybe use a public generic proxy method that calls into a
> non-generic private one to limit monomorphization.
I think we should go with the generic argument; but I don't think we need a
proxy method to fight monomorphization, i.e. I don't think we need to do work
for the optimizer in this case.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-02-17 11:52 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-11 0:04 [PATCH 0/2] gpu: nova-core: use GSP RPC sequence numbers properly John Hubbard
2026-02-11 0:04 ` [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM John Hubbard
2026-02-17 8:15 ` Alexandre Courbot
2026-02-17 11:52 ` Danilo Krummrich
2026-02-11 0:04 ` [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification John Hubbard
2026-02-17 11:04 ` Alexandre Courbot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox