From: "Danilo Krummrich" <dakr@kernel.org>
To: "Alexandre Courbot" <acourbot@nvidia.com>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Eliot Courtney" <ecourtney@nvidia.com>,
"Zhi Wang" <zhiw@nvidia.com>, "Simona Vetter" <simona@ffwll.ch>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <lossin@kernel.org>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Trevor Gross" <tmgross@umich.edu>,
nouveau@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>,
"Maneet Singh" <mmaneetsingh@nvidia.com>
Subject: Re: [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM
Date: Tue, 17 Feb 2026 12:52:31 +0100 [thread overview]
Message-ID: <DGH7S2HJRDJJ.210D7DLTNFRNR@kernel.org> (raw)
In-Reply-To: <DGH35X9GCFYO.EBY01G1C6XQM@nvidia.com>
On Tue Feb 17, 2026 at 9:15 AM CET, Alexandre Courbot wrote:
>> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
>> index 02eec2961b5f..f769e234dae6 100644
>> --- a/drivers/gpu/nova-core/gsp/boot.rs
>> +++ b/drivers/gpu/nova-core/gsp/boot.rs
>> @@ -403,7 +403,11 @@ pub(crate) fn boot(
>>
>> dev_dbg!(dev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar));
>>
>> - // Now that GSP is active, send system info and registry
>> + // Now that GSP is active, send system info and registry.
>> + //
>> + // These are async (fire-and-forget) RPCs: no response comes back from GSP.
>> + // GSP does not include them in its sequence number counting today, but a
>> + // future GSP firmware update will fold them into the normal sequence space.
>
> From the point of view of the caller this is a superfluous
> implementation detail. This comment is actually confusing as it mentions
> a sequence number that the caller does not need to provide, so I'd just
> remove it altogether.
The comment is not a doc-comment, hence it can more be seen as a comment for the
audience of nova-core developers in general, rather than a description of a
specific API. So, I think it is OK to talk about an implementation detail here.
To me personally it seems useful and I'd like to keep it.
>
>> self.cmdq
>> .send_command(bar, commands::SetSystemInfo::new(pdev, chipset))?;
>> self.cmdq.send_command(bar, commands::SetRegistry::new())?;
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index 16895f5281b7..7d6d7d81287c 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -58,6 +58,13 @@ pub(crate) trait CommandToGsp {
>> /// Function identifying this command to the GSP.
>> const FUNCTION: MsgFunction;
>>
>> + /// Whether this command is async (fire-and-forget), meaning no response is expected from GSP.
>> + ///
>> + /// Async commands get inner `rpc.sequence` set to 0. Sync commands get inner `rpc.sequence`
>> + /// set to the transport counter, matching Open RM. The outer `seqNum` always increments
>> + /// regardless.
>
> This here is too much detail as well IMHO, even more if you follow the
> last suggestion in this review. :)
As this one is a doc-comment, agreed.
>
>> + const IS_ASYNC: bool = false;
>> +
>> /// Type generated by [`CommandToGsp::init`], to be written into the command queue buffer.
>> type Command: FromBytes + AsBytes;
>>
>> @@ -439,7 +446,8 @@ struct GspMessage<'a> {
>> pub(crate) struct Cmdq {
>> /// Device this command queue belongs to.
>> dev: ARef<device::Device>,
>> - /// Current command sequence number.
>> + /// Transport-level sequence number, incremented for every send. Used for the outer
>> + /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.
>
> Note that these types (especially `GSP_MSG_QUEUE_ELEMENT`) are never
> visible in this module, so mentioning them here is assuming context that
> the reader can't easily get.
>
> Instead, how about just "Used for [`GspMsgElement`]'s sequence counter"?
Same here.
>> seq: u32,
>> /// Memory area shared with the GSP for communicating commands and messages.
>> gsp_mem: DmaGspMem,
>> @@ -514,8 +522,13 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>> // Extract area for the command itself.
>> let (cmd, payload_1) = M::Command::from_bytes_mut_prefix(dst.contents.0).ok_or(EIO)?;
>>
>> + // The outer seqNum always increments (transport-level, unique per message).
>> + // The inner rpc.sequence is 0 for async (fire-and-forget) commands, or the
>> + // sync counter for command/response pairs, matching Open RM behavior.
>> + let rpc_seq = if M::IS_ASYNC { 0 } else { self.seq };
>> +
>> // Fill the header and command in-place.
>> - let msg_element = GspMsgElement::init(self.seq, command_size, M::FUNCTION);
>> + let msg_element = GspMsgElement::init(self.seq, rpc_seq, command_size, M::FUNCTION);
>> // SAFETY: `msg_header` and `cmd` are valid references, and not touched if the initializer
>> // fails.
>> unsafe {
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index e6a9a1fc6296..c8a73bd30051 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -50,6 +50,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, chipset: Chipset) -> Sel
>>
>> impl<'a> CommandToGsp for SetSystemInfo<'a> {
>> const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
>> + const IS_ASYNC: bool = true;
>> type Command = GspSetSystemInfo;
>> type InitError = Error;
>>
>> @@ -101,6 +102,7 @@ pub(crate) fn new() -> Self {
>>
>> impl CommandToGsp for SetRegistry {
>> const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
>> + const IS_ASYNC: bool = true;
>> type Command = PackedRegistryTable;
>> type InitError = Infallible;
>>
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 927bcee6a5a5..e417ed58419f 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -260,6 +260,26 @@ pub(crate) enum MsgFunction {
>> UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
>> }
>>
>> +impl MsgFunction {
>> + /// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
>> + /// a command response (NV_VGPU_MSG_FUNCTION_*).
>> + #[expect(dead_code)]
>> + pub(crate) fn is_event(&self) -> bool {
>> + matches!(
>> + self,
>> + Self::GspInitDone
>> + | Self::GspRunCpuSequencer
>> + | Self::PostEvent
>> + | Self::RcTriggered
>> + | Self::MmuFaultQueued
>> + | Self::OsErrorLog
>> + | Self::GspPostNoCat
>> + | Self::GspLockdownNotice
>> + | Self::UcodeLibOsPrint //
>> + )
>> + }
>
> Mmm using a method for this is quite fragile, as we are always at risk
> of forgetting to handle a newly-added function. I wanted to eventually
> split `MsgFunction` between its command and events constituants, maybe
> that would be a good opportunity to do so.
I also think we should leverage the type system for such things.
> Also, this is marked with `dead_code`? Looks like it belongs to the next
> patch in this case.
>
>> +}
>> +
>> impl fmt::Display for MsgFunction {
>> fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>> match self {
>> @@ -816,7 +836,7 @@ fn new() -> Self {
>> }
>>
>> impl bindings::rpc_message_header_v {
>> - fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
>> + fn init(cmd_size: usize, function: MsgFunction, sequence: u32) -> impl Init<Self, Error> {
>> type RpcMessageHeader = bindings::rpc_message_header_v;
>>
>> try_init!(RpcMessageHeader {
>> @@ -829,6 +849,7 @@ fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
>> .and_then(|v| v.try_into().map_err(|_| EINVAL))?,
>> rpc_result: 0xffffffff,
>> rpc_result_private: 0xffffffff,
>> + sequence,
>> ..Zeroable::init_zeroed()
>> })
>> }
>> @@ -847,26 +868,31 @@ impl GspMsgElement {
>> ///
>> /// # Arguments
>> ///
>> - /// * `sequence` - Sequence number of the message.
>> + /// * `transport_seq` - Transport-level sequence number for the outer message header
>> + /// (`GSP_MSG_QUEUE_ELEMENT.seqNum`). Must be unique per message.
>> + /// * `rpc_seq` - RPC-level sequence number for the inner RPC header
>> + /// (`rpc_message_header_v.sequence`). Set to 0 for async (fire-and-forget) commands,
>> + /// or to the sync counter for command/response pairs.
>> /// * `cmd_size` - Size of the command (not including the message element), in bytes.
>> /// * `function` - Function of the message.
>> #[allow(non_snake_case)]
>> pub(crate) fn init(
>> - sequence: u32,
>> + transport_seq: u32,
>> + rpc_seq: u32,
>
> The caller site of `init` mentions that `rpc_seq` will always either be
> equal to `transport_seq`, or be 0, depending on whether this is an async
> command or not, yet this constructor allows any value to be used. I'd
> like to enforce this invariant by making it impossible to build invalid
> combinations of sequences.
>
> As a matter of fact we already have this information in
> `CommandToGsp::IS_ASYNC`, so the simpler way would be to pass *that* as
> an extra argument to `init`, and let it handle the sequencing
> internally. It also has the benefit of not making this message-layer
> detail leak into the command queue code, making the discussion about
> this detail even more irrelevant outside of the `fw` module.
>
> An even better solution would be to to pass the `CommandToGsp` as a
> generic argument - that way `init` could extract both the `FUNCTION` and
> `IS_ASYNC`. Maybe use a public generic proxy method that calls into a
> non-generic private one to limit monomorphization.
I think we should go with the generic argument; but I don't think we need a
proxy method to fight monomorphization, i.e. I don't think we need to do work
for the optimizer in this case.
next prev parent reply other threads:[~2026-02-17 11:52 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-11 0:04 [PATCH 0/2] gpu: nova-core: use GSP RPC sequence numbers properly John Hubbard
2026-02-11 0:04 ` [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM John Hubbard
2026-02-17 8:15 ` Alexandre Courbot
2026-02-17 11:52 ` Danilo Krummrich [this message]
2026-02-11 0:04 ` [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification John Hubbard
2026-02-17 11:04 ` Alexandre Courbot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DGH7S2HJRDJJ.210D7DLTNFRNR@kernel.org \
--to=dakr@kernel.org \
--cc=a.hindborg@kernel.org \
--cc=acourbot@nvidia.com \
--cc=alex.gaynor@gmail.com \
--cc=aliceryhl@google.com \
--cc=apopple@nvidia.com \
--cc=bhelgaas@google.com \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun.feng@gmail.com \
--cc=ecourtney@nvidia.com \
--cc=gary@garyguo.net \
--cc=jhubbard@nvidia.com \
--cc=joelagnelf@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lossin@kernel.org \
--cc=mmaneetsingh@nvidia.com \
--cc=nouveau@lists.freedesktop.org \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=simona@ffwll.ch \
--cc=tmgross@umich.edu \
--cc=zhiw@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox