Re: [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM

public inbox for rust-for-linux@vger.kernel.org
 help / color / mirror / Atom feed

From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "John Hubbard" <jhubbard@nvidia.com>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Eliot Courtney" <ecourtney@nvidia.com>,
	"Zhi Wang" <zhiw@nvidia.com>, "Simona Vetter" <simona@ffwll.ch>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	nouveau@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	"Maneet Singh" <mmaneetsingh@nvidia.com>
Subject: Re: [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM
Date: Tue, 17 Feb 2026 17:15:31 +0900	[thread overview]
Message-ID: <DGH35X9GCFYO.EBY01G1C6XQM@nvidia.com> (raw)
In-Reply-To: <20260211000451.192109-2-jhubbard@nvidia.com>

On Wed Feb 11, 2026 at 9:04 AM JST, John Hubbard wrote:
> Nova was setting the outer GSP_MSG_QUEUE_ELEMENT.seqNum to an
> incrementing counter for all sends, but leaving the inner
> rpc_message_header_v.sequence at zero for all sends. Open RM sets the
> inner sequence to the counter for sync (command/response) calls and to
> zero for async (fire-and-forget) calls.
>
> Split GspMsgElement::init() into separate transport_seq (outer) and
> rpc_seq (inner) parameters. The outer seqNum always increments as a
> unique transport-level ID for every message. The inner rpc.sequence is
> set to 0 for async commands and to the transport counter for sync
> commands, matching Open RM behavior.
>
> Add an IS_ASYNC const to the CommandToGsp trait and set it to true for
> SetSystemInfo and SetRegistry, the two fire-and-forget RPCs.
>
> Add MsgFunction::is_event() to classify received messages as
> GSP-initiated async events vs command responses. This will be used by
> the next commit to improve debug logging.
>
> GSP does not yet include SetSystemInfo and SetRegistry in its sequence
> counting, but a future GSP firmware update will fold them in. A comment
> is added to note this.
>
> Cc: Maneet Singh <mmaneetsingh@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/boot.rs     |  6 ++++-
>  drivers/gpu/nova-core/gsp/cmdq.rs     | 17 +++++++++++--
>  drivers/gpu/nova-core/gsp/commands.rs |  2 ++
>  drivers/gpu/nova-core/gsp/fw.rs       | 36 +++++++++++++++++++++++----
>  4 files changed, 53 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index 02eec2961b5f..f769e234dae6 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -403,7 +403,11 @@ pub(crate) fn boot(
>  
>          dev_dbg!(dev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar));
>  
> -        // Now that GSP is active, send system info and registry
> +        // Now that GSP is active, send system info and registry.
> +        //
> +        // These are async (fire-and-forget) RPCs: no response comes back from GSP.
> +        // GSP does not include them in its sequence number counting today, but a
> +        // future GSP firmware update will fold them into the normal sequence space.

From the point of view of the caller this is a superfluous
implementation detail. This comment is actually confusing as it mentions
a sequence number that the caller does not need to provide, so I'd just
remove it altogether.

>          self.cmdq
>              .send_command(bar, commands::SetSystemInfo::new(pdev, chipset))?;
>          self.cmdq.send_command(bar, commands::SetRegistry::new())?;
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 16895f5281b7..7d6d7d81287c 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -58,6 +58,13 @@ pub(crate) trait CommandToGsp {
>      /// Function identifying this command to the GSP.
>      const FUNCTION: MsgFunction;
>  
> +    /// Whether this command is async (fire-and-forget), meaning no response is expected from GSP.
> +    ///
> +    /// Async commands get inner `rpc.sequence` set to 0. Sync commands get inner `rpc.sequence`
> +    /// set to the transport counter, matching Open RM. The outer `seqNum` always increments
> +    /// regardless.

This here is too much detail as well IMHO, even more if you follow the
last suggestion in this review. :)

> +    const IS_ASYNC: bool = false;
> +
>      /// Type generated by [`CommandToGsp::init`], to be written into the command queue buffer.
>      type Command: FromBytes + AsBytes;
>  
> @@ -439,7 +446,8 @@ struct GspMessage<'a> {
>  pub(crate) struct Cmdq {
>      /// Device this command queue belongs to.
>      dev: ARef<device::Device>,
> -    /// Current command sequence number.
> +    /// Transport-level sequence number, incremented for every send. Used for the outer
> +    /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.

Note that these types (especially `GSP_MSG_QUEUE_ELEMENT`) are never
visible in this module, so mentioning them here is assuming context that
the reader can't easily get.

Instead, how about just "Used for [`GspMsgElement`]'s sequence counter"?

>      seq: u32,
>      /// Memory area shared with the GSP for communicating commands and messages.
>      gsp_mem: DmaGspMem,
> @@ -514,8 +522,13 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>          // Extract area for the command itself.
>          let (cmd, payload_1) = M::Command::from_bytes_mut_prefix(dst.contents.0).ok_or(EIO)?;
>  
> +        // The outer seqNum always increments (transport-level, unique per message).
> +        // The inner rpc.sequence is 0 for async (fire-and-forget) commands, or the
> +        // sync counter for command/response pairs, matching Open RM behavior.
> +        let rpc_seq = if M::IS_ASYNC { 0 } else { self.seq };
> +
>          // Fill the header and command in-place.
> -        let msg_element = GspMsgElement::init(self.seq, command_size, M::FUNCTION);
> +        let msg_element = GspMsgElement::init(self.seq, rpc_seq, command_size, M::FUNCTION);
>          // SAFETY: `msg_header` and `cmd` are valid references, and not touched if the initializer
>          // fails.
>          unsafe {
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index e6a9a1fc6296..c8a73bd30051 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -50,6 +50,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, chipset: Chipset) -> Sel
>  
>  impl<'a> CommandToGsp for SetSystemInfo<'a> {
>      const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
> +    const IS_ASYNC: bool = true;
>      type Command = GspSetSystemInfo;
>      type InitError = Error;
>  
> @@ -101,6 +102,7 @@ pub(crate) fn new() -> Self {
>  
>  impl CommandToGsp for SetRegistry {
>      const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
> +    const IS_ASYNC: bool = true;
>      type Command = PackedRegistryTable;
>      type InitError = Infallible;
>  
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 927bcee6a5a5..e417ed58419f 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -260,6 +260,26 @@ pub(crate) enum MsgFunction {
>      UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
>  }
>  
> +impl MsgFunction {
> +    /// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
> +    /// a command response (NV_VGPU_MSG_FUNCTION_*).
> +    #[expect(dead_code)]
> +    pub(crate) fn is_event(&self) -> bool {
> +        matches!(
> +            self,
> +            Self::GspInitDone
> +                | Self::GspRunCpuSequencer
> +                | Self::PostEvent
> +                | Self::RcTriggered
> +                | Self::MmuFaultQueued
> +                | Self::OsErrorLog
> +                | Self::GspPostNoCat
> +                | Self::GspLockdownNotice
> +                | Self::UcodeLibOsPrint //
> +        )
> +    }

Mmm using a method for this is quite fragile, as we are always at risk
of forgetting to handle a newly-added function. I wanted to eventually
split `MsgFunction` between its command and events constituants, maybe
that would be a good opportunity to do so.

Also, this is marked with `dead_code`? Looks like it belongs to the next
patch in this case.

> +}
> +
>  impl fmt::Display for MsgFunction {
>      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>          match self {
> @@ -816,7 +836,7 @@ fn new() -> Self {
>  }
>  
>  impl bindings::rpc_message_header_v {
> -    fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
> +    fn init(cmd_size: usize, function: MsgFunction, sequence: u32) -> impl Init<Self, Error> {
>          type RpcMessageHeader = bindings::rpc_message_header_v;
>  
>          try_init!(RpcMessageHeader {
> @@ -829,6 +849,7 @@ fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, Error> {
>                  .and_then(|v| v.try_into().map_err(|_| EINVAL))?,
>              rpc_result: 0xffffffff,
>              rpc_result_private: 0xffffffff,
> +            sequence,
>              ..Zeroable::init_zeroed()
>          })
>      }
> @@ -847,26 +868,31 @@ impl GspMsgElement {
>      ///
>      /// # Arguments
>      ///
> -    /// * `sequence` - Sequence number of the message.
> +    /// * `transport_seq` - Transport-level sequence number for the outer message header
> +    ///   (`GSP_MSG_QUEUE_ELEMENT.seqNum`). Must be unique per message.
> +    /// * `rpc_seq` - RPC-level sequence number for the inner RPC header
> +    ///   (`rpc_message_header_v.sequence`). Set to 0 for async (fire-and-forget) commands,
> +    ///   or to the sync counter for command/response pairs.
>      /// * `cmd_size` - Size of the command (not including the message element), in bytes.
>      /// * `function` - Function of the message.
>      #[allow(non_snake_case)]
>      pub(crate) fn init(
> -        sequence: u32,
> +        transport_seq: u32,
> +        rpc_seq: u32,

The caller site of `init` mentions that `rpc_seq` will always either be
equal to `transport_seq`, or be 0, depending on whether this is an async
command or not, yet this constructor allows any value to be used. I'd
like to enforce this invariant by making it impossible to build invalid
combinations of sequences.

As a matter of fact we already have this information in
`CommandToGsp::IS_ASYNC`, so the simpler way would be to pass *that* as
an extra argument to `init`, and let it handle the sequencing
internally. It also has the benefit of not making this message-layer
detail leak into the command queue code, making the discussion about
this detail even more irrelevant outside of the `fw` module.

An even better solution would be to to pass the `CommandToGsp` as a
generic argument - that way `init` could extract both the `FUNCTION` and
`IS_ASYNC`. Maybe use a public generic proxy method that calls into a
non-generic private one to limit monomorphization.

next prev parent reply	other threads:[~2026-02-17  8:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-11  0:04 [PATCH 0/2] gpu: nova-core: use GSP RPC sequence numbers properly John Hubbard
2026-02-11  0:04 ` [PATCH 1/2] gpu: nova-core: fix GSP RPC send sequence numbers to match Open RM John Hubbard
2026-02-17  8:15   ` Alexandre Courbot [this message]
2026-02-17 11:52     ` Danilo Krummrich
2026-02-11  0:04 ` [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification John Hubbard
2026-02-17 11:04   ` Alexandre Courbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DGH35X9GCFYO.EBY01G1C6XQM@nvidia.com \
    --to=acourbot@nvidia.com \
    --cc=a.hindborg@kernel.org \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apopple@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dakr@kernel.org \
    --cc=ecourtney@nvidia.com \
    --cc=gary@garyguo.net \
    --cc=jhubbard@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lossin@kernel.org \
    --cc=mmaneetsingh@nvidia.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=simona@ffwll.ch \
    --cc=tmgross@umich.edu \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox