[PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq
@ 2026-02-26 14:50 Eliot Courtney
  2026-02-26 14:50 ` [PATCH v2 1/4] gpu: nova-core: gsp: fix stale doc comments on command queue methods Eliot Courtney
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Eliot Courtney @ 2026-02-26 14:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Eliot Courtney,
	Zhi Wang

Add locking to Cmdq. This is required e.g. for unloading the driver,
which needs to send the UnloadingGuestDriver via the command queue
on unbind which may be on a different thread.

We have commands that need a reply ("sync") and commands that don't
("async"). For sync commands we want to make sure that they don't get
the reply of a different command back. The approach this patch series
takes is by making sync commands block until they get a response. For
now this should be ok, and we expect GSP to be fast anyway.

To do this, we need to know which commands expect a reply and which
don't. John's existing series[1] adds IS_ASYNC which solves part of the
problem, but we need to know a bit more. So instead, add an
associated type called Reply which tells us what the reply is.

An alternative would be to define traits inheriting CommandToGsp, e.g.
SyncCommand and AsyncCommand, instead of using the associated type. I
implemented the associated type version because it feels more
compositional rather than inherity so seemed a bit better to me. But
both of these approaches work and are fine, IMO.

In summary, this patch series has three steps:
1. Add the type infrastructure to know what replies are expected for a
   command and update each caller to explicitly send a sync or async
   command.
2. Make Cmdq pinned so we can use Mutex
3. Add a Mutex to protect Cmdq by moving the relevant state to an
   inner struct.

[1]: https://lore.kernel.org/all/20260211000451.192109-1-jhubbard@nvidia.com/

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
Changes in v2:
- Rebase on drm-rust-next
- Link to v1: https://lore.kernel.org/r/20260225-cmdq-locking-v1-0-bbf6b4156706@nvidia.com

---
Eliot Courtney (4):
      gpu: nova-core: gsp: fix stale doc comments on command queue methods
      gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
      gpu: nova-core: gsp: make `Cmdq` a pinned type
      gpu: nova-core: gsp: add mutex locking to Cmdq

 drivers/gpu/nova-core/gsp.rs           |   5 +-
 drivers/gpu/nova-core/gsp/boot.rs      |  13 +-
 drivers/gpu/nova-core/gsp/cmdq.rs      | 246 +++++++++++++++++++++------------
 drivers/gpu/nova-core/gsp/commands.rs  |  23 ++-
 drivers/gpu/nova-core/gsp/sequencer.rs |   2 +-
 5 files changed, 183 insertions(+), 106 deletions(-)
---
base-commit: 4a49fe23e357b48845e31fe9c28a802c05458198
change-id: 20260225-cmdq-locking-d32928a2c2cf
prerequisite-message-id: <20260226-cmdq-continuation-v3-0-572ab9916766@nvidia.com>
prerequisite-patch-id: fd45bc5b8eda5e2b54a052dddb1a1c363107f0a7
prerequisite-patch-id: d0f59ef489346e978a222755f9fb42dfe7af19e5
prerequisite-patch-id: 8844970d0e387488c70979a73732579ba174b46c
prerequisite-patch-id: e138a94ed48fa8d50d5ed1eb36524f98923ed478
prerequisite-patch-id: 4599a5e90d6665fa3acb7d4045de5d378ac28b4d
prerequisite-patch-id: 30ed64c398e541d6efbcb2e46ed9a9e6cf953f4f
prerequisite-patch-id: 9a965e9f29c8680c0b554e656ff8e9a1bfc67280
prerequisite-patch-id: d8cccc3dfb070f304805fc7e0f24121809b4b300
prerequisite-patch-id: c0a73dfd1fb630ab02486f0180b90f8fe850b4dc

Best regards,
-- 
Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/4] gpu: nova-core: gsp: fix stale doc comments on command queue methods
  2026-02-26 14:50 [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq Eliot Courtney
@ 2026-02-26 14:50 ` Eliot Courtney
  2026-02-26 14:50 ` [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq` Eliot Courtney
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Eliot Courtney @ 2026-02-26 14:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Eliot Courtney,
	Zhi Wang

Fix some inaccuracies / old doc comments.

Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index d68b93ddf7cc..a3e039117120 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -528,6 +528,7 @@ fn notify_gsp(bar: &Bar0) {
     ///
     /// # Errors
     ///
+    /// - `EMSGSIZE` if the command exceeds the maximum queue element size.
     /// - `ETIMEDOUT` if space does not become available within the timeout.
     /// - `EIO` if the variable payload requested by the command has not been entirely
     ///   written to by its [`CommandToGsp::init_variable_payload`] method.
@@ -710,22 +711,20 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
 
     /// Receive a message from the GSP.
     ///
-    /// `init` is a closure tasked with processing the message. It receives a reference to the
-    /// message in the message queue, and a [`SBufferIter`] pointing to its variable-length
-    /// payload, if any.
+    /// The expected message type is specified using the `M` generic parameter. If the pending
+    /// message has a different function code, `ERANGE` is returned and the message is consumed.
     ///
-    /// The expected message is specified using the `M` generic parameter. If the pending message
-    /// is different, `EAGAIN` is returned and the unexpected message is dropped.
-    ///
-    /// This design is by no means final, but it is simple and will let us go through GSP
-    /// initialization.
+    /// The read pointer is always advanced past the message, regardless of whether it matched.
     ///
     /// # Errors
     ///
     /// - `ETIMEDOUT` if `timeout` has elapsed before any message becomes available.
     /// - `EIO` if there was some inconsistency (e.g. message shorter than advertised) on the
     ///   message queue.
-    /// - `EINVAL` if the function of the message was unrecognized.
+    /// - `EINVAL` if the function code of the message was not recognized.
+    /// - `ERANGE` if the message had a recognized but non-matching function code.
+    ///
+    /// Error codes returned by [`MessageFromGsp::read`] are propagated as-is.
     pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Result<M>
     where
         // This allows all error types, including `Infallible`, to be used for `M::InitError`.

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-02-26 14:50 [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq Eliot Courtney
  2026-02-26 14:50 ` [PATCH v2 1/4] gpu: nova-core: gsp: fix stale doc comments on command queue methods Eliot Courtney
@ 2026-02-26 14:50 ` Eliot Courtney
  2026-02-28  6:11   ` John Hubbard
  2026-02-26 14:50 ` [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type Eliot Courtney
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Eliot Courtney @ 2026-02-26 14:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Eliot Courtney

Add sync and async command queue API and the type infrastructure to know
what reply is expected from each `CommandToGsp`.

Use a marker type `NoReply` which does not implement `MessageFromGsp` to
mark async commands which don't expect a response.

This prepares for adding locking to the queue.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs     |  5 ++--
 drivers/gpu/nova-core/gsp/cmdq.rs     | 54 ++++++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/gsp/commands.rs | 19 ++++++------
 3 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index c56029f444cb..55899eba75db 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -160,8 +160,9 @@ pub(crate) fn boot(
         dma_write!(wpr_meta[0] = GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         self.cmdq
-            .send_command(bar, commands::SetSystemInfo::new(pdev))?;
-        self.cmdq.send_command(bar, commands::SetRegistry::new())?;
+            .send_async_command(bar, commands::SetSystemInfo::new(pdev))?;
+        self.cmdq
+            .send_async_command(bar, commands::SetRegistry::new())?;
 
         gsp_falcon.reset(bar)?;
         let libos_handle = self.libos.dma_handle();
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index a3e039117120..daf3e1d153d4 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -48,6 +48,10 @@
     sbuffer::SBufferIter, //
 };
 
+/// Marker type representing the absence of a reply for a command. This does not implement
+/// `MessageFromGsp`.
+pub(crate) struct NoReply;
+
 /// Trait implemented by types representing a command to send to the GSP.
 ///
 /// The main purpose of this trait is to provide [`Cmdq::send_command`] with the information it
@@ -66,6 +70,9 @@ pub(crate) trait CommandToGsp {
     /// Type generated by [`CommandToGsp::init`], to be written into the command queue buffer.
     type Command: FromBytes + AsBytes;
 
+    /// Type of the reply expected from the GSP, or [`NoReply`] for async commands.
+    type Reply;
+
     /// Error type returned by [`CommandToGsp::init`].
     type InitError;
 
@@ -604,7 +611,7 @@ fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
     ///   written to by its [`CommandToGsp::init_variable_payload`] method.
     ///
     /// Error codes returned by the command initializers are propagated as-is.
-    pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
+    fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
     where
         M: CommandToGsp,
         Error: From<M::InitError>,
@@ -626,6 +633,51 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
         Ok(())
     }
 
+    /// Sends `command` to the GSP and waits for the reply.
+    ///
+    /// # Errors
+    ///
+    /// - `ETIMEDOUT` if space does not become available to send the command, or if the reply is
+    ///   not received within the timeout.
+    /// - `EIO` if the variable payload requested by the command has not been entirely
+    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
+    ///
+    /// Error codes returned by the command and reply initializers are propagated as-is.
+    pub(crate) fn send_sync_command<M>(&mut self, bar: &Bar0, command: M) -> Result<M::Reply>
+    where
+        M: CommandToGsp,
+        M::Reply: MessageFromGsp,
+        Error: From<M::InitError>,
+        Error: From<<M::Reply as MessageFromGsp>::InitError>,
+    {
+        self.send_command(bar, command)?;
+
+        loop {
+            match self.receive_msg::<M::Reply>(Delta::from_secs(10)) {
+                Ok(reply) => break Ok(reply),
+                Err(ERANGE) => continue,
+                Err(e) => break Err(e),
+            }
+        }
+    }
+
+    /// Sends `command` to the GSP without waiting for a reply.
+    ///
+    /// # Errors
+    ///
+    /// - `ETIMEDOUT` if space does not become available within the timeout.
+    /// - `EIO` if the variable payload requested by the command has not been entirely
+    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
+    ///
+    /// Error codes returned by the command initializers are propagated as-is.
+    pub(crate) fn send_async_command<M>(&mut self, bar: &Bar0, command: M) -> Result
+    where
+        M: CommandToGsp<Reply = NoReply>,
+        Error: From<M::InitError>,
+    {
+        self.send_command(bar, command)
+    }
+
     /// Wait for a message to become available on the message queue.
     ///
     /// This works purely at the transport layer and does not interpret or validate the message
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 74f875755e53..47ca31611927 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -26,7 +26,8 @@
             command_size,
             Cmdq,
             CommandToGsp,
-            MessageFromGsp, //
+            MessageFromGsp,
+            NoReply, //
         },
         fw::{
             commands::*,
@@ -53,6 +54,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>) -> Self {
 impl<'a> CommandToGsp for SetSystemInfo<'a> {
     const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
     type Command = GspSetSystemInfo;
+    type Reply = NoReply;
     type InitError = Error;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
@@ -104,6 +106,7 @@ pub(crate) fn new() -> Self {
 impl CommandToGsp for SetRegistry {
     const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
     type Command = PackedRegistryTable;
+    type Reply = NoReply;
     type InitError = Infallible;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
@@ -183,6 +186,7 @@ pub(crate) fn wait_gsp_init_done(cmdq: &mut Cmdq) -> Result {
 impl CommandToGsp for GetGspStaticInfo {
     const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
     type Command = GspStaticConfigInfo;
+    type Reply = GetGspStaticInfoReply;
     type InitError = Infallible;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
@@ -236,15 +240,7 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
 
 /// Send the [`GetGspInfo`] command and awaits for its reply.
 pub(crate) fn get_gsp_info(cmdq: &mut Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
-    cmdq.send_command(bar, GetGspStaticInfo)?;
-
-    loop {
-        match cmdq.receive_msg::<GetGspStaticInfoReply>(Delta::from_secs(5)) {
-            Ok(info) => return Ok(info),
-            Err(ERANGE) => continue,
-            Err(e) => return Err(e),
-        }
-    }
+    cmdq.send_sync_command(bar, GetGspStaticInfo)
 }
 
 /// The `ContinuationRecord` command.
@@ -262,6 +258,7 @@ pub(crate) fn new(data: &'a [u8]) -> Self {
 impl<'a> CommandToGsp for ContinuationRecord<'a> {
     const FUNCTION: MsgFunction = MsgFunction::ContinuationRecord;
     type Command = ();
+    type Reply = NoReply;
     type InitError = Infallible;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
@@ -356,6 +353,7 @@ pub(crate) enum SplitCommand<'a, C: CommandToGsp> {
 impl<'a, C: CommandToGsp> CommandToGsp for SplitCommand<'a, C> {
     const FUNCTION: MsgFunction = C::FUNCTION;
     type Command = C::Command;
+    type Reply = C::Reply;
     type InitError = C::InitError;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
@@ -412,6 +410,7 @@ fn new(len: usize) -> Result<Self> {
     impl CommandToGsp for TestPayload {
         const FUNCTION: MsgFunction = MsgFunction::Nop;
         type Command = ();
+        type Reply = NoReply;
         type InitError = Infallible;
 
         fn init(&self) -> impl Init<Self::Command, Self::InitError> {

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type
  2026-02-26 14:50 [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq Eliot Courtney
  2026-02-26 14:50 ` [PATCH v2 1/4] gpu: nova-core: gsp: fix stale doc comments on command queue methods Eliot Courtney
  2026-02-26 14:50 ` [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq` Eliot Courtney
@ 2026-02-26 14:50 ` Eliot Courtney
  2026-03-02 17:33   ` Gary Guo
  2026-02-26 14:50 ` [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq Eliot Courtney
  2026-02-26 18:48 ` [PATCH v2 0/4] gpu: nova-core: gsp: add " Zhi Wang
  4 siblings, 1 reply; 22+ messages in thread
From: Eliot Courtney @ 2026-02-26 14:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Eliot Courtney,
	Zhi Wang

Make `Cmdq` a pinned type. This is needed to use Mutex, which is needed
to add locking to `Cmdq`.

Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp.rs      | 5 +++--
 drivers/gpu/nova-core/gsp/cmdq.rs | 9 ++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b..a6f3918c20b1 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -112,6 +112,7 @@ pub(crate) struct Gsp {
     /// RM log buffer.
     logrm: LogBuffer,
     /// Command queue.
+    #[pin]
     pub(crate) cmdq: Cmdq,
     /// RM arguments.
     rmargs: CoherentAllocation<GspArgumentsPadded>,
@@ -132,7 +133,7 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
                 loginit: LogBuffer::new(dev)?,
                 logintr: LogBuffer::new(dev)?,
                 logrm: LogBuffer::new(dev)?,
-                cmdq: Cmdq::new(dev)?,
+                cmdq <- Cmdq::new(dev),
                 rmargs: CoherentAllocation::<GspArgumentsPadded>::alloc_coherent(
                     dev,
                     1,
@@ -149,7 +150,7 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
                         libos[1] = LibosMemoryRegionInitArgument::new("LOGINTR", &logintr.0)
                     )?;
                     dma_write!(libos[2] = LibosMemoryRegionInitArgument::new("LOGRM", &logrm.0))?;
-                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(cmdq))?;
+                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
                     dma_write!(libos[3] = LibosMemoryRegionInitArgument::new("RMARGS", rmargs))?;
                 },
             }))
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index daf3e1d153d4..6bb1decd2af5 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -474,6 +474,7 @@ pub(crate) fn command_size<M>(command: &M) -> usize
 ///
 /// Provides the ability to send commands and receive messages from the GSP using a shared memory
 /// area.
+#[pin_data]
 pub(crate) struct Cmdq {
     /// Device this command queue belongs to.
     dev: ARef<device::Device>,
@@ -501,13 +502,11 @@ impl Cmdq {
     pub(crate) const NUM_PTES: usize = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
 
     /// Creates a new command queue for `dev`.
-    pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<Cmdq> {
-        let gsp_mem = DmaGspMem::new(dev)?;
-
-        Ok(Cmdq {
+    pub(crate) fn new(dev: &device::Device<device::Bound>) -> impl PinInit<Self, Error> + '_ {
+        try_pin_init!(Self {
+            gsp_mem: DmaGspMem::new(dev)?,
             dev: dev.into(),
             seq: 0,
-            gsp_mem,
         })
     }
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq
  2026-02-26 14:50 [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq Eliot Courtney
                   ` (2 preceding siblings ...)
  2026-02-26 14:50 ` [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type Eliot Courtney
@ 2026-02-26 14:50 ` Eliot Courtney
  2026-03-02 17:36   ` Gary Guo
  2026-02-26 18:48 ` [PATCH v2 0/4] gpu: nova-core: gsp: add " Zhi Wang
  4 siblings, 1 reply; 22+ messages in thread
From: Eliot Courtney @ 2026-02-26 14:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Eliot Courtney,
	Zhi Wang

Wrap `Cmdq`'s mutable state in a new struct `CmdqInner` and wrap that in
a Mutex. This lets `Cmdq` methods take &self instead of &mut self, which
lets required commands be sent e.g. while unloading the driver.

The mutex is held over both send and receive in `send_sync_command` to
make sure that it doesn't get the reply of some other command that could
have been sent just beforehand.

Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs      |   8 +-
 drivers/gpu/nova-core/gsp/cmdq.rs      | 266 ++++++++++++++++++---------------
 drivers/gpu/nova-core/gsp/commands.rs  |   4 +-
 drivers/gpu/nova-core/gsp/sequencer.rs |   2 +-
 4 files changed, 153 insertions(+), 127 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 55899eba75db..d12ad1bd2cd8 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -128,7 +128,7 @@ fn run_fwsec_frts(
     ///
     /// Upon return, the GSP is up and running, and its runtime object given as return value.
     pub(crate) fn boot(
-        mut self: Pin<&mut Self>,
+        self: Pin<&mut Self>,
         pdev: &pci::Device<device::Bound>,
         bar: &Bar0,
         chipset: Chipset,
@@ -214,13 +214,13 @@ pub(crate) fn boot(
             dev: pdev.as_ref().into(),
             bar,
         };
-        GspSequencer::run(&mut self.cmdq, seq_params)?;
+        GspSequencer::run(&self.cmdq, seq_params)?;
 
         // Wait until GSP is fully initialized.
-        commands::wait_gsp_init_done(&mut self.cmdq)?;
+        commands::wait_gsp_init_done(&self.cmdq)?;
 
         // Obtain and display basic GPU information.
-        let info = commands::get_gsp_info(&mut self.cmdq, bar)?;
+        let info = commands::get_gsp_info(&self.cmdq, bar)?;
         match info.gpu_name() {
             Ok(name) => dev_info!(pdev, "GPU name: {}\n", name),
             Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 6bb1decd2af5..5010587c96f9 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -16,8 +16,12 @@
     },
     dma_write,
     io::poll::read_poll_timeout,
+    new_mutex,
     prelude::*,
-    sync::aref::ARef,
+    sync::{
+        aref::ARef,
+        Mutex, //
+    },
     time::Delta,
     transmute::{
         AsBytes,
@@ -54,8 +58,8 @@
 
 /// Trait implemented by types representing a command to send to the GSP.
 ///
-/// The main purpose of this trait is to provide [`Cmdq::send_command`] with the information it
-/// needs to send a given command.
+/// The main purpose of this trait is to provide [`Cmdq`] with the information it needs to send
+/// a given command.
 ///
 /// [`CommandToGsp::init`] in particular is responsible for initializing the command directly
 /// into the space reserved for it in the command queue buffer.
@@ -470,66 +474,15 @@ pub(crate) fn command_size<M>(command: &M) -> usize
     size_of::<M::Command>() + command.variable_payload_len()
 }
 
-/// GSP command queue.
-///
-/// Provides the ability to send commands and receive messages from the GSP using a shared memory
-/// area.
-#[pin_data]
-pub(crate) struct Cmdq {
-    /// Device this command queue belongs to.
-    dev: ARef<device::Device>,
+/// Inner mutex protected state of [`Cmdq`].
+struct CmdqInner {
     /// Current command sequence number.
     seq: u32,
     /// Memory area shared with the GSP for communicating commands and messages.
     gsp_mem: DmaGspMem,
 }
 
-impl Cmdq {
-    /// Offset of the data after the PTEs.
-    const POST_PTE_OFFSET: usize = core::mem::offset_of!(GspMem, cpuq);
-
-    /// Offset of command queue ring buffer.
-    pub(crate) const CMDQ_OFFSET: usize = core::mem::offset_of!(GspMem, cpuq)
-        + core::mem::offset_of!(Msgq, msgq)
-        - Self::POST_PTE_OFFSET;
-
-    /// Offset of message queue ring buffer.
-    pub(crate) const STATQ_OFFSET: usize = core::mem::offset_of!(GspMem, gspq)
-        + core::mem::offset_of!(Msgq, msgq)
-        - Self::POST_PTE_OFFSET;
-
-    /// Number of page table entries for the GSP shared region.
-    pub(crate) const NUM_PTES: usize = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
-
-    /// Creates a new command queue for `dev`.
-    pub(crate) fn new(dev: &device::Device<device::Bound>) -> impl PinInit<Self, Error> + '_ {
-        try_pin_init!(Self {
-            gsp_mem: DmaGspMem::new(dev)?,
-            dev: dev.into(),
-            seq: 0,
-        })
-    }
-
-    /// Computes the checksum for the message pointed to by `it`.
-    ///
-    /// A message is made of several parts, so `it` is an iterator over byte slices representing
-    /// these parts.
-    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
-        let sum64 = it
-            .enumerate()
-            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
-            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
-
-        ((sum64 >> 32) as u32) ^ (sum64 as u32)
-    }
-
-    /// Notifies the GSP that we have updated the command queue pointers.
-    fn notify_gsp(bar: &Bar0) {
-        regs::NV_PGSP_QUEUE_HEAD::default()
-            .set_address(0)
-            .write(bar);
-    }
-
+impl CmdqInner {
     /// Sends `command` to the GSP, without splitting it.
     ///
     /// # Errors
@@ -540,7 +493,7 @@ fn notify_gsp(bar: &Bar0) {
     ///   written to by its [`CommandToGsp::init_variable_payload`] method.
     ///
     /// Error codes returned by the command initializers are propagated as-is.
-    fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
+    fn send_single_command<M>(&mut self, dev: &device::Device, bar: &Bar0, command: M) -> Result
     where
         M: CommandToGsp,
         // This allows all error types, including `Infallible`, to be used for `M::InitError`.
@@ -583,7 +536,7 @@ fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
             ])));
 
         dev_dbg!(
-            &self.dev,
+            dev,
             "GSP RPC: send: seq# {}, function={:?}, length=0x{:x}\n",
             self.seq,
             M::FUNCTION,
@@ -610,73 +563,27 @@ fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
     ///   written to by its [`CommandToGsp::init_variable_payload`] method.
     ///
     /// Error codes returned by the command initializers are propagated as-is.
-    fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
+    fn send_command<M>(&mut self, dev: &device::Device, bar: &Bar0, command: M) -> Result
     where
         M: CommandToGsp,
         Error: From<M::InitError>,
     {
         let mut state = SplitState::new(&command)?;
-
-        self.send_single_command(bar, state.command(command))?;
+        self.send_single_command(dev, bar, state.command(command))?;
 
         while let Some(continuation) = state.next_continuation_record() {
             dev_dbg!(
-                &self.dev,
+                dev,
                 "GSP RPC: send continuation: size=0x{:x}\n",
                 command_size(&continuation),
             );
             // Turbofish needed because the compiler cannot infer M here.
-            self.send_single_command::<ContinuationRecord<'_>>(bar, continuation)?;
+            self.send_single_command::<ContinuationRecord<'_>>(dev, bar, continuation)?;
         }
 
         Ok(())
     }
 
-    /// Sends `command` to the GSP and waits for the reply.
-    ///
-    /// # Errors
-    ///
-    /// - `ETIMEDOUT` if space does not become available to send the command, or if the reply is
-    ///   not received within the timeout.
-    /// - `EIO` if the variable payload requested by the command has not been entirely
-    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
-    ///
-    /// Error codes returned by the command and reply initializers are propagated as-is.
-    pub(crate) fn send_sync_command<M>(&mut self, bar: &Bar0, command: M) -> Result<M::Reply>
-    where
-        M: CommandToGsp,
-        M::Reply: MessageFromGsp,
-        Error: From<M::InitError>,
-        Error: From<<M::Reply as MessageFromGsp>::InitError>,
-    {
-        self.send_command(bar, command)?;
-
-        loop {
-            match self.receive_msg::<M::Reply>(Delta::from_secs(10)) {
-                Ok(reply) => break Ok(reply),
-                Err(ERANGE) => continue,
-                Err(e) => break Err(e),
-            }
-        }
-    }
-
-    /// Sends `command` to the GSP without waiting for a reply.
-    ///
-    /// # Errors
-    ///
-    /// - `ETIMEDOUT` if space does not become available within the timeout.
-    /// - `EIO` if the variable payload requested by the command has not been entirely
-    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
-    ///
-    /// Error codes returned by the command initializers are propagated as-is.
-    pub(crate) fn send_async_command<M>(&mut self, bar: &Bar0, command: M) -> Result
-    where
-        M: CommandToGsp<Reply = NoReply>,
-        Error: From<M::InitError>,
-    {
-        self.send_command(bar, command)
-    }
-
     /// Wait for a message to become available on the message queue.
     ///
     /// This works purely at the transport layer and does not interpret or validate the message
@@ -695,7 +602,7 @@ pub(crate) fn send_async_command<M>(&mut self, bar: &Bar0, command: M) -> Result
     ///   message queue.
     ///
     /// Error codes returned by the message constructor are propagated as-is.
-    fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
+    fn wait_for_msg(&self, dev: &device::Device, timeout: Delta) -> Result<GspMessage<'_>> {
         // Wait for a message to arrive from the GSP.
         let (slice_1, slice_2) = read_poll_timeout(
             || Ok(self.gsp_mem.driver_read_area()),
@@ -712,7 +619,7 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
         let (header, slice_1) = GspMsgElement::from_bytes_prefix(slice_1).ok_or(EIO)?;
 
         dev_dbg!(
-            self.dev,
+            dev,
             "GSP RPC: receive: seq# {}, function={:?}, length=0x{:x}\n",
             header.sequence(),
             header.function(),
@@ -747,7 +654,7 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
         ])) != 0
         {
             dev_err!(
-                self.dev,
+                dev,
                 "GSP RPC: receive: Call {} - bad checksum\n",
                 header.sequence()
             );
@@ -776,12 +683,12 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
     /// - `ERANGE` if the message had a recognized but non-matching function code.
     ///
     /// Error codes returned by [`MessageFromGsp::read`] are propagated as-is.
-    pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Result<M>
+    fn receive_msg<M: MessageFromGsp>(&mut self, dev: &device::Device, timeout: Delta) -> Result<M>
     where
         // This allows all error types, including `Infallible`, to be used for `M::InitError`.
         Error: From<M::InitError>,
     {
-        let message = self.wait_for_msg(timeout)?;
+        let message = self.wait_for_msg(dev, timeout)?;
         let function = message.header.function().map_err(|_| EINVAL)?;
 
         // Extract the message. Store the result as we want to advance the read pointer even in
@@ -794,11 +701,7 @@ pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Resul
                 .map_err(|e| e.into())
                 .inspect(|_| {
                     if !sbuffer.is_empty() {
-                        dev_warn!(
-                            &self.dev,
-                            "GSP message {:?} has unprocessed data\n",
-                            function
-                        );
+                        dev_warn!(dev, "GSP message {:?} has unprocessed data\n", function);
                     }
                 })
         } else {
@@ -812,9 +715,132 @@ pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Resul
 
         result
     }
+}
+
+/// GSP command queue.
+///
+/// Provides the ability to send commands and receive messages from the GSP using a shared memory
+/// area.
+#[pin_data]
+pub(crate) struct Cmdq {
+    /// Device this command queue belongs to.
+    dev: ARef<device::Device>,
+    /// Inner mutex-protected state.
+    #[pin]
+    inner: Mutex<CmdqInner>,
+}
+
+impl Cmdq {
+    /// Offset of the data after the PTEs.
+    const POST_PTE_OFFSET: usize = core::mem::offset_of!(GspMem, cpuq);
+
+    /// Offset of command queue ring buffer.
+    pub(crate) const CMDQ_OFFSET: usize = core::mem::offset_of!(GspMem, cpuq)
+        + core::mem::offset_of!(Msgq, msgq)
+        - Self::POST_PTE_OFFSET;
+
+    /// Offset of message queue ring buffer.
+    pub(crate) const STATQ_OFFSET: usize = core::mem::offset_of!(GspMem, gspq)
+        + core::mem::offset_of!(Msgq, msgq)
+        - Self::POST_PTE_OFFSET;
+
+    /// Number of page table entries for the GSP shared region.
+    pub(crate) const NUM_PTES: usize = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
+
+    /// Creates a new command queue for `dev`.
+    pub(crate) fn new(dev: &device::Device<device::Bound>) -> impl PinInit<Self, Error> + '_ {
+        try_pin_init!(Self {
+            inner <- new_mutex!(CmdqInner {
+                gsp_mem: DmaGspMem::new(dev)?,
+                seq: 0,
+                }),
+            dev: dev.into(),
+        })
+    }
+
+    /// Computes the checksum for the message pointed to by `it`.
+    ///
+    /// A message is made of several parts, so `it` is an iterator over byte slices representing
+    /// these parts.
+    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
+        let sum64 = it
+            .enumerate()
+            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
+            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
+
+        ((sum64 >> 32) as u32) ^ (sum64 as u32)
+    }
+
+    /// Notifies the GSP that we have updated the command queue pointers.
+    fn notify_gsp(bar: &Bar0) {
+        regs::NV_PGSP_QUEUE_HEAD::default()
+            .set_address(0)
+            .write(bar);
+    }
+
+    /// Sends `command` to the GSP and waits for the reply.
+    ///
+    /// The mutex is held for the entire send+receive cycle to ensure that no other command can
+    /// be interleaved. Messages with non-matching function codes are silently consumed until the
+    /// expected reply arrives.
+    ///
+    /// # Errors
+    ///
+    /// - `ETIMEDOUT` if space does not become available to send the command, or if the reply is
+    ///   not received within the timeout.
+    /// - `EIO` if the variable payload requested by the command has not been entirely
+    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
+    ///
+    /// Error codes returned by the command and reply initializers are propagated as-is.
+    pub(crate) fn send_sync_command<M>(&self, bar: &Bar0, command: M) -> Result<M::Reply>
+    where
+        M: CommandToGsp,
+        M::Reply: MessageFromGsp,
+        Error: From<M::InitError>,
+        Error: From<<M::Reply as MessageFromGsp>::InitError>,
+    {
+        let mut inner = self.inner.lock();
+        inner.send_command(&self.dev, bar, command)?;
+
+        loop {
+            match inner.receive_msg::<M::Reply>(&self.dev, Delta::from_secs(10)) {
+                Ok(reply) => break Ok(reply),
+                Err(ERANGE) => continue,
+                Err(e) => break Err(e),
+            }
+        }
+    }
+
+    /// Sends `command` to the GSP without waiting for a reply.
+    ///
+    /// # Errors
+    ///
+    /// - `ETIMEDOUT` if space does not become available within the timeout.
+    /// - `EIO` if the variable payload requested by the command has not been entirely
+    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
+    ///
+    /// Error codes returned by the command initializers are propagated as-is.
+    pub(crate) fn send_async_command<M>(&self, bar: &Bar0, command: M) -> Result
+    where
+        M: CommandToGsp<Reply = NoReply>,
+        Error: From<M::InitError>,
+    {
+        self.inner.lock().send_command(&self.dev, bar, command)
+    }
+
+    /// Receive a message from the GSP.
+    ///
+    /// See [`CmdqInner::receive_msg`] for details.
+    pub(crate) fn receive_msg<M: MessageFromGsp>(&self, timeout: Delta) -> Result<M>
+    where
+        // This allows all error types, including `Infallible`, to be used for `M::InitError`.
+        Error: From<M::InitError>,
+    {
+        self.inner.lock().receive_msg(&self.dev, timeout)
+    }
 
     /// Returns the DMA handle of the command queue's shared memory region.
     pub(crate) fn dma_handle(&self) -> DmaAddress {
-        self.gsp_mem.0.dma_handle()
+        self.inner.lock().gsp_mem.0.dma_handle()
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 47ca31611927..4740cda0b51c 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -170,7 +170,7 @@ fn read(
 }
 
 /// Waits for GSP initialization to complete.
-pub(crate) fn wait_gsp_init_done(cmdq: &mut Cmdq) -> Result {
+pub(crate) fn wait_gsp_init_done(cmdq: &Cmdq) -> Result {
     loop {
         match cmdq.receive_msg::<GspInitDone>(Delta::from_secs(10)) {
             Ok(_) => break Ok(()),
@@ -239,7 +239,7 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
 }
 
 /// Send the [`GetGspInfo`] command and awaits for its reply.
-pub(crate) fn get_gsp_info(cmdq: &mut Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
+pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
     cmdq.send_sync_command(bar, GetGspStaticInfo)
 }
 
diff --git a/drivers/gpu/nova-core/gsp/sequencer.rs b/drivers/gpu/nova-core/gsp/sequencer.rs
index 0cfbedc47fcf..f99f4fe652ba 100644
--- a/drivers/gpu/nova-core/gsp/sequencer.rs
+++ b/drivers/gpu/nova-core/gsp/sequencer.rs
@@ -356,7 +356,7 @@ pub(crate) struct GspSequencerParams<'a> {
 }
 
 impl<'a> GspSequencer<'a> {
-    pub(crate) fn run(cmdq: &mut Cmdq, params: GspSequencerParams<'a>) -> Result {
+    pub(crate) fn run(cmdq: &Cmdq, params: GspSequencerParams<'a>) -> Result {
         let seq_info = loop {
             match cmdq.receive_msg::<GspSequence>(Delta::from_secs(10)) {
                 Ok(seq_info) => break seq_info,

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq
  2026-02-26 14:50 [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq Eliot Courtney
                   ` (3 preceding siblings ...)
  2026-02-26 14:50 ` [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq Eliot Courtney
@ 2026-02-26 18:48 ` Zhi Wang
  4 siblings, 0 replies; 22+ messages in thread
From: Zhi Wang @ 2026-02-26 18:48 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, nouveau, dri-devel,
	linux-kernel, rust-for-linux

On Thu, 26 Feb 2026 23:50:22 +0900
Eliot Courtney <ecourtney@nvidia.com> wrote:

Tested-by: Zhi Wang <zhiw@nvidia.com>

> Add locking to Cmdq. This is required e.g. for unloading the driver,
> which needs to send the UnloadingGuestDriver via the command queue
> on unbind which may be on a different thread.
> 
> We have commands that need a reply ("sync") and commands that don't
> ("async"). For sync commands we want to make sure that they don't get
> the reply of a different command back. The approach this patch series
> takes is by making sync commands block until they get a response. For
> now this should be ok, and we expect GSP to be fast anyway.
> 
> To do this, we need to know which commands expect a reply and which
> don't. John's existing series[1] adds IS_ASYNC which solves part of
> the problem, but we need to know a bit more. So instead, add an
> associated type called Reply which tells us what the reply is.
> 
> An alternative would be to define traits inheriting CommandToGsp, e.g.
> SyncCommand and AsyncCommand, instead of using the associated type. I
> implemented the associated type version because it feels more
> compositional rather than inherity so seemed a bit better to me. But
> both of these approaches work and are fine, IMO.
> 
> In summary, this patch series has three steps:
> 1. Add the type infrastructure to know what replies are expected for a
>    command and update each caller to explicitly send a sync or async
>    command.
> 2. Make Cmdq pinned so we can use Mutex
> 3. Add a Mutex to protect Cmdq by moving the relevant state to an
>    inner struct.
> 
> [1]:
> https://lore.kernel.org/all/20260211000451.192109-1-jhubbard@nvidia.com/
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
> Changes in v2:
> - Rebase on drm-rust-next
> - Link to v1:
> https://lore.kernel.org/r/20260225-cmdq-locking-v1-0-bbf6b4156706@nvidia.com
> 
> ---
> Eliot Courtney (4):
>       gpu: nova-core: gsp: fix stale doc comments on command queue
> methods gpu: nova-core: gsp: add sync and async command queue API to
> `Cmdq` gpu: nova-core: gsp: make `Cmdq` a pinned type
>       gpu: nova-core: gsp: add mutex locking to Cmdq
> 
>  drivers/gpu/nova-core/gsp.rs           |   5 +-
>  drivers/gpu/nova-core/gsp/boot.rs      |  13 +-
>  drivers/gpu/nova-core/gsp/cmdq.rs      | 246
> +++++++++++++++++++++------------
> drivers/gpu/nova-core/gsp/commands.rs  |  23 ++-
> drivers/gpu/nova-core/gsp/sequencer.rs |   2 +- 5 files changed, 183
> insertions(+), 106 deletions(-) ---
> base-commit: 4a49fe23e357b48845e31fe9c28a802c05458198
> change-id: 20260225-cmdq-locking-d32928a2c2cf
> prerequisite-message-id:
> <20260226-cmdq-continuation-v3-0-572ab9916766@nvidia.com>
> prerequisite-patch-id: fd45bc5b8eda5e2b54a052dddb1a1c363107f0a7
> prerequisite-patch-id: d0f59ef489346e978a222755f9fb42dfe7af19e5
> prerequisite-patch-id: 8844970d0e387488c70979a73732579ba174b46c
> prerequisite-patch-id: e138a94ed48fa8d50d5ed1eb36524f98923ed478
> prerequisite-patch-id: 4599a5e90d6665fa3acb7d4045de5d378ac28b4d
> prerequisite-patch-id: 30ed64c398e541d6efbcb2e46ed9a9e6cf953f4f
> prerequisite-patch-id: 9a965e9f29c8680c0b554e656ff8e9a1bfc67280
> prerequisite-patch-id: d8cccc3dfb070f304805fc7e0f24121809b4b300
> prerequisite-patch-id: c0a73dfd1fb630ab02486f0180b90f8fe850b4dc
> 
> Best regards,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-02-26 14:50 ` [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq` Eliot Courtney
@ 2026-02-28  6:11   ` John Hubbard
  2026-03-02  2:22     ` Eliot Courtney
  2026-03-02 12:28     ` Danilo Krummrich
  0 siblings, 2 replies; 22+ messages in thread
From: John Hubbard @ 2026-02-28  6:11 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alice Ryhl, Alexandre Courbot,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux

On 2/26/26 7:50 AM, Eliot Courtney wrote:
> Add sync and async command queue API and the type infrastructure to know
> what reply is expected from each `CommandToGsp`.
> 
> Use a marker type `NoReply` which does not implement `MessageFromGsp` to
> mark async commands which don't expect a response.
> 
...
> +    /// Type of the reply expected from the GSP, or [`NoReply`] for async commands.

Hi Eliot,

The following does not need to hold up your patchset, but I want
to bring it up somewhere just to work through it.

The sync/async naming that GSP RM uses is a little bit "off". I
spent some time discussing it with them, and the problem is that
sync/async is a concept that is somewhat independent of whether
a reply is expected. Usually, sync means a blocking wait for a
response, which is not necessarily required in all case with
GSP RM calls.

The naming would be better here if it reflected simply that
a response is expected, or not. I don't have great names for
that, but "fire and forget" works well for what we have so
far called "async". So we could do create a convention in which
no annotation means that the API has a response that will come
back, and some abbreviated for of "fire and forget" or "one way"
added to the function name would mean that no response is
expected.

Again, I don't think this has to happen here, because we can
go through and rename later, no problem there. But when I saw
the sync/asynch and remembered the very recent discussion, I
figured I'd better post something about it.

And yes, I started us off in the wrong direction with the
IS_ASYNCH thing! haha

thanks,
-- 
John Hubbard

> +    type Reply;
> +
>       /// Error type returned by [`CommandToGsp::init`].
>       type InitError;
>   
> @@ -604,7 +611,7 @@ fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>       ///   written to by its [`CommandToGsp::init_variable_payload`] method.
>       ///
>       /// Error codes returned by the command initializers are propagated as-is.
> -    pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
> +    fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>       where
>           M: CommandToGsp,
>           Error: From<M::InitError>,
> @@ -626,6 +633,51 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>           Ok(())
>       }
>   
> +    /// Sends `command` to the GSP and waits for the reply.
> +    ///
> +    /// # Errors
> +    ///
> +    /// - `ETIMEDOUT` if space does not become available to send the command, or if the reply is
> +    ///   not received within the timeout.
> +    /// - `EIO` if the variable payload requested by the command has not been entirely
> +    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
> +    ///
> +    /// Error codes returned by the command and reply initializers are propagated as-is.
> +    pub(crate) fn send_sync_command<M>(&mut self, bar: &Bar0, command: M) -> Result<M::Reply>
> +    where
> +        M: CommandToGsp,
> +        M::Reply: MessageFromGsp,
> +        Error: From<M::InitError>,
> +        Error: From<<M::Reply as MessageFromGsp>::InitError>,
> +    {
> +        self.send_command(bar, command)?;
> +
> +        loop {
> +            match self.receive_msg::<M::Reply>(Delta::from_secs(10)) {
> +                Ok(reply) => break Ok(reply),
> +                Err(ERANGE) => continue,
> +                Err(e) => break Err(e),
> +            }
> +        }
> +    }
> +
> +    /// Sends `command` to the GSP without waiting for a reply.
> +    ///
> +    /// # Errors
> +    ///
> +    /// - `ETIMEDOUT` if space does not become available within the timeout.
> +    /// - `EIO` if the variable payload requested by the command has not been entirely
> +    ///   written to by its [`CommandToGsp::init_variable_payload`] method.
> +    ///
> +    /// Error codes returned by the command initializers are propagated as-is.
> +    pub(crate) fn send_async_command<M>(&mut self, bar: &Bar0, command: M) -> Result
> +    where
> +        M: CommandToGsp<Reply = NoReply>,
> +        Error: From<M::InitError>,
> +    {
> +        self.send_command(bar, command)
> +    }
> +
>       /// Wait for a message to become available on the message queue.
>       ///
>       /// This works purely at the transport layer and does not interpret or validate the message
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index 74f875755e53..47ca31611927 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -26,7 +26,8 @@
>               command_size,
>               Cmdq,
>               CommandToGsp,
> -            MessageFromGsp, //
> +            MessageFromGsp,
> +            NoReply, //
>           },
>           fw::{
>               commands::*,
> @@ -53,6 +54,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>) -> Self {
>   impl<'a> CommandToGsp for SetSystemInfo<'a> {
>       const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
>       type Command = GspSetSystemInfo;
> +    type Reply = NoReply;
>       type InitError = Error;
>   
>       fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> @@ -104,6 +106,7 @@ pub(crate) fn new() -> Self {
>   impl CommandToGsp for SetRegistry {
>       const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
>       type Command = PackedRegistryTable;
> +    type Reply = NoReply;
>       type InitError = Infallible;
>   
>       fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> @@ -183,6 +186,7 @@ pub(crate) fn wait_gsp_init_done(cmdq: &mut Cmdq) -> Result {
>   impl CommandToGsp for GetGspStaticInfo {
>       const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
>       type Command = GspStaticConfigInfo;
> +    type Reply = GetGspStaticInfoReply;
>       type InitError = Infallible;
>   
>       fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> @@ -236,15 +240,7 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
>   
>   /// Send the [`GetGspInfo`] command and awaits for its reply.
>   pub(crate) fn get_gsp_info(cmdq: &mut Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
> -    cmdq.send_command(bar, GetGspStaticInfo)?;
> -
> -    loop {
> -        match cmdq.receive_msg::<GetGspStaticInfoReply>(Delta::from_secs(5)) {
> -            Ok(info) => return Ok(info),
> -            Err(ERANGE) => continue,
> -            Err(e) => return Err(e),
> -        }
> -    }
> +    cmdq.send_sync_command(bar, GetGspStaticInfo)
>   }
>   
>   /// The `ContinuationRecord` command.
> @@ -262,6 +258,7 @@ pub(crate) fn new(data: &'a [u8]) -> Self {
>   impl<'a> CommandToGsp for ContinuationRecord<'a> {
>       const FUNCTION: MsgFunction = MsgFunction::ContinuationRecord;
>       type Command = ();
> +    type Reply = NoReply;
>       type InitError = Infallible;
>   
>       fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> @@ -356,6 +353,7 @@ pub(crate) enum SplitCommand<'a, C: CommandToGsp> {
>   impl<'a, C: CommandToGsp> CommandToGsp for SplitCommand<'a, C> {
>       const FUNCTION: MsgFunction = C::FUNCTION;
>       type Command = C::Command;
> +    type Reply = C::Reply;
>       type InitError = C::InitError;
>   
>       fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> @@ -412,6 +410,7 @@ fn new(len: usize) -> Result<Self> {
>       impl CommandToGsp for TestPayload {
>           const FUNCTION: MsgFunction = MsgFunction::Nop;
>           type Command = ();
> +        type Reply = NoReply;
>           type InitError = Infallible;
>   
>           fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> 




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-02-28  6:11   ` John Hubbard
@ 2026-03-02  2:22     ` Eliot Courtney
  2026-03-02  2:44       ` John Hubbard
  2026-03-02  3:03       ` Alexandre Courbot
  2026-03-02 12:28     ` Danilo Krummrich
  1 sibling, 2 replies; 22+ messages in thread
From: Eliot Courtney @ 2026-03-02  2:22 UTC (permalink / raw)
  To: John Hubbard, Eliot Courtney, Danilo Krummrich, Alice Ryhl,
	Alexandre Courbot, David Airlie, Simona Vetter, Benno Lossin,
	Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>> Add sync and async command queue API and the type infrastructure to know
>> what reply is expected from each `CommandToGsp`.
>> 
>> Use a marker type `NoReply` which does not implement `MessageFromGsp` to
>> mark async commands which don't expect a response.
>> 
> ...
>> +    /// Type of the reply expected from the GSP, or [`NoReply`] for async commands.
>
> Hi Eliot,
>
> The following does not need to hold up your patchset, but I want
> to bring it up somewhere just to work through it.
>
> The sync/async naming that GSP RM uses is a little bit "off". I
> spent some time discussing it with them, and the problem is that
> sync/async is a concept that is somewhat independent of whether
> a reply is expected. Usually, sync means a blocking wait for a
> response, which is not necessarily required in all case with
> GSP RM calls.
>
> The naming would be better here if it reflected simply that
> a response is expected, or not. I don't have great names for
> that, but "fire and forget" works well for what we have so
> far called "async". So we could do create a convention in which
> no annotation means that the API has a response that will come
> back, and some abbreviated for of "fire and forget" or "one way"
> added to the function name would mean that no response is
> expected.
>
> Again, I don't think this has to happen here, because we can
> go through and rename later, no problem there. But when I saw
> the sync/asynch and remembered the very recent discussion, I
> figured I'd better post something about it.
>
> And yes, I started us off in the wrong direction with the
> IS_ASYNCH thing! haha
>
> thanks,

Hi John,

I totally agree and was hoping that someone would have a good suggestion
for this. I discussed this exact thing with Alex before posting this
too. So if you have any naming suggestions would love to hear them.

As you say, sync and async are orthogonal to reply vs no reply. I think
we have several ideas here actually:
- blocking vs non-blocking
- reply vs no-reply
- wait for reply vs don't wait for reply (practically equivalent to
  blocking vs non-blocking here, but conceptually the send could also be
  blocking vs non-blocking)

We should also be careful with conflating waiting for the reply vs not
having a reply. So `send_without_waiting_for_reply` is definitely
confusing to me, because there may be a reply that we just don't wait
for.

Some ideas:
- send_command_with_reply + send_command_without_reply
  - Maybe non-obvious that this blocks for send_command_with_reply.
- send_and_wait_for_reply + send_no_reply
  - More obvious that it blocks and gets the reply.
  - Should be obvious from context that you are sending a command
    anyway.

Personally I think it's nice to keep a convention of keeping it
mostly obvious which functions block/wait. (e.g. we already have
wait_for_msg in cmdq.rs).

For lack of a better idea  i suggest send_and_wait_for_reply +
send_no_reply for now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02  2:22     ` Eliot Courtney
@ 2026-03-02  2:44       ` John Hubbard
  2026-03-02  3:03       ` Alexandre Courbot
  1 sibling, 0 replies; 22+ messages in thread
From: John Hubbard @ 2026-03-02  2:44 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alice Ryhl, Alexandre Courbot,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, dri-devel

On 3/1/26 6:22 PM, Eliot Courtney wrote:
> On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
>> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>>> Add sync and async command queue API and the type infrastructure to know
>>> what reply is expected from each `CommandToGsp`.
>>>
>>> Use a marker type `NoReply` which does not implement `MessageFromGsp` to
>>> mark async commands which don't expect a response.
>>>
>> ...
> Hi John,
> 
> I totally agree and was hoping that someone would have a good suggestion
> for this. I discussed this exact thing with Alex before posting this
> too. So if you have any naming suggestions would love to hear them.
> 
> As you say, sync and async are orthogonal to reply vs no reply. I think
> we have several ideas here actually:
> - blocking vs non-blocking
> - reply vs no-reply
> - wait for reply vs don't wait for reply (practically equivalent to
>    blocking vs non-blocking here, but conceptually the send could also be
>    blocking vs non-blocking)
> 
> We should also be careful with conflating waiting for the reply vs not
> having a reply. So `send_without_waiting_for_reply` is definitely
> confusing to me, because there may be a reply that we just don't wait
> for.
> 
> Some ideas:
> - send_command_with_reply + send_command_without_reply
>    - Maybe non-obvious that this blocks for send_command_with_reply.
> - send_and_wait_for_reply + send_no_reply
>    - More obvious that it blocks and gets the reply.
>    - Should be obvious from context that you are sending a command
>      anyway.
> 
> Personally I think it's nice to keep a convention of keeping it
> mostly obvious which functions block/wait. (e.g. we already have
> wait_for_msg in cmdq.rs).
> 
> For lack of a better idea  i suggest send_and_wait_for_reply +
> send_no_reply for now.

That sounds great, I'd love to start with that. Simple and clear.

One thing: "send no reply" sounds a lot like "do not send a reply". :)
So maybe:

     send_and_wait_for_reply()
     send_without_waiting_for_reply()

? Or something along those lines? It's really verbose I know...

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02  2:22     ` Eliot Courtney
  2026-03-02  2:44       ` John Hubbard
@ 2026-03-02  3:03       ` Alexandre Courbot
  2026-03-02  3:08         ` John Hubbard
  2026-03-02 17:26         ` Gary Guo
  1 sibling, 2 replies; 22+ messages in thread
From: Alexandre Courbot @ 2026-03-02  3:03 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, nouveau, dri-devel,
	linux-kernel, rust-for-linux, dri-devel

On Mon Mar 2, 2026 at 11:22 AM JST, Eliot Courtney wrote:
> On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
>> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>>> Add sync and async command queue API and the type infrastructure to know
>>> what reply is expected from each `CommandToGsp`.
>>> 
>>> Use a marker type `NoReply` which does not implement `MessageFromGsp` to
>>> mark async commands which don't expect a response.
>>> 
>> ...
>>> +    /// Type of the reply expected from the GSP, or [`NoReply`] for async commands.
>>
>> Hi Eliot,
>>
>> The following does not need to hold up your patchset, but I want
>> to bring it up somewhere just to work through it.
>>
>> The sync/async naming that GSP RM uses is a little bit "off". I
>> spent some time discussing it with them, and the problem is that
>> sync/async is a concept that is somewhat independent of whether
>> a reply is expected. Usually, sync means a blocking wait for a
>> response, which is not necessarily required in all case with
>> GSP RM calls.
>>
>> The naming would be better here if it reflected simply that
>> a response is expected, or not. I don't have great names for
>> that, but "fire and forget" works well for what we have so
>> far called "async". So we could do create a convention in which
>> no annotation means that the API has a response that will come
>> back, and some abbreviated for of "fire and forget" or "one way"
>> added to the function name would mean that no response is
>> expected.
>>
>> Again, I don't think this has to happen here, because we can
>> go through and rename later, no problem there. But when I saw
>> the sync/asynch and remembered the very recent discussion, I
>> figured I'd better post something about it.
>>
>> And yes, I started us off in the wrong direction with the
>> IS_ASYNCH thing! haha
>>
>> thanks,
>
> Hi John,
>
> I totally agree and was hoping that someone would have a good suggestion
> for this. I discussed this exact thing with Alex before posting this
> too. So if you have any naming suggestions would love to hear them.
>
> As you say, sync and async are orthogonal to reply vs no reply. I think
> we have several ideas here actually:
> - blocking vs non-blocking
> - reply vs no-reply
> - wait for reply vs don't wait for reply (practically equivalent to
>   blocking vs non-blocking here, but conceptually the send could also be
>   blocking vs non-blocking)
>
> We should also be careful with conflating waiting for the reply vs not
> having a reply. So `send_without_waiting_for_reply` is definitely
> confusing to me, because there may be a reply that we just don't wait
> for.
>
> Some ideas:
> - send_command_with_reply + send_command_without_reply
>   - Maybe non-obvious that this blocks for send_command_with_reply.
> - send_and_wait_for_reply + send_no_reply
>   - More obvious that it blocks and gets the reply.
>   - Should be obvious from context that you are sending a command
>     anyway.
>
> Personally I think it's nice to keep a convention of keeping it
> mostly obvious which functions block/wait. (e.g. we already have
> wait_for_msg in cmdq.rs).
>
> For lack of a better idea  i suggest send_and_wait_for_reply +
> send_no_reply for now.

One important detail IMHO is that the API cannot be misused, i.e. you
cannot call the fire-and-forget send method on a command that expects a
reply. So the risk is mostly when adding support for a new command - but
if that step is done properly, users will be directed to the right
method by the compiler.

This, I think, allows us to tolerate more ambiguity in the method names,
as long as their documentation makes up for it. We all agree that
`async` and `sync` are not a good fit, but `send`/`send_noreply` should
be tolerable (I'd like to keep the names short if possible)

Or maybe we can use a variant of the trick mentioned by Gary in [1] and
have a single `send_command` method?

[1] https://lore.kernel.org/all/DGRJJA3068FV.3CE7J7SKLTN8O@garyguo.net/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02  3:03       ` Alexandre Courbot
@ 2026-03-02  3:08         ` John Hubbard
  2026-03-02  4:42           ` Eliot Courtney
  2026-03-02 17:26         ` Gary Guo
  1 sibling, 1 reply; 22+ messages in thread
From: John Hubbard @ 2026-03-02  3:08 UTC (permalink / raw)
  To: Alexandre Courbot, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Benno Lossin, Gary Guo, nouveau, dri-devel, linux-kernel,
	rust-for-linux, dri-devel

On 3/1/26 7:03 PM, Alexandre Courbot wrote:
> On Mon Mar 2, 2026 at 11:22 AM JST, Eliot Courtney wrote:
>> On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
>>> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>>>> Add sync and async command queue API and the type infrastructure to know
>>>> what reply is expected from each `CommandToGsp`.
...
>> For lack of a better idea  i suggest send_and_wait_for_reply +
>> send_no_reply for now.
> 
> One important detail IMHO is that the API cannot be misused, i.e. you
> cannot call the fire-and-forget send method on a command that expects a
> reply. So the risk is mostly when adding support for a new command - but
> if that step is done properly, users will be directed to the right
> method by the compiler.

Naming is not *just* about risk of someone misusing it. It's about
being able to read things on the screen and having a good chance of
understanding it.

> 
> This, I think, allows us to tolerate more ambiguity in the method names,
> as long as their documentation makes up for it. We all agree that

Really, no. Let's do our best on naming, *in addition* to the documentation
and having Rust help lock things down.

> `async` and `sync` are not a good fit, but `send`/`send_noreply` should
> be tolerable (I'd like to keep the names short if possible)
> 
> Or maybe we can use a variant of the trick mentioned by Gary in [1] and
> have a single `send_command` method?
> 
> [1] https://lore.kernel.org/all/DGRJJA3068FV.3CE7J7SKLTN8O@garyguo.net/

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02  3:08         ` John Hubbard
@ 2026-03-02  4:42           ` Eliot Courtney
  2026-03-02  5:31             ` Alexandre Courbot
  0 siblings, 1 reply; 22+ messages in thread
From: Eliot Courtney @ 2026-03-02  4:42 UTC (permalink / raw)
  To: John Hubbard, Alexandre Courbot, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Benno Lossin, Gary Guo, nouveau, dri-devel, linux-kernel,
	rust-for-linux, dri-devel

On Mon Mar 2, 2026 at 12:08 PM JST, John Hubbard wrote:
> On 3/1/26 7:03 PM, Alexandre Courbot wrote:
>> On Mon Mar 2, 2026 at 11:22 AM JST, Eliot Courtney wrote:
>>> On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
>>>> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>>>>> Add sync and async command queue API and the type infrastructure to know
>>>>> what reply is expected from each `CommandToGsp`.
> ...
>>> For lack of a better idea  i suggest send_and_wait_for_reply +
>>> send_no_reply for now.
>> 
>> One important detail IMHO is that the API cannot be misused, i.e. you
>> cannot call the fire-and-forget send method on a command that expects a
>> reply. So the risk is mostly when adding support for a new command - but
>> if that step is done properly, users will be directed to the right
>> method by the compiler.
>
> Naming is not *just* about risk of someone misusing it. It's about
> being able to read things on the screen and having a good chance of
> understanding it.
>
>> 
>> This, I think, allows us to tolerate more ambiguity in the method names,
>> as long as their documentation makes up for it. We all agree that
>
> Really, no. Let's do our best on naming, *in addition* to the documentation
> and having Rust help lock things down.

I personally agree with this take, and if it takes a verbose name to
make it clear then I feel it's unfortunate but there's no way around it.
Especially since we have a few different concepts to distinguish between.

Agreed that we are safe writing it since the type system will help us
though. So mainly just optimising on how easy it is to read.

At least the current usages don't seem to end up on long lines, so I
don't think it will introduce too much noise to have the function names
be a bit longer.

>
>> `async` and `sync` are not a good fit, but `send`/`send_noreply` should
>> be tolerable (I'd like to keep the names short if possible)
>> 
>> Or maybe we can use a variant of the trick mentioned by Gary in [1] and
>> have a single `send_command` method?
>> 
>> [1] https://lore.kernel.org/all/DGRJJA3068FV.3CE7J7SKLTN8O@garyguo.net/

I guess this would be the approach of maximially not implying anything
with the function name and relying on usage context to see what it's
trying to do. But IMO it might be better to give a clear name than to
introduce more type machinery here since this API will be much less
widely used.

>
> thanks,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02  4:42           ` Eliot Courtney
@ 2026-03-02  5:31             ` Alexandre Courbot
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Courbot @ 2026-03-02  5:31 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Alice Ryhl, Simona Vetter,
	Benno Lossin, Gary Guo, nouveau, dri-devel, linux-kernel,
	rust-for-linux, dri-devel

On Mon Mar 2, 2026 at 1:42 PM JST, Eliot Courtney wrote:
> On Mon Mar 2, 2026 at 12:08 PM JST, John Hubbard wrote:
>> On 3/1/26 7:03 PM, Alexandre Courbot wrote:
>>> On Mon Mar 2, 2026 at 11:22 AM JST, Eliot Courtney wrote:
>>>> On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
>>>>> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>>>>>> Add sync and async command queue API and the type infrastructure to know
>>>>>> what reply is expected from each `CommandToGsp`.
>> ...
>>>> For lack of a better idea  i suggest send_and_wait_for_reply +
>>>> send_no_reply for now.
>>> 
>>> One important detail IMHO is that the API cannot be misused, i.e. you
>>> cannot call the fire-and-forget send method on a command that expects a
>>> reply. So the risk is mostly when adding support for a new command - but
>>> if that step is done properly, users will be directed to the right
>>> method by the compiler.
>>
>> Naming is not *just* about risk of someone misusing it. It's about
>> being able to read things on the screen and having a good chance of
>> understanding it.
>>
>>> 
>>> This, I think, allows us to tolerate more ambiguity in the method names,
>>> as long as their documentation makes up for it. We all agree that
>>
>> Really, no. Let's do our best on naming, *in addition* to the documentation
>> and having Rust help lock things down.
>
> I personally agree with this take, and if it takes a verbose name to
> make it clear then I feel it's unfortunate but there's no way around it.
> Especially since we have a few different concepts to distinguish between.
>
> Agreed that we are safe writing it since the type system will help us
> though. So mainly just optimising on how easy it is to read.
>
> At least the current usages don't seem to end up on long lines, so I
> don't think it will introduce too much noise to have the function names
> be a bit longer.

Alright, we're still far from things like
rmDeviceGpuLocksReleaseAndThreadStateFreeDeferredIntHandlerOptimized()
(not making this up) so I guess we can afford some extra clarity in our
method names. :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-02-28  6:11   ` John Hubbard
  2026-03-02  2:22     ` Eliot Courtney
@ 2026-03-02 12:28     ` Danilo Krummrich
  2026-03-02 18:03       ` John Hubbard
  2026-03-03  2:46       ` Eliot Courtney
  1 sibling, 2 replies; 22+ messages in thread
From: Danilo Krummrich @ 2026-03-02 12:28 UTC (permalink / raw)
  To: John Hubbard
  Cc: Eliot Courtney, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, nouveau, dri-devel,
	linux-kernel, rust-for-linux

On Sat Feb 28, 2026 at 7:11 AM CET, John Hubbard wrote:
> The sync/async naming that GSP RM uses is a little bit "off". I
> spent some time discussing it with them, and the problem is that
> sync/async is a concept that is somewhat independent of whether
> a reply is expected. Usually, sync means a blocking wait for a
> response, which is not necessarily required in all case with
> GSP RM calls.
>
> The naming would be better here if it reflected simply that
> a response is expected, or not. I don't have great names for
> that, but "fire and forget" works well for what we have so
> far called "async". So we could do create a convention in which
> no annotation means that the API has a response that will come
> back, and some abbreviated for of "fire and forget" or "one way"
> added to the function name would mean that no response is
> expected.

I think the relevant information for the caller is whether the call is blocking
or non-blocking; i.e. do we have cases where we want to block, but discard the
reply, or expect a reply but don't want to wait for it?

So, unless there is additional complexity I'm not aware of, I feel like
send_command() and send_command_no_wait() should be sufficient.

(Maybe send_command_wait() if we want to be a bit more explicit.)

As for the specific commands, we could have traits to control whether blocking
or non-blocking submissions are allowed for them in the first place, i.e. this
gives us some control about whether a reply is allowed to be discarded through a
_no_wait() submission etc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02  3:03       ` Alexandre Courbot
  2026-03-02  3:08         ` John Hubbard
@ 2026-03-02 17:26         ` Gary Guo
  1 sibling, 0 replies; 22+ messages in thread
From: Gary Guo @ 2026-03-02 17:26 UTC (permalink / raw)
  To: Alexandre Courbot, Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, nouveau, dri-devel,
	linux-kernel, rust-for-linux, dri-devel

On Mon Mar 2, 2026 at 3:03 AM GMT, Alexandre Courbot wrote:
> On Mon Mar 2, 2026 at 11:22 AM JST, Eliot Courtney wrote:
>> On Sat Feb 28, 2026 at 3:11 PM JST, John Hubbard wrote:
>>> On 2/26/26 7:50 AM, Eliot Courtney wrote:
>>>> Add sync and async command queue API and the type infrastructure to know
>>>> what reply is expected from each `CommandToGsp`.
>>>> 
>>>> Use a marker type `NoReply` which does not implement `MessageFromGsp` to
>>>> mark async commands which don't expect a response.
>>>> 
>>> ...
>>>> +    /// Type of the reply expected from the GSP, or [`NoReply`] for async commands.
>>>
>>> Hi Eliot,
>>>
>>> The following does not need to hold up your patchset, but I want
>>> to bring it up somewhere just to work through it.
>>>
>>> The sync/async naming that GSP RM uses is a little bit "off". I
>>> spent some time discussing it with them, and the problem is that
>>> sync/async is a concept that is somewhat independent of whether
>>> a reply is expected. Usually, sync means a blocking wait for a
>>> response, which is not necessarily required in all case with
>>> GSP RM calls.
>>>
>>> The naming would be better here if it reflected simply that
>>> a response is expected, or not. I don't have great names for
>>> that, but "fire and forget" works well for what we have so
>>> far called "async". So we could do create a convention in which
>>> no annotation means that the API has a response that will come
>>> back, and some abbreviated for of "fire and forget" or "one way"
>>> added to the function name would mean that no response is
>>> expected.
>>>
>>> Again, I don't think this has to happen here, because we can
>>> go through and rename later, no problem there. But when I saw
>>> the sync/asynch and remembered the very recent discussion, I
>>> figured I'd better post something about it.
>>>
>>> And yes, I started us off in the wrong direction with the
>>> IS_ASYNCH thing! haha
>>>
>>> thanks,
>>
>> Hi John,
>>
>> I totally agree and was hoping that someone would have a good suggestion
>> for this. I discussed this exact thing with Alex before posting this
>> too. So if you have any naming suggestions would love to hear them.
>>
>> As you say, sync and async are orthogonal to reply vs no reply. I think
>> we have several ideas here actually:
>> - blocking vs non-blocking
>> - reply vs no-reply
>> - wait for reply vs don't wait for reply (practically equivalent to
>>   blocking vs non-blocking here, but conceptually the send could also be
>>   blocking vs non-blocking)
>>
>> We should also be careful with conflating waiting for the reply vs not
>> having a reply. So `send_without_waiting_for_reply` is definitely
>> confusing to me, because there may be a reply that we just don't wait
>> for.
>>
>> Some ideas:
>> - send_command_with_reply + send_command_without_reply
>>   - Maybe non-obvious that this blocks for send_command_with_reply.
>> - send_and_wait_for_reply + send_no_reply
>>   - More obvious that it blocks and gets the reply.
>>   - Should be obvious from context that you are sending a command
>>     anyway.
>>
>> Personally I think it's nice to keep a convention of keeping it
>> mostly obvious which functions block/wait. (e.g. we already have
>> wait_for_msg in cmdq.rs).
>>
>> For lack of a better idea  i suggest send_and_wait_for_reply +
>> send_no_reply for now.
>
> One important detail IMHO is that the API cannot be misused, i.e. you
> cannot call the fire-and-forget send method on a command that expects a
> reply. So the risk is mostly when adding support for a new command - but
> if that step is done properly, users will be directed to the right
> method by the compiler.
>
> This, I think, allows us to tolerate more ambiguity in the method names,
> as long as their documentation makes up for it. We all agree that
> `async` and `sync` are not a good fit, but `send`/`send_noreply` should
> be tolerable (I'd like to keep the names short if possible)
>
> Or maybe we can use a variant of the trick mentioned by Gary in [1] and
> have a single `send_command` method?

I think for this particular use case there isn't even a need to use any tricks?
You can simply just skip the receiving part when there's no reply expected.

Something like

    if TypeId::of::<M::Reply>() == TypeId::of::<NoReply>() {
        // SAFETY: `M::Reply` is `NoReply` and it's `Copy`.
        return Ok(unsafe { core::mem::transmute_copy(&NoReply) });
    }

Best,
Gary

>
> [1] https://lore.kernel.org/all/DGRJJA3068FV.3CE7J7SKLTN8O@garyguo.net/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type
  2026-02-26 14:50 ` [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type Eliot Courtney
@ 2026-03-02 17:33   ` Gary Guo
  2026-03-03  3:42     ` Eliot Courtney
  0 siblings, 1 reply; 22+ messages in thread
From: Gary Guo @ 2026-03-02 17:33 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alice Ryhl, Alexandre Courbot,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Zhi Wang

On Thu Feb 26, 2026 at 2:50 PM GMT, Eliot Courtney wrote:
> Make `Cmdq` a pinned type. This is needed to use Mutex, which is needed
> to add locking to `Cmdq`.
>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp.rs      | 5 +++--
>  drivers/gpu/nova-core/gsp/cmdq.rs | 9 ++++-----
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
> index 174feaca0a6b..a6f3918c20b1 100644
> --- a/drivers/gpu/nova-core/gsp.rs
> +++ b/drivers/gpu/nova-core/gsp.rs
> @@ -112,6 +112,7 @@ pub(crate) struct Gsp {
>      /// RM log buffer.
>      logrm: LogBuffer,
>      /// Command queue.
> +    #[pin]
>      pub(crate) cmdq: Cmdq,
>      /// RM arguments.
>      rmargs: CoherentAllocation<GspArgumentsPadded>,
> @@ -132,7 +133,7 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
>                  loginit: LogBuffer::new(dev)?,
>                  logintr: LogBuffer::new(dev)?,
>                  logrm: LogBuffer::new(dev)?,
> -                cmdq: Cmdq::new(dev)?,
> +                cmdq <- Cmdq::new(dev),
>                  rmargs: CoherentAllocation::<GspArgumentsPadded>::alloc_coherent(
>                      dev,
>                      1,
> @@ -149,7 +150,7 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
>                          libos[1] = LibosMemoryRegionInitArgument::new("LOGINTR", &logintr.0)
>                      )?;
>                      dma_write!(libos[2] = LibosMemoryRegionInitArgument::new("LOGRM", &logrm.0))?;
> -                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(cmdq))?;
> +                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;

Hmm, I don't think the `&` here is needed?

Best,
Gary

>                      dma_write!(libos[3] = LibosMemoryRegionInitArgument::new("RMARGS", rmargs))?;
>                  },
>              }))
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index daf3e1d153d4..6bb1decd2af5 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -474,6 +474,7 @@ pub(crate) fn command_size<M>(command: &M) -> usize
>  ///
>  /// Provides the ability to send commands and receive messages from the GSP using a shared memory
>  /// area.
> +#[pin_data]
>  pub(crate) struct Cmdq {
>      /// Device this command queue belongs to.
>      dev: ARef<device::Device>,
> @@ -501,13 +502,11 @@ impl Cmdq {
>      pub(crate) const NUM_PTES: usize = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
>  
>      /// Creates a new command queue for `dev`.
> -    pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<Cmdq> {
> -        let gsp_mem = DmaGspMem::new(dev)?;
> -
> -        Ok(Cmdq {
> +    pub(crate) fn new(dev: &device::Device<device::Bound>) -> impl PinInit<Self, Error> + '_ {
> +        try_pin_init!(Self {
> +            gsp_mem: DmaGspMem::new(dev)?,
>              dev: dev.into(),
>              seq: 0,
> -            gsp_mem,
>          })
>      }
>  


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq
  2026-02-26 14:50 ` [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq Eliot Courtney
@ 2026-03-02 17:36   ` Gary Guo
  2026-03-03  3:47     ` Eliot Courtney
  0 siblings, 1 reply; 22+ messages in thread
From: Gary Guo @ 2026-03-02 17:36 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alice Ryhl, Alexandre Courbot,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Zhi Wang

On Thu Feb 26, 2026 at 2:50 PM GMT, Eliot Courtney wrote:
> Wrap `Cmdq`'s mutable state in a new struct `CmdqInner` and wrap that in
> a Mutex. This lets `Cmdq` methods take &self instead of &mut self, which
> lets required commands be sent e.g. while unloading the driver.
>
> The mutex is held over both send and receive in `send_sync_command` to
> make sure that it doesn't get the reply of some other command that could
> have been sent just beforehand.
>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/boot.rs      |   8 +-
>  drivers/gpu/nova-core/gsp/cmdq.rs      | 266 ++++++++++++++++++---------------
>  drivers/gpu/nova-core/gsp/commands.rs  |   4 +-
>  drivers/gpu/nova-core/gsp/sequencer.rs |   2 +-
>  4 files changed, 153 insertions(+), 127 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index 55899eba75db..d12ad1bd2cd8 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -128,7 +128,7 @@ fn run_fwsec_frts(
>      ///
>      /// Upon return, the GSP is up and running, and its runtime object given as return value.
>      pub(crate) fn boot(
> -        mut self: Pin<&mut Self>,
> +        self: Pin<&mut Self>,
>          pdev: &pci::Device<device::Bound>,
>          bar: &Bar0,
>          chipset: Chipset,
> @@ -214,13 +214,13 @@ pub(crate) fn boot(
>              dev: pdev.as_ref().into(),
>              bar,
>          };
> -        GspSequencer::run(&mut self.cmdq, seq_params)?;
> +        GspSequencer::run(&self.cmdq, seq_params)?;
>  
>          // Wait until GSP is fully initialized.
> -        commands::wait_gsp_init_done(&mut self.cmdq)?;
> +        commands::wait_gsp_init_done(&self.cmdq)?;
>  
>          // Obtain and display basic GPU information.
> -        let info = commands::get_gsp_info(&mut self.cmdq, bar)?;
> +        let info = commands::get_gsp_info(&self.cmdq, bar)?;
>          match info.gpu_name() {
>              Ok(name) => dev_info!(pdev, "GPU name: {}\n", name),
>              Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 6bb1decd2af5..5010587c96f9 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -16,8 +16,12 @@
>      },
>      dma_write,
>      io::poll::read_poll_timeout,
> +    new_mutex,
>      prelude::*,
> -    sync::aref::ARef,
> +    sync::{
> +        aref::ARef,
> +        Mutex, //
> +    },
>      time::Delta,
>      transmute::{
>          AsBytes,
> @@ -54,8 +58,8 @@
>  
>  /// Trait implemented by types representing a command to send to the GSP.
>  ///
> -/// The main purpose of this trait is to provide [`Cmdq::send_command`] with the information it
> -/// needs to send a given command.
> +/// The main purpose of this trait is to provide [`Cmdq`] with the information it needs to send
> +/// a given command.
>  ///
>  /// [`CommandToGsp::init`] in particular is responsible for initializing the command directly
>  /// into the space reserved for it in the command queue buffer.
> @@ -470,66 +474,15 @@ pub(crate) fn command_size<M>(command: &M) -> usize
>      size_of::<M::Command>() + command.variable_payload_len()
>  }
>  
> -/// GSP command queue.
> -///
> -/// Provides the ability to send commands and receive messages from the GSP using a shared memory
> -/// area.
> -#[pin_data]
> -pub(crate) struct Cmdq {
> -    /// Device this command queue belongs to.
> -    dev: ARef<device::Device>,
> +/// Inner mutex protected state of [`Cmdq`].
> +struct CmdqInner {
>      /// Current command sequence number.
>      seq: u32,
>      /// Memory area shared with the GSP for communicating commands and messages.
>      gsp_mem: DmaGspMem,
>  }
>  
> -impl Cmdq {
> -    /// Offset of the data after the PTEs.
> -    const POST_PTE_OFFSET: usize = core::mem::offset_of!(GspMem, cpuq);
> -
> -    /// Offset of command queue ring buffer.
> -    pub(crate) const CMDQ_OFFSET: usize = core::mem::offset_of!(GspMem, cpuq)
> -        + core::mem::offset_of!(Msgq, msgq)
> -        - Self::POST_PTE_OFFSET;
> -
> -    /// Offset of message queue ring buffer.
> -    pub(crate) const STATQ_OFFSET: usize = core::mem::offset_of!(GspMem, gspq)
> -        + core::mem::offset_of!(Msgq, msgq)
> -        - Self::POST_PTE_OFFSET;
> -
> -    /// Number of page table entries for the GSP shared region.
> -    pub(crate) const NUM_PTES: usize = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
> -
> -    /// Creates a new command queue for `dev`.
> -    pub(crate) fn new(dev: &device::Device<device::Bound>) -> impl PinInit<Self, Error> + '_ {
> -        try_pin_init!(Self {
> -            gsp_mem: DmaGspMem::new(dev)?,
> -            dev: dev.into(),
> -            seq: 0,
> -        })
> -    }
> -
> -    /// Computes the checksum for the message pointed to by `it`.
> -    ///
> -    /// A message is made of several parts, so `it` is an iterator over byte slices representing
> -    /// these parts.
> -    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
> -        let sum64 = it
> -            .enumerate()
> -            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
> -            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
> -
> -        ((sum64 >> 32) as u32) ^ (sum64 as u32)
> -    }
> -
> -    /// Notifies the GSP that we have updated the command queue pointers.
> -    fn notify_gsp(bar: &Bar0) {
> -        regs::NV_PGSP_QUEUE_HEAD::default()
> -            .set_address(0)
> -            .write(bar);
> -    }
> -
> +impl CmdqInner {
>      /// Sends `command` to the GSP, without splitting it.
>      ///
>      /// # Errors
> @@ -540,7 +493,7 @@ fn notify_gsp(bar: &Bar0) {
>      ///   written to by its [`CommandToGsp::init_variable_payload`] method.
>      ///
>      /// Error codes returned by the command initializers are propagated as-is.
> -    fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
> +    fn send_single_command<M>(&mut self, dev: &device::Device, bar: &Bar0, command: M) -> Result

Any reason that the `dev` is passed in everything instead of just have it be
part of `CmdqInner`?

It appears that the `Cmdq` methods don't actually use it apart from passing it
to `CmdqInner`.

Best,
Gary

>      where
>          M: CommandToGsp,
>          // This allows all error types, including `Infallible`, to be used for `M::InitError`.
> @@ -583,7 +536,7 @@ fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>              ])));
>  
>          dev_dbg!(
> -            &self.dev,
> +            dev,
>              "GSP RPC: send: seq# {}, function={:?}, length=0x{:x}\n",
>              self.seq,
>              M::FUNCTION,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02 12:28     ` Danilo Krummrich
@ 2026-03-02 18:03       ` John Hubbard
  2026-03-03  2:46       ` Eliot Courtney
  1 sibling, 0 replies; 22+ messages in thread
From: John Hubbard @ 2026-03-02 18:03 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Eliot Courtney, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, nouveau, dri-devel,
	linux-kernel, rust-for-linux

On 3/2/26 4:28 AM, Danilo Krummrich wrote:
> On Sat Feb 28, 2026 at 7:11 AM CET, John Hubbard wrote:
>> The sync/async naming that GSP RM uses is a little bit "off". I
>> spent some time discussing it with them, and the problem is that
>> sync/async is a concept that is somewhat independent of whether
>> a reply is expected. Usually, sync means a blocking wait for a
>> response, which is not necessarily required in all case with
>> GSP RM calls.
>>
>> The naming would be better here if it reflected simply that
>> a response is expected, or not. I don't have great names for
>> that, but "fire and forget" works well for what we have so
>> far called "async". So we could do create a convention in which
>> no annotation means that the API has a response that will come
>> back, and some abbreviated for of "fire and forget" or "one way"
>> added to the function name would mean that no response is
>> expected.
> 
> I think the relevant information for the caller is whether the call is blocking
> or non-blocking; i.e. do we have cases where we want to block, but discard the
> reply, or expect a reply but don't want to wait for it?
> 
> So, unless there is additional complexity I'm not aware of, I feel like
> send_command() and send_command_no_wait() should be sufficient.

That's my favorite so far. It's unencumbered by any assumptions about
behavior, and unambiguous about what it does, and shorter names too.

> 
> (Maybe send_command_wait() if we want to be a bit more explicit.)
> 
> As for the specific commands, we could have traits to control whether blocking
> or non-blocking submissions are allowed for them in the first place, i.e. this
> gives us some control about whether a reply is allowed to be discarded through a
> _no_wait() submission etc.

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq`
  2026-03-02 12:28     ` Danilo Krummrich
  2026-03-02 18:03       ` John Hubbard
@ 2026-03-03  2:46       ` Eliot Courtney
  1 sibling, 0 replies; 22+ messages in thread
From: Eliot Courtney @ 2026-03-03  2:46 UTC (permalink / raw)
  To: Danilo Krummrich, John Hubbard
  Cc: Eliot Courtney, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, nouveau, dri-devel,
	linux-kernel, rust-for-linux, dri-devel

On Mon Mar 2, 2026 at 9:28 PM JST, Danilo Krummrich wrote:
> On Sat Feb 28, 2026 at 7:11 AM CET, John Hubbard wrote:
>> The sync/async naming that GSP RM uses is a little bit "off". I
>> spent some time discussing it with them, and the problem is that
>> sync/async is a concept that is somewhat independent of whether
>> a reply is expected. Usually, sync means a blocking wait for a
>> response, which is not necessarily required in all case with
>> GSP RM calls.
>>
>> The naming would be better here if it reflected simply that
>> a response is expected, or not. I don't have great names for
>> that, but "fire and forget" works well for what we have so
>> far called "async". So we could do create a convention in which
>> no annotation means that the API has a response that will come
>> back, and some abbreviated for of "fire and forget" or "one way"
>> added to the function name would mean that no response is
>> expected.
>
> I think the relevant information for the caller is whether the call is blocking
> or non-blocking; i.e. do we have cases where we want to block, but discard the
> reply, or expect a reply but don't want to wait for it?

Such a case (block and discard) does exist in nouveau (RECV_POLL type
command queue submission) and IIRC it's used in the suspend/resume path
(code was added in 8c327a0660a4 ("drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL"))

But, we won't have this issue in nova-core because we can explicitly
specify the reply size. So I think your suggested naming is good. I will
update the series with this naming unless there are objections.

Thanks

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type
  2026-03-02 17:33   ` Gary Guo
@ 2026-03-03  3:42     ` Eliot Courtney
  0 siblings, 0 replies; 22+ messages in thread
From: Eliot Courtney @ 2026-03-03  3:42 UTC (permalink / raw)
  To: Gary Guo, Eliot Courtney, Danilo Krummrich, Alice Ryhl,
	Alexandre Courbot, David Airlie, Simona Vetter, Benno Lossin
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Zhi Wang

On Tue Mar 3, 2026 at 2:33 AM JST, Gary Guo wrote:
> On Thu Feb 26, 2026 at 2:50 PM GMT, Eliot Courtney wrote:
>> Make `Cmdq` a pinned type. This is needed to use Mutex, which is needed
>> to add locking to `Cmdq`.
>>
>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/gsp.rs      | 5 +++--
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 9 ++++-----
>>  2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
>> index 174feaca0a6b..a6f3918c20b1 100644
>> --- a/drivers/gpu/nova-core/gsp.rs
>> +++ b/drivers/gpu/nova-core/gsp.rs
>> @@ -112,6 +112,7 @@ pub(crate) struct Gsp {
>>      /// RM log buffer.
>>      logrm: LogBuffer,
>>      /// Command queue.
>> +    #[pin]
>>      pub(crate) cmdq: Cmdq,
>>      /// RM arguments.
>>      rmargs: CoherentAllocation<GspArgumentsPadded>,
>> @@ -132,7 +133,7 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
>>                  loginit: LogBuffer::new(dev)?,
>>                  logintr: LogBuffer::new(dev)?,
>>                  logrm: LogBuffer::new(dev)?,
>> -                cmdq: Cmdq::new(dev)?,
>> +                cmdq <- Cmdq::new(dev),
>>                  rmargs: CoherentAllocation::<GspArgumentsPadded>::alloc_coherent(
>>                      dev,
>>                      1,
>> @@ -149,7 +150,7 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
>>                          libos[1] = LibosMemoryRegionInitArgument::new("LOGINTR", &logintr.0)
>>                      )?;
>>                      dma_write!(libos[2] = LibosMemoryRegionInitArgument::new("LOGRM", &logrm.0))?;
>> -                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(cmdq))?;
>> +                    dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
>
> Hmm, I don't think the `&` here is needed?

It seems this is needed, and here is the compile error, if you are
interested:

```
error[E0308]: mismatched types
   --> drivers/gpu/nova-core/gsp.rs:153:78
    |
153 |                     dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(cmdq))?;
    |                                                  --------------------------- ^^^^ expected `&Cmdq`, found `Pin<&mut Cmdq>`
    |                                                  |
    |                                                  arguments to this function are incorrect
    |
    = note: expected reference `&Cmdq`
                  found struct `core::pin::Pin<&mut Cmdq>`
```

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq
  2026-03-02 17:36   ` Gary Guo
@ 2026-03-03  3:47     ` Eliot Courtney
  2026-03-03  7:58       ` Alexandre Courbot
  0 siblings, 1 reply; 22+ messages in thread
From: Eliot Courtney @ 2026-03-03  3:47 UTC (permalink / raw)
  To: Gary Guo, Eliot Courtney, Danilo Krummrich, Alice Ryhl,
	Alexandre Courbot, David Airlie, Simona Vetter, Benno Lossin
  Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Zhi Wang

On Tue Mar 3, 2026 at 2:36 AM JST, Gary Guo wrote:
>> +impl CmdqInner {
>>      /// Sends `command` to the GSP, without splitting it.
>>      ///
>>      /// # Errors
>> @@ -540,7 +493,7 @@ fn notify_gsp(bar: &Bar0) {
>>      ///   written to by its [`CommandToGsp::init_variable_payload`] method.
>>      ///
>>      /// Error codes returned by the command initializers are propagated as-is.
>> -    fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>> +    fn send_single_command<M>(&mut self, dev: &device::Device, bar: &Bar0, command: M) -> Result
>
> Any reason that the `dev` is passed in everything instead of just have it be
> part of `CmdqInner`?
>
> It appears that the `Cmdq` methods don't actually use it apart from passing it
> to `CmdqInner`.

Not a strong reason - I originally thought it would be a bit nicer if
CmdqInner only contains what we actually want to lock on. But I think
this way is good too, so I will update it. Thanks~

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq
  2026-03-03  3:47     ` Eliot Courtney
@ 2026-03-03  7:58       ` Alexandre Courbot
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Courbot @ 2026-03-03  7:58 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Gary Guo, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, nouveau, dri-devel, linux-kernel,
	rust-for-linux, Zhi Wang

On Tue Mar 3, 2026 at 12:47 PM JST, Eliot Courtney wrote:
> On Tue Mar 3, 2026 at 2:36 AM JST, Gary Guo wrote:
>>> +impl CmdqInner {
>>>      /// Sends `command` to the GSP, without splitting it.
>>>      ///
>>>      /// # Errors
>>> @@ -540,7 +493,7 @@ fn notify_gsp(bar: &Bar0) {
>>>      ///   written to by its [`CommandToGsp::init_variable_payload`] method.
>>>      ///
>>>      /// Error codes returned by the command initializers are propagated as-is.
>>> -    fn send_single_command<M>(&mut self, bar: &Bar0, command: M) -> Result
>>> +    fn send_single_command<M>(&mut self, dev: &device::Device, bar: &Bar0, command: M) -> Result
>>
>> Any reason that the `dev` is passed in everything instead of just have it be
>> part of `CmdqInner`?
>>
>> It appears that the `Cmdq` methods don't actually use it apart from passing it
>> to `CmdqInner`.
>
> Not a strong reason - I originally thought it would be a bit nicer if
> CmdqInner only contains what we actually want to lock on. But I think
> this way is good too, so I will update it. Thanks~

Right, when the only purpose of passing the `Device` is to print log
messages, we prefer to store a reference to it to make the API simpler -
other parts of Nova do that.

If we require a bound device, that's another story and in this case it
is (so far) passed as an argument.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-03-03  7:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-26 14:50 [PATCH v2 0/4] gpu: nova-core: gsp: add locking to Cmdq Eliot Courtney
2026-02-26 14:50 ` [PATCH v2 1/4] gpu: nova-core: gsp: fix stale doc comments on command queue methods Eliot Courtney
2026-02-26 14:50 ` [PATCH v2 2/4] gpu: nova-core: gsp: add sync and async command queue API to `Cmdq` Eliot Courtney
2026-02-28  6:11   ` John Hubbard
2026-03-02  2:22     ` Eliot Courtney
2026-03-02  2:44       ` John Hubbard
2026-03-02  3:03       ` Alexandre Courbot
2026-03-02  3:08         ` John Hubbard
2026-03-02  4:42           ` Eliot Courtney
2026-03-02  5:31             ` Alexandre Courbot
2026-03-02 17:26         ` Gary Guo
2026-03-02 12:28     ` Danilo Krummrich
2026-03-02 18:03       ` John Hubbard
2026-03-03  2:46       ` Eliot Courtney
2026-02-26 14:50 ` [PATCH v2 3/4] gpu: nova-core: gsp: make `Cmdq` a pinned type Eliot Courtney
2026-03-02 17:33   ` Gary Guo
2026-03-03  3:42     ` Eliot Courtney
2026-02-26 14:50 ` [PATCH v2 4/4] gpu: nova-core: gsp: add mutex locking to Cmdq Eliot Courtney
2026-03-02 17:36   ` Gary Guo
2026-03-03  3:47     ` Eliot Courtney
2026-03-03  7:58       ` Alexandre Courbot
2026-02-26 18:48 ` [PATCH v2 0/4] gpu: nova-core: gsp: add " Zhi Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox