[PATCH 0/3] gpu: nova-core: unload extras for Hopper/Blackwell

* [PATCH 0/3] gpu: nova-core: unload extras for Hopper/Blackwell
@ 2026-04-09 14:19 Eliot Courtney
  2026-04-09 14:19 ` [PATCH 1/3] gpu: nova-core: fsp: wait to consume message before sending another Eliot Courtney
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eliot Courtney @ 2026-04-09 14:19 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, John Hubbard, Gary Guo
  Cc: Alistair Popple, Joel Fernandes, Timur Tabi, rust-for-linux,
	dri-devel, linux-kernel, Eliot Courtney

This series contains the changes I needed to get driver unload/reload
working on Hopper/Blackwell. It depends on John's Blackwell series and
Alex's unload series. Neither applies cleanly on the latest
drm-rust-next for me, so this series might not apply as is, but
hopefully it can be integrated into the other series as needed.

It's just two steps:
1. Make sure FSP doesn't have an old message lying around that it hasn't
   consumed yet, which I sometimes observed on unload/reload.
2. Wait for the GSP falcon to halt.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
Eliot Courtney (3):
      gpu: nova-core: fsp: wait to consume message before sending another
      gpu: nova-core: add Architecture::uses_sec2() helper
      gpu: nova-core: add non-sec2 unload path

 drivers/gpu/nova-core/falcon/fsp.rs | 16 ++++++++-
 drivers/gpu/nova-core/gpu.rs        |  7 ++++
 drivers/gpu/nova-core/gsp/boot.rs   | 68 +++++++++++++++++++++----------------
 3 files changed, 60 insertions(+), 31 deletions(-)
---
base-commit: a7a080bb4236ebe577b6776d940d1717912ff6dd
change-id: 20260409-b4-blackwell-unload-d6640795b906
prerequisite-message-id: <20260326013902.588242-1-jhubbard@nvidia.com>
prerequisite-patch-id: 684f360b8bcde8aa60a21ede276c66011837d1c4
prerequisite-patch-id: 4a3309ecef296df2a87da6a9bb1aada65c6275b2
prerequisite-patch-id: 4329316b36a3cdbdc69ee92d441a98369af5308e
prerequisite-patch-id: 1a89e18d676cedd0827c45c10ec651070e496bab
prerequisite-patch-id: 07a27de80185e4668de12ca27cb694b962fb1508
prerequisite-patch-id: 4f6d3535babd577c823bb31e3254e8501c08b80e
prerequisite-patch-id: 2d5ff77e0ecc9508d6cecf1f56a7ef3c84a85eac
prerequisite-patch-id: 04ddb7f204509ad595532a047ee7c3f83800c023
prerequisite-patch-id: d206d6d02ce4ecde692491ff740ed26ba0b4caa7
prerequisite-patch-id: 6e0b6030c3c3bfe9803388003264c9354127caa4
prerequisite-patch-id: 22b8afa61f8a51c0880b2f663b3acf5a1c3b4f3b
prerequisite-patch-id: 39e2ed088c2e1ab8164c9f187ba5252da7ead0ed
prerequisite-patch-id: 754624f3a1f545a3fa2696470535bc0d6a35c8db
prerequisite-patch-id: c2ae7b0baa9cdf5e2d749932e626c26aba9004f9
prerequisite-patch-id: 55219f9c856a182b6f11d746aa39f43c06d71c33
prerequisite-patch-id: 37958a6c896b87d3bfb93dc274c11226b264c09a
prerequisite-patch-id: 725adc556e52d09b9eaf5ff44893c88026dbef53
prerequisite-patch-id: 65ea7d6c1e59013b6043eac9892a02509486934d
prerequisite-patch-id: ff1c67ba8a2f68dbed96d98727852ba6b2705f43
prerequisite-patch-id: 9b519f929efef820598622ce961c60b08023bd77
prerequisite-patch-id: 3e4bafe4d8a3ce714cbbdaed63d507a433723318
prerequisite-patch-id: c8dc88e3ef4927033e28ed2cb8745f1fdf6cf222
prerequisite-patch-id: bf7cfd08604f8cad8d7d113d2eabe97333706988
prerequisite-patch-id: 3d69d62e324ed0a78633dd06d140c6a31fa5aac0
prerequisite-patch-id: d7db554d4123665256e7b3a1be2a770b80c07862
prerequisite-patch-id: 5e967ef35e39ef0e9f2679dea87ec70d654e5978
prerequisite-patch-id: defe154fb89b1df18d52ee8c53926974e2721e1e
prerequisite-patch-id: cdac691000a6ee280a97ee34506937665252de0e
prerequisite-patch-id: 7ae9020265203ea16122a70d1292561d577cd5c5
prerequisite-patch-id: 897bf39bb8efb97875a7d62096571bf08ad35868
prerequisite-patch-id: 07396406a0e1d94d074bc62f0a6985446f4b804f
prerequisite-message-id: <20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com>
prerequisite-patch-id: bf2577ff1e6a1151ffde8aad0bfde3e79486c8f1
prerequisite-patch-id: 48161c25415721654d27bffe124f923e1e3e2f76
prerequisite-patch-id: d50980a0003da08299b74ffd8474e46dcab3e1b6
prerequisite-patch-id: 34e78e25f97a31a14407e3c2d503bcb7b3cd2f02
prerequisite-patch-id: d9519c8417025a7045f0bfc34a6ff8db038b2c52
prerequisite-patch-id: b6f20c9337a41f9b9aac640849ee802ae748e441
prerequisite-patch-id: bdbbe4e0c2bdc5ad4600e42cc6380e163cd9eec4

Best regards,
--  
Eliot Courtney <ecourtney@nvidia.com>



* [PATCH 1/3] gpu: nova-core: fsp: wait to consume message before sending another
  2026-04-09 14:19 [PATCH 0/3] gpu: nova-core: unload extras for Hopper/Blackwell Eliot Courtney
@ 2026-04-09 14:19 ` Eliot Courtney
  2026-04-09 14:19 ` [PATCH 2/3] gpu: nova-core: add Architecture::uses_sec2() helper Eliot Courtney
  2026-04-09 14:19 ` [PATCH 3/3] gpu: nova-core: add non-sec2 unload path Eliot Courtney
  2 siblings, 0 replies; 5+ messages in thread
From: Eliot Courtney @ 2026-04-09 14:19 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, John Hubbard, Gary Guo
  Cc: Alistair Popple, Joel Fernandes, Timur Tabi, rust-for-linux,
	dri-devel, linux-kernel, Eliot Courtney

FSP can only get one message at a time, so make sure it has consumed the
previous one before trying to give it another.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index f618a681ff28..1497f163806e 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -7,12 +7,14 @@
 
 use kernel::{
     io::{
+        poll::read_poll_timeout,
         register::WithBase,
         Io,
         IoCapable, //
     },
     num::Bounded,
-    prelude::*, //
+    prelude::*,
+    time::Delta, //
 };
 
 use kernel::io::register::RegisterBase;
@@ -186,6 +188,18 @@ pub(crate) fn send_msg(&self, bar: &Bar0, packet: &[u8]) -> Result {
             return Err(EINVAL);
         }
 
+        // Wait for FSP to consume any previous message before sending.
+        read_poll_timeout(
+            || {
+                let head = bar.read(regs::NV_PFSP_QUEUE_HEAD).address().get();
+                let tail = bar.read(regs::NV_PFSP_QUEUE_TAIL).address().get();
+                Ok(head == tail)
+            },
+            |&ready| ready,
+            Delta::from_millis(1),
+            Delta::from_secs(2),
+        )?;
+
         // Write message to EMEM at offset 0 (validates 4-byte alignment)
         self.write_emem(bar, 0, packet)?;
 

-- 
2.53.0



* [PATCH 2/3] gpu: nova-core: add Architecture::uses_sec2() helper
  2026-04-09 14:19 [PATCH 0/3] gpu: nova-core: unload extras for Hopper/Blackwell Eliot Courtney
  2026-04-09 14:19 ` [PATCH 1/3] gpu: nova-core: fsp: wait to consume message before sending another Eliot Courtney
@ 2026-04-09 14:19 ` Eliot Courtney
  2026-04-09 14:19 ` [PATCH 3/3] gpu: nova-core: add non-sec2 unload path Eliot Courtney
  2 siblings, 0 replies; 5+ messages in thread
From: Eliot Courtney @ 2026-04-09 14:19 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, John Hubbard, Gary Guo
  Cc: Alistair Popple, Joel Fernandes, Timur Tabi, rust-for-linux,
	dri-devel, linux-kernel, Eliot Courtney

This will be used in the following patch as common logic.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs      | 7 +++++++
 drivers/gpu/nova-core/gsp/boot.rs | 5 +----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 2bcaa7bc5125..674dc286162a 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -180,6 +180,13 @@ pub(crate) enum Architecture with TryFrom<Bounded<u32, 6>> {
 }
 
 impl Architecture {
+    /// Returns `true` if this architecture uses SEC2 to boot GSP.
+    ///
+    /// Turing/Ampere/Ada use FWSEC + SEC2 booter firmware. Hopper and later use FSP instead.
+    pub(crate) const fn uses_sec2(&self) -> bool {
+        matches!(self, Self::Turing | Self::Ampere | Self::Ada)
+    }
+
     /// Returns the DMA mask supported by this architecture.
     pub(crate) const fn dma_mask(&self) -> DmaMask {
         match self {
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index e6d8b848ec46..1aac634c3b67 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -332,10 +332,7 @@ pub(crate) fn boot(
         sec2_falcon: &Falcon<Sec2>,
     ) -> Result {
         let dev = pdev.as_ref();
-        let uses_sec2 = matches!(
-            chipset.arch(),
-            Architecture::Turing | Architecture::Ampere | Architecture::Ada
-        );
+        let uses_sec2 = chipset.arch().uses_sec2();
 
         let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
 

-- 
2.53.0



* [PATCH 3/3] gpu: nova-core: add non-sec2 unload path
  2026-04-09 14:19 [PATCH 0/3] gpu: nova-core: unload extras for Hopper/Blackwell Eliot Courtney
  2026-04-09 14:19 ` [PATCH 1/3] gpu: nova-core: fsp: wait to consume message before sending another Eliot Courtney
  2026-04-09 14:19 ` [PATCH 2/3] gpu: nova-core: add Architecture::uses_sec2() helper Eliot Courtney
@ 2026-04-09 14:19 ` Eliot Courtney
  2026-04-20  8:07   ` Alexandre Courbot
  2 siblings, 1 reply; 5+ messages in thread
From: Eliot Courtney @ 2026-04-09 14:19 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, John Hubbard, Gary Guo
  Cc: Alistair Popple, Joel Fernandes, Timur Tabi, rust-for-linux,
	dri-devel, linux-kernel, Eliot Courtney

For non-sec2 it is only required to wait for GSP falcon to halt. This is
because GSP does the main work of unloading on GPUs not using sec2.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs | 63 +++++++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 1aac634c3b67..e536ad7222b2 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -453,37 +453,48 @@ pub(crate) fn unload(
             .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
         dev_dbg!(dev, "GSP shut down\n");
 
-        /* Run FWSEC-SB to reset the GSP falcon to its pre-libos state. */
+        if chipset.arch().uses_sec2() {
+            /* Run FWSEC-SB to reset the GSP falcon to its pre-libos state. */
 
-        let bios = Vbios::new(dev, bar)?;
-        let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, &bios, FwsecCommand::Sb)?;
-        fwsec_sb.run(dev, gsp_falcon, bar)?;
-        dev_dbg!(dev, "FWSEC SB completed\n");
+            let bios = Vbios::new(dev, bar)?;
+            let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, &bios, FwsecCommand::Sb)?;
+            fwsec_sb.run(dev, gsp_falcon, bar)?;
+            dev_dbg!(dev, "FWSEC SB completed\n");
 
-        /* Remove WPR2 region if set. */
+            /* Remove WPR2 region if set. */
 
-        let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
-        dev_dbg!(dev, "WPR2 HI: {:?}\n", wpr2_hi);
-        if wpr2_hi.is_wpr2_set() {
-            let booter_unloader = BooterFirmware::new(
-                dev,
-                BooterKind::Unloader,
-                chipset,
-                FIRMWARE_VERSION,
-                sec2_falcon,
-                bar,
-            )?;
+            let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+            dev_dbg!(dev, "WPR2 HI: {:?}\n", wpr2_hi);
+            if wpr2_hi.is_wpr2_set() {
+                let booter_unloader = BooterFirmware::new(
+                    dev,
+                    BooterKind::Unloader,
+                    chipset,
+                    FIRMWARE_VERSION,
+                    sec2_falcon,
+                    bar,
+                )?;
 
-            dev_dbg!(dev, "Booter unloader created\n");
+                dev_dbg!(dev, "Booter unloader created\n");
 
-            sec2_falcon.reset(bar)?;
-            sec2_falcon.load(dev, bar, &booter_unloader)?;
-            let _ = sec2_falcon.boot(bar, Some(0xff), Some(0xff))?;
-            dev_dbg!(
-                dev,
-                "WPR2 HI: {:?}\n",
-                bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI)
-            );
+                sec2_falcon.reset(bar)?;
+                sec2_falcon.load(dev, bar, &booter_unloader)?;
+                let _ = sec2_falcon.boot(bar, Some(0xff), Some(0xff))?;
+                dev_dbg!(
+                    dev,
+                    "WPR2 HI: {:?}\n",
+                    bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI)
+                );
+            }
+        } else {
+            // GSP falcon does most of the work of resetting, so just wait for it to finish.
+            read_poll_timeout(
+                || Ok(gsp_falcon.is_riscv_active(bar)),
+                |&active| !active,
+                Delta::from_millis(10),
+                Delta::from_secs(5),
+            )
+            .inspect_err(|_| dev_err!(dev, "GSP falcon failed to halt\n"))?;
         }
 
         Ok(())

-- 
2.53.0



* Re: [PATCH 3/3] gpu: nova-core: add non-sec2 unload path
  2026-04-09 14:19 ` [PATCH 3/3] gpu: nova-core: add non-sec2 unload path Eliot Courtney
@ 2026-04-20  8:07   ` Alexandre Courbot
  0 siblings, 0 replies; 5+ messages in thread
From: Alexandre Courbot @ 2026-04-20  8:07 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	John Hubbard, Gary Guo, Alistair Popple, Joel Fernandes,
	Timur Tabi, rust-for-linux, dri-devel, linux-kernel

On Thu Apr 9, 2026 at 11:19 PM JST, Eliot Courtney wrote:
> For non-sec2 it is only required to wait for GSP falcon to halt. This is
> because GSP does the main work of unloading on GPUs not using sec2.
>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

Thanks a lot for this, I am hitting the FLR reset bug and this has made
testing the Blackwell series so much easier! :)

I think we should take this into John's series after rebasing it on my
unload code, as without it running Blackwell through VFIO is
impractical.

I think patches 2 and 3 can be picked up as-is. For patch 1, I want to
give feedback to make the FSP command queue sounder from the get-go, so
I'm not sure whether it will still apply after that, but let me come
back to it a bit later.
