[PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process

NVIDIA GPU driver infrastructure
 help / color / mirror / Atom feed

* [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process
@ 2026-06-19 13:42 Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

The recently introduced `GspBootContext` is useful in at least two
other places:

- As a replacement for the sequencer's `GspSequencerParams`,
- As the main parameter of `Gsp::unload`, for symmetry with `Gsp::boot`.

This symmetry can be further exploited to make `Gsp::boot` and
`Gsp::unload` generic over the exact boot context required for the
current chip. Doing so requires some more preliminary work that goes
beyond this series, but this groundwork is needed regardless.

Another cleanup that can be performed as a consequence is the removal of
the `BootUnloadGuard`. This type is cumbersome as it holds extra
references to the boot context, making it difficult to make some of
these references mutable (as we want to do with the `Falcon`s that
should require exclusive access to load and run firmware). Furthermore,
it is only needed a single time in `Gsp::boot`.

By running the boot process in a local closure, we can catch errors and
call `Gsp::unload` on the unload bundle just as efficiently, and with
less code.

Although the final goal is to enable mutable methods on `Falcon`s and
make the GSP boot context generic, these cleanups stand on their own and
remove quite a bit of code, so I am sending them separately.

This series applies on `drm-rust-next`, and requires [1].

[1] https://lore.kernel.org/all/20260617-boot-vram-v3-0-20b9ec5fe9f2@nvidia.com/

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Alexandre Courbot (6):
      gpu: nova-core: gsp: sequencer: use GspBootContext
      gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer
      gpu: nova-core: gsp: move boot code into local closure
      gpu: nova-core: gsp: replace BootUnloadGuard with local handler
      gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot
      gpu: nova-core: gsp: make unload take GspBootContext

 drivers/gpu/nova-core/gpu.rs           |  19 ++++-
 drivers/gpu/nova-core/gsp/boot.rs      | 140 ++++++++++++---------------------
 drivers/gpu/nova-core/gsp/hal.rs       |  35 ++++-----
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  59 ++++++--------
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 133 ++++++++++++++-----------------
 drivers/gpu/nova-core/gsp/sequencer.rs |  71 +++++++----------
 6 files changed, 191 insertions(+), 266 deletions(-)
---
base-commit: e655873885063245fd7f49f81cebfdfdef66a59d
change-id: 20260619-nova-bootcontext-401d6107e8fb
prerequisite-message-id: 20260617-boot-vram-v3-0-20b9ec5fe9f2@nvidia.com
prerequisite-patch-id: 3b9948cc78c017fd9edcd51e1a65d0aa741a1b8a
prerequisite-patch-id: e5c8ef9f9fde8bf165f8339da1cc55c7f25ee845
prerequisite-patch-id: e6657ea9ed37ee9a195aae6d87d5f28d4ef2378e

Best regards,
--  
Alexandre Courbot <acourbot@nvidia.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext
  2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
@ 2026-06-19 13:42 ` Alexandre Courbot
  2026-06-22  7:00   ` Eliot Courtney
  2026-06-19 13:42 ` [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer Alexandre Courbot
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

`GspBootContext` contains all the resources currently carried by
`GspSequencerParams`, so replace the latter with the former for better
integration with the boot process and less code.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 21 +++++++-------------
 drivers/gpu/nova-core/gsp/sequencer.rs | 36 ++++++++++++----------------------
 2 files changed, 20 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index f8a8541704ee..6ed4ee268086 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -37,10 +37,7 @@
             GspHal,
             UnloadBundle, //
         },
-        sequencer::{
-            GspSequencer,
-            GspSequencerParams, //
-        },
+        sequencer::GspSequencer,
         Gsp,
         GspBootContext,
         GspFwWprMeta, //
@@ -336,16 +333,12 @@ fn boot<'a>(
     }
 
     fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {
-        // Create and run the GSP sequencer.
-        let seq_params = GspSequencerParams {
-            bootloader_app_version: gsp_fw.bootloader.app_version,
-            libos_dma_handle: gsp.libos.dma_handle(),
-            gsp_falcon: ctx.gsp_falcon,
-            sec2_falcon: ctx.sec2_falcon,
-            dev: ctx.dev(),
-            bar: ctx.bar,
-        };
-        GspSequencer::run(&gsp.cmdq, seq_params)?;
+        GspSequencer::run(
+            &gsp.cmdq,
+            ctx,
+            gsp.libos.dma_handle(),
+            gsp_fw.bootloader.app_version,
+        )?;
 
         Ok(())
     }
diff --git a/drivers/gpu/nova-core/gsp/sequencer.rs b/drivers/gpu/nova-core/gsp/sequencer.rs
index e0850d21adca..0da3c3531886 100644
--- a/drivers/gpu/nova-core/gsp/sequencer.rs
+++ b/drivers/gpu/nova-core/gsp/sequencer.rs
@@ -31,6 +31,7 @@
             MessageFromGsp, //
         },
         fw,
+        GspBootContext, //
     },
     num::FromSafeCast,
     sbuffer::SBufferIter,
@@ -338,24 +339,13 @@ fn iter(&self) -> GspSeqIter<'_> {
     }
 }
 
-/// Parameters for running the GSP sequencer.
-pub(crate) struct GspSequencerParams<'a> {
-    /// Bootloader application version.
-    pub(crate) bootloader_app_version: u32,
-    /// LibOS DMA handle address.
-    pub(crate) libos_dma_handle: u64,
-    /// GSP falcon for core operations.
-    pub(crate) gsp_falcon: &'a Falcon<Gsp>,
-    /// SEC2 falcon for core operations.
-    pub(crate) sec2_falcon: &'a Falcon<Sec2>,
-    /// Device for logging.
-    pub(crate) dev: &'a device::Device,
-    /// BAR0 for register access.
-    pub(crate) bar: Bar0<'a>,
-}
-
 impl<'a> GspSequencer<'a> {
-    pub(crate) fn run(cmdq: &Cmdq, params: GspSequencerParams<'a>) -> Result {
+    pub(crate) fn run(
+        cmdq: &Cmdq,
+        ctx: &'a GspBootContext<'_>,
+        libos_dma_handle: u64,
+        bootloader_app_version: u32,
+    ) -> Result {
         let seq_info = loop {
             match cmdq.receive_msg::<GspSequence>(Cmdq::RECEIVE_TIMEOUT) {
                 Ok(seq_info) => break seq_info,
@@ -366,12 +356,12 @@ pub(crate) fn run(cmdq: &Cmdq, params: GspSequencerParams<'a>) -> Result {
 
         let sequencer = GspSequencer {
             seq_info,
-            bar: params.bar,
-            sec2_falcon: params.sec2_falcon,
-            gsp_falcon: params.gsp_falcon,
-            libos_dma_handle: params.libos_dma_handle,
-            bootloader_app_version: params.bootloader_app_version,
-            dev: params.dev,
+            bar: ctx.bar,
+            sec2_falcon: ctx.sec2_falcon,
+            gsp_falcon: ctx.gsp_falcon,
+            libos_dma_handle,
+            bootloader_app_version,
+            dev: ctx.dev(),
         };
 
         dev_dbg!(sequencer.dev, "Running CPU Sequencer commands\n");

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext
  2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
@ 2026-06-22  7:00   ` Eliot Courtney
  0 siblings, 0 replies; 14+ messages in thread
From: Eliot Courtney @ 2026-06-22  7:00 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	dri-devel

On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
> `GspBootContext` contains all the resources currently carried by
> `GspSequencerParams`, so replace the latter with the former for better
> integration with the boot process and less code.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer
  2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
@ 2026-06-19 13:42 ` Alexandre Courbot
  2026-06-22  7:00   ` Eliot Courtney
  2026-06-19 13:42 ` [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure Alexandre Courbot
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

The sequence is currently stored in the `GspSequencer` even though its
lifetime is limited to `GspSequencer::run`. This object-oriented design
does not play well with the borrow-checker, as `GspSequencer::iter`
borrows a reference to the `GspSequencer`, which makes it difficult to
introduce mutable references in `GspBootContext`, as we want to do in
order to make the `Falcon` references mutable.

Thus, store the sequence locally in `GspSequencer::run`, and move
iterator creation to `GspSeqIter::new` so it no longer needs to borrow
the whole `GspSequencer`.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/sequencer.rs | 35 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/sequencer.rs b/drivers/gpu/nova-core/gsp/sequencer.rs
index 0da3c3531886..b5e049f76c28 100644
--- a/drivers/gpu/nova-core/gsp/sequencer.rs
+++ b/drivers/gpu/nova-core/gsp/sequencer.rs
@@ -129,8 +129,6 @@ pub(crate) fn new(data: &[u8], dev: &device::Device) -> Result<(Self, usize)> {
 
 /// GSP Sequencer for executing firmware commands during boot.
 pub(crate) struct GspSequencer<'a> {
-    /// Sequencer information with command data.
-    seq_info: GspSequence,
     /// `Bar0` for register access.
     bar: Bar0<'a>,
     /// SEC2 falcon for core operations.
@@ -271,7 +269,7 @@ fn run(&self, seq: &GspSequencer<'_>) -> Result {
 }
 
 /// Iterator over GSP sequencer commands.
-pub(crate) struct GspSeqIter<'a> {
+struct GspSeqIter<'a> {
     /// Command data buffer.
     cmd_data: &'a [u8],
     /// Current position in the buffer.
@@ -284,6 +282,18 @@ pub(crate) struct GspSeqIter<'a> {
     dev: &'a device::Device,
 }
 
+impl<'a> GspSeqIter<'a> {
+    fn new(seq: &'a GspSequence, dev: &'a device::Device) -> Self {
+        Self {
+            cmd_data: &seq.cmd_data,
+            current_offset: 0,
+            total_cmds: seq.cmd_index,
+            cmds_processed: 0,
+            dev,
+        }
+    }
+}
+
 impl<'a> Iterator for GspSeqIter<'a> {
     type Item = Result<GspSeqCmd>;
 
@@ -325,20 +335,6 @@ fn next(&mut self) -> Option<Self::Item> {
     }
 }
 
-impl<'a> GspSequencer<'a> {
-    fn iter(&self) -> GspSeqIter<'_> {
-        let cmd_data = &self.seq_info.cmd_data[..];
-
-        GspSeqIter {
-            cmd_data,
-            current_offset: 0,
-            total_cmds: self.seq_info.cmd_index,
-            cmds_processed: 0,
-            dev: self.dev,
-        }
-    }
-}
-
 impl<'a> GspSequencer<'a> {
     pub(crate) fn run(
         cmdq: &Cmdq,
@@ -355,7 +351,6 @@ pub(crate) fn run(
         };
 
         let sequencer = GspSequencer {
-            seq_info,
             bar: ctx.bar,
             sec2_falcon: ctx.sec2_falcon,
             gsp_falcon: ctx.gsp_falcon,
@@ -366,14 +361,14 @@ pub(crate) fn run(
 
         dev_dbg!(sequencer.dev, "Running CPU Sequencer commands\n");
 
-        for cmd_result in sequencer.iter() {
+        for cmd_result in GspSeqIter::new(&seq_info, sequencer.dev) {
             match cmd_result {
                 Ok(cmd) => cmd.run(&sequencer)?,
                 Err(e) => {
                     dev_err!(
                         sequencer.dev,
                         "Error running command at index {}\n",
-                        sequencer.seq_info.cmd_index
+                        seq_info.cmd_index
                     );
                     return Err(e);
                 }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer
  2026-06-19 13:42 ` [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer Alexandre Courbot
@ 2026-06-22  7:00   ` Eliot Courtney
  0 siblings, 0 replies; 14+ messages in thread
From: Eliot Courtney @ 2026-06-22  7:00 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	dri-devel

On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
> The sequence is currently stored in the `GspSequencer` even though its
> lifetime is limited to `GspSequencer::run`. This object-oriented design
> does not play well with the borrow-checker, as `GspSequencer::iter`
> borrows a reference to the `GspSequencer`, which makes it difficult to
> introduce mutable references in `GspBootContext`, as we want to do in
> order to make the `Falcon` references mutable.
>
> Thus, store the sequence locally in `GspSequencer::run`, and move
> iterator creation to `GspSeqIter::new` so it no longer needs to borrow
> the whole `GspSequencer`.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
  2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer Alexandre Courbot
@ 2026-06-19 13:42 ` Alexandre Courbot
  2026-06-22  7:57   ` Eliot Courtney
  2026-06-19 13:42 ` [PATCH 4/6] gpu: nova-core: gsp: replace BootUnloadGuard with local handler Alexandre Courbot
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
more local and less intrusive mechanism to run the GSP unload sequence
upon GSP boot failure. Doing so requires running the boot code in a
local closure, which changes its indentation and would make other
changes difficult to track in the diff. Thus, this preparatory patch
moves said boot code into a local closure that is run upon construction,
so the next patch does not need to re-indent code that changes.

This is a mechanical preparatory patch to make the next patch easier to
read. No functional change intended.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs      | 42 ++++++++-------
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 40 +++++++-------
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 96 ++++++++++++++++++----------------
 3 files changed, 96 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index bb2000b7a78b..9eccfd634b61 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -119,30 +119,36 @@ pub(crate) fn boot(
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
         let unload_guard = hal.boot(&self, &ctx, &fb_layout, &wpr_meta)?;
+        // Run from a closure so we can retrieve the result, and run the unload sequence of the GSP
+        // in case of error.
+        let res = (|| {
+            gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
-        gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
+            // Poll for RISC-V to become active before continuing.
+            read_poll_timeout(
+                || Ok(gsp_falcon.is_riscv_active(bar)),
+                |val: &bool| *val,
+                Delta::from_millis(10),
+                Delta::from_secs(5),
+            )?;
 
-        // Poll for RISC-V to become active before continuing.
-        read_poll_timeout(
-            || Ok(gsp_falcon.is_riscv_active(bar)),
-            |val: &bool| *val,
-            Delta::from_millis(10),
-            Delta::from_secs(5),
-        )?;
+            dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
 
-        dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
+            self.cmdq
+                .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
+            self.cmdq
+                .send_command_no_wait(bar, commands::SetRegistry::new())?;
 
-        self.cmdq
-            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
-        self.cmdq
-            .send_command_no_wait(bar, commands::SetRegistry::new())?;
+            hal.post_boot(&self, &ctx, &gsp_fw)?;
 
-        hal.post_boot(&self, &ctx, &gsp_fw)?;
+            // Wait until GSP is fully initialized.
+            commands::wait_gsp_init_done(&self.cmdq)
+        })();
 
-        // Wait until GSP is fully initialized.
-        commands::wait_gsp_init_done(&self.cmdq)?;
-
-        Ok(unload_guard.dismiss())
+        match res {
+            Err(e) => Err(e),
+            Ok(()) => Ok(unload_guard.dismiss()),
+        }
     }
 
     /// Shut down the GSP and wait until it is offline.
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index c9fdc8cacedc..46e03f34bc74 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -162,31 +162,35 @@ fn boot<'a>(
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
-        let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
+        let res = (|| {
+            let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
-        let unload_bundle = crate::gsp::UnloadBundle(
-            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
-        );
+            let unload_bundle = crate::gsp::UnloadBundle(
+                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
+            );
 
-        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-        let unload_guard =
-            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
+            // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
+            let unload_guard =
+                BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
 
-        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
+            let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
 
-        let args = FmcBootArgs::new(
-            dev,
-            chipset,
-            wpr_meta.dma_handle(),
-            gsp.libos.dma_handle(),
-            false,
-        )?;
+            let args = FmcBootArgs::new(
+                dev,
+                chipset,
+                wpr_meta.dma_handle(),
+                gsp.libos.dma_handle(),
+                false,
+            )?;
 
-        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
+            fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
-        wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
+            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
 
-        Ok(unload_guard)
+            Ok(unload_guard)
+        })();
+
+        res
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 6ed4ee268086..9b24361f924b 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -277,59 +277,63 @@ fn boot<'a>(
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
-        let bios = Vbios::new(dev, bar)?;
+        let res = (|| {
+            let bios = Vbios::new(dev, bar)?;
 
-        // Try and prepare the unload bundle.
-        //
-        // If the unload bundle creation fails, the GPU will need to be reset before the driver can
-        // be probed again.
-        let unload_bundle =
-            Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
-                .inspect_err(|e| {
-                    dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
-                    dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
-                    dev_warn!(
-                        dev,
-                        "The GPU will need to be reset before the driver can bind again.\n"
-                    );
-                })
-                .ok()
-                .map(crate::gsp::UnloadBundle);
+            // Try and prepare the unload bundle.
+            //
+            // If the unload bundle creation fails, the GPU will need to be reset before the driver
+            // can be probed again.
+            let unload_bundle =
+                Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+                    .inspect_err(|e| {
+                        dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+                        dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+                        dev_warn!(
+                            dev,
+                            "The GPU will need to be reset before the driver can bind again.\n"
+                        );
+                    })
+                    .ok()
+                    .map(crate::gsp::UnloadBundle);
 
-        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-        let unload_guard =
-            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
+            // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
+            let unload_guard =
+                BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
 
-        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
-        if !fb_layout.frts.is_empty() {
-            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
-        }
+            // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
+            if !fb_layout.frts.is_empty() {
+                run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+            }
 
-        gsp_falcon.reset(bar)?;
-        let libos_handle = gsp.libos.dma_handle();
-        let (mbox0, mbox1) = gsp_falcon.boot(
-            bar,
-            Some(libos_handle as u32),
-            Some((libos_handle >> 32) as u32),
-        )?;
-        dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+            gsp_falcon.reset(bar)?;
+            let libos_handle = gsp.libos.dma_handle();
+            let (mbox0, mbox1) = gsp_falcon.boot(
+                bar,
+                Some(libos_handle as u32),
+                Some((libos_handle >> 32) as u32),
+            )?;
+            dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
 
-        dev_dbg!(
-            dev,
-            "Using SEC2 to load and run the booter_load firmware...\n"
-        );
+            dev_dbg!(
+                dev,
+                "Using SEC2 to load and run the booter_load firmware...\n"
+            );
 
-        BooterFirmware::new(
-            dev,
-            BooterKind::Loader,
-            chipset,
-            FIRMWARE_VERSION,
-            sec2_falcon,
-            bar,
-        )?
-        .run(dev, bar, sec2_falcon, wpr_meta)?;
+            BooterFirmware::new(
+                dev,
+                BooterKind::Loader,
+                chipset,
+                FIRMWARE_VERSION,
+                sec2_falcon,
+                bar,
+            )?
+            .run(dev, bar, sec2_falcon, wpr_meta)?;
 
-        Ok(unload_guard)
+            Ok(unload_guard)
+        })();
+
+        res
     }
 
     fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
  2026-06-19 13:42 ` [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure Alexandre Courbot
@ 2026-06-22  7:57   ` Eliot Courtney
  2026-06-22 15:04     ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Eliot Courtney @ 2026-06-22  7:57 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	dri-devel

On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
> The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
> more local and less intrusive mechanism to run the GSP unload sequence
> upon GSP boot failure. Doing so requires running the boot code in a
> local closure, which changes its indentation and would make other
> changes difficult to track in the diff. Thus, this preparatory patch
> moves said boot code into a local closure that is run upon construction,
> so the next patch does not need to re-indent code that changes.
>
> This is a mechanical preparatory patch to make the next patch easier to
> read. No functional change intended.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---

I agree with removing BootUnloadGuard, but I think it's not great to do
a bunch of lifting into closures then manually handling the result. It's
error prone imo (we already had several bugs relating to this kind of
thing). Instead, what about just using ScopeGuard directly? This lets us
avoid lifting into closures (which is a bit noisy) and avoids manual
result handling for failures (which is a bit error prone). With the
`GspBootContext` it's fairly easy to do now:

```
let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
    let _ = gsp.unload(ctx, unload_bundle);
});
```

I confirmed that it's also compatible with the v2 of this series that
has the mutable Fsp - you can stash the context inside the ScopeGuard
data (then making a &mut reference to the stashed context for brevity)
or have a separate unload context type that doesn't use FSP or something
(could later be type parametrized along with Gsp, for example).

For example here is a rough diff on top of this patch series (you can
change the Result<Option<UnloadBundle>> returns to like
Result<Result<UnloadBundle>> if you want to centralise teh error
handling of a failed unloadbundle although currently it can only fail in
one location):

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 7918ebb508f9..f6454b106293 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -314,7 +314,7 @@ fn drop(self: Pin<&mut Self>) {
             .as_ref()
             .get_ref()
             .unload(
-                GspBootContext {
+                &GspBootContext {
                     pdev: device,
                     bar,
                     chipset: this.spec.chipset,
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 336ad23c96f9..cfe10a8313c8 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -6,7 +6,8 @@
     dma::Coherent,
     io::poll::read_poll_timeout,
     prelude::*,
-    time::Delta, //
+    time::Delta,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -55,57 +56,35 @@ pub(crate) fn boot(
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
-        let (res, unload_bundle) = hal.boot(&self, &ctx, &fb_layout, &wpr_meta);
+        let unload_bundle = hal.boot(&self, &ctx, &fb_layout, &wpr_meta)?;
 
-        // Display error for unload bundle if any, and convert to `Option`.
-        let unload_bundle = unload_bundle
-            .inspect_err(|e| {
-                dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
-                dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
-                dev_warn!(
-                    dev,
-                    "The GPU will need to be reset before the driver can bind again.\n"
-                );
-            })
-            .ok();
-
-        // Run from a closure so we can retrieve the result, and run the unload sequence of the GSP
-        // in case of error.
-        let res = res.and_then(|()| {
-            gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
-
-            // Poll for RISC-V to become active before continuing.
-            read_poll_timeout(
-                || Ok(gsp_falcon.is_riscv_active(bar)),
-                |val: &bool| *val,
-                Delta::from_millis(10),
-                Delta::from_secs(5),
-            )?;
-
-            dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
-
-            self.cmdq
-                .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
-            self.cmdq
-                .send_command_no_wait(bar, commands::SetRegistry::new())?;
-
-            hal.post_boot(&self, &ctx, &gsp_fw)?;
-
-            // Wait until GSP is fully initialized.
-            commands::wait_gsp_init_done(&self.cmdq)
+        let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
+            let _ = self.unload(&ctx, unload_bundle);
         });
 
-        match res {
-            Err(e) => {
-                dev_err!(dev, "GSP boot failed with error {:?}\n", e);
+        gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
-                // Ignore errors during unload; we will return the error that happened during boot.
-                let _ = self.unload(ctx, unload_bundle);
+        // Poll for RISC-V to become active before continuing.
+        read_poll_timeout(
+            || Ok(gsp_falcon.is_riscv_active(bar)),
+            |val: &bool| *val,
+            Delta::from_millis(10),
+            Delta::from_secs(5),
+        )?;
 
-                Err(e)
-            }
-            Ok(()) => Ok(unload_bundle),
-        }
+        dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
+
+        self.cmdq
+            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
+        self.cmdq
+            .send_command_no_wait(bar, commands::SetRegistry::new())?;
+
+        hal.post_boot(&self, &ctx, &gsp_fw)?;
+
+        // Wait until GSP is fully initialized.
+        commands::wait_gsp_init_done(&self.cmdq)?;
+
+        Ok(unload_guard.dismiss())
     }
 
     /// Shut down the GSP and wait until it is offline.
@@ -134,7 +113,7 @@ fn shutdown_gsp(
     /// This stops all activity on the GSP.
     pub(crate) fn unload(
         &self,
-        ctx: super::GspBootContext<'_>,
+        ctx: &super::GspBootContext<'_>,
         unload_bundle: Option<super::UnloadBundle>,
     ) -> Result {
         let dev = ctx.dev();
@@ -153,7 +132,7 @@ pub(crate) fn unload(
             res = res.and(
                 unload_bundle
                     .0
-                    .run(&ctx)
+                    .run(ctx)
                     .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
             );
         } else {
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 113d445239b9..849ca224085b 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -37,22 +37,15 @@ pub(super) trait UnloadBundle: Send {
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
     ///
-    /// Returns two things:
-    ///
-    /// - The `Result` of the boot process itself,
-    /// - The `UnloadBundle` to use with [`Gsp::unload`], or `Err` if the bundle could not be
-    ///   created.
-    ///
-    /// Note that the two returned values are independent: it is possible for the boot process to
-    /// succeed while the unload bundle couldn't be created. In this case, the GSP won't be able to
-    /// unload properly and a full GPU reset is required before the GSP can be booted again.
+    /// Upon success, returns the [`crate::gsp::UnloadBundle`] to use with [`Gsp::unload`], if one
+    /// could be created.
     fn boot(
         &self,
         gsp: &Gsp,
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Result<crate::gsp::UnloadBundle>);
+    ) -> Result<Option<crate::gsp::UnloadBundle>>;
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index c6ff2fb216ea..04c27afc650a 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -7,7 +7,8 @@
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
-    time::Delta, //
+    time::Delta,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -143,35 +144,35 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Result<crate::gsp::UnloadBundle>) {
+    ) -> Result<Option<crate::gsp::UnloadBundle>> {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
 
-        let mut unload_bundle = Err(ENODATA);
+        let unload_bundle = crate::gsp::UnloadBundle(
+            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
+        );
 
-        let res = (|| {
-            unload_bundle = Ok(crate::gsp::UnloadBundle(
-                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>,
-            ));
+        let unload_guard = ScopeGuard::new_with_data(Some(unload_bundle), |unload_bundle| {
+            let _ = gsp.unload(ctx, unload_bundle);
+        });
 
-            let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
+        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
 
-            let args = FmcBootArgs::new(
-                dev,
-                chipset,
-                wpr_meta.dma_handle(),
-                gsp.libos.dma_handle(),
-                false,
-            )?;
+        let args = FmcBootArgs::new(
+            dev,
+            chipset,
+            wpr_meta.dma_handle(),
+            gsp.libos.dma_handle(),
+            false,
+        )?;
 
-            fsp.boot_fmc(dev, bar, fb_layout, &args)?;
+        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
-            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())
-        })();
+        wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
 
-        (res, unload_bundle)
+        Ok(unload_guard.dismiss())
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 93ff8a154100..fb1d233ac7db 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -6,7 +6,8 @@
 use kernel::{
     device,
     dma::Coherent,
-    io::Io, //
+    io::Io,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -267,57 +268,66 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Result<crate::gsp::UnloadBundle>) {
+    ) -> Result<Option<crate::gsp::UnloadBundle>> {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
-        let mut unload_bundle = Err(ENODATA);
+        let bios = Vbios::new(dev, bar)?;
 
-        let res = (|| {
-            let bios = Vbios::new(dev, bar)?;
+        // Try and prepare the unload bundle.
+        //
+        // If the unload bundle creation fails, the GPU will need to be reset before the driver can
+        // be probed again.
+        let unload_bundle =
+            Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+                .inspect_err(|e| {
+                    dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+                    dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+                    dev_warn!(
+                        dev,
+                        "The GPU will need to be reset before the driver can bind again.\n"
+                    );
+                })
+                .ok()
+                .map(crate::gsp::UnloadBundle);
 
-            // Try and prepare the unload bundle.
-            //
-            // If the unload bundle creation fails, the GPU will need to be reset before the driver
-            // can be probed again.
-            unload_bundle =
-                Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
-                    .map(crate::gsp::UnloadBundle);
+        let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
+            let _ = gsp.unload(ctx, unload_bundle);
+        });
 
-            // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
-            if !fb_layout.frts.is_empty() {
-                run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
-            }
+        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
+        if !fb_layout.frts.is_empty() {
+            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+        }
 
-            gsp_falcon.reset(bar)?;
-            let libos_handle = gsp.libos.dma_handle();
-            let (mbox0, mbox1) = gsp_falcon.boot(
-                bar,
-                Some(libos_handle as u32),
-                Some((libos_handle >> 32) as u32),
-            )?;
-            dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+        gsp_falcon.reset(bar)?;
+        let libos_handle = gsp.libos.dma_handle();
+        let (mbox0, mbox1) = gsp_falcon.boot(
+            bar,
+            Some(libos_handle as u32),
+            Some((libos_handle >> 32) as u32),
+        )?;
+        dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
 
-            dev_dbg!(
-                dev,
-                "Using SEC2 to load and run the booter_load firmware...\n"
-            );
+        dev_dbg!(
+            dev,
+            "Using SEC2 to load and run the booter_load firmware...\n"
+        );
 
-            BooterFirmware::new(
-                dev,
-                BooterKind::Loader,
-                chipset,
-                FIRMWARE_VERSION,
-                sec2_falcon,
-                bar,
-            )?
-            .run(dev, bar, sec2_falcon, wpr_meta)
-        })();
+        BooterFirmware::new(
+            dev,
+            BooterKind::Loader,
+            chipset,
+            FIRMWARE_VERSION,
+            sec2_falcon,
+            bar,
+        )?
+        .run(dev, bar, sec2_falcon, wpr_meta)?;
 
-        (res, unload_bundle)
+        Ok(unload_guard.dismiss())
     }
 
     fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
  2026-06-22  7:57   ` Eliot Courtney
@ 2026-06-22 15:04     ` Alexandre Courbot
  2026-06-23  5:40       ` Eliot Courtney
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-22 15:04 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo, John Hubbard, Alistair Popple, Timur Tabi, Zhi Wang,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Mon Jun 22, 2026 at 4:57 PM JST, Eliot Courtney wrote:
> On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
>> The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
>> more local and less intrusive mechanism to run the GSP unload sequence
>> upon GSP boot failure. Doing so requires running the boot code in a
>> local closure, which changes its indentation and would make other
>> changes difficult to track in the diff. Thus, this preparatory patch
>> moves said boot code into a local closure that is run upon construction,
>> so the next patch does not need to re-indent code that changes.
>>
>> This is a mechanical preparatory patch to make the next patch easier to
>> read. No functional change intended.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>
> I agree with removing BootUnloadGuard, but I think it's not great to do
> a bunch of lifting into closures then manually handling the result. It's
> error prone imo (we already had several bugs relating to this kind of
> thing). Instead, what about just using ScopeGuard directly? This lets us
> avoid lifting into closures (which is a bit noisy) and avoids manual
> result handling for failures (which is a bit error prone). With the
> `GspBootContext` it's fairly easy to do now:
>
> ```
> let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
>     let _ = gsp.unload(ctx, unload_bundle);
> });
> ```

Yes, initially I wanted to use `ScopeGuard` but ran into issues when
trying to make the references mutable. However it looks like you have
been able to overcome these issues, thanks for taking the time to craft
a patch!

>
> I confirmed that it's also compatible with the v2 of this series that
> has the mutable Fsp - you can stash the context inside the ScopeGuard
> data (then making a &mut reference to the stashed context for brevity)
> or have a separate unload context type that doesn't use FSP or something
> (could later be type parametrized along with Gsp, for example).
>
> For example here is a rough diff on top of this patch series (you can
> change the Result<Option<UnloadBundle>> returns to like
> Result<Result<UnloadBundle>> if you want to centralise teh error
> handling of a failed unloadbundle although currently it can only fail in
> one location):

Yes, looking at it it looks like a cleaner approach than using closures.

The only thing that I saw as a regression is that now each HAL needs to
call `Gsp::unload` itself in its own `ScopeGuard`. I don't think that's
the HAL's work - `Gsp::boot` should be the centralized point where this
happens.

But we can make both work if `unload_bundle` is passed as an output
argument to `GspHal::boot` instead of being returned:

    let mut guard = ScopeGuard::new_with_data((ctx, None), ...);
    let (ctx, unload_bundle) = &mut *guard;

    // `unload_bundle` is a mutable reference to the
    // `Option<UnloadBundle>` in `guard`.
    hal.boot(&self, ctx, unload_bundle, &fb_layout, &wpr_meta)?;

The boot method can fill `unload_bundle` early, and if it returns an
error then the `ScopeGuard` will be able to use it. Also, and that's
nice, the HALs don't need to use `ScopeGuard`. But output arguments
aren't really something we expect to see in Rust.

Another alternative would be to separate the unload bundle construction
from `GspHal::boot`:

    let unload_bundle = hal.build_unload_bundle(ctx, ...);
    let guard = ScopeGuard::new_with_data((ctx, unload_bundle), ...);

    hal.boot(...)?;

It removes the "1 boot -> 1 unload bundle" symmetry, but on the other
hand it also splits concerns more clearly. And it removes the awkward
return type of `GspHal::boot`, which come to think of it was another
smell that things were not in the right place.

The main drawback I see is that we now need to build `Vbios` twice for
TU102, since it is needed both for the unload bundle and for booting.
But the solution is the same as what the v2 of this series does to
`Fsp`: the BIOS is a GPU-wide subdevice that is likely to be used
elsewhere, not something to be confined in a GSP HAL. So I say, let's
extract it and make it also part of `GspBootContext`.

How does that sound?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
  2026-06-22 15:04     ` Alexandre Courbot
@ 2026-06-23  5:40       ` Eliot Courtney
  2026-06-24  4:30         ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Eliot Courtney @ 2026-06-23  5:40 UTC (permalink / raw)
  To: Alexandre Courbot, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo, John Hubbard, Alistair Popple, Timur Tabi, Zhi Wang,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Tue Jun 23, 2026 at 12:04 AM JST, Alexandre Courbot wrote:
> On Mon Jun 22, 2026 at 4:57 PM JST, Eliot Courtney wrote:
>> On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
>>> The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
>>> more local and less intrusive mechanism to run the GSP unload sequence
>>> upon GSP boot failure. Doing so requires running the boot code in a
>>> local closure, which changes its indentation and would make other
>>> changes difficult to track in the diff. Thus, this preparatory patch
>>> moves said boot code into a local closure that is run upon construction,
>>> so the next patch does not need to re-indent code that changes.
>>>
>>> This is a mechanical preparatory patch to make the next patch easier to
>>> read. No functional change intended.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>
>> I agree with removing BootUnloadGuard, but I think it's not great to do
>> a bunch of lifting into closures then manually handling the result. It's
>> error prone imo (we already had several bugs relating to this kind of
>> thing). Instead, what about just using ScopeGuard directly? This lets us
>> avoid lifting into closures (which is a bit noisy) and avoids manual
>> result handling for failures (which is a bit error prone). With the
>> `GspBootContext` it's fairly easy to do now:
>>
>> ```
>> let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
>>     let _ = gsp.unload(ctx, unload_bundle);
>> });
>> ```
>
> Yes, initially I wanted to use `ScopeGuard` but ran into issues when
> trying to make the references mutable. However it looks like you have
> been able to overcome these issues, thanks for taking the time to craft
> a patch!
>
>>
>> I confirmed that it's also compatible with the v2 of this series that
>> has the mutable Fsp - you can stash the context inside the ScopeGuard
>> data (then making a &mut reference to the stashed context for brevity)
>> or have a separate unload context type that doesn't use FSP or something
>> (could later be type parametrized along with Gsp, for example).
>>
>> For example here is a rough diff on top of this patch series (you can
>> change the Result<Option<UnloadBundle>> returns to like
>> Result<Result<UnloadBundle>> if you want to centralise teh error
>> handling of a failed unloadbundle although currently it can only fail in
>> one location):
>
> Yes, looking at it it looks like a cleaner approach than using closures.
>
> The only thing that I saw as a regression is that now each HAL needs to
> call `Gsp::unload` itself in its own `ScopeGuard`. I don't think that's
> the HAL's work - `Gsp::boot` should be the centralized point where this
> happens.
>
> But we can make both work if `unload_bundle` is passed as an output
> argument to `GspHal::boot` instead of being returned:
>
>     let mut guard = ScopeGuard::new_with_data((ctx, None), ...);
>     let (ctx, unload_bundle) = &mut *guard;
>
>     // `unload_bundle` is a mutable reference to the
>     // `Option<UnloadBundle>` in `guard`.
>     hal.boot(&self, ctx, unload_bundle, &fb_layout, &wpr_meta)?;
>
> The boot method can fill `unload_bundle` early, and if it returns an
> error then the `ScopeGuard` will be able to use it. Also, and that's
> nice, the HALs don't need to use `ScopeGuard`. But output arguments
> aren't really something we expect to see in Rust.
>
> Another alternative would be to separate the unload bundle construction
> from `GspHal::boot`:
>
>     let unload_bundle = hal.build_unload_bundle(ctx, ...);
>     let guard = ScopeGuard::new_with_data((ctx, unload_bundle), ...);
>
>     hal.boot(...)?;
>
> It removes the "1 boot -> 1 unload bundle" symmetry, but on the other
> hand it also splits concerns more clearly. And it removes the awkward
> return type of `GspHal::boot`, which come to think of it was another
> smell that things were not in the right place.
>
> The main drawback I see is that we now need to build `Vbios` twice for
> TU102, since it is needed both for the unload bundle and for booting.
> But the solution is the same as what the v2 of this series does to
> `Fsp`: the BIOS is a GPU-wide subdevice that is likely to be used
> elsewhere, not something to be confined in a GSP HAL. So I say, let's
> extract it and make it also part of `GspBootContext`.
>
> How does that sound?

I think that currently it's confusing because we have two concepts in
use with very similar names. We have "unloadbundle" meaning what we need
to run to unload the driver, and we have "unload_guard" which is for
running unwind stuff if we error. And for the most part these things are
the same, but they might not be (e.g. in my other series where we need
to keep certain DMA allocations alive just for the error path, but not
for an unload later, or when we are partially loaded). So it might be
nice to make these two things more separate.

I think that trying to build the unload bundle separately
(`build_unload_bundle`) is confusing because e.g. hal.boot() still needs
to handle unwinding its state in case of error, so it adds a strong
assumption that unloading in success is the same as unwinding in error.

If we switch to a model where the caller owns Gsp::unload, we need to
support prolonging some state until gsp.unload() finishes, e.g. in my
blackwell fixes series, we need to prolong DMA allocation in the error
case for `FmcBootArgs`. It's doable to prolong the DMA allocation
because the callee currently owns running the Gsp::unload. But we would
need to do some gymnastics if the caller owns running Gsp::unload and I
can't figure out a nice way to do this. In general, if we have partially
initialized hardware, it seems likely to me we will have to handle
keeping a certain set of resources alive until we can guarantee they are
not accessed anymore (gsp inactive being a clear marker).

Separately, I think it would be good to explicitly define the semantics
of what the state should be in after each one of these calls in
Gsp::boot. For example:

"Callees that fail should call gsp.unload(ctx, unwind_bundle) to unload
if it's required. If it succeeds, return an unload bundle."

For state that should be unwound *before* Gsp::unload, I think the HAL
methods should handle it internally (regardless of what we decide for
caller vs callee ownership of running gsp.unload). But we do have to
deal with unwind that needs to happen after gsp shutdown unfortunately.
It might be nice to make this more explicit.

So it kinda sucks to have to repeat Gsp::unload in HALs, but also I
think that whether Gsp::unload is required depends on how far we have
initialized, so HALs may not want to actually call it. So IMO it makes
some sense there too to have them decide. (e.g. if we only loaded Vbios
so far).

So concretely, IMO:
1. Keep gsp.unload() the responsibility of the callees, because they may
   need to prolong state until after gsp.unload, and, maybe we are not
   initialized enough to want to call gsp.unload yet and callees know
   this best.
2. Through naming, explicitly differentiate between unload and unwind.

WDYT?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
  2026-06-23  5:40       ` Eliot Courtney
@ 2026-06-24  4:30         ` Alexandre Courbot
  2026-06-24  6:36           ` Eliot Courtney
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-24  4:30 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo, John Hubbard, Alistair Popple, Timur Tabi, Zhi Wang,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Tue Jun 23, 2026 at 2:40 PM JST, Eliot Courtney wrote:
> On Tue Jun 23, 2026 at 12:04 AM JST, Alexandre Courbot wrote:
>> On Mon Jun 22, 2026 at 4:57 PM JST, Eliot Courtney wrote:
>>> On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
>>>> The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
>>>> more local and less intrusive mechanism to run the GSP unload sequence
>>>> upon GSP boot failure. Doing so requires running the boot code in a
>>>> local closure, which changes its indentation and would make other
>>>> changes difficult to track in the diff. Thus, this preparatory patch
>>>> moves said boot code into a local closure that is run upon construction,
>>>> so the next patch does not need to re-indent code that changes.
>>>>
>>>> This is a mechanical preparatory patch to make the next patch easier to
>>>> read. No functional change intended.
>>>>
>>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>>> ---
>>>
>>> I agree with removing BootUnloadGuard, but I think it's not great to do
>>> a bunch of lifting into closures then manually handling the result. It's
>>> error prone imo (we already had several bugs relating to this kind of
>>> thing). Instead, what about just using ScopeGuard directly? This lets us
>>> avoid lifting into closures (which is a bit noisy) and avoids manual
>>> result handling for failures (which is a bit error prone). With the
>>> `GspBootContext` it's fairly easy to do now:
>>>
>>> ```
>>> let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
>>>     let _ = gsp.unload(ctx, unload_bundle);
>>> });
>>> ```
>>
>> Yes, initially I wanted to use `ScopeGuard` but ran into issues when
>> trying to make the references mutable. However it looks like you have
>> been able to overcome these issues, thanks for taking the time to craft
>> a patch!
>>
>>>
>>> I confirmed that it's also compatible with the v2 of this series that
>>> has the mutable Fsp - you can stash the context inside the ScopeGuard
>>> data (then making a &mut reference to the stashed context for brevity)
>>> or have a separate unload context type that doesn't use FSP or something
>>> (could later be type parametrized along with Gsp, for example).
>>>
>>> For example here is a rough diff on top of this patch series (you can
>>> change the Result<Option<UnloadBundle>> returns to like
>>> Result<Result<UnloadBundle>> if you want to centralise teh error
>>> handling of a failed unloadbundle although currently it can only fail in
>>> one location):
>>
>> Yes, looking at it it looks like a cleaner approach than using closures.
>>
>> The only thing that I saw as a regression is that now each HAL needs to
>> call `Gsp::unload` itself in its own `ScopeGuard`. I don't think that's
>> the HAL's work - `Gsp::boot` should be the centralized point where this
>> happens.
>>
>> But we can make both work if `unload_bundle` is passed as an output
>> argument to `GspHal::boot` instead of being returned:
>>
>>     let mut guard = ScopeGuard::new_with_data((ctx, None), ...);
>>     let (ctx, unload_bundle) = &mut *guard;
>>
>>     // `unload_bundle` is a mutable reference to the
>>     // `Option<UnloadBundle>` in `guard`.
>>     hal.boot(&self, ctx, unload_bundle, &fb_layout, &wpr_meta)?;
>>
>> The boot method can fill `unload_bundle` early, and if it returns an
>> error then the `ScopeGuard` will be able to use it. Also, and that's
>> nice, the HALs don't need to use `ScopeGuard`. But output arguments
>> aren't really something we expect to see in Rust.
>>
>> Another alternative would be to separate the unload bundle construction
>> from `GspHal::boot`:
>>
>>     let unload_bundle = hal.build_unload_bundle(ctx, ...);
>>     let guard = ScopeGuard::new_with_data((ctx, unload_bundle), ...);
>>
>>     hal.boot(...)?;
>>
>> It removes the "1 boot -> 1 unload bundle" symmetry, but on the other
>> hand it also splits concerns more clearly. And it removes the awkward
>> return type of `GspHal::boot`, which come to think of it was another
>> smell that things were not in the right place.
>>
>> The main drawback I see is that we now need to build `Vbios` twice for
>> TU102, since it is needed both for the unload bundle and for booting.
>> But the solution is the same as what the v2 of this series does to
>> `Fsp`: the BIOS is a GPU-wide subdevice that is likely to be used
>> elsewhere, not something to be confined in a GSP HAL. So I say, let's
>> extract it and make it also part of `GspBootContext`.
>>
>> How does that sound?
>
> I think that currently it's confusing because we have two concepts in
> use with very similar names. We have "unloadbundle" meaning what we need
> to run to unload the driver, and we have "unload_guard" which is for
> running unwind stuff if we error. And for the most part these things are
> the same, but they might not be (e.g. in my other series where we need
> to keep certain DMA allocations alive just for the error path, but not
> for an unload later, or when we are partially loaded). So it might be
> nice to make these two things more separate.
>
> I think that trying to build the unload bundle separately
> (`build_unload_bundle`) is confusing because e.g. hal.boot() still needs
> to handle unwinding its state in case of error, so it adds a strong
> assumption that unloading in success is the same as unwinding in error.

I don't think `hal.boot()` can unwind its state properly - or I should
say, I don't think it can unwind it better than just running its unload
bundle itself.

There is little point calling `Gsp::unload` from a HAL failure as it
just queues an `UnloadGuestDriver` message (that the GSP wouldn't be in
condition to process) before running the unload bundle. So here we can
simply skip the guaranteed message timeout and run the unload bundle (or
the relevant part) directly. But even doing so doesn't guarantee that we
can recover.

For TU102, the unload bundle runs two firmwares to reset the GSP falcon
to its original state and to remove the WPR2 region. I don't think we
can/should run them selectively, as they don't undo work in the same
order as the boot sequence - or even from the same microcontroller! For
instance, FWSEC-FRTS (GSP falcon) sets up the WPR2 region, but it is
tore down by Booter Unload (SEC2 falcon). So the best we can do if
something fails on the boot path is run the whole unload bundle and hope
that we can recover.

For GH100, the unload bundle just waits for the GSP falcon to get out of
RISC-V mode, as a consequence of the `UnloadingGuestDriver` GSP command
sent by `Gsp::unload`. The problem is, that in case of failure that
command cannot even be sent as the GSP is not up! :) So the unload
bundle will either succeed immediately (if we failed early) or timeout
(if the GSP is already in RISC-V mode and stuck somewhere else). It's
actually probably better to not even try to run it in this case, and
just wait for the GSP lockdown release to try and address the DMA buffer
lifetime you raised.

Which supports your case that the HALs should be be responsible for
their own unwinding - only not by running `Gsp::unload` IMHO.

In any case, there doesn't seem to be a recovery path that we can
execute reliably in case of GSP boot failure. So making the HALs
responsible for recovering does indeed sound like the safest solution
(with TU102 trying to run the unload bundle, and GH100 doing... I'm not
sure what :)) and should address the issues you raised - but please let
me know if I missed something.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
  2026-06-24  4:30         ` Alexandre Courbot
@ 2026-06-24  6:36           ` Eliot Courtney
  0 siblings, 0 replies; 14+ messages in thread
From: Eliot Courtney @ 2026-06-24  6:36 UTC (permalink / raw)
  To: Alexandre Courbot, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo, John Hubbard, Alistair Popple, Timur Tabi, Zhi Wang,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Wed Jun 24, 2026 at 1:30 PM JST, Alexandre Courbot wrote:
> On Tue Jun 23, 2026 at 2:40 PM JST, Eliot Courtney wrote:
>> On Tue Jun 23, 2026 at 12:04 AM JST, Alexandre Courbot wrote:
>>> On Mon Jun 22, 2026 at 4:57 PM JST, Eliot Courtney wrote:
>>>> On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
>>>>> The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
>>>>> more local and less intrusive mechanism to run the GSP unload sequence
>>>>> upon GSP boot failure. Doing so requires running the boot code in a
>>>>> local closure, which changes its indentation and would make other
>>>>> changes difficult to track in the diff. Thus, this preparatory patch
>>>>> moves said boot code into a local closure that is run upon construction,
>>>>> so the next patch does not need to re-indent code that changes.
>>>>>
>>>>> This is a mechanical preparatory patch to make the next patch easier to
>>>>> read. No functional change intended.
>>>>>
>>>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>>>> ---
>>>>
>>>> I agree with removing BootUnloadGuard, but I think it's not great to do
>>>> a bunch of lifting into closures then manually handling the result. It's
>>>> error prone imo (we already had several bugs relating to this kind of
>>>> thing). Instead, what about just using ScopeGuard directly? This lets us
>>>> avoid lifting into closures (which is a bit noisy) and avoids manual
>>>> result handling for failures (which is a bit error prone). With the
>>>> `GspBootContext` it's fairly easy to do now:
>>>>
>>>> ```
>>>> let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
>>>>     let _ = gsp.unload(ctx, unload_bundle);
>>>> });
>>>> ```
>>>
>>> Yes, initially I wanted to use `ScopeGuard` but ran into issues when
>>> trying to make the references mutable. However it looks like you have
>>> been able to overcome these issues, thanks for taking the time to craft
>>> a patch!
>>>
>>>>
>>>> I confirmed that it's also compatible with the v2 of this series that
>>>> has the mutable Fsp - you can stash the context inside the ScopeGuard
>>>> data (then making a &mut reference to the stashed context for brevity)
>>>> or have a separate unload context type that doesn't use FSP or something
>>>> (could later be type parametrized along with Gsp, for example).
>>>>
>>>> For example here is a rough diff on top of this patch series (you can
>>>> change the Result<Option<UnloadBundle>> returns to like
>>>> Result<Result<UnloadBundle>> if you want to centralise teh error
>>>> handling of a failed unloadbundle although currently it can only fail in
>>>> one location):
>>>
>>> Yes, looking at it it looks like a cleaner approach than using closures.
>>>
>>> The only thing that I saw as a regression is that now each HAL needs to
>>> call `Gsp::unload` itself in its own `ScopeGuard`. I don't think that's
>>> the HAL's work - `Gsp::boot` should be the centralized point where this
>>> happens.
>>>
>>> But we can make both work if `unload_bundle` is passed as an output
>>> argument to `GspHal::boot` instead of being returned:
>>>
>>>     let mut guard = ScopeGuard::new_with_data((ctx, None), ...);
>>>     let (ctx, unload_bundle) = &mut *guard;
>>>
>>>     // `unload_bundle` is a mutable reference to the
>>>     // `Option<UnloadBundle>` in `guard`.
>>>     hal.boot(&self, ctx, unload_bundle, &fb_layout, &wpr_meta)?;
>>>
>>> The boot method can fill `unload_bundle` early, and if it returns an
>>> error then the `ScopeGuard` will be able to use it. Also, and that's
>>> nice, the HALs don't need to use `ScopeGuard`. But output arguments
>>> aren't really something we expect to see in Rust.
>>>
>>> Another alternative would be to separate the unload bundle construction
>>> from `GspHal::boot`:
>>>
>>>     let unload_bundle = hal.build_unload_bundle(ctx, ...);
>>>     let guard = ScopeGuard::new_with_data((ctx, unload_bundle), ...);
>>>
>>>     hal.boot(...)?;
>>>
>>> It removes the "1 boot -> 1 unload bundle" symmetry, but on the other
>>> hand it also splits concerns more clearly. And it removes the awkward
>>> return type of `GspHal::boot`, which come to think of it was another
>>> smell that things were not in the right place.
>>>
>>> The main drawback I see is that we now need to build `Vbios` twice for
>>> TU102, since it is needed both for the unload bundle and for booting.
>>> But the solution is the same as what the v2 of this series does to
>>> `Fsp`: the BIOS is a GPU-wide subdevice that is likely to be used
>>> elsewhere, not something to be confined in a GSP HAL. So I say, let's
>>> extract it and make it also part of `GspBootContext`.
>>>
>>> How does that sound?
>>
>> I think that currently it's confusing because we have two concepts in
>> use with very similar names. We have "unloadbundle" meaning what we need
>> to run to unload the driver, and we have "unload_guard" which is for
>> running unwind stuff if we error. And for the most part these things are
>> the same, but they might not be (e.g. in my other series where we need
>> to keep certain DMA allocations alive just for the error path, but not
>> for an unload later, or when we are partially loaded). So it might be
>> nice to make these two things more separate.
>>
>> I think that trying to build the unload bundle separately
>> (`build_unload_bundle`) is confusing because e.g. hal.boot() still needs
>> to handle unwinding its state in case of error, so it adds a strong
>> assumption that unloading in success is the same as unwinding in error.
>
> I don't think `hal.boot()` can unwind its state properly - or I should
> say, I don't think it can unwind it better than just running its unload
> bundle itself.
>
> There is little point calling `Gsp::unload` from a HAL failure as it
> just queues an `UnloadGuestDriver` message (that the GSP wouldn't be in
> condition to process) before running the unload bundle. So here we can
> simply skip the guaranteed message timeout and run the unload bundle (or
> the relevant part) directly. But even doing so doesn't guarantee that we
> can recover.
>
> For TU102, the unload bundle runs two firmwares to reset the GSP falcon
> to its original state and to remove the WPR2 region. I don't think we
> can/should run them selectively, as they don't undo work in the same
> order as the boot sequence - or even from the same microcontroller! For
> instance, FWSEC-FRTS (GSP falcon) sets up the WPR2 region, but it is
> tore down by Booter Unload (SEC2 falcon). So the best we can do if
> something fails on the boot path is run the whole unload bundle and hope
> that we can recover.
>
> For GH100, the unload bundle just waits for the GSP falcon to get out of
> RISC-V mode, as a consequence of the `UnloadingGuestDriver` GSP command
> sent by `Gsp::unload`. The problem is, that in case of failure that
> command cannot even be sent as the GSP is not up! :) So the unload
> bundle will either succeed immediately (if we failed early) or timeout
> (if the GSP is already in RISC-V mode and stuck somewhere else). It's
> actually probably better to not even try to run it in this case, and
> just wait for the GSP lockdown release to try and address the DMA buffer
> lifetime you raised.
>
> Which supports your case that the HALs should be be responsible for
> their own unwinding - only not by running `Gsp::unload` IMHO.
>
> In any case, there doesn't seem to be a recovery path that we can
> execute reliably in case of GSP boot failure. So making the HALs
> responsible for recovering does indeed sound like the safest solution
> (with TU102 trying to run the unload bundle, and GH100 doing... I'm not
> sure what :)) and should address the issues you raised - but please let
> me know if I missed something.

Ok I think you are right w.r.t. it might be simpler to not have
significantly staged unwinding logic in the HALs. That said, IIUC, in
principle, it could unwind a bit better than the unload bundle. For
example, if we fail before we send the CoT message, then we don't need
to do anything to reset. If we fail after a CoT response, then we can
wait for GSP to halt and then try a falcon level reset if that fails. If
we fail sending the CoT message, then perhaps it needs a FLR to be sure
that things are properly reset - I think these are some options for
trying to ensure reset on GH100+.

I think we could centralise a lot of this reset logic, e.g. overall it
would be send guest unload, wait for halt, then start trying various
resets if GSP doesn't halt. This could be some HAL methods I suppose,
and Gsp::unload just deals with running the completely common Gsp guest
unload + calling the HAL bundle.

So that fits with keeping Gsp::unload out of HAL modules, but also
keeping the ScopeGuard run unload bundle in HAL boot() etc so we can
allow HALS to e.g. keep resources alive until GSP is definitely reset.
Then in a follow up we can harden the reset process to handle more
cases.

WDYT?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/6] gpu: nova-core: gsp: replace BootUnloadGuard with local handler
  2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
                   ` (2 preceding siblings ...)
  2026-06-19 13:42 ` [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure Alexandre Courbot
@ 2026-06-19 13:42 ` Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 5/6] gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 6/6] gpu: nova-core: gsp: make unload take GspBootContext Alexandre Courbot
  5 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

When adding the GSP unload capability, we introduced new types to
support what is effectively an ad-hoc behavior: that `Gsp::unload` must
be run if any error occurs during the boot sequence.

Furthermore, `BootUnloadGuard` is problematic because it holds
additional references to the boot context, notably the `Falcon`s. These
extra references stand in the way of making some of the `Falcon`'s
methods mutable, since those methods would require exclusive access. As
this behavior is only needed in one place, introducing dedicated types
for it is distracting and unnecessary.

Thus, take advantage of the local closures introduced in the preceding
patch to run the unload sequence in `Gsp::boot` if an error occurred at
any step of the boot process.

This slightly broadens the failure cleanup path, as `Gsp::boot` now
attempts a best-effort `Gsp::unload` even when the failure happened
before the unload bundle was prepared.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs      | 81 ++++++----------------------------
 drivers/gpu/nova-core/gsp/hal.rs       | 20 ++++++---
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 28 +++++-------
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 23 ++++------
 4 files changed, 47 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 9eccfd634b61..3574f1a87344 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -7,8 +7,7 @@
     dma::Coherent,
     io::poll::read_poll_timeout,
     prelude::*,
-    time::Delta,
-    types::ScopeGuard, //
+    time::Delta, //
 };
 
 use crate::{
@@ -30,66 +29,6 @@
     },
 };
 
-/// Arguments required to call [`Gsp::unload`](super::Gsp::unload).
-///
-/// Stored as their own type to avoid repeating a long and tedious list in [`BootUnloadGuard`].
-pub(super) struct BootUnloadArgs<'a> {
-    gsp: &'a super::Gsp,
-    dev: &'a device::Device<device::Bound>,
-    bar: Bar0<'a>,
-    gsp_falcon: &'a Falcon<Gsp>,
-    sec2_falcon: &'a Falcon<Sec2>,
-    unload_bundle: Option<super::UnloadBundle>,
-}
-
-/// Guard that calls [`Gsp::unload`](super::Gsp::unload) with a
-/// [`UnloadBundle`](super::UnloadBundle) when dropped.
-///
-/// Used to ensure the `UnloadBundle` is run during failure paths.
-pub(super) struct BootUnloadGuard<'a> {
-    guard: ScopeGuard<BootUnloadArgs<'a>, fn(BootUnloadArgs<'a>)>,
-}
-
-impl<'a> BootUnloadGuard<'a> {
-    /// Wraps `unload_bundle` into a guard that executes it when dropped.
-    pub(super) fn new(
-        gsp: &'a super::Gsp,
-        dev: &'a device::Device<device::Bound>,
-        bar: Bar0<'a>,
-        gsp_falcon: &'a Falcon<Gsp>,
-        sec2_falcon: &'a Falcon<Sec2>,
-        unload_bundle: Option<super::UnloadBundle>,
-    ) -> Self {
-        Self {
-            guard: ScopeGuard::new_with_data(
-                BootUnloadArgs {
-                    gsp,
-                    dev,
-                    bar,
-                    gsp_falcon,
-                    sec2_falcon,
-                    unload_bundle,
-                },
-                |args| {
-                    let _ = super::Gsp::unload(
-                        args.gsp,
-                        args.dev,
-                        args.bar,
-                        args.gsp_falcon,
-                        args.sec2_falcon,
-                        args.unload_bundle,
-                    );
-                },
-            ),
-        }
-    }
-
-    /// Disarms the guard and returns the [`UnloadBundle`](super::UnloadBundle) it contains.
-    pub(super) fn dismiss(self) -> Option<super::UnloadBundle> {
-        self.guard.dismiss().unload_bundle
-    }
-}
-
 impl super::Gsp {
     /// Attempt to boot the GSP.
     ///
@@ -118,10 +57,11 @@ pub(crate) fn boot(
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
-        let unload_guard = hal.boot(&self, &ctx, &fb_layout, &wpr_meta)?;
+        let (res, unload_bundle) = hal.boot(&self, &ctx, &fb_layout, &wpr_meta);
+
         // Run from a closure so we can retrieve the result, and run the unload sequence of the GSP
         // in case of error.
-        let res = (|| {
+        let res = res.and_then(|()| {
             gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
             // Poll for RISC-V to become active before continuing.
@@ -143,11 +83,18 @@ pub(crate) fn boot(
 
             // Wait until GSP is fully initialized.
             commands::wait_gsp_init_done(&self.cmdq)
-        })();
+        });
 
         match res {
-            Err(e) => Err(e),
-            Ok(()) => Ok(unload_guard.dismiss()),
+            Err(e) => {
+                dev_err!(dev, "GSP boot failed with error {:?}\n", e);
+
+                // Ignore errors during unload; we will return the error that happened during boot.
+                let _ = self.unload(dev, bar, gsp_falcon, ctx.sec2_falcon, unload_bundle);
+
+                Err(e)
+            }
+            Ok(()) => Ok(unload_bundle),
         }
     }
 
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 51a277fe97bb..a76be4e43272 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -24,7 +24,6 @@
         Chipset, //
     },
     gsp::{
-        boot::BootUnloadGuard,
         Gsp,
         GspBootContext,
         GspFwWprMeta, //
@@ -51,15 +50,22 @@ fn run(
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
     ///
-    /// Upon success, returns a guard that runs the GSP unload sequence if GSP boot does not
-    /// complete.
-    fn boot<'a>(
+    /// Returns two things:
+    ///
+    /// - The `Result` of the boot process itself,
+    /// - The `UnloadBundle` to use with [`Gsp::unload`], or `None` if the bundle could not be
+    ///   created.
+    ///
+    /// Note that the two returned values are independent: it is possible for the boot process to
+    /// succeed while the unload bundle couldn't be created. In this case, the GSP won't be able to
+    /// unload properly and a full GPU reset is required before the GSP can be booted again.
+    fn boot(
         &self,
-        gsp: &'a Gsp,
-        ctx: &GspBootContext<'a>,
+        gsp: &Gsp,
+        ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> Result<BootUnloadGuard<'a>>;
+    ) -> (Result, Option<crate::gsp::UnloadBundle>);
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 46e03f34bc74..3ed2433feabd 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -27,7 +27,6 @@
         Fsp, //
     },
     gsp::{
-        boot::BootUnloadGuard,
         hal::{
             GspHal,
             UnloadBundle, //
@@ -149,29 +148,26 @@ impl GspHal for Gh100 {
     ///
     /// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
     /// the GSP boot internally - no manual GSP reset/boot is needed.
-    fn boot<'a>(
+    fn boot(
         &self,
-        gsp: &'a Gsp,
-        ctx: &GspBootContext<'a>,
+        gsp: &Gsp,
+        ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> Result<BootUnloadGuard<'a>> {
+    ) -> (Result, Option<crate::gsp::UnloadBundle>) {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
-        let sec2_falcon = ctx.sec2_falcon;
+
+        let mut unload_bundle = None;
 
         let res = (|| {
             let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
-            let unload_bundle = crate::gsp::UnloadBundle(
-                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
-            );
-
-            // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-            let unload_guard =
-                BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
+            unload_bundle = Some(crate::gsp::UnloadBundle(
+                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>,
+            ));
 
             let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
 
@@ -185,12 +181,10 @@ fn boot<'a>(
 
             fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
-            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
-
-            Ok(unload_guard)
+            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())
         })();
 
-        res
+        (res, unload_bundle)
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 9b24361f924b..fca8da281e16 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,7 +32,6 @@
     },
     gpu::Chipset,
     gsp::{
-        boot::BootUnloadGuard,
         hal::{
             GspHal,
             UnloadBundle, //
@@ -264,19 +263,21 @@ fn run_fwsec_frts(
 struct Tu102;
 
 impl GspHal for Tu102 {
-    fn boot<'a>(
+    fn boot(
         &self,
-        gsp: &'a Gsp,
-        ctx: &GspBootContext<'a>,
+        gsp: &Gsp,
+        ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> Result<BootUnloadGuard<'a>> {
+    ) -> (Result, Option<crate::gsp::UnloadBundle>) {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
+        let mut unload_bundle = None;
+
         let res = (|| {
             let bios = Vbios::new(dev, bar)?;
 
@@ -284,7 +285,7 @@ fn boot<'a>(
             //
             // If the unload bundle creation fails, the GPU will need to be reset before the driver
             // can be probed again.
-            let unload_bundle =
+            unload_bundle =
                 Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
                     .inspect_err(|e| {
                         dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
@@ -297,10 +298,6 @@ fn boot<'a>(
                     .ok()
                     .map(crate::gsp::UnloadBundle);
 
-            // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-            let unload_guard =
-                BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
-
             // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
             if !fb_layout.frts.is_empty() {
                 run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
@@ -328,12 +325,10 @@ fn boot<'a>(
                 sec2_falcon,
                 bar,
             )?
-            .run(dev, bar, sec2_falcon, wpr_meta)?;
-
-            Ok(unload_guard)
+            .run(dev, bar, sec2_falcon, wpr_meta)
         })();
 
-        res
+        (res, unload_bundle)
     }
 
     fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/6] gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot
  2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
                   ` (3 preceding siblings ...)
  2026-06-19 13:42 ` [PATCH 4/6] gpu: nova-core: gsp: replace BootUnloadGuard with local handler Alexandre Courbot
@ 2026-06-19 13:42 ` Alexandre Courbot
  2026-06-19 13:42 ` [PATCH 6/6] gpu: nova-core: gsp: make unload take GspBootContext Alexandre Courbot
  5 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

The warning emitted when the unload bundle cannot be constructed is valid
regardless of the boot method, but it was local to `Tu102`. Move it to
`Gsp::boot` so it applies to all boot methods.

This requires the HAL's `boot` method to return the `Result` for the
unload bundle's creation, so `Gsp::boot` can display the warning.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs      | 12 ++++++++++++
 drivers/gpu/nova-core/gsp/hal.rs       |  4 ++--
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  6 +++---
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 13 ++-----------
 4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 3574f1a87344..3f11eaec26c7 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -59,6 +59,18 @@ pub(crate) fn boot(
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
         let (res, unload_bundle) = hal.boot(&self, &ctx, &fb_layout, &wpr_meta);
 
+        // Display error for unload bundle if any, and convert to `Option`.
+        let unload_bundle = unload_bundle
+            .inspect_err(|e| {
+                dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+                dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+                dev_warn!(
+                    dev,
+                    "The GPU will need to be reset before the driver can bind again.\n"
+                );
+            })
+            .ok();
+
         // Run from a closure so we can retrieve the result, and run the unload sequence of the GSP
         // in case of error.
         let res = res.and_then(|()| {
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index a76be4e43272..00e39cc1fbde 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -53,7 +53,7 @@ pub(super) trait GspHal: Send {
     /// Returns two things:
     ///
     /// - The `Result` of the boot process itself,
-    /// - The `UnloadBundle` to use with [`Gsp::unload`], or `None` if the bundle could not be
+    /// - The `UnloadBundle` to use with [`Gsp::unload`], or `Err` if the bundle could not be
     ///   created.
     ///
     /// Note that the two returned values are independent: it is possible for the boot process to
@@ -65,7 +65,7 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Option<crate::gsp::UnloadBundle>);
+    ) -> (Result, Result<crate::gsp::UnloadBundle>);
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 3ed2433feabd..58cef3859d49 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -154,18 +154,18 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Option<crate::gsp::UnloadBundle>) {
+    ) -> (Result, Result<crate::gsp::UnloadBundle>) {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
 
-        let mut unload_bundle = None;
+        let mut unload_bundle = Err(ENODATA);
 
         let res = (|| {
             let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
-            unload_bundle = Some(crate::gsp::UnloadBundle(
+            unload_bundle = Ok(crate::gsp::UnloadBundle(
                 KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>,
             ));
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index fca8da281e16..4cc140d08370 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -269,14 +269,14 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Option<crate::gsp::UnloadBundle>) {
+    ) -> (Result, Result<crate::gsp::UnloadBundle>) {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
-        let mut unload_bundle = None;
+        let mut unload_bundle = Err(ENODATA);
 
         let res = (|| {
             let bios = Vbios::new(dev, bar)?;
@@ -287,15 +287,6 @@ fn boot(
             // can be probed again.
             unload_bundle =
                 Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
-                    .inspect_err(|e| {
-                        dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
-                        dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
-                        dev_warn!(
-                            dev,
-                            "The GPU will need to be reset before the driver can bind again.\n"
-                        );
-                    })
-                    .ok()
                     .map(crate::gsp::UnloadBundle);
 
             // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/6] gpu: nova-core: gsp: make unload take GspBootContext
  2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
                   ` (4 preceding siblings ...)
  2026-06-19 13:42 ` [PATCH 5/6] gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot Alexandre Courbot
@ 2026-06-19 13:42 ` Alexandre Courbot
  5 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2026-06-19 13:42 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	Zhi Wang, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

`GspBootContext` contains the resources required to boot the GSP. As it
turns out, this is also the context required for unloading it.

Reflect that fact by replacing the arguments of `Gsp::unload` with the
`GspBootContext`. This symmetry between `Gsp::boot` and `Gsp::unload`
will also be convenient when we want to make these methods generic
over the boot context corresponding to the boot method used.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs           | 19 ++++++++++++++++---
 drivers/gpu/nova-core/gsp/boot.rs      | 17 +++++++----------
 drivers/gpu/nova-core/gsp/hal.rs       | 15 +--------------
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 13 +++----------
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 20 +++++++++-----------
 5 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 42dabcc1b3e4..24976484c87a 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -269,7 +269,9 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 #[pin_data(PinnedDrop)]
 struct GspResources<'gpu> {
     /// Device owning the GPU.
-    device: &'gpu device::Device<device::Bound>,
+    device: &'gpu pci::Device<device::Bound>,
+    /// Details about the chipset.
+    spec: Spec,
     /// MMIO mapping of PCI BAR 0.
     bar: Bar0<'gpu>,
     /// GSP falcon instance, used for GSP boot up and cleanup.
@@ -312,7 +314,16 @@ fn drop(self: Pin<&mut Self>) {
             .gsp
             .as_ref()
             .get_ref()
-            .unload(device, bar, &*this.gsp_falcon, &*this.sec2_falcon, bundle)
+            .unload(
+                GspBootContext {
+                    pdev: device,
+                    bar,
+                    chipset: this.spec.chipset,
+                    gsp_falcon: &*this.gsp_falcon,
+                    sec2_falcon: &*this.sec2_falcon,
+                },
+                bundle,
+            )
             .inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
     }
 }
@@ -344,7 +355,9 @@ pub(crate) fn new(
             sysmem_flush: SysmemFlush::register(pdev.as_ref(), bar, spec.chipset)?,
 
             gsp_resources <- try_pin_init!(GspResources {
-                device: pdev.as_ref(),
+                device: pdev,
+
+                spec: *spec,
 
                 bar,
 
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 3f11eaec26c7..336ad23c96f9 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -3,7 +3,6 @@
 
 use kernel::{
     bits,
-    device,
     dma::Coherent,
     io::poll::read_poll_timeout,
     prelude::*,
@@ -14,7 +13,6 @@
     driver::Bar0,
     falcon::{
         gsp::Gsp,
-        sec2::Sec2,
         Falcon, //
     },
     fb::FbLayout,
@@ -102,7 +100,7 @@ pub(crate) fn boot(
                 dev_err!(dev, "GSP boot failed with error {:?}\n", e);
 
                 // Ignore errors during unload; we will return the error that happened during boot.
-                let _ = self.unload(dev, bar, gsp_falcon, ctx.sec2_falcon, unload_bundle);
+                let _ = self.unload(ctx, unload_bundle);
 
                 Err(e)
             }
@@ -136,17 +134,16 @@ fn shutdown_gsp(
     /// This stops all activity on the GSP.
     pub(crate) fn unload(
         &self,
-        dev: &device::Device<device::Bound>,
-        bar: Bar0<'_>,
-        gsp_falcon: &Falcon<Gsp>,
-        sec2_falcon: &Falcon<Sec2>,
+        ctx: super::GspBootContext<'_>,
         unload_bundle: Option<super::UnloadBundle>,
     ) -> Result {
+        let dev = ctx.dev();
+
         // Shut down the GSP. Keep going even in case of error.
         let mut res = Self::shutdown_gsp(
             &self.cmdq,
-            bar,
-            gsp_falcon,
+            ctx.bar,
+            ctx.gsp_falcon,
             commands::PowerStateLevel::Level0,
         )
         .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));
@@ -156,7 +153,7 @@ pub(crate) fn unload(
             res = res.and(
                 unload_bundle
                     .0
-                    .run(dev, bar, gsp_falcon, sec2_falcon)
+                    .run(&ctx)
                     .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
             );
         } else {
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 00e39cc1fbde..113d445239b9 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -5,18 +5,11 @@
 mod tu102;
 
 use kernel::{
-    device,
     dma::Coherent,
     prelude::*, //
 };
 
 use crate::{
-    driver::Bar0,
-    falcon::{
-        gsp::Gsp as GspEngine,
-        sec2::Sec2,
-        Falcon, //
-    },
     fb::FbLayout,
     firmware::gsp::GspFirmware,
     gpu::{
@@ -37,13 +30,7 @@
 /// required for unloading is prepared at load time, and stored here until it needs to be run.
 pub(super) trait UnloadBundle: Send {
     /// Performs the steps required to properly reset the GSP after it has been stopped.
-    fn run(
-        &self,
-        dev: &device::Device<device::Bound>,
-        bar: Bar0<'_>,
-        gsp_falcon: &Falcon<GspEngine>,
-        sec2_falcon: &Falcon<Sec2>,
-    ) -> Result;
+    fn run(&self, ctx: &GspBootContext<'_>) -> Result;
 }
 
 /// Trait implemented by GSP HALs.
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 58cef3859d49..3ef87fd668de 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -14,7 +14,6 @@
     driver::Bar0,
     falcon::{
         gsp::Gsp as GspEngine,
-        sec2::Sec2,
         Falcon, //
     },
     fb::FbLayout,
@@ -122,22 +121,16 @@ fn wait_for_gsp_lockdown_release(
 struct FspUnloadBundle;
 
 impl UnloadBundle for FspUnloadBundle {
-    fn run(
-        &self,
-        dev: &device::Device<device::Bound>,
-        bar: Bar0<'_>,
-        gsp_falcon: &Falcon<GspEngine>,
-        _sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    fn run(&self, ctx: &GspBootContext<'_>) -> Result {
         // GSP falcon does most of the work of resetting, so just wait for it to finish.
         read_poll_timeout(
-            || Ok(gsp_falcon.is_riscv_active(bar)),
+            || Ok(ctx.gsp_falcon.is_riscv_active(ctx.bar)),
             |&active| !active,
             Delta::from_millis(10),
             Delta::from_secs(5),
         )
         .map(|_| ())
-        .inspect_err(|_| dev_err!(dev, "GSP falcon failed to halt\n"))
+        .inspect_err(|_| dev_err!(ctx.dev(), "GSP falcon failed to halt\n"))
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 4cc140d08370..93ff8a154100 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -123,18 +123,15 @@ fn build(
 }
 
 impl UnloadBundle for Sec2UnloadBundle {
-    fn run(
-        &self,
-        dev: &device::Device<device::Bound>,
-        bar: Bar0<'_>,
-        gsp_falcon: &Falcon<GspEngine>,
-        sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    fn run(&self, ctx: &GspBootContext<'_>) -> Result {
+        let dev = ctx.dev();
+        let bar = ctx.bar;
+
         // Run FWSEC-SB to reset the GSP falcon to its pre-libos state.
         // Log errors but keep going if it fails.
         let fwsec_sb_res = self
             .fwsec_sb
-            .run(dev, bar, gsp_falcon)
+            .run(dev, bar, ctx.gsp_falcon)
             .inspect_err(|e| dev_err!(dev, "FWSEC-SB failed to run: {:?}\n", e));
 
         // Remove WPR2 region if set.
@@ -144,13 +141,14 @@ fn run(
                 return Ok(());
             }
 
-            sec2_falcon.reset(bar)?;
-            sec2_falcon.load(dev, bar, &self.booter_unloader)?;
+            ctx.sec2_falcon.reset(bar)?;
+            ctx.sec2_falcon.load(dev, bar, &self.booter_unloader)?;
 
             // Sentinel value to confirm that Booter Unloader has run.
             const MAILBOX_SENTINEL: u32 = 0xff;
             let (mbox0, _) =
-                sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
+                ctx.sec2_falcon
+                    .boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
             if mbox0 != 0 {
                 dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
                 return Err(EINVAL);

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-06-24  6:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
2026-06-22  7:00   ` Eliot Courtney
2026-06-19 13:42 ` [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer Alexandre Courbot
2026-06-22  7:00   ` Eliot Courtney
2026-06-19 13:42 ` [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure Alexandre Courbot
2026-06-22  7:57   ` Eliot Courtney
2026-06-22 15:04     ` Alexandre Courbot
2026-06-23  5:40       ` Eliot Courtney
2026-06-24  4:30         ` Alexandre Courbot
2026-06-24  6:36           ` Eliot Courtney
2026-06-19 13:42 ` [PATCH 4/6] gpu: nova-core: gsp: replace BootUnloadGuard with local handler Alexandre Courbot
2026-06-19 13:42 ` [PATCH 5/6] gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot Alexandre Courbot
2026-06-19 13:42 ` [PATCH 6/6] gpu: nova-core: gsp: make unload take GspBootContext Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox