NVIDIA GPU driver infrastructure
From: "Eliot Courtney" <ecourtney@nvidia.com>
To: "Alexandre Courbot" <acourbot@nvidia.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>, "Gary Guo" <gary@garyguo.net>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Timur Tabi" <ttabi@nvidia.com>,
	"Eliot Courtney" <ecourtney@nvidia.com>,
	"Zhi Wang" <zhiw@nvidia.com>, <nova-gpu@lists.linux.dev>,
	<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
	<rust-for-linux@vger.kernel.org>,
	"dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Subject: Re: [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure
Date: Mon, 22 Jun 2026 16:57:04 +0900
Message-ID: <DJFF1W6VGY4Q.2PV5MEPMFXIDB@nvidia.com>
In-Reply-To: <20260619-nova-bootcontext-v1-3-45193cd0a2e5@nvidia.com>

On Fri Jun 19, 2026 at 10:42 PM JST, Alexandre Courbot wrote:
> The next patch aims at replacing the cumbersome `BootUnloadGuard` with a
> more local and less intrusive mechanism to run the GSP unload sequence
> upon GSP boot failure. Doing so requires running the boot code in a
> local closure, which changes its indentation and would make other
> changes difficult to track in the diff. Thus, this preparatory patch
> moves said boot code into a local closure that is run upon construction,
> so the next patch does not need to re-indent code that changes.
>
> This is a mechanical preparatory patch to make the next patch easier to
> read. No functional change intended.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---

I agree with removing `BootUnloadGuard`, but I don't think it's great to
lift a bunch of code into closures and then handle the result manually.
It's error-prone IMO (we have already had several bugs related to this
kind of thing). Instead, what about just using `ScopeGuard` directly?
That avoids lifting code into closures (which is a bit noisy) and avoids
handling failure results manually (which is a bit error-prone). With
`GspBootContext` it's fairly easy to do now:

```
let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
    let _ = gsp.unload(ctx, unload_bundle);
});
```
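To make the comparison concrete outside the kernel tree, here is a minimal userspace sketch. `Guard` is a hypothetical stand-in that mirrors the shape of `kernel::types::ScopeGuard` (`new_with_data`, `dismiss`), and `Bundle`, `boot_step` and the `unloaded` counter are illustrative only. The point it shows: every `?` early return drops the guard and runs the unload closure, and only the success path calls `dismiss()` to take the bundle back, so there is no closure to lift code into and no `match` on the result.

```rust
use std::cell::Cell;

/// Userspace stand-in for `kernel::types::ScopeGuard` (illustration only):
/// holds some data plus a cleanup closure that runs on drop unless dismissed.
struct Guard<T, F: FnOnce(T)>(Option<(T, F)>);

impl<T, F: FnOnce(T)> Guard<T, F> {
    fn new_with_data(data: T, cleanup: F) -> Self {
        Guard(Some((data, cleanup)))
    }

    /// Disarms the guard and hands the data back to the caller.
    fn dismiss(mut self) -> T {
        // `take()` leaves `None`, so the `Drop` impl below becomes a no-op.
        let (data, _cleanup) = self.0.take().unwrap();
        data
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        if let Some((data, cleanup)) = self.0.take() {
            cleanup(data);
        }
    }
}

/// Hypothetical unload bundle.
#[derive(Debug, PartialEq)]
struct Bundle(u32);

/// Hypothetical boot step that fails when `fail` is true.
fn boot_step(fail: bool) -> Result<(), &'static str> {
    if fail {
        Err("step failed")
    } else {
        Ok(())
    }
}

/// Boot sequence in the guard style: no closure, no manual `match` on the result.
fn boot(fail_at: Option<usize>, unloaded: &Cell<u32>) -> Result<Option<Bundle>, &'static str> {
    let guard = Guard::new_with_data(Some(Bundle(7)), |bundle: Option<Bundle>| {
        // Stands in for `gsp.unload(ctx, unload_bundle)`; its errors would be ignored.
        let _ = bundle;
        unloaded.set(unloaded.get() + 1);
    });

    for step in 0..3 {
        // Any `?` early return drops `guard`, which runs the unload closure.
        boot_step(fail_at == Some(step))?;
    }

    // Success: disarm the guard and return the bundle to the caller.
    Ok(guard.dismiss())
}
```

Calling `boot(None, &counter)` returns the bundle without unloading, while `boot(Some(1), &counter)` propagates the error and bumps the counter once.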

I confirmed that it's also compatible with v2 of this series, which has
the mutable `Fsp`: you can stash the context inside the `ScopeGuard` data
(and then take a `&mut` reference to the stashed context for brevity),
or have a separate unload context type that doesn't use the FSP (which
could later be type-parameterized along with `Gsp`, for example).
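A rough userspace sketch of the first option, assuming a hypothetical `BootCtx` that owns an `Fsp` whose operations need `&mut self`. `Guard` is again a stand-in for `kernel::types::ScopeGuard`, which also gives `Deref`/`DerefMut` access to its data; the sketch relies on that to re-borrow the stashed context mutably, while the cleanup closure still receives the whole context by value on failure.

```rust
use std::cell::Cell;
use std::ops::{Deref, DerefMut};

/// Userspace stand-in for `kernel::types::ScopeGuard` (illustration only).
struct Guard<T, F: FnOnce(T)> {
    inner: Option<(T, F)>,
}

impl<T, F: FnOnce(T)> Guard<T, F> {
    fn new_with_data(data: T, cleanup: F) -> Self {
        Guard { inner: Some((data, cleanup)) }
    }

    fn dismiss(mut self) -> T {
        self.inner.take().unwrap().0
    }
}

impl<T, F: FnOnce(T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner.as_ref().unwrap().0
    }
}

impl<T, F: FnOnce(T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.inner.as_mut().unwrap().0
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        if let Some((data, cleanup)) = self.inner.take() {
            cleanup(data);
        }
    }
}

/// Hypothetical FSP handle whose operations need `&mut self`.
struct Fsp {
    msgs_sent: u32,
}

impl Fsp {
    fn send(&mut self) -> Result<(), &'static str> {
        self.msgs_sent += 1;
        Ok(())
    }
}

/// Hypothetical boot context that owns the mutable FSP.
struct BootCtx {
    fsp: Fsp,
}

/// Boots with the context stashed in the guard; on failure the cleanup gets it back.
fn boot(
    ctx: BootCtx,
    fail: bool,
    msgs_seen_by_unload: &Cell<Option<u32>>,
) -> Result<(BootCtx, Option<u32>), &'static str> {
    let mut guard = Guard::new_with_data((ctx, Some(42u32)), |(ctx, _bundle): (BootCtx, Option<u32>)| {
        // Stands in for the unload sequence: it still has access to the FSP.
        msgs_seen_by_unload.set(Some(ctx.fsp.msgs_sent));
    });

    // Re-borrow the stashed context mutably for brevity.
    let (ctx, _bundle) = &mut *guard;
    ctx.fsp.send()?;
    ctx.fsp.send()?;
    if fail {
        // Early return drops the guard, which runs the cleanup closure.
        return Err("boot failed");
    }

    Ok(guard.dismiss())
}
```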

For example, here is a rough diff on top of this patch series (you could
change the `Result<Option<UnloadBundle>>` returns to something like
`Result<Result<UnloadBundle>>` if you want to centralise the error
handling of a failed unload bundle, although currently it can only fail
in one location):
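A sketch of what the `Result<Result<UnloadBundle>>` variant could look like, with hypothetical `Error`, `UnloadBundle`, `hal_boot` and `gsp_boot` standing in for the real types (the HAL-side guard is omitted for brevity). The outer `Result` is the boot outcome and the inner one records whether the bundle could be built, so the "failed to prepare unload firmware" warning lives in one place in the caller instead of in each HAL.

```rust
/// Hypothetical error type standing in for the kernel's `Error`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Error {
    NoMem,
    Timeout,
}

/// Hypothetical unload bundle.
#[derive(Debug, PartialEq)]
struct UnloadBundle;

/// Hypothetical HAL boot: outer `Result` is the boot outcome, inner one is whether
/// the unload bundle could be built. Boot may succeed even if the bundle did not.
fn hal_boot(bundle_ok: bool, boot_ok: bool) -> Result<Result<UnloadBundle, Error>, Error> {
    let bundle = if bundle_ok { Ok(UnloadBundle) } else { Err(Error::NoMem) };
    if !boot_ok {
        return Err(Error::Timeout);
    }
    Ok(bundle)
}

/// Caller side: the bundle-failure warning is emitted here, for every HAL.
fn gsp_boot(
    bundle_ok: bool,
    boot_ok: bool,
    warnings: &mut Vec<String>,
) -> Result<Option<UnloadBundle>, Error> {
    let bundle = hal_boot(bundle_ok, boot_ok)?
        .inspect_err(|e| warnings.push(format!("Failed to prepare unload firmware: {:?}", e)))
        .ok();
    Ok(bundle)
}
```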

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 7918ebb508f9..f6454b106293 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -314,7 +314,7 @@ fn drop(self: Pin<&mut Self>) {
             .as_ref()
             .get_ref()
             .unload(
-                GspBootContext {
+                &GspBootContext {
                     pdev: device,
                     bar,
                     chipset: this.spec.chipset,
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 336ad23c96f9..cfe10a8313c8 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -6,7 +6,8 @@
     dma::Coherent,
     io::poll::read_poll_timeout,
     prelude::*,
-    time::Delta, //
+    time::Delta,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -55,57 +56,35 @@ pub(crate) fn boot(
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
-        let (res, unload_bundle) = hal.boot(&self, &ctx, &fb_layout, &wpr_meta);
+        let unload_bundle = hal.boot(&self, &ctx, &fb_layout, &wpr_meta)?;
 
-        // Display error for unload bundle if any, and convert to `Option`.
-        let unload_bundle = unload_bundle
-            .inspect_err(|e| {
-                dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
-                dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
-                dev_warn!(
-                    dev,
-                    "The GPU will need to be reset before the driver can bind again.\n"
-                );
-            })
-            .ok();
-
-        // Run from a closure so we can retrieve the result, and run the unload sequence of the GSP
-        // in case of error.
-        let res = res.and_then(|()| {
-            gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
-
-            // Poll for RISC-V to become active before continuing.
-            read_poll_timeout(
-                || Ok(gsp_falcon.is_riscv_active(bar)),
-                |val: &bool| *val,
-                Delta::from_millis(10),
-                Delta::from_secs(5),
-            )?;
-
-            dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
-
-            self.cmdq
-                .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
-            self.cmdq
-                .send_command_no_wait(bar, commands::SetRegistry::new())?;
-
-            hal.post_boot(&self, &ctx, &gsp_fw)?;
-
-            // Wait until GSP is fully initialized.
-            commands::wait_gsp_init_done(&self.cmdq)
+        let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
+            let _ = self.unload(&ctx, unload_bundle);
         });
 
-        match res {
-            Err(e) => {
-                dev_err!(dev, "GSP boot failed with error {:?}\n", e);
+        gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
-                // Ignore errors during unload; we will return the error that happened during boot.
-                let _ = self.unload(ctx, unload_bundle);
+        // Poll for RISC-V to become active before continuing.
+        read_poll_timeout(
+            || Ok(gsp_falcon.is_riscv_active(bar)),
+            |val: &bool| *val,
+            Delta::from_millis(10),
+            Delta::from_secs(5),
+        )?;
 
-                Err(e)
-            }
-            Ok(()) => Ok(unload_bundle),
-        }
+        dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
+
+        self.cmdq
+            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
+        self.cmdq
+            .send_command_no_wait(bar, commands::SetRegistry::new())?;
+
+        hal.post_boot(&self, &ctx, &gsp_fw)?;
+
+        // Wait until GSP is fully initialized.
+        commands::wait_gsp_init_done(&self.cmdq)?;
+
+        Ok(unload_guard.dismiss())
     }
 
     /// Shut down the GSP and wait until it is offline.
@@ -134,7 +113,7 @@ fn shutdown_gsp(
     /// This stops all activity on the GSP.
     pub(crate) fn unload(
         &self,
-        ctx: super::GspBootContext<'_>,
+        ctx: &super::GspBootContext<'_>,
         unload_bundle: Option<super::UnloadBundle>,
     ) -> Result {
         let dev = ctx.dev();
@@ -153,7 +132,7 @@ pub(crate) fn unload(
             res = res.and(
                 unload_bundle
                     .0
-                    .run(&ctx)
+                    .run(ctx)
                     .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
             );
         } else {
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 113d445239b9..849ca224085b 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -37,22 +37,15 @@ pub(super) trait UnloadBundle: Send {
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
     ///
-    /// Returns two things:
-    ///
-    /// - The `Result` of the boot process itself,
-    /// - The `UnloadBundle` to use with [`Gsp::unload`], or `Err` if the bundle could not be
-    ///   created.
-    ///
-    /// Note that the two returned values are independent: it is possible for the boot process to
-    /// succeed while the unload bundle couldn't be created. In this case, the GSP won't be able to
-    /// unload properly and a full GPU reset is required before the GSP can be booted again.
+    /// Upon success, returns the [`crate::gsp::UnloadBundle`] to use with [`Gsp::unload`], if one
+    /// could be created.
     fn boot(
         &self,
         gsp: &Gsp,
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Result<crate::gsp::UnloadBundle>);
+    ) -> Result<Option<crate::gsp::UnloadBundle>>;
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index c6ff2fb216ea..04c27afc650a 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -7,7 +7,8 @@
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
-    time::Delta, //
+    time::Delta,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -143,35 +144,35 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Result<crate::gsp::UnloadBundle>) {
+    ) -> Result<Option<crate::gsp::UnloadBundle>> {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
 
-        let mut unload_bundle = Err(ENODATA);
+        let unload_bundle = crate::gsp::UnloadBundle(
+            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
+        );
 
-        let res = (|| {
-            unload_bundle = Ok(crate::gsp::UnloadBundle(
-                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>,
-            ));
+        let unload_guard = ScopeGuard::new_with_data(Some(unload_bundle), |unload_bundle| {
+            let _ = gsp.unload(ctx, unload_bundle);
+        });
 
-            let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
+        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
 
-            let args = FmcBootArgs::new(
-                dev,
-                chipset,
-                wpr_meta.dma_handle(),
-                gsp.libos.dma_handle(),
-                false,
-            )?;
+        let args = FmcBootArgs::new(
+            dev,
+            chipset,
+            wpr_meta.dma_handle(),
+            gsp.libos.dma_handle(),
+            false,
+        )?;
 
-            fsp.boot_fmc(dev, bar, fb_layout, &args)?;
+        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
-            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())
-        })();
+        wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
 
-        (res, unload_bundle)
+        Ok(unload_guard.dismiss())
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 93ff8a154100..fb1d233ac7db 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -6,7 +6,8 @@
 use kernel::{
     device,
     dma::Coherent,
-    io::Io, //
+    io::Io,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -267,57 +268,66 @@ fn boot(
         ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> (Result, Result<crate::gsp::UnloadBundle>) {
+    ) -> Result<Option<crate::gsp::UnloadBundle>> {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
-        let mut unload_bundle = Err(ENODATA);
+        let bios = Vbios::new(dev, bar)?;
 
-        let res = (|| {
-            let bios = Vbios::new(dev, bar)?;
+        // Try and prepare the unload bundle.
+        //
+        // If the unload bundle creation fails, the GPU will need to be reset before the driver can
+        // be probed again.
+        let unload_bundle =
+            Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+                .inspect_err(|e| {
+                    dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+                    dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+                    dev_warn!(
+                        dev,
+                        "The GPU will need to be reset before the driver can bind again.\n"
+                    );
+                })
+                .ok()
+                .map(crate::gsp::UnloadBundle);
 
-            // Try and prepare the unload bundle.
-            //
-            // If the unload bundle creation fails, the GPU will need to be reset before the driver
-            // can be probed again.
-            unload_bundle =
-                Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
-                    .map(crate::gsp::UnloadBundle);
+        let unload_guard = ScopeGuard::new_with_data(unload_bundle, |unload_bundle| {
+            let _ = gsp.unload(ctx, unload_bundle);
+        });
 
-            // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
-            if !fb_layout.frts.is_empty() {
-                run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
-            }
+        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
+        if !fb_layout.frts.is_empty() {
+            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+        }
 
-            gsp_falcon.reset(bar)?;
-            let libos_handle = gsp.libos.dma_handle();
-            let (mbox0, mbox1) = gsp_falcon.boot(
-                bar,
-                Some(libos_handle as u32),
-                Some((libos_handle >> 32) as u32),
-            )?;
-            dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+        gsp_falcon.reset(bar)?;
+        let libos_handle = gsp.libos.dma_handle();
+        let (mbox0, mbox1) = gsp_falcon.boot(
+            bar,
+            Some(libos_handle as u32),
+            Some((libos_handle >> 32) as u32),
+        )?;
+        dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
 
-            dev_dbg!(
-                dev,
-                "Using SEC2 to load and run the booter_load firmware...\n"
-            );
+        dev_dbg!(
+            dev,
+            "Using SEC2 to load and run the booter_load firmware...\n"
+        );
 
-            BooterFirmware::new(
-                dev,
-                BooterKind::Loader,
-                chipset,
-                FIRMWARE_VERSION,
-                sec2_falcon,
-                bar,
-            )?
-            .run(dev, bar, sec2_falcon, wpr_meta)
-        })();
+        BooterFirmware::new(
+            dev,
+            BooterKind::Loader,
+            chipset,
+            FIRMWARE_VERSION,
+            sec2_falcon,
+            bar,
+        )?
+        .run(dev, bar, sec2_falcon, wpr_meta)?;
 
-        (res, unload_bundle)
+        Ok(unload_guard.dismiss())
     }
 
     fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {

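With the diff above, boot ends up with two guards in sequence: the one in `GspHal::boot` covers the chipset-specific steps and is dismissed on success, handing the bundle back, and `Gsp::boot` then arms its own guard with that bundle for the remaining steps. A minimal sketch of that hand-off, using a userspace stand-in for `ScopeGuard` and hypothetical `hal_boot`/`gsp_boot` functions, suggests that each failure path runs the unload exactly once and the success path never does.

```rust
use std::cell::Cell;

/// Userspace stand-in for `kernel::types::ScopeGuard` (illustration only).
struct Guard<T, F: FnOnce(T)>(Option<(T, F)>);

impl<T, F: FnOnce(T)> Guard<T, F> {
    fn new_with_data(data: T, cleanup: F) -> Self {
        Guard(Some((data, cleanup)))
    }

    fn dismiss(mut self) -> T {
        let (data, _cleanup) = self.0.take().unwrap();
        data
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        if let Some((data, cleanup)) = self.0.take() {
            cleanup(data);
        }
    }
}

/// Hypothetical HAL boot: guards its own steps, hands the bundle back on success.
fn hal_boot(fail_in_hal: bool, unloads: &Cell<u32>) -> Result<Option<u32>, &'static str> {
    let guard = Guard::new_with_data(Some(1u32), |_bundle: Option<u32>| unloads.set(unloads.get() + 1));
    if fail_in_hal {
        // Dropping `guard` here runs the HAL-level unload.
        return Err("FMC boot failed");
    }
    Ok(guard.dismiss())
}

/// Hypothetical top-level boot: re-arms a guard with the bundle for the remaining steps.
fn gsp_boot(fail_in_hal: bool, fail_after_hal: bool, unloads: &Cell<u32>) -> Result<Option<u32>, &'static str> {
    // If the HAL fails, its own guard already ran the unload; `?` just propagates.
    let bundle = hal_boot(fail_in_hal, unloads)?;
    let guard = Guard::new_with_data(bundle, |_bundle: Option<u32>| unloads.set(unloads.get() + 1));
    if fail_after_hal {
        return Err("GSP init timed out");
    }
    Ok(guard.dismiss())
}
```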


Thread overview: 10+ messages
2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
2026-06-22  7:00   ` Eliot Courtney
2026-06-19 13:42 ` [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer Alexandre Courbot
2026-06-22  7:00   ` Eliot Courtney
2026-06-19 13:42 ` [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure Alexandre Courbot
2026-06-22  7:57   ` Eliot Courtney [this message]
2026-06-19 13:42 ` [PATCH 4/6] gpu: nova-core: gsp: replace BootUnloadGuard with local handler Alexandre Courbot
2026-06-19 13:42 ` [PATCH 5/6] gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot Alexandre Courbot
2026-06-19 13:42 ` [PATCH 6/6] gpu: nova-core: gsp: make unload take GspBootContext Alexandre Courbot
