NVIDIA GPU driver infrastructure
From: Alexandre Courbot <acourbot@nvidia.com>
To: Danilo Krummrich <dakr@kernel.org>,
	Alice Ryhl <aliceryhl@google.com>,
	 David Airlie <airlied@gmail.com>,
	Simona Vetter <simona@ffwll.ch>,  Gary Guo <gary@garyguo.net>
Cc: John Hubbard <jhubbard@nvidia.com>,
	 Alistair Popple <apopple@nvidia.com>,
	Timur Tabi <ttabi@nvidia.com>,
	 Eliot Courtney <ecourtney@nvidia.com>,
	Zhi Wang <zhiw@nvidia.com>,
	 nova-gpu@lists.linux.dev, dri-devel@lists.freedesktop.org,
	 linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	 Alexandre Courbot <acourbot@nvidia.com>
Subject: [PATCH 4/6] gpu: nova-core: gsp: replace BootUnloadGuard with local handler
Date: Fri, 19 Jun 2026 22:42:19 +0900
Message-ID: <20260619-nova-bootcontext-v1-4-45193cd0a2e5@nvidia.com>
In-Reply-To: <20260619-nova-bootcontext-v1-0-45193cd0a2e5@nvidia.com>

When adding the GSP unload capability, we introduced new types to
support what is effectively ad-hoc behavior: `Gsp::unload` must be run
if any error occurs during the boot sequence. As this behavior is only
needed in one place, dedicated types for it are distracting and
unnecessary.

Furthermore, `BootUnloadGuard` is problematic because it holds
additional shared references into the boot context, notably to the
`Falcon`s. These extra references stand in the way of converting some
`Falcon` methods to take `&mut self`, since such methods require
exclusive access, which cannot be obtained while the guard is alive.
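For readers unfamiliar with the pattern being removed, here is a minimal
user-space sketch of a drop guard in the spirit of
`ScopeGuard::new_with_data` and `dismiss`. The `Guard` and `Falcon` types
below are hypothetical simplifications, not the kernel's API; they only
illustrate that the guard keeps a shared borrow alive until it is either
dismissed or dropped.

```rust
use std::cell::Cell;

/// Hypothetical simplified stand-in for the kernel's `ScopeGuard`: runs
/// `on_drop(data)` when dropped, unless `dismiss` was called first.
struct Guard<T, F: FnOnce(T)> {
    inner: Option<(T, F)>,
}

impl<T, F: FnOnce(T)> Guard<T, F> {
    fn new_with_data(data: T, on_drop: F) -> Self {
        Self { inner: Some((data, on_drop)) }
    }

    /// Disarms the guard and hands the data back without running `on_drop`.
    fn dismiss(mut self) -> T {
        // `inner` is only `None` after a dismiss or a drop, so this cannot fail.
        let (data, _on_drop) = self.inner.take().unwrap();
        data
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // Only runs the callback if the guard was not dismissed.
        if let Some((data, on_drop)) = self.inner.take() {
            on_drop(data);
        }
    }
}

/// Hypothetical stand-in for a `Falcon`; `unloads` counts cleanup runs.
struct Falcon {
    unloads: Cell<u32>,
}

fn boot_with_guard(falcon: &Falcon, fail: bool) -> Result<(), &'static str> {
    // The guard holds a shared borrow of `falcon` until dismissed or dropped.
    // If it instead held a reborrow of a `&mut Falcon`, any `&mut self` call
    // on that falcon before `dismiss` would be rejected by the borrow checker.
    let guard = Guard::new_with_data(falcon, |f: &Falcon| f.unloads.set(f.unloads.get() + 1));

    if fail {
        // Early return drops `guard`, which runs the unload closure.
        return Err("boot step failed");
    }

    // Success: disarm the guard so no unload happens.
    guard.dismiss();
    Ok(())
}
```

The cleanup is implicit here: it happens wherever `guard` goes out of
scope, which is convenient but ties the borrowed data to the guard's
lifetime.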

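Thus, take advantage of the local closures introduced in the preceding
patch to run the unload sequence from `Gsp::boot` if an error occurs at
any step of the boot process. Instead of a guard, `GspHal::boot` now
returns the result of the HAL boot sequence along with the unload
bundle, if one could be prepared.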
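A minimal sketch of the resulting control flow, assuming hypothetical
`UnloadBundle`, `Error`, `hal_boot` and `unload` stand-ins (the real
driver types and signatures differ): the HAL returns its result and the
bundle separately, the remaining steps are chained with
`Result::and_then`, and a single `match` runs the best-effort unload on
failure.

```rust
/// Hypothetical stand-ins for the driver types; only the control flow
/// mirrors the patch.
#[derive(Debug, PartialEq)]
struct UnloadBundle;

#[derive(Debug, PartialEq)]
enum Error {
    Hal,
    PostBoot,
}

/// Hypothetical HAL step: returns the boot result and the bundle independently.
fn hal_boot(hal_ok: bool) -> (Result<(), Error>, Option<UnloadBundle>) {
    let res = if hal_ok { Ok(()) } else { Err(Error::Hal) };
    (res, Some(UnloadBundle))
}

/// Hypothetical best-effort cleanup; records which bundle (if any) it received.
fn unload(log: &mut Vec<Option<UnloadBundle>>, bundle: Option<UnloadBundle>) -> Result<(), Error> {
    log.push(bundle);
    Ok(())
}

fn boot(
    hal_ok: bool,
    post_ok: bool,
    log: &mut Vec<Option<UnloadBundle>>,
) -> Result<Option<UnloadBundle>, Error> {
    let (res, unload_bundle) = hal_boot(hal_ok);

    // The remaining boot steps only run if the HAL part succeeded.
    let res = res.and_then(|()| if post_ok { Ok(()) } else { Err(Error::PostBoot) });

    match res {
        Err(e) => {
            // Ignore unload errors; the boot error is what gets reported.
            let _ = unload(log, unload_bundle);
            Err(e)
        }
        // On success, the bundle is handed back for a later, orderly unload.
        Ok(()) => Ok(unload_bundle),
    }
}
```

Unlike the guard, the cleanup point is explicit and the bundle is the
only state that outlives the HAL call, so no borrows of the boot context
need to be kept around.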

This slightly broadens the failure cleanup path: `Gsp::boot` now
attempts a best-effort `Gsp::unload` even when the failure happened
before the unload bundle was prepared, in which case `None` is passed
as the bundle.
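How each HAL reports a bundle that may or may not have been prepared can
be sketched as follows. This is a hypothetical, simplified `hal_boot`
(not the driver's code) in which `fail_at` selects a failing step: a
failure before the bundle is prepared yields `None`, while a later
failure still returns the bundle so the caller can pass it to the unload
sequence.

```rust
#[derive(Debug, PartialEq)]
struct UnloadBundle;

/// Hypothetical HAL boot: `fail_at` selects which step fails, if any.
/// Step 0 runs before the bundle is prepared, step 1 after it.
fn hal_boot(fail_at: Option<u32>) -> (Result<(), &'static str>, Option<UnloadBundle>) {
    let mut unload_bundle = None;

    // The closure mutably borrows `unload_bundle`; that borrow ends once the
    // closure has been called, so the bundle can be returned afterwards.
    let res = (|| {
        if fail_at == Some(0) {
            return Err("step 0 failed");
        }
        unload_bundle = Some(UnloadBundle);
        if fail_at == Some(1) {
            return Err("step 1 failed");
        }
        Ok(())
    })();

    (res, unload_bundle)
}
```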

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs      | 81 ++++++----------------------------
 drivers/gpu/nova-core/gsp/hal.rs       | 20 ++++++---
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 28 +++++-------
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 23 ++++------
 4 files changed, 47 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 9eccfd634b61..3574f1a87344 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -7,8 +7,7 @@
     dma::Coherent,
     io::poll::read_poll_timeout,
     prelude::*,
-    time::Delta,
-    types::ScopeGuard, //
+    time::Delta, //
 };
 
 use crate::{
@@ -30,66 +29,6 @@
     },
 };
 
-/// Arguments required to call [`Gsp::unload`](super::Gsp::unload).
-///
-/// Stored as their own type to avoid repeating a long and tedious list in [`BootUnloadGuard`].
-pub(super) struct BootUnloadArgs<'a> {
-    gsp: &'a super::Gsp,
-    dev: &'a device::Device<device::Bound>,
-    bar: Bar0<'a>,
-    gsp_falcon: &'a Falcon<Gsp>,
-    sec2_falcon: &'a Falcon<Sec2>,
-    unload_bundle: Option<super::UnloadBundle>,
-}
-
-/// Guard that calls [`Gsp::unload`](super::Gsp::unload) with a
-/// [`UnloadBundle`](super::UnloadBundle) when dropped.
-///
-/// Used to ensure the `UnloadBundle` is run during failure paths.
-pub(super) struct BootUnloadGuard<'a> {
-    guard: ScopeGuard<BootUnloadArgs<'a>, fn(BootUnloadArgs<'a>)>,
-}
-
-impl<'a> BootUnloadGuard<'a> {
-    /// Wraps `unload_bundle` into a guard that executes it when dropped.
-    pub(super) fn new(
-        gsp: &'a super::Gsp,
-        dev: &'a device::Device<device::Bound>,
-        bar: Bar0<'a>,
-        gsp_falcon: &'a Falcon<Gsp>,
-        sec2_falcon: &'a Falcon<Sec2>,
-        unload_bundle: Option<super::UnloadBundle>,
-    ) -> Self {
-        Self {
-            guard: ScopeGuard::new_with_data(
-                BootUnloadArgs {
-                    gsp,
-                    dev,
-                    bar,
-                    gsp_falcon,
-                    sec2_falcon,
-                    unload_bundle,
-                },
-                |args| {
-                    let _ = super::Gsp::unload(
-                        args.gsp,
-                        args.dev,
-                        args.bar,
-                        args.gsp_falcon,
-                        args.sec2_falcon,
-                        args.unload_bundle,
-                    );
-                },
-            ),
-        }
-    }
-
-    /// Disarms the guard and returns the [`UnloadBundle`](super::UnloadBundle) it contains.
-    pub(super) fn dismiss(self) -> Option<super::UnloadBundle> {
-        self.guard.dismiss().unload_bundle
-    }
-}
-
 impl super::Gsp {
     /// Attempt to boot the GSP.
     ///
@@ -118,10 +57,11 @@ pub(crate) fn boot(
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
-        let unload_guard = hal.boot(&self, &ctx, &fb_layout, &wpr_meta)?;
+        let (res, unload_bundle) = hal.boot(&self, &ctx, &fb_layout, &wpr_meta);
+
         // Run from a closure so we can retrieve the result, and run the unload sequence of the GSP
         // in case of error.
-        let res = (|| {
+        let res = res.and_then(|()| {
             gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
             // Poll for RISC-V to become active before continuing.
@@ -143,11 +83,18 @@ pub(crate) fn boot(
 
             // Wait until GSP is fully initialized.
             commands::wait_gsp_init_done(&self.cmdq)
-        })();
+        });
 
         match res {
-            Err(e) => Err(e),
-            Ok(()) => Ok(unload_guard.dismiss()),
+            Err(e) => {
+                dev_err!(dev, "GSP boot failed with error {:?}\n", e);
+
+                // Ignore errors during unload; we will return the error that happened during boot.
+                let _ = self.unload(dev, bar, gsp_falcon, ctx.sec2_falcon, unload_bundle);
+
+                Err(e)
+            }
+            Ok(()) => Ok(unload_bundle),
         }
     }
 
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 51a277fe97bb..a76be4e43272 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -24,7 +24,6 @@
         Chipset, //
     },
     gsp::{
-        boot::BootUnloadGuard,
         Gsp,
         GspBootContext,
         GspFwWprMeta, //
@@ -51,15 +50,22 @@ fn run(
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
     ///
-    /// Upon success, returns a guard that runs the GSP unload sequence if GSP boot does not
-    /// complete.
-    fn boot<'a>(
+    /// Returns two things:
+    ///
+    /// - The `Result` of the boot process itself,
+    /// - The `UnloadBundle` to use with [`Gsp::unload`], or `None` if the bundle could not be
+    ///   created.
+    ///
+    /// Note that the two returned values are independent: it is possible for the boot process to
+    /// succeed while the unload bundle couldn't be created. In this case, the GSP won't be able to
+    /// unload properly and a full GPU reset is required before the GSP can be booted again.
+    fn boot(
         &self,
-        gsp: &'a Gsp,
-        ctx: &GspBootContext<'a>,
+        gsp: &Gsp,
+        ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> Result<BootUnloadGuard<'a>>;
+    ) -> (Result, Option<crate::gsp::UnloadBundle>);
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 46e03f34bc74..3ed2433feabd 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -27,7 +27,6 @@
         Fsp, //
     },
     gsp::{
-        boot::BootUnloadGuard,
         hal::{
             GspHal,
             UnloadBundle, //
@@ -149,29 +148,26 @@ impl GspHal for Gh100 {
     ///
     /// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
     /// the GSP boot internally - no manual GSP reset/boot is needed.
-    fn boot<'a>(
+    fn boot(
         &self,
-        gsp: &'a Gsp,
-        ctx: &GspBootContext<'a>,
+        gsp: &Gsp,
+        ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> Result<BootUnloadGuard<'a>> {
+    ) -> (Result, Option<crate::gsp::UnloadBundle>) {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
-        let sec2_falcon = ctx.sec2_falcon;
+
+        let mut unload_bundle = None;
 
         let res = (|| {
             let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
-            let unload_bundle = crate::gsp::UnloadBundle(
-                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
-            );
-
-            // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-            let unload_guard =
-                BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
+            unload_bundle = Some(crate::gsp::UnloadBundle(
+                KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>,
+            ));
 
             let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
 
@@ -185,12 +181,10 @@ fn boot<'a>(
 
             fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
-            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
-
-            Ok(unload_guard)
+            wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())
         })();
 
-        res
+        (res, unload_bundle)
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 9b24361f924b..fca8da281e16 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,7 +32,6 @@
     },
     gpu::Chipset,
     gsp::{
-        boot::BootUnloadGuard,
         hal::{
             GspHal,
             UnloadBundle, //
@@ -264,19 +263,21 @@ fn run_fwsec_frts(
 struct Tu102;
 
 impl GspHal for Tu102 {
-    fn boot<'a>(
+    fn boot(
         &self,
-        gsp: &'a Gsp,
-        ctx: &GspBootContext<'a>,
+        gsp: &Gsp,
+        ctx: &GspBootContext<'_>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-    ) -> Result<BootUnloadGuard<'a>> {
+    ) -> (Result, Option<crate::gsp::UnloadBundle>) {
         let dev = ctx.dev();
         let bar = ctx.bar;
         let chipset = ctx.chipset;
         let gsp_falcon = ctx.gsp_falcon;
         let sec2_falcon = ctx.sec2_falcon;
 
+        let mut unload_bundle = None;
+
         let res = (|| {
             let bios = Vbios::new(dev, bar)?;
 
@@ -284,7 +285,7 @@ fn boot<'a>(
             //
             // If the unload bundle creation fails, the GPU will need to be reset before the driver
             // can be probed again.
-            let unload_bundle =
+            unload_bundle =
                 Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
                     .inspect_err(|e| {
                         dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
@@ -297,10 +298,6 @@ fn boot<'a>(
                     .ok()
                     .map(crate::gsp::UnloadBundle);
 
-            // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-            let unload_guard =
-                BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
-
             // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
             if !fb_layout.frts.is_empty() {
                 run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
@@ -328,12 +325,10 @@ fn boot<'a>(
                 sec2_falcon,
                 bar,
             )?
-            .run(dev, bar, sec2_falcon, wpr_meta)?;
-
-            Ok(unload_guard)
+            .run(dev, bar, sec2_falcon, wpr_meta)
         })();
 
-        res
+        (res, unload_bundle)
     }
 
     fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {

-- 
2.54.0


Thread overview: 7+ messages
2026-06-19 13:42 [PATCH 0/6] gpu: nova-core: consolidate and streamline GSP boot process Alexandre Courbot
2026-06-19 13:42 ` [PATCH 1/6] gpu: nova-core: gsp: sequencer: use GspBootContext Alexandre Courbot
2026-06-19 13:42 ` [PATCH 2/6] gpu: nova-core: gsp: sequencer: do not store sequence into GspSequencer Alexandre Courbot
2026-06-19 13:42 ` [PATCH 3/6] gpu: nova-core: gsp: move boot code into local closure Alexandre Courbot
2026-06-19 13:42 ` Alexandre Courbot [this message]
2026-06-19 13:42 ` [PATCH 5/6] gpu: nova-core: gsp: move unload bundle error handling to Gsp::boot Alexandre Courbot
2026-06-19 13:42 ` [PATCH 6/6] gpu: nova-core: gsp: make unload take GspBootContext Alexandre Courbot
