[PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support

* [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support
@ 2026-06-02  3:20 John Hubbard
  2026-06-02  3:20 ` [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture John Hubbard
                   ` (23 more replies)
  0 siblings, 24 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Changes in v12:

* Rebased onto drm-rust-next, formally picking up the "unload" changes.
  These were already included (I previously neglected to mention that
  dependency), but now v12 applies directly to drm-rust-next.

* FSP is now an owned object (Fsp) rather than a namespace of static
  methods. It owns the FSP falcon and the FMC firmware and is produced
  by Fsp::wait_secure_boot(), which only returns once secure boot has
  completed, so the Chain of Trust methods cannot run before FSP is
  ready. The send/receive and boot methods take &mut self, making the
  Fsp the exclusive token for an in-flight exchange. The FSP falcon stub
  no longer creates the falcon, which is now created and owned by Fsp.
  (Alexandre Courbot, Eliot Courtney)

* Corrected the GB10x sysmem flush register base. The HSHUB0 registers
  were based at 0x00891000, which has no counterpart in Open RM. The
  correct base is NV_HSHUB0_PRIV_BASE (0x00870000), which is the same on
  every Blackwell chip. The HSHUB0 and FBHUB0 flush registers are now
  relative registers with names matching Open RM, and the per-window
  base lives in the GB10x and GB20x framebuffer HALs. (Alexandre Courbot)

* The FSP Chain of Trust version moved from a Chipset method returning
  Option<FspCotVersion> to an FSP HAL method returning u16, dropping the
  newtype. Hopper (version 1) and GB10x (version 2) share a boot-status
  HAL but differ here, so GB10x now has its own FSP HAL. (Alexandre
  Courbot)

* FMC signatures are initialized in place with KBox::init() instead of
  building the 816-byte FmcSignatures on the stack. elf_section keeps
  pub(super) visibility and the FSP size constants stay private.
  (Alexandre Courbot, Eliot Courtney)

* The Hopper framebuffer HAL is now introduced in the larger non-WPR
  heap patch, where it first differs from GA102, instead of a patch
  earlier. non_wpr_heap_size() is no longer a defaulted trait method.
  tu102 exposes non_wpr_heap_size_tu102(), which ga100 and ga102 call,
  so a future chipset cannot silently inherit the Turing value.
  (Alexandre Courbot, Eliot Courtney)

* The FMC firmware image is declared via MODULE_FIRMWARE for FSP-based
  chipsets, so it is bundled alongside the other GSP firmware. (Eliot
  Courtney)

* ELF class detection uses named constants (ELFMAG, SELFMAG, EI_CLASS,
  ELFCLASS32, ELFCLASS64) and drops a redundant length check. (Eliot
  Courtney)

* FSP now expands to "Foundation Security Processor" throughout, rather
  than "Firmware System Processor". (Timur Tabi)

* Collected Reviewed-by tags from Eliot Courtney.
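
The owned-Fsp change in the v12 notes above can be illustrated with a
minimal, standalone sketch. Only the names Fsp and wait_secure_boot()
come from this cover letter. FakeHw, FspError, send_and_receive() and
all signatures are hypothetical stand-ins, not the driver's API. The
point is that the handle cannot exist before secure boot completes,
and that &mut self makes the handle the exclusive token for an
exchange.

```rust
/// Hypothetical stand-in for the FSP hardware state (not a driver type).
pub struct FakeHw {
    pub secure_boot_done: bool,
    pub mailbox: Vec<u32>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum FspError {
    Timeout,
}

/// Owned handle; it can only be obtained through `wait_secure_boot()`.
pub struct Fsp {
    hw: FakeHw,
}

impl Fsp {
    /// The only constructor: no `Fsp` exists unless secure boot completed.
    /// (The sketch checks a flag once instead of polling a register.)
    pub fn wait_secure_boot(hw: FakeHw) -> Result<Fsp, FspError> {
        if hw.secure_boot_done {
            Ok(Fsp { hw })
        } else {
            Err(FspError::Timeout)
        }
    }

    /// Taking `&mut self` means the borrow checker rejects a second
    /// exchange on the same handle while this one is in flight.
    pub fn send_and_receive(&mut self, msg: &[u32]) -> Vec<u32> {
        self.hw.mailbox.clear();
        self.hw.mailbox.extend_from_slice(msg);
        // The fake hardware simply echoes the request back.
        self.hw.mailbox.clone()
    }
}
```

Because wait_secure_boot() consumes the hardware state and is the only
way to build an Fsp, code that skips the wait has no handle on which
to call the messaging methods.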

Changes in v11:

* Made the FSP messaging path sound. The FSP falcon's EMEM window is a
  stateful register pair (program an offset, then touch the data
  register), so modeling it as a stateless I/O region let aliasing
  accesses corrupt each other's offset with no unsafe at the call site.
  The EMEM accessors and the send/receive helpers now take &mut self, so
  the falcon handle is the exclusive token for an in-flight exchange,
  and the unsafe Io/IoCapable impls and their unreachable! bounds checks
  are gone. The accessors now program the EMEM offset once and stream
  through the data register using the falcon's auto-increment, matching
  Open RM, instead of re-programming the offset for every word.

* Rebased onto a current drm-rust-next that already carries the v10
  preparatory patches, which are dropped from the series.

* Top of the series: the v10 boot-integration patch is replaced by "gsp:
  enable FSP boot path" (Alexandre Courbot) and "add non-sec2 unload
  path" (Eliot Courtney). The Hopper/Blackwell boot path now lives in
  the GSP HAL (gsp/hal/gh100.rs) and returns a BootUnloadGuard.

* Reordered per review: hardware-differences patches first (DMA mask,
  PCI config mirror, PMU-reserved framebuffer, non-WPR heap, WPR2 heap,
  sysmem flush registers), then the FSP/FMC stack, then GSP lockdown
  release polling.

* Hardware-difference patches are now HAL methods instead of inline
  Architecture matches: the PMU-reserved framebuffer size (patch
  retitled from "calculate reserved FB heap size" to "compute
  PMU-reserved framebuffer size"), the non-WPR heap size (now u32 with a
  1 MiB default instead of Option<u32>, per v10 review, with the GB10x
  value in the GB100 HAL and the larger GB20x value in the GB202 HAL),
  and the PCI config mirror range. The larger WPR2 heap pulls its base
  size from the generated bindings, drops the custom constants that have
  no Open RM counterpart, and matches all architectures exhaustively.

* FSP firmware handling moved into firmware/fsp.rs: FspFirmware now
  holds parsed signatures (KBox<FmcSignatures>) instead of a raw ELF
  copy, extracted through a get_section closure (per v10 review).

* FSP secure-boot polling uses a per-chipset FSP HAL
  (fsp/hal/{gh100,gb202}.rs) reading the correct NV_THERM_I2CS register,
  instead of a free function in regs.rs.

* FSP Chain of Trust boot was redone around a new FmcBootArgs type, and
  the response headers are strongly typed (MctpHeader/NvdmHeader instead
  of bare u32), with the vendor ID from kernel::pci::Vendor.

* GB10x/GB20x sysmem flush: the HSHUB0/FBHUB0 register details moved
  from module doccomments onto the write_sysmem_flush_page_* methods.

* Commit message cleanups: dropped stale claims, shortened an
  over-length subject, and fixed trailer ordering.
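
The EMEM access pattern described in the first v11 item above (program
the offset once, then stream through an auto-incrementing data
register) can be sketched against a simulated window. EmemWindow, its
16-word size and every method name here are hypothetical and do not
reflect the real falcon register layout. The sketch only shows why the
accessors take &mut self and why one offset write per transfer is
enough.

```rust
/// Simulated EMEM window: an offset register plus a data register.
/// Hypothetical model for illustration only.
pub struct EmemWindow {
    mem: [u32; 16],
    offset: usize,            // word index held by the offset register
    pub offset_writes: usize, // counts how often the offset was programmed
}

impl EmemWindow {
    pub fn new() -> Self {
        EmemWindow { mem: [0; 16], offset: 0, offset_writes: 0 }
    }

    /// Programs the offset register.
    fn set_offset(&mut self, word: usize) {
        self.offset = word;
        self.offset_writes += 1;
    }

    /// Data register write; the offset auto-increments afterwards.
    fn write_data(&mut self, value: u32) {
        self.mem[self.offset] = value;
        self.offset += 1;
    }

    /// Data register read; the offset auto-increments afterwards.
    fn read_data(&mut self) -> u32 {
        let value = self.mem[self.offset];
        self.offset += 1;
        value
    }

    /// Returns `Err` if `start + len` overflows or exceeds the window.
    fn check_range(&self, start: usize, len: usize) -> Result<(), ()> {
        match start.checked_add(len) {
            Some(end) if end <= self.mem.len() => Ok(()),
            _ => Err(()),
        }
    }

    /// Programs the offset once, then streams all words.
    pub fn write_words(&mut self, start: usize, words: &[u32]) -> Result<(), ()> {
        self.check_range(start, words.len())?;
        self.set_offset(start);
        for &w in words {
            self.write_data(w);
        }
        Ok(())
    }

    /// Programs the offset once, then streams all words back.
    pub fn read_words(&mut self, start: usize, out: &mut [u32]) -> Result<(), ()> {
        self.check_range(start, out.len())?;
        self.set_offset(start);
        for slot in out.iter_mut() {
            *slot = self.read_data();
        }
        Ok(())
    }
}
```

Since the offset register is shared state, two interleaved users would
overwrite each other's position. Requiring &mut self for every access
rules that out at compile time without any unsafe at the call site.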

Changes in v10:

* Reordered per review (and direct assistance--thanks again) from
  Alexandre Courbot: the two refactoring patches (factor .fwsignature*
  selection, use GPU Architecture to simplify HALs) now come first,
  before GPU identification. The boot_via_fsp stub is introduced early
  and completed as FSP features arrive. The SEC2 refactoring, PCI config
  mirror, and reserved heap size patches are moved earlier in the
  series.

* Made pmuReservedSize conditional on Blackwell dGPU architectures.
  Open RM only sets this field for Blackwell (Turing/Ampere/Ada/Hopper
  all leave it zero). Added calc_pmu_reserved_size() helper and
  FbLayout.pmu_reserved_size field to route the value through the
  layout instead of using the constant unconditionally. Replaced
  `as u32` cast with usize_into_u32 for PMU_RESERVED_SIZE. (Alexandre)

* Split the GFW boot wait HAL change into two patches: one that moves
  the existing behavior into a GpuHal trait, and a second that adds the
  Hopper/Blackwell skip.

* Removed the Spec::chipset() accessor (no longer needed after
  restructuring). Updated the Copy/Clone commit message accordingly.

* Rebased onto drm-rust-next-staging, which includes
  const_align_up(), "move firmware image parsing code to firmware.rs",
  "factor out an elf_str() function", and "make WPR heap sizing
  fallible" from the v9 series. Series is now 28 patches (was 31).

* Depends on the "rust: sizes: SizeConstants trait" series[N], which
  adds typed SZ_* constants (u64::SZ_1M, u32::SZ_4K, etc.). The
  nova-core conversion patch ("use SizeConstants trait for u64 size
  constants") will be posted separately, but is already included in my
  git branch. The Blackwell patches that introduce new SZ_* usage
  (larger non-WPR heap, FSP Chain of Trust boot, larger WPR2 heap) use
  the trait form from the start.

* Fixed the PCI config mirror commit message: corrected hex offsets to
  match the code (older architectures use 0x088000, Hopper/Blackwell
  use 0x092000).

* Dropped the never-used nvdm_type_raw() method from the MCTP/NVDM
  introducing patch.

* Removed stale Co-developed-by tag from the FSP Chain of Trust boot
  commit per Alex's request. Rewrote the commit message to remove
  references to the no-longer-existent fmc_full field.

* Added missing #[expect(dead_code)] on GspFmcBootParams in the FSP
  secure boot commit, removed when the struct becomes used in the
  Chain of Trust boot commit.

Changes in v9:

* Rebased onto today's drm-rust-next.

* Split Architecture::Blackwell into BlackwellGB10x and BlackwellGB20x,
  after Gary Guo and Sashiko pointed out that GB10x and GB20x are
  distinct enough to warrant separate architecture variants. This
  surfaced several bugs where all Blackwell chips were incorrectly
  treated as a single group:
  * Fixed the FSP boot completion register address for GB10x. GB10x
    uses the same address as Hopper (0x000200bc), not the GB20x
    address (0x00ad00bc).
  * Made the FSP secure boot timeout architecture-dependent. GB20x
    now gets 5000ms while Hopper and GB10x keep 4000ms.
  * Removed chipset-level match arms that were working around the
    single-variant design in fb/hal.rs, firmware/gsp.rs, and regs.rs.

* Simplified find_gsp_sigs_section() to return &'static str instead of
  Option<&'static str>, since the Architecture enum is now exhaustive
  and every variant has a known signature section name.

* Moved dma_set_mask_and_coherent from probe() into Gpu::new(), with
  the unsafe block narrowed to just that call. Gpu::new() now takes
  pci::Device<device::Core> instead of device::Bound to support this.

* Dropped the local `chipset` variable in Gpu::new() and accessed
  spec.chipset() directly, since Spec is now Copy.

* Changed Spec::chipset() to take self instead of &self, since Spec is
  Copy.

* Removed the unnecessary Tu102/Gh100 consts in gpu/hal.rs and used the
  unit structs directly.

* Kept a hold on the Firmware object in FspFirmware instead of copying
  the FMC ELF into a KVec<u8>.

* Moved the dev_info formatting fix and the GFW_BOOT comment removal
  out of the Copy/Clone patch and into the patches that actually touch
  those lines.

* Added Reviewed-by tags from Gary Guo and Alice Ryhl.

Changes in v8:

* Added Clone/Copy derives to Spec and Revision. Removed the
  unnecessary pin_init_scope wrapping in Gpu::new() that the lack of
  Copy had forced. Added a Spec::chipset() accessor.

* Removed implementation-detail sentence from the
  Architecture::dma_mask() doccomment.

* Simplified the GPU HAL to two variants (Tu102, Gh100) instead of
  four. Renamed "Fsp" to "Gh100" to follow the HAL naming convention.
  Removed the spurious GA100 special case. Moved the GFW_BOOT wait into
  the HAL method itself instead of returning a bool.

* Increased the GFW_BOOT wait timeout from 4 seconds to 30 seconds,
  after Joel found that a different Blackwell SKU required extra time.

* Removed stray Cc lines from each patch.

* Fixed rustfmt issues in gsp/fw.rs and gsp/boot.rs reported by the
  kernel test robot against v7 patches 27 and 31.

Changes in v7:

* Rebased onto Alexandre Courbot's rust register!() series in
  drm-rust-next, including the related generic I/O accessor and
  IoCapable changes.

* Rebased onto drm-rust-next (v7.0-rc4 based).

* Dropped the v6 patches that are already in drm-rust-next: the
  aux-device fix, the pdev helper macro patch, and the one-item-per-line
  use cleanup.

* Reworked the GPU init pieces per review. DMA mask setup now stays in
  driver probe, with the mask width selected by GPU architecture, and
  the GFW boot policy now lives in a dedicated GPU HAL.

* Reworked firmware image parsing per review around a single ElfFormat
  trait with associated header types. Also added support for both ELF32
  and ELF64 images, with automatic format detection.

* Reworked the MCTP/NVDM protocol code to use bitfield! and typed
  accessors, removing the open-coded bit handling.

* Reworked the FSP messaging part of the series so that the message
  structures are introduced in the first patches that use them, instead
  of as a standalone dead-code-only patch. Also changed fmc_full to use
  KVec<u8> from the start.

* Split the WPR heap overflow handling out into a separate prep patch.
  That patch makes management_overhead() and wpr_heap_size() fallible,
  uses checked arithmetic, and leaves the larger WPR2 heap patch with
  only the Hopper and Blackwell sizing changes.

* Added a code comment documenting the Hopper and Blackwell PCI config
  mirror base change.
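
The automatic ELF32/ELF64 detection mentioned in the v7 notes, with
the named constants listed in the v12 notes, can be sketched as
follows. The constant values are the standard ELF identification
values. ElfClass and detect_elf_class() are hypothetical names for
this sketch and are not taken from the series.

```rust
// Standard ELF identification constants.
const ELFMAG: &[u8; 4] = b"\x7fELF"; // magic bytes at the start of e_ident
const SELFMAG: usize = 4; // length of the magic
const EI_CLASS: usize = 4; // index of the class byte in e_ident
const ELFCLASS32: u8 = 1;
const ELFCLASS64: u8 = 2;

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ElfClass {
    Elf32,
    Elf64,
}

/// Returns the ELF class of `image`, or `None` if the magic is missing,
/// the image is too short, or the class byte is not recognised.
pub fn detect_elf_class(image: &[u8]) -> Option<ElfClass> {
    // `get()` returns `None` for short images, so no separate length
    // check is needed.
    if image.get(..SELFMAG)? != &ELFMAG[..] {
        return None;
    }
    match *image.get(EI_CLASS)? {
        ELFCLASS32 => Some(ElfClass::Elf32),
        ELFCLASS64 => Some(ElfClass::Elf64),
        _ => None,
    }
}
```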

Changes in v6:

* Rebased onto drm-rust-next (v7.0-rc1 based).

* Dropped the first two patches from v5 (aux device fix and pdev
  macros), which have since been merged independently.

* const_align_up(): reworked per review from Gary Guo, Miguel Ojeda,
  and Danilo Krummrich: now returns Option<usize> instead of panicking,
  takes an Alignment argument instead of a const generic, and no longer
  needs the inline_const feature addition in scripts/Makefile.build.

* The rust/sizes and SZ_*_U64 patches from v5 are no longer included.
  I plan to post those as a separate series that depends on this one.
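
The non-panicking shape of const_align_up() described in the v6 notes
can be sketched in plain Rust. This is an illustration under stated
assumptions, not the kernel helper: the real function takes an
Alignment argument, whereas this sketch takes a usize and rejects
values that are not powers of two.

```rust
/// Rounds `value` up to the next multiple of `align`.
///
/// Returns `None` if `align` is not a power of two (an assumption of
/// this sketch, standing in for the kernel's `Alignment` type) or if
/// the rounded value would overflow `usize`.
pub const fn const_align_up(value: usize, align: usize) -> Option<usize> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    match value.checked_add(mask) {
        Some(v) => Some(v & !mask),
        None => None,
    }
}

// Usable in const context; overflow yields `None` instead of a panic.
pub const ALIGNED: Option<usize> = const_align_up(0x1001, 0x1000);
```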

Changes in v5:

* Rebased onto linux.git master.

* Split MCTP protocol into its own module and file.

* Many Rust idiom improvements, especially broader use of types and
  more use of Result and Option.

* Cleaned up comments, print output, and error handling.

* Added const_align_up() to rust/ and used it in nova-core. This
  required enabling a Rust feature: inline_const, as recommended by
  Miguel Ojeda.

* Refactored several areas; for example, Gpu::new() now owns Spec
  creation.

* Fixed three Delta::ZERO busy-polls (patches 21, 24, 31) to use
  non-zero sleep intervals. A zero interval was a poor choice.

* Reduced GH100/GB100 HAL duplication. Made FSP_PKEY_SIZE/FSP_SIG_SIZE
  consistent across patches. Replaced fragile architecture checks with
  chipset.arch(). Renamed LIBOS_BLACKWELL.

* Narrowed the scope of some of the #![expect(dead_code)] cases,
  although that really only matters within the series, not once it is
  fully applied.

[1] https://github.com/Gnurou/linux/commits/drm-rust-next-staging/
[2] https://lore.kernel.org/20260411024118.471294-1-jhubbard@nvidia.com

Alexandre Courbot (1):
  gpu: nova-core: gsp: enable FSP boot path

Eliot Courtney (1):
  gpu: nova-core: add non-sec2 unload path

John Hubbard (20):
  gpu: nova-core: set DMA mask width based on GPU architecture
  gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror
  gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size
  gpu: nova-core: Hopper/Blackwell: larger non-WPR heap
  gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap
  gpu: nova-core: Blackwell: use correct sysmem flush registers
  gpu: nova-core: don't assume 64-bit firmware images
  gpu: nova-core: add support for 32-bit firmware images
  gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware images
  gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub
  gpu: nova-core: Hopper/Blackwell: add FMC firmware image
  gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion
    waiting
  gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations
  gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  gpu: nova-core: add MCTP/NVDM protocol types for firmware
    communication
  gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging
  gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version
  gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
  gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling

 drivers/gpu/nova-core/driver.rs               |  15 -
 drivers/gpu/nova-core/falcon.rs               |   1 +
 drivers/gpu/nova-core/falcon/fsp.rs           | 202 +++++++++++
 drivers/gpu/nova-core/fb.rs                   |   8 +-
 drivers/gpu/nova-core/fb/hal.rs               |  19 +-
 drivers/gpu/nova-core/fb/hal/ga100.rs         |   9 +
 drivers/gpu/nova-core/fb/hal/ga102.rs         |  11 +-
 drivers/gpu/nova-core/fb/hal/gb100.rs         | 122 +++++++
 drivers/gpu/nova-core/fb/hal/gb202.rs         |  95 ++++++
 drivers/gpu/nova-core/fb/hal/gh100.rs         |  50 +++
 drivers/gpu/nova-core/fb/hal/tu102.rs         |  17 +
 drivers/gpu/nova-core/firmware.rs             | 189 ++++++++--
 drivers/gpu/nova-core/firmware/fsp.rs         | 133 ++++++++
 drivers/gpu/nova-core/firmware/gsp.rs         |   4 +-
 drivers/gpu/nova-core/fsp.rs                  | 323 ++++++++++++++++++
 drivers/gpu/nova-core/fsp/hal.rs              |  32 ++
 drivers/gpu/nova-core/fsp/hal/gb100.rs        |  23 ++
 drivers/gpu/nova-core/fsp/hal/gb202.rs        |  27 ++
 drivers/gpu/nova-core/fsp/hal/gh100.rs        |  32 ++
 drivers/gpu/nova-core/gpu.rs                  |  28 +-
 drivers/gpu/nova-core/gpu/hal.rs              |  13 +-
 drivers/gpu/nova-core/gpu/hal/gh100.rs        |  18 +-
 drivers/gpu/nova-core/gpu/hal/tu102.rs        |  14 +
 drivers/gpu/nova-core/gsp.rs                  |   1 +
 drivers/gpu/nova-core/gsp/boot.rs             |   2 +-
 drivers/gpu/nova-core/gsp/commands.rs         |   8 +-
 drivers/gpu/nova-core/gsp/fw.rs               |  85 ++++-
 drivers/gpu/nova-core/gsp/fw/commands.rs      |  15 +-
 .../gpu/nova-core/gsp/fw/r570_144/bindings.rs |  83 +++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs        | 163 ++++++++-
 drivers/gpu/nova-core/mctp.rs                 | 100 ++++++
 drivers/gpu/nova-core/nova_core.rs            |   2 +
 drivers/gpu/nova-core/regs.rs                 | 119 +++++++
 33 files changed, 1882 insertions(+), 81 deletions(-)
 create mode 100644 drivers/gpu/nova-core/falcon/fsp.rs
 create mode 100644 drivers/gpu/nova-core/fb/hal/gb100.rs
 create mode 100644 drivers/gpu/nova-core/fb/hal/gb202.rs
 create mode 100644 drivers/gpu/nova-core/fb/hal/gh100.rs
 create mode 100644 drivers/gpu/nova-core/firmware/fsp.rs
 create mode 100644 drivers/gpu/nova-core/fsp.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal/gb100.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal/gb202.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal/gh100.rs
 create mode 100644 drivers/gpu/nova-core/mctp.rs

base-commit: 90cb2bb93b85ba452d4d67295c8bf3cdeea7d937
-- 
2.54.0


* [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  6:40   ` Eliot Courtney
  2026-06-02  3:20 ` [PATCH v12 02/22] gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror John Hubbard
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Replace the hardcoded 47-bit DMA mask with a GPU HAL method that
provides the correct value for the architecture.

Set the DMA mask in Gpu::new(). Gpu owns all DMA allocations for
the device, so no concurrent allocations can exist while the
constructor is still running.

Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs        | 15 ---------------
 drivers/gpu/nova-core/gpu.rs           | 12 ++++++++++--
 drivers/gpu/nova-core/gpu/hal.rs       |  8 +++++++-
 drivers/gpu/nova-core/gpu/hal/gh100.rs |  9 ++++++++-
 drivers/gpu/nova-core/gpu/hal/tu102.rs |  5 +++++
 5 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index cff5034c2dcd..ade73da68be5 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -3,8 +3,6 @@
 use kernel::{
     auxiliary,
     device::Core,
-    dma::Device,
-    dma::DmaMask,
     pci,
     pci::{
         Class,
@@ -38,14 +36,6 @@ pub(crate) struct NovaCore<'bound> {
 
 const BAR0_SIZE: usize = SZ_16M;
 
-// For now we only support Ampere which can use up to 47-bit DMA addresses.
-//
-// TODO: Add an abstraction for this to support newer GPUs which may support
-// larger DMA addresses. Limiting these GPUs to smaller address widths won't
-// have any adverse affects, unless installed on systems which require larger
-// DMA addresses. These systems should be quite rare.
-const GPU_DMA_BITS: u32 = 47;
-
 pub(crate) type Bar0 = kernel::io::Mmio<BAR0_SIZE>;
 
 kernel::pci_device_table!(
@@ -88,11 +78,6 @@ fn probe<'bound>(
             pdev.enable_device_mem()?;
             pdev.set_master();
 
-            // SAFETY: No concurrent DMA allocations or mappings can be made because
-            // the device is still being probed and therefore isn't being used by
-            // other threads of execution.
-            unsafe { pdev.dma_set_mask_and_coherent(DmaMask::new::<GPU_DMA_BITS>())? };
-
             Ok(try_pin_init!(NovaCore {
                 bar: pdev.iomap_region_sized::<BAR0_SIZE>(0, c"nova-core/bar0")?,
                 // TODO: Use `&bar` self-referential pin-init syntax once available.
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index aed992488db3..38c75df77e16 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -2,6 +2,7 @@
 
 use kernel::{
     device,
+    dma::Device,
     fmt,
     io::Io,
     num::Bounded,
@@ -269,7 +270,7 @@ pub(crate) struct Gpu<'gpu> {
 
 impl<'gpu> Gpu<'gpu> {
     pub(crate) fn new(
-        pdev: &'gpu pci::Device<device::Bound>,
+        pdev: &'gpu pci::Device<device::Core<'_>>,
         bar: &'gpu Bar0,
     ) -> impl PinInit<Self, Error> + 'gpu {
         try_pin_init!(Self {
@@ -280,7 +281,14 @@ pub(crate) fn new(
 
             // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
             _: {
-                hal::gpu_hal(spec.chipset).wait_gfw_boot_completion(bar)
+                let hal = hal::gpu_hal(spec.chipset);
+                let dma_mask = hal.dma_mask();
+
+                // SAFETY: `Gpu` owns all DMA allocations for this device, and we are
+                // still constructing it, so no concurrent DMA allocations can exist.
+                unsafe { pdev.dma_set_mask_and_coherent(dma_mask)? };
+
+                hal.wait_gfw_boot_completion(bar)
                     .inspect_err(|_| dev_err!(pdev, "GFW boot did not complete\n"))?;
             },
 
diff --git a/drivers/gpu/nova-core/gpu/hal.rs b/drivers/gpu/nova-core/gpu/hal.rs
index 788de20ab5d3..0b636b713593 100644
--- a/drivers/gpu/nova-core/gpu/hal.rs
+++ b/drivers/gpu/nova-core/gpu/hal.rs
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use kernel::prelude::*;
+use kernel::{
+    dma::DmaMask,
+    prelude::*, //
+};
 
 use crate::{
     driver::Bar0,
@@ -16,6 +19,9 @@
 pub(crate) trait GpuHal {
     /// Waits for GFW_BOOT completion if required by this hardware family.
     fn wait_gfw_boot_completion(&self, bar: &Bar0) -> Result;
+
+    /// Returns the DMA mask for the current architecture.
+    fn dma_mask(&self) -> DmaMask;
 }
 
 pub(super) fn gpu_hal(chipset: Chipset) -> &'static dyn GpuHal {
diff --git a/drivers/gpu/nova-core/gpu/hal/gh100.rs b/drivers/gpu/nova-core/gpu/hal/gh100.rs
index 1ed5bccdda1d..41fbabb04ff8 100644
--- a/drivers/gpu/nova-core/gpu/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gpu/hal/gh100.rs
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use kernel::prelude::*;
+use kernel::{
+    dma::DmaMask,
+    prelude::*, //
+};
 
 use crate::driver::Bar0;
 
@@ -12,6 +15,10 @@ impl GpuHal for Gh100 {
     fn wait_gfw_boot_completion(&self, _bar: &Bar0) -> Result {
         Ok(())
     }
+
+    fn dma_mask(&self) -> DmaMask {
+        DmaMask::new::<52>()
+    }
 }
 
 const GH100: Gh100 = Gh100;
diff --git a/drivers/gpu/nova-core/gpu/hal/tu102.rs b/drivers/gpu/nova-core/gpu/hal/tu102.rs
index 08dd4434bd72..2881ab03dbcd 100644
--- a/drivers/gpu/nova-core/gpu/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gpu/hal/tu102.rs
@@ -19,6 +19,7 @@
 //! Note that the devinit sequence also needs to run during suspend/resume.
 
 use kernel::{
+    dma::DmaMask,
     io::{
         poll::read_poll_timeout,
         Io, //
@@ -80,6 +81,10 @@ fn wait_gfw_boot_completion(&self, bar: &Bar0) -> Result {
         )
         .map(|_| ())
     }
+
+    fn dma_mask(&self) -> DmaMask {
+        DmaMask::new::<47>()
+    }
 }
 
 const TU102: Tu102 = Tu102;
-- 
2.54.0
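
To make the two widths in this patch concrete, the sketch below
computes the address mask for a given bit width. It assumes that the
mask for an N-bit width is the value with the low N bits set, as with
the C DMA_BIT_MASK() macro. dma_bit_mask() is a hypothetical helper
and is not part of the kernel's DmaMask API.

```rust
/// Returns a mask with the low `bits` bits set (hypothetical helper).
/// Widths of 64 or more yield an all-ones mask.
pub const fn dma_bit_mask(bits: u32) -> u64 {
    if bits >= 64 {
        u64::MAX
    } else {
        (1u64 << bits) - 1
    }
}

// Widths taken from the patch: 47 bits (Tu102 HAL), 52 bits (Gh100 HAL).
pub const TU102_MASK: u64 = dma_bit_mask(47);
pub const GH100_MASK: u64 = dma_bit_mask(52);
```

The 52-bit mask covers 32 times the address space of the 47-bit one.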



* [PATCH v12 02/22] gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
  2026-06-02  3:20 ` [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 03/22] gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size John Hubbard
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Hopper and Blackwell GPUs moved the PCI config space mirror from
0x088000 to 0x092000. Select the correct address per architecture
when building the GSP system info command.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs             |  7 +++++++
 drivers/gpu/nova-core/gpu/hal.rs         |  5 +++++
 drivers/gpu/nova-core/gpu/hal/gh100.rs   |  9 +++++++++
 drivers/gpu/nova-core/gpu/hal/tu102.rs   |  9 +++++++++
 drivers/gpu/nova-core/gsp/boot.rs        |  2 +-
 drivers/gpu/nova-core/gsp/commands.rs    |  8 +++++---
 drivers/gpu/nova-core/gsp/fw/commands.rs | 15 +++++++++++----
 7 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 38c75df77e16..7dd736e5b190 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::ops::Range;
+
 use kernel::{
     device,
     dma::Device,
@@ -134,6 +136,11 @@ pub(crate) const fn arch(self) -> Architecture {
     pub(crate) const fn needs_fwsec_bootloader(self) -> bool {
         matches!(self.arch(), Architecture::Turing) || matches!(self, Self::GA100)
     }
+
+    /// Returns the address range of the PCI config mirror space.
+    pub(crate) fn pci_config_mirror_range(self) -> Range<u32> {
+        hal::gpu_hal(self).pci_config_mirror_range()
+    }
 }
 
 // TODO
diff --git a/drivers/gpu/nova-core/gpu/hal.rs b/drivers/gpu/nova-core/gpu/hal.rs
index 0b636b713593..cd833bd49b9b 100644
--- a/drivers/gpu/nova-core/gpu/hal.rs
+++ b/drivers/gpu/nova-core/gpu/hal.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::ops::Range;
+
 use kernel::{
     dma::DmaMask,
     prelude::*, //
@@ -22,6 +24,9 @@ pub(crate) trait GpuHal {
 
     /// Returns the DMA mask for the current architecture.
     fn dma_mask(&self) -> DmaMask;
+
+    /// Returns the address range of the PCI config mirror space.
+    fn pci_config_mirror_range(&self) -> Range<u32>;
 }
 
 pub(super) fn gpu_hal(chipset: Chipset) -> &'static dyn GpuHal {
diff --git a/drivers/gpu/nova-core/gpu/hal/gh100.rs b/drivers/gpu/nova-core/gpu/hal/gh100.rs
index 41fbabb04ff8..17778a618900 100644
--- a/drivers/gpu/nova-core/gpu/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gpu/hal/gh100.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::ops::Range;
+
 use kernel::{
     dma::DmaMask,
     prelude::*, //
@@ -19,6 +21,13 @@ fn wait_gfw_boot_completion(&self, _bar: &Bar0) -> Result {
     fn dma_mask(&self) -> DmaMask {
         DmaMask::new::<52>()
     }
+
+    fn pci_config_mirror_range(&self) -> Range<u32> {
+        const PCI_CONFIG_MIRROR_START: u32 = 0x092000;
+        const PCI_CONFIG_MIRROR_SIZE: u32 = 0x001000;
+
+        PCI_CONFIG_MIRROR_START..PCI_CONFIG_MIRROR_START + PCI_CONFIG_MIRROR_SIZE
+    }
 }
 
 const GH100: Gh100 = Gh100;
diff --git a/drivers/gpu/nova-core/gpu/hal/tu102.rs b/drivers/gpu/nova-core/gpu/hal/tu102.rs
index 2881ab03dbcd..125478bfe07a 100644
--- a/drivers/gpu/nova-core/gpu/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gpu/hal/tu102.rs
@@ -18,6 +18,8 @@
 //!
 //! Note that the devinit sequence also needs to run during suspend/resume.
 
+use core::ops::Range;
+
 use kernel::{
     dma::DmaMask,
     io::{
@@ -85,6 +87,13 @@ fn wait_gfw_boot_completion(&self, bar: &Bar0) -> Result {
     fn dma_mask(&self) -> DmaMask {
         DmaMask::new::<47>()
     }
+
+    fn pci_config_mirror_range(&self) -> Range<u32> {
+        const PCI_CONFIG_MIRROR_START: u32 = 0x088000;
+        const PCI_CONFIG_MIRROR_SIZE: u32 = 0x001000;
+
+        PCI_CONFIG_MIRROR_START..PCI_CONFIG_MIRROR_START + PCI_CONFIG_MIRROR_SIZE
+    }
 }
 
 const TU102: Tu102 = Tu102;
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 087ee59da6d9..8c316fa2e585 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -144,7 +144,7 @@ pub(crate) fn boot(
         dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
 
         self.cmdq
-            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev))?;
+            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
         self.cmdq
             .send_command_no_wait(bar, commands::SetRegistry::new())?;
 
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 3a365455d10c..f84de9f4f045 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -19,6 +19,7 @@
 };
 
 use crate::{
+    gpu::Chipset,
     gsp::{
         cmdq::{
             Cmdq,
@@ -37,12 +38,13 @@
 /// The `GspSetSystemInfo` command.
 pub(crate) struct SetSystemInfo<'a> {
     pdev: &'a pci::Device<device::Bound>,
+    chipset: Chipset,
 }
 
 impl<'a> SetSystemInfo<'a> {
     /// Creates a new `GspSetSystemInfo` command using the parameters of `pdev`.
-    pub(crate) fn new(pdev: &'a pci::Device<device::Bound>) -> Self {
-        Self { pdev }
+    pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, chipset: Chipset) -> Self {
+        Self { pdev, chipset }
     }
 }
 
@@ -53,7 +55,7 @@ impl<'a> CommandToGsp for SetSystemInfo<'a> {
     type InitError = Error;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
-        Self::Command::init(self.pdev)
+        Self::Command::init(self.pdev, self.chipset)
     }
 }
 
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 42985d446bae..7bcc41fc7fa0 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -11,7 +11,10 @@
     }, //
 };
 
-use crate::gsp::GSP_PAGE_SIZE;
+use crate::{
+    gpu::Chipset,
+    gsp::GSP_PAGE_SIZE, //
+};
 
 use super::bindings;
 
@@ -25,8 +28,12 @@ pub(crate) struct GspSetSystemInfo {
 impl GspSetSystemInfo {
     /// Returns an in-place initializer for the `GspSetSystemInfo` command.
     #[allow(non_snake_case)]
-    pub(crate) fn init<'a>(dev: &'a pci::Device<device::Bound>) -> impl Init<Self, Error> + 'a {
+    pub(crate) fn init<'a>(
+        dev: &'a pci::Device<device::Bound>,
+        chipset: Chipset,
+    ) -> impl Init<Self, Error> + 'a {
         type InnerGspSystemInfo = bindings::GspSystemInfo;
+        let pci_config_mirror_range = chipset.pci_config_mirror_range();
         let init_inner = try_init!(InnerGspSystemInfo {
             gpuPhysAddr: dev.resource_start(0)?,
             gpuPhysFbAddr: dev.resource_start(1)?,
@@ -36,8 +43,8 @@ pub(crate) fn init<'a>(dev: &'a pci::Device<device::Bound>) -> impl Init<Self, E
             // Using TASK_SIZE in r535_gsp_rpc_set_system_info() seems wrong because
             // TASK_SIZE is per-task. That's probably a design issue in GSP-RM though.
             maxUserVa: (1 << 47) - 4096,
-            pciConfigMirrorBase: 0x088000,
-            pciConfigMirrorSize: 0x001000,
+            pciConfigMirrorBase: pci_config_mirror_range.start,
+            pciConfigMirrorSize: pci_config_mirror_range.end - pci_config_mirror_range.start,
 
             PCIDeviceID: (u32::from(dev.device_id()) << 16) | u32::from(dev.vendor_id().as_raw()),
             PCISubDeviceID: (u32::from(dev.subsystem_device_id()) << 16)
-- 
2.54.0


* [PATCH v12 03/22] gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
  2026-06-02  3:20 ` [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture John Hubbard
  2026-06-02  3:20 ` [PATCH v12 02/22] gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 04/22] gpu: nova-core: Hopper/Blackwell: larger non-WPR heap John Hubbard
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

GSP boot needs to know how much framebuffer memory is reserved for
the PMU. Compute it per architecture: Blackwell dGPUs reserve
8 MiB + 16 MiB + 4 KiB, aligned up to 128 KiB, while earlier
architectures reserve nothing. This matches Open RM behavior.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
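Not part of the patch: a standalone sketch of the arithmetic behind the
Blackwell value, for readers who want to check the number. It runs in
plain userspace Rust. The align_up() helper and the local SZ_* constants
are stand-ins I am assuming for the kernel's const_align_up() and
kernel::sizes; only the 8 MiB + 16 MiB + 4 KiB sum and the 128 KiB
alignment come from the patch itself.

```rust
// Local stand-ins for the kernel size constants (assumed names, not kernel items).
const SZ_4K: usize = 4 << 10;
const SZ_128K: usize = 128 << 10;
const SZ_8M: usize = 8 << 20;
const SZ_16M: usize = 16 << 20;

/// Rounds `value` up to the next multiple of `align` (a power of two).
/// Hypothetical helper standing in for the kernel's `const_align_up`;
/// returns `None` if the rounding would overflow.
const fn align_up(value: usize, align: usize) -> Option<usize> {
    assert!(align.is_power_of_two());
    match value.checked_add(align - 1) {
        Some(v) => Some(v & !(align - 1)),
        None => None,
    }
}

/// PMU-reserved size using the same expression as the Blackwell HAL in the patch.
const fn pmu_reserved_size_blackwell() -> u32 {
    match align_up(SZ_8M + SZ_16M + SZ_4K, SZ_128K) {
        Some(v) => {
            // The result must fit the 32-bit field it is stored in.
            assert!(v <= u32::MAX as usize);
            v as u32
        }
        None => panic!("PMU reserved size overflowed"),
    }
}
```

Since 24 MiB is already a multiple of 128 KiB, the extra 4 KiB pushes
the result up by one full alignment step, to 24 MiB + 128 KiB.
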
 drivers/gpu/nova-core/fb.rs           |  3 ++
 drivers/gpu/nova-core/fb/hal.rs       | 12 +++---
 drivers/gpu/nova-core/fb/hal/ga100.rs |  5 +++
 drivers/gpu/nova-core/fb/hal/ga102.rs |  7 +++-
 drivers/gpu/nova-core/fb/hal/gb100.rs | 57 +++++++++++++++++++++++++++
 drivers/gpu/nova-core/fb/hal/tu102.rs |  9 +++++
 drivers/gpu/nova-core/gsp/fw.rs       |  1 +
 7 files changed, 88 insertions(+), 6 deletions(-)
 create mode 100644 drivers/gpu/nova-core/fb/hal/gb100.rs

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index 1fb65d4eb290..d7a4dc944131 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -165,6 +165,8 @@ pub(crate) struct FbLayout {
     pub(crate) wpr2: FbRange,
     pub(crate) heap: FbRange,
     pub(crate) vf_partition_count: u8,
+    /// PMU reserved memory size, in bytes.
+    pub(crate) pmu_reserved_size: u32,
 }
 
 impl FbLayout {
@@ -265,6 +267,7 @@ pub(crate) fn new(chipset: Chipset, bar: &Bar0, gsp_fw: &GspFirmware) -> Result<
             wpr2,
             heap,
             vf_partition_count: 0,
+            pmu_reserved_size: hal.pmu_reserved_size(),
         })
     }
 }
diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
index 8b192a503363..b45784ad5f2e 100644
--- a/drivers/gpu/nova-core/fb/hal.rs
+++ b/drivers/gpu/nova-core/fb/hal.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::prelude::*;
 
@@ -12,6 +13,7 @@
 
 mod ga100;
 mod ga102;
+mod gb100;
 mod tu102;
 
 pub(crate) trait FbHal {
@@ -29,6 +31,9 @@ pub(crate) trait FbHal {
     /// Returns the VRAM size, in bytes.
     fn vidmem_size(&self, bar: &Bar0) -> u64;
 
+    /// Returns the amount of VRAM to reserve for the PMU.
+    fn pmu_reserved_size(&self) -> u32;
+
     /// Returns the FRTS size, in bytes.
     fn frts_size(&self) -> u64;
 }
@@ -38,10 +43,7 @@ pub(super) fn fb_hal(chipset: Chipset) -> &'static dyn FbHal {
     match chipset.arch() {
         Architecture::Turing => tu102::TU102_HAL,
         Architecture::Ampere if chipset == Chipset::GA100 => ga100::GA100_HAL,
-        Architecture::Ampere => ga102::GA102_HAL,
-        Architecture::Ada
-        | Architecture::Hopper
-        | Architecture::BlackwellGB10x
-        | Architecture::BlackwellGB20x => ga102::GA102_HAL,
+        Architecture::Ampere | Architecture::Ada | Architecture::Hopper => ga102::GA102_HAL,
+        Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => gb100::GB100_HAL,
     }
 }
diff --git a/drivers/gpu/nova-core/fb/hal/ga100.rs b/drivers/gpu/nova-core/fb/hal/ga100.rs
index 2f5871d915c3..0f5132aa9c31 100644
--- a/drivers/gpu/nova-core/fb/hal/ga100.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga100.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
     io::Io,
@@ -67,6 +68,10 @@ fn vidmem_size(&self, bar: &Bar0) -> u64 {
         super::tu102::vidmem_size_gp102(bar)
     }
 
+    fn pmu_reserved_size(&self) -> u32 {
+        super::tu102::pmu_reserved_size_tu102()
+    }
+
     // GA100 is a special case where its FRTS region exists, but is empty.  We
     // return a size of 0 because we still need to record where the region starts.
     fn frts_size(&self) -> u64 {
diff --git a/drivers/gpu/nova-core/fb/hal/ga102.rs b/drivers/gpu/nova-core/fb/hal/ga102.rs
index 3bb66f64bef7..17a2fef1ad44 100644
--- a/drivers/gpu/nova-core/fb/hal/ga102.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga102.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
     io::Io,
@@ -11,7 +12,7 @@
     regs, //
 };
 
-fn vidmem_size_ga102(bar: &Bar0) -> u64 {
+pub(super) fn vidmem_size_ga102(bar: &Bar0) -> u64 {
     bar.read(regs::NV_USABLE_FB_SIZE_IN_MB).usable_fb_size()
 }
 
@@ -36,6 +37,10 @@ fn vidmem_size(&self, bar: &Bar0) -> u64 {
         vidmem_size_ga102(bar)
     }
 
+    fn pmu_reserved_size(&self) -> u32 {
+        super::tu102::pmu_reserved_size_tu102()
+    }
+
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
new file mode 100644
index 000000000000..c78027c26a9e
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+//! Blackwell framebuffer HAL.
+
+use kernel::{
+    prelude::*,
+    ptr::{
+        const_align_up,
+        Alignment, //
+    },
+    sizes::*, //
+};
+
+use crate::{
+    driver::Bar0,
+    fb::hal::FbHal,
+    num::usize_into_u32, //
+};
+
+struct Gb100;
+
+const fn pmu_reserved_size_gb100() -> u32 {
+    usize_into_u32::<{ const_align_up(SZ_8M + SZ_16M + SZ_4K, Alignment::new::<SZ_128K>()).unwrap() }>(
+    )
+}
+
+impl FbHal for Gb100 {
+    fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
+        super::ga100::read_sysmem_flush_page_ga100(bar)
+    }
+
+    fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
+        super::ga100::write_sysmem_flush_page_ga100(bar, addr);
+
+        Ok(())
+    }
+
+    fn supports_display(&self, bar: &Bar0) -> bool {
+        super::ga100::display_enabled_ga100(bar)
+    }
+
+    fn vidmem_size(&self, bar: &Bar0) -> u64 {
+        super::ga102::vidmem_size_ga102(bar)
+    }
+
+    fn pmu_reserved_size(&self) -> u32 {
+        pmu_reserved_size_gb100()
+    }
+
+    fn frts_size(&self) -> u64 {
+        super::tu102::frts_size_tu102()
+    }
+}
+
+const GB100: Gb100 = Gb100;
+pub(super) const GB100_HAL: &dyn FbHal = &GB100;
diff --git a/drivers/gpu/nova-core/fb/hal/tu102.rs b/drivers/gpu/nova-core/fb/hal/tu102.rs
index 22c174bf1472..1755bbc27866 100644
--- a/drivers/gpu/nova-core/fb/hal/tu102.rs
+++ b/drivers/gpu/nova-core/fb/hal/tu102.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
     io::Io,
@@ -39,6 +40,10 @@ pub(super) fn vidmem_size_gp102(bar: &Bar0) -> u64 {
         .usable_fb_size()
 }
 
+pub(super) const fn pmu_reserved_size_tu102() -> u32 {
+    0
+}
+
 pub(super) const fn frts_size_tu102() -> u64 {
     u64::SZ_1M
 }
@@ -62,6 +67,10 @@ fn vidmem_size(&self, bar: &Bar0) -> u64 {
         vidmem_size_gp102(bar)
     }
 
+    fn pmu_reserved_size(&self) -> u32 {
+        pmu_reserved_size_tu102()
+    }
+
     fn frts_size(&self) -> u64 {
         frts_size_tu102()
     }
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 33c9f5860771..919d3ab00075 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -247,6 +247,7 @@ pub(crate) fn new<'a>(
             fbSize: fb_layout.fb.end - fb_layout.fb.start,
             vgaWorkspaceOffset: fb_layout.vga_workspace.start,
             vgaWorkspaceSize: fb_layout.vga_workspace.end - fb_layout.vga_workspace.start,
+            pmuReservedSize: fb_layout.pmu_reserved_size,
             ..Zeroable::init_zeroed()
         });
 
-- 
2.54.0


* [PATCH v12 04/22] gpu: nova-core: Hopper/Blackwell: larger non-WPR heap
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (2 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 03/22] gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 05/22] gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap John Hubbard
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Hopper and Blackwell need a larger non-WPR heap than the 1 MiB that
earlier architectures use. Hopper and Blackwell GB10x need 2 MiB, while
Blackwell GB20x needs 2 MiB + 128 KiB. These sizes diverge by family,
so give Hopper and each Blackwell family its own framebuffer HAL and
select the non-WPR heap size per chipset family.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
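Not part of the patch: a userspace sketch of how the per-family size
feeds into the heap placement just below WPR2. The Family enum,
heap_range() and the local SZ_* constants are hypothetical names chosen
for illustration; the driver itself goes through FbHal and FbRange. The
checked subtraction is my own addition and is not something the patch
does.

```rust
use std::ops::Range;

/// Chip families relevant to the non-WPR heap size (illustrative enum only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Family {
    Turing,
    Ampere,
    Ada,
    Hopper,
    BlackwellGb10x,
    BlackwellGb20x,
}

const SZ_128K: u32 = 128 << 10;
const SZ_1M: u32 = 1 << 20;
const SZ_2M: u32 = 2 << 20;

/// Non-WPR heap size in bytes, using the values given in the commit message.
const fn non_wpr_heap_size(family: Family) -> u32 {
    match family {
        Family::Turing | Family::Ampere | Family::Ada => SZ_1M,
        Family::Hopper | Family::BlackwellGb10x => SZ_2M,
        Family::BlackwellGb20x => SZ_2M + SZ_128K,
    }
}

/// Places the heap so that it ends exactly where WPR2 starts.
/// Returns `None` if WPR2 starts too low for the heap to fit.
fn heap_range(family: Family, wpr2_start: u64) -> Option<Range<u64>> {
    let size = u64::from(non_wpr_heap_size(family));
    let start = wpr2_start.checked_sub(size)?;
    Some(start..wpr2_start)
}
```

For example, with WPR2 starting at 0x1000_0000, a Hopper heap would
span 0x0fe0_0000..0x1000_0000 under these assumptions.
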
 drivers/gpu/nova-core/fb.rs           |  5 ++-
 drivers/gpu/nova-core/fb/hal.rs       | 11 ++++--
 drivers/gpu/nova-core/fb/hal/ga100.rs |  4 +++
 drivers/gpu/nova-core/fb/hal/ga102.rs |  4 +++
 drivers/gpu/nova-core/fb/hal/gb100.rs |  9 +++--
 drivers/gpu/nova-core/fb/hal/gb202.rs | 52 +++++++++++++++++++++++++++
 drivers/gpu/nova-core/fb/hal/gh100.rs | 50 ++++++++++++++++++++++++++
 drivers/gpu/nova-core/fb/hal/tu102.rs |  8 +++++
 8 files changed, 136 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/nova-core/fb/hal/gb202.rs
 create mode 100644 drivers/gpu/nova-core/fb/hal/gh100.rs

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index d7a4dc944131..0aaee718c2c3 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -252,9 +252,8 @@ pub(crate) fn new(chipset: Chipset, bar: &Bar0, gsp_fw: &GspFirmware) -> Result<
         };
 
         let heap = {
-            const HEAP_SIZE: u64 = u64::SZ_1M;
-
-            FbRange(wpr2.start - HEAP_SIZE..wpr2.start)
+            let heap_size = u64::from(hal.non_wpr_heap_size());
+            FbRange(wpr2.start - heap_size..wpr2.start)
         };
 
         Ok(Self {
diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
index b45784ad5f2e..be9e75f990f0 100644
--- a/drivers/gpu/nova-core/fb/hal.rs
+++ b/drivers/gpu/nova-core/fb/hal.rs
@@ -14,6 +14,8 @@
 mod ga100;
 mod ga102;
 mod gb100;
+mod gb202;
+mod gh100;
 mod tu102;
 
 pub(crate) trait FbHal {
@@ -34,6 +36,9 @@ pub(crate) trait FbHal {
     /// Returns the amount of VRAM to reserve for the PMU.
     fn pmu_reserved_size(&self) -> u32;
 
+    /// Returns the non-WPR heap size for this chipset, in bytes.
+    fn non_wpr_heap_size(&self) -> u32;
+
     /// Returns the FRTS size, in bytes.
     fn frts_size(&self) -> u64;
 }
@@ -43,7 +48,9 @@ pub(super) fn fb_hal(chipset: Chipset) -> &'static dyn FbHal {
     match chipset.arch() {
         Architecture::Turing => tu102::TU102_HAL,
         Architecture::Ampere if chipset == Chipset::GA100 => ga100::GA100_HAL,
-        Architecture::Ampere | Architecture::Ada | Architecture::Hopper => ga102::GA102_HAL,
-        Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => gb100::GB100_HAL,
+        Architecture::Ampere | Architecture::Ada => ga102::GA102_HAL,
+        Architecture::Hopper => gh100::GH100_HAL,
+        Architecture::BlackwellGB10x => gb100::GB100_HAL,
+        Architecture::BlackwellGB20x => gb202::GB202_HAL,
     }
 }
diff --git a/drivers/gpu/nova-core/fb/hal/ga100.rs b/drivers/gpu/nova-core/fb/hal/ga100.rs
index 0f5132aa9c31..af95f1bdd273 100644
--- a/drivers/gpu/nova-core/fb/hal/ga100.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga100.rs
@@ -72,6 +72,10 @@ fn pmu_reserved_size(&self) -> u32 {
         super::tu102::pmu_reserved_size_tu102()
     }
 
+    fn non_wpr_heap_size(&self) -> u32 {
+        super::tu102::non_wpr_heap_size_tu102()
+    }
+
     // GA100 is a special case where its FRTS region exists, but is empty.  We
     // return a size of 0 because we still need to record where the region starts.
     fn frts_size(&self) -> u64 {
diff --git a/drivers/gpu/nova-core/fb/hal/ga102.rs b/drivers/gpu/nova-core/fb/hal/ga102.rs
index 17a2fef1ad44..e06dbb08349e 100644
--- a/drivers/gpu/nova-core/fb/hal/ga102.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga102.rs
@@ -41,6 +41,10 @@ fn pmu_reserved_size(&self) -> u32 {
         super::tu102::pmu_reserved_size_tu102()
     }
 
+    fn non_wpr_heap_size(&self) -> u32 {
+        super::tu102::non_wpr_heap_size_tu102()
+    }
+
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
index c78027c26a9e..8d63350abf8a 100644
--- a/drivers/gpu/nova-core/fb/hal/gb100.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
-//! Blackwell framebuffer HAL.
+//! Blackwell GB10x framebuffer HAL.
 
 use kernel::{
     prelude::*,
@@ -20,7 +20,7 @@
 
 struct Gb100;
 
-const fn pmu_reserved_size_gb100() -> u32 {
+pub(super) const fn pmu_reserved_size_gb100() -> u32 {
     usize_into_u32::<{ const_align_up(SZ_8M + SZ_16M + SZ_4K, Alignment::new::<SZ_128K>()).unwrap() }>(
     )
 }
@@ -48,6 +48,11 @@ fn pmu_reserved_size(&self) -> u32 {
         pmu_reserved_size_gb100()
     }
 
+    fn non_wpr_heap_size(&self) -> u32 {
+        // Non-WPR heap for GB10x (see Open RM: kgspGetNonWprHeapSize, GB100/GB102).
+        u32::SZ_2M
+    }
+
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
diff --git a/drivers/gpu/nova-core/fb/hal/gb202.rs b/drivers/gpu/nova-core/fb/hal/gb202.rs
new file mode 100644
index 000000000000..542c1d7429e9
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal/gb202.rs
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+//! Blackwell GB20x framebuffer HAL.
+
+use kernel::{
+    prelude::*,
+    sizes::SizeConstants, //
+};
+
+use crate::{
+    driver::Bar0,
+    fb::hal::FbHal, //
+};
+
+struct Gb202;
+
+impl FbHal for Gb202 {
+    fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
+        super::ga100::read_sysmem_flush_page_ga100(bar)
+    }
+
+    fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
+        super::ga100::write_sysmem_flush_page_ga100(bar, addr);
+
+        Ok(())
+    }
+
+    fn supports_display(&self, bar: &Bar0) -> bool {
+        super::ga100::display_enabled_ga100(bar)
+    }
+
+    fn vidmem_size(&self, bar: &Bar0) -> u64 {
+        super::ga102::vidmem_size_ga102(bar)
+    }
+
+    fn pmu_reserved_size(&self) -> u32 {
+        super::gb100::pmu_reserved_size_gb100()
+    }
+
+    fn non_wpr_heap_size(&self) -> u32 {
+        // Non-WPR heap for GB20x (see Open RM: kgspGetNonWprHeapSize, GB202+).
+        u32::SZ_2M + u32::SZ_128K
+    }
+
+    fn frts_size(&self) -> u64 {
+        super::tu102::frts_size_tu102()
+    }
+}
+
+const GB202: Gb202 = Gb202;
+pub(super) const GB202_HAL: &dyn FbHal = &GB202;
diff --git a/drivers/gpu/nova-core/fb/hal/gh100.rs b/drivers/gpu/nova-core/fb/hal/gh100.rs
new file mode 100644
index 000000000000..8f79c72b1823
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal/gh100.rs
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::{
+    prelude::*,
+    sizes::SizeConstants, //
+};
+
+use crate::{
+    driver::Bar0,
+    fb::hal::FbHal, //
+};
+
+struct Gh100;
+
+impl FbHal for Gh100 {
+    fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
+        super::ga100::read_sysmem_flush_page_ga100(bar)
+    }
+
+    fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
+        super::ga100::write_sysmem_flush_page_ga100(bar, addr);
+
+        Ok(())
+    }
+
+    fn supports_display(&self, bar: &Bar0) -> bool {
+        super::ga100::display_enabled_ga100(bar)
+    }
+
+    fn vidmem_size(&self, bar: &Bar0) -> u64 {
+        super::ga102::vidmem_size_ga102(bar)
+    }
+
+    fn pmu_reserved_size(&self) -> u32 {
+        super::tu102::pmu_reserved_size_tu102()
+    }
+
+    fn non_wpr_heap_size(&self) -> u32 {
+        // Non-WPR heap for Hopper (see Open RM: kgspCalculateFbLayout_GH100).
+        u32::SZ_2M
+    }
+
+    fn frts_size(&self) -> u64 {
+        super::tu102::frts_size_tu102()
+    }
+}
+
+const GH100: Gh100 = Gh100;
+pub(super) const GH100_HAL: &dyn FbHal = &GH100;
diff --git a/drivers/gpu/nova-core/fb/hal/tu102.rs b/drivers/gpu/nova-core/fb/hal/tu102.rs
index 1755bbc27866..62d9357987f7 100644
--- a/drivers/gpu/nova-core/fb/hal/tu102.rs
+++ b/drivers/gpu/nova-core/fb/hal/tu102.rs
@@ -44,6 +44,10 @@ pub(super) const fn pmu_reserved_size_tu102() -> u32 {
     0
 }
 
+pub(super) const fn non_wpr_heap_size_tu102() -> u32 {
+    u32::SZ_1M
+}
+
 pub(super) const fn frts_size_tu102() -> u64 {
     u64::SZ_1M
 }
@@ -71,6 +75,10 @@ fn pmu_reserved_size(&self) -> u32 {
         pmu_reserved_size_tu102()
     }
 
+    fn non_wpr_heap_size(&self) -> u32 {
+        non_wpr_heap_size_tu102()
+    }
+
     fn frts_size(&self) -> u64 {
         frts_size_tu102()
     }
-- 
2.54.0


* [PATCH v12 05/22] gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (3 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 04/22] gpu: nova-core: Hopper/Blackwell: larger non-WPR heap John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers John Hubbard
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

The GSP-RM boot working memory portion of the WPR2 heap must be
larger on Hopper and later GPUs (14 MiB) than on Turing, Ampere, and
Ada (8 MiB). Select the larger value for Hopper and Blackwell.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
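Not part of the patch: a small sketch that restates the two binding
constants in MiB, since the generated bindings only give raw byte
counts. The Arch enum and the shortened constant names are hypothetical;
the two byte values are copied from the bindings hunk in this patch.

```rust
/// Architectures as grouped by the patch (illustrative enum only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Arch {
    Turing,
    Ampere,
    Ada,
    Hopper,
    BlackwellGb10x,
    BlackwellGb20x,
}

// Byte values copied from the r570_144 bindings in this patch.
const BASE_RM_SIZE_TU10X: u32 = 8_388_608;
const BASE_RM_SIZE_GH100: u32 = 14_680_064;

const MIB: u64 = 1 << 20;

/// Base GSP-RM heap size in bytes, selected the same way as the patch does.
fn base_rm_size(arch: Arch) -> u64 {
    match arch {
        Arch::Turing | Arch::Ampere | Arch::Ada => u64::from(BASE_RM_SIZE_TU10X),
        Arch::Hopper | Arch::BlackwellGb10x | Arch::BlackwellGb20x => {
            u64::from(BASE_RM_SIZE_GH100)
        }
    }
}

/// Same value expressed in whole MiB, for readability.
fn base_rm_size_mib(arch: Arch) -> u64 {
    base_rm_size(arch) / MIB
}
```

Under these values the Hopper and Blackwell base size is 6 MiB larger
than the Turing/Ampere/Ada one.
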
 drivers/gpu/nova-core/gsp/fw.rs               | 20 +++++++++++++------
 .../gpu/nova-core/gsp/fw/r570_144/bindings.rs |  1 +
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 919d3ab00075..0c54e8bf4bb3 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 pub(crate) mod commands;
 mod r570_144;
@@ -29,7 +30,10 @@
 use crate::{
     fb::FbLayout,
     firmware::gsp::GspFirmware,
-    gpu::Chipset,
+    gpu::{
+        Architecture,
+        Chipset, //
+    },
     gsp::{
         cmdq::Cmdq, //
         GSP_PAGE_SIZE,
@@ -106,11 +110,15 @@ enum GspFwHeapParams {}
 impl GspFwHeapParams {
     /// Returns the amount of GSP-RM heap memory used during GSP-RM boot and initialization (up to
     /// and including the first client subdevice allocation).
-    fn base_rm_size(_chipset: Chipset) -> u64 {
-        // TODO: this needs to be updated to return the correct value for Hopper+ once support for
-        // them is added:
-        // u64::from(bindings::GSP_FW_HEAP_PARAM_BASE_RM_SIZE_GH100)
-        u64::from(bindings::GSP_FW_HEAP_PARAM_BASE_RM_SIZE_TU10X)
+    fn base_rm_size(chipset: Chipset) -> u64 {
+        match chipset.arch() {
+            Architecture::Turing | Architecture::Ampere | Architecture::Ada => {
+                u64::from(bindings::GSP_FW_HEAP_PARAM_BASE_RM_SIZE_TU10X)
+            }
+            Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
+                u64::from(bindings::GSP_FW_HEAP_PARAM_BASE_RM_SIZE_GH100)
+            }
+        }
     }
 
     /// Returns the amount of heap memory required to support a single channel allocation.
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index f82ed097b283..1d592bd3f9ed 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -37,6 +37,7 @@ fn fmt(&self, fmt: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
 pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS2: u32 = 0;
 pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS3_BAREMETAL: u32 = 23068672;
 pub const GSP_FW_HEAP_PARAM_BASE_RM_SIZE_TU10X: u32 = 8388608;
+pub const GSP_FW_HEAP_PARAM_BASE_RM_SIZE_GH100: u32 = 14680064;
 pub const GSP_FW_HEAP_PARAM_SIZE_PER_GB_FB: u32 = 98304;
 pub const GSP_FW_HEAP_PARAM_CLIENT_ALLOC_SIZE: u32 = 100663296;
 pub const GSP_FW_HEAP_SIZE_OVERRIDE_LIBOS2_MIN_MB: u32 = 64;
-- 
2.54.0


* [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (4 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 05/22] gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:30   ` sashiko-bot
  2026-06-02  7:12   ` Eliot Courtney
  2026-06-02  3:20 ` [PATCH v12 07/22] gpu: nova-core: don't assume 64-bit firmware images John Hubbard
                   ` (17 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Blackwell GPUs moved the sysmem flush page registers away from the
Ampere/Ada location. GB10x routes the flush through a pair of HSHUB0
register sets (primary and egress) that must both be programmed to
the same address. GB20x routes it through FBHUB0.

Define these registers relative to their HSHUB0 and FBHUB0 bases, as
Open RM does, and implement the flush paths in the GB10x and GB20x
framebuffer HALs.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
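Not part of the patch: a userspace sketch of how a flush page address
is split across the LO/HI register pair and joined back on read. The
function names are hypothetical, and the plain range check stands in
for the Bounded<u64, 52> type used by the driver. The 52-bit limit, the
32/20-bit split and the "HI first, then LO" ordering come from the
patch; nothing here touches real registers.

```rust
/// Width of the address accepted by the flush registers (32-bit LO + 20-bit HI).
const ADDR_BITS: u32 = 52;

/// Splits `addr` into `(hi, lo)` register values.
/// Returns `None` if the address does not fit in 52 bits.
/// A caller would write `hi` before `lo`, as the patch does.
fn split_flush_addr(addr: u64) -> Option<(u32, u32)> {
    if addr >> ADDR_BITS != 0 {
        return None;
    }
    // Intentional truncation: keep the lower 32 bits.
    let lo = addr as u32;
    // After the range check, the upper part has at most 20 significant bits.
    let hi = (addr >> 32) as u32;
    Some((hi, lo))
}

/// Rebuilds the address from the two register values, as the read path does.
fn join_flush_addr(hi: u32, lo: u32) -> u64 {
    u64::from(lo) | (u64::from(hi) << 32)
}
```

On GB10x the same (hi, lo) pair would be written to both the primary
and the EG register pairs; on GB20x there is a single FBHUB0 pair.
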
 drivers/gpu/nova-core/fb/hal/gb100.rs | 66 +++++++++++++++++++++++++--
 drivers/gpu/nova-core/fb/hal/gb202.rs | 49 ++++++++++++++++++--
 drivers/gpu/nova-core/regs.rs         | 45 ++++++++++++++++++
 3 files changed, 154 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
index 8d63350abf8a..ecea4ff446ff 100644
--- a/drivers/gpu/nova-core/fb/hal/gb100.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
@@ -4,6 +4,14 @@
 //! Blackwell GB10x framebuffer HAL.
 
 use kernel::{
+    io::{
+        register::{
+            RegisterBase,
+            WithBase, //
+        },
+        Io, //
+    },
+    num::Bounded,
     prelude::*,
     ptr::{
         const_align_up,
@@ -15,11 +23,61 @@
 use crate::{
     driver::Bar0,
     fb::hal::FbHal,
-    num::usize_into_u32, //
+    num::usize_into_u32,
+    regs, //
 };
 
 struct Gb100;
 
+impl RegisterBase<regs::Hshub0Base> for Gb100 {
+    const BASE: usize = 0x0087_0000;
+}
+
+fn read_sysmem_flush_page_gb100(bar: &Bar0) -> u64 {
+    let lo = u64::from(
+        bar.read(regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb100>())
+            .adr(),
+    );
+    let hi = u64::from(
+        bar.read(regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb100>())
+            .adr(),
+    );
+
+    lo | (hi << 32)
+}
+
+/// Write the sysmem flush page address through the GB10x HSHUB0 registers.
+///
+/// Both the primary and EG (egress) register pairs must be programmed to the same address,
+/// as required by hardware.
+fn write_sysmem_flush_page_gb100(bar: &Bar0, addr: Bounded<u64, 52>) {
+    // CAST: lower 32 bits. Hardware ignores bits 7:0.
+    let addr_lo = *addr as u32;
+    let addr_hi = addr.shr::<32, 20>().cast::<u32>();
+
+    // Write HI first. The hardware will trigger the flush on the LO write.
+
+    // Primary HSHUB pair.
+    bar.write(
+        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb100>(),
+        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::zeroed().with_adr(addr_hi),
+    );
+    bar.write(
+        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb100>(),
+        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::zeroed().with_adr(addr_lo),
+    );
+
+    // EG (egress) pair -- must match the primary pair.
+    bar.write(
+        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb100>(),
+        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_HI::zeroed().with_adr(addr_hi),
+    );
+    bar.write(
+        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb100>(),
+        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_LO::zeroed().with_adr(addr_lo),
+    );
+}
+
 pub(super) const fn pmu_reserved_size_gb100() -> u32 {
     usize_into_u32::<{ const_align_up(SZ_8M + SZ_16M + SZ_4K, Alignment::new::<SZ_128K>()).unwrap() }>(
     )
@@ -27,11 +85,13 @@ pub(super) const fn pmu_reserved_size_gb100() -> u32 {
 
 impl FbHal for Gb100 {
     fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
-        super::ga100::read_sysmem_flush_page_ga100(bar)
+        read_sysmem_flush_page_gb100(bar)
     }
 
     fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
-        super::ga100::write_sysmem_flush_page_ga100(bar, addr);
+        let addr = Bounded::<u64, 52>::try_new(addr).ok_or(EINVAL)?;
+
+        write_sysmem_flush_page_gb100(bar, addr);
 
         Ok(())
     }
diff --git a/drivers/gpu/nova-core/fb/hal/gb202.rs b/drivers/gpu/nova-core/fb/hal/gb202.rs
index 542c1d7429e9..fa5c3f7f2b2e 100644
--- a/drivers/gpu/nova-core/fb/hal/gb202.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb202.rs
@@ -4,24 +4,67 @@
 //! Blackwell GB20x framebuffer HAL.
 
 use kernel::{
+    io::{
+        register::{
+            RegisterBase,
+            WithBase, //
+        },
+        Io, //
+    },
+    num::Bounded,
     prelude::*,
     sizes::SizeConstants, //
 };
 
 use crate::{
     driver::Bar0,
-    fb::hal::FbHal, //
+    fb::hal::FbHal,
+    regs, //
 };
 
 struct Gb202;
 
+impl RegisterBase<regs::Fbhub0Base> for Gb202 {
+    const BASE: usize = 0x008a_0000;
+}
+
+fn read_sysmem_flush_page_gb202(bar: &Bar0) -> u64 {
+    let lo = u64::from(
+        bar.read(regs::NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb202>())
+            .adr(),
+    );
+    let hi = u64::from(
+        bar.read(regs::NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb202>())
+            .adr(),
+    );
+
+    lo | (hi << 32)
+}
+
+/// Write the sysmem flush page address through the GB20x FBHUB0 registers.
+fn write_sysmem_flush_page_gb202(bar: &Bar0, addr: Bounded<u64, 52>) {
+    // Write HI first. The hardware will trigger the flush on the LO write.
+    bar.write(
+        regs::NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb202>(),
+        regs::NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::zeroed()
+            .with_adr(addr.shr::<32, 20>().cast::<u32>()),
+    );
+    bar.write(
+        regs::NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb202>(),
+        // CAST: lower 32 bits. Hardware ignores bits 7:0.
+        regs::NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::zeroed().with_adr(*addr as u32),
+    );
+}
+
 impl FbHal for Gb202 {
     fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
-        super::ga100::read_sysmem_flush_page_ga100(bar)
+        read_sysmem_flush_page_gb202(bar)
     }
 
     fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
-        super::ga100::write_sysmem_flush_page_ga100(bar, addr);
+        let addr = Bounded::<u64, 52>::try_new(addr).ok_or(EINVAL)?;
+
+        write_sysmem_flush_page_gb202(bar, addr);
 
         Ok(())
     }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 356fbf364ea5..b39647684dd1 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
     io::{
@@ -147,6 +148,50 @@ fn fmt(&self, f: &mut kernel::fmt::Formatter<'_>) -> kernel::fmt::Result {
     }
 }
 
+/// Base of the GB10x HSHUB0 register window (`NV_HSHUB0_PRIV_BASE` in Open RM).
+///
+/// The base is provided by the GB10x framebuffer HAL.
+pub(crate) struct Hshub0Base(());
+
+/// Base of the GB20x FBHUB0 register window (`NV_FBHUB0_PRI_BASE` in Open RM).
+///
+/// The base is provided by the GB20x framebuffer HAL.
+pub(crate) struct Fbhub0Base(());
+
+register! {
+    // GB10x sysmem flush registers, relative to the HSHUB0 base. GB10x routes sysmembar
+    // through a primary and an EG (egress) pair that must both be programmed to the same
+    // address. Hardware ignores bits 7:0 of each LO register. The boot path uses a fixed
+    // HSHUB0 base, so the multiple runtime-discovered HSHUB bases are not needed here.
+    pub(crate) NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO(u32) @ Hshub0Base + 0x00000e50 {
+        31:0    adr => u32;
+    }
+
+    pub(crate) NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI(u32) @ Hshub0Base + 0x00000e54 {
+        19:0    adr;
+    }
+
+    pub(crate) NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_LO(u32) @ Hshub0Base + 0x000006c0 {
+        31:0    adr => u32;
+    }
+
+    pub(crate) NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_HI(u32) @ Hshub0Base + 0x000006c4 {
+        19:0    adr;
+    }
+
+    // GB20x sysmem flush registers, relative to the FBHUB0 base. Unlike the older
+    // NV_PFB_NISO_FLUSH_SYSMEM_ADDR registers which encode the address with an 8-bit
+    // right-shift, these take the raw address split into lower and upper halves. Hardware
+    // ignores bits 7:0 of the LO register.
+    pub(crate) NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_LO(u32) @ Fbhub0Base + 0x00001d58 {
+        31:0    adr => u32;
+    }
+
+    pub(crate) NV_PFB_FBHUB_PCIE_FLUSH_SYSMEM_ADDR_HI(u32) @ Fbhub0Base + 0x00001d5c {
+        19:0    adr;
+    }
+}
+
 impl NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE {
     /// Returns the usable framebuffer size, in bytes.
     pub(crate) fn usable_fb_size(self) -> u64 {
-- 
2.54.0
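
Illustration only, not part of the patch: the GB20x register comments above say the FBHUB registers take the raw address split into a 32-bit LO half and a 20-bit HI half, with HI written first. A minimal userspace sketch of that split follows. The names `FLUSH_ADDR_BITS`, `split_flush_addr` and `join_flush_addr` are hypothetical and do not exist in nova-core, which uses `Bounded<u64, 52>` and the `register!` accessors instead.

```rust
/// Address width accepted by this sketch: a 32-bit LO field plus a 20-bit HI field.
pub const FLUSH_ADDR_BITS: u32 = 52;

/// Splits `addr` into `(hi, lo)` register values, listed in the order they are written.
/// Returns `None` if `addr` does not fit in 52 bits.
pub fn split_flush_addr(addr: u64) -> Option<(u32, u32)> {
    if addr >> FLUSH_ADDR_BITS != 0 {
        return None;
    }
    // After the range check, the upper half has at most 20 significant bits.
    let hi = (addr >> 32) as u32;
    // Truncation keeps the low 32 bits.
    let lo = addr as u32;
    Some((hi, lo))
}

/// Recombines the two register values, mirroring the read path (`lo | (hi << 32)`).
pub fn join_flush_addr(hi: u32, lo: u32) -> u64 {
    u64::from(lo) | (u64::from(hi) << 32)
}
```

Rejecting out-of-range addresses up front plays the role of the `try_new(addr).ok_or(EINVAL)` check in the HAL; the sketch does not model the hardware ignoring bits 7:0 of LO.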



* [PATCH v12 07/22] gpu: nova-core: don't assume 64-bit firmware images
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (5 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 08/22] gpu: nova-core: add support for 32-bit " John Hubbard
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Introduce a single ELF format abstraction that ties each ELF header
type to its matching section-header type. This keeps the shared
section parser ready for upcoming ELF32 support and avoids mixing
32-bit and 64-bit ELF layouts by mistake.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
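Illustration only, not part of the patch: a minimal userspace sketch of why pairing the header type with its section-header type in one trait prevents mixing layouts. It assumes little-endian images and reads fields at the standard ELF offsets by hand, whereas the patch uses `FromBytes` on the kernel bindings. The `parse` method, the `SIZE` constants, `bytes_at` and `section_table_range` are hypothetical names introduced for this sketch.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Copies `N` bytes starting at `at`, or returns `None` if out of bounds.
fn bytes_at<const N: usize>(b: &[u8], at: usize) -> Option<[u8; N]> {
    let mut a = [0u8; N];
    a.copy_from_slice(b.get(at..at.checked_add(N)?)?);
    Some(a)
}

pub trait ElfHeader: Sized {
    fn parse(image: &[u8]) -> Option<Self>;
    fn shoff(&self) -> u64;
    fn shnum(&self) -> u16;
}

pub trait ElfSectionHeader {
    /// Size in bytes of one section-header entry.
    const SIZE: usize;
}

/// Ties a header layout to the matching section-header layout.
pub trait ElfFormat {
    type Header: ElfHeader;
    type SectionHeader: ElfSectionHeader;
}

pub struct Hdr64 { shoff: u64, shnum: u16 }
impl ElfHeader for Hdr64 {
    fn parse(image: &[u8]) -> Option<Self> {
        // ELF64: e_shoff is 8 bytes at 0x28, e_shnum is 2 bytes at 0x3c.
        Some(Self {
            shoff: u64::from_le_bytes(bytes_at::<8>(image, 0x28)?),
            shnum: u16::from_le_bytes(bytes_at::<2>(image, 0x3c)?),
        })
    }
    fn shoff(&self) -> u64 { self.shoff }
    fn shnum(&self) -> u16 { self.shnum }
}

pub struct Hdr32 { shoff: u32, shnum: u16 }
impl ElfHeader for Hdr32 {
    fn parse(image: &[u8]) -> Option<Self> {
        // ELF32: e_shoff is 4 bytes at 0x20, e_shnum is 2 bytes at 0x30.
        Some(Self {
            shoff: u32::from_le_bytes(bytes_at::<4>(image, 0x20)?),
            shnum: u16::from_le_bytes(bytes_at::<2>(image, 0x30)?),
        })
    }
    // Widening u32 to u64 is lossless, so both formats share one interface.
    fn shoff(&self) -> u64 { u64::from(self.shoff) }
    fn shnum(&self) -> u16 { self.shnum }
}

pub struct SHdr64;
impl ElfSectionHeader for SHdr64 { const SIZE: usize = 64; }
pub struct SHdr32;
impl ElfSectionHeader for SHdr32 { const SIZE: usize = 40; }

pub struct Elf64;
impl ElfFormat for Elf64 { type Header = Hdr64; type SectionHeader = SHdr64; }
pub struct Elf32;
impl ElfFormat for Elf32 { type Header = Hdr32; type SectionHeader = SHdr32; }

/// Returns the byte range of the section-header table, with overflow and bounds checks.
pub fn section_table_range<F: ElfFormat>(image: &[u8]) -> Option<Range<usize>> {
    let hdr = F::Header::parse(image)?;
    let start = usize::try_from(hdr.shoff()).ok()?;
    let end = usize::from(hdr.shnum())
        .checked_mul(F::SectionHeader::SIZE)
        .and_then(|len| start.checked_add(len))?;
    image.get(start..end)?;
    Some(start..end)
}
```

Because the generic function only names `F`, there is no way to pair a 64-bit header with a 32-bit section-header size by accident.
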
 drivers/gpu/nova-core/firmware.rs | 112 +++++++++++++++++++++++-------
 1 file changed, 85 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 3aac073efee2..38088e950980 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 //! Contains structures and functions dedicated to the parsing, building and patching of firmwares
 //! to be loaded into a given execution unit.
@@ -467,17 +468,72 @@ mod elf {
         transmute::FromBytes, //
     };
 
+    /// Trait to abstract over ELF header differences.
+    trait ElfHeader: FromBytes {
+        fn shnum(&self) -> u16;
+        fn shoff(&self) -> u64;
+        fn shstrndx(&self) -> u16;
+    }
+
+    /// Trait to abstract over ELF section-header differences.
+    trait ElfSectionHeader: FromBytes {
+        fn name(&self) -> u32;
+        fn offset(&self) -> u64;
+        fn size(&self) -> u64;
+    }
+
+    /// Trait describing a matching ELF header and section-header format.
+    trait ElfFormat {
+        type Header: ElfHeader;
+        type SectionHeader: ElfSectionHeader;
+    }
+
     /// Newtype to provide a [`FromBytes`] implementation.
     #[repr(transparent)]
     struct Elf64Hdr(bindings::elf64_hdr);
     // SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
     unsafe impl FromBytes for Elf64Hdr {}
 
+    impl ElfHeader for Elf64Hdr {
+        fn shnum(&self) -> u16 {
+            self.0.e_shnum
+        }
+
+        fn shoff(&self) -> u64 {
+            self.0.e_shoff
+        }
+
+        fn shstrndx(&self) -> u16 {
+            self.0.e_shstrndx
+        }
+    }
+
     #[repr(transparent)]
     struct Elf64SHdr(bindings::elf64_shdr);
     // SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
     unsafe impl FromBytes for Elf64SHdr {}
 
+    impl ElfSectionHeader for Elf64SHdr {
+        fn name(&self) -> u32 {
+            self.0.sh_name
+        }
+
+        fn offset(&self) -> u64 {
+            self.0.sh_offset
+        }
+
+        fn size(&self) -> u64 {
+            self.0.sh_size
+        }
+    }
+
+    struct Elf64Format;
+
+    impl ElfFormat for Elf64Format {
+        type Header = Elf64Hdr;
+        type SectionHeader = Elf64SHdr;
+    }
+
     /// Returns a NULL-terminated string from the ELF image at `offset`.
     fn elf_str(elf: &[u8], offset: u64) -> Option<&str> {
         let idx = usize::try_from(offset).ok()?;
@@ -485,47 +541,49 @@ fn elf_str(elf: &[u8], offset: u64) -> Option<&str> {
         CStr::from_bytes_until_nul(bytes).ok()?.to_str().ok()
     }
 
-    /// Tries to extract section with name `name` from the ELF64 image `elf`, and returns it.
-    pub(super) fn elf64_section<'a, 'b>(elf: &'a [u8], name: &'b str) -> Option<&'a [u8]> {
-        let hdr = &elf
-            .get(0..size_of::<bindings::elf64_hdr>())
-            .and_then(Elf64Hdr::from_bytes)?
-            .0;
-
-        // Get all the section headers.
-        let mut shdr = {
-            let shdr_num = usize::from(hdr.e_shnum);
-            let shdr_start = usize::try_from(hdr.e_shoff).ok()?;
-            let shdr_end = shdr_num
-                .checked_mul(size_of::<Elf64SHdr>())
-                .and_then(|v| v.checked_add(shdr_start))?;
-
-            elf.get(shdr_start..shdr_end)
-                .map(|slice| slice.chunks_exact(size_of::<Elf64SHdr>()))?
-        };
+    fn elf_section_generic<'a, F>(elf: &'a [u8], name: &str) -> Option<&'a [u8]>
+    where
+        F: ElfFormat,
+    {
+        let hdr = F::Header::from_bytes(elf.get(0..size_of::<F::Header>())?)?;
+
+        let shdr_num = usize::from(hdr.shnum());
+        let shdr_start = usize::try_from(hdr.shoff()).ok()?;
+        let shdr_end = shdr_num
+            .checked_mul(size_of::<F::SectionHeader>())
+            .and_then(|v| v.checked_add(shdr_start))?;
+
+        // Get all the section headers as an iterator over byte chunks.
+        let shdr_bytes = elf.get(shdr_start..shdr_end)?;
+        let mut shdr_iter = shdr_bytes.chunks_exact(size_of::<F::SectionHeader>());
 
         // Get the strings table.
-        let strhdr = shdr
+        let strhdr = shdr_iter
             .clone()
-            .nth(usize::from(hdr.e_shstrndx))
-            .and_then(Elf64SHdr::from_bytes)?;
+            .nth(usize::from(hdr.shstrndx()))
+            .and_then(F::SectionHeader::from_bytes)?;
 
         // Find the section which name matches `name` and return it.
-        shdr.find_map(|sh| {
-            let hdr = Elf64SHdr::from_bytes(sh)?;
-            let name_offset = strhdr.0.sh_offset.checked_add(u64::from(hdr.0.sh_name))?;
+        shdr_iter.find_map(|sh_bytes| {
+            let sh = F::SectionHeader::from_bytes(sh_bytes)?;
+            let name_offset = strhdr.offset().checked_add(u64::from(sh.name()))?;
             let section_name = elf_str(elf, name_offset)?;
 
             if section_name != name {
                 return None;
             }
 
-            let start = usize::try_from(hdr.0.sh_offset).ok()?;
-            let end = usize::try_from(hdr.0.sh_size)
+            let start = usize::try_from(sh.offset()).ok()?;
+            let end = usize::try_from(sh.size())
                 .ok()
-                .and_then(|sh_size| start.checked_add(sh_size))?;
+                .and_then(|sz| start.checked_add(sz))?;
 
             elf.get(start..end)
         })
     }
+
+    /// Tries to extract section with name `name` from the ELF64 image `elf`, and returns it.
+    pub(super) fn elf64_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
+        elf_section_generic::<Elf64Format>(elf, name)
+    }
 }
-- 
2.54.0



* [PATCH v12 08/22] gpu: nova-core: add support for 32-bit firmware images
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (6 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 07/22] gpu: nova-core: don't assume 64-bit firmware images John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 09/22] gpu: nova-core: add auto-detection of 32-bit, 64-bit " John Hubbard
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Some GPU firmware images are packaged as 32-bit ELF rather than 64-bit.
Add a 32-bit implementation of the shared ELF section-parsing
abstraction so those images can be parsed alongside the existing 64-bit
path.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 53 +++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 38088e950980..e4dcc9a87b7e 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -534,6 +534,53 @@ impl ElfFormat for Elf64Format {
         type SectionHeader = Elf64SHdr;
     }
 
+    /// Newtype to provide [`FromBytes`] and [`ElfHeader`] implementations for ELF32.
+    #[repr(transparent)]
+    struct Elf32Hdr(bindings::elf32_hdr);
+    // SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+    unsafe impl FromBytes for Elf32Hdr {}
+
+    impl ElfHeader for Elf32Hdr {
+        fn shnum(&self) -> u16 {
+            self.0.e_shnum
+        }
+
+        fn shoff(&self) -> u64 {
+            u64::from(self.0.e_shoff)
+        }
+
+        fn shstrndx(&self) -> u16 {
+            self.0.e_shstrndx
+        }
+    }
+
+    /// Newtype to provide [`FromBytes`] and [`ElfSectionHeader`] implementations for ELF32.
+    #[repr(transparent)]
+    struct Elf32SHdr(bindings::elf32_shdr);
+    // SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+    unsafe impl FromBytes for Elf32SHdr {}
+
+    impl ElfSectionHeader for Elf32SHdr {
+        fn name(&self) -> u32 {
+            self.0.sh_name
+        }
+
+        fn offset(&self) -> u64 {
+            u64::from(self.0.sh_offset)
+        }
+
+        fn size(&self) -> u64 {
+            u64::from(self.0.sh_size)
+        }
+    }
+
+    struct Elf32Format;
+
+    impl ElfFormat for Elf32Format {
+        type Header = Elf32Hdr;
+        type SectionHeader = Elf32SHdr;
+    }
+
     /// Returns a NULL-terminated string from the ELF image at `offset`.
     fn elf_str(elf: &[u8], offset: u64) -> Option<&str> {
         let idx = usize::try_from(offset).ok()?;
@@ -586,4 +633,10 @@ fn elf_section_generic<'a, F>(elf: &'a [u8], name: &str) -> Option<&'a [u8]>
     pub(super) fn elf64_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
         elf_section_generic::<Elf64Format>(elf, name)
     }
+
+    /// Extract the section with name `name` from the ELF32 image `elf`.
+    #[expect(dead_code)]
+    pub(super) fn elf32_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
+        elf_section_generic::<Elf32Format>(elf, name)
+    }
 }
-- 
2.54.0



* [PATCH v12 09/22] gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware images
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (7 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 08/22] gpu: nova-core: add support for 32-bit " John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  3:20 ` [PATCH v12 10/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub John Hubbard
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

A firmware image may be either a 32-bit or a 64-bit ELF, and callers
should not have to know which. Detect the ELF class from the image
header at parse time and dispatch to the matching parser, so a single
entry point handles both layouts.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
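Illustration only, not part of the patch: a minimal userspace sketch of the class detection step on its own, separated from section lookup. The magic and class values are the standard ELF identification constants; the `ElfClass` enum and `detect_elf_class` function are hypothetical names introduced here.

```rust
/// ELF class decoded from the identification bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElfClass {
    Elf32,
    Elf64,
}

/// Returns the ELF class of `image`, or `None` if the magic is wrong, the image is
/// too short to hold the class byte, or the class value is unknown.
pub fn detect_elf_class(image: &[u8]) -> Option<ElfClass> {
    const ELFMAG: &[u8] = b"\x7fELF";
    const EI_CLASS: usize = 4;
    const ELFCLASS32: u8 = 1;
    const ELFCLASS64: u8 = 2;

    if image.get(..ELFMAG.len())? != ELFMAG {
        return None;
    }

    match *image.get(EI_CLASS)? {
        ELFCLASS32 => Some(ElfClass::Elf32),
        ELFCLASS64 => Some(ElfClass::Elf64),
        _ => None,
    }
}
```

Every access goes through `get`, so a truncated or non-ELF blob yields `None` rather than a panic, which matches the behavior callers see through `ok_or(EINVAL)`.
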
 drivers/gpu/nova-core/firmware.rs     | 27 +++++++++++++++++++++++----
 drivers/gpu/nova-core/firmware/gsp.rs |  4 ++--
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index e4dcc9a87b7e..87588cb24f11 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -629,14 +629,33 @@ fn elf_section_generic<'a, F>(elf: &'a [u8], name: &str) -> Option<&'a [u8]>
         })
     }
 
-    /// Tries to extract section with name `name` from the ELF64 image `elf`, and returns it.
-    pub(super) fn elf64_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
+    /// Extract the section with name `name` from the ELF64 image `elf`.
+    fn elf64_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
         elf_section_generic::<Elf64Format>(elf, name)
     }
 
     /// Extract the section with name `name` from the ELF32 image `elf`.
-    #[expect(dead_code)]
-    pub(super) fn elf32_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
+    fn elf32_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
         elf_section_generic::<Elf32Format>(elf, name)
     }
+
+    /// Extract section `name` from `elf`, detecting ELF32 vs ELF64 from the ELF header.
+    pub(super) fn elf_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a [u8]> {
+        // ELF identification: a 4-byte magic followed by a class byte (32- vs 64-bit).
+        const ELFMAG: &[u8] = b"\x7fELF";
+        const SELFMAG: usize = ELFMAG.len();
+        const EI_CLASS: usize = 4;
+        const ELFCLASS32: u8 = 1;
+        const ELFCLASS64: u8 = 2;
+
+        if elf.get(0..SELFMAG) != Some(ELFMAG) {
+            return None;
+        }
+
+        match *elf.get(EI_CLASS)? {
+            ELFCLASS32 => elf32_section(elf, name),
+            ELFCLASS64 => elf64_section(elf, name),
+            _ => None,
+        }
+    }
 }
diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
index e576bc8a9b1c..99a302bae567 100644
--- a/drivers/gpu/nova-core/firmware/gsp.rs
+++ b/drivers/gpu/nova-core/firmware/gsp.rs
@@ -88,7 +88,7 @@ pub(crate) fn new<'a>(
         pin_init::pin_init_scope(move || {
             let firmware = super::request_firmware(dev, chipset, "gsp", ver)?;
 
-            let fw_section = elf::elf64_section(firmware.data(), ".fwimage").ok_or(EINVAL)?;
+            let fw_section = elf::elf_section(firmware.data(), ".fwimage").ok_or(EINVAL)?;
 
             let size = fw_section.len();
 
@@ -148,7 +148,7 @@ pub(crate) fn new<'a>(
                 signatures: {
                     let sigs_section = Self::find_gsp_sigs_section(chipset);
 
-                    elf::elf64_section(firmware.data(), sigs_section)
+                    elf::elf_section(firmware.data(), sigs_section)
                         .ok_or(EINVAL)
                         .and_then(|data| Coherent::from_slice(dev, data, GFP_KERNEL))?
                 },
-- 
2.54.0



* [PATCH v12 10/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (8 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 09/22] gpu: nova-core: add auto-detection of 32-bit, 64-bit " John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  6:50   ` Eliot Courtney
  2026-06-02  3:20 ` [PATCH v12 11/22] gpu: nova-core: Hopper/Blackwell: add FMC firmware image John Hubbard
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Add the FSP (Foundation Security Processor) falcon engine type that
will handle secure boot and Chain of Trust operations on Hopper and
Blackwell architectures.

The FSP falcon replaces SEC2's role in the boot sequence for these newer
architectures. This initial stub just defines the falcon type and its
base address.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs     |  1 +
 drivers/gpu/nova-core/falcon/fsp.rs | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)
 create mode 100644 drivers/gpu/nova-core/falcon/fsp.rs

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 24cc2c26e28d..053ce5bea6cd 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -40,6 +40,7 @@
     regs,
 };
 
+pub(crate) mod fsp;
 pub(crate) mod gsp;
 mod hal;
 pub(crate) mod sec2;
diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
new file mode 100644
index 000000000000..c4a9ce8a47f8
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+//! FSP (Foundation Security Processor) falcon engine for Hopper/Blackwell GPUs.
+//!
+//! The FSP falcon handles secure boot and Chain of Trust operations
+//! on Hopper and Blackwell architectures, replacing SEC2's role.
+
+use kernel::io::register::RegisterBase;
+
+use crate::falcon::{
+    FalconEngine,
+    PFalcon2Base,
+    PFalconBase, //
+};
+
+/// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
+#[expect(dead_code)]
+pub(crate) struct Fsp(());
+
+impl RegisterBase<PFalconBase> for Fsp {
+    const BASE: usize = 0x8f2000;
+}
+
+impl RegisterBase<PFalcon2Base> for Fsp {
+    const BASE: usize = 0x8f3000;
+}
+
+impl FalconEngine for Fsp {}
-- 
2.54.0



* [PATCH v12 11/22] gpu: nova-core: Hopper/Blackwell: add FMC firmware image
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (9 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 10/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub John Hubbard
@ 2026-06-02  3:20 ` John Hubbard
  2026-06-02  7:18   ` Eliot Courtney
  2026-06-02  3:21 ` [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting John Hubbard
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:20 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

FSP is the Falcon that runs FMC firmware on Hopper and Blackwell.
Load the FMC ELF in two forms: the image section that FSP boots from,
and the full Firmware object for later signature extraction during
Chain of Trust verification. Declare the FMC image in the module's
firmware table so it is bundled for FSP-based chipsets.

Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
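Illustration only, not part of the patch: a minimal userspace sketch of how the firmware table grows per chipset. It is simplified to architectures only (the driver's `needs_fwsec_bootloader()` also covers the GA100 chipset) and lists only the entries visible in the firmware.rs hunk below, so the real table may contain more files. `firmware_files` is a hypothetical helper; the driver builds a const module-info table instead of a `Vec`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Architecture {
    Turing,
    Ampere,
    Ada,
    Hopper,
    BlackwellGB10x,
    BlackwellGB20x,
}

impl Architecture {
    /// Hopper and later boot via FSP, which requires the FMC image.
    pub const fn uses_fsp(self) -> bool {
        matches!(
            self,
            Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x
        )
    }

    /// Simplified for this sketch: the driver also includes the GA100 chipset.
    pub const fn needs_fwsec_bootloader(self) -> bool {
        matches!(self, Architecture::Turing)
    }
}

/// Returns the firmware file names requested for `arch` (partial list, see above).
pub fn firmware_files(arch: Architecture) -> Vec<&'static str> {
    let mut files = vec!["bootloader", "gsp"];
    if arch.needs_fwsec_bootloader() {
        files.push("gen_bootloader");
    }
    if arch.uses_fsp() {
        files.push("fmc");
    }
    files
}
```

The two conditions are independent, which is why the patch rebinds `this` after the first `if` instead of nesting the second check inside it.
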
 drivers/gpu/nova-core/firmware.rs      |  9 ++++-
 drivers/gpu/nova-core/firmware/fsp.rs  | 47 ++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs           |  9 +++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 10 ++++--
 4 files changed, 72 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/nova-core/firmware/fsp.rs

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 87588cb24f11..366d3b76360e 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -28,6 +28,7 @@
 };
 
 pub(crate) mod booter;
+pub(crate) mod fsp;
 pub(crate) mod fwsec;
 pub(crate) mod gsp;
 pub(crate) mod riscv;
@@ -431,10 +432,16 @@ const fn make_entry_chipset(self, chipset: gpu::Chipset) -> Self {
             .make_entry_file(name, "bootloader")
             .make_entry_file(name, "gsp");
 
-        if chipset.needs_fwsec_bootloader() {
+        let this = if chipset.needs_fwsec_bootloader() {
             this.make_entry_file(name, "gen_bootloader")
         } else {
             this
+        };
+
+        if chipset.uses_fsp() {
+            this.make_entry_file(name, "fmc")
+        } else {
+            this
         }
     }
 
diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
new file mode 100644
index 000000000000..011be1e571c2
--- /dev/null
+++ b/drivers/gpu/nova-core/firmware/fsp.rs
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+//! FSP is a hardware unit that runs FMC firmware.
+
+use kernel::{
+    device,
+    dma::Coherent,
+    firmware::Firmware,
+    prelude::*, //
+};
+
+use crate::{
+    firmware::elf,
+    gpu::Chipset, //
+};
+
+pub(crate) struct FspFirmware {
+    /// FMC firmware image data (only the "image" ELF section).
+    #[expect(dead_code)]
+    pub(crate) fmc_image: Coherent<[u8]>,
+    /// Full FMC ELF for signature extraction.
+    #[expect(dead_code)]
+    pub(crate) fmc_elf: Firmware,
+}
+
+impl FspFirmware {
+    pub(crate) fn new(
+        dev: &device::Device<device::Bound>,
+        chipset: Chipset,
+        ver: &str,
+    ) -> Result<Self> {
+        let fw = super::request_firmware(dev, chipset, "fmc", ver)?;
+
+        // FSP expects only the "image" section, not the entire ELF file.
+        let fmc_image_data = elf::elf_section(fw.data(), "image").ok_or_else(|| {
+            dev_err!(dev, "FMC ELF file missing 'image' section\n");
+            EINVAL
+        })?;
+        let fmc_image = Coherent::from_slice(dev, fmc_image_data, GFP_KERNEL)?;
+
+        Ok(Self {
+            fmc_image,
+            fmc_elf: fw,
+        })
+    }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 7dd736e5b190..b7341bde04be 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -137,6 +137,15 @@ pub(crate) const fn needs_fwsec_bootloader(self) -> bool {
         matches!(self.arch(), Architecture::Turing) || matches!(self, Self::GA100)
     }
 
+    /// Returns `true` if this chipset boots via FSP (Hopper and later), which requires the FMC
+    /// firmware image.
+    pub(crate) const fn uses_fsp(self) -> bool {
+        matches!(
+            self.arch(),
+            Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x
+        )
+    }
+
     /// Returns the address range of the PCI config mirror space.
     pub(crate) fn pci_config_mirror_range(self) -> Range<u32> {
         hal::gpu_hal(self).pci_config_mirror_range()
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 9a4bb22578b3..9681f9a73e86 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -16,6 +16,10 @@
         Falcon, //
     },
     fb::FbLayout,
+    firmware::{
+        fsp::FspFirmware,
+        FIRMWARE_VERSION, //
+    },
     gpu::Chipset,
     gsp::{
         boot::BootUnloadGuard,
@@ -35,14 +39,16 @@ impl GspHal for Gh100 {
     fn boot<'a>(
         &self,
         _gsp: &'a Gsp,
-        _dev: &'a device::Device<device::Bound>,
+        dev: &'a device::Device<device::Bound>,
         _bar: &'a Bar0,
-        _chipset: Chipset,
+        chipset: Chipset,
         _fb_layout: &FbLayout,
         _wpr_meta: &Coherent<GspFwWprMeta>,
         _gsp_falcon: &'a Falcon<GspEngine>,
         _sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
+        let _fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
+
         Err(ENOTSUPP)
     }
 }
-- 
2.54.0



* [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (10 preceding siblings ...)
  2026-06-02  3:20 ` [PATCH v12 11/22] gpu: nova-core: Hopper/Blackwell: add FMC firmware image John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  7:56   ` Eliot Courtney
  2026-06-02  3:21 ` [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction John Hubbard
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Hopper and Blackwell use FSP instead of SEC2 for secure boot. The
driver must wait for FSP secure boot to complete before continuing
with GSP bring-up. Poll for boot success with a 5-second timeout, and
return the FSP interface only on success so that later Chain of Trust
operations cannot run before FSP is ready. The interface owns the FSP
falcon and the FMC firmware.

Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
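Illustration only, not part of the patch: a minimal userspace sketch of the "poll with timeout, hand out the interface only on success" pattern described above. It uses `std::time` and `std::thread::sleep` in place of the kernel's `read_poll_timeout` and `Delta`. `poll_until`, `TimedOut` and `SecureBootDone` are hypothetical names; the real `Fsp` additionally owns the falcon and the FMC firmware.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Error returned when the condition was not met before the deadline.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Calls `read` every `interval` until `done` accepts the value or `timeout` elapses.
pub fn poll_until<T>(
    mut read: impl FnMut() -> T,
    done: impl Fn(&T) -> bool,
    interval: Duration,
    timeout: Duration,
) -> Result<T, TimedOut> {
    let deadline = Instant::now() + timeout;
    loop {
        let value = read();
        if done(&value) {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(TimedOut);
        }
        sleep(interval);
    }
}

/// Token that can only be obtained once the status register reports success.
/// The private unit field prevents construction outside this module.
pub struct SecureBootDone(());

impl SecureBootDone {
    pub fn wait(
        read_status: impl FnMut() -> u32,
        success: u32,
        interval: Duration,
        timeout: Duration,
    ) -> Result<Self, TimedOut> {
        poll_until(read_status, |&s| s == success, interval, timeout)?;
        Ok(SecureBootDone(()))
    }
}
```

Any method that needs secure boot to have completed can then take the token (or be a method on it), so calling it too early is a compile-time error rather than a runtime check.
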
 drivers/gpu/nova-core/falcon/fsp.rs    |  1 -
 drivers/gpu/nova-core/fsp.rs           | 73 ++++++++++++++++++++++++++
 drivers/gpu/nova-core/fsp/hal.rs       | 27 ++++++++++
 drivers/gpu/nova-core/fsp/hal/gb202.rs | 23 ++++++++
 drivers/gpu/nova-core/fsp/hal/gh100.rs | 23 ++++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  6 ++-
 drivers/gpu/nova-core/nova_core.rs     |  1 +
 drivers/gpu/nova-core/regs.rs          | 36 +++++++++++++
 8 files changed, 187 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/nova-core/fsp.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal/gb202.rs
 create mode 100644 drivers/gpu/nova-core/fsp/hal/gh100.rs

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index c4a9ce8a47f8..d9f87262e8b1 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -15,7 +15,6 @@
 };
 
 /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
-#[expect(dead_code)]
 pub(crate) struct Fsp(());
 
 impl RegisterBase<PFalconBase> for Fsp {
diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
new file mode 100644
index 000000000000..f3524137d9f7
--- /dev/null
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+//! FSP (Foundation Security Processor) interface for Hopper/Blackwell GPUs.
+//!
+//! Hopper/Blackwell use a simplified firmware boot sequence: FMC, then FSP, then GSP.
+//! Unlike Turing/Ampere/Ada, there is no SEC2 (Security Engine 2) usage.
+//! FSP handles secure boot directly using FMC firmware and Chain of Trust.
+
+use kernel::{
+    device,
+    io::poll::read_poll_timeout,
+    prelude::*,
+    time::Delta, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        fsp::Fsp as FspEngine,
+        Falcon, //
+    },
+    firmware::fsp::FspFirmware,
+    gpu::Chipset,
+    regs, //
+};
+
+mod hal;
+
+/// FSP interface for Hopper/Blackwell GPUs.
+///
+/// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
+/// has completed. It owns the FSP falcon and the FMC firmware, which are used for the subsequent
+/// Chain of Trust boot.
+pub(crate) struct Fsp {
+    #[expect(dead_code)]
+    falcon: Falcon<FspEngine>,
+    #[expect(dead_code)]
+    fsp_fw: FspFirmware,
+}
+
+impl Fsp {
+    /// Waits for FSP secure boot completion, then returns the [`Fsp`] interface.
+    ///
+    /// Polls the thermal scratch register until FSP signals boot completion or the timeout
+    /// elapses. Returning an [`Fsp`] only on success guarantees, at the API level, that the
+    /// interface is not used before secure boot has completed.
+    pub(crate) fn wait_secure_boot(
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        fsp_fw: FspFirmware,
+    ) -> Result<Fsp> {
+        /// FSP secure boot completion timeout in milliseconds.
+        const FSP_SECURE_BOOT_TIMEOUT_MS: i64 = 5000;
+
+        let hal = hal::fsp_hal(chipset).ok_or(ENOTSUPP)?;
+        let falcon = Falcon::<FspEngine>::new(dev, chipset)?;
+
+        read_poll_timeout(
+            || Ok(hal.fsp_boot_status(bar)),
+            |&status| status == regs::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS,
+            Delta::from_millis(10),
+            Delta::from_millis(FSP_SECURE_BOOT_TIMEOUT_MS),
+        )
+        .map_err(|_| {
+            dev_err!(dev, "FSP secure boot completion timeout\n");
+            ETIMEDOUT
+        })?;
+
+        Ok(Fsp { falcon, fsp_fw })
+    }
+}
diff --git a/drivers/gpu/nova-core/fsp/hal.rs b/drivers/gpu/nova-core/fsp/hal.rs
new file mode 100644
index 000000000000..83d1e7daa998
--- /dev/null
+++ b/drivers/gpu/nova-core/fsp/hal.rs
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use crate::{
+    driver::Bar0,
+    gpu::{
+        Architecture,
+        Chipset, //
+    },
+};
+
+mod gb202;
+mod gh100;
+
+pub(super) trait FspHal {
+    /// Returns the secure boot status from the chip-specific `NV_THERM_I2CS_SCRATCH` register.
+    fn fsp_boot_status(&self, bar: &Bar0) -> u32;
+}
+
+/// Returns the FSP HAL, or `None` if the architecture doesn't support FSP.
+pub(crate) fn fsp_hal(chipset: Chipset) -> Option<&'static dyn FspHal> {
+    match chipset.arch() {
+        Architecture::Turing | Architecture::Ampere | Architecture::Ada => None,
+        Architecture::Hopper | Architecture::BlackwellGB10x => Some(gh100::GH100_HAL),
+        Architecture::BlackwellGB20x => Some(gb202::GB202_HAL),
+    }
+}
diff --git a/drivers/gpu/nova-core/fsp/hal/gb202.rs b/drivers/gpu/nova-core/fsp/hal/gb202.rs
new file mode 100644
index 000000000000..2f08b6c9f308
--- /dev/null
+++ b/drivers/gpu/nova-core/fsp/hal/gb202.rs
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::io::Io;
+
+use crate::{
+    driver::Bar0,
+    fsp::hal::FspHal,
+    regs, //
+};
+
+struct Gb202;
+
+impl FspHal for Gb202 {
+    fn fsp_boot_status(&self, bar: &Bar0) -> u32 {
+        bar.read(regs::gb202::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE)
+            .fsp_boot_complete()
+            .into()
+    }
+}
+
+const GB202: Gb202 = Gb202;
+pub(super) const GB202_HAL: &dyn FspHal = &GB202;
diff --git a/drivers/gpu/nova-core/fsp/hal/gh100.rs b/drivers/gpu/nova-core/fsp/hal/gh100.rs
new file mode 100644
index 000000000000..290fb55a81da
--- /dev/null
+++ b/drivers/gpu/nova-core/fsp/hal/gh100.rs
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::io::Io;
+
+use crate::{
+    driver::Bar0,
+    fsp::hal::FspHal,
+    regs, //
+};
+
+struct Gh100;
+
+impl FspHal for Gh100 {
+    fn fsp_boot_status(&self, bar: &Bar0) -> u32 {
+        bar.read(regs::gh100::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE)
+            .fsp_boot_complete()
+            .into()
+    }
+}
+
+const GH100: Gh100 = Gh100;
+pub(super) const GH100_HAL: &dyn FspHal = &GH100;
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 9681f9a73e86..b25970dd4561 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -20,6 +20,7 @@
         fsp::FspFirmware,
         FIRMWARE_VERSION, //
     },
+    fsp::Fsp,
     gpu::Chipset,
     gsp::{
         boot::BootUnloadGuard,
@@ -40,14 +41,15 @@ fn boot<'a>(
         &self,
         _gsp: &'a Gsp,
         dev: &'a device::Device<device::Bound>,
-        _bar: &'a Bar0,
+        bar: &'a Bar0,
         chipset: Chipset,
         _fb_layout: &FbLayout,
         _wpr_meta: &Coherent<GspFwWprMeta>,
         _gsp_falcon: &'a Falcon<GspEngine>,
         _sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
-        let _fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
+        let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
+        let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
 
         Err(ENOTSUPP)
     }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 5a260062295f..7b6c331da10e 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -17,6 +17,7 @@
 mod falcon;
 mod fb;
 mod firmware;
+mod fsp;
 mod gpu;
 mod gsp;
 #[macro_use]
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index b39647684dd1..2cb1f02f35a4 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -587,3 +587,39 @@ pub(crate) mod ga100 {
         }
     }
 }
+
+pub(crate) const NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS: u32 = 0xff;
+
+pub(crate) mod gh100 {
+    use kernel::io::register;
+
+    // PTHERM
+
+    register! {
+        pub(crate) NV_THERM_I2CS_SCRATCH(u32) @ 0x000200bc {
+            31:0    data;
+        }
+
+        // Alias to `NV_THERM_I2CS_SCRATCH` when used to check for FSP boot completion.
+        pub(crate) NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE(u32) => NV_THERM_I2CS_SCRATCH {
+            31:0    fsp_boot_complete;
+        }
+    }
+}
+
+pub(crate) mod gb202 {
+    use kernel::io::register;
+
+    // PTHERM
+
+    register! {
+        pub(crate) NV_THERM_I2CS_SCRATCH(u32) @ 0x00ad00bc {
+            31:0    data;
+        }
+
+        // Alias to `NV_THERM_I2CS_SCRATCH` when used to check for FSP boot completion.
+        pub(crate) NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE(u32) => NV_THERM_I2CS_SCRATCH {
+            31:0    fsp_boot_complete;
+        }
+    }
+}
-- 
2.54.0



* [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (11 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:32   ` sashiko-bot
                     ` (2 more replies)
  2026-06-02  3:21 ` [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations John Hubbard
                   ` (10 subsequent siblings)
  23 siblings, 3 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Extract the SHA-384 hash, RSA public key, and RSA signature from the
FMC ELF32 firmware sections. FSP Chain of Trust verification needs
these to validate the FMC image during boot.
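
The size rules described above (exact-length hash, zero-padded key and
signature) can be sketched in standalone Rust. This is an illustration
only, not the driver code: `Signatures`, `zero_padded` and
`build_signatures` are hypothetical names, and plain `Option` stands in
for the kernel's `Result` and `KBox` types.

```rust
// Sizes as stated in the patch: SHA-384 digest, RSA-3072 key and signature.
const HASH_SIZE: usize = 48;
const PKEY_SIZE: usize = 384;
const SIG_SIZE: usize = 384;

/// Hypothetical stand-in for the fixed-layout signature structure.
#[derive(Debug)]
struct Signatures {
    hash384: [u8; HASH_SIZE],
    public_key: [u8; PKEY_SIZE],
    signature: [u8; SIG_SIZE],
}

/// Copies `src` to the front of a zeroed `N`-byte array.
/// Returns `None` if `src` is longer than `N`.
fn zero_padded<const N: usize>(src: &[u8]) -> Option<[u8; N]> {
    let mut out = [0u8; N];
    out.get_mut(..src.len())?.copy_from_slice(src);
    Some(out)
}

/// Builds the structure from three raw section slices.
/// The hash must be exactly `HASH_SIZE` bytes; the other two may be shorter.
fn build_signatures(hash: &[u8], pkey: &[u8], sig: &[u8]) -> Option<Signatures> {
    Some(Signatures {
        hash384: hash.try_into().ok()?,
        public_key: zero_padded(pkey)?,
        signature: zero_padded(sig)?,
    })
}
```

A short key such as `[1, 2, 3]` ends up in the first three bytes of
`public_key` with the remaining 381 bytes left at zero, while a 47-byte
hash or a 385-byte key is rejected.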

Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/firmware/fsp.rs | 94 ++++++++++++++++++++++++++-
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
index 011be1e571c2..db61905eac9d 100644
--- a/drivers/gpu/nova-core/firmware/fsp.rs
+++ b/drivers/gpu/nova-core/firmware/fsp.rs
@@ -15,13 +15,35 @@
     gpu::Chipset, //
 };
 
+/// Size of the FSP SHA-384 hash, in bytes.
+const FSP_HASH_SIZE: usize = 48;
+/// Maximum size of the FSP public key (RSA-3072), in bytes.
+///
+/// The FMC ELF `publickey` section may be shorter, so the remaining bytes are zero-padded.
+const FSP_PKEY_SIZE: usize = 384;
+/// Maximum size of the FSP signature (RSA-3072), in bytes.
+///
+/// The FMC ELF `signature` section may be shorter, so the remaining bytes are zero-padded.
+const FSP_SIG_SIZE: usize = 384;
+
+/// Structure to hold FMC signatures.
+///
+/// C representation is used because this type is used for communication with the FSP.
+#[derive(Debug, Clone, Copy)]
+#[repr(C)]
+pub(crate) struct FmcSignatures {
+    pub(crate) hash384: [u8; FSP_HASH_SIZE],
+    pub(crate) public_key: [u8; FSP_PKEY_SIZE],
+    pub(crate) signature: [u8; FSP_SIG_SIZE],
+}
+
 pub(crate) struct FspFirmware {
     /// FMC firmware image data (only the "image" ELF section).
     #[expect(dead_code)]
     pub(crate) fmc_image: Coherent<[u8]>,
-    /// Full FMC ELF for signature extraction.
+    /// FMC firmware signatures.
     #[expect(dead_code)]
-    pub(crate) fmc_elf: Firmware,
+    pub(crate) fmc_sigs: KBox<FmcSignatures>,
 }
 
 impl FspFirmware {
@@ -41,7 +63,73 @@ pub(crate) fn new(
 
         Ok(Self {
             fmc_image,
-            fmc_elf: fw,
+            fmc_sigs: Self::extract_fmc_signatures(&fw, dev)?,
         })
     }
+
+    /// Extract FMC firmware signatures for Chain of Trust verification.
+    ///
+    /// Extracts the SHA-384 hash, RSA public key, and RSA signature from the FMC ELF32 sections.
+    /// The result is heap-allocated because `FmcSignatures` (816 bytes) is large for the stack.
+    fn extract_fmc_signatures(
+        fmc_fw: &Firmware,
+        dev: &device::Device,
+    ) -> Result<KBox<FmcSignatures>> {
+        let get_section = |name: &str, max_len: usize| {
+            elf::elf_section(fmc_fw.data(), name)
+                .ok_or(EINVAL)
+                .inspect_err(|_| dev_err!(dev, "FMC firmware missing '{}' section\n", name))
+                .and_then(|section| {
+                    if section.len() > max_len {
+                        dev_err!(
+                            dev,
+                            "FMC {} section size {} > maximum {}\n",
+                            name,
+                            section.len(),
+                            max_len
+                        );
+                        Err(EINVAL)
+                    } else {
+                        Ok(section)
+                    }
+                })
+        };
+
+        let hash_section = get_section("hash", FSP_HASH_SIZE)?;
+        let pkey_section = get_section("publickey", FSP_PKEY_SIZE)?;
+        let sig_section = get_section("signature", FSP_SIG_SIZE)?;
+
+        // The hash section is a SHA-384 output: it must be exactly FSP_HASH_SIZE bytes.
+        if hash_section.len() != FSP_HASH_SIZE {
+            dev_err!(
+                dev,
+                "FMC hash section size {} != expected {}\n",
+                hash_section.len(),
+                FSP_HASH_SIZE
+            );
+            return Err(EINVAL);
+        }
+
+        // Initialize the signatures in place to avoid building the large `FmcSignatures` on the
+        // stack, then fill each section from the firmware.
+        let signatures = KBox::init(
+            init!(FmcSignatures {
+                hash384: [0; _],
+                public_key: [0; _],
+                signature: [0; _],
+            })
+            .chain(|sigs| {
+                // PANIC: src and dst lengths are both FSP_HASH_SIZE (verified above).
+                sigs.hash384.copy_from_slice(hash_section);
+                // PANIC: dst is sliced to src.len(); src.len() <= FSP_PKEY_SIZE per `get_section`.
+                sigs.public_key[..pkey_section.len()].copy_from_slice(pkey_section);
+                // PANIC: dst is sliced to src.len(); src.len() <= FSP_SIG_SIZE per `get_section`.
+                sigs.signature[..sig_section.len()].copy_from_slice(sig_section);
+                Ok(())
+            }),
+            GFP_KERNEL,
+        )?;
+
+        Ok(signatures)
+    }
 }
-- 
2.54.0



* [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (12 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02 11:42   ` Eliot Courtney
  2026-06-02 15:02   ` Alexandre Courbot
  2026-06-02  3:21 ` [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure John Hubbard
                   ` (9 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Add external memory (EMEM) read/write operations to the GPU's FSP falcon
engine. These operations use Falcon PIO (Programmed I/O) to communicate
with the FSP through indirect memory access.
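
The indirect access pattern (program a start offset once, then let each
data access advance it by one 32-bit word) can be modelled in userspace
Rust. This is a rough sketch under stated assumptions: `MockEmem` is a
hypothetical in-memory stand-in for the hardware, and its bounds check
is a property of the mock, not something the patch claims about EMEM.

```rust
/// Hypothetical in-memory model of EMEM: a word array plus the byte offset
/// that the hardware would advance after every data-register access.
struct MockEmem {
    words: Vec<u32>,
    offset: usize,
}

impl MockEmem {
    fn new(size_bytes: usize) -> Self {
        Self { words: vec![0; size_bytes / 4], offset: 0 }
    }

    /// Models programming the control register with a start byte offset.
    fn begin(&mut self, offset: usize, len: usize) -> Result<(), &'static str> {
        if offset % 4 != 0 || len % 4 != 0 {
            return Err("offset and length must be 4-byte aligned");
        }
        if offset + len > self.words.len() * 4 {
            return Err("access runs past the end of the mock EMEM");
        }
        self.offset = offset;
        Ok(())
    }

    /// Stores one word and advances the offset, as auto-increment would.
    fn write_next(&mut self, value: u32) {
        self.words[self.offset / 4] = value;
        self.offset += 4;
    }

    /// Loads one word and advances the offset, as auto-increment would.
    fn read_next(&mut self) -> u32 {
        let value = self.words[self.offset / 4];
        self.offset += 4;
        value
    }
}

/// Writes `data` as little-endian 32-bit words starting at byte `offset`.
fn write_bytes(emem: &mut MockEmem, offset: usize, data: &[u8]) -> Result<(), &'static str> {
    emem.begin(offset, data.len())?;
    for chunk in data.chunks_exact(4) {
        emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Ok(())
}

/// Reads little-endian 32-bit words starting at byte `offset` into `data`.
fn read_bytes(emem: &mut MockEmem, offset: usize, data: &mut [u8]) -> Result<(), &'static str> {
    emem.begin(offset, data.len())?;
    for chunk in data.chunks_exact_mut(4) {
        chunk.copy_from_slice(&emem.read_next().to_le_bytes());
    }
    Ok(())
}
```

Writing the bytes `11 22 33 44` at offset 4 stores the word `0x44332211`
in the second slot of the mock, and reading the same range back returns
the original bytes.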

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs | 130 ++++++++++++++++++++++++++--
 drivers/gpu/nova-core/regs.rs       |  15 ++++
 2 files changed, 140 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index d9f87262e8b1..6b057d958115 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -6,12 +6,28 @@
 //! The FSP falcon handles secure boot and Chain of Trust operations
 //! on Hopper and Blackwell architectures, replacing SEC2's role.
 
-use kernel::io::register::RegisterBase;
+use kernel::{
+    io::{
+        register::{
+            RegisterBase,
+            WithBase, //
+        },
+        Io, //
+    },
+    num::Bounded,
+    prelude::*,
+    ptr::Alignment, //
+};
 
-use crate::falcon::{
-    FalconEngine,
-    PFalcon2Base,
-    PFalconBase, //
+use crate::{
+    driver::Bar0,
+    falcon::{
+        Falcon,
+        FalconEngine,
+        PFalcon2Base,
+        PFalconBase, //
+    },
+    regs,
 };
 
 /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
@@ -26,3 +42,107 @@ impl RegisterBase<PFalcon2Base> for Fsp {
 }
 
 impl FalconEngine for Fsp {}
+
+/// Maximum addressable EMEM size, derived from the 24-bit offset field
+/// in `NV_PFALCON_FALCON_EMEM_CTL`.
+const EMEM_MAX_SIZE: Alignment = Alignment::new::<{ 1 << 24 }>();
+
+/// I/O backend for the FSP falcon's external memory (EMEM).
+///
+/// `EMEM_CTL` is programmed once with a start offset and an auto-increment
+/// mode, then each access to `EMEM_DATA` advances the offset by one 32-bit
+/// word in hardware.
+struct Emem<'a> {
+    bar: &'a Bar0,
+}
+
+impl<'a> Emem<'a> {
+    fn new(bar: &'a Bar0) -> Self {
+        Self { bar }
+    }
+
+    /// Programs `EMEM_CTL` with the start byte `offset` and the `ctl` mode bits.
+    ///
+    /// Returns `EINVAL` if `offset` is outside the addressable EMEM window.
+    fn program(&mut self, offset: usize, ctl: regs::NV_PFALCON_FALCON_EMEM_CTL) -> Result {
+        let offset = Bounded::<usize, { EMEM_MAX_SIZE.log2() }>::try_new(offset)
+            .map(Bounded::cast::<u32>)
+            .ok_or(EINVAL)?;
+
+        self.bar
+            .write(WithBase::of::<Fsp>(), ctl.with_offset(offset));
+
+        Ok(())
+    }
+
+    /// Begins a write burst at byte `offset`, auto-incrementing on each write.
+    fn begin_write(&mut self, offset: usize) -> Result {
+        self.program(
+            offset,
+            regs::NV_PFALCON_FALCON_EMEM_CTL::zeroed().with_auto_increment_write(true),
+        )
+    }
+
+    /// Begins a read burst at byte `offset`, auto-incrementing on each read.
+    fn begin_read(&mut self, offset: usize) -> Result {
+        self.program(
+            offset,
+            regs::NV_PFALCON_FALCON_EMEM_CTL::zeroed().with_auto_increment_read(true),
+        )
+    }
+
+    /// Writes the next 32-bit `value`; hardware advances the offset.
+    fn write_next(&mut self, value: u32) {
+        self.bar.write(
+            WithBase::of::<Fsp>(),
+            regs::NV_PFALCON_FALCON_EMEM_DATA::zeroed().with_data(value),
+        );
+    }
+
+    /// Reads the next 32-bit word; hardware advances the offset.
+    fn read_next(&mut self) -> u32 {
+        self.bar
+            .read(regs::NV_PFALCON_FALCON_EMEM_DATA::of::<Fsp>())
+            .data()
+    }
+}
+
+impl Falcon<Fsp> {
+    /// Writes `data` to FSP external memory at byte `offset`.
+    ///
+    /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
+    /// if `offset` or the `data` length is not 4-byte aligned.
+    #[expect(dead_code)]
+    fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
+        if offset % 4 != 0 || data.len() % 4 != 0 {
+            return Err(EINVAL);
+        }
+
+        let mut emem = Emem::new(bar);
+        emem.begin_write(offset as usize)?;
+        for chunk in data.chunks_exact(4) {
+            emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
+        }
+
+        Ok(())
+    }
+
+    /// Reads FSP external memory at byte `offset` into `data`.
+    ///
+    /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
+    /// `offset` or the `data` length is not 4-byte aligned.
+    #[expect(dead_code)]
+    fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
+        if offset % 4 != 0 || data.len() % 4 != 0 {
+            return Err(EINVAL);
+        }
+
+        let mut emem = Emem::new(bar);
+        emem.begin_read(offset as usize)?;
+        for chunk in data.chunks_exact_mut(4) {
+            chunk.copy_from_slice(&emem.read_next().to_le_bytes());
+        }
+
+        Ok(())
+    }
+}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 2cb1f02f35a4..da7a10c0346a 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -475,6 +475,21 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
     pub(crate) NV_PFALCON_FBIF_CTL(u32) @ PFalconBase + 0x00000624 {
         7:7     allow_phys_no_ctx => bool;
     }
+
+    // Falcon EMEM PIO registers (used by FSP on Hopper/Blackwell).
+    // These provide the falcon external memory communication interface.
+    pub(crate) NV_PFALCON_FALCON_EMEM_CTL(u32) @ PFalconBase + 0x00000ac0 {
+        /// EMEM byte offset (must be 4-byte aligned).
+        23:0    offset;
+        /// Auto-increment the offset after each write.
+        24:24   auto_increment_write => bool;
+        /// Auto-increment the offset after each read.
+        25:25   auto_increment_read => bool;
+    }
+
+    pub(crate) NV_PFALCON_FALCON_EMEM_DATA(u32) @ PFalconBase + 0x00000ac4 {
+        31:0    data => u32;
+    }
 }
 
 impl NV_PFALCON_FALCON_DMACTL {
-- 
2.54.0



* [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (13 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:33   ` sashiko-bot
                     ` (2 more replies)
  2026-06-02  3:21 ` [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication John Hubbard
                   ` (8 subsequent siblings)
  23 siblings, 3 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

FSP communication uses a pair of non-circular queues in the FSP
falcon's EMEM, one for messages from the driver to FSP and one for
replies, with the driver polling for response data. Add the queue
registers and the low-level helpers used by the higher-level FSP
message layer.
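
The pointer arithmetic for these non-circular queues can be sketched as
two pure functions. The convention that TAIL holds the byte offset of
the last 32-bit word written is taken from this patch; the function
names `pending_bytes` and `tail_for_packet` are hypothetical.

```rust
/// Number of bytes waiting in a queue, given its head and tail byte offsets.
/// Tail points at the last 32-bit word written, hence the `+ 4`.
fn pending_bytes(head: u32, tail: u32) -> u32 {
    if head == tail {
        return 0;
    }
    tail.saturating_sub(head) + 4
}

/// Tail value to program after writing a `len`-byte packet at offset 0.
/// Returns `None` for an empty or non-word-aligned packet.
fn tail_for_packet(len: usize) -> Option<u32> {
    if len == 0 || len % 4 != 0 {
        return None;
    }
    u32::try_from(len - 4).ok()
}
```

For a 20-byte packet the tail is 16, and a reader that sees head 0 and
tail 16 computes 20 pending bytes. One consequence of this convention
is that a single-word packet at offset 0 leaves head equal to tail and
so reads as empty; that should not matter as long as every message
carries at least the MCTP and NVDM header words.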

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs | 61 ++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs       | 21 ++++++++++
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index 6b057d958115..0ec1c55213bc 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -112,7 +112,6 @@ impl Falcon<Fsp> {
     ///
     /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
     /// if `offset` or the `data` length is not 4-byte aligned.
-    #[expect(dead_code)]
     fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
         if offset % 4 != 0 || data.len() % 4 != 0 {
             return Err(EINVAL);
@@ -131,7 +130,6 @@ fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
     ///
     /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
     /// `offset` or the `data` length is not 4-byte aligned.
-    #[expect(dead_code)]
     fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
         if offset % 4 != 0 || data.len() % 4 != 0 {
             return Err(EINVAL);
@@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
 
         Ok(())
     }
+
+    /// Poll FSP for incoming data.
+    ///
+    /// Returns the size of available data in bytes, or 0 if no data is available.
+    ///
+    /// The FSP message queue is not circular. Pointers are reset to 0 after each
+    /// message exchange, so `tail >= head` is always true when data is present.
+    #[expect(dead_code)]
+    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
+        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
+        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
+
+        if head == tail {
+            return 0;
+        }
+
+        // TAIL points at last DWORD written, so add 4 to get total size
+        tail.saturating_sub(head) + 4
+    }
+
+    /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
+    ///
+    /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
+    #[expect(dead_code)]
+    pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
+        if packet.is_empty() {
+            return Err(EINVAL);
+        }
+
+        // Write message to EMEM at offset 0 (validates 4-byte alignment)
+        self.write_emem(bar, 0, packet)?;
+
+        // Update queue pointers. TAIL points at the last DWORD written.
+        let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
+        bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
+        bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
+
+        Ok(())
+    }
+
+    /// Reads `size` bytes from FSP EMEM into `buffer` and resets the queue pointers.
+    ///
+    /// `size` comes from `poll_msgq`. Returns `EINVAL` if `size` is 0, exceeds
+    /// `buffer`, or is not 4-byte aligned.
+    #[expect(dead_code)]
+    pub(crate) fn recv_msg(&mut self, bar: &Bar0, buffer: &mut [u8], size: usize) -> Result {
+        if size == 0 || size > buffer.len() {
+            return Err(EINVAL);
+        }
+
+        // Read response from EMEM at offset 0 (validates 4-byte alignment)
+        self.read_emem(bar, 0, &mut buffer[..size])?;
+
+        // Reset message queue pointers after reading
+        bar.write_reg(regs::NV_PFSP_MSGQ_TAIL::zeroed().with_address(0));
+        bar.write_reg(regs::NV_PFSP_MSGQ_HEAD::zeroed().with_address(0));
+
+        Ok(())
+    }
 }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index da7a10c0346a..8c51609d0281 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -576,6 +576,27 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
     }
 }
 
+// FSP (Foundation Security Processor) queue registers for Hopper/Blackwell Chain of Trust.
+// These registers manage falcon EMEM communication queues.
+
+register! {
+    pub(crate) NV_PFSP_QUEUE_HEAD(u32) @ 0x008f2c00 {
+        31:0    address => u32;
+    }
+
+    pub(crate) NV_PFSP_QUEUE_TAIL(u32) @ 0x008f2c04 {
+        31:0    address => u32;
+    }
+
+    pub(crate) NV_PFSP_MSGQ_HEAD(u32) @ 0x008f2c80 {
+        31:0    address => u32;
+    }
+
+    pub(crate) NV_PFSP_MSGQ_TAIL(u32) @ 0x008f2c84 {
+        31:0    address => u32;
+    }
+}
+
 // The modules below provide registers that are not identical on all supported chips. They should
 // only be used in HAL modules.
 
-- 
2.54.0



* [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (14 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  5:36   ` sashiko-bot
  2026-06-02 12:53   ` Eliot Courtney
  2026-06-02  3:21 ` [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging John Hubbard
                   ` (7 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Add the MCTP (Management Component Transport Protocol) and NVDM (NVIDIA
Device Management) wire-format types used for communication between the
kernel driver and GPU firmware processors.

This includes typed MCTP transport headers, NVDM message headers, and
NVDM message type identifiers. Both the FSP boot path and the upcoming
GSP RPC message queue share this protocol layer.
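
For readers who want to see the wire format as raw integers, the two
headers can be packed by hand. The bit positions and the 0x7e and 0x14
values come from this patch; the helper names below are hypothetical,
and 0x10de is the NVIDIA PCI vendor ID that `Vendor::NVIDIA` is
expected to resolve to.

```rust
/// MCTP message type for PCI vendor-defined messages (bits 6:0).
const MSG_TYPE_VENDOR_PCI: u32 = 0x7e;
/// NVIDIA PCI vendor ID (bits 23:8 of the NVDM header).
const PCI_VENDOR_ID_NVIDIA: u32 = 0x10de;
/// NVDM type for Chain of Trust messages (bits 31:24).
const NVDM_TYPE_COT: u32 = 0x14;

/// MCTP header for a single-packet message: SOM (bit 31) and EOM (bit 30)
/// set, sequence number and source endpoint ID left at zero.
fn mctp_single_packet() -> u32 {
    (1u32 << 31) | (1u32 << 30)
}

/// True if both SOM and EOM are set.
fn mctp_is_single_packet(header: u32) -> bool {
    (header >> 30) == 0b11
}

/// Packs an NVDM header for the given NVDM type.
fn nvdm_header(nvdm_type: u32) -> u32 {
    ((nvdm_type & 0xff) << 24) | (PCI_VENDOR_ID_NVIDIA << 8) | MSG_TYPE_VENDOR_PCI
}

/// Field accessors for a raw NVDM header.
fn nvdm_type(header: u32) -> u32 {
    header >> 24
}

fn nvdm_vendor_id(header: u32) -> u32 {
    (header >> 8) & 0xffff
}

fn nvdm_msg_type(header: u32) -> u32 {
    header & 0x7f
}
```

With these definitions a single-packet MCTP header is `0xc0000000` and
the NVDM header for a Chain of Trust message is `0x1410de7e`.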

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/mctp.rs      | 102 +++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 2 files changed, 103 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mctp.rs

diff --git a/drivers/gpu/nova-core/mctp.rs b/drivers/gpu/nova-core/mctp.rs
new file mode 100644
index 000000000000..a13146dc0cca
--- /dev/null
+++ b/drivers/gpu/nova-core/mctp.rs
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+//! MCTP/NVDM protocol types for NVIDIA GPU firmware communication.
+//!
+//! MCTP (Management Component Transport Protocol) carries NVDM (NVIDIA
+//! Device Management) messages between the kernel driver and GPU firmware
+//! processors such as FSP and GSP.
+
+#![expect(dead_code)]
+
+use kernel::pci::Vendor;
+
+/// NVDM message type identifiers carried over MCTP.
+#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
+#[repr(u8)]
+pub(crate) enum NvdmType {
+    /// Chain of Trust boot message.
+    #[default]
+    Cot = 0x14,
+    /// FSP command response.
+    FspResponse = 0x15,
+}
+
+impl TryFrom<u8> for NvdmType {
+    type Error = u8;
+
+    fn try_from(value: u8) -> Result<Self, Self::Error> {
+        match value {
+            x if x == u8::from(Self::Cot) => Ok(Self::Cot),
+            x if x == u8::from(Self::FspResponse) => Ok(Self::FspResponse),
+            _ => Err(value),
+        }
+    }
+}
+
+impl From<NvdmType> for u8 {
+    fn from(value: NvdmType) -> Self {
+        value as u8
+    }
+}
+
+bitfield! {
+    pub(crate) struct MctpHeader(u32), "MCTP transport header for NVIDIA firmware messages." {
+        31:31 som as bool, "Start-of-message bit.";
+        30:30 eom as bool, "End-of-message bit.";
+        29:28 seq as u8, "Packet sequence number.";
+        23:16 seid as u8, "Source endpoint ID.";
+    }
+}
+
+impl MctpHeader {
+    /// Builds a single-packet MCTP header (`SOM=1`, `EOM=1`, `SEQ=0`, `SEID=0`).
+    pub(crate) fn single_packet() -> Self {
+        Self::default().set_som(true).set_eom(true)
+    }
+
+    /// Returns whether this is a complete single-packet message (`SOM=1` and `EOM=1`).
+    pub(crate) fn is_single_packet(self) -> bool {
+        self.som() && self.eom()
+    }
+}
+
+impl From<u32> for MctpHeader {
+    fn from(raw: u32) -> Self {
+        Self(raw)
+    }
+}
+
+/// MCTP message type for PCI vendor-defined messages.
+const MSG_TYPE_VENDOR_PCI: u8 = 0x7e;
+
+bitfield! {
+    pub(crate) struct NvdmHeader(u32), "NVIDIA Vendor-Defined Message header over MCTP." {
+        31:24 nvdm_type as u8 ?=> NvdmType, "NVDM message type.";
+        23:8 vendor_id as u16, "PCI vendor ID.";
+        6:0 msg_type as u8, "MCTP vendor-defined message type.";
+    }
+}
+
+impl NvdmHeader {
+    /// Builds an NVDM header for the given message type.
+    pub(crate) fn new(nvdm_type: NvdmType) -> Self {
+        Self::default()
+            .set_msg_type(MSG_TYPE_VENDOR_PCI)
+            .set_vendor_id(Vendor::NVIDIA.as_raw())
+            .set_nvdm_type(nvdm_type)
+    }
+
+    /// Validates this header against the expected NVIDIA NVDM format and type.
+    pub(crate) fn validate(self, expected_type: NvdmType) -> bool {
+        self.msg_type() == MSG_TYPE_VENDOR_PCI
+            && self.vendor_id() == Vendor::NVIDIA.as_raw()
+            && matches!(self.nvdm_type(), Ok(nvdm_type) if nvdm_type == expected_type)
+    }
+}
+
+impl From<u32> for NvdmHeader {
+    fn from(raw: u32) -> Self {
+        Self(raw)
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 7b6c331da10e..9f0199f7b38c 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -20,6 +20,7 @@
 mod fsp;
 mod gpu;
 mod gsp;
+mod mctp;
 #[macro_use]
 mod num;
 mod regs;
-- 
2.54.0



* [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (15 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:35   ` sashiko-bot
  2026-06-02  3:21 ` [PATCH v12 18/22] gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version John Hubbard
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

FSP exchanges are request/response: the driver sends an MCTP/NVDM
message and must match the reply against the request before acting on
it. Add the synchronous send-and-wait path that validates the response
transport and message headers and confirms the reply corresponds to the
request that was sent.
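
The kind of reply matching described above can be sketched in
standalone Rust. This is not the code from this patch: the word layout
(MCTP header, NVDM header, then `task_id`, `command_nvdm_type`,
`error_code`) follows the structures in this series, while the
little-endian encoding, the order of the checks, and the
`ResponseError` type and function names are assumptions made for
illustration.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ResponseError {
    TooShort,
    NotSinglePacket,
    BadNvdmHeader,
    WrongCommand(u32),
    Fsp(u32),
}

/// NVDM type of an FSP command response (0x15 in this series).
const NVDM_TYPE_FSP_RESPONSE: u32 = 0x15;

/// Helper for building example packets from 32-bit words.
fn encode(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Reads the `index`-th little-endian 32-bit word of `buf`.
fn word(buf: &[u8], index: usize) -> Result<u32, ResponseError> {
    let bytes = buf
        .get(index * 4..index * 4 + 4)
        .ok_or(ResponseError::TooShort)?;
    Ok(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Checks that `buf` is a successful FSP response to a request of
/// NVDM type `sent_nvdm_type`.
fn check_response(buf: &[u8], sent_nvdm_type: u32) -> Result<(), ResponseError> {
    let mctp = word(buf, 0)?;
    let nvdm = word(buf, 1)?;
    // Word 2 is `task_id`, which this sketch does not inspect.
    let command_nvdm_type = word(buf, 3)?;
    let error_code = word(buf, 4)?;

    // Transport header: SOM (bit 31) and EOM (bit 30) must both be set.
    if (mctp >> 30) != 0b11 {
        return Err(ResponseError::NotSinglePacket);
    }
    // Message header: vendor-defined PCI message from NVIDIA, response type.
    let header_ok = (nvdm & 0x7f) == 0x7e
        && ((nvdm >> 8) & 0xffff) == 0x10de
        && (nvdm >> 24) == NVDM_TYPE_FSP_RESPONSE;
    if !header_ok {
        return Err(ResponseError::BadNvdmHeader);
    }
    // The reply must name the request type that was actually sent.
    if command_nvdm_type != sent_nvdm_type {
        return Err(ResponseError::WrongCommand(command_nvdm_type));
    }
    if error_code != 0 {
        return Err(ResponseError::Fsp(error_code));
    }
    Ok(())
}
```

The checks run from the outermost layer inward, so a malformed
transport header is reported before the payload is interpreted, and the
FSP error code is only trusted once the reply is known to belong to the
request.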

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs |   3 -
 drivers/gpu/nova-core/fsp.rs        | 121 +++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index 0ec1c55213bc..3df1cb2385cb 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -150,7 +150,6 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
     ///
     /// The FSP message queue is not circular. Pointers are reset to 0 after each
     /// message exchange, so `tail >= head` is always true when data is present.
-    #[expect(dead_code)]
     pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
         let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
         let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
@@ -166,7 +165,6 @@ pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
     /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
     ///
     /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
-    #[expect(dead_code)]
     pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
         if packet.is_empty() {
             return Err(EINVAL);
@@ -187,7 +185,6 @@ pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
     ///
     /// `size` comes from `poll_msgq`. Returns `EINVAL` if `size` is 0, exceeds
     /// `buffer`, or is not 4-byte aligned.
-    #[expect(dead_code)]
     pub(crate) fn recv_msg(&mut self, bar: &Bar0, buffer: &mut [u8], size: usize) -> Result {
         if size == 0 || size > buffer.len() {
             return Err(EINVAL);
diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index f3524137d9f7..67cf83aba83c 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -11,7 +11,11 @@
     device,
     io::poll::read_poll_timeout,
     prelude::*,
-    time::Delta, //
+    time::Delta,
+    transmute::{
+        AsBytes,
+        FromBytes, //
+    },
 };
 
 use crate::{
@@ -22,18 +26,56 @@
     },
     firmware::fsp::FspFirmware,
     gpu::Chipset,
+    mctp::{
+        MctpHeader,
+        NvdmHeader,
+        NvdmType, //
+    },
+    num,
     regs, //
 };
 
 mod hal;
 
+/// FSP message timeout in milliseconds.
+const FSP_MSG_TIMEOUT_MS: i64 = 2000;
+
+/// FSP command response payload (`NVDM_PAYLOAD_COMMAND_RESPONSE`).
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct NvdmPayloadCommandResponse {
+    task_id: u32,
+    command_nvdm_type: u32,
+    error_code: u32,
+}
+
+/// Complete FSP response structure with MCTP and NVDM headers.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct FspResponse {
+    mctp_header: MctpHeader,
+    nvdm_header: NvdmHeader,
+    response: NvdmPayloadCommandResponse,
+}
+
+// SAFETY: FspResponse is a packed C struct with only integral fields.
+unsafe impl FromBytes for FspResponse {}
+
+/// Trait implemented by types representing a message to send to FSP.
+///
+/// This provides [`Fsp::send_sync_fsp`] with the information it needs to send
+/// a given message, following the same pattern as GSP's `CommandToGsp`.
+pub(crate) trait MessageToFsp: AsBytes {
+    /// NVDM type identifying this message to FSP.
+    const NVDM_TYPE: u32;
+}
+
 /// FSP interface for Hopper/Blackwell GPUs.
 ///
 /// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
 /// has completed. It owns the FSP falcon and the FMC firmware, which are used for the subsequent
 /// Chain of Trust boot.
 pub(crate) struct Fsp {
-    #[expect(dead_code)]
     falcon: Falcon<FspEngine>,
     #[expect(dead_code)]
     fsp_fw: FspFirmware,
@@ -70,4 +112,79 @@ pub(crate) fn wait_secure_boot(
 
         Ok(Fsp { falcon, fsp_fw })
     }
+
+    /// Sends a message to FSP and waits for the response.
+    #[expect(dead_code)]
+    fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: &Bar0, msg: &M) -> Result
+    where
+        M: MessageToFsp,
+    {
+        self.falcon.send_msg(bar, msg.as_bytes())?;
+
+        let packet_size = read_poll_timeout(
+            || Ok(self.falcon.poll_msgq(bar)),
+            |&size| size > 0,
+            Delta::from_millis(10),
+            Delta::from_millis(FSP_MSG_TIMEOUT_MS),
+        )
+        .map_err(|_| {
+            dev_err!(dev, "FSP response timeout\n");
+            ETIMEDOUT
+        })?;
+
+        let packet_size = num::u32_as_usize(packet_size);
+        let mut response_buf = KVec::<u8>::new();
+        response_buf.resize(packet_size, 0, GFP_KERNEL)?;
+        self.falcon.recv_msg(bar, &mut response_buf, packet_size)?;
+
+        let (response, _) = FspResponse::from_bytes_prefix(&response_buf[..]).ok_or_else(|| {
+            dev_err!(dev, "FSP response too small: {}\n", response_buf.len());
+            EIO
+        })?;
+
+        let mctp_header = response.mctp_header;
+        let nvdm_header = response.nvdm_header;
+        let command_nvdm_type = response.response.command_nvdm_type;
+        let error_code = response.response.error_code;
+
+        if !mctp_header.is_single_packet() {
+            dev_err!(
+                dev,
+                "Unexpected MCTP header in FSP reply: {:x?}\n",
+                mctp_header,
+            );
+            return Err(EIO);
+        }
+
+        if !nvdm_header.validate(NvdmType::FspResponse) {
+            dev_err!(
+                dev,
+                "Unexpected NVDM header in FSP reply: {:x?}\n",
+                nvdm_header,
+            );
+            return Err(EIO);
+        }
+
+        if command_nvdm_type != M::NVDM_TYPE {
+            dev_err!(
+                dev,
+                "Expected NVDM type {:#x} in reply, got {:#x}\n",
+                M::NVDM_TYPE,
+                command_nvdm_type
+            );
+            return Err(EIO);
+        }
+
+        if error_code != 0 {
+            dev_err!(
+                dev,
+                "NVDM command {:#x} failed with error {:#x}\n",
+                M::NVDM_TYPE,
+                error_code
+            );
+            return Err(EIO);
+        }
+
+        Ok(())
+    }
 }
-- 
2.54.0



* [PATCH v12 18/22] gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (16 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02 12:55   ` Eliot Courtney
  2026-06-02  3:21 ` [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot John Hubbard
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

The FSP Chain of Trust handshake is versioned: Hopper speaks version 1
and Blackwell speaks version 2. Provide the version through the FSP HAL
so the boot message carries the value FSP expects, and so chipsets that
do not use FSP need not express a version at all.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
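Note: a standalone sketch of the per-architecture selection, for readers who
want to try the dispatch outside the driver. The names mirror the patch, but
this is a simplified stand-in: the `Architecture` enum is redeclared locally
and the trait carries only `cot_version()`, so nothing here depends on the
kernel crate.

```rust
/// Local stand-in for the driver's architecture enum (assumed for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Architecture {
    Turing,
    Ampere,
    Ada,
    Hopper,
    BlackwellGB10x,
    BlackwellGB20x,
}

/// Reduced HAL trait: only the Chain of Trust version is modelled.
pub trait FspHal {
    fn cot_version(&self) -> u16;
}

struct Gh100;
struct Gb100;
struct Gb202;

impl FspHal for Gh100 {
    fn cot_version(&self) -> u16 {
        1
    }
}

impl FspHal for Gb100 {
    fn cot_version(&self) -> u16 {
        2
    }
}

impl FspHal for Gb202 {
    fn cot_version(&self) -> u16 {
        2
    }
}

const GH100_HAL: &dyn FspHal = &Gh100;
const GB100_HAL: &dyn FspHal = &Gb100;
const GB202_HAL: &dyn FspHal = &Gb202;

/// Returns the HAL for `arch`, or `None` when the architecture has no FSP.
pub fn fsp_hal(arch: Architecture) -> Option<&'static dyn FspHal> {
    match arch {
        Architecture::Turing | Architecture::Ampere | Architecture::Ada => None,
        Architecture::Hopper => Some(GH100_HAL),
        Architecture::BlackwellGB10x => Some(GB100_HAL),
        Architecture::BlackwellGB20x => Some(GB202_HAL),
    }
}
```

Because the lookup returns `Option`, a caller on a pre-Hopper chipset gets
`None` and never has to invent a version number.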
 drivers/gpu/nova-core/fsp/hal.rs       |  8 +++++++-
 drivers/gpu/nova-core/fsp/hal/gb100.rs | 23 +++++++++++++++++++++++
 drivers/gpu/nova-core/fsp/hal/gb202.rs |  4 ++++
 drivers/gpu/nova-core/fsp/hal/gh100.rs | 15 ++++++++++++---
 4 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/nova-core/fsp/hal/gb100.rs

diff --git a/drivers/gpu/nova-core/fsp/hal.rs b/drivers/gpu/nova-core/fsp/hal.rs
index 83d1e7daa998..8f9c031102e0 100644
--- a/drivers/gpu/nova-core/fsp/hal.rs
+++ b/drivers/gpu/nova-core/fsp/hal.rs
@@ -9,19 +9,25 @@
     },
 };
 
+mod gb100;
 mod gb202;
 mod gh100;
 
 pub(super) trait FspHal {
     /// Returns the secure boot status from the architecture-specific `NV_THERM_I2CS_SCRATCH` register.
     fn fsp_boot_status(&self, bar: &Bar0) -> u32;
+
+    /// Returns the FSP Chain of Trust protocol version this chipset advertises.
+    #[expect(dead_code)]
+    fn cot_version(&self) -> u16;
 }
 
 /// Returns the FSP HAL, or `None` if the architecture doesn't support FSP.
 pub(crate) fn fsp_hal(chipset: Chipset) -> Option<&'static dyn FspHal> {
     match chipset.arch() {
         Architecture::Turing | Architecture::Ampere | Architecture::Ada => None,
-        Architecture::Hopper | Architecture::BlackwellGB10x => Some(gh100::GH100_HAL),
+        Architecture::Hopper => Some(gh100::GH100_HAL),
+        Architecture::BlackwellGB10x => Some(gb100::GB100_HAL),
         Architecture::BlackwellGB20x => Some(gb202::GB202_HAL),
     }
 }
diff --git a/drivers/gpu/nova-core/fsp/hal/gb100.rs b/drivers/gpu/nova-core/fsp/hal/gb100.rs
new file mode 100644
index 000000000000..d50aaba0a84f
--- /dev/null
+++ b/drivers/gpu/nova-core/fsp/hal/gb100.rs
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use crate::{
+    driver::Bar0,
+    fsp::hal::FspHal, //
+};
+
+struct Gb100;
+
+impl FspHal for Gb100 {
+    fn fsp_boot_status(&self, bar: &Bar0) -> u32 {
+        // GB10x shares Hopper's FSP secure boot status register.
+        super::gh100::fsp_boot_status_gh100(bar)
+    }
+
+    fn cot_version(&self) -> u16 {
+        2
+    }
+}
+
+const GB100: Gb100 = Gb100;
+pub(super) const GB100_HAL: &dyn FspHal = &GB100;
diff --git a/drivers/gpu/nova-core/fsp/hal/gb202.rs b/drivers/gpu/nova-core/fsp/hal/gb202.rs
index 2f08b6c9f308..2bca76c8fd64 100644
--- a/drivers/gpu/nova-core/fsp/hal/gb202.rs
+++ b/drivers/gpu/nova-core/fsp/hal/gb202.rs
@@ -17,6 +17,10 @@ fn fsp_boot_status(&self, bar: &Bar0) -> u32 {
             .fsp_boot_complete()
             .into()
     }
+
+    fn cot_version(&self) -> u16 {
+        2
+    }
 }
 
 const GB202: Gb202 = Gb202;
diff --git a/drivers/gpu/nova-core/fsp/hal/gh100.rs b/drivers/gpu/nova-core/fsp/hal/gh100.rs
index 290fb55a81da..c38a7e96eb60 100644
--- a/drivers/gpu/nova-core/fsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/fsp/hal/gh100.rs
@@ -11,11 +11,20 @@
 
 struct Gh100;
 
+/// Reads the FSP secure boot status from the Hopper/GB10x thermal scratch register.
+pub(super) fn fsp_boot_status_gh100(bar: &Bar0) -> u32 {
+    bar.read(regs::gh100::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE)
+        .fsp_boot_complete()
+        .into()
+}
+
 impl FspHal for Gh100 {
     fn fsp_boot_status(&self, bar: &Bar0) -> u32 {
-        bar.read(regs::gh100::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE)
-            .fsp_boot_complete()
-            .into()
+        fsp_boot_status_gh100(bar)
+    }
+
+    fn cot_version(&self) -> u16 {
+        1
     }
 }
 
-- 
2.54.0



* [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (17 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 18/22] gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:40   ` sashiko-bot
  2026-06-03  5:19   ` Alexandre Courbot
  2026-06-02  3:21 ` [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling John Hubbard
                   ` (4 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

Build and send the Chain of Trust message to FSP, bundling the
DMA-coherent boot parameters that FSP reads at boot time.

Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
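Note: the FRTS offset arithmetic in boot_fmc() can be checked in isolation
with a small userspace sketch. The helper names `align_up_2m` and
`frts_vidmem_offset` are hypothetical, and unlike the patch this version also
guards the addition with `checked_add`; it is meant as a worked example of
the computation rather than a copy of the driver code.

```rust
/// 2 MiB, the alignment applied to the reserved region below the end of FB.
const SZ_2M: u64 = 2 * 1024 * 1024;

/// Rounds `value` up to the next multiple of 2 MiB, or `None` on overflow.
pub fn align_up_2m(value: u64) -> Option<u64> {
    value.checked_add(SZ_2M - 1).map(|v| v & !(SZ_2M - 1))
}

/// Offset of FRTS from the end of the framebuffer
/// (`FRTS_location = FB_END - offset`).
///
/// On resume no FRTS region is described, so the offset is 0.
pub fn frts_vidmem_offset(heap_len: u64, pmu_reserved_size: u32, resume: bool) -> Option<u64> {
    if resume {
        return Some(0);
    }

    let reserved = heap_len.checked_add(u64::from(pmu_reserved_size))?;
    align_up_2m(reserved)
}
```

For example, a 1 MiB heap plus a 4 KiB PMU reservation rounds up to a single
2 MiB step, while a reservation that is already a multiple of 2 MiB is left
unchanged.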
 drivers/gpu/nova-core/firmware/fsp.rs         |   2 -
 drivers/gpu/nova-core/fsp.rs                  | 140 +++++++++++++++++-
 drivers/gpu/nova-core/fsp/hal.rs              |   1 -
 drivers/gpu/nova-core/gsp.rs                  |   1 +
 drivers/gpu/nova-core/gsp/fw.rs               |  64 ++++++++
 .../gpu/nova-core/gsp/fw/r570_144/bindings.rs |  82 ++++++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs        |  23 ++-
 drivers/gpu/nova-core/mctp.rs                 |   2 -
 8 files changed, 302 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
index db61905eac9d..938aa3a3bad5 100644
--- a/drivers/gpu/nova-core/firmware/fsp.rs
+++ b/drivers/gpu/nova-core/firmware/fsp.rs
@@ -39,10 +39,8 @@ pub(crate) struct FmcSignatures {
 
 pub(crate) struct FspFirmware {
     /// FMC firmware image data (only the "image" ELF section).
-    #[expect(dead_code)]
     pub(crate) fmc_image: Coherent<[u8]>,
     /// FMC firmware signatures.
-    #[expect(dead_code)]
     pub(crate) fmc_sigs: KBox<FmcSignatures>,
 }
 
diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 67cf83aba83c..352ef7683cf2 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -9,8 +9,14 @@
 
 use kernel::{
     device,
+    dma::Coherent,
     io::poll::read_poll_timeout,
     prelude::*,
+    ptr::{
+        Alignable,
+        Alignment, //
+    },
+    sizes::SZ_2M,
     time::Delta,
     transmute::{
         AsBytes,
@@ -24,8 +30,13 @@
         fsp::Fsp as FspEngine,
         Falcon, //
     },
-    firmware::fsp::FspFirmware,
+    fb::FbLayout,
+    firmware::fsp::{
+        FmcSignatures,
+        FspFirmware, //
+    },
     gpu::Chipset,
+    gsp::GspFmcBootParams,
     mctp::{
         MctpHeader,
         NvdmHeader,
@@ -49,6 +60,35 @@ struct NvdmPayloadCommandResponse {
     error_code: u32,
 }
 
+/// NVDM (NVIDIA Device Management) CoT (Chain of Trust) payload, the main
+/// message body sent to FSP for Chain of Trust boot.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct NvdmPayloadCot {
+    version: u16,
+    size: u16,
+    gsp_fmc_sysmem_offset: u64,
+    frts_sysmem_offset: u64,
+    frts_sysmem_size: u32,
+    frts_vidmem_offset: u64,
+    frts_vidmem_size: u32,
+    sigs: FmcSignatures,
+    gsp_boot_args_sysmem_offset: u64,
+}
+
+/// Complete FSP message structure with MCTP and NVDM headers.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct FspMessage {
+    mctp_header: MctpHeader,
+    nvdm_header: NvdmHeader,
+    cot: NvdmPayloadCot,
+}
+
+// SAFETY: `FspMessage` is `#[repr(C, packed)]` with no padding, so all of its
+// bytes are initialized.
+unsafe impl AsBytes for FspMessage {}
+
 /// Complete FSP response structure with MCTP and NVDM headers.
 #[repr(C, packed)]
 #[derive(Clone, Copy)]
@@ -70,6 +110,44 @@ pub(crate) trait MessageToFsp: AsBytes {
     const NVDM_TYPE: u32;
 }
 
+impl MessageToFsp for FspMessage {
+    const NVDM_TYPE: u32 = NvdmType::Cot as u32;
+}
+
+/// Bundled arguments for FMC boot via FSP Chain of Trust.
+pub(crate) struct FmcBootArgs {
+    chipset: Chipset,
+    fmc_boot_params: Coherent<GspFmcBootParams>,
+    resume: bool,
+}
+
+impl FmcBootArgs {
+    /// Builds FMC boot arguments, allocating the DMA-coherent boot parameter
+    /// structure that FSP will read.
+    pub(crate) fn new(
+        dev: &device::Device<device::Bound>,
+        chipset: Chipset,
+        wpr_meta_addr: u64,
+        libos_addr: u64,
+        resume: bool,
+    ) -> Result<Self> {
+        let init = GspFmcBootParams::new(wpr_meta_addr, libos_addr);
+
+        Ok(Self {
+            chipset,
+            fmc_boot_params: Coherent::<GspFmcBootParams>::init(dev, GFP_KERNEL, init)?,
+            resume,
+        })
+    }
+
+    /// DMA address of the FMC boot parameters, needed after boot for lockdown
+    /// release polling.
+    #[expect(dead_code)]
+    pub(crate) fn boot_params_dma_handle(&self) -> u64 {
+        self.fmc_boot_params.dma_handle()
+    }
+}
+
 /// FSP interface for Hopper/Blackwell GPUs.
 ///
 /// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
@@ -77,7 +155,6 @@ pub(crate) trait MessageToFsp: AsBytes {
 /// Chain of Trust boot.
 pub(crate) struct Fsp {
     falcon: Falcon<FspEngine>,
-    #[expect(dead_code)]
     fsp_fw: FspFirmware,
 }
 
@@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
         Ok(Fsp { falcon, fsp_fw })
     }
 
+    /// Boots GSP FMC via FSP Chain of Trust.
+    ///
+    /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
+    /// to FSP, and waits for the response.
+    pub(crate) fn boot_fmc(
+        &mut self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        fb_layout: &FbLayout,
+        args: &FmcBootArgs,
+    ) -> Result {
+        dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
+
+        let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
+        let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
+
+        // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
+        let frts_offset = if !args.resume {
+            let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
+
+            frts_reserved_size
+                .align_up(Alignment::new::<SZ_2M>())
+                .ok_or(EINVAL)?
+        } else {
+            0
+        };
+        let frts_size: u32 = if !args.resume {
+            fb_layout.frts.len().try_into()?
+        } else {
+            0
+        };
+
+        let msg = KBox::new(
+            FspMessage {
+                mctp_header: MctpHeader::single_packet(),
+                nvdm_header: NvdmHeader::new(NvdmType::Cot),
+                cot: NvdmPayloadCot {
+                    version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
+                    size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
+                        .map_err(|_| EINVAL)?,
+                    gsp_fmc_sysmem_offset: fmc_addr,
+                    frts_sysmem_offset: 0,
+                    frts_sysmem_size: 0,
+                    frts_vidmem_offset: frts_offset,
+                    frts_vidmem_size: frts_size,
+                    sigs: *self.fsp_fw.fmc_sigs,
+                    gsp_boot_args_sysmem_offset: fmc_boot_params_addr,
+                },
+            },
+            GFP_KERNEL,
+        )?;
+
+        self.send_sync_fsp(dev, bar, &*msg)?;
+
+        dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
+        Ok(())
+    }
+
     /// Sends a message to FSP and waits for the response.
-    #[expect(dead_code)]
     fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: &Bar0, msg: &M) -> Result
     where
         M: MessageToFsp,
diff --git a/drivers/gpu/nova-core/fsp/hal.rs b/drivers/gpu/nova-core/fsp/hal.rs
index 8f9c031102e0..9b6c5a3bfb6b 100644
--- a/drivers/gpu/nova-core/fsp/hal.rs
+++ b/drivers/gpu/nova-core/fsp/hal.rs
@@ -18,7 +18,6 @@ pub(super) trait FspHal {
     fn fsp_boot_status(&self, bar: &Bar0) -> u32;
 
     /// Returns the FSP Chain of Trust protocol version this chipset advertises.
-    #[expect(dead_code)]
     fn cot_version(&self) -> u16;
 }
 
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 1885cfa5cb38..69175ca3315c 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -25,6 +25,7 @@
 mod sequencer;
 
 pub(crate) use fw::{
+    GspFmcBootParams,
     GspFwWprMeta,
     LibosParams, //
 };
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 0c54e8bf4bb3..558b37863f00 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -934,3 +934,67 @@ fn new(cmdq: &Cmdq) -> impl Init<Self> + '_ {
         })
     }
 }
+
+#[repr(u32)]
+pub(crate) enum GspDmaTarget {
+    #[expect(dead_code)]
+    LocalFb = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB,
+    CoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM,
+    NoncoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM,
+}
+
+type GspAcrBootGspRmParams = bindings::GSP_ACR_BOOT_GSP_RM_PARAMS;
+
+impl GspAcrBootGspRmParams {
+    fn new(target: GspDmaTarget, wpr_meta_addr: u64) -> impl Init<Self> {
+        #[allow(non_snake_case)]
+        let params = init!(Self {
+            target: target as u32,
+            gspRmDescSize: num::usize_into_u32::<{ size_of::<GspFwWprMeta>() }>(),
+            gspRmDescOffset: wpr_meta_addr,
+            bIsGspRmBoot: 1,
+            wprCarveoutOffset: 0,
+            wprCarveoutSize: 0,
+            __bindgen_padding_0: Default::default(),
+        });
+
+        params
+    }
+}
+
+type GspRmParams = bindings::GSP_RM_PARAMS;
+
+impl GspRmParams {
+    fn new(target: GspDmaTarget, libos_addr: u64) -> impl Init<Self> {
+        #[allow(non_snake_case)]
+        let params = init!(Self {
+            target: target as u32,
+            bootArgsOffset: libos_addr,
+            __bindgen_padding_0: Default::default(),
+        });
+
+        params
+    }
+}
+
+pub(crate) type GspFmcBootParams = bindings::GSP_FMC_BOOT_PARAMS;
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for GspFmcBootParams {}
+// SAFETY: This struct only contains integer types for which all bit patterns are valid.
+unsafe impl FromBytes for GspFmcBootParams {}
+
+impl GspFmcBootParams {
+    pub(crate) fn new(wpr_meta_addr: u64, libos_addr: u64) -> impl Init<Self> {
+        #[allow(non_snake_case)]
+        let init = init!(Self {
+            // Blackwell FSP obtains WPR info from other sources, so
+            // wprCarveoutOffset and wprCarveoutSize are left zero.
+            bootGspRmParams <- GspAcrBootGspRmParams::new(GspDmaTarget::CoherentSystem, wpr_meta_addr),
+            gspRmParams <- GspRmParams::new(GspDmaTarget::NoncoherentSystem, libos_addr),
+            ..Zeroable::init_zeroed()
+        });
+
+        init
+    }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 1d592bd3f9ed..ea350f9b2cc4 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -883,6 +883,88 @@ fn default() -> Self {
         }
     }
 }
+pub const GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB: GSP_DMA_TARGET = 0;
+pub const GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM: GSP_DMA_TARGET = 1;
+pub const GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM: GSP_DMA_TARGET = 2;
+pub const GSP_DMA_TARGET_GSP_DMA_TARGET_COUNT: GSP_DMA_TARGET = 3;
+pub type GSP_DMA_TARGET = ffi::c_uint;
+#[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct GSP_FMC_INIT_PARAMS {
+    pub regkeys: u32_,
+}
+#[repr(C)]
+#[derive(Debug, Copy, Clone, MaybeZeroable)]
+pub struct GSP_ACR_BOOT_GSP_RM_PARAMS {
+    pub target: GSP_DMA_TARGET,
+    pub gspRmDescSize: u32_,
+    pub gspRmDescOffset: u64_,
+    pub wprCarveoutOffset: u64_,
+    pub wprCarveoutSize: u32_,
+    pub bIsGspRmBoot: u8_,
+    pub __bindgen_padding_0: [u8; 3usize],
+}
+impl Default for GSP_ACR_BOOT_GSP_RM_PARAMS {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
+#[repr(C)]
+#[derive(Debug, Copy, Clone, MaybeZeroable)]
+pub struct GSP_RM_PARAMS {
+    pub target: GSP_DMA_TARGET,
+    pub __bindgen_padding_0: [u8; 4usize],
+    pub bootArgsOffset: u64_,
+}
+impl Default for GSP_RM_PARAMS {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
+#[repr(C)]
+#[derive(Debug, Copy, Clone, MaybeZeroable)]
+pub struct GSP_SPDM_PARAMS {
+    pub target: GSP_DMA_TARGET,
+    pub __bindgen_padding_0: [u8; 4usize],
+    pub payloadBufferOffset: u64_,
+    pub payloadBufferSize: u32_,
+    pub __bindgen_padding_1: [u8; 4usize],
+}
+impl Default for GSP_SPDM_PARAMS {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
+#[repr(C)]
+#[derive(Debug, Copy, Clone, MaybeZeroable)]
+pub struct GSP_FMC_BOOT_PARAMS {
+    pub initParams: GSP_FMC_INIT_PARAMS,
+    pub __bindgen_padding_0: [u8; 4usize],
+    pub bootGspRmParams: GSP_ACR_BOOT_GSP_RM_PARAMS,
+    pub gspRmParams: GSP_RM_PARAMS,
+    pub gspSpdmParams: GSP_SPDM_PARAMS,
+}
+impl Default for GSP_FMC_BOOT_PARAMS {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
 #[repr(C)]
 #[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
 pub struct rpc_unloading_guest_driver_v1F_07 {
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index b25970dd4561..f41f3fea15ff 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -20,7 +20,10 @@
         fsp::FspFirmware,
         FIRMWARE_VERSION, //
     },
-    fsp::Fsp,
+    fsp::{
+        FmcBootArgs,
+        Fsp, //
+    },
     gpu::Chipset,
     gsp::{
         boot::BootUnloadGuard,
@@ -39,17 +42,27 @@ impl GspHal for Gh100 {
     /// the GSP boot internally - no manual GSP reset/boot is needed.
     fn boot<'a>(
         &self,
-        _gsp: &'a Gsp,
+        gsp: &'a Gsp,
         dev: &'a device::Device<device::Bound>,
         bar: &'a Bar0,
         chipset: Chipset,
-        _fb_layout: &FbLayout,
-        _wpr_meta: &Coherent<GspFwWprMeta>,
+        fb_layout: &FbLayout,
+        wpr_meta: &Coherent<GspFwWprMeta>,
         _gsp_falcon: &'a Falcon<GspEngine>,
         _sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
         let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
-        let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
+        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
+
+        let args = FmcBootArgs::new(
+            dev,
+            chipset,
+            wpr_meta.dma_handle(),
+            gsp.libos.dma_handle(),
+            false,
+        )?;
+
+        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
         Err(ENOTSUPP)
     }
diff --git a/drivers/gpu/nova-core/mctp.rs b/drivers/gpu/nova-core/mctp.rs
index a13146dc0cca..be3e757d05a0 100644
--- a/drivers/gpu/nova-core/mctp.rs
+++ b/drivers/gpu/nova-core/mctp.rs
@@ -7,8 +7,6 @@
 //! Device Management) messages between the kernel driver and GPU firmware
 //! processors such as FSP and GSP.
 
-#![expect(dead_code)]
-
 use kernel::pci::Vendor;
 
 /// NVDM message type identifiers carried over MCTP.
-- 
2.54.0



* [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (18 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:38   ` sashiko-bot
  2026-06-03  5:45   ` Alexandre Courbot
  2026-06-02  3:21 ` [PATCH v12 21/22] gpu: nova-core: add non-sec2 unload path John Hubbard
                   ` (3 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

On Hopper and Blackwell, FSP boots GSP with hardware lockdown enabled.
After FSP Chain of Trust completes, the driver must poll for lockdown
release before proceeding with GSP initialization. Add the register
bit and helper functions needed for this polling.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
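Note: the release predicate used by the polling loop can be exercised without
hardware. The sketch below follows the same three-step decision as the patch,
with one simplification that is an assumption of the sketch: the `HWCFG2`
lockdown bit is passed in as a `bool` (`hw_lockdown_bit`) instead of being
read from BAR0.

```rust
/// Pattern found in mailbox 0 while lockdown is active; the low byte varies.
const GSP_LOCKDOWN_PATTERN: u32 = 0xbadf_4100;
const GSP_LOCKDOWN_MASK: u32 = 0xffff_ff00;

/// Snapshot of the two GSP falcon mailboxes.
pub struct GspMbox {
    pub mbox0: u32,
    pub mbox1: u32,
}

impl GspMbox {
    /// Returns `true` if the lockdown pattern is present in `mbox0`.
    fn is_locked_down(&self) -> bool {
        (self.mbox0 & GSP_LOCKDOWN_MASK) == GSP_LOCKDOWN_PATTERN
    }

    /// Combines both mailboxes into a 64-bit value, `mbox1` in the high half.
    fn combined_addr(&self) -> u64 {
        (u64::from(self.mbox1) << 32) | u64::from(self.mbox0)
    }

    /// Returns `true` once polling should stop.
    pub fn lockdown_released(&self, hw_lockdown_bit: bool, fmc_boot_params_addr: u64) -> bool {
        // 1. The firmware pattern is still present: keep waiting.
        if self.is_locked_down() {
            return false;
        }

        // 2. A non-zero mailbox that no longer holds the boot params address
        //    stops the poll; the caller then inspects `mbox0`.
        if self.mbox0 != 0 && self.combined_addr() != fmc_boot_params_addr {
            return true;
        }

        // 3. Otherwise the hardware lockdown bit decides.
        !hw_lockdown_bit
    }
}
```

Step 2 lets the poll end early when the mailbox carries something other than
the pattern or the boot params address, which is why the caller still has to
check `mbox0` after the poll returns.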
 drivers/gpu/nova-core/fsp.rs           |  1 -
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 90 +++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs          |  2 +
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 352ef7683cf2..aec991afa669 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -142,7 +142,6 @@ pub(crate) fn new(
 
     /// DMA address of the FMC boot parameters, needed after boot for lockdown
     /// release polling.
-    #[expect(dead_code)]
     pub(crate) fn boot_params_dma_handle(&self) -> u64 {
         self.fmc_boot_params.dma_handle()
     }
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index f41f3fea15ff..02aec5281389 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -5,7 +5,13 @@
 
 use kernel::{
     device,
-    dma::Coherent, //
+    dma::Coherent,
+    io::{
+        poll::read_poll_timeout,
+        register::WithBase,
+        Io, //
+    },
+    time::Delta,
 };
 
 use crate::{
@@ -31,8 +37,85 @@
         Gsp,
         GspFwWprMeta, //
     },
+    regs,
 };
 
+/// GSP lockdown pattern written by firmware to mbox0 while RISC-V branch privilege
+/// lockdown is active. The low byte varies; the upper 24 bits are fixed.
+const GSP_LOCKDOWN_PATTERN: u32 = 0xbadf_4100;
+const GSP_LOCKDOWN_MASK: u32 = 0xffff_ff00;
+
+/// GSP falcon mailbox state, used to track lockdown release status.
+struct GspMbox {
+    mbox0: u32,
+    mbox1: u32,
+}
+
+impl GspMbox {
+    /// Reads both mailboxes from the GSP falcon.
+    fn read(gsp_falcon: &Falcon<GspEngine>, bar: &Bar0) -> Self {
+        Self {
+            mbox0: gsp_falcon.read_mailbox0(bar),
+            mbox1: gsp_falcon.read_mailbox1(bar),
+        }
+    }
+
+    /// Returns `true` if the lockdown pattern is present in `mbox0`.
+    fn is_locked_down(&self) -> bool {
+        (self.mbox0 & GSP_LOCKDOWN_MASK) == GSP_LOCKDOWN_PATTERN
+    }
+
+    /// Combines mailbox0 and mailbox1 into a 64-bit address.
+    fn combined_addr(&self) -> u64 {
+        (u64::from(self.mbox1) << 32) | u64::from(self.mbox0)
+    }
+
+    /// Returns `true` if GSP lockdown has been released.
+    ///
+    /// Returns `false` while the lockdown pattern is present. A non-zero `mbox0` that does not
+    /// match the boot params address ends polling; otherwise the `HWCFG2` lockdown bit decides.
+    fn lockdown_released(&self, bar: &Bar0, fmc_boot_params_addr: u64) -> bool {
+        if self.is_locked_down() {
+            return false;
+        }
+
+        if self.mbox0 != 0 && self.combined_addr() != fmc_boot_params_addr {
+            return true;
+        }
+
+        let hwcfg2 = bar.read(regs::NV_PFALCON_FALCON_HWCFG2::of::<GspEngine>());
+        !hwcfg2.riscv_br_priv_lockdown()
+    }
+}
+
+/// Waits for GSP lockdown to be released after FSP Chain of Trust.
+fn wait_for_gsp_lockdown_release(
+    dev: &device::Device<device::Bound>,
+    bar: &Bar0,
+    gsp_falcon: &Falcon<GspEngine>,
+    fmc_boot_params_addr: u64,
+) -> Result {
+    dev_dbg!(dev, "Waiting for GSP lockdown release\n");
+
+    let mbox = read_poll_timeout(
+        || Ok(GspMbox::read(gsp_falcon, bar)),
+        |mbox| mbox.lockdown_released(bar, fmc_boot_params_addr),
+        Delta::from_millis(10),
+        Delta::from_secs(30),
+    )
+    .inspect_err(|_| {
+        dev_err!(dev, "GSP lockdown release timeout\n");
+    })?;
+
+    if mbox.mbox0 != 0 {
+        dev_err!(dev, "GSP-FMC boot failed (mbox: {:#x})\n", mbox.mbox0);
+        return Err(EIO);
+    }
+
+    dev_dbg!(dev, "GSP lockdown released\n");
+    Ok(())
+}
+
 struct Gh100;
 
 impl GspHal for Gh100 {
@@ -48,7 +131,7 @@ fn boot<'a>(
         chipset: Chipset,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-        _gsp_falcon: &'a Falcon<GspEngine>,
+        gsp_falcon: &'a Falcon<GspEngine>,
         _sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
         let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
@@ -64,6 +147,9 @@ fn boot<'a>(
 
         fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
+        let fmc_boot_params_addr = args.boot_params_dma_handle();
+        wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, fmc_boot_params_addr)?;
+
         Err(ENOTSUPP)
     }
 }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 8c51609d0281..a4a986f89340 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -363,6 +363,8 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
     pub(crate) NV_PFALCON_FALCON_HWCFG2(u32) @ PFalconBase + 0x000000f4 {
         /// Signal indicating that reset is completed (GA102+).
         31:31   reset_ready => bool;
+        /// RISC-V branch privilege lockdown bit.
+        13:13   riscv_br_priv_lockdown => bool;
         /// Set to 0 after memory scrubbing is completed.
         12:12   mem_scrubbing => bool;
         10:10   riscv => bool;
-- 
2.54.0



* [PATCH v12 21/22] gpu: nova-core: add non-sec2 unload path
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (19 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:21 ` [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path John Hubbard
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

From: Eliot Courtney <ecourtney@nvidia.com>

On GPUs that do not use SEC2, unloading only requires waiting for the GSP
falcon to halt, because GSP itself does the main work of unloading.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
[ jhubbard: use Result instead of Result<()> in the UnloadBundle impl ]
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 38 ++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 02aec5281389..d372ae85c5bc 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -33,7 +33,10 @@
     gpu::Chipset,
     gsp::{
         boot::BootUnloadGuard,
-        hal::GspHal,
+        hal::{
+            GspHal,
+            UnloadBundle, //
+        },
         Gsp,
         GspFwWprMeta, //
     },
@@ -116,6 +119,28 @@ fn wait_for_gsp_lockdown_release(
     Ok(())
 }
 
+struct FspUnloadBundle;
+
+impl UnloadBundle for FspUnloadBundle {
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+        _sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        // GSP falcon does most of the work of resetting, so just wait for it to finish.
+        read_poll_timeout(
+            || Ok(gsp_falcon.is_riscv_active(bar)),
+            |&active| !active,
+            Delta::from_millis(10),
+            Delta::from_secs(5),
+        )
+        .map(|_| ())
+        .inspect_err(|_| dev_err!(dev, "GSP falcon failed to halt\n"))
+    }
+}
+
 struct Gh100;
 
 impl GspHal for Gh100 {
@@ -132,9 +157,18 @@ fn boot<'a>(
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &'a Falcon<GspEngine>,
-        _sec2_falcon: &'a Falcon<Sec2>,
+        sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
         let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
+
+        let unload_bundle = crate::gsp::UnloadBundle(
+            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
+        );
+
+        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
+        let _unload_guard =
+            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
+
         let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
 
         let args = FmcBootArgs::new(
-- 
2.54.0



* [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (20 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 21/22] gpu: nova-core: add non-sec2 unload path John Hubbard
@ 2026-06-02  3:21 ` John Hubbard
  2026-06-02  3:38   ` sashiko-bot
  2026-06-02 12:38 ` [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support Danilo Krummrich
  2026-06-02 13:37 ` Alexandre Courbot
  23 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-02  3:21 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML, John Hubbard

From: Alexandre Courbot <acourbot@nvidia.com>

Now that all the elements are in place, enable the FSP boot path so
Hopper and Blackwell can boot.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index d372ae85c5bc..93d8a72fe930 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -166,7 +166,7 @@ fn boot<'a>(
         );
 
         // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-        let _unload_guard =
+        let unload_guard =
             BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
 
         let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
@@ -184,7 +184,7 @@ fn boot<'a>(
         let fmc_boot_params_addr = args.boot_params_dma_handle();
         wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, fmc_boot_params_addr)?;
 
-        Err(ENOTSUPP)
+        Ok(unload_guard)
     }
 }
 
-- 
2.54.0



* Re: [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers
  2026-06-02  3:20 ` [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers John Hubbard
@ 2026-06-02  3:30   ` sashiko-bot
  2026-06-02  8:00     ` Alexandre Courbot
  2026-06-02  7:12   ` Eliot Courtney
  1 sibling, 1 reply; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:30 UTC (permalink / raw)
  To: John Hubbard, Alexandre Courbot, Danilo Krummrich
  Cc: Alice Ryhl, Shashank Sharma, LKML, Miguel Ojeda, Gary Guo,
	Alex Gaynor, Alistair Popple, John Hubbard, Trevor Gross,
	Timur Tabi, Boqun Feng, Alexandre Courbot, Eliot Courtney,
	Björn Roy Baron, Danilo Krummrich, Bjorn Helgaas, nova-gpu,
	Benno Lossin, Zhi Wang, Simona Vetter, Andreas Hindborg,
	David Airlie

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Hardware match requirement violation during sysmem flush configuration on GB10x.
--

commit 4c62a61ec72a7cc29b5d20a82279ea5490f2e759
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Blackwell: use correct sysmem flush registers

This commit updates the Blackwell GPUs to route the sysmem flush page
registers to their proper locations. It defines the registers relative to
HSHUB0 for GB10x and FBHUB0 for GB20x, and implements the flush paths in
their respective HALs.

> diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
> index 8d63350abf8aa..ecea4ff446fff 100644
> --- a/drivers/gpu/nova-core/fb/hal/gb100.rs
> +++ b/drivers/gpu/nova-core/fb/hal/gb100.rs

[ ... ]

> +/// Write the sysmem flush page address through the GB10x HSHUB0 registers.
> +///
> +/// Both the primary and EG (egress) register pairs must be programmed to the same address,
> +/// as required by hardware.
> +fn write_sysmem_flush_page_gb100(bar: &Bar0, addr: Bounded<u64, 52>) {
> +    // CAST: lower 32 bits. Hardware ignores bits 7:0.
> +    let addr_lo = *addr as u32;
> +    let addr_hi = addr.shr::<32, 20>().cast::<u32>();
> +
> +    // Write HI first. The hardware will trigger the flush on the LO write.
> +
> +    // Primary HSHUB pair.
> +    bar.write(
> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb100>(),
> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::zeroed().with_adr(addr_hi),
> +    );
> +    bar.write(
> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb100>(),
> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::zeroed().with_adr(addr_lo),
> +    );

[Severity: High]
Does writing the primary LO register here trigger the flush before the EG
registers are updated to match?

The comments state that the hardware triggers the flush on the LO write, and
that both the primary and EG pairs must be programmed to the same address.
Because the EG pair is written after this primary LO register write, could
the hardware perform the flush while the EG registers still contain their old,
mismatched address?

> +
> +    // EG (egress) pair -- must match the primary pair.
> +    bar.write(
> +        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb100>(),
> +        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_HI::zeroed().with_adr(addr_hi),
> +    );
> +    bar.write(
> +        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb100>(),
> +        regs::NV_PFB_HSHUB_EG_PCIE_FLUSH_SYSMEM_ADDR_LO::zeroed().with_adr(addr_lo),
> +    );
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=6


* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-02  3:21 ` [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction John Hubbard
@ 2026-06-02  3:32   ` sashiko-bot
  2026-06-02  7:56     ` Alexandre Courbot
  2026-06-02  8:11   ` Eliot Courtney
  2026-06-03  0:04   ` Timur Tabi
  2 siblings, 1 reply; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:32 UTC (permalink / raw)
  To: John Hubbard, Alexandre Courbot, Danilo Krummrich
  Cc: Alex Gaynor, Timur Tabi, Shashank Sharma, Gary Guo, Benno Lossin,
	David Airlie, Trevor Gross, Björn Roy Baron, nova-gpu, LKML,
	Bjorn Helgaas, John Hubbard, Simona Vetter, Boqun Feng,
	Eliot Courtney, Alexandre Courbot, Andreas Hindborg,
	Alistair Popple, Zhi Wang, Miguel Ojeda, Alice Ryhl,
	Danilo Krummrich

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Medium] The patch uses `field: expr` instead of `field <- expr` inside `init!`, violating Rust subsystem guidelines and defeating the goal of avoiding stack allocation.
--

commit 7149f8afdf16eef886b0d72d3c2139493024923e
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
    
This commit extracts the SHA-384 hash, RSA public key, and RSA signature
from FMC ELF32 firmware sections. The signatures are stored in a
heap-allocated structure used for FSP Chain of Trust verification.

> diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
> index 011be1e571c25..db61905eac9de 100644
> --- a/drivers/gpu/nova-core/firmware/fsp.rs
> +++ b/drivers/gpu/nova-core/firmware/fsp.rs

[ ... ]

> @@ -41,7 +63,73 @@ pub(crate) fn new(
>  
>          Ok(Self {
>              fmc_image,
> -            fmc_elf: fw,
> +            fmc_sigs: Self::extract_fmc_signatures(&fw, dev)?,
>          })
>      }
> +
> +    /// Extract FMC firmware signatures for Chain of Trust verification.

[ ... ]

> +        // Initialize the signatures in place to avoid building the large `FmcSignatures` on the
> +        // stack, then fill each section from the firmware.
> +        let signatures = KBox::init(
> +            init!(FmcSignatures {
> +                hash384: [0; _],
> +                public_key: [0; _],
> +                signature: [0; _],
> +            })

[Severity: Medium]
Does this init! block allocate these arrays on the stack before moving them 
to the heap?

According to the Rust subsystem coding guidelines, fields that require in-place
initialization should use the `field <- expr` syntax rather than `field: expr`. 

Because `hash384: [0; _]` and the other fields use the colon syntax, the
arrays (which are up to 384 bytes each) might be evaluated as temporaries on
the stack, which defeats the goal mentioned in the comment above.

Could these be updated to use the in-place initialization syntax, such as 
`hash384 <- kernel::init::zeroed()` or similar?

> +            .chain(|sigs| {
> +                // PANIC: src and dst lengths are both FSP_HASH_SIZE (verified above).
> +                sigs.hash384.copy_from_slice(hash_section);
> +                // PANIC: dst is sliced to src.len(); src.len() <= FSP_PKEY_SIZE per `get_section`.
> +                sigs.public_key[..pkey_section.len()].copy_from_slice(pkey_section);
> +                // PANIC: dst is sliced to src.len(); src.len() <= FSP_SIG_SIZE per `get_section`.
> +                sigs.signature[..sig_section.len()].copy_from_slice(sig_section);
> +                Ok(())
> +            }),
> +            GFP_KERNEL,
> +        )?;
> +
> +        Ok(signatures)
> +    }
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=13


* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-02  3:21 ` [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure John Hubbard
@ 2026-06-02  3:33   ` sashiko-bot
  2026-06-03  1:14     ` Alexandre Courbot
  2026-06-02 12:21   ` Eliot Courtney
  2026-06-03  1:00   ` Alexandre Courbot
  2 siblings, 1 reply; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:33 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, John Hubbard
  Cc: Simona Vetter, Benno Lossin, Bjorn Helgaas, Trevor Gross,
	Eliot Courtney, Björn Roy Baron, Timur Tabi, Shashank Sharma,
	Zhi Wang, Boqun Feng, Alistair Popple, Gary Guo, Andreas Hindborg,
	Alex Gaynor, Alice Ryhl, Alexandre Courbot, Miguel Ojeda,
	David Airlie, nova-gpu, LKML, John Hubbard, Danilo Krummrich

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] Race condition due to incorrect order of writing queue pointers in `send_msg`.
- [High] Silent failure when sending a 4-byte message.
- [High] Unchecked integer addition in `poll_msgq` can cause a kernel panic on untrusted MMIO reads.
--

commit 8a59b49984011320aa631b20a05550242e2baef2
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure

This commit adds the foundation security processor (FSP) message
infrastructure for the nova-core GPU driver. It implements queue
registers and low-level helpers to communicate with the FSP falcon's EMEM.

> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index 6b057d958115a..0ec1c55213bc8 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
[ ... ]
> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>  
>          Ok(())
>      }
> +
> +    /// Poll FSP for incoming data.
> +    ///
> +    /// Returns the size of available data in bytes, or 0 if no data is available.
> +    ///
> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
> +    /// message exchange, so `tail >= head` is always true when data is present.
> +    #[expect(dead_code)]
> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
> +
> +        if head == tail {
> +            return 0;
> +        }
> +
> +        // TAIL points at last DWORD written, so add 4 to get total size
> +        tail.saturating_sub(head) + 4
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
Can this unchecked addition panic if we encounter a PCIe link failure?

If `head` reads normally as 0 but a surprise removal occurs before reading
`tail`, `tail` could read as 0xFFFFFFFF. The saturating subtraction would
leave 0xFFFFFFFF, and the subsequent `+ 4` will overflow, causing a panic in
kernels built with CONFIG_RUST_OVERFLOW_CHECKS=y.

Should this use `saturating_add(4)` or another form of checked math to handle
untrusted hardware inputs safely?
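
A minimal sketch of the saturating form asked about above. The helper name
`pending_msg_size` and its plain `u32` arguments are assumptions made for
illustration; the driver reads these values through its own register types.

```rust
/// Hypothetical helper: size in bytes of the pending message, given the raw
/// HEAD and TAIL offsets read from MMIO. TAIL points at the last DWORD written.
fn pending_msg_size(head: u32, tail: u32) -> u32 {
    if head == tail {
        return 0;
    }

    // Both steps saturate at the `u32` bounds, so an all-ones read from a
    // removed device cannot trigger an arithmetic overflow panic here.
    tail.saturating_sub(head).saturating_add(4)
}
```

With this form an all-ones `tail` yields `u32::MAX` instead of overflowing, so
the caller still has to reject implausibly large sizes before using the result.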

> +    }
> +
> +    /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
> +    ///
> +    /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
> +        if packet.is_empty() {
> +            return Err(EINVAL);
> +        }
> +
> +        // Write message to EMEM at offset 0 (validates 4-byte alignment)
> +        self.write_emem(bar, 0, packet)?;
> +
> +        // Update queue pointers. TAIL points at the last DWORD written.
> +        let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
Does this create a silent failure if a 4-byte message is sent?

If `packet.len()` is 4, `tail_offset` is calculated as 0. Both NV_PFSP_QUEUE_HEAD
and NV_PFSP_QUEUE_TAIL will then be written as 0. Because the FSP protocol treats
`HEAD == TAIL` as an empty queue, the FSP will ignore the message while the
function still returns `Ok(())`.

Should this enforce a minimum packet length of 8 bytes or return an error
for 4-byte messages?
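
If the premise above holds (HEAD is always 0 and `HEAD == TAIL` means "empty"),
a length check along these lines would reject packets that cannot be
signalled. This is only a sketch: `SendError` and `tail_offset_for` are
hypothetical names, not part of the patch.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum SendError {
    Invalid,
}

/// Hypothetical helper: computes the TAIL offset for a packet of `len` bytes.
fn tail_offset_for(len: usize) -> Result<u32, SendError> {
    // The packet must be a whole number of DWORDs, and at least two of them,
    // so that the resulting TAIL offset differs from HEAD (which is 0).
    if len < 8 || len % 4 != 0 {
        return Err(SendError::Invalid);
    }

    // TAIL points at the last DWORD written.
    u32::try_from(len - 4).map_err(|_| SendError::Invalid)
}
```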

> +        bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
> +        bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
Is there a race condition with the FSP hardware due to the order of these
writes?

By writing NV_PFSP_QUEUE_TAIL before NV_PFSP_QUEUE_HEAD, if QUEUE_HEAD was
non-zero from a prior transaction, we momentarily create a state where
`TAIL != HEAD`. The FSP may wake up immediately and read from the old, incorrect
QUEUE_HEAD offset before the driver can write 0 to QUEUE_HEAD.

Should NV_PFSP_QUEUE_HEAD be initialized to 0 before writing the
NV_PFSP_QUEUE_TAIL doorbell?

> +
> +        Ok(())
> +    }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=15


* Re: [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging
  2026-06-02  3:21 ` [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging John Hubbard
@ 2026-06-02  3:35   ` sashiko-bot
  0 siblings, 0 replies; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:35 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, John Hubbard
  Cc: Alistair Popple, Alexandre Courbot, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Danilo Krummrich, Alice Ryhl, Björn Roy Baron,
	Gary Guo, Trevor Gross, nova-gpu, John Hubbard, Zhi Wang,
	Eliot Courtney, David Airlie, Shashank Sharma, Timur Tabi,
	Benno Lossin, LKML, Boqun Feng, Alex Gaynor, Andreas Hindborg

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] Unbounded kernel memory allocation based on untrusted hardware input.
- [High] FSP message synchronization ignores `task_id` and mishandles timeouts, leading to permanent message desynchronization.
- [Medium] FSP protocol structures use native integer types and are directly transmuted from byte streams, breaking big-endian compatibility.
--

commit 4be2156ade2cecdc1c8a1f092241ae18c134074e
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging

This commit introduces the synchronous send-and-wait path for FSP messaging
in the nova-core driver. It defines FSP protocol structures and a
send_sync_fsp method to exchange messages with the FSP hardware.

> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index f3524137d9f78..67cf83aba83c6 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs

[ ... ]

> @@ -22,18 +26,56 @@

[ ... ]

> +/// FSP command response payload (NVDM_PAYLOAD_COMMAND_RESPONSE).
> +#[repr(C, packed)]
> +#[derive(Clone, Copy)]
> +struct NvdmPayloadCommandResponse {
> +    task_id: u32,
> +    command_nvdm_type: u32,
> +    error_code: u32,
> +}

[Severity: Medium]
Will this structure parse values incorrectly on big-endian architectures?

The FSP hardware writes responses in a fixed endianness (little-endian), but
this uses native u32 types. When FspResponse::from_bytes_prefix transmutes
the byte stream, it could result in corrupted values and spurious validation
failures like command_nvdm_type != M::NVDM_TYPE on big-endian systems.

Should these use explicit little-endian types instead?
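
One endian-independent way to decode the payload is to read each field with
`u32::from_le_bytes` instead of transmuting the buffer. The sketch below
assumes the three fields are packed back to back as in the struct above;
`CommandResponse` and `parse_command_response` are illustrative names only.

```rust
/// Mirrors the fields of `NvdmPayloadCommandResponse`, decoded explicitly.
#[derive(Debug, PartialEq)]
struct CommandResponse {
    task_id: u32,
    command_nvdm_type: u32,
    error_code: u32,
}

/// Hypothetical parser: decodes three consecutive little-endian `u32` fields.
/// Returns `None` if `bytes` is shorter than 12 bytes.
fn parse_command_response(bytes: &[u8]) -> Option<CommandResponse> {
    // Reads the `i`-th DWORD as little-endian, regardless of host endianness.
    let field = |i: usize| -> Option<u32> {
        let b = bytes.get(i * 4..i * 4 + 4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    };

    Some(CommandResponse {
        task_id: field(0)?,
        command_nvdm_type: field(1)?,
        error_code: field(2)?,
    })
}
```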

[ ... ]

> @@ -70,4 +112,79 @@ pub(crate) fn wait_secure_boot(

[ ... ]

> +    fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: &Bar0, msg: &M) -> Result
> +    where
> +        M: MessageToFsp,
> +    {

[ ... ]

> +        let packet_size = num::u32_as_usize(packet_size);
> +        let mut response_buf = KVec::<u8>::new();
> +        response_buf.resize(packet_size, 0, GFP_KERNEL)?;

[Severity: High]
Could this trigger an unbounded kernel memory allocation?

The packet_size is derived from poll_msgq(), which calculates the size
based on MMIO reads of the FSP queue head and tail pointers. If an errant
or compromised FSP firmware returns a large value, passing this unvalidated
size directly to resize() with GFP_KERNEL could trigger the OOM killer or
cause severe system stutter.

Should this size be clamped to a hardware-defined maximum, such as the
FSP EMEM capacity, before allocation?
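
A bound check of this kind could look like the sketch below. The limit is a
placeholder: `ASSUMED_MAX_FSP_MSG_SIZE` is not a value taken from this series
or from hardware documentation, and `checked_response_len` is a hypothetical
helper.

```rust
use std::convert::TryFrom;

/// Placeholder upper bound. The real limit would be the FSP EMEM capacity,
/// which is not stated in this thread.
const ASSUMED_MAX_FSP_MSG_SIZE: usize = 1024;

/// Hypothetical helper: validates a size reported by the hardware before it
/// is used as an allocation length.
fn checked_response_len(packet_size: u32) -> Option<usize> {
    let len = usize::try_from(packet_size).ok()?;

    if len == 0 || len > ASSUMED_MAX_FSP_MSG_SIZE {
        return None;
    }

    Some(len)
}
```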

[ ... ]

> +        if command_nvdm_type != M::NVDM_TYPE {
> +            dev_err!(
> +                dev,
> +                "Expected NVDM type {:#x} in reply, got {:#x}\n",
> +                M::NVDM_TYPE,
> +                command_nvdm_type
> +            );
> +            return Err(EIO);
> +        }

[Severity: High]
Does this validation logic leave the driver vulnerable to message
desynchronization after a timeout?

If send_sync_fsp times out waiting for a response, it returns ETIMEDOUT
without draining the message queue. If the FSP firmware completes the
request later, its response remains at the queue's tail.

A subsequent call to send_sync_fsp will immediately read this delayed
response. Because this validation only checks command_nvdm_type and
ignores the task_id (sequence number), the driver might incorrectly accept
the stale response as the reply to the new request if they share the same
NVDM type.

Should the validation also check task_id against the sent request to
reject stale responses?
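
The check being asked about can be sketched as follows. It assumes that the
request carries an identifier which FSP echoes back in `task_id`; this thread
does not confirm that, and `Reply`, `ReplyError` and `validate_reply` are
hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum ReplyError {
    StaleTask { expected: u32, got: u32 },
    WrongType { expected: u32, got: u32 },
    Firmware(u32),
}

struct Reply {
    task_id: u32,
    command_nvdm_type: u32,
    error_code: u32,
}

/// Hypothetical validation: rejects replies that belong to another request
/// before looking at the NVDM type or the error code.
fn validate_reply(
    reply: &Reply,
    expected_type: u32,
    expected_task_id: u32,
) -> Result<(), ReplyError> {
    if reply.task_id != expected_task_id {
        return Err(ReplyError::StaleTask {
            expected: expected_task_id,
            got: reply.task_id,
        });
    }

    if reply.command_nvdm_type != expected_type {
        return Err(ReplyError::WrongType {
            expected: expected_type,
            got: reply.command_nvdm_type,
        });
    }

    if reply.error_code != 0 {
        return Err(ReplyError::Firmware(reply.error_code));
    }

    Ok(())
}
```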

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=17


* Re: [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path
  2026-06-02  3:21 ` [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path John Hubbard
@ 2026-06-02  3:38   ` sashiko-bot
  0 siblings, 0 replies; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:38 UTC (permalink / raw)
  To: Alexandre Courbot, John Hubbard, Danilo Krummrich
  Cc: Trevor Gross, Danilo Krummrich, David Airlie, Bjorn Helgaas,
	Andreas Hindborg, Zhi Wang, Miguel Ojeda, Björn Roy Baron,
	Alistair Popple, Alexandre Courbot, Boqun Feng, Alex Gaynor,
	Gary Guo, Timur Tabi, Shashank Sharma, LKML, John Hubbard,
	Alice Ryhl, nova-gpu, Eliot Courtney, Simona Vetter, Benno Lossin

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:

Pre-existing issues:
- [Critical] DMA Use-After-Free during GSP boot failure path due to reverse drop order.
- [High] Unconditional boot failure due to invalid error check in `wait_for_gsp_lockdown_release`.
--

commit c4510072c2995001dbf13cc4a2db6eb978b85b10
Author: Alexandre Courbot <acourbot@nvidia.com>

gpu: nova-core: gsp: enable FSP boot path

This patch enables the FSP boot path so Hopper and Blackwell devices can
boot. It accomplishes this by removing the hardcoded ENOTSUPP return code
and returning the boot unload guard instead.

> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index d372ae85c5bc7..93d8a72fe9303 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> @@ -166,7 +166,7 @@ fn boot<'a>(
>          );
>  
>          // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
> -        let _unload_guard =
> +        let unload_guard =
>              BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));

[Severity: Critical]
This is a pre-existing issue, but does this declaration order cause a DMA
use-after-free if the boot path fails?

In Rust, local variables are dropped in the reverse order of their declaration.
Here in Gh100::boot(), unload_guard is declared before args (which contains
FmcBootArgs and its DMA-coherent allocations).

If wait_for_gsp_lockdown_release() returns an error, args is dropped first,
freeing and unmapping the DMA memory while the GSP RISC-V is still running.
unload_guard is dropped afterward to execute teardown.

Since FspUnloadBundle::run() only polls for the GSP to halt without issuing an
explicit hardware reset if it is stuck:

drivers/gpu/nova-core/gsp/hal/gh100.rs:FspUnloadBundle::run() {
    read_poll_timeout(
        || Ok(gsp_falcon.is_riscv_active(bar)),
        |&active| !active,
        Delta::from_millis(10),
        Delta::from_secs(5),
    )
...
}

Could this leave the GSP running indefinitely and capable of reading or writing
to the freed DMA memory?
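
The drop-order rule this question relies on can be seen in a small standalone
program. The sketch below is plain userspace Rust, not driver code: `Noisy`
and `drop_order` are illustrative names, and the two locals only borrow the
names `unload_guard` and `args` from `Gh100::boot()`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Declares two locals in the same order as `Gh100::boot()` and returns the
/// order in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));

    {
        let _unload_guard = Noisy { name: "unload_guard", log: Rc::clone(&log) };
        let _args = Noisy { name: "args", log: Rc::clone(&log) };
        // Scope ends: locals are dropped in reverse declaration order.
    }

    let order = log.borrow().clone();
    order
}
```

Calling `drop_order()` returns `["args", "unload_guard"]`: the value declared
last is released first, which is the ordering described above.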

>  
>          let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> @@ -184,7 +184,7 @@ fn boot<'a>(
>          let fmc_boot_params_addr = args.boot_params_dma_handle();
>          wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, fmc_boot_params_addr)?;

[Severity: High]
This isn't a bug introduced by this patch, but will this function unconditionally
return an error for a successful boot sequence?

In wait_for_gsp_lockdown_release(), read_poll_timeout() waits for the lockdown
to be released. The lockdown_released() function correctly handles the valid state
where mbox.mbox0 contains the lower 32 bits of fmc_boot_params_addr:

drivers/gpu/nova-core/gsp/hal/gh100.rs:lockdown_released() {
    if self.mbox0 != 0 && self.combined_addr() != fmc_boot_params_addr {
        return true;
    }
...
}

However, wait_for_gsp_lockdown_release() blindly evaluates mbox0 != 0 after
polling:

drivers/gpu/nova-core/gsp/hal/gh100.rs:wait_for_gsp_lockdown_release() {
...
    if mbox.mbox0 != 0 {
        dev_err!(dev, "GSP-FMC boot failed (mbox: {:#x})\n", mbox.mbox0);
        return Err(EIO);
    }
...
}

Since fmc_boot_params_addr is a DMA address, won't its lower 32 bits almost
certainly be non-zero, causing this newly enabled boot path to immediately fail?
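
To make the concern concrete, the sketch below models the mailbox state in
plain Rust. `Mbox` mirrors `GspMbox` from the patch, while
`reports_boot_error` is a hypothetical post-poll check that treats `mbox0` as
an error code only once the mailboxes no longer hold the boot params address.
Whether the mailboxes still hold that address when lockdown is released is
exactly the open question, so this is an illustration rather than a proposed
fix.

```rust
#[derive(Clone, Copy)]
struct Mbox {
    mbox0: u32,
    mbox1: u32,
}

impl Mbox {
    /// Combines mailbox0 and mailbox1 into a 64-bit address.
    fn combined_addr(self) -> u64 {
        (u64::from(self.mbox1) << 32) | u64::from(self.mbox0)
    }

    /// Hypothetical check: a non-zero `mbox0` counts as an error code only if
    /// the mailboxes no longer contain the boot params address.
    fn reports_boot_error(self, fmc_boot_params_addr: u64) -> bool {
        self.mbox0 != 0 && self.combined_addr() != fmc_boot_params_addr
    }
}
```

With a boot params address of `0x1_2345_6000`, mailboxes still holding that
address are not reported as an error by this check, whereas the plain
`mbox0 != 0` test would reject them.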

>  
> -        Err(ENOTSUPP)
> +        Ok(unload_guard)
>      }
>  }
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=22


* Re: [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling
  2026-06-02  3:21 ` [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling John Hubbard
@ 2026-06-02  3:38   ` sashiko-bot
  2026-06-03  5:45   ` Alexandre Courbot
  1 sibling, 0 replies; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:38 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, John Hubbard
  Cc: Zhi Wang, Shashank Sharma, Simona Vetter, Bjorn Helgaas, Gary Guo,
	LKML, John Hubbard, Timur Tabi, Andreas Hindborg, David Airlie,
	Björn Roy Baron, Boqun Feng, Alexandre Courbot, Benno Lossin,
	Alice Ryhl, Alistair Popple, Danilo Krummrich, Trevor Gross,
	nova-gpu, Miguel Ojeda, Eliot Courtney, Alex Gaynor

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Low] The `io::` imports do not follow the vertical import sorting style.
--

commit 61efdca79645fc25a7eb578eb89ac5e6fa3b4971
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling
    
On Hopper and Blackwell, FSP boots GSP with hardware lockdown enabled.
This commit adds the necessary register bit and helper functions to poll
for lockdown release before proceeding with GSP initialization.

> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index f41f3fea15ff2..02aec52813894 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> @@ -5,7 +5,13 @@
>  
>  use kernel::{
>      device,
> -    dma::Coherent, //
> +    dma::Coherent,
> +    io::{
> +        poll::read_poll_timeout,
> +        register::WithBase,
> +        Io, //
> +    },

[Severity: Low]
This isn't a bug, but should the io imports follow the alphabetical sorting
used in the Rust vertical import style? If so, Io would sort before poll and
register since uppercase letters come before lowercase in standard sorting.

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=20


* Re: [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
  2026-06-02  3:21 ` [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot John Hubbard
@ 2026-06-02  3:40   ` sashiko-bot
  2026-06-03  5:23     ` Alexandre Courbot
  2026-06-03  5:19   ` Alexandre Courbot
  1 sibling, 1 reply; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  3:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, John Hubbard
  Cc: Danilo Krummrich, LKML, Eliot Courtney, Zhi Wang,
	Björn Roy Baron, Gary Guo, David Airlie, Alex Gaynor,
	Boqun Feng, Miguel Ojeda, Alistair Popple, nova-gpu,
	Shashank Sharma, Simona Vetter, Bjorn Helgaas, John Hubbard,
	Alexandre Courbot, Timur Tabi, Alice Ryhl, Trevor Gross,
	Benno Lossin, Andreas Hindborg

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] DMA Use-After-Free of FMC boot parameters during initialization
- [High] Missing little-endian conversions in hardware payloads breaks big-endian hosts
- [Medium] Kernel stack overflow risk due to large struct passed by value
--

commit 42c5f0e39c2bb0bc87a3f8df25554d04f00a3e69
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
    
Build and send the Chain of Trust message to FSP, bundling the
DMA-coherent boot parameters that FSP reads at boot time.

> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index 67cf83aba83c6..352ef7683cf2a 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs

[ ... ]

> @@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
>          Ok(Fsp { falcon, fsp_fw })
>      }
>  
> +    /// Boots GSP FMC via FSP Chain of Trust.
> +    ///
> +    /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
> +    /// to FSP, and waits for the response.
> +    pub(crate) fn boot_fmc(
> +        &mut self,
> +        dev: &device::Device<device::Bound>,
> +        bar: &Bar0,
> +        fb_layout: &FbLayout,
> +        args: &FmcBootArgs,
> +    ) -> Result {
> +        dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
> +
> +        let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
> +        let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
> +
> +        // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
> +        let frts_offset = if !args.resume {
> +            let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
> +
> +            frts_reserved_size
> +                .align_up(Alignment::new::<SZ_2M>())
> +                .ok_or(EINVAL)?
> +        } else {
> +            0
> +        };
> +        let frts_size: u32 = if !args.resume {
> +            fb_layout.frts.len().try_into()?
> +        } else {
> +            0
> +        };
> +
> +        let msg = KBox::new(
> +            FspMessage {
> +                mctp_header: MctpHeader::single_packet(),
> +                nvdm_header: NvdmHeader::new(NvdmType::Cot),
> +                cot: NvdmPayloadCot {
[Severity: Medium]
Is it intentional to construct this ~868-byte FspMessage structure on the
stack before boxing it? 

This also dereferences and copies the 816-byte FmcSignatures structure
(sigs: *self.fsp_fw.fmc_sigs). Doesn't this bypass the optimization in
FspFirmware::extract_fmc_signatures() which avoided building FmcSignatures on
the stack?

Could this contribute to kernel stack overflows? Might it be safer to
initialize it directly on the heap using KBox::init() or KBox::new_uninit()?

> +                    version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
> +                    size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
> +                        .map_err(|_| EINVAL)?,
> +                    gsp_fmc_sysmem_offset: fmc_addr,
> +                    frts_sysmem_offset: 0,
> +                    frts_sysmem_size: 0,
> +                    frts_vidmem_offset: frts_offset,
> +                    frts_vidmem_size: frts_size,
[Severity: High]
Will these native integer types be correctly interpreted by the little-endian
RISC-V FSP/GSP processors if the host architecture is big-endian?

Because the struct is laid out in big-endian byte order on big-endian hosts,
won't the little-endian hardware swap the bytes when reading, resulting in
incorrect sizes, versions, and DMA addresses? 

Should explicit little-endian types or byte-swapping (like .to_le()) be used
here to avoid breaking driver initialization on big-endian hosts?
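To make the byte-order question concrete, here is a minimal userspace sketch of serializing payload fields with explicit little-endian conversion. `CotFields` and `encode_le` are hypothetical names, and the field subset and widths are assumptions for illustration only, not the actual layout of NvdmPayloadCot.

```rust
/// Hypothetical subset of a firmware payload; names and widths are assumed.
struct CotFields {
    version: u16,
    size: u16,
    fmc_sysmem_offset: u64,
}

/// Serializes the fields in little-endian order regardless of host endianness.
fn encode_le(p: &CotFields) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..2].copy_from_slice(&p.version.to_le_bytes());
    out[2..4].copy_from_slice(&p.size.to_le_bytes());
    out[4..12].copy_from_slice(&p.fmc_sysmem_offset.to_le_bytes());
    out
}
```

Because `to_le_bytes()` fixes the byte order, the same buffer is produced on little-endian and big-endian hosts; a native in-memory struct copy would not give that guarantee.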

> +                    sigs: *self.fsp_fw.fmc_sigs,
> +                    gsp_boot_args_sysmem_offset: fmc_boot_params_addr,
> +                },
> +            },
> +            GFP_KERNEL,
> +        )?;
> +
> +        self.send_sync_fsp(dev, bar, &*msg)?;
> +
> +        dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
> +        Ok(())
> +    }
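
The FRTS offset arithmetic in boot_fmc() above (reserved size rounded up to 2 MiB, then subtracted from the end of the framebuffer) can be sketched in plain Rust as follows. The helper names `align_up_2m` and `frts_location` are hypothetical, and the sketch uses checked arithmetic in place of the kernel's `Alignment` type.

```rust
const SZ_2M: u64 = 2 * 1024 * 1024;

/// Rounds `value` up to the next multiple of 2 MiB, or returns `None` on overflow.
fn align_up_2m(value: u64) -> Option<u64> {
    value.checked_add(SZ_2M - 1).map(|v| v & !(SZ_2M - 1))
}

/// Computes FRTS_location = FB_END - frts_offset, as in the quoted comment.
fn frts_location(fb_end: u64, heap_len: u64, pmu_reserved: u32) -> Option<u64> {
    let reserved = heap_len.checked_add(u64::from(pmu_reserved))?;
    let frts_offset = align_up_2m(reserved)?;
    fb_end.checked_sub(frts_offset)
}
```

With made-up inputs, a 1 MiB heap plus a 4 KiB PMU reservation rounds up to a single 2 MiB step, so FRTS would start 2 MiB below the end of the framebuffer.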

[ ... ]

> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c54e8bf4bb3a..558b37863f00a 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -934,3 +934,67 @@ fn new(cmdq: &Cmdq) -> impl Init<Self> + '_ {
>          })
>      }
>  }
> +
> +#[repr(u32)]
> +pub(crate) enum GspDmaTarget {
> +    #[expect(dead_code)]
> +    LocalFb = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB,
> +    CoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM,
> +    NoncoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM,
> +}
> +
> +type GspAcrBootGspRmParams = bindings::GSP_ACR_BOOT_GSP_RM_PARAMS;
> +
> +impl GspAcrBootGspRmParams {
> +    fn new(target: GspDmaTarget, wpr_meta_addr: u64) -> impl Init<Self> {
> +        #[allow(non_snake_case)]
> +        let params = init!(Self {
> +            target: target as u32,
> +            gspRmDescSize: num::usize_into_u32::<{ size_of::<GspFwWprMeta>() }>(),
> +            gspRmDescOffset: wpr_meta_addr,
> +            bIsGspRmBoot: 1,
[Severity: High]
Similarly to NvdmPayloadCot above, will these native integer types be
correctly interpreted by little-endian hardware on big-endian hosts?

Does this struct require explicit little-endian types or byte-swapping to
avoid accessing invalid physical memory on big-endian architectures?

> +            wprCarveoutOffset: 0,
> +            wprCarveoutSize: 0,
> +            __bindgen_padding_0: Default::default(),
> +        });
> +
> +        params
> +    }
> +}

[ ... ]

> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index b25970dd4561d..f41f3fea15ff2 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs

[ ... ]

> @@ -39,17 +42,27 @@ impl GspHal for Gh100 {
>      /// the GSP boot internally - no manual GSP reset/boot is needed.
>      fn boot<'a>(
>          &self,
> -        _gsp: &'a Gsp,
> +        gsp: &'a Gsp,
>          dev: &'a device::Device<device::Bound>,
>          bar: &'a Bar0,
>          chipset: Chipset,
> -        _fb_layout: &FbLayout,
> -        _wpr_meta: &Coherent<GspFwWprMeta>,
> +        fb_layout: &FbLayout,
> +        wpr_meta: &Coherent<GspFwWprMeta>,
>          _gsp_falcon: &'a Falcon<GspEngine>,
>          _sec2_falcon: &'a Falcon<Sec2>,
>      ) -> Result<BootUnloadGuard<'a>> {
>          let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
> -        let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> +        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> +
> +        let args = FmcBootArgs::new(
> +            dev,
> +            chipset,
> +            wpr_meta.dma_handle(),
> +            gsp.libos.dma_handle(),
> +            false,
> +        )?;
> +
> +        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
>  
>          Err(ENOTSUPP)
[Severity: High]
Does returning an error here immediately drop the args variable, thereby
freeing its DMA-coherent memory before the FSP finishes reading it?

Since the FSP is actively booting asynchronously, wouldn't this cause a
hardware use-after-free of the boot parameters, leading to IOMMU faults or
undefined behavior?

Note: I see this is fixed in a later commit 'gpu: nova-core: gsp: enable FSP
boot path' which changes the return to Ok(unload_guard) and properly handles
the required hardware delays.

>      }
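
The lifetime concern above follows from ordinary Rust drop semantics, which this userspace sketch illustrates. `BootArgs` here is a hypothetical stand-in that only records when it is freed; it is not the driver's FmcBootArgs, and moving the buffer into the return value is shown as one generic way to extend a lifetime, not as a description of what the later commit does.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for a DMA-coherent allocation; sets a flag when it is freed.
struct BootArgs {
    freed: Rc<Cell<bool>>,
}

impl Drop for BootArgs {
    fn drop(&mut self) {
        self.freed.set(true);
    }
}

/// The buffer is a local, so returning (here with an error) drops it immediately.
fn boot_early_return(freed: Rc<Cell<bool>>) -> Result<(), &'static str> {
    let _args = BootArgs { freed };
    // A device could still be reading the buffer at this point.
    Err("not supported")
}

/// Moving the buffer into the return value keeps it alive while the caller holds it.
fn boot_with_guard(freed: Rc<Cell<bool>>) -> Result<BootArgs, &'static str> {
    let args = BootArgs { freed };
    Ok(args)
}
```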

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=19


* Re: [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication
  2026-06-02  3:21 ` [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication John Hubbard
@ 2026-06-02  5:36   ` sashiko-bot
  2026-06-03  2:41     ` Alexandre Courbot
  2026-06-02 12:53   ` Eliot Courtney
  1 sibling, 1 reply; 65+ messages in thread
From: sashiko-bot @ 2026-06-02  5:36 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: David Airlie, Bjorn Helgaas, Björn Roy Baron,
	Shashank Sharma, Andreas Hindborg, Alistair Popple, Boqun Feng,
	Gary Guo, Zhi Wang, Alexandre Courbot, Simona Vetter, Alice Ryhl,
	John Hubbard, Danilo Krummrich, Eliot Courtney, Trevor Gross,
	nova-gpu, Alex Gaynor, Benno Lossin, LKML, Miguel Ojeda,
	Timur Tabi

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Wire-format protocol headers are defined using native `u32` instead of explicit little-endian types, causing incorrect byte-order on big-endian architectures.
--

commit 790a1556a39d680db8fcdb2896578691a18e9d1f
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: add MCTP/NVDM protocol types for firmware communication

This commit introduces the Management Component Transport Protocol (MCTP) and
NVIDIA Device Management (NVDM) wire-format types used for communication
between the kernel driver and GPU firmware processors.

> diff --git a/drivers/gpu/nova-core/mctp.rs b/drivers/gpu/nova-core/mctp.rs
> new file mode 100644
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mctp.rs
[ ... ]
> +bitfield! {
> +    pub(crate) struct MctpHeader(u32), "MCTP transport header for NVIDIA firmware messages." {
> +        31:31 som as bool, "Start-of-message bit.";
> +        30:30 eom as bool, "End-of-message bit.";
> +        29:28 seq as u8, "Packet sequence number.";
> +        23:16 seid as u8, "Source endpoint ID.";
> +    }
> +}

[Severity: High]
Since MctpHeader is defined as a native u32 and embedded into FSP payload
structures, will this result in endianness corruption on big-endian
architectures?

If the memory representation of the struct is serialized directly using
as_bytes() during transmission, the hardware could receive the bytes in
completely reversed order on big-endian hosts. 

Should explicit little-endian types be used for these wire-format headers,
or should byte swapping be handled explicitly prior to serialization?
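
As an illustration of handling the byte order explicitly, the sketch below packs the header fields using the bit positions from the quoted bitfield and then emits the wire bytes with `to_le_bytes()`. The function names `pack_mctp` and `mctp_wire_bytes` are hypothetical, and this is plain userspace Rust rather than the kernel `bitfield!` macro.

```rust
/// Packs MCTP header fields into a u32: som at bit 31, eom at bit 30,
/// seq in bits 29:28, seid in bits 23:16.
fn pack_mctp(som: bool, eom: bool, seq: u8, seid: u8) -> u32 {
    (u32::from(som) << 31)
        | (u32::from(eom) << 30)
        | (u32::from(seq & 0x3) << 28)
        | (u32::from(seid) << 16)
}

/// Produces the wire bytes in little-endian order on any host.
fn mctp_wire_bytes(header: u32) -> [u8; 4] {
    header.to_le_bytes()
}
```

The packed value is the same on every host; only the final conversion to bytes decides what the firmware sees, so that is the step where the byte order has to be pinned down.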

[ ... ]
> +bitfield! {
> +    pub(crate) struct NvdmHeader(u32), "NVIDIA Vendor-Defined Message header over MCTP." {
> +        31:24 nvdm_type as u8 ?=> NvdmType, "NVDM message type.";
> +        23:8 vendor_id as u16, "PCI vendor ID.";
> +        6:0 msg_type as u8, "MCTP vendor-defined message type.";
> +    }
> +}

[Severity: High]
Similarly, does defining NvdmHeader as a native u32 expose the NVDM messages
to the same endianness corruption on big-endian platforms when sent over the
wire?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=16


* Re: [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture
  2026-06-02  3:20 ` [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture John Hubbard
@ 2026-06-02  6:40   ` Eliot Courtney
  0 siblings, 0 replies; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02  6:40 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:20 PM JST, John Hubbard wrote:
> Replace the hardcoded 47-bit DMA mask with a GPU HAL method that
> provides the correct value for the architecture.
>
> Set the DMA mask in Gpu::new(). Gpu owns all DMA allocations for
> the device, so no concurrent allocations can exist while the
> constructor is still running.
>
> Acked-by: Danilo Krummrich <dakr@kernel.org>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>



* Re: [PATCH v12 10/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub
  2026-06-02  3:20 ` [PATCH v12 10/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub John Hubbard
@ 2026-06-02  6:50   ` Eliot Courtney
  0 siblings, 0 replies; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02  6:50 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:20 PM JST, John Hubbard wrote:
> Add the FSP (Foundation Security Processor) falcon engine type that
> will handle secure boot and Chain of Trust operations on Hopper and
> Blackwell architectures.
>
> The FSP falcon replaces SEC2's role in the boot sequence for these newer
> architectures. This initial stub just defines the falcon type and its
> base address.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>



* Re: [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers
  2026-06-02  3:20 ` [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers John Hubbard
  2026-06-02  3:30   ` sashiko-bot
@ 2026-06-02  7:12   ` Eliot Courtney
  2026-06-02  8:26     ` Alexandre Courbot
  1 sibling, 1 reply; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02  7:12 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:20 PM JST, John Hubbard wrote:
> Blackwell GPUs moved the sysmem flush page registers away from the
> Ampere/Ada location. GB10x routes the flush through a pair of HSHUB0
> register sets (primary and egress) that must both be programmed to
> the same address. GB20x routes it through FBHUB0.
>
> Define these registers relative to their HSHUB0 and FBHUB0 bases, as
> Open RM does, and implement the flush paths in the GB10x and GB20x
> framebuffer HALs.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---

This patch looks correct, so:

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

But it only updates the sysmem flush stuff for Blackwell. It looks like
Hopper also uses different registers, so I think we should update it
too (separate patch seems ok to me).


* Re: [PATCH v12 11/22] gpu: nova-core: Hopper/Blackwell: add FMC firmware image
  2026-06-02  3:20 ` [PATCH v12 11/22] gpu: nova-core: Hopper/Blackwell: add FMC firmware image John Hubbard
@ 2026-06-02  7:18   ` Eliot Courtney
  0 siblings, 0 replies; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02  7:18 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:20 PM JST, John Hubbard wrote:
> FSP is the Falcon that runs FMC firmware on Hopper and Blackwell.
> Load the FMC ELF in two forms: the image section that FSP boots from,
> and the full Firmware object for later signature extraction during
> Chain of Trust verification. Declare the FMC image in the module's
> firmware table so it is bundled for FSP-based chipsets.
>
> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>


* Re: [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting
  2026-06-02  3:21 ` [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting John Hubbard
@ 2026-06-02  7:56   ` Eliot Courtney
  2026-06-02  8:22     ` Alexandre Courbot
  0 siblings, 1 reply; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02  7:56 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Hopper and Blackwell use FSP instead of SEC2 for secure boot. The
> driver must wait for FSP secure boot to complete before continuing
> with GSP bring-up. Poll for boot success with a 5-second timeout, and
> return the FSP interface only on success so that later Chain of Trust
> operations cannot run before FSP is ready. The interface owns the FSP
> falcon and the FMC firmware.
>
> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/fsp.rs    |  1 -
>  drivers/gpu/nova-core/fsp.rs           | 73 ++++++++++++++++++++++++++
>  drivers/gpu/nova-core/fsp/hal.rs       | 27 ++++++++++
>  drivers/gpu/nova-core/fsp/hal/gb202.rs | 23 ++++++++
>  drivers/gpu/nova-core/fsp/hal/gh100.rs | 23 ++++++++
>  drivers/gpu/nova-core/gsp/hal/gh100.rs |  6 ++-
>  drivers/gpu/nova-core/nova_core.rs     |  1 +
>  drivers/gpu/nova-core/regs.rs          | 36 +++++++++++++
>  8 files changed, 187 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/nova-core/fsp.rs
>  create mode 100644 drivers/gpu/nova-core/fsp/hal.rs
>  create mode 100644 drivers/gpu/nova-core/fsp/hal/gb202.rs
>  create mode 100644 drivers/gpu/nova-core/fsp/hal/gh100.rs
>
> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index c4a9ce8a47f8..d9f87262e8b1 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> @@ -15,7 +15,6 @@
>  };
>  
>  /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
> -#[expect(dead_code)]
>  pub(crate) struct Fsp(());
>  
>  impl RegisterBase<PFalconBase> for Fsp {
> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> new file mode 100644
> index 000000000000..f3524137d9f7
> --- /dev/null
> +++ b/drivers/gpu/nova-core/fsp.rs
> @@ -0,0 +1,73 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> +
> +//! FSP (Foundation Security Processor) interface for Hopper/Blackwell GPUs.
> +//!
> +//! Hopper/Blackwell use a simplified firmware boot sequence: FMC, then FSP, then GSP.
> +//! Unlike Turing/Ampere/Ada, there is no SEC2 (Security Engine 2) usage.
> +//! FSP handles secure boot directly using FMC firmware and Chain of Trust.
> +
> +use kernel::{
> +    device,
> +    io::poll::read_poll_timeout,
> +    prelude::*,
> +    time::Delta, //
> +};
> +
> +use crate::{
> +    driver::Bar0,
> +    falcon::{
> +        fsp::Fsp as FspEngine,
> +        Falcon, //
> +    },
> +    firmware::fsp::FspFirmware,
> +    gpu::Chipset,
> +    regs, //
> +};
> +
> +mod hal;
> +
> +/// FSP interface for Hopper/Blackwell GPUs.
> +///
> +/// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
> +/// has completed. It owns the FSP falcon and the FMC firmware, which are used for the subsequent
> +/// Chain of Trust boot.
> +pub(crate) struct Fsp {
> +    #[expect(dead_code)]
> +    falcon: Falcon<FspEngine>,
> +    #[expect(dead_code)]
> +    fsp_fw: FspFirmware,
> +}
> +
> +impl Fsp {
> +    /// Waits for FSP secure boot completion, then returns the [`Fsp`] interface.
> +    ///
> +    /// Polls the thermal scratch register until FSP signals boot completion or the timeout
> +    /// elapses. Returning an [`Fsp`] only on success guarantees, at the API level, that the
> +    /// interface is not used before secure boot has completed.
> +    pub(crate) fn wait_secure_boot(
> +        dev: &device::Device<device::Bound>,
> +        bar: &Bar0,
> +        chipset: Chipset,
> +        fsp_fw: FspFirmware,

What about constructing FspFirmware inside `wait_secure_boot`? It fits
the concept of having this Fsp object own and control the FSP. This also
matches the pattern of Gsp::boot creating its own GspFirmware.
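
A rough userspace sketch of that shape, where the constructor loads the firmware itself and only hands back an `Fsp` once boot has completed. All types here are simplified stand-ins, and the `version` and `boot_done` parameters are hypothetical, chosen only to keep the example self-contained.

```rust
/// Hypothetical stand-in for the FMC firmware bundle.
struct FspFirmware {
    version: &'static str,
}

impl FspFirmware {
    fn new(version: &'static str) -> Result<Self, &'static str> {
        if version.is_empty() {
            return Err("missing firmware version");
        }
        Ok(Self { version })
    }
}

/// Only obtainable through `wait_secure_boot`, so holding one implies boot completed.
struct Fsp {
    fw: FspFirmware,
}

impl Fsp {
    /// Loads the firmware itself instead of taking it as a parameter.
    fn wait_secure_boot(
        version: &'static str,
        boot_done: impl Fn() -> bool,
    ) -> Result<Fsp, &'static str> {
        let fw = FspFirmware::new(version)?;
        if !boot_done() {
            return Err("secure boot timeout");
        }
        Ok(Fsp { fw })
    }

    fn firmware_version(&self) -> &'static str {
        self.fw.version
    }
}
```

The caller then never sees an `FspFirmware` on its own, which keeps ownership of the FSP state in one place.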

> +    ) -> Result<Fsp> {
> +        /// FSP secure boot completion timeout in milliseconds.
> +        const FSP_SECURE_BOOT_TIMEOUT_MS: i64 = 5000;
> +
> +        let hal = hal::fsp_hal(chipset).ok_or(ENOTSUPP)?;
> +        let falcon = Falcon::<FspEngine>::new(dev, chipset)?;
> +
> +        read_poll_timeout(
> +            || Ok(hal.fsp_boot_status(bar)),
> +            |&status| status == regs::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS,
> +            Delta::from_millis(10),
> +            Delta::from_millis(FSP_SECURE_BOOT_TIMEOUT_MS),
> +        )
> +        .map_err(|_| {
> +            dev_err!(dev, "FSP secure boot completion timeout\n");
> +            ETIMEDOUT
> +        })?;

nit: this can just be inspect_err(), since the error is already ETIMEDOUT
when the poll times out.
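
For illustration, here is a userspace sketch of the difference: `inspect_err()` logs and passes the original error through, so there is no need to rebuild it as `map_err()` does. `poll_until`, `PollError`, and the `0xff` success value are hypothetical stand-ins, not the kernel's `read_poll_timeout()` or the real status constant.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum PollError {
    TimedOut,
}

/// Repeatedly calls `read` until `done` accepts the value or `timeout` elapses.
fn poll_until<T>(
    mut read: impl FnMut() -> T,
    done: impl Fn(&T) -> bool,
    sleep: Duration,
    timeout: Duration,
) -> Result<T, PollError> {
    let deadline = Instant::now() + timeout;
    loop {
        let value = read();
        if done(&value) {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(PollError::TimedOut);
        }
        std::thread::sleep(sleep);
    }
}

/// Logs on failure but leaves the error value unchanged.
fn wait_boot(read: impl FnMut() -> u32, log: &mut Vec<String>) -> Result<u32, PollError> {
    poll_until(read, |&s| s == 0xff, Duration::from_millis(1), Duration::from_millis(20))
        .inspect_err(|e| log.push(format!("secure boot wait failed: {e:?}")))
}
```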

> +
> +        Ok(Fsp { falcon, fsp_fw })
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/fsp/hal.rs b/drivers/gpu/nova-core/fsp/hal.rs
> new file mode 100644
> index 000000000000..83d1e7daa998
> --- /dev/null
> +++ b/drivers/gpu/nova-core/fsp/hal.rs
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> +
> +use crate::{
> +    driver::Bar0,
> +    gpu::{
> +        Architecture,
> +        Chipset, //
> +    },
> +};
> +
> +mod gb202;
> +mod gh100;
> +
> +pub(super) trait FspHal {
> +    /// Returns the secure boot status from the architecture-specific `NV_THERM_I2CS_SCRATCH` register.
> +    fn fsp_boot_status(&self, bar: &Bar0) -> u32;
> +}
> +
> +/// Returns the FSP HAL, or `None` if the architecture doesn't support FSP.
> +pub(crate) fn fsp_hal(chipset: Chipset) -> Option<&'static dyn FspHal> {

nit: this can be pub(super)

With above changes,

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>


* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-02  3:32   ` sashiko-bot
@ 2026-06-02  7:56     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02  7:56 UTC (permalink / raw)
  To: sashiko-bot
  Cc: sashiko-reviews, John Hubbard, Danilo Krummrich, Alex Gaynor,
	Timur Tabi, Shashank Sharma, Gary Guo, Benno Lossin, David Airlie,
	Trevor Gross, Björn Roy Baron, nova-gpu, LKML, Bjorn Helgaas,
	Simona Vetter, Boqun Feng, Eliot Courtney, Andreas Hindborg,
	Alistair Popple, Zhi Wang, Miguel Ojeda, Alice Ryhl

On Tue Jun 2, 2026 at 12:32 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] The patch uses `field: expr` instead of `field <- expr` inside `init!`, violating Rust subsystem guidelines and defeating the goal of avoiding stack allocation.
> --
>
> commit 7149f8afdf16eef886b0d72d3c2139493024923e
> Author: John Hubbard <jhubbard@nvidia.com>
>
> gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
>     
> This commit extracts the SHA-384 hash, RSA public key, and RSA signature
> from FMC ELF32 firmware sections. The signatures are stored in a
> heap-allocated structure used for FSP Chain of Trust verification.
>
>> diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
>> index 011be1e571c25..db61905eac9de 100644
>> --- a/drivers/gpu/nova-core/firmware/fsp.rs
>> +++ b/drivers/gpu/nova-core/firmware/fsp.rs
>
> [ ... ]
>
>> @@ -41,7 +63,73 @@ pub(crate) fn new(
>>  
>>          Ok(Self {
>>              fmc_image,
>> -            fmc_elf: fw,
>> +            fmc_sigs: Self::extract_fmc_signatures(&fw, dev)?,
>>          })
>>      }
>> +
>> +    /// Extract FMC firmware signatures for Chain of Trust verification.
>
> [ ... ]
>
>> +        // Initialize the signatures in place to avoid building the large `FmcSignatures` on the
>> +        // stack, then fill each section from the firmware.
>> +        let signatures = KBox::init(
>> +            init!(FmcSignatures {
>> +                hash384: [0; _],
>> +                public_key: [0; _],
>> +                signature: [0; _],
>> +            })
>
> [Severity: Medium]
> Does this init! block allocate these arrays on the stack before moving them 
> to the heap?
>
> According to the Rust subsystem coding guidelines, fields that require in-place
> initialization should use the `field <- expr` syntax rather than `field: expr`. 
>
> Because `hash384: [0; _]` and the other fields use the colon syntax, the
> arrays (which are up to 384 bytes each) might be evaluated as temporaries on
> the stack, which defeats the goal mentioned in the comment above.
>
> Could these be updated to use the in-place initialization syntax, such as 
> `hash384 <- kernel::init::zeroed()` or similar?

That's a good observation. One can also derive `Zeroable` on
`FmcSignatures` to simplify further:

--- a/drivers/gpu/nova-core/firmware/fsp.rs
+++ b/drivers/gpu/nova-core/firmware/fsp.rs
@@ -29,7 +29,7 @@
 /// Structure to hold FMC signatures.
 ///
 /// C representation is used because this type is used for communication with the FSP.
-#[derive(Debug, Clone, Copy)]
+#[derive(Debug, Clone, Copy, Zeroable)]
 #[repr(C)]
 pub(crate) struct FmcSignatures {
     pub(crate) hash384: [u8; FSP_HASH_SIZE],
@@ -113,12 +113,7 @@ fn extract_fmc_signatures(
         // Initialize the signatures in place to avoid building the large `FmcSignatures` on the
         // stack, then fill each section from the firmware.
         let signatures = KBox::init(
-            init!(FmcSignatures {
-                hash384: [0; _],
-                public_key: [0; _],
-                signature: [0; _],
-            })
-            .chain(|sigs| {
+            pin_init::init_zeroed::<FmcSignatures>().chain(|sigs| {
                 // PANIC: src and dst lengths are both FSP_HASH_SIZE (verified above).
                 sigs.hash384.copy_from_slice(hash_section);
                 // PANIC: dst is sliced to src.len(); src.len() <= FSP_PKEY_SIZE per `get_section`.

I will apply this chunk when pushing.
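
For readers outside the kernel tree, the same idea (zero-initialize on the heap, then fill in place) can be sketched in userspace Rust. This uses `std::alloc::alloc_zeroed` as a stand-in for `pin_init::init_zeroed()`, which is an assumption made for illustration; `zeroed_signatures` and `fill` are hypothetical helper names, and the struct mirrors the sizes from the patch.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

const FSP_HASH_SIZE: usize = 48;
const FSP_PKEY_SIZE: usize = 384;
const FSP_SIG_SIZE: usize = 384;

#[repr(C)]
struct FmcSignatures {
    hash384: [u8; FSP_HASH_SIZE],
    public_key: [u8; FSP_PKEY_SIZE],
    signature: [u8; FSP_SIG_SIZE],
}

/// Allocates a zero-filled `FmcSignatures` directly on the heap.
fn zeroed_signatures() -> Box<FmcSignatures> {
    let layout = Layout::new::<FmcSignatures>();
    // SAFETY: the layout has non-zero size, and an all-zero bit pattern is a
    // valid `FmcSignatures` because every field is a byte array.
    unsafe {
        let ptr = alloc_zeroed(layout).cast::<FmcSignatures>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    }
}

/// Fills the hash and key in place; a shorter key leaves the tail zero-padded.
/// Returns `None` if the key is longer than the destination field.
fn fill(hash: &[u8; FSP_HASH_SIZE], key: &[u8]) -> Option<Box<FmcSignatures>> {
    let mut sigs = zeroed_signatures();
    sigs.hash384.copy_from_slice(hash);
    sigs.public_key.get_mut(..key.len())?.copy_from_slice(key);
    Some(sigs)
}
```

The 816-byte structure (48 + 384 + 384) is never built as a stack temporary here; only the pointer moves.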


* Re: [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers
  2026-06-02  3:30   ` sashiko-bot
@ 2026-06-02  8:00     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02  8:00 UTC (permalink / raw)
  To: sashiko-bot
  Cc: sashiko-reviews, John Hubbard, Danilo Krummrich, Alice Ryhl,
	Shashank Sharma, LKML, Miguel Ojeda, Gary Guo, Alex Gaynor,
	Alistair Popple, Trevor Gross, Timur Tabi, Boqun Feng,
	Eliot Courtney, Björn Roy Baron, Bjorn Helgaas, nova-gpu,
	Benno Lossin, Zhi Wang, Simona Vetter, Andreas Hindborg,
	David Airlie

On Tue Jun 2, 2026 at 12:30 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Hardware match requirement violation during sysmem flush configuration on GB10x.
> --
>
> commit 4c62a61ec72a7cc29b5d20a82279ea5490f2e759
> Author: John Hubbard <jhubbard@nvidia.com>
>
> gpu: nova-core: Blackwell: use correct sysmem flush registers
>
> This commit updates the Blackwell GPUs to route the sysmem flush page
> registers to their proper locations. It defines the registers relative to
> HSHUB0 for GB10x and FBHUB0 for GB20x, and implements the flush paths in
> their respective HALs.
>
>> diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
>> index 8d63350abf8aa..ecea4ff446fff 100644
>> --- a/drivers/gpu/nova-core/fb/hal/gb100.rs
>> +++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
>
> [ ... ]
>
>> +/// Write the sysmem flush page address through the GB10x HSHUB0 registers.
>> +///
>> +/// Both the primary and EG (egress) register pairs must be programmed to the same address,
>> +/// as required by hardware.
>> +fn write_sysmem_flush_page_gb100(bar: &Bar0, addr: Bounded<u64, 52>) {
>> +    // CAST: lower 32 bits. Hardware ignores bits 7:0.
>> +    let addr_lo = *addr as u32;
>> +    let addr_hi = addr.shr::<32, 20>().cast::<u32>();
>> +
>> +    // Write HI first. The hardware will trigger the flush on the LO write.
>> +
>> +    // Primary HSHUB pair.
>> +    bar.write(
>> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::of::<Gb100>(),
>> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_HI::zeroed().with_adr(addr_hi),
>> +    );
>> +    bar.write(
>> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::of::<Gb100>(),
>> +        regs::NV_PFB_HSHUB_PCIE_FLUSH_SYSMEM_ADDR_LO::zeroed().with_adr(addr_lo),
>> +    );
>
> [Severity: High]
> Does writing the primary LO register here trigger the flush before the EG
> registers are updated to match?
>
> The comments state that the hardware triggers the flush on the LO write, and
> that both the primary and EG pairs must be programmed to the same address.
> Because the EG pair is written after this primary LO register write, could
> the hardware perform the flush while the EG registers still contain their old,
> mismatched address?

After checking, the sequence is identical to Open RM's, so this
particular concern does not look valid.
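
For reference, the write ordering under discussion can be modeled with a small userspace sketch that records register writes instead of touching hardware. The `Reg` enum and both helpers are hypothetical, and the HI-before-LO order within the EG pair is an assumption carried over from the "Write HI first" comment in the quoted patch.

```rust
/// Hypothetical register identifiers; real offsets are not modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reg {
    PrimaryHi,
    PrimaryLo,
    EgHi,
    EgLo,
}

/// Splits a 52-bit address into the 32-bit low word and the 20-bit high word.
fn split_addr(addr: u64) -> Option<(u32, u32)> {
    if addr >> 52 != 0 {
        return None;
    }
    Some((addr as u32, (addr >> 32) as u32))
}

/// Records writes in order: HI before LO for the primary pair, then the EG pair.
fn write_flush_page(addr: u64, log: &mut Vec<(Reg, u32)>) -> Option<()> {
    let (lo, hi) = split_addr(addr)?;
    log.push((Reg::PrimaryHi, hi));
    log.push((Reg::PrimaryLo, lo));
    log.push((Reg::EgHi, hi));
    log.push((Reg::EgLo, lo));
    Some(())
}
```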


* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-02  3:21 ` [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction John Hubbard
  2026-06-02  3:32   ` sashiko-bot
@ 2026-06-02  8:11   ` Eliot Courtney
  2026-06-02  8:28     ` Alexandre Courbot
  2026-06-03  0:04   ` Timur Tabi
  2 siblings, 1 reply; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02  8:11 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Extract the SHA-384 hash, RSA public key, and RSA signature from the
> FMC ELF32 firmware sections. FSP Chain of Trust verification needs
> these to validate the FMC image during boot.
>
> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware/fsp.rs | 94 ++++++++++++++++++++++++++-
>  1 file changed, 91 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
> index 011be1e571c2..db61905eac9d 100644
> --- a/drivers/gpu/nova-core/firmware/fsp.rs
> +++ b/drivers/gpu/nova-core/firmware/fsp.rs
> @@ -15,13 +15,35 @@
>      gpu::Chipset, //
>  };
>  
> +/// Size of the FSP SHA-384 hash, in bytes.
> +const FSP_HASH_SIZE: usize = 48;
> +/// Maximum size of the FSP public key (RSA-3072), in bytes.
> +///
> +/// The FMC ELF `publickey` section may be shorter, so the remaining bytes are zero-padded.
> +const FSP_PKEY_SIZE: usize = 384;
> +/// Maximum size of the FSP signature (RSA-3072), in bytes.
> +///
> +/// The FMC ELF `signature` section may be shorter, so the remaining bytes are zero-padded.
> +const FSP_SIG_SIZE: usize = 384;
> +
> +/// Structure to hold FMC signatures.
> +///
> +/// C representation is used because this type is used for communication with the FSP.
> +#[derive(Debug, Clone, Copy)]
> +#[repr(C)]
> +pub(crate) struct FmcSignatures {
> +    pub(crate) hash384: [u8; FSP_HASH_SIZE],
> +    pub(crate) public_key: [u8; FSP_PKEY_SIZE],
> +    pub(crate) signature: [u8; FSP_SIG_SIZE],
> +}
> +
>  pub(crate) struct FspFirmware {
>      /// FMC firmware image data (only the "image" ELF section).
>      #[expect(dead_code)]
>      pub(crate) fmc_image: Coherent<[u8]>,
> -    /// Full FMC ELF for signature extraction.
> +    /// FMC firmware signatures.
>      #[expect(dead_code)]
> -    pub(crate) fmc_elf: Firmware,
> +    pub(crate) fmc_sigs: KBox<FmcSignatures>,
>  }
>  
>  impl FspFirmware {
> @@ -41,7 +63,73 @@ pub(crate) fn new(
>  
>          Ok(Self {
>              fmc_image,
> -            fmc_elf: fw,
> +            fmc_sigs: Self::extract_fmc_signatures(&fw, dev)?,
>          })
>      }
> +
> +    /// Extract FMC firmware signatures for Chain of Trust verification.
> +    ///
> +    /// Extracts real cryptographic signatures from FMC ELF32 firmware sections.
> +    /// Returns signatures in a heap-allocated structure to prevent stack overflow.
> +    fn extract_fmc_signatures(
> +        fmc_fw: &Firmware,
> +        dev: &device::Device,
> +    ) -> Result<KBox<FmcSignatures>> {
> +        let get_section = |name: &str, max_len: usize| {
> +            elf::elf_section(fmc_fw.data(), name)
> +                .ok_or(EINVAL)
> +                .inspect_err(|_| dev_err!(dev, "FMC firmware missing '{}' section\n", name))
> +                .and_then(|section| {
> +                    if section.len() > max_len {
> +                        dev_err!(
> +                            dev,
> +                            "FMC {} section size {} > maximum {}\n",
> +                            name,
> +                            section.len(),
> +                            max_len
> +                        );
> +                        Err(EINVAL)
> +                    } else {
> +                        Ok(section)
> +                    }
> +                })
> +        };
> +
> +        let hash_section = get_section("hash", FSP_HASH_SIZE)?;
> +        let pkey_section = get_section("publickey", FSP_PKEY_SIZE)?;
> +        let sig_section = get_section("signature", FSP_SIG_SIZE)?;
> +
> +        // The hash section is a SHA-384 output: it must be exactly FSP_HASH_SIZE bytes.
> +        if hash_section.len() != FSP_HASH_SIZE {
> +            dev_err!(
> +                dev,
> +                "FMC hash section size {} != expected {}\n",
> +                hash_section.len(),
> +                FSP_HASH_SIZE
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        // Initialize the signatures in place to avoid building the large `FmcSignatures` on the
> +        // stack, then fill each section from the firmware.
> +        let signatures = KBox::init(
> +            init!(FmcSignatures {
> +                hash384: [0; _],
> +                public_key: [0; _],
> +                signature: [0; _],
> +            })

This proc macro will generate code along the lines of
`let field = [0; _];`, which it then writes into the final init location.
So the array is still stack-ish storage, although I guess it'll be
optimised out.

optional nit: may be better to derive Zeroable and use ..Zeroable::init_zeroed()
here.
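
For illustration, here is the zero-then-fill idea as a minimal sketch in
plain userspace Rust. It is not the kernel code: `alloc_zeroed` stands in
for a zeroing in-place initializer, and `zeroed_signatures` and
`fill_padded` are hypothetical names. Only the struct layout and the sizes
come from the patch.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

const FSP_HASH_SIZE: usize = 48;
const FSP_PKEY_SIZE: usize = 384;
const FSP_SIG_SIZE: usize = 384;

#[repr(C)]
pub struct FmcSignatures {
    pub hash384: [u8; FSP_HASH_SIZE],
    pub public_key: [u8; FSP_PKEY_SIZE],
    pub signature: [u8; FSP_SIG_SIZE],
}

/// Allocates an all-zero `FmcSignatures` directly on the heap, without
/// building any of the arrays on the stack first (hypothetical helper).
pub fn zeroed_signatures() -> Box<FmcSignatures> {
    let layout = Layout::new::<FmcSignatures>();
    // SAFETY: `layout` has a non-zero size, the pointer is checked for null
    // before use, and all-zero bytes are a valid `FmcSignatures` because the
    // struct only contains `u8` arrays.
    unsafe {
        let ptr = alloc_zeroed(layout).cast::<FmcSignatures>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    }
}

/// Copies `src` to the start of `dst` and leaves the tail untouched, so a
/// zeroed `dst` ends up zero-padded. Returns `None` if `src` does not fit.
pub fn fill_padded(dst: &mut [u8], src: &[u8]) -> Option<()> {
    dst.get_mut(..src.len())?.copy_from_slice(src);
    Some(())
}
```

With this shape, a short `publickey` or `signature` section only overwrites
the leading bytes and the remainder stays zero from the allocation.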

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting
  2026-06-02  7:56   ` Eliot Courtney
@ 2026-06-02  8:22     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02  8:22 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 4:56 PM JST, Eliot Courtney wrote:
> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>> Hopper and Blackwell use FSP instead of SEC2 for secure boot. The
>> driver must wait for FSP secure boot to complete before continuing
>> with GSP bring-up. Poll for boot success with a 5-second timeout, and
>> return the FSP interface only on success so that later Chain of Trust
>> operations cannot run before FSP is ready. The interface owns the FSP
>> falcon and the FMC firmware.
>>
>> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/falcon/fsp.rs    |  1 -
>>  drivers/gpu/nova-core/fsp.rs           | 73 ++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/fsp/hal.rs       | 27 ++++++++++
>>  drivers/gpu/nova-core/fsp/hal/gb202.rs | 23 ++++++++
>>  drivers/gpu/nova-core/fsp/hal/gh100.rs | 23 ++++++++
>>  drivers/gpu/nova-core/gsp/hal/gh100.rs |  6 ++-
>>  drivers/gpu/nova-core/nova_core.rs     |  1 +
>>  drivers/gpu/nova-core/regs.rs          | 36 +++++++++++++
>>  8 files changed, 187 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/gpu/nova-core/fsp.rs
>>  create mode 100644 drivers/gpu/nova-core/fsp/hal.rs
>>  create mode 100644 drivers/gpu/nova-core/fsp/hal/gb202.rs
>>  create mode 100644 drivers/gpu/nova-core/fsp/hal/gh100.rs
>>
>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> index c4a9ce8a47f8..d9f87262e8b1 100644
>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>> @@ -15,7 +15,6 @@
>>  };
>>  
>>  /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
>> -#[expect(dead_code)]
>>  pub(crate) struct Fsp(());
>>  
>>  impl RegisterBase<PFalconBase> for Fsp {
>> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
>> new file mode 100644
>> index 000000000000..f3524137d9f7
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/fsp.rs
>> @@ -0,0 +1,73 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>> +
>> +//! FSP (Foundation Security Processor) interface for Hopper/Blackwell GPUs.
>> +//!
>> +//! Hopper/Blackwell use a simplified firmware boot sequence: FMC, then FSP, then GSP.
>> +//! Unlike Turing/Ampere/Ada, there is no SEC2 (Security Engine 2) usage.
>> +//! FSP handles secure boot directly using FMC firmware and Chain of Trust.
>> +
>> +use kernel::{
>> +    device,
>> +    io::poll::read_poll_timeout,
>> +    prelude::*,
>> +    time::Delta, //
>> +};
>> +
>> +use crate::{
>> +    driver::Bar0,
>> +    falcon::{
>> +        fsp::Fsp as FspEngine,
>> +        Falcon, //
>> +    },
>> +    firmware::fsp::FspFirmware,
>> +    gpu::Chipset,
>> +    regs, //
>> +};
>> +
>> +mod hal;
>> +
>> +/// FSP interface for Hopper/Blackwell GPUs.
>> +///
>> +/// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
>> +/// has completed. It owns the FSP falcon and the FMC firmware, which are used for the subsequent
>> +/// Chain of Trust boot.
>> +pub(crate) struct Fsp {
>> +    #[expect(dead_code)]
>> +    falcon: Falcon<FspEngine>,
>> +    #[expect(dead_code)]
>> +    fsp_fw: FspFirmware,
>> +}
>> +
>> +impl Fsp {
>> +    /// Waits for FSP secure boot completion, then returns the [`Fsp`] interface.
>> +    ///
>> +    /// Polls the thermal scratch register until FSP signals boot completion or the timeout
>> +    /// elapses. Returning an [`Fsp`] only on success guarantees, at the API level, that the
>> +    /// interface is not used before secure boot has completed.
>> +    pub(crate) fn wait_secure_boot(
>> +        dev: &device::Device<device::Bound>,
>> +        bar: &Bar0,
>> +        chipset: Chipset,
>> +        fsp_fw: FspFirmware,
>
> What about constructing FspFirmware inside `wait_secure_boot`? It fits
> the concept of having this Fsp object own and control the FSP. This also
> matches the pattern of Gsp::boot creating its own GspFirmware.

That makes sense. On the other hand, since this method is named after its
most important side effect, its name does not carry the expectation of
loading a firmware image. So after consideration I think it makes sense to
keep firmware loading as a separate step. I would say differently if this
were named "new", but then we would lose the important fact that it also
touches the hardware and waits on it.

At the end of the day, `Fsp` ends up owning the `FspFirmware`, so the
most important architectural point is addressed.
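
To make the ownership point concrete, a minimal userspace sketch follows.
The types are stand-ins, not the driver's: `boot_done` and `max_polls`
replace the register poll and timeout, and `BootError` and `firmware_len`
are hypothetical.

```rust
/// Stand-in for the already-loaded FMC firmware (hypothetical; the real type
/// holds the image and the signatures).
pub struct FspFirmware {
    pub image: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BootError {
    Timeout,
}

/// Only obtainable from `wait_secure_boot`, so holding an `Fsp` implies that
/// secure boot was observed to complete.
pub struct Fsp {
    fsp_fw: FspFirmware,
}

impl Fsp {
    /// Polls `boot_done` up to `max_polls` times. The firmware is loaded by
    /// the caller and moved in here, so `Fsp` ends up owning it.
    pub fn wait_secure_boot(
        mut boot_done: impl FnMut() -> bool,
        max_polls: u32,
        fsp_fw: FspFirmware,
    ) -> Result<Fsp, BootError> {
        for _ in 0..max_polls {
            if boot_done() {
                return Ok(Fsp { fsp_fw });
            }
        }
        Err(BootError::Timeout)
    }

    /// Size of the owned firmware image, in bytes.
    pub fn firmware_len(&self) -> usize {
        self.fsp_fw.image.len()
    }
}
```

Loading stays a visible, separate step at the call site, while the value
returned on success is still the sole owner of the firmware.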

>
>> +    ) -> Result<Fsp> {
>> +        /// FSP secure boot completion timeout in milliseconds.
>> +        const FSP_SECURE_BOOT_TIMEOUT_MS: i64 = 5000;
>> +
>> +        let hal = hal::fsp_hal(chipset).ok_or(ENOTSUPP)?;
>> +        let falcon = Falcon::<FspEngine>::new(dev, chipset)?;
>> +
>> +        read_poll_timeout(
>> +            || Ok(hal.fsp_boot_status(bar)),
>> +            |&status| status == regs::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS,
>> +            Delta::from_millis(10),
>> +            Delta::from_millis(FSP_SECURE_BOOT_TIMEOUT_MS),
>> +        )
>> +        .map_err(|_| {
>> +            dev_err!(dev, "FSP secure boot completion timeout\n");
>> +            ETIMEDOUT
>> +        })?;
>
> nit: this can just be inspect_err(), it will be ETIMEDOUT if it times
> out.

Indeed, I'll fixup when applying.
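
Roughly, the difference looks like this in a userspace sketch.
`Result::inspect_err` is the standard library method; `poll_until`, `Error`,
`wait_boot`, and the `BOOT_SUCCESS` value are hypothetical stand-ins for the
kernel helpers, and a `Vec<String>` replaces `dev_err!` so the logging is
observable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    TimedOut,
}

/// Hypothetical poll helper: calls `read` up to `max_polls` times and
/// returns the first value accepted by `done`, or `Error::TimedOut`.
pub fn poll_until<T>(
    mut read: impl FnMut() -> T,
    done: impl Fn(&T) -> bool,
    max_polls: u32,
) -> Result<T, Error> {
    for _ in 0..max_polls {
        let value = read();
        if done(&value) {
            return Ok(value);
        }
    }
    Err(Error::TimedOut)
}

/// Waits for the boot status to report success. The error is only observed
/// for logging and then passed through unchanged, so no `map_err` is needed.
pub fn wait_boot(status: impl FnMut() -> u32, log: &mut Vec<String>) -> Result<u32, Error> {
    // Placeholder success value for this sketch.
    const BOOT_SUCCESS: u32 = 0xff;

    poll_until(status, |&s| s == BOOT_SUCCESS, 4)
        .inspect_err(|e| log.push(format!("FSP secure boot completion failed: {e:?}")))
}
```

Since the poll helper already reports the timeout error, `inspect_err` keeps
the log message without restating the error code.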

>
>> +
>> +        Ok(Fsp { falcon, fsp_fw })
>> +    }
>> +}
>> diff --git a/drivers/gpu/nova-core/fsp/hal.rs b/drivers/gpu/nova-core/fsp/hal.rs
>> new file mode 100644
>> index 000000000000..83d1e7daa998
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/fsp/hal.rs
>> @@ -0,0 +1,27 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>> +
>> +use crate::{
>> +    driver::Bar0,
>> +    gpu::{
>> +        Architecture,
>> +        Chipset, //
>> +    },
>> +};
>> +
>> +mod gb202;
>> +mod gh100;
>> +
>> +pub(super) trait FspHal {
>> +    /// Returns the secure boot status from the architecture-specific `NV_THERM_I2CS_SCRATCH` register.
>> +    fn fsp_boot_status(&self, bar: &Bar0) -> u32;
>> +}
>> +
>> +/// Returns the FSP HAL, or `None` if the architecture doesn't support FSP.
>> +pub(crate) fn fsp_hal(chipset: Chipset) -> Option<&'static dyn FspHal> {
>
> nit: this can be pub(super)

Same.

>
> With above changes,
>
> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

Thanks!

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers
  2026-06-02  7:12   ` Eliot Courtney
@ 2026-06-02  8:26     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02  8:26 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 4:12 PM JST, Eliot Courtney wrote:
> On Tue Jun 2, 2026 at 12:20 PM JST, John Hubbard wrote:
>> Blackwell GPUs moved the sysmem flush page registers away from the
>> Ampere/Ada location. GB10x routes the flush through a pair of HSHUB0
>> register sets (primary and egress) that must both be programmed to
>> the same address. GB20x routes it through FBHUB0.
>>
>> Define these registers relative to their HSHUB0 and FBHUB0 bases, as
>> Open RM does, and implement the flush paths in the GB10x and GB20x
>> framebuffer HALs.
>>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>
> This patch looks correct so:
>
> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
>
> But it only updates the sysmem flush stuff for Blackwell. It looks like
> Hopper also uses different registers, so I think we should update it
> too (separate patch seems ok to me).

Indeed, OpenRM agrees with you. Hopper support seems to be rather
trivial (just 2 different registers to use IIUC).

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-02  8:11   ` Eliot Courtney
@ 2026-06-02  8:28     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02  8:28 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 5:11 PM JST, Eliot Courtney wrote:
> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>> Extract the SHA-384 hash, RSA public key, and RSA signature from the
>> FMC ELF32 firmware sections. FSP Chain of Trust verification needs
>> these to validate the FMC image during boot.
>>
>> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/firmware/fsp.rs | 94 ++++++++++++++++++++++++++-
>>  1 file changed, 91 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
>> index 011be1e571c2..db61905eac9d 100644
>> --- a/drivers/gpu/nova-core/firmware/fsp.rs
>> +++ b/drivers/gpu/nova-core/firmware/fsp.rs
>> @@ -15,13 +15,35 @@
>>      gpu::Chipset, //
>>  };
>>  
>> +/// Size of the FSP SHA-384 hash, in bytes.
>> +const FSP_HASH_SIZE: usize = 48;
>> +/// Maximum size of the FSP public key (RSA-3072), in bytes.
>> +///
>> +/// The FMC ELF `publickey` section may be shorter, so the remaining bytes are zero-padded.
>> +const FSP_PKEY_SIZE: usize = 384;
>> +/// Maximum size of the FSP signature (RSA-3072), in bytes.
>> +///
>> +/// The FMC ELF `signature` section may be shorter, so the remaining bytes are zero-padded.
>> +const FSP_SIG_SIZE: usize = 384;
>> +
>> +/// Structure to hold FMC signatures.
>> +///
>> +/// C representation is used because this type is used for communication with the FSP.
>> +#[derive(Debug, Clone, Copy)]
>> +#[repr(C)]
>> +pub(crate) struct FmcSignatures {
>> +    pub(crate) hash384: [u8; FSP_HASH_SIZE],
>> +    pub(crate) public_key: [u8; FSP_PKEY_SIZE],
>> +    pub(crate) signature: [u8; FSP_SIG_SIZE],
>> +}
>> +
>>  pub(crate) struct FspFirmware {
>>      /// FMC firmware image data (only the "image" ELF section).
>>      #[expect(dead_code)]
>>      pub(crate) fmc_image: Coherent<[u8]>,
>> -    /// Full FMC ELF for signature extraction.
>> +    /// FMC firmware signatures.
>>      #[expect(dead_code)]
>> -    pub(crate) fmc_elf: Firmware,
>> +    pub(crate) fmc_sigs: KBox<FmcSignatures>,
>>  }
>>  
>>  impl FspFirmware {
>> @@ -41,7 +63,73 @@ pub(crate) fn new(
>>  
>>          Ok(Self {
>>              fmc_image,
>> -            fmc_elf: fw,
>> +            fmc_sigs: Self::extract_fmc_signatures(&fw, dev)?,
>>          })
>>      }
>> +
>> +    /// Extract FMC firmware signatures for Chain of Trust verification.
>> +    ///
>> +    /// Extracts real cryptographic signatures from FMC ELF32 firmware sections.
>> +    /// Returns signatures in a heap-allocated structure to prevent stack overflow.
>> +    fn extract_fmc_signatures(
>> +        fmc_fw: &Firmware,
>> +        dev: &device::Device,
>> +    ) -> Result<KBox<FmcSignatures>> {
>> +        let get_section = |name: &str, max_len: usize| {
>> +            elf::elf_section(fmc_fw.data(), name)
>> +                .ok_or(EINVAL)
>> +                .inspect_err(|_| dev_err!(dev, "FMC firmware missing '{}' section\n", name))
>> +                .and_then(|section| {
>> +                    if section.len() > max_len {
>> +                        dev_err!(
>> +                            dev,
>> +                            "FMC {} section size {} > maximum {}\n",
>> +                            name,
>> +                            section.len(),
>> +                            max_len
>> +                        );
>> +                        Err(EINVAL)
>> +                    } else {
>> +                        Ok(section)
>> +                    }
>> +                })
>> +        };
>> +
>> +        let hash_section = get_section("hash", FSP_HASH_SIZE)?;
>> +        let pkey_section = get_section("publickey", FSP_PKEY_SIZE)?;
>> +        let sig_section = get_section("signature", FSP_SIG_SIZE)?;
>> +
>> +        // The hash section is a SHA-384 output: it must be exactly FSP_HASH_SIZE bytes.
>> +        if hash_section.len() != FSP_HASH_SIZE {
>> +            dev_err!(
>> +                dev,
>> +                "FMC hash section size {} != expected {}\n",
>> +                hash_section.len(),
>> +                FSP_HASH_SIZE
>> +            );
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        // Initialize the signatures in place to avoid building the large `FmcSignatures` on the
>> +        // stack, then fill each section from the firmware.
>> +        let signatures = KBox::init(
>> +            init!(FmcSignatures {
>> +                hash384: [0; _],
>> +                public_key: [0; _],
>> +                signature: [0; _],
>> +            })
>
> This proc macro will generate code along the lines of
> `let field = [0; _];`, which it then writes into the final init location.
> So the array is still stack-ish storage, although I guess it'll be
> optimised out.
>
> optional nit: may be better to derive Zeroable and use ..Zeroable::init_zeroed()
> here.
>
> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

Yup, this was also picked up by Sashiko. I ended up doing exactly what
you suggested [1].

[1] https://lore.kernel.org/all/DIYEINL9REVG.3484XPOLDZ0KJ@nvidia.com/

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations
  2026-06-02  3:21 ` [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations John Hubbard
@ 2026-06-02 11:42   ` Eliot Courtney
  2026-06-02 14:55     ` Alexandre Courbot
  2026-06-02 15:02   ` Alexandre Courbot
  1 sibling, 1 reply; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02 11:42 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Add external memory (EMEM) read/write operations to the GPU's FSP falcon
> engine. These operations use Falcon PIO (Programmed I/O) to communicate
> with the FSP through indirect memory access.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---


> +impl Falcon<Fsp> {
> +    /// Writes `data` to FSP external memory at byte `offset`.
> +    ///
> +    /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
> +    /// if `offset` or the `data` length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
> +        if offset % 4 != 0 || data.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut emem = Emem::new(bar);
> +        emem.begin_write(offset as usize)?;
> +        for chunk in data.chunks_exact(4) {
> +            emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Reads FSP external memory at byte `offset` into `data`.
> +    ///
> +    /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
> +    /// `offset` or the `data` length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
> +        if offset % 4 != 0 || data.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut emem = Emem::new(bar);
> +        emem.begin_read(offset as usize)?;
> +        for chunk in data.chunks_exact_mut(4) {
> +            chunk.copy_from_slice(&emem.read_next().to_le_bytes());
> +        }
> +
> +        Ok(())
> +    }
> +}

Both `write_emem` and `read_emem` are only ever called with `offset` as
zero. I checked openrm, and it looks like there aren't ever writes or
reads that don't start at zero. So we could simplify the code by
removing `offset` and starting from zero if we will never use a non-zero
offset (given we have auto-increment). This also lets us remove
`EMEM_MAX_SIZE` and some `Result`s.
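
A rough userspace sketch of what the offset-free accessors could look like
is below. `MockEmem` is a toy model of the auto-incrementing CTL/DATA pair,
not the real register interface, and `Misaligned` is a hypothetical error
type. The mock ignores out-of-range accesses, which says nothing about what
the hardware does.

```rust
/// Toy model of the EMEM CTL/DATA register pair with auto-increment.
pub struct MockEmem {
    mem: Vec<u32>,
    /// Current dword index, as if programmed through the control register.
    cursor: usize,
}

impl MockEmem {
    pub fn new(dwords: usize) -> Self {
        Self { mem: vec![0; dwords], cursor: 0 }
    }

    /// Models programming CTL with offset 0 and auto-increment enabled.
    fn rewind(&mut self) {
        self.cursor = 0;
    }

    /// Models a DATA write: store the word, then advance the cursor.
    fn write_next(&mut self, value: u32) {
        if let Some(slot) = self.mem.get_mut(self.cursor) {
            *slot = value;
        }
        self.cursor += 1;
    }

    /// Models a DATA read: fetch the word, then advance the cursor.
    fn read_next(&mut self) -> u32 {
        let value = self.mem.get(self.cursor).copied().unwrap_or(0);
        self.cursor += 1;
        value
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct Misaligned;

/// Writes `data` as little-endian dwords, always starting at offset 0.
pub fn write_emem(emem: &mut MockEmem, data: &[u8]) -> Result<(), Misaligned> {
    if data.len() % 4 != 0 {
        return Err(Misaligned);
    }
    emem.rewind();
    for chunk in data.chunks_exact(4) {
        emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Ok(())
}

/// Reads little-endian dwords into `data`, always starting at offset 0.
pub fn read_emem(emem: &mut MockEmem, data: &mut [u8]) -> Result<(), Misaligned> {
    if data.len() % 4 != 0 {
        return Err(Misaligned);
    }
    emem.rewind();
    for chunk in data.chunks_exact_mut(4) {
        chunk.copy_from_slice(&emem.read_next().to_le_bytes());
    }
    Ok(())
}
```

The only remaining failure mode in this sketch is a length that is not a
multiple of 4. The offset checks disappear along with the parameter.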

> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 2cb1f02f35a4..da7a10c0346a 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -475,6 +475,21 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
>      pub(crate) NV_PFALCON_FBIF_CTL(u32) @ PFalconBase + 0x00000624 {
>          7:7     allow_phys_no_ctx => bool;
>      }
> +
> +    // Falcon EMEM PIO registers (used by FSP on Hopper/Blackwell).
> +    // These provide the falcon external memory communication interface.
> +    pub(crate) NV_PFALCON_FALCON_EMEM_CTL(u32) @ PFalconBase + 0x00000ac0 {
> +        /// EMEM byte offset (must be 4-byte aligned).
> +        23:0    offset;
> +        /// Auto-increment the offset after each write.
> +        24:24   auto_increment_write => bool;
> +        /// Auto-increment the offset after each read.
> +        25:25   auto_increment_read => bool;
> +    }
> +
> +    pub(crate) NV_PFALCON_FALCON_EMEM_DATA(u32) @ PFalconBase + 0x00000ac4 {
> +        31:0    data => u32;
> +    }
>  }

In openrm, it looks like this register only has offset from 15:2 rather
than 23:0. Is the full 24 bit offset correct?

Either way, we could make the non-divisible-by-4 case unrepresentable by
making this offset 15:2 (or 23:2) rather than 23:0.
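
As a sketch of the "unrepresentable" idea, assuming the field really is
15:2 (which is the open question above): the caller passes a dword index,
and the low two bits of the byte offset are never touched. `EmemCtl` and its
methods are hypothetical; bits 24 and 25 are the auto-increment bits from
the patch.

```rust
/// Hypothetical model of the EMEM control value, with the offset field at
/// bits 15:2. Bits 1:0 are never set, so the byte offset is always a
/// multiple of 4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EmemCtl(u32);

impl EmemCtl {
    const OFFSET_SHIFT: u32 = 2;
    /// Width of the offset field: bits 15:2 are 14 bits.
    const OFFSET_BITS: u32 = 14;
    const OFFSET_MASK: u32 = ((1 << Self::OFFSET_BITS) - 1) << Self::OFFSET_SHIFT;
    /// Auto-increment after each write (bit 24).
    const AINCW: u32 = 1 << 24;
    /// Auto-increment after each read (bit 25).
    const AINCR: u32 = 1 << 25;

    fn with_flag(dword_index: u32, flag: u32) -> Option<Self> {
        if dword_index >= (1 << Self::OFFSET_BITS) {
            return None;
        }
        Some(Self((dword_index << Self::OFFSET_SHIFT) | flag))
    }

    /// Control value for a streaming write starting at `dword_index`.
    pub fn for_write(dword_index: u32) -> Option<Self> {
        Self::with_flag(dword_index, Self::AINCW)
    }

    /// Control value for a streaming read starting at `dword_index`.
    pub fn for_read(dword_index: u32) -> Option<Self> {
        Self::with_flag(dword_index, Self::AINCR)
    }

    /// Byte offset encoded in the value; always 4-byte aligned.
    pub fn byte_offset(self) -> u32 {
        self.0 & Self::OFFSET_MASK
    }

    /// Raw value as it would be written to the register.
    pub fn raw(self) -> u32 {
        self.0
    }
}
```

With the field defined this way, the `offset % 4 != 0` check has nothing
left to reject.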

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-02  3:21 ` [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure John Hubbard
  2026-06-02  3:33   ` sashiko-bot
@ 2026-06-02 12:21   ` Eliot Courtney
  2026-06-03  1:34     ` Alexandre Courbot
  2026-06-03  1:00   ` Alexandre Courbot
  2 siblings, 1 reply; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02 12:21 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> FSP communication uses a pair of non-circular queues in the FSP
> falcon's EMEM, one for messages from the driver to FSP and one for
> replies, with the driver polling for response data. Add the queue
> registers and the low-level helpers used by the higher-level FSP
> message layer.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/fsp.rs | 61 ++++++++++++++++++++++++++++-
>  drivers/gpu/nova-core/regs.rs       | 21 ++++++++++
>  2 files changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index 6b057d958115..0ec1c55213bc 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> @@ -112,7 +112,6 @@ impl Falcon<Fsp> {
>      ///
>      /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
>      /// if `offset` or the `data` length is not 4-byte aligned.
> -    #[expect(dead_code)]
>      fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>          if offset % 4 != 0 || data.len() % 4 != 0 {
>              return Err(EINVAL);
> @@ -131,7 +130,6 @@ fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>      ///
>      /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
>      /// `offset` or the `data` length is not 4-byte aligned.
> -    #[expect(dead_code)]
>      fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>          if offset % 4 != 0 || data.len() % 4 != 0 {
>              return Err(EINVAL);
> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>  
>          Ok(())
>      }
> +
> +    /// Poll FSP for incoming data.
> +    ///
> +    /// Returns the size of available data in bytes, or 0 if no data is available.
> +    ///
> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
> +    /// message exchange, so `tail >= head` is always true when data is present.
> +    #[expect(dead_code)]
> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
> +
> +        if head == tail {
> +            return 0;
> +        }
> +
> +        // TAIL points at last DWORD written, so add 4 to get total size
> +        tail.saturating_sub(head) + 4
> +    }

In a later patch, `send_sync_fsp` polls this and then calls `recv_msg`.
But structurally it's possible to pass any size to `recv_msg` and read
more than we are supposed to. What about having `recv_msg` do the
polling to get the size and return a KVec with the data it read,
instead of leaving that to `send_sync_fsp`? `poll_msgq` could stay private
and we can make it public later if we need to.
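
Something along these lines, sketched in userspace Rust with a toy queue.
`MockFsp` and `try_recv_msg` are hypothetical, `Vec` stands in for `KVec`,
and the sketch makes a single attempt instead of polling with a timeout.
The size computation is the one from `poll_msgq` above.

```rust
/// Toy FSP message queue: head/tail byte offsets plus the EMEM contents.
pub struct MockFsp {
    pub head: u32,
    pub tail: u32,
    pub emem: Vec<u8>,
}

impl MockFsp {
    /// Size of the pending data in bytes, or 0 if there is none. TAIL points
    /// at the last dword written, hence the `+ 4`.
    fn poll_msgq(&self) -> u32 {
        if self.head == self.tail {
            0
        } else {
            self.tail.saturating_sub(self.head) + 4
        }
    }

    /// Reads the pending message, if any, and resets the queue pointers.
    ///
    /// The size comes from the queue itself, so the caller cannot ask for
    /// more than what was reported. Returns `None` if nothing is pending or
    /// if the reported size exceeds the EMEM contents.
    pub fn try_recv_msg(&mut self) -> Option<Vec<u8>> {
        let size = usize::try_from(self.poll_msgq()).ok()?;
        if size == 0 {
            return None;
        }
        let data = self.emem.get(..size)?.to_vec();
        self.head = 0;
        self.tail = 0;
        Some(data)
    }
}
```

The caller then only sees an owned buffer of exactly the reported length,
and `poll_msgq` needs no visibility outside the type.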

> +
> +    /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
> +    ///
> +    /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
> +        if packet.is_empty() {
> +            return Err(EINVAL);
> +        }
> +
> +        // Write message to EMEM at offset 0 (validates 4-byte alignment)
> +        self.write_emem(bar, 0, packet)?;
> +
> +        // Update queue pointers. TAIL points at the last DWORD written.
> +        let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
> +        bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
> +        bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
> +
> +        Ok(())
> +    }
> +
> +    /// Reads `size` bytes from FSP EMEM into `buffer` and resets the queue pointers.
> +    ///
> +    /// `size` comes from `poll_msgq`. Returns `EINVAL` if `size` is 0, exceeds
> +    /// `buffer`, or is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    pub(crate) fn recv_msg(&mut self, bar: &Bar0, buffer: &mut [u8], size: usize) -> Result {
> +        if size == 0 || size > buffer.len() {
> +            return Err(EINVAL);
> +        }
> +
> +        // Read response from EMEM at offset 0 (validates 4-byte alignment)
> +        self.read_emem(bar, 0, &mut buffer[..size])?;
> +
> +        // Reset message queue pointers after reading
> +        bar.write_reg(regs::NV_PFSP_MSGQ_TAIL::zeroed().with_address(0));
> +        bar.write_reg(regs::NV_PFSP_MSGQ_HEAD::zeroed().with_address(0));
> +
> +        Ok(())
> +    }

I think we can remove the `size` argument and have the caller pass in
an appropriately sized slice (although obviated by my other comment).

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (21 preceding siblings ...)
  2026-06-02  3:21 ` [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path John Hubbard
@ 2026-06-02 12:38 ` Danilo Krummrich
  2026-06-02 13:37 ` Alexandre Courbot
  23 siblings, 0 replies; 65+ messages in thread
From: Danilo Krummrich @ 2026-06-02 12:38 UTC (permalink / raw)
  To: John Hubbard
  Cc: Alexandre Courbot, Timur Tabi, Alistair Popple, Eliot Courtney,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 5:20 AM CEST, John Hubbard wrote:
> * Made the FSP messaging path sound. The FSP falcon's EMEM window is a
>   stateful register pair (program an offset, then touch the data
>   register), so modeling it as a stateless I/O region let aliasing
>   accesses corrupt each other's offset with no unsafe at the call site.

It's not really that the problem was unsoundness: there was no Sync
requirement in the first place, as the type never left probe(). Had it gotten
a Sync requirement, we'd have had to use a lock to protect it.

>   The EMEM accessors and the send/receive helpers now take &mut self, so
>   the falcon handle is the exclusive token for an in-flight exchange,

Once this is used from a concurrent context, it will need a lock as well in
order to perform a mutable borrow.

So, it is not that one approach fundamentally works in a concurrent context and
the other doesn't.

It's more that the I/O backend approach turned out not to add value,
considering the implementation details (auto-increment streaming pattern) of
the EMEM window and how it is actually used (no random field access).
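
To illustrate the point about needing a lock either way, here is a minimal
userspace sketch. `Emem`, `exchange`, and `run_concurrent` are hypothetical,
and `std::sync::Mutex` stands in for whatever lock the driver would use.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy stateful window: every access goes through a shared cursor.
pub struct Emem {
    cursor: usize,
    mem: Vec<u32>,
}

impl Emem {
    pub fn new(dwords: usize) -> Self {
        Self { cursor: 0, mem: vec![0; dwords] }
    }

    /// Streams `words` in from offset 0, then streams them back out.
    /// Taking `&mut self` means nothing else can move the cursor meanwhile.
    /// Panics if `words` is longer than the window.
    pub fn exchange(&mut self, words: &[u32]) -> Vec<u32> {
        self.cursor = 0;
        for &w in words {
            self.mem[self.cursor] = w;
            self.cursor += 1;
        }
        self.cursor = 0;
        words
            .iter()
            .map(|_| {
                let value = self.mem[self.cursor];
                self.cursor += 1;
                value
            })
            .collect()
    }
}

/// Runs one exchange per thread. Each thread has to take the lock to obtain
/// the `&mut Emem` that `exchange` requires. Returns whether every thread
/// read back exactly what it wrote.
pub fn run_concurrent(shared: Arc<Mutex<Emem>>, threads: u32) -> bool {
    let handles: Vec<_> = (0..threads)
        .map(|id| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let words = [id; 4];
                let mut emem = shared.lock().unwrap();
                emem.exchange(&words) == words
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .fold(true, |all_ok, ok| all_ok && ok)
}
```

The `&mut self` signature does not remove the need for the lock. It only
makes the exclusivity requirement visible in the type.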

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication
  2026-06-02  3:21 ` [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication John Hubbard
  2026-06-02  5:36   ` sashiko-bot
@ 2026-06-02 12:53   ` Eliot Courtney
  1 sibling, 0 replies; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02 12:53 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Add the MCTP (Management Component Transport Protocol) and NVDM (NVIDIA
> Device Management) wire-format types used for communication between the

I think NVDM here is "data model" not "device management" (and below).

> kernel driver and GPU firmware processors.
>
> This includes typed MCTP transport headers, NVDM message headers, and
> NVDM message type identifiers. Both the FSP boot path and the upcoming
> GSP RPC message queue share this protocol layer.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/mctp.rs      | 102 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  2 files changed, 103 insertions(+)
>  create mode 100644 drivers/gpu/nova-core/mctp.rs
>
> diff --git a/drivers/gpu/nova-core/mctp.rs b/drivers/gpu/nova-core/mctp.rs
> new file mode 100644
> index 000000000000..a13146dc0cca
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mctp.rs
> @@ -0,0 +1,102 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> +
> +//! MCTP/NVDM protocol types for NVIDIA GPU firmware communication.
> +//!
> +//! MCTP (Management Component Transport Protocol) carries NVDM (NVIDIA
> +//! Device Management) messages between the kernel driver and GPU firmware
> +//! processors such as FSP and GSP.
> +
> +#![expect(dead_code)]
> +
> +use kernel::pci::Vendor;
> +
> +/// NVDM message type identifiers carried over MCTP.
> +#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
> +#[repr(u8)]
> +pub(crate) enum NvdmType {
> +    #[default]
> +    /// Chain of Trust boot message.
> +    Cot = 0x14,
> +    /// FSP command response.
> +    FspResponse = 0x15,
> +}
> +
> +impl TryFrom<u8> for NvdmType {
> +    type Error = u8;
> +
> +    fn try_from(value: u8) -> Result<Self, Self::Error> {
> +        match value {
> +            x if x == u8::from(Self::Cot) => Ok(Self::Cot),
> +            x if x == u8::from(Self::FspResponse) => Ok(Self::FspResponse),
> +            _ => Err(value),
> +        }
> +    }
> +}
> +
> +impl From<NvdmType> for u8 {
> +    fn from(value: NvdmType) -> Self {
> +        value as u8
> +    }
> +}
> +
> +bitfield! {
> +    pub(crate) struct MctpHeader(u32), "MCTP transport header for NVIDIA firmware messages." {
> +        31:31 som as bool, "Start-of-message bit.";
> +        30:30 eom as bool, "End-of-message bit.";
> +        29:28 seq as u8, "Packet sequence number.";
> +        23:16 seid as u8, "Source endpoint ID.";
> +    }
> +}
> +
> +impl MctpHeader {
> +    /// Builds a single-packet MCTP header (`SOM=1`, `EOM=1`, `SEQ=0`, `SEID=0`).
> +    pub(crate) fn single_packet() -> Self {
> +        Self::default().set_som(true).set_eom(true)
> +    }
> +
> +    /// Returns whether this is a complete single-packet message (`SOM=1` and `EOM=1`).
> +    pub(crate) fn is_single_packet(self) -> bool {
> +        self.som() && self.eom()
> +    }
> +}
> +
> +impl From<u32> for MctpHeader {
> +    fn from(raw: u32) -> Self {
> +        Self(raw)
> +    }
> +}

This `From` implementation looks unused in this series and the new bitfield
impl generates this IIUC, so I think we can remove this.

> +
> +/// MCTP message type for PCI vendor-defined messages.
> +const MSG_TYPE_VENDOR_PCI: u8 = 0x7e;
> +
> +bitfield! {
> +    pub(crate) struct NvdmHeader(u32), "NVIDIA Vendor-Defined Message header over MCTP." {
> +        31:24 nvdm_type as u8 ?=> NvdmType, "NVDM message type.";
> +        23:8 vendor_id as u16, "PCI vendor ID.";
> +        6:0 msg_type as u8, "MCTP vendor-defined message type.";
> +    }
> +}
> +
> +impl NvdmHeader {
> +    /// Builds an NVDM header for the given message type.
> +    pub(crate) fn new(nvdm_type: NvdmType) -> Self {
> +        Self::default()
> +            .set_msg_type(MSG_TYPE_VENDOR_PCI)
> +            .set_vendor_id(Vendor::NVIDIA.as_raw())
> +            .set_nvdm_type(nvdm_type)
> +    }
> +
> +    /// Validates this header against the expected NVIDIA NVDM format and type.
> +    pub(crate) fn validate(self, expected_type: NvdmType) -> bool {
> +        self.msg_type() == MSG_TYPE_VENDOR_PCI
> +            && self.vendor_id() == Vendor::NVIDIA.as_raw()
> +            && matches!(self.nvdm_type(), Ok(nvdm_type) if nvdm_type == expected_type)
> +    }
> +}
> +
> +impl From<u32> for NvdmHeader {
> +    fn from(raw: u32) -> Self {
> +        Self(raw)
> +    }
> +}

This `From` implementation looks unused.

With above comments resolved,

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 18/22] gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version
  2026-06-02  3:21 ` [PATCH v12 18/22] gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version John Hubbard
@ 2026-06-02 12:55   ` Eliot Courtney
  0 siblings, 0 replies; 65+ messages in thread
From: Eliot Courtney @ 2026-06-02 12:55 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Timur Tabi, Alistair Popple, Eliot Courtney, Shashank Sharma,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> The FSP Chain of Trust handshake is versioned: Hopper speaks version 1
> and Blackwell speaks version 2. Provide the version through the FSP HAL
> so the boot message carries the value FSP expects, and so chipsets that
> do not use FSP need not express a version at all.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support
  2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
                   ` (22 preceding siblings ...)
  2026-06-02 12:38 ` [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support Danilo Krummrich
@ 2026-06-02 13:37 ` Alexandre Courbot
  23 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02 13:37 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Timur Tabi, Alistair Popple, Eliot Courtney,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

The following patches:

>   gpu: nova-core: set DMA mask width based on GPU architecture
>   gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror
>   gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size
>   gpu: nova-core: Hopper/Blackwell: larger non-WPR heap
>   gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap
>   gpu: nova-core: Blackwell: use correct sysmem flush registers
>   gpu: nova-core: don't assume 64-bit firmware images
>   gpu: nova-core: add support for 32-bit firmware images
>   gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware images
>   gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub
>   gpu: nova-core: Hopper/Blackwell: add FMC firmware image
>   gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion
>     waiting
>   gpu: nova-core: Hopper/Blackwell: add FMC signature extraction

pushed to `drm-rust-next`, thanks! Let's have another (hopefully final)
round on the remaining ones.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations
  2026-06-02 11:42   ` Eliot Courtney
@ 2026-06-02 14:55     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02 14:55 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 8:42 PM JST, Eliot Courtney wrote:
> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>> Add external memory (EMEM) read/write operations to the GPU's FSP falcon
>> engine. These operations use Falcon PIO (Programmed I/O) to communicate
>> with the FSP through indirect memory access.
>>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>
>
>> +impl Falcon<Fsp> {
>> +    /// Writes `data` to FSP external memory at byte `offset`.
>> +    ///
>> +    /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
>> +    /// if `offset` or the `data` length is not 4-byte aligned.
>> +    #[expect(dead_code)]
>> +    fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>> +        if offset % 4 != 0 || data.len() % 4 != 0 {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        let mut emem = Emem::new(bar);
>> +        emem.begin_write(offset as usize)?;
>> +        for chunk in data.chunks_exact(4) {
>> +            emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
>> +        }
>> +
>> +        Ok(())
>> +    }
>> +
>> +    /// Reads FSP external memory at byte `offset` into `data`.
>> +    ///
>> +    /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
>> +    /// `offset` or the `data` length is not 4-byte aligned.
>> +    #[expect(dead_code)]
>> +    fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>> +        if offset % 4 != 0 || data.len() % 4 != 0 {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        let mut emem = Emem::new(bar);
>> +        emem.begin_read(offset as usize)?;
>> +        for chunk in data.chunks_exact_mut(4) {
>> +            chunk.copy_from_slice(&emem.read_next().to_le_bytes());
>> +        }
>> +
>> +        Ok(())
>> +    }
>> +}
>
> Both `write_emem` and `read_emem` are only ever called with `offset` as
> zero. I checked openrm, and it looks like there aren't ever writes or
> reads that don't start at zero. So we could simplify the code by
> removing `offset` and starting from zero if we will never use a non-zero
> offset (given we have auto-increment). This also lets us remove
> `EMEM_MAX_SIZE` and some `Result`s.

IIUC the way FSP communication works is that EMEM is (in theory) divided
into 8 channels. Only one message or reply is ever processed at a given
time (i.e. this is not a ring buffer), so the only valid values for
`offset` are the start addresses of the channels.

So while we might need to have some sort of pseudo-offset in the future,
a freeform one is definitely not adequate for EMEM and for the time
being we are only working with channel 0 anyway. Thus I agree that it
makes sense to remove it altogether for now.
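
A rough userspace sketch of what the offset-free helpers could look like.
`MockEmem` is a hypothetical model of the auto-incrementing window (a word
array plus a cursor), not the real register interface, and the string
errors stand in for `EINVAL`:

```rust
/// Hypothetical model of the EMEM window: a word array plus the hardware cursor.
struct MockEmem {
    words: Vec<u32>,
    cursor: usize,
}

impl MockEmem {
    fn new(size_words: usize) -> Self {
        Self { words: vec![0; size_words], cursor: 0 }
    }

    /// Stands in for programming the control register: always rewinds to word 0.
    fn begin(&mut self) {
        self.cursor = 0;
    }

    /// Stands in for a data-register write; the cursor auto-increments.
    fn write_next(&mut self, value: u32) {
        self.words[self.cursor] = value;
        self.cursor += 1;
    }

    /// Stands in for a data-register read; the cursor auto-increments.
    fn read_next(&mut self) -> u32 {
        let value = self.words[self.cursor];
        self.cursor += 1;
        value
    }
}

/// With no `offset`, only the length can be invalid.
fn check_len(emem: &MockEmem, len: usize) -> Result<(), &'static str> {
    if len % 4 != 0 {
        return Err("length is not 4-byte aligned");
    }
    if len / 4 > emem.words.len() {
        return Err("length exceeds the mock window");
    }
    Ok(())
}

/// Writes `data` as little-endian words, starting at word 0.
fn write_emem(emem: &mut MockEmem, data: &[u8]) -> Result<(), &'static str> {
    check_len(emem, data.len())?;
    emem.begin();
    for chunk in data.chunks_exact(4) {
        emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Ok(())
}

/// Reads little-endian words into `data`, starting at word 0.
fn read_emem(emem: &mut MockEmem, data: &mut [u8]) -> Result<(), &'static str> {
    check_len(emem, data.len())?;
    emem.begin();
    for chunk in data.chunks_exact_mut(4) {
        chunk.copy_from_slice(&emem.read_next().to_le_bytes());
    }
    Ok(())
}
```

The window-size check exists only because the mock is a bounded array; in
the driver the start of the burst would no longer need validating at all.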

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations
  2026-06-02  3:21 ` [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations John Hubbard
  2026-06-02 11:42   ` Eliot Courtney
@ 2026-06-02 15:02   ` Alexandre Courbot
  1 sibling, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-02 15:02 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Timur Tabi, Alistair Popple, Eliot Courtney,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Add external memory (EMEM) read/write operations to the GPU's FSP falcon
> engine. These operations use Falcon PIO (Programmed I/O) to communicate
> with the FSP through indirect memory access.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/fsp.rs | 130 ++++++++++++++++++++++++++--
>  drivers/gpu/nova-core/regs.rs       |  15 ++++
>  2 files changed, 140 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index d9f87262e8b1..6b057d958115 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> @@ -6,12 +6,28 @@
>  //! The FSP falcon handles secure boot and Chain of Trust operations
>  //! on Hopper and Blackwell architectures, replacing SEC2's role.
>  
> -use kernel::io::register::RegisterBase;
> +use kernel::{
> +    io::{
> +        register::{
> +            RegisterBase,
> +            WithBase, //
> +        },
> +        Io, //
> +    },
> +    num::Bounded,
> +    prelude::*,
> +    ptr::Alignment, //
> +};
>  
> -use crate::falcon::{
> -    FalconEngine,
> -    PFalcon2Base,
> -    PFalconBase, //
> +use crate::{
> +    driver::Bar0,
> +    falcon::{
> +        Falcon,
> +        FalconEngine,
> +        PFalcon2Base,
> +        PFalconBase, //
> +    },
> +    regs,
>  };
>  
>  /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
> @@ -26,3 +42,107 @@ impl RegisterBase<PFalcon2Base> for Fsp {
>  }
>  
>  impl FalconEngine for Fsp {}
> +
> +/// Maximum addressable EMEM size, derived from the 24-bit offset field
> +/// in `NV_PFALCON_FALCON_EMEM_CTL`.
> +const EMEM_MAX_SIZE: Alignment = Alignment::new::<{ 1 << 24 }>();
> +
> +/// I/O backend for the FSP falcon's external memory (EMEM).
> +///
> +/// `EMEM_CTL` is programmed once with a start offset and an auto-increment
> +/// mode, then each access to `EMEM_DATA` advances the offset by one 32-bit
> +/// word in hardware.
> +struct Emem<'a> {
> +    bar: &'a Bar0,
> +}
> +
> +impl<'a> Emem<'a> {
> +    fn new(bar: &'a Bar0) -> Self {
> +        Self { bar }
> +    }
> +
> +    /// Programs `EMEM_CTL` with the start byte `offset` and the `ctl` mode bits.
> +    ///
> +    /// Returns `EINVAL` if `offset` is outside the addressable EMEM window.
> +    fn program(&mut self, offset: usize, ctl: regs::NV_PFALCON_FALCON_EMEM_CTL) -> Result {
> +        let offset = Bounded::<usize, { EMEM_MAX_SIZE.log2() }>::try_new(offset)
> +            .map(Bounded::cast::<u32>)
> +            .ok_or(EINVAL)?;
> +
> +        self.bar
> +            .write(WithBase::of::<Fsp>(), ctl.with_offset(offset));
> +
> +        Ok(())
> +    }

If we follow Eliot's suggestion to drop `offset` for now, then I guess
this method will bring no extra benefit and can simply be inlined in
`begin_write` and `begin_read`.

> +
> +    /// Begins a write burst at byte `offset`, auto-incrementing on each write.
> +    fn begin_write(&mut self, offset: usize) -> Result {
> +        self.program(
> +            offset,
> +            regs::NV_PFALCON_FALCON_EMEM_CTL::zeroed().with_auto_increment_write(true),
> +        )
> +    }
> +
> +    /// Begins a read burst at byte `offset`, auto-incrementing on each read.
> +    fn begin_read(&mut self, offset: usize) -> Result {
> +        self.program(
> +            offset,
> +            regs::NV_PFALCON_FALCON_EMEM_CTL::zeroed().with_auto_increment_read(true),
> +        )
> +    }
> +
> +    /// Writes the next 32-bit `value`; hardware advances the offset.
> +    fn write_next(&mut self, value: u32) {
> +        self.bar.write(
> +            WithBase::of::<Fsp>(),
> +            regs::NV_PFALCON_FALCON_EMEM_DATA::zeroed().with_data(value),
> +        );
> +    }
> +
> +    /// Reads the next 32-bit word; hardware advances the offset.
> +    fn read_next(&mut self) -> u32 {
> +        self.bar
> +            .read(regs::NV_PFALCON_FALCON_EMEM_DATA::of::<Fsp>())
> +            .data()
> +    }
> +}
> +
> +impl Falcon<Fsp> {
> +    /// Writes `data` to FSP external memory at byte `offset`.
> +    ///
> +    /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
> +    /// if `offset` or the `data` length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
> +        if offset % 4 != 0 || data.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut emem = Emem::new(bar);
> +        emem.begin_write(offset as usize)?;
> +        for chunk in data.chunks_exact(4) {
> +            emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Reads FSP external memory at byte `offset` into `data`.
> +    ///
> +    /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
> +    /// `offset` or the `data` length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
> +        if offset % 4 != 0 || data.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut emem = Emem::new(bar);
> +        emem.begin_read(offset as usize)?;
> +        for chunk in data.chunks_exact_mut(4) {
> +            chunk.copy_from_slice(&emem.read_next().to_le_bytes());
> +        }
> +
> +        Ok(())
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 2cb1f02f35a4..da7a10c0346a 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -475,6 +475,21 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
>      pub(crate) NV_PFALCON_FBIF_CTL(u32) @ PFalconBase + 0x00000624 {
>          7:7     allow_phys_no_ctx => bool;
>      }
> +
> +    // Falcon EMEM PIO registers (used by FSP on Hopper/Blackwell).
> +    // These provide the falcon external memory communication interface.
> +    pub(crate) NV_PFALCON_FALCON_EMEM_CTL(u32) @ PFalconBase + 0x00000ac0 {

OpenRM calls this register `NV_PFSP_EMEMC`. For some reason it is
defined with an absolute address, even though at least one other variant
exists for the GSP. Thus I think it makes sense to keep it relative in
Nova, with maybe a name closer to its OpenRM counterpart, e.g.
`NV_PFALCON_FALCON_EMEMC`.

> +        /// EMEM byte offset (must be 4-byte aligned).
> +        23:0    offset;

In OpenRM this is actually two fields, `offs` and `blk`. The two lowest
bits are also not used; we should do the same, both for accuracy and to
make users enforce the 4-byte alignment.
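
As a standalone illustration of why dropping bits 1:0 enforces alignment
by construction: the sketch below treats `offs` and `blk` as one combined
22-bit word index at bits 23:2, since the split between the two fields is
not given in this thread. `ememc_value` and the constant names are
hypothetical; only the bit positions of the auto-increment flags come from
the quoted definition.

```rust
/// Auto-increment-on-write flag, bit 24 in the quoted register definition.
const AINCW: u32 = 1 << 24;
/// Auto-increment-on-read flag, bit 25 in the quoted register definition.
const AINCR: u32 = 1 << 25;
/// Width of the combined `offs`/`blk` span once bits 1:0 are dropped (23:2).
/// Treating the two fields as one index is an assumption of this sketch.
const WORD_INDEX_BITS: u32 = 22;

/// Builds a raw control value from a word index rather than a byte offset.
/// A misaligned offset cannot be expressed, so no runtime alignment check is needed.
fn ememc_value(word_index: u32, aincw: bool, aincr: bool) -> Option<u32> {
    if word_index >= (1 << WORD_INDEX_BITS) {
        return None;
    }
    let mut value = word_index << 2;
    if aincw {
        value |= AINCW;
    }
    if aincr {
        value |= AINCR;
    }
    Some(value)
}
```

Whatever the real field split is, the low two bits of the encoded value
stay zero for every accepted input.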

> +        /// Auto-increment the offset after each write.
> +        24:24   auto_increment_write => bool;
> +        /// Auto-increment the offset after each read.
> +        25:25   auto_increment_read => bool;

Similarly in OpenRM these fields are `aincw` and `aincr` - let's use the
same names for discoverability.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-02  3:21 ` [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction John Hubbard
  2026-06-02  3:32   ` sashiko-bot
  2026-06-02  8:11   ` Eliot Courtney
@ 2026-06-03  0:04   ` Timur Tabi
  2026-06-03  0:20     ` Alexandre Courbot
  2 siblings, 1 reply; 65+ messages in thread
From: Timur Tabi @ 2026-06-03  0:04 UTC (permalink / raw)
  To: Alexandre Courbot, dakr@kernel.org, John Hubbard
  Cc: Shashank Sharma, lossin@kernel.org, boqun.feng@gmail.com,
	a.hindborg@kernel.org, Zhi Wang, simona@ffwll.ch,
	alex.gaynor@gmail.com, ojeda@kernel.org, nova-gpu@lists.linux.dev,
	tmgross@umich.edu, linux-kernel@vger.kernel.org,
	bjorn3_gh@protonmail.com, Eliot Courtney, airlied@gmail.com,
	aliceryhl@google.com, bhelgaas@google.com, gary@garyguo.net,
	Alistair Popple

On Mon, 2026-06-01 at 20:21 -0700, John Hubbard wrote:
> +/// Size of the FSP SHA-384 hash, in bytes.
> +const FSP_HASH_SIZE: usize = 48;
> +/// Maximum size of the FSP public key (RSA-3072), in bytes.
> +///
> +/// The FMC ELF `publickey` section may be shorter, so the remaining bytes are zero-padded.
> +const FSP_PKEY_SIZE: usize = 384;
> +/// Maximum size of the FSP signature (RSA-3072), in bytes.
> +///
> +/// The FMC ELF `signature` section may be shorter, so the remaining bytes are zero-padded.
> +const FSP_SIG_SIZE: usize = 384;

How can we be certain that these lengths are really fixed?  I can add these length restrictions to
the script that generates the files, but I want to be really certain that they will be correct for
all time.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-03  0:04   ` Timur Tabi
@ 2026-06-03  0:20     ` Alexandre Courbot
  2026-06-03  3:09       ` Timur Tabi
  0 siblings, 1 reply; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  0:20 UTC (permalink / raw)
  To: Timur Tabi
  Cc: dakr@kernel.org, John Hubbard, Shashank Sharma, lossin@kernel.org,
	boqun.feng@gmail.com, a.hindborg@kernel.org, Zhi Wang,
	simona@ffwll.ch, alex.gaynor@gmail.com, ojeda@kernel.org,
	nova-gpu@lists.linux.dev, tmgross@umich.edu,
	linux-kernel@vger.kernel.org, bjorn3_gh@protonmail.com,
	Eliot Courtney, airlied@gmail.com, aliceryhl@google.com,
	bhelgaas@google.com, gary@garyguo.net, Alistair Popple

On Wed Jun 3, 2026 at 9:04 AM JST, Timur Tabi wrote:
> On Mon, 2026-06-01 at 20:21 -0700, John Hubbard wrote:
>> +/// Size of the FSP SHA-384 hash, in bytes.
>> +const FSP_HASH_SIZE: usize = 48;
>> +/// Maximum size of the FSP public key (RSA-3072), in bytes.
>> +///
>> +/// The FMC ELF `publickey` section may be shorter, so the remaining bytes are zero-padded.
>> +const FSP_PKEY_SIZE: usize = 384;
>> +/// Maximum size of the FSP signature (RSA-3072), in bytes.
>> +///
>> +/// The FMC ELF `signature` section may be shorter, so the remaining bytes are zero-padded.
>> +const FSP_SIG_SIZE: usize = 384;
>
> How can we be certain that these lengths are really fixed?  I can add these length restrictions to
> the script that generates the files, but I want to be really certain that they will be correct for
> all time.

These are RSA-3072 keys/signatures, which are by convention 384 bytes
(or 3072 bits), so I think we can rely on that assumption as long as the
content type doesn't change. I am actually surprised that the comment
says the section can be shorter - that doesn't sound possible with
RSA-3072.

Does the script have a way to validate the length? If so, I agree it is
a good idea to do so, to flag the problem before it shows up in the
driver.
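
For what it's worth, the check itself is tiny. A sketch of the idea,
written in Rust only for consistency with the rest of the thread;
`check_section` is a hypothetical helper, and the generating script may
well be written in another language:

```rust
/// RSA-3072 modulus size in bits.
const RSA_3072_BITS: usize = 3072;
/// Expected public key size in bytes (3072 / 8 = 384).
const FSP_PKEY_SIZE: usize = RSA_3072_BITS / 8;
/// Expected signature size in bytes (3072 / 8 = 384).
const FSP_SIG_SIZE: usize = RSA_3072_BITS / 8;

/// Rejects a section whose length differs from the expected size,
/// so a mismatch is flagged at generation time rather than in the driver.
fn check_section(name: &str, data: &[u8], expected: usize) -> Result<(), String> {
    if data.len() == expected {
        Ok(())
    } else {
        Err(format!(
            "section `{}` is {} bytes, expected {}",
            name,
            data.len(),
            expected
        ))
    }
}
```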

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-02  3:21 ` [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure John Hubbard
  2026-06-02  3:33   ` sashiko-bot
  2026-06-02 12:21   ` Eliot Courtney
@ 2026-06-03  1:00   ` Alexandre Courbot
  2 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  1:00 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Timur Tabi, Alistair Popple, Eliot Courtney,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
<snip>
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index da7a10c0346a..8c51609d0281 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -576,6 +576,27 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
>      }
>  }
>  
> +// FSP (Foundation Security Processor) queue registers for Hopper/Blackwell Chain of Trust.
> +// These registers manage falcon EMEM communication queues.
> +
> +register! {
> +    pub(crate) NV_PFSP_QUEUE_HEAD(u32) @ 0x008f2c00 {
> +        31:0    address => u32;
> +    }
> +
> +    pub(crate) NV_PFSP_QUEUE_TAIL(u32) @ 0x008f2c04 {
> +        31:0    address => u32;
> +    }
> +
> +    pub(crate) NV_PFSP_MSGQ_HEAD(u32) @ 0x008f2c80 {
> +        31:0    address => u32;
> +    }
> +
> +    pub(crate) NV_PFSP_MSGQ_TAIL(u32) @ 0x008f2c84 {
> +        31:0    address => u32;
> +    }

For these last two, OpenRM names the field `val`, not `address`.

> +}

In OpenRM these 4 registers are also actually arrays of 8 (because of
the 8 channels into which EMEM is also divided). Although we only use the
first one for now, I think that fact should be reflected here, e.g.

    pub(crate) NV_PFSP_QUEUE_HEAD(u32)[8] @ 0x008f2c00 {
        ...
    }

With user code converted to e.g.:

    bar.write(
        Array::at(0),
        regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0),
    );

`poll_msgq`, `send_msg` and `recv_msg` should normally take an extra
`channel` argument as a result, but since we are not dividing EMEM
accordingly and using only channel 0 for now, I think this can be done
later.
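
To make the array layout concrete, a standalone sketch of the per-channel
address computation. The 8-byte stride is an assumption inferred from HEAD
and TAIL sitting 4 bytes apart in the patch (the OpenRM headers are the
authority here), and the helper names are hypothetical:

```rust
/// Base addresses from the quoted patch.
const NV_PFSP_QUEUE_HEAD_BASE: u32 = 0x008f_2c00;
const NV_PFSP_QUEUE_TAIL_BASE: u32 = 0x008f_2c04;
/// Number of channels, per the array size of 8 mentioned above.
const FSP_CHANNELS: u32 = 8;
/// Assumed distance between consecutive channels (HEAD and TAIL interleaved).
const CHANNEL_STRIDE: u32 = 8;

/// Address of the command queue HEAD register for `channel`, if it exists.
fn queue_head_addr(channel: u32) -> Option<u32> {
    (channel < FSP_CHANNELS).then(|| NV_PFSP_QUEUE_HEAD_BASE + channel * CHANNEL_STRIDE)
}

/// Address of the command queue TAIL register for `channel`, if it exists.
fn queue_tail_addr(channel: u32) -> Option<u32> {
    (channel < FSP_CHANNELS).then(|| NV_PFSP_QUEUE_TAIL_BASE + channel * CHANNEL_STRIDE)
}
```

Channel 0 resolves to the addresses the patch already uses, so the array
form would not change current behavior.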

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-02  3:33   ` sashiko-bot
@ 2026-06-03  1:14     ` Alexandre Courbot
  2026-06-03  1:41       ` Eliot Courtney
  0 siblings, 1 reply; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  1:14 UTC (permalink / raw)
  To: sashiko-bot
  Cc: sashiko-reviews, Danilo Krummrich, John Hubbard, Simona Vetter,
	Benno Lossin, Bjorn Helgaas, Trevor Gross, Eliot Courtney,
	Björn Roy Baron, Timur Tabi, Shashank Sharma, Zhi Wang,
	Boqun Feng, Alistair Popple, Gary Guo, Andreas Hindborg,
	Alex Gaynor, Alice Ryhl, Miguel Ojeda, David Airlie, nova-gpu,
	LKML

On Tue Jun 2, 2026 at 12:33 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] Race condition due to incorrect order of writing queue pointers in `send_msg`.
> - [High] Silent failure when sending a 4-byte message.
> - [High] Unchecked integer addition in `poll_msgq` can cause a kernel panic on untrusted MMIO reads.
> --
>
> commit 8a59b49984011320aa631b20a05550242e2baef2
> Author: John Hubbard <jhubbard@nvidia.com>
>
> gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
>
> This commit adds the foundation security processor (FSP) message
> infrastructure for the nova-core GPU driver. It implements queue
> registers and low-level helpers to communicate with the FSP falcon's EMEM.
>
>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> index 6b057d958115a..0ec1c55213bc8 100644
>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> [ ... ]
>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>  
>>          Ok(())
>>      }
>> +
>> +    /// Poll FSP for incoming data.
>> +    ///
>> +    /// Returns the size of available data in bytes, or 0 if no data is available.
>> +    ///
>> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
>> +    /// message exchange, so `tail >= head` is always true when data is present.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>> +
>> +        if head == tail {
>> +            return 0;
>> +        }
>> +
>> +        // TAIL points at last DWORD written, so add 4 to get total size
>> +        tail.saturating_sub(head) + 4
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [Severity: High]
> Can this unchecked addition panic if we encounter a PCIe link failure?
>
> If `head` reads normally as 0 but a surprise removal occurs before reading
> `tail`, `tail` could read as 0xFFFFFFFF. The saturating subtraction would
> leave 0xFFFFFFFF, and the subsequent `+ 4` will overflow, causing a panic in
> debug builds where CONFIG_RUST_OVERFLOW_CHECKS=y is set.
>
> Should this use `saturating_add(4)` or another form of checked math to handle
> untrusted hardware inputs safely?

This is cheap to fix, and indeed probably a good idea.
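
Something along these lines, shown as a pure function so it can be checked
outside the driver. `msgq_pending_bytes` is a hypothetical name; the logic
is the quoted `poll_msgq` body with the final addition made saturating:

```rust
/// Size of the pending message in bytes, or 0 if the queue is empty.
/// TAIL points at the last DWORD written, hence the extra 4 bytes.
/// Both steps saturate, so an all-ones MMIO read cannot overflow.
fn msgq_pending_bytes(head: u32, tail: u32) -> u32 {
    if head == tail {
        return 0;
    }
    tail.saturating_sub(head).saturating_add(4)
}
```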

>
>> +    }
>> +
>> +    /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>> +    ///
>> +    /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
>> +        if packet.is_empty() {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        // Write message to EMEM at offset 0 (validates 4-byte alignment)
>> +        self.write_emem(bar, 0, packet)?;
>> +
>> +        // Update queue pointers. TAIL points at the last DWORD written.
>> +        let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [Severity: High]
> Does this create a silent failure if a 4-byte message is sent?
>
> If `packet.len()` is 4, `tail_offset` is calculated as 0. Both NV_PFSP_QUEUE_HEAD
> and NV_PFSP_QUEUE_TAIL will then be written as 0. Because the FSP protocol treats
> `HEAD == TAIL` as an empty queue, the FSP will ignore the message while the
> function still returns `Ok(())`.
>
> Should this enforce a minimum packet length of 8 bytes or return an error
> for 4-byte messages?

This one looks bogus (looks like the claim that `HEAD == TAIL` means
empty queue is hallucinated?).

>
>> +        bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
>> +        bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [Severity: High]
> Is there a race condition with the FSP hardware due to the order of these
> writes?
>
> By writing NV_PFSP_QUEUE_TAIL before NV_PFSP_QUEUE_HEAD, if QUEUE_HEAD was
> non-zero from a prior transaction, we momentarily create a state where
> `TAIL != HEAD`. The FSP may wake up immediately and read from the old, incorrect
> QUEUE_HEAD offset before the driver can write 0 to QUEUE_HEAD.
>
> Should NV_PFSP_QUEUE_HEAD be initialized to 0 before writing the
> NV_PFSP_QUEUE_TAIL doorbell?

Here IIUC QUEUE_HEAD is always `0`, so this is a non-issue.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-02 12:21   ` Eliot Courtney
@ 2026-06-03  1:34     ` Alexandre Courbot
  2026-06-03  4:49       ` Eliot Courtney
  0 siblings, 1 reply; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  1:34 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 9:21 PM JST, Eliot Courtney wrote:
> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>> FSP communication uses a pair of non-circular queues in the FSP
>> falcon's EMEM, one for messages from the driver to FSP and one for
>> replies, with the driver polling for response data. Add the queue
>> registers and the low-level helpers used by the higher-level FSP
>> message layer.
>>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/falcon/fsp.rs | 61 ++++++++++++++++++++++++++++-
>>  drivers/gpu/nova-core/regs.rs       | 21 ++++++++++
>>  2 files changed, 80 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> index 6b057d958115..0ec1c55213bc 100644
>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>> @@ -112,7 +112,6 @@ impl Falcon<Fsp> {
>>      ///
>>      /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
>>      /// if `offset` or the `data` length is not 4-byte aligned.
>> -    #[expect(dead_code)]
>>      fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>>          if offset % 4 != 0 || data.len() % 4 != 0 {
>>              return Err(EINVAL);
>> @@ -131,7 +130,6 @@ fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>>      ///
>>      /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
>>      /// `offset` or the `data` length is not 4-byte aligned.
>> -    #[expect(dead_code)]
>>      fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>          if offset % 4 != 0 || data.len() % 4 != 0 {
>>              return Err(EINVAL);
>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>  
>>          Ok(())
>>      }
>> +
>> +    /// Poll FSP for incoming data.
>> +    ///
>> +    /// Returns the size of available data in bytes, or 0 if no data is available.
>> +    ///
>> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
>> +    /// message exchange, so `tail >= head` is always true when data is present.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>> +
>> +        if head == tail {
>> +            return 0;
>> +        }
>> +
>> +        // TAIL points at last DWORD written, so add 4 to get total size
>> +        tail.saturating_sub(head) + 4
>> +    }
>
> In a later patch, `send_sync_fsp` polls this then calls `recv_msg`. But,
> structurally it's possible to pass in any size to `recv_msg` and read
> more than we are supposed to. What about having `recv_msg` do the
> polling to get the size and return a KVec with the read out data,
> instead of `send_sync_fsp`? `poll_msgq` could stay private and we can
> make it public later if we need to.

The issue I see with returning a `KVec` is that it imposes a dynamic
allocation for every message. Granted, this is what the current code
does, but now that we have this `&mut self` logic in place that
guarantees exclusive access, we can also turn the receiving `KVec` into
a member of `Fsp` and keep passing it as a mut reference to avoid that.

>
>> +
>> +    /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>> +    ///
>> +    /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
>> +        if packet.is_empty() {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        // Write message to EMEM at offset 0 (validates 4-byte alignment)
>> +        self.write_emem(bar, 0, packet)?;
>> +
>> +        // Update queue pointers. TAIL points at the last DWORD written.
>> +        let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
>> +        bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
>> +        bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
>> +
>> +        Ok(())
>> +    }
>> +
>> +    /// Reads `size` bytes from FSP EMEM into `buffer` and resets the queue pointers.
>> +    ///
>> +    /// `size` comes from `poll_msgq`. Returns `EINVAL` if `size` is 0, exceeds
>> +    /// `buffer`, or is not 4-byte aligned.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn recv_msg(&mut self, bar: &Bar0, buffer: &mut [u8], size: usize) -> Result {
>> +        if size == 0 || size > buffer.len() {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        // Read response from EMEM at offset 0 (validates 4-byte alignment)
>> +        self.read_emem(bar, 0, &mut buffer[..size])?;
>> +
>> +        // Reset message queue pointers after reading
>> +        bar.write_reg(regs::NV_PFSP_MSGQ_TAIL::zeroed().with_address(0));
>> +        bar.write_reg(regs::NV_PFSP_MSGQ_HEAD::zeroed().with_address(0));
>> +
>> +        Ok(())
>> +    }
>
> I think we can remove the `size` argument and have the caller pass in
> an appropriately sized slice (although obviated by my other comment).

Agreed, having both a slice and a length parameter is redundant and
requires extra checks that shouldn't be necessary. `recv_msg` is also
called right after we resized the receiving vector to the right size, so
we currently do have a call-time guarantee that `size == buffer.len()`.
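
For illustration, a minimal standalone sketch of the slice-only signature.
`MockEmem` and `Error` are hypothetical stand-ins (the real helper goes
through `Bar0` and `read_emem`); the point is only that the slice length is
the single source of truth for the read size:

```rust
/// Hypothetical stand-in for FSP EMEM plus the MSGQ pointers; not the nova-core API.
struct MockEmem {
    words: Vec<u32>,
    msgq_head: u32,
    msgq_tail: u32,
}

#[derive(Debug, PartialEq)]
enum Error {
    Inval,
}

impl MockEmem {
    /// Reads `buffer.len()` bytes from offset 0; the slice length is the only size.
    fn recv_msg(&mut self, buffer: &mut [u8]) -> Result<(), Error> {
        if buffer.is_empty() || buffer.len() % 4 != 0 || buffer.len() > self.words.len() * 4 {
            return Err(Error::Inval);
        }
        // Store each 32-bit word as little-endian bytes, as `read_emem` documents.
        for (chunk, word) in buffer.chunks_exact_mut(4).zip(&self.words) {
            chunk.copy_from_slice(&word.to_le_bytes());
        }
        // Reset the message queue pointers after reading, as in the patch.
        self.msgq_tail = 0;
        self.msgq_head = 0;
        Ok(())
    }
}
```

With this shape the `size > buffer.len()` check disappears, and only the
emptiness and alignment checks remain.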

* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-03  1:14     ` Alexandre Courbot
@ 2026-06-03  1:41       ` Eliot Courtney
  0 siblings, 0 replies; 65+ messages in thread
From: Eliot Courtney @ 2026-06-03  1:41 UTC (permalink / raw)
  To: Alexandre Courbot, sashiko-bot
  Cc: sashiko-reviews, Danilo Krummrich, John Hubbard, Simona Vetter,
	Benno Lossin, Bjorn Helgaas, Trevor Gross, Eliot Courtney,
	Björn Roy Baron, Timur Tabi, Shashank Sharma, Zhi Wang,
	Boqun Feng, Alistair Popple, Gary Guo, Andreas Hindborg,
	Alex Gaynor, Alice Ryhl, Miguel Ojeda, David Airlie, nova-gpu,
	LKML

On Wed Jun 3, 2026 at 10:14 AM JST, Alexandre Courbot wrote:
> On Tue Jun 2, 2026 at 12:33 PM JST, sashiko-bot wrote:
>> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>> - [High] Race condition due to incorrect order of writing queue pointers in `send_msg`.
>> - [High] Silent failure when sending a 4-byte message.
>> - [High] Unchecked integer addition in `poll_msgq` can cause a kernel panic on untrusted MMIO reads.
>> --
>>
>> commit 8a59b49984011320aa631b20a05550242e2baef2
>> Author: John Hubbard <jhubbard@nvidia.com>
>>
>> gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
>>
>> This commit adds the foundation security processor (FSP) message
>> infrastructure for the nova-core GPU driver. It implements queue
>> registers and low-level helpers to communicate with the FSP falcon's EMEM.
>>
>>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>>> index 6b057d958115a..0ec1c55213bc8 100644
>>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>> [ ... ]
>>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>>  
>>>          Ok(())
>>>      }
>>> +
>>> +    /// Poll FSP for incoming data.
>>> +    ///
>>> +    /// Returns the size of available data in bytes, or 0 if no data is available.
>>> +    ///
>>> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
>>> +    /// message exchange, so `tail >= head` is always true when data is present.
>>> +    #[expect(dead_code)]
>>> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>>> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>>> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>>> +
>>> +        if head == tail {
>>> +            return 0;
>>> +        }
>>> +
>>> +        // TAIL points at last DWORD written, so add 4 to get total size
>>> +        tail.saturating_sub(head) + 4
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> [Severity: High]
>> Can this unchecked addition panic if we encounter a PCIe link failure?
>>
>> If `head` reads normally as 0 but a surprise removal occurs before reading
>> `tail`, `tail` could read as 0xFFFFFFFF. The saturating subtraction would
>> leave 0xFFFFFFFF, and the subsequent `+ 4` will overflow, causing a panic in
>> debug builds where CONFIG_RUST_OVERFLOW_CHECKS=y is set.
>>
>> Should this use `saturating_add(4)` or another form of checked math to handle
>> untrusted hardware inputs safely?
>
> This is cheap to fix, and indeed probably a good idea.
>
>>
>>> +    }
>>> +
>>> +    /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>>> +    ///
>>> +    /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
>>> +    #[expect(dead_code)]
>>> +    pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
>>> +        if packet.is_empty() {
>>> +            return Err(EINVAL);
>>> +        }
>>> +
>>> +        // Write message to EMEM at offset 0 (validates 4-byte alignment)
>>> +        self.write_emem(bar, 0, packet)?;
>>> +
>>> +        // Update queue pointers. TAIL points at the last DWORD written.
>>> +        let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> [Severity: High]
>> Does this create a silent failure if a 4-byte message is sent?
>>
>> If `packet.len()` is 4, `tail_offset` is calculated as 0. Both NV_PFSP_QUEUE_HEAD
>> and NV_PFSP_QUEUE_TAIL will then be written as 0. Because the FSP protocol treats
>> `HEAD == TAIL` as an empty queue, the FSP will ignore the message while the
>> function still returns `Ok(())`.
>>
>> Should this enforce a minimum packet length of 8 bytes or return an error
>> for 4-byte messages?
>
> This one looks bogus (looks like the claim that `HEAD == TAIL` means
> empty queue is hallucinated?).
>
>>
>>> +        bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
>>> +        bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> [Severity: High]
>> Is there a race condition with the FSP hardware due to the order of these
>> writes?
>>
>> By writing NV_PFSP_QUEUE_TAIL before NV_PFSP_QUEUE_HEAD, if QUEUE_HEAD was
>> non-zero from a prior transaction, we momentarily create a state where
>> `TAIL != HEAD`. The FSP may wake up immediately and read from the old, incorrect
>> QUEUE_HEAD offset before the driver can write 0 to QUEUE_HEAD.
>>
>> Should NV_PFSP_QUEUE_HEAD be initialized to 0 before writing the
>> NV_PFSP_QUEUE_TAIL doorbell?
>
> Here IIUC QUEUE_HEAD is always `0`, so this is a non-issue.

AFAICT FSP looks at the queue when HEAD is written (so it is MMIO
triggered), which means that HEAD==0 && TAIL==0 can represent a 4-byte
send, as long as sending a 0-byte message is disallowed. This also means
that there's no race condition here as long as you write HEAD after
writing TAIL (and you must write HEAD as 0, even though it is always 0,
to actually send the message). OTOH, I don't think FSP can ever send back
4 bytes, since that is impossible to distinguish from nothing on the CPU
side.
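
To make the ordering argument concrete, here is a small standalone model.
It assumes, as described above, that a HEAD write acts as the doorbell;
`MockQueue` and `pending_size` are hypothetical names and say nothing
authoritative about the hardware. `pending_size` also uses saturating math
for the `+ 4`, so an all-ones read cannot overflow:

```rust
/// Hypothetical model of the driver-to-FSP queue registers; not real hardware behavior.
#[derive(Default)]
struct MockQueue {
    head: u32,
    tail: u32,
    /// Byte lengths of the messages FSP "saw", recorded on each HEAD write.
    delivered: Vec<u32>,
}

impl MockQueue {
    fn write_tail(&mut self, value: u32) {
        self.tail = value;
    }

    /// Models the assumption that a HEAD write is the doorbell.
    fn write_head(&mut self, value: u32) {
        self.head = value;
        // TAIL points at the last DWORD written, so the length is TAIL - HEAD + 4.
        let len = self.tail.saturating_sub(self.head).saturating_add(4);
        self.delivered.push(len);
    }

    /// Sends a message of `len` bytes (non-zero, multiple of 4): TAIL first, then HEAD.
    fn send_msg(&mut self, len: u32) -> Result<(), &'static str> {
        if len == 0 || len % 4 != 0 {
            return Err("EINVAL");
        }
        self.write_tail(len - 4);
        self.write_head(0);
        Ok(())
    }
}

/// CPU-side size of a pending reply; `head == tail` is read as "nothing pending".
fn pending_size(head: u32, tail: u32) -> u32 {
    if head == tail {
        return 0;
    }
    tail.saturating_sub(head).saturating_add(4)
}
```

In this model a 4-byte send leaves HEAD == TAIL == 0 and is still
delivered, while on the CPU side the same register state reads as an empty
queue.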

* Re: [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication
  2026-06-02  5:36   ` sashiko-bot
@ 2026-06-03  2:41     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  2:41 UTC (permalink / raw)
  To: sashiko-bot
  Cc: sashiko-reviews, John Hubbard, Danilo Krummrich, David Airlie,
	Bjorn Helgaas, Björn Roy Baron, Shashank Sharma,
	Andreas Hindborg, Alistair Popple, Boqun Feng, Gary Guo, Zhi Wang,
	Simona Vetter, Alice Ryhl, Eliot Courtney, Trevor Gross, nova-gpu,
	Alex Gaynor, Benno Lossin, LKML, Miguel Ojeda, Timur Tabi

On Tue Jun 2, 2026 at 2:36 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Wire-format protocol headers are defined using native `u32` instead of explicit little-endian types, causing incorrect byte-order on big-endian architectures.

Nova being little-endian only, we can ignore these.

* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-03  0:20     ` Alexandre Courbot
@ 2026-06-03  3:09       ` Timur Tabi
  2026-06-03  3:53         ` John Hubbard
  0 siblings, 1 reply; 65+ messages in thread
From: Timur Tabi @ 2026-06-03  3:09 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: a.hindborg@kernel.org, lossin@kernel.org, boqun.feng@gmail.com,
	Shashank Sharma, Zhi Wang, simona@ffwll.ch, tmgross@umich.edu,
	alex.gaynor@gmail.com, nova-gpu@lists.linux.dev,
	linux-kernel@vger.kernel.org, ojeda@kernel.org, John Hubbard,
	bjorn3_gh@protonmail.com, Eliot Courtney, airlied@gmail.com,
	aliceryhl@google.com, bhelgaas@google.com, dakr@kernel.org,
	gary@garyguo.net, Alistair Popple

On Wed, 2026-06-03 at 09:20 +0900, Alexandre Courbot wrote:
> 
> > How can we be certain that these lengths are really fixed?  I can add these length
> > restrictions to
> > the script that generates the files, but I want to be really certain that they will be
> > correct for
> > all time.
> 
> These are RSA-3072 keys/signatures, which are by convention 384 bytes
> (or 3072 bits), so I think we can rely on that assumption as long as the
> content type doesn't change. I am actually surprised that the comment
> says the section can be shorter - that doesn't sound possible with
> RSA-3072.

I'm assuming that maybe they could be RSA-2048.

> Does the script have a way to validate the length? If so, I agree it is
> a good idea to do so, to flag the problem before it shows up in the
> driver.

Yes, I will add these length checks.
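
As a rough sketch of what such a check amounts to (the generator script's
language and structure are not shown in this thread, so
`check_section_len` and its `section` parameter are hypothetical): an
RSA-3072 signature or modulus is 3072 / 8 = 384 bytes, whereas RSA-2048
would give 256 bytes.

```rust
/// Byte length of an RSA signature or modulus for a given key size in bits.
const fn rsa_len_bytes(key_bits: usize) -> usize {
    key_bits / 8
}

/// Expected section length under the RSA-3072 assumption from the thread.
const RSA3072_LEN: usize = rsa_len_bytes(3072);

/// Hypothetical check; `section` stands for the raw bytes of a signature or key section.
fn check_section_len(section: &[u8]) -> Result<(), String> {
    if section.len() == RSA3072_LEN {
        Ok(())
    } else {
        Err(format!(
            "expected {} bytes, found {}",
            RSA3072_LEN,
            section.len()
        ))
    }
}
```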

* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-03  3:09       ` Timur Tabi
@ 2026-06-03  3:53         ` John Hubbard
  2026-06-03 16:52           ` Timur Tabi
  0 siblings, 1 reply; 65+ messages in thread
From: John Hubbard @ 2026-06-03  3:53 UTC (permalink / raw)
  To: Timur Tabi, Alexandre Courbot
  Cc: a.hindborg@kernel.org, lossin@kernel.org, boqun.feng@gmail.com,
	Shashank Sharma, Zhi Wang, simona@ffwll.ch, tmgross@umich.edu,
	alex.gaynor@gmail.com, nova-gpu@lists.linux.dev,
	linux-kernel@vger.kernel.org, ojeda@kernel.org,
	bjorn3_gh@protonmail.com, Eliot Courtney, airlied@gmail.com,
	aliceryhl@google.com, bhelgaas@google.com, dakr@kernel.org,
	gary@garyguo.net, Alistair Popple

On 6/2/26 8:09 PM, Timur Tabi wrote:
> On Wed, 2026-06-03 at 09:20 +0900, Alexandre Courbot wrote:
>>
>>> How can we be certain that these lengths are really fixed?  I can add these length
>>> restrictions to
>>> the script that generates the files, but I want to be really certain that they will be
>>> correct for
>>> all time.
>>
>> These are RSA-3072 keys/signatures, which are by convention 384 bytes
>> (or 3072 bits), so I think we can rely on that assumption as long as the
>> content type doesn't change. I am actually surprised that the comment
>> says the section can be shorter - that doesn't sound possible with
>> RSA-3072.

I recall repeatedly hitting cases on one of the GPU architectures that
used a shorter length, but I should go back and verify that it wasn't
just a bringup "conclusion" that doesn't hold up after the dust settled.

> 
> I'm assuming that maybe they could be RSA-2048.
> 
>> Does the script have a way to validate the length? If so, I agree it is
>> a good idea to do so, to flag the problem before it shows up in the
>> driver.
> 
> Yes, I will add these length checks.

thanks,
-- 
John Hubbard


* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-03  1:34     ` Alexandre Courbot
@ 2026-06-03  4:49       ` Eliot Courtney
  2026-06-03  5:00         ` Alexandre Courbot
  0 siblings, 1 reply; 65+ messages in thread
From: Eliot Courtney @ 2026-06-03  4:49 UTC (permalink / raw)
  To: Alexandre Courbot, Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Wed Jun 3, 2026 at 10:34 AM JST, Alexandre Courbot wrote:
> On Tue Jun 2, 2026 at 9:21 PM JST, Eliot Courtney wrote:
>> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>>> FSP communication uses a pair of non-circular queues in the FSP
>>> falcon's EMEM, one for messages from the driver to FSP and one for
>>> replies, with the driver polling for response data. Add the queue
>>> registers and the low-level helpers used by the higher-level FSP
>>> message layer.
>>>
>>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>>> ---
>>>  drivers/gpu/nova-core/falcon/fsp.rs | 61 ++++++++++++++++++++++++++++-
>>>  drivers/gpu/nova-core/regs.rs       | 21 ++++++++++
>>>  2 files changed, 80 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>>> index 6b057d958115..0ec1c55213bc 100644
>>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>>> @@ -112,7 +112,6 @@ impl Falcon<Fsp> {
>>>      ///
>>>      /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
>>>      /// if `offset` or the `data` length is not 4-byte aligned.
>>> -    #[expect(dead_code)]
>>>      fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>>>          if offset % 4 != 0 || data.len() % 4 != 0 {
>>>              return Err(EINVAL);
>>> @@ -131,7 +130,6 @@ fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>>>      ///
>>>      /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
>>>      /// `offset` or the `data` length is not 4-byte aligned.
>>> -    #[expect(dead_code)]
>>>      fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>>          if offset % 4 != 0 || data.len() % 4 != 0 {
>>>              return Err(EINVAL);
>>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>>  
>>>          Ok(())
>>>      }
>>> +
>>> +    /// Poll FSP for incoming data.
>>> +    ///
>>> +    /// Returns the size of available data in bytes, or 0 if no data is available.
>>> +    ///
>>> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
>>> +    /// message exchange, so `tail >= head` is always true when data is present.
>>> +    #[expect(dead_code)]
>>> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>>> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>>> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>>> +
>>> +        if head == tail {
>>> +            return 0;
>>> +        }
>>> +
>>> +        // TAIL points at last DWORD written, so add 4 to get total size
>>> +        tail.saturating_sub(head) + 4
>>> +    }
>>
>> In a later patch, `send_sync_fsp` polls this then calls `recv_msg`. But,
>> structurally it's possible to pass in any size to `recv_msg` and read
>> more than we are supposed to. What about having `recv_msg` do the
>> polling to get the size and return a KVec with the read out data,
>> instead of `send_sync_fsp`? `poll_msgq` could stay private and we can
>> make it public later if we need to.
>
> The issue I see with returning a `KVec` is that it imposes a dynamic
> allocation for every message. Granted, this is what the current code
> does, but now that we have this `&mut self` logic in place that
> guarantees exclusive access, we can also turn the receiving `KVec` into
> a member of `Fsp` and keep passing it as a mut reference to avoid that.

I don't have a strong opinion here, but is having a dynamic allocation
for every message an issue here? AFAICT, this is called once during
boot. But by having Falcon<Fsp> decide the allocation we make it
structurally impossible to provide a wrongly sized output buffer, and
remove the need for the caller to separately poll, even though all it
wants to do now is wait for the next message whatever size it is. What
do we gain by delegating the polling and allocation to the caller?

Anyway I don't really mind, I am just trying to improve my understanding
of the conventions for how much we try to avoid allocation in the
kernel.

* Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
  2026-06-03  4:49       ` Eliot Courtney
@ 2026-06-03  5:00         ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  5:00 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: John Hubbard, Danilo Krummrich, Timur Tabi, Alistair Popple,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Wed Jun 3, 2026 at 1:49 PM JST, Eliot Courtney wrote:
> On Wed Jun 3, 2026 at 10:34 AM JST, Alexandre Courbot wrote:
>> On Tue Jun 2, 2026 at 9:21 PM JST, Eliot Courtney wrote:
>>> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>>>> FSP communication uses a pair of non-circular queues in the FSP
>>>> falcon's EMEM, one for messages from the driver to FSP and one for
>>>> replies, with the driver polling for response data. Add the queue
>>>> registers and the low-level helpers used by the higher-level FSP
>>>> message layer.
>>>>
>>>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>>>> ---
>>>>  drivers/gpu/nova-core/falcon/fsp.rs | 61 ++++++++++++++++++++++++++++-
>>>>  drivers/gpu/nova-core/regs.rs       | 21 ++++++++++
>>>>  2 files changed, 80 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>>>> index 6b057d958115..0ec1c55213bc 100644
>>>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>>>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>>>> @@ -112,7 +112,6 @@ impl Falcon<Fsp> {
>>>>      ///
>>>>      /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
>>>>      /// if `offset` or the `data` length is not 4-byte aligned.
>>>> -    #[expect(dead_code)]
>>>>      fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>>>>          if offset % 4 != 0 || data.len() % 4 != 0 {
>>>>              return Err(EINVAL);
>>>> @@ -131,7 +130,6 @@ fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>>>>      ///
>>>>      /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
>>>>      /// `offset` or the `data` length is not 4-byte aligned.
>>>> -    #[expect(dead_code)]
>>>>      fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>>>          if offset % 4 != 0 || data.len() % 4 != 0 {
>>>>              return Err(EINVAL);
>>>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>>>  
>>>>          Ok(())
>>>>      }
>>>> +
>>>> +    /// Poll FSP for incoming data.
>>>> +    ///
>>>> +    /// Returns the size of available data in bytes, or 0 if no data is available.
>>>> +    ///
>>>> +    /// The FSP message queue is not circular. Pointers are reset to 0 after each
>>>> +    /// message exchange, so `tail >= head` is always true when data is present.
>>>> +    #[expect(dead_code)]
>>>> +    pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>>>> +        let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>>>> +        let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>>>> +
>>>> +        if head == tail {
>>>> +            return 0;
>>>> +        }
>>>> +
>>>> +        // TAIL points at last DWORD written, so add 4 to get total size
>>>> +        tail.saturating_sub(head) + 4
>>>> +    }
>>>
>>> In a later patch, `send_sync_fsp` polls this then calls `recv_msg`. But,
>>> structurally it's possible to pass in any size to `recv_msg` and read
>>> more than we are supposed to. What about having `recv_msg` do the
>>> polling to get the size and return a KVec with the read out data,
>>> instead of `send_sync_fsp`? `poll_msgq` could stay private and we can
>>> make it public later if we need to.
>>
>> The issue I see with returning a `KVec` is that it imposes a dynamic
>> allocation for every message. Granted, this is what the current code
>> does, but now that we have this `&mut self` logic in place that
>> guarantees exclusive access, we can also turn the receiving `KVec` into
>> a member of `Fsp` and keep passing it as a mut reference to avoid that.
>
> I don't have a strong opinion here, but is having a dynamic allocation
> for every message an issue here? AFAICT, this is called once during
> boot. But by having Falcon<Fsp> decide the allocation we make it
> structurally impossible to provide a wrongly sized output buffer, and
> remove the need for the caller to separately poll, even though all it
> wants to do now is wait for the next message whatever size it is. What
> do we gain by delegating the polling and allocation to the caller?
>
> Anyway I don't really mind, I am just trying to improve my understanding
> of the conventions for how much we try to avoid allocation in the
> kernel.

The FSP is also used later as the system runs for power management
(although not by this patchset), which made me more wary of doing one
allocation per message, but from what I can see it is never called in a
critical path (only suspend/resume) so I was being overly cautious
indeed.

Not having to worry about the message buffer size does simplify the code
quite a bit on top of being structurally sound, so let's do that!
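
A minimal standalone sketch of that shape, assuming a hypothetical
`MockFsp` in place of `Falcon<Fsp>` and a plain `Vec` in place of `KVec`
(the kernel version would take allocation flags and return a `Result`):

```rust
/// Hypothetical mock of the FSP message queue; field names are illustrative only.
struct MockFsp {
    emem: Vec<u8>,
    msgq_head: u32,
    msgq_tail: u32,
}

impl MockFsp {
    /// Size of the pending reply in bytes, or 0 if none. Stays private to the type.
    fn poll_msgq(&self) -> u32 {
        if self.msgq_head == self.msgq_tail {
            return 0;
        }
        // TAIL points at the last DWORD written, so add 4 to get the total size.
        self.msgq_tail
            .saturating_sub(self.msgq_head)
            .saturating_add(4)
    }

    /// Returns the pending reply, or `None` if nothing usable is queued.
    fn recv_msg(&mut self) -> Option<Vec<u8>> {
        // Widening cast from u32; the callee decides the size, not the caller.
        let size = self.poll_msgq() as usize;
        if size == 0 || size > self.emem.len() {
            return None;
        }
        let msg = self.emem[..size].to_vec();
        // Reset the message queue pointers after reading.
        self.msgq_tail = 0;
        self.msgq_head = 0;
        Some(msg)
    }
}
```

The caller never sees a size, so it cannot pass a wrong one.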

* Re: [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
  2026-06-02  3:21 ` [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot John Hubbard
  2026-06-02  3:40   ` sashiko-bot
@ 2026-06-03  5:19   ` Alexandre Courbot
  1 sibling, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  5:19 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Timur Tabi, Alistair Popple, Eliot Courtney,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Build and send the Chain of Trust message to FSP, bundling the
> DMA-coherent boot parameters that FSP reads at boot time.
>
> Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware/fsp.rs         |   2 -
>  drivers/gpu/nova-core/fsp.rs                  | 140 +++++++++++++++++-
>  drivers/gpu/nova-core/fsp/hal.rs              |   1 -
>  drivers/gpu/nova-core/gsp.rs                  |   1 +
>  drivers/gpu/nova-core/gsp/fw.rs               |  64 ++++++++
>  .../gpu/nova-core/gsp/fw/r570_144/bindings.rs |  82 ++++++++++
>  drivers/gpu/nova-core/gsp/hal/gh100.rs        |  23 ++-
>  drivers/gpu/nova-core/mctp.rs                 |   2 -
>  8 files changed, 302 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/firmware/fsp.rs b/drivers/gpu/nova-core/firmware/fsp.rs
> index db61905eac9d..938aa3a3bad5 100644
> --- a/drivers/gpu/nova-core/firmware/fsp.rs
> +++ b/drivers/gpu/nova-core/firmware/fsp.rs
> @@ -39,10 +39,8 @@ pub(crate) struct FmcSignatures {
>  
>  pub(crate) struct FspFirmware {
>      /// FMC firmware image data (only the "image" ELF section).
> -    #[expect(dead_code)]
>      pub(crate) fmc_image: Coherent<[u8]>,
>      /// FMC firmware signatures.
> -    #[expect(dead_code)]
>      pub(crate) fmc_sigs: KBox<FmcSignatures>,
>  }
>  
> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index 67cf83aba83c..352ef7683cf2 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs
> @@ -9,8 +9,14 @@
>  
>  use kernel::{
>      device,
> +    dma::Coherent,
>      io::poll::read_poll_timeout,
>      prelude::*,
> +    ptr::{
> +        Alignable,
> +        Alignment, //
> +    },
> +    sizes::SZ_2M,
>      time::Delta,
>      transmute::{
>          AsBytes,
> @@ -24,8 +30,13 @@
>          fsp::Fsp as FspEngine,
>          Falcon, //
>      },
> -    firmware::fsp::FspFirmware,
> +    fb::FbLayout,
> +    firmware::fsp::{
> +        FmcSignatures,
> +        FspFirmware, //
> +    },
>      gpu::Chipset,
> +    gsp::GspFmcBootParams,
>      mctp::{
>          MctpHeader,
>          NvdmHeader,
> @@ -49,6 +60,35 @@ struct NvdmPayloadCommandResponse {
>      error_code: u32,
>  }
>  
> +/// NVDM (NVIDIA Device Management) CoT (Chain of Trust) payload, the main
> +/// message body sent to FSP for Chain of Trust boot.
> +#[repr(C, packed)]
> +#[derive(Clone, Copy)]
> +struct NvdmPayloadCot {
> +    version: u16,
> +    size: u16,
> +    gsp_fmc_sysmem_offset: u64,
> +    frts_sysmem_offset: u64,
> +    frts_sysmem_size: u32,
> +    frts_vidmem_offset: u64,
> +    frts_vidmem_size: u32,
> +    sigs: FmcSignatures,
> +    gsp_boot_args_sysmem_offset: u64,
> +}
> +
> +/// Complete FSP message structure with MCTP and NVDM headers.
> +#[repr(C, packed)]
> +#[derive(Clone, Copy)]
> +struct FspMessage {
> +    mctp_header: MctpHeader,
> +    nvdm_header: NvdmHeader,
> +    cot: NvdmPayloadCot,
> +}
> +
> +// SAFETY: `FspMessage` is `#[repr(C, packed)]` with no padding, so all of its
> +// bytes are initialized.
> +unsafe impl AsBytes for FspMessage {}
> +
>  /// Complete FSP response structure with MCTP and NVDM headers.
>  #[repr(C, packed)]
>  #[derive(Clone, Copy)]
> @@ -70,6 +110,44 @@ pub(crate) trait MessageToFsp: AsBytes {
>      const NVDM_TYPE: u32;

This should probably be a `NvdmType`.

>  }
>  
> +impl MessageToFsp for FspMessage {
> +    const NVDM_TYPE: u32 = NvdmType::Cot as u32;
> +}
> +
> +/// Bundled arguments for FMC boot via FSP Chain of Trust.
> +pub(crate) struct FmcBootArgs {
> +    chipset: Chipset,
> +    fmc_boot_params: Coherent<GspFmcBootParams>,
> +    resume: bool,
> +}
> +
> +impl FmcBootArgs {
> +    /// Builds FMC boot arguments, allocating the DMA-coherent boot parameter
> +    /// structure that FSP will read.
> +    pub(crate) fn new(
> +        dev: &device::Device<device::Bound>,
> +        chipset: Chipset,
> +        wpr_meta_addr: u64,
> +        libos_addr: u64,
> +        resume: bool,
> +    ) -> Result<Self> {
> +        let init = GspFmcBootParams::new(wpr_meta_addr, libos_addr);
> +
> +        Ok(Self {
> +            chipset,
> +            fmc_boot_params: Coherent::<GspFmcBootParams>::init(dev, GFP_KERNEL, init)?,
> +            resume,
> +        })
> +    }
> +
> +    /// DMA address of the FMC boot parameters, needed after boot for lockdown
> +    /// release polling.
> +    #[expect(dead_code)]
> +    pub(crate) fn boot_params_dma_handle(&self) -> u64 {
> +        self.fmc_boot_params.dma_handle()
> +    }
> +}
> +
>  /// FSP interface for Hopper/Blackwell GPUs.
>  ///
>  /// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
> @@ -77,7 +155,6 @@ pub(crate) trait MessageToFsp: AsBytes {
>  /// Chain of Trust boot.
>  pub(crate) struct Fsp {
>      falcon: Falcon<FspEngine>,
> -    #[expect(dead_code)]
>      fsp_fw: FspFirmware,
>  }
>  
> @@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
>          Ok(Fsp { falcon, fsp_fw })
>      }
>  
> +    /// Boots GSP FMC via FSP Chain of Trust.
> +    ///
> +    /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
> +    /// to FSP, and waits for the response.
> +    pub(crate) fn boot_fmc(
> +        &mut self,
> +        dev: &device::Device<device::Bound>,
> +        bar: &Bar0,
> +        fb_layout: &FbLayout,
> +        args: &FmcBootArgs,
> +    ) -> Result {
> +        dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
> +
> +        let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
> +        let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
> +
> +        // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
> +        let frts_offset = if !args.resume {
> +            let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
> +
> +            frts_reserved_size
> +                .align_up(Alignment::new::<SZ_2M>())
> +                .ok_or(EINVAL)?
> +        } else {
> +            0
> +        };
> +        let frts_size: u32 = if !args.resume {
> +            fb_layout.frts.len().try_into()?
> +        } else {
> +            0
> +        };
> +
> +        let msg = KBox::new(
> +            FspMessage {

We have the same problem as with the FSP signatures: this will result in the
`FspMessage` (868 bytes) being created on the stack before being moved
into the `KBox`. And because `FspMessage` is `packed`, we cannot use `init!`...

So I guess the only way for now will be to zero-init it and initialize
the fields à la C using `chain`. Let me do that in a constructor.
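For illustration, the zero-init-then-assign approach could look roughly like the
sketch below. It is a userspace analogue only: it uses `std::alloc` instead of
the kernel allocator and `KBox`, and `WireMessage`, its fields, and `new_boxed`
are hypothetical stand-ins, not the real `FspMessage` layout.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical stand-in for a large packed wire message (not the real layout).
#[repr(C, packed)]
pub struct WireMessage {
    pub header: u32,
    pub version: u16,
    pub size: u16,
    pub sigs: [u8; 816],
    pub boot_args_addr: u64,
}

impl WireMessage {
    /// Allocates a zeroed message directly on the heap, then fills in the
    /// fields in place, so no full-size temporary is built by the caller.
    pub fn new_boxed(version: u16, sigs: &[u8; 816], boot_args_addr: u64) -> Box<Self> {
        let layout = Layout::new::<Self>();
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) }.cast::<Self>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: `ptr` comes from the global allocator with the layout of
        // `Self`, and all-zero bytes are a valid value for every field.
        let mut msg = unsafe { Box::from_raw(ptr) };

        // Plain assignments to packed fields are fine; only references to
        // under-aligned fields are rejected by the compiler.
        msg.version = version;
        msg.size = core::mem::size_of::<Self>() as u16;
        msg.sigs = *sigs;
        msg.boot_args_addr = boot_args_addr;
        msg
    }
}
```

Fields that are never assigned (here `header`) simply keep their zeroed value.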

> +                mctp_header: MctpHeader::single_packet(),
> +                nvdm_header: NvdmHeader::new(NvdmType::Cot),
> +                cot: NvdmPayloadCot {
> +                    version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
> +                    size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
> +                        .map_err(|_| EINVAL)?,

This can be:

    size: num::usize_into_u16::<{ core::mem::size_of::<NvdmPayloadCot>() }>(),

to infer the value at build time and avoid the runtime error check.
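
As a rough sketch of how such a build-time conversion can work (this is not
the actual `num` module implementation, and `Payload`, `FitsInU16`, and
`PAYLOAD_SIZE` are hypothetical names), a const generic parameter can be checked
by an associated constant so that an out-of-range value fails compilation:

```rust
/// Helper whose associated constant fails to evaluate if `N` does not fit in `u16`.
struct FitsInU16<const N: usize>;

impl<const N: usize> FitsInU16<N> {
    const OK: () = assert!(N <= u16::MAX as usize);
}

/// Converts `N` to `u16`; an out-of-range `N` is a compile-time error.
pub const fn usize_into_u16<const N: usize>() -> u16 {
    // Referencing the constant forces the range check for this `N`.
    let () = FitsInU16::<N>::OK;
    N as u16
}

/// Hypothetical packed payload, used only to have a size to convert.
#[repr(C, packed)]
pub struct Payload {
    pub version: u16,
    pub size: u16,
    pub addr: u64,
}

/// Size of `Payload`, computed and range-checked at build time.
pub const PAYLOAD_SIZE: u16 = usize_into_u16::<{ core::mem::size_of::<Payload>() }>();
```

Since the check happens during constant evaluation, there is no `Result` to
propagate at the call site.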


* Re: [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
  2026-06-02  3:40   ` sashiko-bot
@ 2026-06-03  5:23     ` Alexandre Courbot
  0 siblings, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  5:23 UTC (permalink / raw)
  To: sashiko-bot
  Cc: sashiko-reviews, Danilo Krummrich, John Hubbard, LKML,
	Eliot Courtney, Zhi Wang, Björn Roy Baron, Gary Guo,
	David Airlie, Alex Gaynor, Boqun Feng, Miguel Ojeda,
	Alistair Popple, nova-gpu, Shashank Sharma, Simona Vetter,
	Bjorn Helgaas, Timur Tabi, Alice Ryhl, Trevor Gross, Benno Lossin,
	Andreas Hindborg

On Tue Jun 2, 2026 at 12:40 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] DMA Use-After-Free of FMC boot parameters during initialization
> - [High] Missing little-endian conversions in hardware payloads breaks big-endian hosts
> - [Medium] Kernel stack overflow risk due to large struct passed by value
> --
>
> commit 42c5f0e39c2bb0bc87a3f8df25554d04f00a3e69
> Author: John Hubbard <jhubbard@nvidia.com>
>
> gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
>     
> Build and send the Chain of Trust message to FSP, bundling the
> DMA-coherent boot parameters that FSP reads at boot time.
>
>> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
>> index 67cf83aba83c6..352ef7683cf2a 100644
>> --- a/drivers/gpu/nova-core/fsp.rs
>> +++ b/drivers/gpu/nova-core/fsp.rs
>
> [ ... ]
>
>> @@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
>>          Ok(Fsp { falcon, fsp_fw })
>>      }
>>  
>> +    /// Boots GSP FMC via FSP Chain of Trust.
>> +    ///
>> +    /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
>> +    /// to FSP, and waits for the response.
>> +    pub(crate) fn boot_fmc(
>> +        &mut self,
>> +        dev: &device::Device<device::Bound>,
>> +        bar: &Bar0,
>> +        fb_layout: &FbLayout,
>> +        args: &FmcBootArgs,
>> +    ) -> Result {
>> +        dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
>> +
>> +        let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
>> +        let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
>> +
>> +        // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
>> +        let frts_offset = if !args.resume {
>> +            let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
>> +
>> +            frts_reserved_size
>> +                .align_up(Alignment::new::<SZ_2M>())
>> +                .ok_or(EINVAL)?
>> +        } else {
>> +            0
>> +        };
>> +        let frts_size: u32 = if !args.resume {
>> +            fb_layout.frts.len().try_into()?
>> +        } else {
>> +            0
>> +        };
>> +
>> +        let msg = KBox::new(
>> +            FspMessage {
>> +                mctp_header: MctpHeader::single_packet(),
>> +                nvdm_header: NvdmHeader::new(NvdmType::Cot),
>> +                cot: NvdmPayloadCot {
> [Severity: Medium]
> Is it intentional to construct this ~868-byte FspMessage structure on the
> stack before boxing it? 
>
> This also dereferences and copies the 816-byte FmcSignatures structure
> (sigs: *self.fsp_fw.fmc_sigs). Doesn't this bypass the optimization in
> FspFirmware::extract_fmc_signatures() which avoided building FmcSignatures on
> the stack?
>
> Could this contribute to kernel stack overflows? Might it be safer to
> initialize it directly on the heap using KBox::init() or KBox::new_uninit()?

Ah, I've just flagged the same thing in my comment. Very nice that Sashiko is
able to catch these.

>
>> +                    version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
>> +                    size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
>> +                        .map_err(|_| EINVAL)?,
>> +                    gsp_fmc_sysmem_offset: fmc_addr,
>> +                    frts_sysmem_offset: 0,
>> +                    frts_sysmem_size: 0,
>> +                    frts_vidmem_offset: frts_offset,
>> +                    frts_vidmem_size: frts_size,
> [Severity: High]
> Will these native integer types be correctly interpreted by the little-endian
> RISC-V FSP/GSP processors if the host architecture is big-endian?
>
> Because the struct is laid out in big-endian byte order on big-endian hosts,
> won't the little-endian hardware swap the bytes when reading, resulting in
> incorrect sizes, versions, and DMA addresses? 
>
> Should explicit little-endian types or byte-swapping (like .to_le()) be used
> here to avoid breaking driver initialization on big-endian hosts?

Non-issue as Nova is little-endian only.

>
>> +                    sigs: *self.fsp_fw.fmc_sigs,
>> +                    gsp_boot_args_sysmem_offset: fmc_boot_params_addr,
>> +                },
>> +            },
>> +            GFP_KERNEL,
>> +        )?;
>> +
>> +        self.send_sync_fsp(dev, bar, &*msg)?;
>> +
>> +        dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
>> +        Ok(())
>> +    }
>
> [ ... ]
>
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 0c54e8bf4bb3a..558b37863f00a 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -934,3 +934,67 @@ fn new(cmdq: &Cmdq) -> impl Init<Self> + '_ {
>>          })
>>      }
>>  }
>> +
>> +#[repr(u32)]
>> +pub(crate) enum GspDmaTarget {
>> +    #[expect(dead_code)]
>> +    LocalFb = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB,
>> +    CoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM,
>> +    NoncoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM,
>> +}
>> +
>> +type GspAcrBootGspRmParams = bindings::GSP_ACR_BOOT_GSP_RM_PARAMS;
>> +
>> +impl GspAcrBootGspRmParams {
>> +    fn new(target: GspDmaTarget, wpr_meta_addr: u64) -> impl Init<Self> {
>> +        #[allow(non_snake_case)]
>> +        let params = init!(Self {
>> +            target: target as u32,
>> +            gspRmDescSize: num::usize_into_u32::<{ size_of::<GspFwWprMeta>() }>(),
>> +            gspRmDescOffset: wpr_meta_addr,
>> +            bIsGspRmBoot: 1,
> [Severity: High]
> Similarly to NvdmPayloadCot above, will these native integer types be
> correctly interpreted by little-endian hardware on big-endian hosts?
>
> Does this struct require explicit little-endian types or byte-swapping to
> avoid accessing invalid physical memory on big-endian architectures?

Ditto.

>
>> +            wprCarveoutOffset: 0,
>> +            wprCarveoutSize: 0,
>> +            __bindgen_padding_0: Default::default(),
>> +        });
>> +
>> +        params
>> +    }
>> +}
>
> [ ... ]
>
>> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
>> index b25970dd4561d..f41f3fea15ff2 100644
>> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
>> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
>
> [ ... ]
>
>> @@ -39,17 +42,27 @@ impl GspHal for Gh100 {
>>      /// the GSP boot internally - no manual GSP reset/boot is needed.
>>      fn boot<'a>(
>>          &self,
>> -        _gsp: &'a Gsp,
>> +        gsp: &'a Gsp,
>>          dev: &'a device::Device<device::Bound>,
>>          bar: &'a Bar0,
>>          chipset: Chipset,
>> -        _fb_layout: &FbLayout,
>> -        _wpr_meta: &Coherent<GspFwWprMeta>,
>> +        fb_layout: &FbLayout,
>> +        wpr_meta: &Coherent<GspFwWprMeta>,
>>          _gsp_falcon: &'a Falcon<GspEngine>,
>>          _sec2_falcon: &'a Falcon<Sec2>,
>>      ) -> Result<BootUnloadGuard<'a>> {
>>          let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
>> -        let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
>> +        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
>> +
>> +        let args = FmcBootArgs::new(
>> +            dev,
>> +            chipset,
>> +            wpr_meta.dma_handle(),
>> +            gsp.libos.dma_handle(),
>> +            false,
>> +        )?;
>> +
>> +        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
>>  
>>          Err(ENOTSUPP)
> [Severity: High]
> Does returning an error here immediately drop the args variable, thereby
> freeing its DMA-coherent memory before the FSP finishes reading it?
>
> Since the FSP is actively booting asynchronously, wouldn't this cause a
> hardware use-after-free of the boot parameters, leading to IOMMU faults or
> undefined behavior?
>
> Note: I see this is fixed in a later commit 'gpu: nova-core: gsp: enable FSP
> boot path' which changes the return to Ok(unload_guard) and properly handles
> the required hardware delays.

`boot_fmc` is synchronous, so there should not be a use-after-free here.


* Re: [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling
  2026-06-02  3:21 ` [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling John Hubbard
  2026-06-02  3:38   ` sashiko-bot
@ 2026-06-03  5:45   ` Alexandre Courbot
  1 sibling, 0 replies; 65+ messages in thread
From: Alexandre Courbot @ 2026-06-03  5:45 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Timur Tabi, Alistair Popple, Eliot Courtney,
	Shashank Sharma, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nova-gpu, LKML

On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> On Hopper and Blackwell, FSP boots GSP with hardware lockdown enabled.
> After FSP Chain of Trust completes, the driver must poll for lockdown
> release before proceeding with GSP initialization. Add the register
> bit and helper functions needed for this polling.
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/fsp.rs           |  1 -
>  drivers/gpu/nova-core/gsp/hal/gh100.rs | 90 +++++++++++++++++++++++++-
>  drivers/gpu/nova-core/regs.rs          |  2 +
>  3 files changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index 352ef7683cf2..aec991afa669 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs
> @@ -142,7 +142,6 @@ pub(crate) fn new(
>  
>      /// DMA address of the FMC boot parameters, needed after boot for lockdown
>      /// release polling.
> -    #[expect(dead_code)]
>      pub(crate) fn boot_params_dma_handle(&self) -> u64 {
>          self.fmc_boot_params.dma_handle()

Since this is a short method, let's introduce it in this patch to reduce
the amount of temporary dead code.

>      }
> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index f41f3fea15ff..02aec5281389 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> @@ -5,7 +5,13 @@
>  
>  use kernel::{
>      device,
> -    dma::Coherent, //
> +    dma::Coherent,
> +    io::{
> +        poll::read_poll_timeout,
> +        register::WithBase,
> +        Io, //
> +    },
> +    time::Delta,
>  };
>  
>  use crate::{
> @@ -31,8 +37,85 @@
>          Gsp,
>          GspFwWprMeta, //
>      },
> +    regs,
>  };
>  
> +/// GSP lockdown pattern written by firmware to mbox0 while RISC-V branch privilege
> +/// lockdown is active. The low byte varies, the upper 24 bits are fixed.
> +const GSP_LOCKDOWN_PATTERN: u32 = 0xbadf_4100;
> +const GSP_LOCKDOWN_MASK: u32 = 0xffff_ff00;
> +
> +/// GSP falcon mailbox state, used to track lockdown release status.
> +struct GspMbox {
> +    mbox0: u32,
> +    mbox1: u32,
> +}
> +
> +impl GspMbox {
> +    /// Reads both mailboxes from the GSP falcon.
> +    fn read(gsp_falcon: &Falcon<GspEngine>, bar: &Bar0) -> Self {
> +        Self {
> +            mbox0: gsp_falcon.read_mailbox0(bar),
> +            mbox1: gsp_falcon.read_mailbox1(bar),
> +        }
> +    }
> +
> +    /// Returns `true` if the lockdown pattern is present in `mbox0`.
> +    fn is_locked_down(&self) -> bool {
> +        (self.mbox0 & GSP_LOCKDOWN_MASK) == GSP_LOCKDOWN_PATTERN
> +    }
> +
> +    /// Combines mailbox0 and mailbox1 into a 64-bit address.
> +    fn combined_addr(&self) -> u64 {
> +        (u64::from(self.mbox1) << 32) | u64::from(self.mbox0)
> +    }
> +
> +    /// Returns `true` if GSP lockdown has been released.
> +    ///
> +    /// Checks the lockdown pattern, validates the boot params address,
> +    /// and verifies the `HWCFG2` lockdown bit is clear.
> +    fn lockdown_released(&self, bar: &Bar0, fmc_boot_params_addr: u64) -> bool {
> +        if self.is_locked_down() {
> +            return false;
> +        }
> +
> +        if self.mbox0 != 0 && self.combined_addr() != fmc_boot_params_addr {
> +            return true;
> +        }
> +
> +        let hwcfg2 = bar.read(regs::NV_PFALCON_FALCON_HWCFG2::of::<GspEngine>());
> +        !hwcfg2.riscv_br_priv_lockdown()

For this I would prefer adding a method to `Falcon<Gsp>`, as it allows
us to keep `NV_PFALCON_FALCON_HWCFG2` local to the `falcon` module (in
anticipation of moving all register definitions to the appropriate module).
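
Something along these lines, as a standalone sketch rather than the real
nova-core types: `Bar0` is modeled as a slice of 32-bit words, and the register
offset, bit position, and method name are placeholders I made up for the
example.

```rust
mod falcon {
    // Placeholder offset and bit, private to this module. These are
    // assumptions for illustration, not verified hardware values.
    const HWCFG2_OFFSET: usize = 0xf4;
    const HWCFG2_RISCV_BR_PRIV_LOCKDOWN: u32 = 1 << 13;

    /// Stand-in for an MMIO window: a plain slice of 32-bit registers.
    pub struct Bar0<'a>(pub &'a [u32]);

    impl Bar0<'_> {
        /// Reads the 32-bit word at byte `offset`.
        fn read32(&self, offset: usize) -> u32 {
            self.0[offset / 4]
        }
    }

    /// Stand-in for the GSP falcon.
    pub struct GspFalcon;

    impl GspFalcon {
        /// Returns `true` while the RISC-V branch privilege lockdown bit is set.
        pub fn is_riscv_br_priv_lockdown(&self, bar: &Bar0<'_>) -> bool {
            (bar.read32(HWCFG2_OFFSET) & HWCFG2_RISCV_BR_PRIV_LOCKDOWN) != 0
        }
    }
}
```

The caller in the HAL then only sees the boolean accessor, and the register
definition never leaves the `falcon` module.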


* Re: [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
  2026-06-03  3:53         ` John Hubbard
@ 2026-06-03 16:52           ` Timur Tabi
  0 siblings, 0 replies; 65+ messages in thread
From: Timur Tabi @ 2026-06-03 16:52 UTC (permalink / raw)
  To: Alexandre Courbot, John Hubbard
  Cc: lossin@kernel.org, a.hindborg@kernel.org, boqun.feng@gmail.com,
	Shashank Sharma, Zhi Wang, simona@ffwll.ch, tmgross@umich.edu,
	alex.gaynor@gmail.com, nova-gpu@lists.linux.dev,
	linux-kernel@vger.kernel.org, ojeda@kernel.org,
	bjorn3_gh@protonmail.com, Eliot Courtney, airlied@gmail.com,
	aliceryhl@google.com, bhelgaas@google.com, dakr@kernel.org,
	gary@garyguo.net, Alistair Popple

On Tue, 2026-06-02 at 20:53 -0700, John Hubbard wrote:
> I recall repeatedly hitting cases on one of the GPU architectures that
> used a shorter length, but I should go back and verify that it wasn't
> just a bringup "conclusion" that doesn't hold up after the dust settled.

So the lengths can actually vary by quite a bit:

gh100 len(ucode_hash)=48 len(ucode_sig)=384 len(ucode_pkey)=384
gb100 len(ucode_hash)=48 len(ucode_sig)=96 len(ucode_pkey)=97
gb10b len(ucode_hash)=48 len(ucode_sig)=384 len(ucode_pkey)=384
gb202 len(ucode_hash)=48 len(ucode_sig)=96 len(ucode_pkey)=97
gr100 len(ucode_hash)=48 len(ucode_sig)=96 len(ucode_pkey)=97

I'll ask around internally, but I think it's safe to say:

1) The hash must be 48 bytes
2) The signature could be 96 or 384 bytes
3) The pkey could be 97 (?????) or 384 bytes
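
If those three rules hold, a length check could be sketched as below. This is
only an illustration of the observed values above, pending confirmation;
`SigLenError` and `check_fmc_sig_lengths` are hypothetical names, and the sketch
deliberately does not assume that a given signature length implies a given pkey
length.

```rust
/// Which length was rejected, carrying the offending value.
#[derive(Debug, PartialEq, Eq)]
pub enum SigLenError {
    Hash(usize),
    Signature(usize),
    PublicKey(usize),
}

/// Checks the three lengths against the values seen so far:
/// hash = 48, signature = 96 or 384, public key = 97 or 384.
pub fn check_fmc_sig_lengths(hash: usize, sig: usize, pkey: usize) -> Result<(), SigLenError> {
    if hash != 48 {
        return Err(SigLenError::Hash(hash));
    }
    if !matches!(sig, 96 | 384) {
        return Err(SigLenError::Signature(sig));
    }
    if !matches!(pkey, 97 | 384) {
        return Err(SigLenError::PublicKey(pkey));
    }
    Ok(())
}
```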



Thread overview: 65+ messages
2026-06-02  3:20 [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support John Hubbard
2026-06-02  3:20 ` [PATCH v12 01/22] gpu: nova-core: set DMA mask width based on GPU architecture John Hubbard
2026-06-02  6:40   ` Eliot Courtney
2026-06-02  3:20 ` [PATCH v12 02/22] gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror John Hubbard
2026-06-02  3:20 ` [PATCH v12 03/22] gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size John Hubbard
2026-06-02  3:20 ` [PATCH v12 04/22] gpu: nova-core: Hopper/Blackwell: larger non-WPR heap John Hubbard
2026-06-02  3:20 ` [PATCH v12 05/22] gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap John Hubbard
2026-06-02  3:20 ` [PATCH v12 06/22] gpu: nova-core: Blackwell: use correct sysmem flush registers John Hubbard
2026-06-02  3:30   ` sashiko-bot
2026-06-02  8:00     ` Alexandre Courbot
2026-06-02  7:12   ` Eliot Courtney
2026-06-02  8:26     ` Alexandre Courbot
2026-06-02  3:20 ` [PATCH v12 07/22] gpu: nova-core: don't assume 64-bit firmware images John Hubbard
2026-06-02  3:20 ` [PATCH v12 08/22] gpu: nova-core: add support for 32-bit " John Hubbard
2026-06-02  3:20 ` [PATCH v12 09/22] gpu: nova-core: add auto-detection of 32-bit, 64-bit " John Hubbard
2026-06-02  3:20 ` [PATCH v12 10/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub John Hubbard
2026-06-02  6:50   ` Eliot Courtney
2026-06-02  3:20 ` [PATCH v12 11/22] gpu: nova-core: Hopper/Blackwell: add FMC firmware image John Hubbard
2026-06-02  7:18   ` Eliot Courtney
2026-06-02  3:21 ` [PATCH v12 12/22] gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting John Hubbard
2026-06-02  7:56   ` Eliot Courtney
2026-06-02  8:22     ` Alexandre Courbot
2026-06-02  3:21 ` [PATCH v12 13/22] gpu: nova-core: Hopper/Blackwell: add FMC signature extraction John Hubbard
2026-06-02  3:32   ` sashiko-bot
2026-06-02  7:56     ` Alexandre Courbot
2026-06-02  8:11   ` Eliot Courtney
2026-06-02  8:28     ` Alexandre Courbot
2026-06-03  0:04   ` Timur Tabi
2026-06-03  0:20     ` Alexandre Courbot
2026-06-03  3:09       ` Timur Tabi
2026-06-03  3:53         ` John Hubbard
2026-06-03 16:52           ` Timur Tabi
2026-06-02  3:21 ` [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations John Hubbard
2026-06-02 11:42   ` Eliot Courtney
2026-06-02 14:55     ` Alexandre Courbot
2026-06-02 15:02   ` Alexandre Courbot
2026-06-02  3:21 ` [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure John Hubbard
2026-06-02  3:33   ` sashiko-bot
2026-06-03  1:14     ` Alexandre Courbot
2026-06-03  1:41       ` Eliot Courtney
2026-06-02 12:21   ` Eliot Courtney
2026-06-03  1:34     ` Alexandre Courbot
2026-06-03  4:49       ` Eliot Courtney
2026-06-03  5:00         ` Alexandre Courbot
2026-06-03  1:00   ` Alexandre Courbot
2026-06-02  3:21 ` [PATCH v12 16/22] gpu: nova-core: add MCTP/NVDM protocol types for firmware communication John Hubbard
2026-06-02  5:36   ` sashiko-bot
2026-06-03  2:41     ` Alexandre Courbot
2026-06-02 12:53   ` Eliot Courtney
2026-06-02  3:21 ` [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging John Hubbard
2026-06-02  3:35   ` sashiko-bot
2026-06-02  3:21 ` [PATCH v12 18/22] gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version John Hubbard
2026-06-02 12:55   ` Eliot Courtney
2026-06-02  3:21 ` [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot John Hubbard
2026-06-02  3:40   ` sashiko-bot
2026-06-03  5:23     ` Alexandre Courbot
2026-06-03  5:19   ` Alexandre Courbot
2026-06-02  3:21 ` [PATCH v12 20/22] gpu: nova-core: Hopper/Blackwell: add GSP lockdown release polling John Hubbard
2026-06-02  3:38   ` sashiko-bot
2026-06-03  5:45   ` Alexandre Courbot
2026-06-02  3:21 ` [PATCH v12 21/22] gpu: nova-core: add non-sec2 unload path John Hubbard
2026-06-02  3:21 ` [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path John Hubbard
2026-06-02  3:38   ` sashiko-bot
2026-06-02 12:38 ` [PATCH v12 00/22] gpu: nova-core: firmware: Hopper/Blackwell support Danilo Krummrich
2026-06-02 13:37 ` Alexandre Courbot
