[PATCH v4 00/11] gpu: nova-core: add Turing support

rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 00/11] gpu: nova-core: add Turing support
@ 2025-12-18  3:29 Timur Tabi
  2025-12-18  3:29 ` [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
                   ` (13 more replies)
  0 siblings, 14 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

This patch set adds basic support for pre-booting GSP-RM
on Turing.

There is also partial support for GA100, but it's currently not
fully implemented.  GA100 is considered experimental in Nouveau,
and so it hasn't been tested with NovaCore either.

That latest linux-firmware.git is required because it contains the
Generic Bootloader image that has not yet been propogated to
distros.

Summary of changes:

1. Introduce non-secure IMEM support.  For GA102 and later, only secure IMEM
is used.
2. Because of non-secure IMEM, Turing booter firmware images need some of
the headers parsed differently for stuff like the load target address.
3. Add support the tu10x firmware signature section in the ELF image.
4. Add several new registers used only on Turing.
5. Some functions that were considered generic Falcon operations are
actually different on Turing vs GA102+, so they are moved to the HAL.
6. The FRTS FWSEC firmware in VBIOS uses a different version of the
descriptor header.
7. On Turing/GA100 LIBOS args struct needs to have its 'size' field
aligned to 4KB.  So pad the struct to make it 4K.
8. Turing Falcons do not support DMA, so PIO is used to copy images
into IMEM/DMEM.

Changes from v3:
1. Fixed Rust formatting issues
2. Misc style changes as suggested by Alex
3. Merged regs.rs changes into commits that use them
4. Used Zeroable::zeroed()

Timur Tabi (11):
  gpu: nova-core: rename Imem to ImemSecure
  gpu: nova-core: add ImemNonSecure section infrastructure
  gpu: nova-core: support header parsing on Turing/GA100
  gpu: nova-core: add support for Turing/GA100 fwsignature
  gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
  gpu: nova-core: move some functions into the HAL
  gpu: nova-core: Add basic Turing HAL
  gpu: nova-core: add Falcon HAL method supports_dma()
  gpu: nova-core: add FalconUCodeDescV2 support
  gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  gpu: nova-core: add PIO support for loading firmware images

 drivers/gpu/nova-core/falcon.rs           | 246 +++++++++++++++++-----
 drivers/gpu/nova-core/falcon/hal.rs       |  17 ++
 drivers/gpu/nova-core/falcon/hal/ga102.rs |  47 +++++
 drivers/gpu/nova-core/falcon/hal/tu102.rs |  77 +++++++
 drivers/gpu/nova-core/firmware.rs         | 181 +++++++++++++++-
 drivers/gpu/nova-core/firmware/booter.rs  |  43 +++-
 drivers/gpu/nova-core/firmware/fwsec.rs   | 215 ++++++++++++++++---
 drivers/gpu/nova-core/firmware/gsp.rs     |   4 +-
 drivers/gpu/nova-core/gsp/boot.rs         |  10 +-
 drivers/gpu/nova-core/gsp/fw.rs           |  24 ++-
 drivers/gpu/nova-core/regs.rs             |  53 +++++
 drivers/gpu/nova-core/vbios.rs            |  75 ++++---
 12 files changed, 864 insertions(+), 128 deletions(-)
 create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs

base-commit: 60c7398bded2e11f0db40a409a241b8be5910ee2
prerequisite-patch-id: a3e23917ec535263604af95194422382f14c2f21
-- 
2.52.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-31 19:29   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 02/11] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

Rename FalconMem::Imem to ImemSecure to indicate that it references
Secure Instruction Memory.  This change has no functional impact.

On Falcon cores, pages in instruction memory can be tagged as Secure
or Non-Secure.  For GA102 and later, only Secure is used, which is why
FalconMem::Imem seems appropriate.  However, Turing firmware images
can also contain non-secure sections, and so FalconMem needs to support
that.  By renaming Imem to ImemSec now, future patches for Turing support
will be simpler.

Nouveau uses the term "IMEM" to refer both to the Instruction Memory
block on Falcon cores as well as to the images of secure firmware
uploaded to part of IMEM.  OpenRM uses the terms "ImemSec" and "ImemNs"
instead, and uses "IMEM" just to refer to the physical memory device.

Renaming these terms allows us to align with OpenRM, avoid confusion
between IMEM and ImemSec, and makes future patches simpler.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs          | 20 +++++++++++++-------
 drivers/gpu/nova-core/firmware/booter.rs | 12 ++++++------
 drivers/gpu/nova-core/firmware/fwsec.rs  |  2 +-
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 82c661aef594..0965cb50568b 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -237,8 +237,8 @@ fn from(value: PeregrineCoreSelect) -> Self {
 /// Different types of memory present in a falcon core.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub(crate) enum FalconMem {
-    /// Instruction Memory.
-    Imem,
+    /// Secure Instruction Memory.
+    ImemSecure,
     /// Data Memory.
     Dmem,
 }
@@ -345,8 +345,8 @@ pub(crate) struct FalconBromParams {
 
 /// Trait for providing load parameters of falcon firmwares.
 pub(crate) trait FalconLoadParams {
-    /// Returns the load parameters for `IMEM`.
-    fn imem_load_params(&self) -> FalconLoadTarget;
+    /// Returns the load parameters for Secure `IMEM`.
+    fn imem_sec_load_params(&self) -> FalconLoadTarget;
 
     /// Returns the load parameters for `DMEM`.
     fn dmem_load_params(&self) -> FalconLoadTarget;
@@ -457,7 +457,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
         //
         // For DMEM we can fold the start offset into the DMA handle.
         let (src_start, dma_start) = match target_mem {
-            FalconMem::Imem => (load_offsets.src_start, fw.dma_handle()),
+            FalconMem::ImemSecure => (load_offsets.src_start, fw.dma_handle()),
             FalconMem::Dmem => (
                 0,
                 fw.dma_handle_with_offset(load_offsets.src_start.into_safe_cast())?,
@@ -508,7 +508,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
 
         let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
             .set_size(DmaTrfCmdSize::Size256B)
-            .set_imem(target_mem == FalconMem::Imem)
+            .set_imem(target_mem == FalconMem::ImemSecure)
             .set_sec(if sec { 1 } else { 0 });
 
         for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
@@ -543,7 +543,13 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
                 .set_mem_type(FalconFbifMemType::Physical)
         });
 
-        self.dma_wr(bar, fw, FalconMem::Imem, fw.imem_load_params(), true)?;
+        self.dma_wr(
+            bar,
+            fw,
+            FalconMem::ImemSecure,
+            fw.imem_sec_load_params(),
+            true,
+        )?;
         self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
 
         self.hal.program_brom(self, bar, &fw.brom_params())?;
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index f107f753214a..096cd01dbc9d 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -251,8 +251,8 @@ impl<'a> FirmwareSignature<BooterFirmware> for BooterSignature<'a> {}
 
 /// The `Booter` loader firmware, responsible for loading the GSP.
 pub(crate) struct BooterFirmware {
-    // Load parameters for `IMEM` falcon memory.
-    imem_load_target: FalconLoadTarget,
+    // Load parameters for Secure `IMEM` falcon memory.
+    imem_sec_load_target: FalconLoadTarget,
     // Load parameters for `DMEM` falcon memory.
     dmem_load_target: FalconLoadTarget,
     // BROM falcon parameters.
@@ -354,7 +354,7 @@ pub(crate) fn new(
         };
 
         Ok(Self {
-            imem_load_target: FalconLoadTarget {
+            imem_sec_load_target: FalconLoadTarget {
                 src_start: app0.offset,
                 dst_start: 0,
                 len: app0.len,
@@ -371,8 +371,8 @@ pub(crate) fn new(
 }
 
 impl FalconLoadParams for BooterFirmware {
-    fn imem_load_params(&self) -> FalconLoadTarget {
-        self.imem_load_target.clone()
+    fn imem_sec_load_params(&self) -> FalconLoadTarget {
+        self.imem_sec_load_target.clone()
     }
 
     fn dmem_load_params(&self) -> FalconLoadTarget {
@@ -384,7 +384,7 @@ fn brom_params(&self) -> FalconBromParams {
     }
 
     fn boot_addr(&self) -> u32 {
-        self.imem_load_target.src_start
+        self.imem_sec_load_target.src_start
     }
 }
 
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index b28e34d279f4..6a2f5a0d4b15 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -224,7 +224,7 @@ pub(crate) struct FwsecFirmware {
 }
 
 impl FalconLoadParams for FwsecFirmware {
-    fn imem_load_params(&self) -> FalconLoadTarget {
+    fn imem_sec_load_params(&self) -> FalconLoadTarget {
         FalconLoadTarget {
             src_start: 0,
             dst_start: self.desc.imem_phys_base,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 02/11] gpu: nova-core: add ImemNonSecure section infrastructure
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
  2025-12-18  3:29 ` [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-31 19:35   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 03/11] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

The GSP booter firmware in Turing and GA100 includes a third memory
section called ImemNonSecure, which is non-secure IMEM.  This section
must be loaded separately from DMEM and secure IMEM, but only if it
actually exists.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs          | 20 ++++++++++++++++++--
 drivers/gpu/nova-core/firmware/booter.rs |  9 +++++++++
 drivers/gpu/nova-core/firmware/fwsec.rs  |  5 +++++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 0965cb50568b..b92152277d00 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -239,6 +239,8 @@ fn from(value: PeregrineCoreSelect) -> Self {
 pub(crate) enum FalconMem {
     /// Secure Instruction Memory.
     ImemSecure,
+    /// Non-Secure Instruction Memory.
+    ImemNonSecure,
     /// Data Memory.
     Dmem,
 }
@@ -348,6 +350,10 @@ pub(crate) trait FalconLoadParams {
     /// Returns the load parameters for Secure `IMEM`.
     fn imem_sec_load_params(&self) -> FalconLoadTarget;
 
+    /// Returns the load parameters for Non-Secure `IMEM`,
+    /// used only on Turing and GA100.
+    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget>;
+
     /// Returns the load parameters for `DMEM`.
     fn dmem_load_params(&self) -> FalconLoadTarget;
 
@@ -457,7 +463,9 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
         //
         // For DMEM we can fold the start offset into the DMA handle.
         let (src_start, dma_start) = match target_mem {
-            FalconMem::ImemSecure => (load_offsets.src_start, fw.dma_handle()),
+            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
+                (load_offsets.src_start, fw.dma_handle())
+            }
             FalconMem::Dmem => (
                 0,
                 fw.dma_handle_with_offset(load_offsets.src_start.into_safe_cast())?,
@@ -508,7 +516,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
 
         let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
             .set_size(DmaTrfCmdSize::Size256B)
-            .set_imem(target_mem == FalconMem::ImemSecure)
+            .set_imem(target_mem != FalconMem::Dmem)
             .set_sec(if sec { 1 } else { 0 });
 
         for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
@@ -552,6 +560,14 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
         )?;
         self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
 
+        if let Some(nmem) = fw.imem_ns_load_params() {
+            // This code should never actual get executed, because the Non-Secure
+            // section only exists on firmware used by Turing and GA100, and
+            // those platforms do not use DMA.  But we include this code for
+            // consistency.
+            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
+        }
+
         self.hal.program_brom(self, bar, &fw.brom_params())?;
 
         // Set `BootVec` to start of non-secure code.
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index 096cd01dbc9d..1b98bb47424c 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -253,6 +253,9 @@ impl<'a> FirmwareSignature<BooterFirmware> for BooterSignature<'a> {}
 pub(crate) struct BooterFirmware {
     // Load parameters for Secure `IMEM` falcon memory.
     imem_sec_load_target: FalconLoadTarget,
+    // Load parameters for Non-Secure `IMEM` falcon memory,
+    // used only on Turing and GA100
+    imem_ns_load_target: Option<FalconLoadTarget>,
     // Load parameters for `DMEM` falcon memory.
     dmem_load_target: FalconLoadTarget,
     // BROM falcon parameters.
@@ -359,6 +362,8 @@ pub(crate) fn new(
                 dst_start: 0,
                 len: app0.len,
             },
+            // Exists only in the booter image for Turing and GA100
+            imem_ns_load_target: None,
             dmem_load_target: FalconLoadTarget {
                 src_start: load_hdr.os_data_offset,
                 dst_start: 0,
@@ -375,6 +380,10 @@ fn imem_sec_load_params(&self) -> FalconLoadTarget {
         self.imem_sec_load_target.clone()
     }
 
+    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
+        self.imem_ns_load_target.clone()
+    }
+
     fn dmem_load_params(&self) -> FalconLoadTarget {
         self.dmem_load_target.clone()
     }
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 6a2f5a0d4b15..e4009faba6c5 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -232,6 +232,11 @@ fn imem_sec_load_params(&self) -> FalconLoadTarget {
         }
     }
 
+    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
+        // Only used on Turing and GA100, so return None for now
+        None
+    }
+
     fn dmem_load_params(&self) -> FalconLoadTarget {
         FalconLoadTarget {
             src_start: self.desc.imem_load_size,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 03/11] gpu: nova-core: support header parsing on Turing/GA100
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
  2025-12-18  3:29 ` [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
  2025-12-18  3:29 ` [PATCH v4 02/11] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-31 19:58   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

The Turing/GA100 version of Booter is slightly different from the
GA102+ version.  The headers are the same, but different fields of
the headers are used to identify the IMEM section.  In addition,
there is an NMEM section on Turing/GA100.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware/booter.rs | 28 ++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index 1b98bb47424c..86556cee8e67 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -356,14 +356,30 @@ pub(crate) fn new(
             }
         };
 
+        // There are two versions of Booter, one for Turing/GA100, and another for
+        // GA102+.  The extraction of the IMEM sections differs between the two
+        // versions.  Unfortunately, the file names are the same, and the headers
+        // don't indicate the versions.  The only way to differentiate is by the Chipset.
+        let (imem_sec_dst_start, imem_ns_load_target) = if chipset <= Chipset::GA100 {
+            (
+                app0.offset,
+                Some(FalconLoadTarget {
+                    src_start: 0,
+                    dst_start: load_hdr.os_code_offset,
+                    len: load_hdr.os_code_size,
+                }),
+            )
+        } else {
+            (0, None)
+        };
+
         Ok(Self {
             imem_sec_load_target: FalconLoadTarget {
                 src_start: app0.offset,
-                dst_start: 0,
+                dst_start: imem_sec_dst_start,
                 len: app0.len,
             },
-            // Exists only in the booter image for Turing and GA100
-            imem_ns_load_target: None,
+            imem_ns_load_target,
             dmem_load_target: FalconLoadTarget {
                 src_start: load_hdr.os_data_offset,
                 dst_start: 0,
@@ -393,7 +409,11 @@ fn brom_params(&self) -> FalconBromParams {
     }
 
     fn boot_addr(&self) -> u32 {
-        self.imem_sec_load_target.src_start
+        if let Some(ns_target) = &self.imem_ns_load_target {
+            ns_target.dst_start
+        } else {
+            self.imem_sec_load_target.src_start
+        }
     }
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (2 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 03/11] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-31 19:28   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

Turing and GA100 share the same GSP-RM firmware binary, and the
signature ELF section is labeled ".fwsignature_tu10x".

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware/gsp.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
index 0549805282ab..112488216bff 100644
--- a/drivers/gpu/nova-core/firmware/gsp.rs
+++ b/drivers/gpu/nova-core/firmware/gsp.rs
@@ -163,9 +163,11 @@ pub(crate) fn new<'a, 'b>(
         let fw_section = elf::elf64_section(fw.data(), ".fwimage").ok_or(EINVAL)?;
 
         let sigs_section = match chipset.arch() {
+            Architecture::Turing => ".fwsignature_tu10x",
+            // GA100 uses the same firmware as Turing
+            Architecture::Ampere if chipset == Chipset::GA100 => ".fwsignature_tu10x",
             Architecture::Ampere => ".fwsignature_ga10x",
             Architecture::Ada => ".fwsignature_ad10x",
-            _ => return Err(ENOTSUPP),
         };
         let signatures = elf::elf64_section(fw.data(), sigs_section)
             .ok_or(EINVAL)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (3 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-18 11:24   ` Alexandre Courbot
  2025-12-31 21:35   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL Timur Tabi
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

The with_falcon_mem() method initializes the 'imem' and 'sec' fields of
the NV_PFALCON_FALCON_DMATRFCMD register based on the value of
the FalconMem type.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs | 16 ++++------------
 drivers/gpu/nova-core/regs.rs   |  9 +++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index b92152277d00..44a1a531a361 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -454,7 +454,6 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
         fw: &F,
         target_mem: FalconMem,
         load_offsets: FalconLoadTarget,
-        sec: bool,
     ) -> Result {
         const DMA_LEN: u32 = 256;
 
@@ -516,8 +515,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
 
         let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
             .set_size(DmaTrfCmdSize::Size256B)
-            .set_imem(target_mem != FalconMem::Dmem)
-            .set_sec(if sec { 1 } else { 0 });
+            .with_falcon_mem(target_mem);
 
         for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
             // Perform a transfer of size `DMA_LEN`.
@@ -551,21 +549,15 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
                 .set_mem_type(FalconFbifMemType::Physical)
         });
 
-        self.dma_wr(
-            bar,
-            fw,
-            FalconMem::ImemSecure,
-            fw.imem_sec_load_params(),
-            true,
-        )?;
-        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
+        self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params())?;
+        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params())?;
 
         if let Some(nmem) = fw.imem_ns_load_params() {
             // This code should never actual get executed, because the Non-Secure
             // section only exists on firmware used by Turing and GA100, and
             // those platforms do not use DMA.  But we include this code for
             // consistency.
-            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
+            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem)?;
         }
 
         self.hal.program_brom(self, bar, &fw.brom_params())?;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 82cc6c0790e5..c049f5aaf2f2 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -16,6 +16,7 @@
         FalconCoreRevSubversion,
         FalconFbifMemType,
         FalconFbifTarget,
+        FalconMem,
         FalconModSelAlgo,
         FalconSecurityModel,
         PFalcon2Base,
@@ -325,6 +326,14 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
     16:16   set_dmtag as u8;
 });
 
+impl NV_PFALCON_FALCON_DMATRFCMD {
+    /// Programs the 'imem' and 'sec' fields for the given FalconMem
+    pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
+        self.set_imem(mem != FalconMem::Dmem)
+            .set_sec(if mem == FalconMem::ImemSecure { 1 } else { 0 })
+    }
+}
+
 register!(NV_PFALCON_FALCON_DMATRFFBOFFS @ PFalconBase[0x0000011c] {
     31:0    offs as u32;
 });
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (4 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-18 11:25   ` Alexandre Courbot
  2025-12-31 22:07   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 07/11] gpu: nova-core: Add basic Turing HAL Timur Tabi
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

A few Falcon methods are actually GPU-specific, so move them
into the HAL.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs           | 45 ++---------------------
 drivers/gpu/nova-core/falcon/hal.rs       | 10 +++++
 drivers/gpu/nova-core/falcon/hal/ga102.rs | 43 ++++++++++++++++++++++
 3 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 44a1a531a361..6b54c0ca458a 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -13,7 +13,6 @@
     prelude::*,
     sync::aref::ARef,
     time::{
-        delay::fsleep,
         Delta, //
     },
 };
@@ -394,48 +393,11 @@ pub(crate) fn dma_reset(&self, bar: &Bar0) {
         regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, &E::ID);
     }
 
-    /// Wait for memory scrubbing to complete.
-    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
-        // TIMEOUT: memory scrubbing should complete in less than 20ms.
-        read_poll_timeout(
-            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
-            |r| r.mem_scrubbing_done(),
-            Delta::ZERO,
-            Delta::from_millis(20),
-        )
-        .map(|_| ())
-    }
-
-    /// Reset the falcon engine.
-    fn reset_eng(&self, bar: &Bar0) -> Result {
-        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
-
-        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
-        // RESET_READY so a non-failing timeout is used.
-        let _ = read_poll_timeout(
-            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
-            |r| r.reset_ready(),
-            Delta::ZERO,
-            Delta::from_micros(150),
-        );
-
-        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
-
-        // TIMEOUT: falcon engine should not take more than 10us to reset.
-        fsleep(Delta::from_micros(10));
-
-        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
-
-        self.reset_wait_mem_scrubbing(bar)?;
-
-        Ok(())
-    }
-
     /// Reset the controller, select the falcon core, and wait for memory scrubbing to complete.
     pub(crate) fn reset(&self, bar: &Bar0) -> Result {
-        self.reset_eng(bar)?;
+        self.hal.reset_eng(bar)?;
         self.hal.select_core(self, bar)?;
-        self.reset_wait_mem_scrubbing(bar)?;
+        self.hal.reset_wait_mem_scrubbing(bar)?;
 
         regs::NV_PFALCON_FALCON_RM::default()
             .set_value(regs::NV_PMC_BOOT_0::read(bar).into())
@@ -665,8 +627,7 @@ pub(crate) fn signature_reg_fuse_version(
     ///
     /// Returns `true` if the RISC-V core is active, `false` otherwise.
     pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
-        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
-        cpuctl.active_stat()
+        self.hal.is_riscv_active(bar)
     }
 
     /// Write the application version to the OS register.
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
index 8dc56a28ad65..c77a1568ea96 100644
--- a/drivers/gpu/nova-core/falcon/hal.rs
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -37,6 +37,16 @@ fn signature_reg_fuse_version(
 
     /// Program the boot ROM registers prior to starting a secure firmware.
     fn program_brom(&self, falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result;
+
+    /// Check if the RISC-V core is active.
+    /// Returns `true` if the RISC-V core is active, `false` otherwise.
+    fn is_riscv_active(&self, bar: &Bar0) -> bool;
+
+    /// Wait for memory scrubbing to complete.
+    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result;
+
+    /// Reset the falcon engine.
+    fn reset_eng(&self, bar: &Bar0) -> Result;
 }
 
 /// Returns a boxed falcon HAL adequate for `chipset`.
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
index 69a7a95cac16..232d51a4921f 100644
--- a/drivers/gpu/nova-core/falcon/hal/ga102.rs
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -6,6 +6,7 @@
     device,
     io::poll::read_poll_timeout,
     prelude::*,
+    time::delay::fsleep,
     time::Delta, //
 };
 
@@ -117,4 +118,46 @@ fn signature_reg_fuse_version(
     fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
         program_brom_ga102::<E>(bar, params)
     }
+
+    fn is_riscv_active(&self, bar: &Bar0) -> bool {
+        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
+        cpuctl.active_stat()
+    }
+
+    /// Wait for memory scrubbing to complete.
+    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
+        // TIMEOUT: memory scrubbing should complete in less than 20ms.
+        read_poll_timeout(
+            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
+            |r| r.mem_scrubbing_done(),
+            Delta::ZERO,
+            Delta::from_millis(20),
+        )
+        .map(|_| ())
+    }
+
+    /// Reset the falcon engine.
+    fn reset_eng(&self, bar: &Bar0) -> Result {
+        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
+
+        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
+        // RESET_READY so a non-failing timeout is used.
+        let _ = read_poll_timeout(
+            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
+            |r| r.reset_ready(),
+            Delta::ZERO,
+            Delta::from_micros(150),
+        );
+
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
+
+        // TIMEOUT: falcon engine should not take more than 10us to reset.
+        fsleep(Delta::from_micros(10));
+
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
+
+        self.reset_wait_mem_scrubbing(bar)?;
+
+        Ok(())
+    }
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 07/11] gpu: nova-core: Add basic Turing HAL
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (5 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-31 22:54   ` John Hubbard
  2025-12-18  3:29 ` [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

Add the basic HAL for recognizing Turing GPUs.  This isn't enough
to support booting GSP-RM on Turing, but it's a start.

Note that GA100, which boots using the same method as Turing, is not
supported yet.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon/hal.rs       |  4 ++
 drivers/gpu/nova-core/falcon/hal/tu102.rs | 73 +++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs             | 14 +++++
 3 files changed, 91 insertions(+)
 create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs

diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
index c77a1568ea96..c886ba03d1f6 100644
--- a/drivers/gpu/nova-core/falcon/hal.rs
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -13,6 +13,7 @@
 };
 
 mod ga102;
+mod tu102;
 
 /// Hardware Abstraction Layer for Falcon cores.
 ///
@@ -60,6 +61,9 @@ pub(super) fn falcon_hal<E: FalconEngine + 'static>(
     use Chipset::*;
 
     let hal = match chipset {
+        TU102 | TU104 | TU106 | TU116 | TU117 => {
+            KBox::new(tu102::Tu102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
+        }
         GA102 | GA103 | GA104 | GA106 | GA107 | AD102 | AD103 | AD104 | AD106 | AD107 => {
             KBox::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
         }
diff --git a/drivers/gpu/nova-core/falcon/hal/tu102.rs b/drivers/gpu/nova-core/falcon/hal/tu102.rs
new file mode 100644
index 000000000000..ac8f58ef6789
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal/tu102.rs
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::marker::PhantomData;
+
+use kernel::{
+    io::poll::read_poll_timeout,
+    prelude::*,
+    time::delay::fsleep,
+    time::Delta, //
+};
+
+use crate::driver::Bar0;
+use crate::falcon::{Falcon, FalconBromParams, FalconEngine};
+use crate::regs;
+
+use super::FalconHal;
+
+pub(super) struct Tu102<E: FalconEngine>(PhantomData<E>);
+
+impl<E: FalconEngine> Tu102<E> {
+    pub(super) fn new() -> Self {
+        Self(PhantomData)
+    }
+}
+
+impl<E: FalconEngine> FalconHal<E> for Tu102<E> {
+    fn select_core(&self, _falcon: &Falcon<E>, _bar: &Bar0) -> Result {
+        Ok(())
+    }
+
+    fn signature_reg_fuse_version(
+        &self,
+        _falcon: &Falcon<E>,
+        _bar: &Bar0,
+        _engine_id_mask: u16,
+        _ucode_id: u8,
+    ) -> Result<u32> {
+        Ok(0)
+    }
+
+    fn program_brom(&self, _falcon: &Falcon<E>, _bar: &Bar0, _params: &FalconBromParams) -> Result {
+        Ok(())
+    }
+
+    fn is_riscv_active(&self, bar: &Bar0) -> bool {
+        let cpuctl = regs::NV_PRISCV_RISCV_CORE_SWITCH_RISCV_STATUS::read(bar, &E::ID);
+        cpuctl.active_stat()
+    }
+
+    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
+        // TIMEOUT: memory scrubbing should complete in less than 10ms.
+        read_poll_timeout(
+            || Ok(regs::NV_PFALCON_FALCON_DMACTL::read(bar, &E::ID)),
+            |r| r.mem_scrubbing_done(),
+            Delta::ZERO,
+            Delta::from_millis(10),
+        )
+        .map(|_| ())
+    }
+
+    fn reset_eng(&self, bar: &Bar0) -> Result {
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
+
+        // TIMEOUT: falcon engine should not take more than 10us to reset.
+        fsleep(Delta::from_micros(10));
+
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
+
+        self.reset_wait_mem_scrubbing(bar)?;
+
+        Ok(())
+    }
+}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index c049f5aaf2f2..8188a566e1ae 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -307,6 +307,13 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
     7:7     secure_stat as bool;
 });
 
+impl NV_PFALCON_FALCON_DMACTL {
+    /// Returns `true` if memory scrubbing is completed.
+    pub(crate) fn mem_scrubbing_done(self) -> bool {
+        !self.dmem_scrubbing() && !self.imem_scrubbing()
+    }
+}
+
 register!(NV_PFALCON_FALCON_DMATRFBASE @ PFalconBase[0x00000110] {
     31:0    base as u32;
 });
@@ -389,6 +396,13 @@ pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
 
 // PRISCV
 
+// RISC-V status register for debug (Turing and GA100 only).
+// Reflects current RISC-V core status.
+register!(NV_PRISCV_RISCV_CORE_SWITCH_RISCV_STATUS @ PFalcon2Base[0x00000240] {
+    0:0     active_stat as bool, "RISC-V core active/inactive status";
+});
+
+// GA102 and later
 register!(NV_PRISCV_RISCV_CPUCTL @ PFalcon2Base[0x00000388] {
     0:0     halted as bool;
     7:7     active_stat as bool;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma()
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (6 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 07/11] gpu: nova-core: Add basic Turing HAL Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-18  7:48   ` Alexandre Courbot
  2025-12-18  3:29 ` [PATCH v4 09/11] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

Some GPUs do not support using DMA to transfer code/data from system
memory to Falcon memory, and instead must use programmed I/O (PIO).
Add a function to the Falcon HAL to indicate whether a given GPU's
Falcons support DMA for this purpose.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs           | 7 +++++++
 drivers/gpu/nova-core/falcon/hal.rs       | 3 +++
 drivers/gpu/nova-core/falcon/hal/ga102.rs | 4 ++++
 drivers/gpu/nova-core/falcon/hal/tu102.rs | 4 ++++
 4 files changed, 18 insertions(+)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 6b54c0ca458a..50c87dadf2ea 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -630,6 +630,13 @@ pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
         self.hal.is_riscv_active(bar)
     }
 
+    /// Check if this Falcon supports DMA for loading firmware.
+    ///
+    /// Returns `true` if DMA is supported, `false` if PIO must be used instead.
+    pub(crate) fn supports_dma(&self) -> bool {
+        self.hal.supports_dma()
+    }
+
     /// Write the application version to the OS register.
     pub(crate) fn write_os_version(&self, bar: &Bar0, app_version: u32) {
         regs::NV_PFALCON_FALCON_OS::default()
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
index c886ba03d1f6..49501aa6ff7f 100644
--- a/drivers/gpu/nova-core/falcon/hal.rs
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -48,6 +48,9 @@ fn signature_reg_fuse_version(
 
     /// Reset the falcon engine.
     fn reset_eng(&self, bar: &Bar0) -> Result;
+
+    /// Returns true of this Falcon supports DMA transfer from system memory to Falcon memory
+    fn supports_dma(&self) -> bool;
 }
 
 /// Returns a boxed falcon HAL adequate for `chipset`.
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
index 232d51a4921f..684ca72b7cfe 100644
--- a/drivers/gpu/nova-core/falcon/hal/ga102.rs
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -160,4 +160,8 @@ fn reset_eng(&self, bar: &Bar0) -> Result {
 
         Ok(())
     }
+
+    fn supports_dma(&self) -> bool {
+        true
+    }
 }
diff --git a/drivers/gpu/nova-core/falcon/hal/tu102.rs b/drivers/gpu/nova-core/falcon/hal/tu102.rs
index ac8f58ef6789..e549fbd7d1f5 100644
--- a/drivers/gpu/nova-core/falcon/hal/tu102.rs
+++ b/drivers/gpu/nova-core/falcon/hal/tu102.rs
@@ -70,4 +70,8 @@ fn reset_eng(&self, bar: &Bar0) -> Result {
 
         Ok(())
     }
+
+    fn supports_dma(&self) -> bool {
+        false
+    }
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 09/11] gpu: nova-core: add FalconUCodeDescV2 support
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (7 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-18  8:01   ` Alexandre Courbot
  2025-12-18  3:29 ` [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

The FRTS firmware in Turing and GA100 VBIOS has an older header
format (v2 instead of v3).  To support both v2 and v3 at runtime,
add the FalconUCodeDescV2 struct, and update code that references
the FalconUCodeDescV3 directly with a FalconUCodeDesc enum that
encapsulates both.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs       | 177 +++++++++++++++++++++++-
 drivers/gpu/nova-core/firmware/fwsec.rs |  72 ++++++----
 drivers/gpu/nova-core/vbios.rs          |  75 +++++-----
 3 files changed, 266 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 2d2008b33fb4..44897feb82a4 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -43,6 +43,46 @@ fn request_firmware(
         .and_then(|path| firmware::Firmware::request(&path, dev))
 }
 
+/// Structure used to describe some firmwares, notably FWSEC-FRTS.
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct FalconUCodeDescV2 {
+    /// Header defined by 'NV_BIT_FALCON_UCODE_DESC_HEADER_VDESC*' in OpenRM.
+    hdr: u32,
+    /// Stored size of the ucode after the header, compressed or uncompressed
+    stored_size: u32,
+    /// Uncompressed size of the ucode.  If store_size == uncompressed_size, then the ucode
+    /// is not compressed.
+    pub(crate) uncompressed_size: u32,
+    /// Code entry point
+    pub(crate) virtual_entry: u32,
+    /// Offset after the code segment at which the Application Interface Table headers are located.
+    pub(crate) interface_offset: u32,
+    /// Base address at which to load the code segment into 'IMEM'.
+    pub(crate) imem_phys_base: u32,
+    /// Size in bytes of the code to copy into 'IMEM'.
+    pub(crate) imem_load_size: u32,
+    /// Virtual 'IMEM' address (i.e. 'tag') at which the code should start.
+    pub(crate) imem_virt_base: u32,
+    /// Virtual address of secure IMEM segment.
+    pub(crate) imem_sec_base: u32,
+    /// Size of secure IMEM segment.
+    pub(crate) imem_sec_size: u32,
+    /// Offset into stored (uncompressed) image at which DMEM begins.
+    pub(crate) dmem_offset: u32,
+    /// Base address at which to load the data segment into 'DMEM'.
+    pub(crate) dmem_phys_base: u32,
+    /// Size in bytes of the data to copy into 'DMEM'.
+    pub(crate) dmem_load_size: u32,
+    /// "Alternate" Size of data to load into IMEM.
+    pub(crate) alt_imem_load_size: u32,
+    /// "Alternate" Size of data to load into DMEM.
+    pub(crate) alt_dmem_load_size: u32,
+}
+
+// SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+unsafe impl FromBytes for FalconUCodeDescV2 {}
+
 /// Structure used to describe some firmwares, notably FWSEC-FRTS.
 #[repr(C)]
 #[derive(Debug, Clone)]
@@ -76,13 +116,146 @@ pub(crate) struct FalconUCodeDescV3 {
     _reserved: u16,
 }
 
-impl FalconUCodeDescV3 {
+// SAFETY: all bit patterns are valid for this type, and it doesn't use
+// interior mutability.
+unsafe impl FromBytes for FalconUCodeDescV3 {}
+
+/// Enum wrapping the different versions of Falcon microcode descriptors.
+///
+/// This allows handling both V2 and V3 descriptor formats through a
+/// unified type, providing version-agnostic access to firmware metadata
+/// via the [`FalconUCodeDescriptor`] trait.
+#[derive(Debug, Clone)]
+pub(crate) enum FalconUCodeDesc {
+    V2(FalconUCodeDescV2),
+    V3(FalconUCodeDescV3),
+}
+
+/// Trait providing a common interface for accessing Falcon microcode descriptor fields.
+///
+/// This trait abstracts over the different descriptor versions ([`FalconUCodeDescV2`] and
+/// [`FalconUCodeDescV3`]), allowing code to work with firmware metadata without needing to
+/// know the specific descriptor version. Fields not present return zero.
+pub(crate) trait FalconUCodeDescriptor {
+    fn hdr(&self) -> u32;
+    fn imem_load_size(&self) -> u32;
+    fn interface_offset(&self) -> u32;
+    fn dmem_load_size(&self) -> u32;
+    fn pkc_data_offset(&self) -> u32;
+    fn engine_id_mask(&self) -> u16;
+    fn ucode_id(&self) -> u8;
+    fn signature_count(&self) -> u8;
+    fn signature_versions(&self) -> u16;
+}
+
+impl FalconUCodeDescriptor for FalconUCodeDescV2 {
+    fn hdr(&self) -> u32 {
+        self.hdr
+    }
+    fn imem_load_size(&self) -> u32 {
+        self.imem_load_size
+    }
+    fn interface_offset(&self) -> u32 {
+        self.interface_offset
+    }
+    fn dmem_load_size(&self) -> u32 {
+        self.dmem_load_size
+    }
+    fn pkc_data_offset(&self) -> u32 {
+        0
+    }
+    fn engine_id_mask(&self) -> u16 {
+        0
+    }
+    fn ucode_id(&self) -> u8 {
+        0
+    }
+    fn signature_count(&self) -> u8 {
+        0
+    }
+    fn signature_versions(&self) -> u16 {
+        0
+    }
+}
+
+impl FalconUCodeDescriptor for FalconUCodeDescV3 {
+    fn hdr(&self) -> u32 {
+        self.hdr
+    }
+    fn imem_load_size(&self) -> u32 {
+        self.imem_load_size
+    }
+    fn interface_offset(&self) -> u32 {
+        self.interface_offset
+    }
+    fn dmem_load_size(&self) -> u32 {
+        self.dmem_load_size
+    }
+    fn pkc_data_offset(&self) -> u32 {
+        self.pkc_data_offset
+    }
+    fn engine_id_mask(&self) -> u16 {
+        self.engine_id_mask
+    }
+    fn ucode_id(&self) -> u8 {
+        self.ucode_id
+    }
+    fn signature_count(&self) -> u8 {
+        self.signature_count
+    }
+    fn signature_versions(&self) -> u16 {
+        self.signature_versions
+    }
+}
+
+impl FalconUCodeDesc {
+    // Return trait object, the only match needed.
+    pub(crate) fn as_descriptor(&self) -> &dyn FalconUCodeDescriptor {
+        match self {
+            FalconUCodeDesc::V2(v2) => v2,
+            FalconUCodeDesc::V3(v3) => v3,
+        }
+    }
+
     /// Returns the size in bytes of the header.
     pub(crate) fn size(&self) -> usize {
+        let hdr = self.as_descriptor().hdr();
+
         const HDR_SIZE_SHIFT: u32 = 16;
         const HDR_SIZE_MASK: u32 = 0xffff0000;
+        ((hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT).into_safe_cast()
+    }
+
+    pub(crate) fn imem_load_size(&self) -> u32 {
+        self.as_descriptor().imem_load_size()
+    }
+
+    pub(crate) fn interface_offset(&self) -> u32 {
+        self.as_descriptor().interface_offset()
+    }
+
+    pub(crate) fn dmem_load_size(&self) -> u32 {
+        self.as_descriptor().dmem_load_size()
+    }
+
+    pub(crate) fn pkc_data_offset(&self) -> u32 {
+        self.as_descriptor().pkc_data_offset()
+    }
+
+    pub(crate) fn engine_id_mask(&self) -> u16 {
+        self.as_descriptor().engine_id_mask()
+    }
+
+    pub(crate) fn ucode_id(&self) -> u8 {
+        self.as_descriptor().ucode_id()
+    }
+
+    pub(crate) fn signature_count(&self) -> u8 {
+        self.as_descriptor().signature_count()
+    }
 
-        ((self.hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT).into_safe_cast()
+    pub(crate) fn signature_versions(&self) -> u16 {
+        self.as_descriptor().signature_versions()
     }
 }
 
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index e4009faba6c5..1c1dcdacf5f5 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -40,7 +40,7 @@
         FalconLoadTarget, //
     },
     firmware::{
-        FalconUCodeDescV3,
+        FalconUCodeDesc,
         FirmwareDmaObject,
         FirmwareSignature,
         Signed,
@@ -218,38 +218,59 @@ unsafe fn transmute_mut<T: Sized + FromBytes + AsBytes>(
 /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
 pub(crate) struct FwsecFirmware {
     /// Descriptor of the firmware.
-    desc: FalconUCodeDescV3,
+    desc: FalconUCodeDesc,
     /// GPU-accessible DMA object containing the firmware.
     ucode: FirmwareDmaObject<Self, Signed>,
 }
 
 impl FalconLoadParams for FwsecFirmware {
     fn imem_sec_load_params(&self) -> FalconLoadTarget {
-        FalconLoadTarget {
-            src_start: 0,
-            dst_start: self.desc.imem_phys_base,
-            len: self.desc.imem_load_size,
+        match &self.desc {
+            FalconUCodeDesc::V2(v2) => FalconLoadTarget {
+                src_start: 0,
+                dst_start: v2.imem_sec_base,
+                len: v2.imem_sec_size,
+            },
+            FalconUCodeDesc::V3(v3) => FalconLoadTarget {
+                src_start: 0,
+                dst_start: v3.imem_phys_base,
+                len: v3.imem_load_size,
+            },
         }
     }
 
     fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
-        // Only used on Turing and GA100, so return None for now
-        None
+        match &self.desc {
+            FalconUCodeDesc::V2(v2) => Some(FalconLoadTarget {
+                src_start: 0,
+                dst_start: v2.imem_phys_base,
+                len: v2.imem_load_size - v2.imem_sec_size,
+            }),
+            // Not used on V3 platforms
+            FalconUCodeDesc::V3(_v3) => None,
+        }
     }
 
     fn dmem_load_params(&self) -> FalconLoadTarget {
-        FalconLoadTarget {
-            src_start: self.desc.imem_load_size,
-            dst_start: self.desc.dmem_phys_base,
-            len: self.desc.dmem_load_size,
+        match &self.desc {
+            FalconUCodeDesc::V2(v2) => FalconLoadTarget {
+                src_start: v2.dmem_offset,
+                dst_start: v2.dmem_phys_base,
+                len: v2.dmem_load_size,
+            },
+            FalconUCodeDesc::V3(v3) => FalconLoadTarget {
+                src_start: v3.imem_load_size,
+                dst_start: v3.dmem_phys_base,
+                len: v3.dmem_load_size,
+            },
         }
     }
 
     fn brom_params(&self) -> FalconBromParams {
         FalconBromParams {
-            pkc_data_offset: self.desc.pkc_data_offset,
-            engine_id_mask: self.desc.engine_id_mask,
-            ucode_id: self.desc.ucode_id,
+            pkc_data_offset: self.desc.pkc_data_offset(),
+            engine_id_mask: self.desc.engine_id_mask(),
+            ucode_id: self.desc.ucode_id(),
         }
     }
 
@@ -273,10 +294,10 @@ impl FalconFirmware for FwsecFirmware {
 impl FirmwareDmaObject<FwsecFirmware, Unsigned> {
     fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Result<Self> {
         let desc = bios.fwsec_image().header()?;
-        let ucode = bios.fwsec_image().ucode(desc)?;
+        let ucode = bios.fwsec_image().ucode(&desc)?;
         let mut dma_object = DmaObject::from_data(dev, ucode)?;
 
-        let hdr_offset = usize::from_safe_cast(desc.imem_load_size + desc.interface_offset);
+        let hdr_offset = usize::from_safe_cast(desc.imem_load_size() + desc.interface_offset());
         // SAFETY: we have exclusive access to `dma_object`.
         let hdr: &FalconAppifHdrV1 = unsafe { transmute(&dma_object, hdr_offset) }?;
 
@@ -303,7 +324,7 @@ fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Re
             let dmem_mapper: &mut FalconAppifDmemmapperV3 = unsafe {
                 transmute_mut(
                     &mut dma_object,
-                    (desc.imem_load_size + dmem_base).into_safe_cast(),
+                    (desc.imem_load_size() + dmem_base).into_safe_cast(),
                 )
             }?;
 
@@ -317,7 +338,7 @@ fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Re
             let frts_cmd: &mut FrtsCmd = unsafe {
                 transmute_mut(
                     &mut dma_object,
-                    (desc.imem_load_size + cmd_in_buffer_offset).into_safe_cast(),
+                    (desc.imem_load_size() + cmd_in_buffer_offset).into_safe_cast(),
                 )
             }?;
 
@@ -364,11 +385,12 @@ pub(crate) fn new(
 
         // Patch signature if needed.
         let desc = bios.fwsec_image().header()?;
-        let ucode_signed = if desc.signature_count != 0 {
-            let sig_base_img = usize::from_safe_cast(desc.imem_load_size + desc.pkc_data_offset);
-            let desc_sig_versions = u32::from(desc.signature_versions);
+        let ucode_signed = if desc.signature_count() != 0 {
+            let sig_base_img =
+                usize::from_safe_cast(desc.imem_load_size() + desc.pkc_data_offset());
+            let desc_sig_versions = u32::from(desc.signature_versions());
             let reg_fuse_version =
-                falcon.signature_reg_fuse_version(bar, desc.engine_id_mask, desc.ucode_id)?;
+                falcon.signature_reg_fuse_version(bar, desc.engine_id_mask(), desc.ucode_id())?;
             dev_dbg!(
                 dev,
                 "desc_sig_versions: {:#x}, reg_fuse_version: {}\n",
@@ -402,7 +424,7 @@ pub(crate) fn new(
             dev_dbg!(dev, "patching signature with index {}\n", signature_idx);
             let signature = bios
                 .fwsec_image()
-                .sigs(desc)
+                .sigs(&desc)
                 .and_then(|sigs| sigs.get(signature_idx).ok_or(EINVAL))?;
 
             ucode_dma.patch_signature(signature, sig_base_img)?
@@ -411,7 +433,7 @@ pub(crate) fn new(
         };
 
         Ok(FwsecFirmware {
-            desc: desc.clone(),
+            desc: desc,
             ucode: ucode_signed,
         })
     }
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index abf423560ff4..11469bb1f352 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -19,6 +19,8 @@
     driver::Bar0,
     firmware::{
         fwsec::Bcrt30Rsa3kSignature,
+        FalconUCodeDesc,
+        FalconUCodeDescV2,
         FalconUCodeDescV3, //
     },
     num::FromSafeCast,
@@ -1004,19 +1006,10 @@ fn build(self) -> Result<FwSecBiosImage> {
 
 impl FwSecBiosImage {
     /// Get the FwSec header ([`FalconUCodeDescV3`]).
-    pub(crate) fn header(&self) -> Result<&FalconUCodeDescV3> {
+    pub(crate) fn header(&self) -> Result<FalconUCodeDesc> {
         // Get the falcon ucode offset that was found in setup_falcon_data.
         let falcon_ucode_offset = self.falcon_ucode_offset;
 
-        // Make sure the offset is within the data bounds.
-        if falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>() > self.base.data.len() {
-            dev_err!(
-                self.base.dev,
-                "fwsec-frts header not contained within BIOS bounds\n"
-            );
-            return Err(ERANGE);
-        }
-
         // Read the first 4 bytes to get the version.
         let hdr_bytes: [u8; 4] = self.base.data[falcon_ucode_offset..falcon_ucode_offset + 4]
             .try_into()
@@ -1024,33 +1017,49 @@ pub(crate) fn header(&self) -> Result<&FalconUCodeDescV3> {
         let hdr = u32::from_le_bytes(hdr_bytes);
         let ver = (hdr & 0xff00) >> 8;
 
-        if ver != 3 {
-            dev_err!(self.base.dev, "invalid fwsec firmware version: {:?}\n", ver);
-            return Err(EINVAL);
+        let hdr_size = match ver {
+            2 => core::mem::size_of::<FalconUCodeDescV2>(),
+            3 => core::mem::size_of::<FalconUCodeDescV3>(),
+            _ => {
+                dev_err!(self.base.dev, "invalid fwsec firmware version: {:?}\n", ver);
+                return Err(EINVAL);
+            }
+        };
+        // Make sure the offset is within the data bounds
+        if falcon_ucode_offset + hdr_size > self.base.data.len() {
+            dev_err!(
+                self.base.dev,
+                "fwsec-frts header not contained within BIOS bounds\n"
+            );
+            return Err(ERANGE);
         }
 
-        // Return a reference to the FalconUCodeDescV3 structure.
-        //
-        // SAFETY: We have checked that `falcon_ucode_offset + size_of::<FalconUCodeDescV3>` is
-        // within the bounds of `data`. Also, this data vector is from ROM, and the `data` field
-        // in `BiosImageBase` is immutable after construction.
-        Ok(unsafe {
-            &*(self
-                .base
-                .data
-                .as_ptr()
-                .add(falcon_ucode_offset)
-                .cast::<FalconUCodeDescV3>())
-        })
+        let data = self
+            .base
+            .data
+            .get(falcon_ucode_offset..falcon_ucode_offset + hdr_size)
+            .ok_or(EINVAL)?;
+
+        match ver {
+            2 => {
+                let v2 = FalconUCodeDescV2::from_bytes(data).ok_or(EINVAL)?;
+                Ok(FalconUCodeDesc::V2(v2.clone()))
+            }
+            3 => {
+                let v3 = FalconUCodeDescV3::from_bytes(data).ok_or(EINVAL)?;
+                Ok(FalconUCodeDesc::V3(v3.clone()))
+            }
+            _ => Err(EINVAL),
+        }
     }
 
     /// Get the ucode data as a byte slice
-    pub(crate) fn ucode(&self, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+    pub(crate) fn ucode(&self, desc: &FalconUCodeDesc) -> Result<&[u8]> {
         let falcon_ucode_offset = self.falcon_ucode_offset;
 
         // The ucode data follows the descriptor.
         let ucode_data_offset = falcon_ucode_offset + desc.size();
-        let size = usize::from_safe_cast(desc.imem_load_size + desc.dmem_load_size);
+        let size = usize::from_safe_cast(desc.imem_load_size() + desc.dmem_load_size());
 
         // Get the data slice, checking bounds in a single operation.
         self.base
@@ -1066,10 +1075,14 @@ pub(crate) fn ucode(&self, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
     }
 
     /// Get the signatures as a byte slice
-    pub(crate) fn sigs(&self, desc: &FalconUCodeDescV3) -> Result<&[Bcrt30Rsa3kSignature]> {
+    pub(crate) fn sigs(&self, desc: &FalconUCodeDesc) -> Result<&[Bcrt30Rsa3kSignature]> {
+        let hdr_size = match desc {
+            FalconUCodeDesc::V2(_v2) => core::mem::size_of::<FalconUCodeDescV2>(),
+            FalconUCodeDesc::V3(_v3) => core::mem::size_of::<FalconUCodeDescV3>(),
+        };
         // The signatures data follows the descriptor.
-        let sigs_data_offset = self.falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
-        let sigs_count = usize::from(desc.signature_count);
+        let sigs_data_offset = self.falcon_ucode_offset + hdr_size;
+        let sigs_count = usize::from(desc.signature_count());
         let sigs_size = sigs_count * core::mem::size_of::<Bcrt30Rsa3kSignature>();
 
         // Make sure the data is within bounds.
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (8 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 09/11] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-18 11:54   ` Alexandre Courbot
  2025-12-18  3:29 ` [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

On Turing and GA100 (i.e. the versions that use Libos v2), GSP-RM insists
that the 'size' parameter of the LibosMemoryRegionInitArgument struct be
aligned to 4KB.  The logging buffers are already aligned to that size, so
only the GSP_ARGUMENTS_CACHED struct needs to be adjusted.  Make that
adjustment by adding padding to the end of the struct.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw.rs | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index abffd6beec65..ab3ad038889c 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -889,17 +889,27 @@ unsafe impl AsBytes for GspMsgElement {}
 unsafe impl FromBytes for GspMsgElement {}
 
 /// Arguments for GSP startup.
-#[repr(transparent)]
-pub(crate) struct GspArgumentsCached(bindings::GSP_ARGUMENTS_CACHED);
+///
+/// On Turing and GA100, the entries in the `LibosMemoryRegionInitArgument`
+/// must all be a multiple of GSP_PAGE_SIZE in size, so add padding to force it
+/// to that size.
+#[repr(C)]
+pub(crate) struct GspArgumentsCached(
+    bindings::GSP_ARGUMENTS_CACHED,
+    [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
+);
 
 impl GspArgumentsCached {
     /// Creates the arguments for starting the GSP up using `cmdq` as its command queue.
     pub(crate) fn new(cmdq: &Cmdq) -> Self {
-        Self(bindings::GSP_ARGUMENTS_CACHED {
-            messageQueueInitArguments: MessageQueueInitArguments::new(cmdq).0,
-            bDmemStack: 1,
-            ..Default::default()
-        })
+        Self(
+            bindings::GSP_ARGUMENTS_CACHED {
+                messageQueueInitArguments: MessageQueueInitArguments::new(cmdq).0,
+                bDmemStack: 1,
+                ..Default::default()
+            },
+            Zeroable::zeroed(),
+        )
     }
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (9 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
@ 2025-12-18  3:29 ` Timur Tabi
  2025-12-18 12:59   ` Alexandre Courbot
  2025-12-28 17:45 ` [PATCH v4 00/11] gpu: nova-core: add Turing support Ewan Chorynski
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, John Hubbard,
	nouveau, rust-for-linux

Turing and GA100 use programmed I/O (PIO) instead of DMA to upload
firmware images into Falcon memory.

A new firmware called the Generic Bootloader (as opposed to the
GSP Bootloader) is used to upload FWSEC.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs         | 168 ++++++++++++++++++++++++
 drivers/gpu/nova-core/firmware.rs       |   4 +-
 drivers/gpu/nova-core/firmware/fwsec.rs | 142 +++++++++++++++++++-
 drivers/gpu/nova-core/gsp/boot.rs       |  10 +-
 drivers/gpu/nova-core/regs.rs           |  30 +++++
 5 files changed, 345 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 50c87dadf2ea..35d0e5e2ff9e 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -15,11 +15,16 @@
     time::{
         Delta, //
     },
+    transmute::AsBytes, //
 };
 
 use crate::{
     dma::DmaObject,
     driver::Bar0,
+    firmware::fwsec::{
+        BootloaderDmemDescV2,
+        GenericBootloader, //
+    },
     gpu::Chipset,
     num::{
         FromSafeCast,
@@ -406,6 +411,169 @@ pub(crate) fn reset(&self, bar: &Bar0) -> Result {
         Ok(())
     }
 
+    /// Write a byte array to Falcon memory using programmed I/O (PIO).
+    ///
+    /// Writes `img` to the specified `target_mem` (IMEM or DMEM) starting at `mem_base`.
+    /// For IMEM writes, tags are set for each 256-byte block starting from `tag`.
+    ///
+    /// Returns `EINVAL` if `img.len()` is not a multiple of 4.
+    fn pio_wr_bytes(
+        &self,
+        bar: &Bar0,
+        img: &[u8],
+        mem_base: u16,
+        target_mem: FalconMem,
+        port: u8,
+        tag: u16,
+    ) -> Result {
+        // Rejecting misaligned images here allows us to avoid checking
+        // inside the loops.
+        if img.len() % 4 != 0 {
+            return Err(EINVAL);
+        }
+
+        let port = usize::from(port);
+
+        match target_mem {
+            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
+                regs::NV_PFALCON_FALCON_IMEMC::default()
+                    .set_secure(target_mem == FalconMem::ImemSecure)
+                    .set_aincw(true)
+                    .set_offs(mem_base)
+                    .write(bar, &E::ID, port);
+
+                let mut tag = tag;
+                for block in img.chunks(256) {
+                    regs::NV_PFALCON_FALCON_IMEMT::default()
+                        .set_tag(tag)
+                        .write(bar, &E::ID, port);
+                    for word in block.chunks_exact(4) {
+                        let w = [word[0], word[1], word[2], word[3]];
+                        regs::NV_PFALCON_FALCON_IMEMD::default()
+                            .set_data(u32::from_le_bytes(w))
+                            .write(bar, &E::ID, port);
+                    }
+                    tag += 1;
+                }
+            }
+            FalconMem::Dmem => {
+                regs::NV_PFALCON_FALCON_DMEMC::default()
+                    .set_aincw(true)
+                    .set_offs(mem_base)
+                    .write(bar, &E::ID, port);
+
+                for block in img.chunks(256) {
+                    for word in block.chunks_exact(4) {
+                        let w = [word[0], word[1], word[2], word[3]];
+                        regs::NV_PFALCON_FALCON_DMEMD::default()
+                            .set_data(u32::from_le_bytes(w))
+                            .write(bar, &E::ID, port);
+                    }
+                }
+            }
+        }
+
+        Ok(())
+    }
+
+    fn pio_wr<F: FalconFirmware<Target = E>>(
+        &self,
+        bar: &Bar0,
+        fw: &F,
+        target_mem: FalconMem,
+        load_offsets: &FalconLoadTarget,
+        port: u8,
+        tag: u16,
+    ) -> Result {
+        let start = usize::from_safe_cast(load_offsets.src_start);
+        let len = usize::from_safe_cast(load_offsets.len);
+        let mem_base = u16::try_from(load_offsets.dst_start)?;
+
+        // SAFETY: as_slice() ensures that start+len is within range
+        let data = unsafe { fw.as_slice(start, len).map_err(|_| EINVAL)? };
+
+        self.pio_wr_bytes(bar, data, mem_base, target_mem, port, tag)
+    }
+
+    /// Perform a PIO copy into `IMEM` and `DMEM` of `fw`, and prepare the falcon to run it.
+    pub(crate) fn pio_load<F: FalconFirmware<Target = E>>(
+        &self,
+        bar: &Bar0,
+        fw: &F,
+        gbl: Option<&GenericBootloader>,
+    ) -> Result {
+        let imem_sec = fw.imem_sec_load_params();
+        let imem_ns = fw.imem_ns_load_params().ok_or(EINVAL)?;
+        let dmem = fw.dmem_load_params();
+
+        regs::NV_PFALCON_FBIF_CTL::read(bar, &E::ID)
+            .set_allow_phys_no_ctx(true)
+            .write(bar, &E::ID);
+
+        regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, &E::ID);
+
+        // If the Generic Bootloader was passed, then use it to boot FRTS
+        if let Some(gbl) = gbl {
+            let dst_start = u16::try_from(0x10000 - gbl.desc.code_size)?;
+            let data = &gbl.ucode[..usize::from_safe_cast(gbl.desc.code_size)];
+            let tag = u16::try_from(gbl.desc.start_tag)?;
+
+            self.pio_wr_bytes(bar, data, dst_start, FalconMem::ImemNonSecure, 0, tag)?;
+
+            // This structure tells the generic bootloader where to find the FWSEC
+            // image.
+            let dmem_desc = BootloaderDmemDescV2 {
+                reserved: [0; 4],
+                signature: [0; 4],
+                ctx_dma: 4, // FALCON_DMAIDX_PHYS_SYS_NCOH
+                code_dma_base: fw.dma_handle(),
+                non_sec_code_off: imem_ns.dst_start,
+                non_sec_code_size: imem_ns.len,
+                sec_code_off: imem_sec.dst_start,
+                sec_code_size: imem_sec.len,
+                code_entry_point: 0,
+                data_dma_base: fw.dma_handle() + u64::from(dmem.src_start),
+                data_size: dmem.len,
+                argc: 0,
+                argv: 0,
+            };
+
+            regs::NV_PFALCON_FBIF_TRANSCFG::update(bar, &E::ID, 4, |v| {
+                v.set_target(FalconFbifTarget::CoherentSysmem)
+                    .set_mem_type(FalconFbifMemType::Physical)
+            });
+
+            self.pio_wr_bytes(bar, dmem_desc.as_bytes(), 0, FalconMem::Dmem, 0, 0)?;
+        } else {
+            self.pio_wr(
+                bar,
+                fw,
+                FalconMem::ImemNonSecure,
+                &imem_ns,
+                0,
+                u16::try_from(imem_ns.dst_start >> 8)?,
+            )?;
+            self.pio_wr(
+                bar,
+                fw,
+                FalconMem::ImemSecure,
+                &imem_sec,
+                0,
+                u16::try_from(imem_sec.dst_start >> 8)?,
+            )?;
+            self.pio_wr(bar, fw, FalconMem::Dmem, &dmem, 0, 0)?;
+        }
+
+        self.hal.program_brom(self, bar, &fw.brom_params())?;
+
+        // Set `BootVec` to start of non-secure code.
+        regs::NV_PFALCON_FALCON_BOOTVEC::default()
+            .set_value(fw.boot_addr())
+            .write(bar, &E::ID);
+
+        Ok(())
+    }
+
     /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
     /// `target_mem`.
     ///
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 44897feb82a4..26efbf4f6760 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -31,7 +31,7 @@
 pub(crate) const FIRMWARE_VERSION: &str = "570.144";
 
 /// Requests the GPU firmware `name` suitable for `chipset`, with version `ver`.
-fn request_firmware(
+pub(crate) fn request_firmware(
     dev: &device::Device,
     chipset: gpu::Chipset,
     name: &str,
@@ -321,7 +321,7 @@ fn no_patch_signature(self) -> FirmwareDmaObject<F, Signed> {
 /// Header common to most firmware files.
 #[repr(C)]
 #[derive(Debug, Clone)]
-struct BinHdr {
+pub(crate) struct BinHdr {
     /// Magic number, must be `0x10de`.
     bin_magic: u32,
     /// Version of the header.
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 1c1dcdacf5f5..4c26257bbfe0 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -40,12 +40,15 @@
         FalconLoadTarget, //
     },
     firmware::{
+        BinHdr,
         FalconUCodeDesc,
         FirmwareDmaObject,
         FirmwareSignature,
         Signed,
         Unsigned, //
+        FIRMWARE_VERSION,
     },
+    gpu::Chipset,
     num::{
         FromSafeCast,
         IntoSafeCast, //
@@ -213,6 +216,72 @@ unsafe fn transmute_mut<T: Sized + FromBytes + AsBytes>(
     T::from_bytes_mut(unsafe { fw.as_slice_mut(offset, size_of::<T>())? }).ok_or(EINVAL)
 }
 
+/// Descriptor used by RM to figure out the requirements of the boot loader.
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct BootloaderDesc {
+    /// Starting tag of bootloader.
+    pub start_tag: u32,
+    /// DMEM offset where [`BootloaderDmemDescV2`] is to be loaded.
+    pub dmem_load_off: u32,
+    /// Offset of code section in the image.
+    pub code_off: u32,
+    /// Size of code section in the image.
+    pub code_size: u32,
+    /// Offset of data section in the image.
+    pub data_off: u32,
+    /// Size of data section in the image.
+    pub data_size: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for BootloaderDesc {}
+// SAFETY: This struct doesn't contain uninitialized bytes and doesn't have interior mutability.
+unsafe impl AsBytes for BootloaderDesc {}
+
+/// Structure used by the boot-loader to load the rest of the code.
+///
+/// This has to be filled by the GPU driver and copied into DMEM at offset
+/// [`BootloaderDesc.dmem_load_off`].
+#[repr(C, packed)]
+#[derive(Debug, Clone)]
+pub(crate) struct BootloaderDmemDescV2 {
+    /// Reserved, should always be first element.
+    pub reserved: [u32; 4],
+    /// 16B signature for secure code, 0s if no secure code.
+    pub signature: [u32; 4],
+    /// DMA context used by the bootloader while loading code/data.
+    pub ctx_dma: u32,
+    /// 256B-aligned physical FB address where code is located.
+    pub code_dma_base: u64,
+    /// Offset from `code_dma_base` where the non-secure code is located (must be multiple of 256).
+    pub non_sec_code_off: u32,
+    /// Size of the non-secure code part.
+    pub non_sec_code_size: u32,
+    /// Offset from `code_dma_base` where the secure code is located (must be multiple of 256).
+    pub sec_code_off: u32,
+    /// Size of the secure code part.
+    pub sec_code_size: u32,
+    /// Code entry point invoked by the bootloader after code is loaded.
+    pub code_entry_point: u32,
+    /// 256B-aligned physical FB address where data is located.
+    pub data_dma_base: u64,
+    /// Size of data block (should be multiple of 256B).
+    pub data_size: u32,
+    /// Arguments to be passed to the target firmware being loaded.
+    pub argc: u32,
+    /// Number of arguments to be passed to the target firmware being loaded.
+    pub argv: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for BootloaderDmemDescV2 {}
+// SAFETY: This struct doesn't contain uninitialized bytes and doesn't have interior mutability.
+unsafe impl AsBytes for BootloaderDmemDescV2 {}
+
+pub(crate) struct GenericBootloader {
+    pub desc: BootloaderDesc,
+    pub ucode: Vec<u8, kernel::alloc::allocator::Kmalloc>,
+}
+
 /// The FWSEC microcode, extracted from the BIOS and to be run on the GSP falcon.
 ///
 /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
@@ -221,6 +290,8 @@ pub(crate) struct FwsecFirmware {
     desc: FalconUCodeDesc,
     /// GPU-accessible DMA object containing the firmware.
     ucode: FirmwareDmaObject<Self, Signed>,
+    /// Generic bootloader
+    gen_bootloader: Option<GenericBootloader>,
 }
 
 impl FalconLoadParams for FwsecFirmware {
@@ -275,7 +346,19 @@ fn brom_params(&self) -> FalconBromParams {
     }
 
     fn boot_addr(&self) -> u32 {
-        0
+        match &self.desc {
+            FalconUCodeDesc::V2(_v2) => {
+                // On V2 platforms, the boot address is extracted from the
+                // generic bootloader, because the gbl is what actually copies
+                // FWSEC into memory, so that is what needs to be booted.
+                if let Some(ref gbl) = self.gen_bootloader {
+                    gbl.desc.start_tag << 8
+                } else {
+                    0
+                }
+            }
+            FalconUCodeDesc::V3(_v3) => 0,
+        }
     }
 }
 
@@ -376,6 +459,7 @@ impl FwsecFirmware {
     /// command.
     pub(crate) fn new(
         dev: &Device<device::Bound>,
+        chipset: Chipset,
         falcon: &Falcon<Gsp>,
         bar: &Bar0,
         bios: &Vbios,
@@ -432,9 +516,49 @@ pub(crate) fn new(
             ucode_dma.no_patch_signature()
         };
 
+        // The Generic Bootloader exists only on Turing and GA100.  To avoid a bogus
+        // console error message on other platforms, only try to load it if it's
+        // supposed to be there.
+        let gbl_fw = if chipset < Chipset::GA102 {
+            super::request_firmware(dev, chipset, "gen_bootloader", FIRMWARE_VERSION)
+        } else {
+            Err(ENOENT)
+        };
+
+        let gbl = match gbl_fw {
+            Ok(fw) => {
+                let hdr = fw
+                    .data()
+                    .get(0..size_of::<BinHdr>())
+                    .and_then(BinHdr::from_bytes_copy)
+                    .ok_or(EINVAL)?;
+
+                let desc_offset = usize::from_safe_cast(hdr.header_offset);
+                let desc = fw
+                    .data()
+                    .get(desc_offset..desc_offset + size_of::<BootloaderDesc>())
+                    .and_then(BootloaderDesc::from_bytes_copy)
+                    .ok_or(EINVAL)?;
+
+                let ucode_start = usize::from_safe_cast(hdr.data_offset);
+                let ucode_size = usize::from_safe_cast(hdr.data_size);
+                let ucode_data = fw
+                    .data()
+                    .get(ucode_start..ucode_start + ucode_size)
+                    .ok_or(EINVAL)?;
+
+                let mut ucode = KVec::new();
+                ucode.extend_from_slice(ucode_data, GFP_KERNEL)?;
+
+                Some(GenericBootloader { desc, ucode })
+            }
+            Err(_) => None,
+        };
+
         Ok(FwsecFirmware {
-            desc: desc,
+            desc,
             ucode: ucode_signed,
+            gen_bootloader: gbl,
         })
     }
 
@@ -449,9 +573,17 @@ pub(crate) fn run(
         falcon
             .reset(bar)
             .inspect_err(|e| dev_err!(dev, "Failed to reset GSP falcon: {:?}\n", e))?;
-        falcon
-            .dma_load(bar, self)
-            .inspect_err(|e| dev_err!(dev, "Failed to load FWSEC firmware: {:?}\n", e))?;
+
+        // If the Generic Bootloader was found, then upload it via PIO , otherwise
+        if let Some(ref gbl) = self.gen_bootloader {
+            falcon
+                .pio_load(bar, self, Some(gbl))
+                .inspect_err(|e| dev_err!(dev, "Failed to load FWSEC firmware: {:?}\n", e))?;
+        } else {
+            falcon
+                .dma_load(bar, self)
+                .inspect_err(|e| dev_err!(dev, "Failed to load FWSEC firmware: {:?}\n", e))?;
+        }
         let (mbox0, _) = falcon
             .boot(bar, Some(0), None)
             .inspect_err(|e| dev_err!(dev, "Failed to boot FWSEC firmware: {:?}\n", e))?;
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 54937606b5b0..fda01afda9ed 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -48,6 +48,7 @@ impl super::Gsp {
     /// created the WPR2 region.
     fn run_fwsec_frts(
         dev: &device::Device<device::Bound>,
+        chipset: Chipset,
         falcon: &Falcon<Gsp>,
         bar: &Bar0,
         bios: &Vbios,
@@ -65,6 +66,7 @@ fn run_fwsec_frts(
 
         let fwsec_frts = FwsecFirmware::new(
             dev,
+            chipset,
             falcon,
             bar,
             bios,
@@ -147,7 +149,7 @@ pub(crate) fn boot(
         let fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
         dev_dbg!(dev, "{:#x?}\n", fb_layout);
 
-        Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;
+        Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
 
         let booter_loader = BooterFirmware::new(
             dev,
@@ -186,7 +188,11 @@ pub(crate) fn boot(
         );
 
         sec2_falcon.reset(bar)?;
-        sec2_falcon.dma_load(bar, &booter_loader)?;
+        if sec2_falcon.supports_dma() {
+            sec2_falcon.dma_load(bar, &booter_loader)?;
+        } else {
+            sec2_falcon.pio_load(bar, &booter_loader, None)?;
+        }
         let wpr_handle = wpr_meta.dma_handle();
         let (mbox0, mbox1) = sec2_falcon.boot(
             bar,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 8188a566e1ae..c2fbb0299200 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -359,6 +359,36 @@ pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
     1:1     startcpu as bool;
 });
 
+// IMEM access control register. Up to 4 ports are available for IMEM access.
+register!(NV_PFALCON_FALCON_IMEMC @ PFalconBase[0x00000180[4; 16]] {
+    15:0      offs as u16, "IMEM block and word offset";
+    24:24     aincw as bool, "Auto-increment on write";
+    28:28     secure as bool, "Access secure IMEM";
+});
+
+// IMEM data register. Reading/writing this register accesses IMEM at the address
+// specified by the corresponding IMEMC register.
+register!(NV_PFALCON_FALCON_IMEMD @ PFalconBase[0x00000184[4; 16]] {
+    31:0      data as u32;
+});
+
+// IMEM tag register. Used to set the tag for the current IMEM block.
+register!(NV_PFALCON_FALCON_IMEMT @ PFalconBase[0x00000188[4; 16]] {
+    15:0      tag as u16;
+});
+
+// DMEM access control register. Up to 8 ports are available for DMEM access.
+register!(NV_PFALCON_FALCON_DMEMC @ PFalconBase[0x000001c0[8; 8]] {
+    15:0      offs as u16, "DMEM block and word offset";
+    24:24     aincw as bool, "Auto-increment on write";
+});
+
+// DMEM data register. Reading/writing this register accesses DMEM at the address
+// specified by the corresponding DMEMC register.
+register!(NV_PFALCON_FALCON_DMEMD @ PFalconBase[0x000001c4[8; 8]] {
+    31:0      data as u32;
+});
+
 // Actually known as `NV_PSEC_FALCON_ENGINE` and `NV_PGSP_FALCON_ENGINE` depending on the falcon
 // instance.
 register!(NV_PFALCON_FALCON_ENGINE @ PFalconBase[0x000003c0] {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma()
  2025-12-18  3:29 ` [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
@ 2025-12-18  7:48   ` Alexandre Courbot
  2025-12-19 12:49     ` Alexandre Courbot
  0 siblings, 1 reply; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-18  7:48 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
> Some GPUs do not support using DMA to transfer code/data from system
> memory to Falcon memory, and instead must use programmed I/O (PIO).
> Add a function to the Falcon HAL to indicate whether a given GPU's
> Falcons support DMA for this purpose.
>
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs           | 7 +++++++
>  drivers/gpu/nova-core/falcon/hal.rs       | 3 +++
>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 4 ++++
>  drivers/gpu/nova-core/falcon/hal/tu102.rs | 4 ++++
>  4 files changed, 18 insertions(+)
>
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index 6b54c0ca458a..50c87dadf2ea 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -630,6 +630,13 @@ pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
>          self.hal.is_riscv_active(bar)
>      }
>  
> +    /// Check if this Falcon supports DMA for loading firmware.
> +    ///
> +    /// Returns `true` if DMA is supported, `false` if PIO must be used instead.
> +    pub(crate) fn supports_dma(&self) -> bool {
> +        self.hal.supports_dma()
> +    }

Rather than this, we should make the selection of the loading method
transparent to the caller. Right now `boot.rs` needs to calls
`supports_dma` itself and choose the right method, which leaves room for
error.

I'd suggest the following:

- A `fn load(bar, fw) -> Result` method in the Falcon HAL that redirects
  to either `load_dma` or `load_pio`.
- A similarly-named public function in `Falcon` for the boot code to
  call,
- `load_dma` and `load_pio` becomes private - that way the HAL can call
  them, but not the user code.

Now we have the problem that the generic bootloader argument is only
useful for PIO, but the way it is handled is a bit too ad-hoc to my
taste. I wonder if we could pass a different `FalconFirmware` structure
when it is used so the same code can do the correct thing.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 09/11] gpu: nova-core: add FalconUCodeDescV2 support
  2025-12-18  3:29 ` [PATCH v4 09/11] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
@ 2025-12-18  8:01   ` Alexandre Courbot
  0 siblings, 0 replies; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-18  8:01 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
<snip>
> +impl FalconUCodeDesc {
> +    // Return trait object, the only match needed.
> +    pub(crate) fn as_descriptor(&self) -> &dyn FalconUCodeDescriptor {
> +        match self {
> +            FalconUCodeDesc::V2(v2) => v2,
> +            FalconUCodeDesc::V3(v3) => v3,
> +        }
> +    }
> +
>      /// Returns the size in bytes of the header.
>      pub(crate) fn size(&self) -> usize {
> +        let hdr = self.as_descriptor().hdr();
> +
>          const HDR_SIZE_SHIFT: u32 = 16;
>          const HDR_SIZE_MASK: u32 = 0xffff0000;
> +        ((hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT).into_safe_cast()
> +    }
> +
> +    pub(crate) fn imem_load_size(&self) -> u32 {
> +        self.as_descriptor().imem_load_size()
> +    }
> +
> +    pub(crate) fn interface_offset(&self) -> u32 {
> +        self.as_descriptor().interface_offset()
> +    }
> +
> +    pub(crate) fn dmem_load_size(&self) -> u32 {
> +        self.as_descriptor().dmem_load_size()
> +    }
> +
> +    pub(crate) fn pkc_data_offset(&self) -> u32 {
> +        self.as_descriptor().pkc_data_offset()
> +    }
> +
> +    pub(crate) fn engine_id_mask(&self) -> u16 {
> +        self.as_descriptor().engine_id_mask()
> +    }
> +
> +    pub(crate) fn ucode_id(&self) -> u8 {
> +        self.as_descriptor().ucode_id()
> +    }
> +
> +    pub(crate) fn signature_count(&self) -> u8 {
> +        self.as_descriptor().signature_count()
> +    }
>  
> -        ((self.hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT).into_safe_cast()
> +    pub(crate) fn signature_versions(&self) -> u16 {
> +        self.as_descriptor().signature_versions()
>      }
>  }

I wanted to make this comment on v3, but took the time to write some
code to validate the idea so this has slipped to this version. :)

This whole impl block becomes unneeded if you leverage `Deref` as a
replacement for `as_descriptor`:

    impl Deref for FalconUCodeDesc {
        type Target = dyn FalconUCodeDescriptor;

        fn deref(&self) -> &Self::Target {
            match self {
                FalconUCodeDesc::V2(v2) => v2,
                FalconUCodeDesc::V3(v3) => v3,
            }
        }
    }

What this does is that it makes all the methods of
`FalconUCodeDescriptor` available automagically whenever the `.`
dereference operator it used, which is basically what these proxy
methods did.

Then the only remaining method, `size`, can be moved as a default method
of `FalconUCodeDescriptor` itself since all it does is call other
methods from that trait.

>  
> diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
> index e4009faba6c5..1c1dcdacf5f5 100644
> --- a/drivers/gpu/nova-core/firmware/fwsec.rs
> +++ b/drivers/gpu/nova-core/firmware/fwsec.rs
> @@ -40,7 +40,7 @@
>          FalconLoadTarget, //
>      },
>      firmware::{
> -        FalconUCodeDescV3,
> +        FalconUCodeDesc,
>          FirmwareDmaObject,
>          FirmwareSignature,
>          Signed,
> @@ -218,38 +218,59 @@ unsafe fn transmute_mut<T: Sized + FromBytes + AsBytes>(
>  /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
>  pub(crate) struct FwsecFirmware {
>      /// Descriptor of the firmware.
> -    desc: FalconUCodeDescV3,
> +    desc: FalconUCodeDesc,
>      /// GPU-accessible DMA object containing the firmware.
>      ucode: FirmwareDmaObject<Self, Signed>,
>  }
>  
>  impl FalconLoadParams for FwsecFirmware {
>      fn imem_sec_load_params(&self) -> FalconLoadTarget {
> -        FalconLoadTarget {
> -            src_start: 0,
> -            dst_start: self.desc.imem_phys_base,
> -            len: self.desc.imem_load_size,
> +        match &self.desc {
> +            FalconUCodeDesc::V2(v2) => FalconLoadTarget {
> +                src_start: 0,
> +                dst_start: v2.imem_sec_base,
> +                len: v2.imem_sec_size,
> +            },
> +            FalconUCodeDesc::V3(v3) => FalconLoadTarget {
> +                src_start: 0,
> +                dst_start: v3.imem_phys_base,
> +                len: v3.imem_load_size,
> +            },
>          }
>      }
>  
>      fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
> -        // Only used on Turing and GA100, so return None for now
> -        None
> +        match &self.desc {
> +            FalconUCodeDesc::V2(v2) => Some(FalconLoadTarget {
> +                src_start: 0,
> +                dst_start: v2.imem_phys_base,
> +                len: v2.imem_load_size - v2.imem_sec_size,
> +            }),
> +            // Not used on V3 platforms
> +            FalconUCodeDesc::V3(_v3) => None,
> +        }
>      }
>  
>      fn dmem_load_params(&self) -> FalconLoadTarget {
> -        FalconLoadTarget {
> -            src_start: self.desc.imem_load_size,
> -            dst_start: self.desc.dmem_phys_base,
> -            len: self.desc.dmem_load_size,
> +        match &self.desc {
> +            FalconUCodeDesc::V2(v2) => FalconLoadTarget {
> +                src_start: v2.dmem_offset,
> +                dst_start: v2.dmem_phys_base,
> +                len: v2.dmem_load_size,
> +            },
> +            FalconUCodeDesc::V3(v3) => FalconLoadTarget {
> +                src_start: v3.imem_load_size,
> +                dst_start: v3.dmem_phys_base,
> +                len: v3.dmem_load_size,
> +            },
>          }
>      }
>  
>      fn brom_params(&self) -> FalconBromParams {
>          FalconBromParams {
> -            pkc_data_offset: self.desc.pkc_data_offset,
> -            engine_id_mask: self.desc.engine_id_mask,
> -            ucode_id: self.desc.ucode_id,
> +            pkc_data_offset: self.desc.pkc_data_offset(),
> +            engine_id_mask: self.desc.engine_id_mask(),
> +            ucode_id: self.desc.ucode_id(),
>          }

For `brom_params` you are calling the virtual methods of
`FalconUCodeDescriptor`, but on the other methods you are doing a match.
I guess the reason is that using virtual methods all the way would make
`FalconUCodeDescriptor` grow some more, but still we would like to avoid
the match here.

You can achieve this by adding the following methods to
`FalconUCodeDescriptor`:

    fn imem_sec_load_params(&self) -> FalconLoadTarget;
    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget>;
    fn dmem_load_params(&self) -> FalconLoadTarget;
    fn brom_params(&self) -> FalconBromParams;

And implementing them appropriately for the V2 and V3 headers. Then the
methods of the impl block above can just become:

    fn imem_sec_load_params(&self) -> FalconLoadTarget {
        self.desc.imem_sec_load_params()
    }

    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
        self.desc.imem_ns_load_params()
    }

    fn dmem_load_params(&self) -> FalconLoadTarget {
        self.desc.dmem_load_params()
    }

    fn brom_params(&self) -> FalconBromParams {
        self.desc.brom_params()
    }


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
  2025-12-18  3:29 ` [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
@ 2025-12-18 11:24   ` Alexandre Courbot
  2025-12-31 21:35   ` John Hubbard
  1 sibling, 0 replies; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-18 11:24 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
> The with_falcon_mem() method initializes the 'imem' and 'sec' fields of
> the NV_PFALCON_FALCON_DMATRFCMD register based on the value of
> the FalconMem type.
>
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs | 16 ++++------------
>  drivers/gpu/nova-core/regs.rs   |  9 +++++++++
>  2 files changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index b92152277d00..44a1a531a361 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -454,7 +454,6 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>          fw: &F,
>          target_mem: FalconMem,
>          load_offsets: FalconLoadTarget,
> -        sec: bool,
>      ) -> Result {
>          const DMA_LEN: u32 = 256;
>  
> @@ -516,8 +515,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>  
>          let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
>              .set_size(DmaTrfCmdSize::Size256B)
> -            .set_imem(target_mem != FalconMem::Dmem)
> -            .set_sec(if sec { 1 } else { 0 });
> +            .with_falcon_mem(target_mem);
>  
>          for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
>              // Perform a transfer of size `DMA_LEN`.
> @@ -551,21 +549,15 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
>                  .set_mem_type(FalconFbifMemType::Physical)
>          });
>  
> -        self.dma_wr(
> -            bar,
> -            fw,
> -            FalconMem::ImemSecure,
> -            fw.imem_sec_load_params(),
> -            true,
> -        )?;
> -        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
> +        self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params())?;
> +        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params())?;
>  
>          if let Some(nmem) = fw.imem_ns_load_params() {
>              // This code should never actual get executed, because the Non-Secure
>              // section only exists on firmware used by Turing and GA100, and
>              // those platforms do not use DMA.  But we include this code for
>              // consistency.
> -            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
> +            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem)?;
>          }
>  
>          self.hal.program_brom(self, bar, &fw.brom_params())?;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 82cc6c0790e5..c049f5aaf2f2 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -16,6 +16,7 @@
>          FalconCoreRevSubversion,
>          FalconFbifMemType,
>          FalconFbifTarget,
> +        FalconMem,
>          FalconModSelAlgo,
>          FalconSecurityModel,
>          PFalcon2Base,
> @@ -325,6 +326,14 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
>      16:16   set_dmtag as u8;
>  });
>  
> +impl NV_PFALCON_FALCON_DMATRFCMD {
> +    /// Programs the 'imem' and 'sec' fields for the given FalconMem

Nit: should be `imem` and `sec` so rustdoc properly formats them.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL
  2025-12-18  3:29 ` [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL Timur Tabi
@ 2025-12-18 11:25   ` Alexandre Courbot
  2025-12-31 22:07   ` John Hubbard
  1 sibling, 0 replies; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-18 11:25 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
> A few Falcon methods are actually GPU-specific, so move them
> into the HAL.
>
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs           | 45 ++---------------------
>  drivers/gpu/nova-core/falcon/hal.rs       | 10 +++++
>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 43 ++++++++++++++++++++++
>  3 files changed, 56 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index 44a1a531a361..6b54c0ca458a 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -13,7 +13,6 @@
>      prelude::*,
>      sync::aref::ARef,
>      time::{
> -        delay::fsleep,
>          Delta, //
>      },
>  };
> @@ -394,48 +393,11 @@ pub(crate) fn dma_reset(&self, bar: &Bar0) {
>          regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, &E::ID);
>      }
>  
> -    /// Wait for memory scrubbing to complete.
> -    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
> -        // TIMEOUT: memory scrubbing should complete in less than 20ms.
> -        read_poll_timeout(
> -            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> -            |r| r.mem_scrubbing_done(),
> -            Delta::ZERO,
> -            Delta::from_millis(20),
> -        )
> -        .map(|_| ())
> -    }
> -
> -    /// Reset the falcon engine.
> -    fn reset_eng(&self, bar: &Bar0) -> Result {
> -        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
> -
> -        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
> -        // RESET_READY so a non-failing timeout is used.
> -        let _ = read_poll_timeout(
> -            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> -            |r| r.reset_ready(),
> -            Delta::ZERO,
> -            Delta::from_micros(150),
> -        );
> -
> -        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
> -
> -        // TIMEOUT: falcon engine should not take more than 10us to reset.
> -        fsleep(Delta::from_micros(10));
> -
> -        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
> -
> -        self.reset_wait_mem_scrubbing(bar)?;
> -
> -        Ok(())
> -    }
> -
>      /// Reset the controller, select the falcon core, and wait for memory scrubbing to complete.
>      pub(crate) fn reset(&self, bar: &Bar0) -> Result {
> -        self.reset_eng(bar)?;
> +        self.hal.reset_eng(bar)?;
>          self.hal.select_core(self, bar)?;
> -        self.reset_wait_mem_scrubbing(bar)?;
> +        self.hal.reset_wait_mem_scrubbing(bar)?;
>  
>          regs::NV_PFALCON_FALCON_RM::default()
>              .set_value(regs::NV_PMC_BOOT_0::read(bar).into())
> @@ -665,8 +627,7 @@ pub(crate) fn signature_reg_fuse_version(
>      ///
>      /// Returns `true` if the RISC-V core is active, `false` otherwise.
>      pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
> -        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
> -        cpuctl.active_stat()
> +        self.hal.is_riscv_active(bar)
>      }
>  
>      /// Write the application version to the OS register.
> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
> index 8dc56a28ad65..c77a1568ea96 100644
> --- a/drivers/gpu/nova-core/falcon/hal.rs
> +++ b/drivers/gpu/nova-core/falcon/hal.rs
> @@ -37,6 +37,16 @@ fn signature_reg_fuse_version(
>  
>      /// Program the boot ROM registers prior to starting a secure firmware.
>      fn program_brom(&self, falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result;
> +
> +    /// Check if the RISC-V core is active.
> +    /// Returns `true` if the RISC-V core is active, `false` otherwise.
> +    fn is_riscv_active(&self, bar: &Bar0) -> bool;
> +
> +    /// Wait for memory scrubbing to complete.
> +    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result;
> +
> +    /// Reset the falcon engine.
> +    fn reset_eng(&self, bar: &Bar0) -> Result;
>  }
>  
>  /// Returns a boxed falcon HAL adequate for `chipset`.
> diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
> index 69a7a95cac16..232d51a4921f 100644
> --- a/drivers/gpu/nova-core/falcon/hal/ga102.rs
> +++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
> @@ -6,6 +6,7 @@
>      device,
>      io::poll::read_poll_timeout,
>      prelude::*,
> +    time::delay::fsleep,
>      time::Delta, //
>  };
>  
> @@ -117,4 +118,46 @@ fn signature_reg_fuse_version(
>      fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
>          program_brom_ga102::<E>(bar, params)
>      }
> +
> +    fn is_riscv_active(&self, bar: &Bar0) -> bool {
> +        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
> +        cpuctl.active_stat()
> +    }
> +
> +    /// Wait for memory scrubbing to complete.

This doccomment is already in the HAL trait, so no need to repeat it
here.

> +    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
> +        // TIMEOUT: memory scrubbing should complete in less than 20ms.
> +        read_poll_timeout(
> +            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> +            |r| r.mem_scrubbing_done(),
> +            Delta::ZERO,
> +            Delta::from_millis(20),
> +        )
> +        .map(|_| ())
> +    }
> +
> +    /// Reset the falcon engine.

Ditto.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-18  3:29 ` [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
@ 2025-12-18 11:54   ` Alexandre Courbot
  2025-12-18 22:51     ` Timur Tabi
  0 siblings, 1 reply; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-18 11:54 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
> On Turing and GA100 (i.e. the versions that use Libos v2), GSP-RM insists
> that the 'size' parameter of the LibosMemoryRegionInitArgument struct be
> aligned to 4KB.  The logging buffers are already aligned to that size, so
> only the GSP_ARGUMENTS_CACHED struct needs to be adjusted.  Make that
> adjustment by adding padding to the end of the struct.
>
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/fw.rs | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index abffd6beec65..ab3ad038889c 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -889,17 +889,27 @@ unsafe impl AsBytes for GspMsgElement {}
>  unsafe impl FromBytes for GspMsgElement {}
>  
>  /// Arguments for GSP startup.
> -#[repr(transparent)]
> -pub(crate) struct GspArgumentsCached(bindings::GSP_ARGUMENTS_CACHED);
> +///
> +/// On Turing and GA100, the entries in the `LibosMemoryRegionInitArgument`
> +/// must all be a multiple of GSP_PAGE_SIZE in size, so add padding to force it
> +/// to that size.
> +#[repr(C)]
> +pub(crate) struct GspArgumentsCached(
> +    bindings::GSP_ARGUMENTS_CACHED,
> +    [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
> +);

In gsp.rs we are still initializing the rmarg as follows:

    dma_write!(rmargs[0] = fw::GspArgumentsCached::new(&cmdq))?;

Which passes the `GspArgumentsCached` queue by value to
`CoherentAllocation::field_write`, i.e. 4KB on the stack.

So I think the proper approach is to keep `GspArgumentsCached` as-is,
and use a different type just for allocation:

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index fb6f74797178..0feaff5784a7 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -25,10 +25,9 @@
 };

 use crate::{
-    gsp::cmdq::Cmdq,
-    gsp::fw::{
-        GspArgumentsCached,
-        LibosMemoryRegionInitArgument, //
+    gsp::{
+        cmdq::Cmdq,
+        fw::LibosMemoryRegionInitArgument, //
     },
     num,
 };
@@ -114,7 +113,7 @@ pub(crate) struct Gsp {
     /// Command queue.
     pub(crate) cmdq: Cmdq,
     /// RM arguments.
-    rmargs: CoherentAllocation<GspArgumentsCached>,
+    rmargs: CoherentAllocation<fw::GspArgumentsAligned>,
 }

 impl Gsp {
@@ -141,12 +140,12 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> Result<impl PinInit<Self

         let cmdq = Cmdq::new(dev)?;

-        let rmargs = CoherentAllocation::<GspArgumentsCached>::alloc_coherent(
+        let rmargs = CoherentAllocation::<fw::GspArgumentsAligned>::alloc_coherent(
             dev,
             1,
             GFP_KERNEL | __GFP_ZERO,
         )?;
-        dma_write!(rmargs[0] = fw::GspArgumentsCached::new(&cmdq))?;
+        dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
         dma_write!(libos[3] = LibosMemoryRegionInitArgument::new("RMARGS", &rmargs))?;

         Ok(try_pin_init!(Self {
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index abffd6beec65..15ca9c183ae1 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -906,9 +906,21 @@ pub(crate) fn new(cmdq: &Cmdq) -> Self {
 // SAFETY: Padding is explicit and will not contain uninitialized data.
 unsafe impl AsBytes for GspArgumentsCached {}

+/// On Turing and GA100, the entries in the `LibosMemoryRegionInitArgument`
+/// must all be a multiple of GSP_PAGE_SIZE in size, so add padding to force it
+/// to that size.
+#[repr(C)]
+pub(crate) struct GspArgumentsAligned {
+    pub(crate) inner: GspArgumentsCached,
+    _padding: [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
+}
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for GspArgumentsAligned {}
+
 // SAFETY: This struct only contains integer types for which all bit patterns
 // are valid.
-unsafe impl FromBytes for GspArgumentsCached {}
+unsafe impl FromBytes for GspArgumentsAligned {}

 /// Init arguments for the message queue.
 #[repr(transparent)]

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images
  2025-12-18  3:29 ` [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
@ 2025-12-18 12:59   ` Alexandre Courbot
  2025-12-30 22:56     ` Timur Tabi
  2025-12-31 21:30     ` Timur Tabi
  0 siblings, 2 replies; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-18 12:59 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
<snip>
> +    fn pio_wr<F: FalconFirmware<Target = E>>(
> +        &self,
> +        bar: &Bar0,
> +        fw: &F,
> +        target_mem: FalconMem,
> +        load_offsets: &FalconLoadTarget,
> +        port: u8,
> +        tag: u16,
> +    ) -> Result {
> +        let start = usize::from_safe_cast(load_offsets.src_start);
> +        let len = usize::from_safe_cast(load_offsets.len);
> +        let mem_base = u16::try_from(load_offsets.dst_start)?;
> +
> +        // SAFETY: as_slice() ensures that start+len is within range

That's not the safety concern - check the documentation for `as_slice`.
We need to ensure that there won't be any concurrent access to the DMA
object. Since we are the only user of it at this stage, that's a
guarantee we can indeed provide.

> +        let data = unsafe { fw.as_slice(start, len).map_err(|_| EINVAL)? };
> +
> +        self.pio_wr_bytes(bar, data, mem_base, target_mem, port, tag)
> +    }
> +
> +    /// Perform a PIO copy into `IMEM` and `DMEM` of `fw`, and prepare the falcon to run it.
> +    pub(crate) fn pio_load<F: FalconFirmware<Target = E>>(
> +        &self,
> +        bar: &Bar0,
> +        fw: &F,
> +        gbl: Option<&GenericBootloader>,
> +    ) -> Result {
> +        let imem_sec = fw.imem_sec_load_params();
> +        let imem_ns = fw.imem_ns_load_params().ok_or(EINVAL)?;
> +        let dmem = fw.dmem_load_params();
> +
> +        regs::NV_PFALCON_FBIF_CTL::read(bar, &E::ID)
> +            .set_allow_phys_no_ctx(true)
> +            .write(bar, &E::ID);
> +
> +        regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, &E::ID);
> +
> +        // If the Generic Bootloader was passed, then use it to boot FRTS
> +        if let Some(gbl) = gbl {
> +            let dst_start = u16::try_from(0x10000 - gbl.desc.code_size)?;
> +            let data = &gbl.ucode[..usize::from_safe_cast(gbl.desc.code_size)];
> +            let tag = u16::try_from(gbl.desc.start_tag)?;
> +
> +            self.pio_wr_bytes(bar, data, dst_start, FalconMem::ImemNonSecure, 0, tag)?;
> +
> +            // This structure tells the generic bootloader where to find the FWSEC
> +            // image.
> +            let dmem_desc = BootloaderDmemDescV2 {
> +                reserved: [0; 4],
> +                signature: [0; 4],
> +                ctx_dma: 4, // FALCON_DMAIDX_PHYS_SYS_NCOH
> +                code_dma_base: fw.dma_handle(),
> +                non_sec_code_off: imem_ns.dst_start,
> +                non_sec_code_size: imem_ns.len,
> +                sec_code_off: imem_sec.dst_start,
> +                sec_code_size: imem_sec.len,
> +                code_entry_point: 0,
> +                data_dma_base: fw.dma_handle() + u64::from(dmem.src_start),
> +                data_size: dmem.len,
> +                argc: 0,
> +                argv: 0,
> +            };
> +
> +            regs::NV_PFALCON_FBIF_TRANSCFG::update(bar, &E::ID, 4, |v| {
> +                v.set_target(FalconFbifTarget::CoherentSysmem)
> +                    .set_mem_type(FalconFbifMemType::Physical)
> +            });
> +
> +            self.pio_wr_bytes(bar, dmem_desc.as_bytes(), 0, FalconMem::Dmem, 0, 0)?;

So this `if` branch is really special-casing the generic bootloader. But
at the end of the day it just does these things:

- Write an `ImemNonSecure` section,
- Write an `Dmem` section,
- Program the `TRANSCFG` register so the bootloader can initiate the DMA
  transfer.

The first two steps can be expressed as a set of `FalconLoadTarget`s.
That way they can be handled by the non-generic-bootloader path, and we
can remove the `gbl` argument.

So `FwsecFirmware` could have an optional member that contains both the
generic bootloader and the `BootloaderDmemDescV2` corresponding to it.
If that optional member is `Some`, then it returns the `FalconLoadTarget`s
corresponding to the generic bootloader. Otherwise, it behaves as
before.

Interestingly there is no `ImemSecure` section to write so I guess we
will have to make `imem_sec_load_params` return an `Option` as well.

And `NV_PFALCON_FBIF_TRANSCFG` is always programmed as the worst thing
that can happen is that we don't use the DMA engine if there is no
generic bootloader.

> +        } else {
> +            self.pio_wr(
> +                bar,
> +                fw,
> +                FalconMem::ImemNonSecure,
> +                &imem_ns,
> +                0,
> +                u16::try_from(imem_ns.dst_start >> 8)?,
> +            )?;
> +            self.pio_wr(
> +                bar,
> +                fw,
> +                FalconMem::ImemSecure,
> +                &imem_sec,
> +                0,
> +                u16::try_from(imem_sec.dst_start >> 8)?,
> +            )?;
> +            self.pio_wr(bar, fw, FalconMem::Dmem, &dmem, 0, 0)?;
> +        }
> +
> +        self.hal.program_brom(self, bar, &fw.brom_params())?;
> +
> +        // Set `BootVec` to start of non-secure code.
> +        regs::NV_PFALCON_FALCON_BOOTVEC::default()
> +            .set_value(fw.boot_addr())
> +            .write(bar, &E::ID);
> +
> +        Ok(())
> +    }
> +
>      /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
>      /// `target_mem`.
>      ///
> diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
> index 44897feb82a4..26efbf4f6760 100644
> --- a/drivers/gpu/nova-core/firmware.rs
> +++ b/drivers/gpu/nova-core/firmware.rs
> @@ -31,7 +31,7 @@
>  pub(crate) const FIRMWARE_VERSION: &str = "570.144";
>  
>  /// Requests the GPU firmware `name` suitable for `chipset`, with version `ver`.
> -fn request_firmware(
> +pub(crate) fn request_firmware(

There is no need to change the visibility of this function.

>      dev: &device::Device,
>      chipset: gpu::Chipset,
>      name: &str,
> @@ -321,7 +321,7 @@ fn no_patch_signature(self) -> FirmwareDmaObject<F, Signed> {
>  /// Header common to most firmware files.
>  #[repr(C)]
>  #[derive(Debug, Clone)]
> -struct BinHdr {
> +pub(crate) struct BinHdr {

Same for this type.

>      /// Magic number, must be `0x10de`.
>      bin_magic: u32,
>      /// Version of the header.
> diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
> index 1c1dcdacf5f5..4c26257bbfe0 100644
> --- a/drivers/gpu/nova-core/firmware/fwsec.rs
> +++ b/drivers/gpu/nova-core/firmware/fwsec.rs
> @@ -40,12 +40,15 @@
>          FalconLoadTarget, //
>      },
>      firmware::{
> +        BinHdr,
>          FalconUCodeDesc,
>          FirmwareDmaObject,
>          FirmwareSignature,
>          Signed,
>          Unsigned, //
> +        FIRMWARE_VERSION,
>      },
> +    gpu::Chipset,
>      num::{
>          FromSafeCast,
>          IntoSafeCast, //
> @@ -213,6 +216,72 @@ unsafe fn transmute_mut<T: Sized + FromBytes + AsBytes>(
>      T::from_bytes_mut(unsafe { fw.as_slice_mut(offset, size_of::<T>())? }).ok_or(EINVAL)
>  }
>  
> +/// Descriptor used by RM to figure out the requirements of the boot loader.
> +#[repr(C)]
> +#[derive(Debug, Clone)]
> +pub(crate) struct BootloaderDesc {
> +    /// Starting tag of bootloader.
> +    pub start_tag: u32,
> +    /// DMEM offset where [`BootloaderDmemDescV2`] is to be loaded.
> +    pub dmem_load_off: u32,
> +    /// Offset of code section in the image.
> +    pub code_off: u32,
> +    /// Size of code section in the image.
> +    pub code_size: u32,
> +    /// Offset of data section in the image.
> +    pub data_off: u32,
> +    /// Size of data section in the image.
> +    pub data_size: u32,
> +}
> +// SAFETY: any byte sequence is valid for this struct.
> +unsafe impl FromBytes for BootloaderDesc {}
> +// SAFETY: This struct doesn't contain uninitialized bytes and doesn't have interior mutability.
> +unsafe impl AsBytes for BootloaderDesc {}

We only need to implement `FromBytes` for this type, `AsBytes` is not
needed.

> +
> +/// Structure used by the boot-loader to load the rest of the code.
> +///
> +/// This has to be filled by the GPU driver and copied into DMEM at offset
> +/// [`BootloaderDesc.dmem_load_off`].
> +#[repr(C, packed)]
> +#[derive(Debug, Clone)]
> +pub(crate) struct BootloaderDmemDescV2 {
> +    /// Reserved, should always be first element.
> +    pub reserved: [u32; 4],
> +    /// 16B signature for secure code, 0s if no secure code.
> +    pub signature: [u32; 4],
> +    /// DMA context used by the bootloader while loading code/data.
> +    pub ctx_dma: u32,
> +    /// 256B-aligned physical FB address where code is located.
> +    pub code_dma_base: u64,
> +    /// Offset from `code_dma_base` where the non-secure code is located (must be multiple of 256).
> +    pub non_sec_code_off: u32,
> +    /// Size of the non-secure code part.
> +    pub non_sec_code_size: u32,
> +    /// Offset from `code_dma_base` where the secure code is located (must be multiple of 256).
> +    pub sec_code_off: u32,
> +    /// Size of the secure code part.
> +    pub sec_code_size: u32,
> +    /// Code entry point invoked by the bootloader after code is loaded.
> +    pub code_entry_point: u32,
> +    /// 256B-aligned physical FB address where data is located.
> +    pub data_dma_base: u64,
> +    /// Size of data block (should be multiple of 256B).
> +    pub data_size: u32,
> +    /// Arguments to be passed to the target firmware being loaded.
> +    pub argc: u32,
> +    /// Number of arguments to be passed to the target firmware being loaded.
> +    pub argv: u32,
> +}
> +// SAFETY: any byte sequence is valid for this struct.
> +unsafe impl FromBytes for BootloaderDmemDescV2 {}
> +// SAFETY: This struct doesn't contain uninitialized bytes and doesn't have interior mutability.
> +unsafe impl AsBytes for BootloaderDmemDescV2 {}

Here we can do without `FromBytes`.

> +
> +pub(crate) struct GenericBootloader {
> +    pub desc: BootloaderDesc,
> +    pub ucode: Vec<u8, kernel::alloc::allocator::Kmalloc>,
> +}
> +
>  /// The FWSEC microcode, extracted from the BIOS and to be run on the GSP falcon.
>  ///
>  /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
> @@ -221,6 +290,8 @@ pub(crate) struct FwsecFirmware {
>      desc: FalconUCodeDesc,
>      /// GPU-accessible DMA object containing the firmware.
>      ucode: FirmwareDmaObject<Self, Signed>,
> +    /// Generic bootloader
> +    gen_bootloader: Option<GenericBootloader>,
>  }
>  
>  impl FalconLoadParams for FwsecFirmware {
> @@ -275,7 +346,19 @@ fn brom_params(&self) -> FalconBromParams {
>      }
>  
>      fn boot_addr(&self) -> u32 {
> -        0
> +        match &self.desc {
> +            FalconUCodeDesc::V2(_v2) => {
> +                // On V2 platforms, the boot address is extracted from the
> +                // generic bootloader, because the gbl is what actually copies
> +                // FWSEC into memory, so that is what needs to be booted.
> +                if let Some(ref gbl) = self.gen_bootloader {
> +                    gbl.desc.start_tag << 8
> +                } else {
> +                    0
> +                }
> +            }
> +            FalconUCodeDesc::V3(_v3) => 0,
> +        }
>      }
>  }
>  
> @@ -376,6 +459,7 @@ impl FwsecFirmware {
>      /// command.
>      pub(crate) fn new(
>          dev: &Device<device::Bound>,
> +        chipset: Chipset,
>          falcon: &Falcon<Gsp>,
>          bar: &Bar0,
>          bios: &Vbios,
> @@ -432,9 +516,49 @@ pub(crate) fn new(
>              ucode_dma.no_patch_signature()
>          };
>  
> +        // The Generic Bootloader exists only on Turing and GA100.  To avoid a bogus
> +        // console error message on other platforms, only try to load it if it's
> +        // supposed to be there.
> +        let gbl_fw = if chipset < Chipset::GA102 {
> +            super::request_firmware(dev, chipset, "gen_bootloader", FIRMWARE_VERSION)
> +        } else {
> +            Err(ENOENT)
> +        };

Using `Err` to indicate no firmware means that we will proceed even if
`request_firmware` returns an error. This should be:

    let gbl_fw = if chipset < Chipset::GA102 {
        Some(super::request_firmware(dev, chipset, "gen_bootloader", FIRMWARE_VERSION)?)
    } else {
        None
    };

> +
> +        let gbl = match gbl_fw {
> +            Ok(fw) => {
> +                let hdr = fw
> +                    .data()
> +                    .get(0..size_of::<BinHdr>())
> +                    .and_then(BinHdr::from_bytes_copy)
> +                    .ok_or(EINVAL)?;
> +
> +                let desc_offset = usize::from_safe_cast(hdr.header_offset);
> +                let desc = fw
> +                    .data()
> +                    .get(desc_offset..desc_offset + size_of::<BootloaderDesc>())
> +                    .and_then(BootloaderDesc::from_bytes_copy)
> +                    .ok_or(EINVAL)?;
> +
> +                let ucode_start = usize::from_safe_cast(hdr.data_offset);
> +                let ucode_size = usize::from_safe_cast(hdr.data_size);
> +                let ucode_data = fw
> +                    .data()
> +                    .get(ucode_start..ucode_start + ucode_size)
> +                    .ok_or(EINVAL)?;
> +
> +                let mut ucode = KVec::new();
> +                ucode.extend_from_slice(ucode_data, GFP_KERNEL)?;
> +
> +                Some(GenericBootloader { desc, ucode })
> +            }
> +            Err(_) => None,
> +        };
> +

Actually, let's put that code into a new `GenBootloader` type. You can
follow the example of `BooterFirmware`, which is quite similar (only a
bit more complex).


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-18 11:54   ` Alexandre Courbot
@ 2025-12-18 22:51     ` Timur Tabi
  2025-12-19  3:43       ` Alexandre Courbot
  0 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-18 22:51 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Thu, 2025-12-18 at 20:54 +0900, Alexandre Courbot wrote:
> +/// On Turing and GA100, the entries in the `LibosMemoryRegionInitArgument`
> +/// must all be a multiple of GSP_PAGE_SIZE in size, so add padding to force it
> +/// to that size.
> +#[repr(C)]
> +pub(crate) struct GspArgumentsAligned {
> +    pub(crate) inner: GspArgumentsCached,
> +    _padding: [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
> +}

Shouldn't I also remove the _padding from GspArgumentsCached:

pub(crate) struct GspArgumentsCached(
    bindings::GSP_ARGUMENTS_CACHED,
    [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
);

and with the padding removed, doesn't this

        dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;

now only copy the args and leave the padding completely uninitialized?  That's okay, I think, I just
want to make sure I'm not missing anything.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-18 22:51     ` Timur Tabi
@ 2025-12-19  3:43       ` Alexandre Courbot
  2025-12-19  4:34         ` Timur Tabi
  0 siblings, 1 reply; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-19  3:43 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Alexandre Courbot,
	dakr@kernel.org, Joel Fernandes, John Hubbard,
	rust-for-linux@vger.kernel.org

On Fri Dec 19, 2025 at 7:51 AM JST, Timur Tabi wrote:
> On Thu, 2025-12-18 at 20:54 +0900, Alexandre Courbot wrote:
>> +/// On Turing and GA100, the entries in the `LibosMemoryRegionInitArgument`
>> +/// must all be a multiple of GSP_PAGE_SIZE in size, so add padding to force it
>> +/// to that size.
>> +#[repr(C)]
>> +pub(crate) struct GspArgumentsAligned {
>> +    pub(crate) inner: GspArgumentsCached,
>> +    _padding: [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
>> +}
>
> Shouldn't I also remove the _padding from GspArgumentsCached:
>
> pub(crate) struct GspArgumentsCached(
>     bindings::GSP_ARGUMENTS_CACHED,
>     [u8; GSP_PAGE_SIZE - core::mem::size_of::<bindings::GSP_ARGUMENTS_CACHED>()],
> );

Yes, the patch I sent in my previous message was in replacement of
yours, not to be applied on top - I should have mentioned that.

>
> and with the padding removed, doesn't this
>
>         dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
>
> now only copy the args and leave the padding completely uninitialized?  That's okay, I think, I just
> want to make sure I'm not missing anything.

Since we call `alloc_coherent` with `__GFP_ZERO`, all memory will be
initialized to zero before that line is run. It's not 100% ideal but
does work.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-19  3:43       ` Alexandre Courbot
@ 2025-12-19  4:34         ` Timur Tabi
  2025-12-19  5:55           ` Alexandre Courbot
  0 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-19  4:34 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Fri, 2025-12-19 at 12:43 +0900, Alexandre Courbot wrote:
> 
> Yes, the patch I sent in my previous message was in replacement of
> yours, not to be applied on top - I should have mentioned that.

Ok, I will replace my patch with yours.

> > and with the padding removed, doesn't this
> > 
> >          dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
> > 
> > now only copy the args and leave the padding completely uninitialized?  That's okay, I
> > think, I just
> > want to make sure I'm not missing anything.
> 
> Since we call `alloc_coherent` with `__GFP_ZERO`, all memory will be
> initialized to zero before that line is run. It's not 100% ideal but
> does work.

Ok, I get it now. You allocate the block, and then dma_write! memcpys the GspArgumentsCached
that's on the stack to it.

When you say it's not ideal, is that because it still has to do a memcpy from the stack to the
DMA buffer, whereas Nouveau was able to write to the buffer directly?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-19  4:34         ` Timur Tabi
@ 2025-12-19  5:55           ` Alexandre Courbot
  0 siblings, 0 replies; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-19  5:55 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Alexandre Courbot,
	dakr@kernel.org, Joel Fernandes, John Hubbard,
	rust-for-linux@vger.kernel.org

On Fri Dec 19, 2025 at 1:34 PM JST, Timur Tabi wrote:
> On Fri, 2025-12-19 at 12:43 +0900, Alexandre Courbot wrote:
>> 
>> Yes, the patch I sent in my previous message was in replacement of
>> yours, not to be applied on top - I should have mentioned that.
>
> Ok, I will replace my patch with yours.
>
>> > and with the padding removed, doesn't this
>> > 
>> >          dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
>> > 
>> > now only copy the args and leave the padding completely uninitialized?  That's okay, I
>> > think, I just
>> > want to make sure I'm not missing anything.
>> 
>> Since we call `alloc_coherent` with `__GFP_ZERO`, all memory will be
>> initialized to zero before that line is run. It's not 100% ideal but
>> does work.
>
> Ok, I get it now. You allocate the block, and then dma_write! memcpys the GspArgumentsCached
> that's on the stack to it.
>
> When you say it's not ideal, is that because it still has to do a memcpy from the stack to the
> DMA buffer, whereas Nouveau was able to write to the buffer directly?

It is not ideal because you can end up DMAing uninitialized data to the
GSP if you don't allocate using `__GFP_ZERO`. In this case it is not a
big issue as the data is not expected to be valid, and is never accessed
through a slice, but it is not 100% bulletproof by design.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma()
  2025-12-18  7:48   ` Alexandre Courbot
@ 2025-12-19 12:49     ` Alexandre Courbot
  0 siblings, 0 replies; 46+ messages in thread
From: Alexandre Courbot @ 2025-12-19 12:49 UTC (permalink / raw)
  To: Alexandre Courbot, Timur Tabi, Danilo Krummrich, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

On Thu Dec 18, 2025 at 4:48 PM JST, Alexandre Courbot wrote:
> On Thu Dec 18, 2025 at 12:29 PM JST, Timur Tabi wrote:
>> Some GPUs do not support using DMA to transfer code/data from system
>> memory to Falcon memory, and instead must use programmed I/O (PIO).
>> Add a function to the Falcon HAL to indicate whether a given GPU's
>> Falcons support DMA for this purpose.
>>
>> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/falcon.rs           | 7 +++++++
>>  drivers/gpu/nova-core/falcon/hal.rs       | 3 +++
>>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 4 ++++
>>  drivers/gpu/nova-core/falcon/hal/tu102.rs | 4 ++++
>>  4 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
>> index 6b54c0ca458a..50c87dadf2ea 100644
>> --- a/drivers/gpu/nova-core/falcon.rs
>> +++ b/drivers/gpu/nova-core/falcon.rs
>> @@ -630,6 +630,13 @@ pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
>>          self.hal.is_riscv_active(bar)
>>      }
>>  
>> +    /// Check if this Falcon supports DMA for loading firmware.
>> +    ///
>> +    /// Returns `true` if DMA is supported, `false` if PIO must be used instead.
>> +    pub(crate) fn supports_dma(&self) -> bool {
>> +        self.hal.supports_dma()
>> +    }
>
> Rather than this, we should make the selection of the loading method
> transparent to the caller. Right now `boot.rs` needs to calls
> `supports_dma` itself and choose the right method, which leaves room for
> error.
>
> I'd suggest the following:
>
> - A `fn load(bar, fw) -> Result` method in the Falcon HAL that redirects
>   to either `load_dma` or `load_pio`.

My suggestion won't work because `load` takes a generic parameter, and a
virtual function cannot be generic.

I see two possible workarounds:

1. The HAL `load` method takes a `&dyn FalconFirmware`. I think
   `FalconFirmware` can be turned into a trait object, so that should
   work.

2. We do something that is a mix of your initial proposal and mine: The
   new `Falcon::load` method calls a HAL method that returns the kind of
   load operation supported (please use an enum instead of a boolean for
   clarity and future-proofing), and dispatches accordingly to either
   `pio_load` or `dma_load`.

In both cases, `dma_load` and `pio_load` still become private methods of
`Falcon`.

I would tend to favor 2. because the trait object looks a bit laborious,
and it will call back into `dma_load` or `pio_load` which are still
generic (this will work, but defeats the purpose of having them generic
in the first place). Also it is probably simpler to implement as it is
closer to your original idea.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (10 preceding siblings ...)
  2025-12-18  3:29 ` [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
@ 2025-12-28 17:45 ` Ewan Chorynski
  2025-12-30 21:42   ` Timur Tabi
  2025-12-31  2:58 ` John Hubbard
  2025-12-31 20:06 ` John Hubbard
  13 siblings, 1 reply; 46+ messages in thread
From: Ewan Chorynski @ 2025-12-28 17:45 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Hello

On Thu Dec 18, 2025 at 4:29 AM CET, Timur Tabi wrote:
> This patch set adds basic support for pre-booting GSP-RM
> on Turing.
>
> There is also partial support for GA100, but it's currently not
> fully implemented.  GA100 is considered experimental in Nouveau,
> and so it hasn't been tested with NovaCore either.
>
> That latest linux-firmware.git is required because it contains the
> Generic Bootloader image that has not yet been propogated to
> distros.

I was curious to see Nova probe on my laptop equipped with a TU117M
GeForce GTX 1650 Mobile. I compiled with your patchset applied on drm-next
and downloaded the latest linux-firmware.git.

However, NovaCore fails to probe my device with the following logs :

[    3.497672] NovaCore 0000:01:00.0: FbLayout {
                   fb: 0x0..0x100000000,
                   vga_workspace: 0xfff00000..0x100000000,
                   frts: 0xffe00000..0xfff00000,
                   boot: 0xffdff000..0xffe00000,
                   elf: 0xfe2c0000..0xffdf4ea0,
                   wpr2_heap: 0xf7900000..0xfe200000,
                   wpr2: 0xf7800000..0xfff00000,
                   heap: 0xf7700000..0xf7800000,
                   vf_partition_count: 0x0,
               }
[    3.623000] NovaCore 0000:01:00.0: GSP MBOX0: 0xffffe000, MBOX1: 0x0
[    3.623026] NovaCore 0000:01:00.0: Using SEC2 to load and run the booter_load firmware...
[    3.626236] NovaCore 0000:01:00.0: SEC2 MBOX0: 0x31, MBOX10x0
[    3.626265] NovaCore 0000:01:00.0: Booter-load failed with error 0x31

Is this expected to happen ? Here is the full device description from
lspci :

01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] (rev a1)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-28 17:45 ` [PATCH v4 00/11] gpu: nova-core: add Turing support Ewan Chorynski
@ 2025-12-30 21:42   ` Timur Tabi
  2026-01-04 10:21     ` Ewan Chorynski
  0 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-30 21:42 UTC (permalink / raw)
  To: ewan.chorynski@ik.me, Alexandre Courbot, dakr@kernel.org,
	nouveau@lists.freedesktop.org, Joel Fernandes, John Hubbard,
	rust-for-linux@vger.kernel.org

On Sun, 2025-12-28 at 18:45 +0100, Ewan Chorynski wrote:
> 
> [    3.623000] NovaCore 0000:01:00.0: GSP MBOX0: 0xffffe000, MBOX1: 0x0
> [    3.623026] NovaCore 0000:01:00.0: Using SEC2 to load and run the booter_load firmware...
> [    3.626236] NovaCore 0000:01:00.0: SEC2 MBOX0: 0x31, MBOX10x0
> [    3.626265] NovaCore 0000:01:00.0: Booter-load failed with error 0x31
> 
> Is this expected to happen ? Here is the full device description from
> lspci :
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q]
> (rev a1)

This should work on your GPU with my patches.  Error 0x31 is a generic failure error code that
indicates that booter_load failed to initialize, and that could be for any number of reasons. 
Unfortunately, the only way I've been able to debug such issues is to reproduce them in-house with a
custom build of booter_load.

I suggest that for now, you hold off until these commits are merged, along with my debugfs patches,
and then I can try to find a similar GPU internally that exhibits the failure and see if there's
something else missing.  Turing is the most complicated of all GSP-capable GPUs to boot, partly
because the firmware does a terrible job of reporting errors.

I'm assuming that Nouveau boots just fine with the same firmware images?  If you turn on debug
logging in Nouveau, it should say that it's booting with 570.144.  You can force it by deleting all
the *535.113.01* images in /lib/firmware/nvidia/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images
  2025-12-18 12:59   ` Alexandre Courbot
@ 2025-12-30 22:56     ` Timur Tabi
  2025-12-31 23:22       ` Timur Tabi
  2025-12-31 21:30     ` Timur Tabi
  1 sibling, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-30 22:56 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Thu, 2025-12-18 at 21:59 +0900, Alexandre Courbot wrote:
> So this `if` branch is really special-casing the generic bootloader. But
> at the end of the day it just does these things:
> 
> - Write an `ImemNonSecure` section,
> - Write an `Dmem` section,
> - Program the `TRANSCFG` register so the bootloader can initiate the DMA
>   transfer.
> 
> The first two steps can be expressed as a set of `FalconLoadTarget`s.
> That way they can be handled by the non-generic-bootloader path, and we
> can remove the `gbl` argument.

Hmmm... that would require that I implement FalconLoadParams for GenericBootloader.  That's not a
bad idea.  I'm not sure how I would build the Dmem FalconLoadTarget, however, given that it needs to
reference data from the FRTS FalconFirmware object.

impl FalconLoadParams for GenericBootloader {
    fn imem_sec_load_params(&self) -> FalconLoadTarget {
        FalconLoadTarget {
        }
    }

    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
        FalconLoadTarget {
        }
    }

    fn dmem_load_params(&self) -> FalconLoadTarget {
        FalconLoadTarget {
	   // How do I extract data from  the FRTS firmware image here?
        }
    }
}

> So `FwsecFirmware` could have an optional member that contains both the
> generic bootloader and the `BootloaderDmemDescV2` corresponding to it.
> If that optional member is `Some`, then it returns the `FalconLoadTarget`s
> corresponding to the generic bootloader. Otherwise, it behaves as
> before.
> 
> Interestingly there is no `ImemSecure` section to write so I guess we
> will have to make `imem_sec_load_params` return an `Option` as well.
> 
> And `NV_PFALCON_FBIF_TRANSCFG` is always programmed as the worst thing
> that can happen is that we don't use the DMA engine if there is no
> generic bootloader.

Yeah, I don't understand why this is being programmed at all in the PIO case, but that's what
Nouveau does:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/tu102.c#n132

> 
> > +        // The Generic Bootloader exists only on Turing and GA100.  To avoid a bogus
> > +        // console error message on other platforms, only try to load it if it's
> > +        // supposed to be there.
> > +        let gbl_fw = if chipset < Chipset::GA102 {
> > +            super::request_firmware(dev, chipset, "gen_bootloader", FIRMWARE_VERSION)
> > +        } else {
> > +            Err(ENOENT)
> > +        };
> 
> Using `Err` to indicate no firmware means that we will proceed even if
> `request_firmware` returns an error. This should be:
> 
>     let gbl_fw = if chipset < Chipset::GA102 {
>         Some(super::request_firmware(dev, chipset, "gen_bootloader", FIRMWARE_VERSION)?)
>     } else {
>         None
>     };

Ok.


> 
> > +
> > +        let gbl = match gbl_fw {
> > +            Ok(fw) => {
> > +                let hdr = fw
> > +                    .data()
> > +                    .get(0..size_of::<BinHdr>())
> > +                    .and_then(BinHdr::from_bytes_copy)
> > +                    .ok_or(EINVAL)?;
> > +
> > +                let desc_offset = usize::from_safe_cast(hdr.header_offset);
> > +                let desc = fw
> > +                    .data()
> > +                    .get(desc_offset..desc_offset + size_of::<BootloaderDesc>())
> > +                    .and_then(BootloaderDesc::from_bytes_copy)
> > +                    .ok_or(EINVAL)?;
> > +
> > +                let ucode_start = usize::from_safe_cast(hdr.data_offset);
> > +                let ucode_size = usize::from_safe_cast(hdr.data_size);
> > +                let ucode_data = fw
> > +                    .data()
> > +                    .get(ucode_start..ucode_start + ucode_size)
> > +                    .ok_or(EINVAL)?;
> > +
> > +                let mut ucode = KVec::new();
> > +                ucode.extend_from_slice(ucode_data, GFP_KERNEL)?;
> > +
> > +                Some(GenericBootloader { desc, ucode })
> > +            }
> > +            Err(_) => None,
> > +        };
> > +
> 
> Actually, let's put that code into a new `GenBootloader` type. You can
> follow the example of `BooterFirmware`, which is quite similar (only a
> bit more complex).

Sorry, I'm a bit confused.  What's the difference between GenBootloader and GenericBootloader? 
Can't I just add a new() constructor to GenericBootloader?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (11 preceding siblings ...)
  2025-12-28 17:45 ` [PATCH v4 00/11] gpu: nova-core: add Turing support Ewan Chorynski
@ 2025-12-31  2:58 ` John Hubbard
  2025-12-31  4:26   ` Timur Tabi
  2025-12-31 20:06 ` John Hubbard
  13 siblings, 1 reply; 46+ messages in thread
From: John Hubbard @ 2025-12-31  2:58 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> This patch set adds basic support for pre-booting GSP-RM
> on Turing.

Hi Timur,

I gave this a quick test on TU117, but it is looking for a firmware
file that is not on disk.

Should I be updating linux-firmware? Or something else is wrong?

NovaCore 0000:01:00.0: Direct firmware load for nvidia/tu117/gsp/gen_bootloader-570.144.bin failed with error -2
NovaCore 0000:01:00.0: WPR2 region not created after running FWSEC-FRTS
NovaCore 0000:01:00.0: probe with driver NovaCore failed with error -5


thanks,
-- 
John Hubbard

> 
> There is also partial support for GA100, but it's currently not
> fully implemented.  GA100 is considered experimental in Nouveau,
> and so it hasn't been tested with NovaCore either.
> 
> That latest linux-firmware.git is required because it contains the
> Generic Bootloader image that has not yet been propogated to
> distros.
> 
> Summary of changes:
> 
> 1. Introduce non-secure IMEM support.  For GA102 and later, only secure IMEM
> is used.
> 2. Because of non-secure IMEM, Turing booter firmware images need some of
> the headers parsed differently for stuff like the load target address.
> 3. Add support the tu10x firmware signature section in the ELF image.
> 4. Add several new registers used only on Turing.
> 5. Some functions that were considered generic Falcon operations are
> actually different on Turing vs GA102+, so they are moved to the HAL.
> 6. The FRTS FWSEC firmware in VBIOS uses a different version of the
> descriptor header.
> 7. On Turing/GA100 LIBOS args struct needs to have its 'size' field
> aligned to 4KB.  So pad the struct to make it 4K.
> 8. Turing Falcons do not support DMA, so PIO is used to copy images
> into IMEM/DMEM.
> 
> Changes from v3:
> 1. Fixed Rust formatting issues
> 2. Misc style changes as suggested by Alex
> 3. Merged regs.rs changes into commits that use them
> 4. Used Zeroable::zeroed()
> 
> Timur Tabi (11):
>   gpu: nova-core: rename Imem to ImemSecure
>   gpu: nova-core: add ImemNonSecure section infrastructure
>   gpu: nova-core: support header parsing on Turing/GA100
>   gpu: nova-core: add support for Turing/GA100 fwsignature
>   gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
>   gpu: nova-core: move some functions into the HAL
>   gpu: nova-core: Add basic Turing HAL
>   gpu: nova-core: add Falcon HAL method supports_dma()
>   gpu: nova-core: add FalconUCodeDescV2 support
>   gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
>   gpu: nova-core: add PIO support for loading firmware images
> 
>  drivers/gpu/nova-core/falcon.rs           | 246 +++++++++++++++++-----
>  drivers/gpu/nova-core/falcon/hal.rs       |  17 ++
>  drivers/gpu/nova-core/falcon/hal/ga102.rs |  47 +++++
>  drivers/gpu/nova-core/falcon/hal/tu102.rs |  77 +++++++
>  drivers/gpu/nova-core/firmware.rs         | 181 +++++++++++++++-
>  drivers/gpu/nova-core/firmware/booter.rs  |  43 +++-
>  drivers/gpu/nova-core/firmware/fwsec.rs   | 215 ++++++++++++++++---
>  drivers/gpu/nova-core/firmware/gsp.rs     |   4 +-
>  drivers/gpu/nova-core/gsp/boot.rs         |  10 +-
>  drivers/gpu/nova-core/gsp/fw.rs           |  24 ++-
>  drivers/gpu/nova-core/regs.rs             |  53 +++++
>  drivers/gpu/nova-core/vbios.rs            |  75 ++++---
>  12 files changed, 864 insertions(+), 128 deletions(-)
>  create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs
> 
> 
> base-commit: 60c7398bded2e11f0db40a409a241b8be5910ee2
> prerequisite-patch-id: a3e23917ec535263604af95194422382f14c2f21



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31  2:58 ` John Hubbard
@ 2025-12-31  4:26   ` Timur Tabi
  2025-12-31  6:17     ` John Hubbard
  0 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-31  4:26 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Tue, 2025-12-30 at 18:58 -0800, John Hubbard wrote:
> On 12/17/25 7:29 PM, Timur Tabi wrote:
> > This patch set adds basic support for pre-booting GSP-RM
> > on Turing.
> 
> Hi Timur,
> 
> I gave this a quick test on TU117, but it is looking for a firmware
> file that is not on disk.
> 
> Should I be updating linux-firmware? Or something else is wrong?

I mean, I literally say that in the patchset description, about two paragraphs down:

 That latest linux-firmware.git is required because it contains the
 Generic Bootloader image that has not yet been propogated to
 distros.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31  4:26   ` Timur Tabi
@ 2025-12-31  6:17     ` John Hubbard
  2025-12-31 16:33       ` Timur Tabi
  0 siblings, 1 reply; 46+ messages in thread
From: John Hubbard @ 2025-12-31  6:17 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Alexandre Courbot,
	dakr@kernel.org, Joel Fernandes, rust-for-linux@vger.kernel.org

On 12/30/25 8:26 PM, Timur Tabi wrote:
> On Tue, 2025-12-30 at 18:58 -0800, John Hubbard wrote:
>> On 12/17/25 7:29 PM, Timur Tabi wrote:
>>> This patch set adds basic support for pre-booting GSP-RM
>>> on Turing.
>>
>> Hi Timur,
>>
>> I gave this a quick test on TU117, but it is looking for a firmware
>> file that is not on disk.
>>
>> Should I be updating linux-firmware? Or something else is wrong?
> 
> I mean, I literally say that in the patchset description, about two paragraphs down:
> 
>   That latest linux-firmware.git is required because it contains the
>   Generic Bootloader image that has not yet been propogated to
>   distros.

Actually, I should have been much more specific, and should have
asked, "what git repository and branch should I be using for linux-firmware?".

Because none of the branches in either of these seem to have any file named
gen_bootloader-570.144.bin:

       https://github.com/NVIDIA/linux-firmware.git
       git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git



thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31  6:17     ` John Hubbard
@ 2025-12-31 16:33       ` Timur Tabi
  2025-12-31 17:29         ` John Hubbard
  0 siblings, 1 reply; 46+ messages in thread
From: Timur Tabi @ 2025-12-31 16:33 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Tue, 2025-12-30 at 22:17 -0800, John Hubbard wrote:
> Actually, I should have been much more specific, and should have
> asked, "what git repository and branch should I be using for linux-firmware?".
> 
> Because none of the branches in either of these seem to have any file named
> gen_bootloader-570.144.bin:
> 
>        https://github.com/NVIDIA/linux-firmware.git
>        git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

It's right there, in the `main` branch:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia/tu102/gsp

Did you forget that you need to run the installer script, and not just blindly copy the files
over?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31 16:33       ` Timur Tabi
@ 2025-12-31 17:29         ` John Hubbard
  2025-12-31 19:15           ` John Hubbard
  0 siblings, 1 reply; 46+ messages in thread
From: John Hubbard @ 2025-12-31 17:29 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Alexandre Courbot,
	dakr@kernel.org, Joel Fernandes, rust-for-linux@vger.kernel.org

On 12/31/25 8:33 AM, Timur Tabi wrote:
> On Tue, 2025-12-30 at 22:17 -0800, John Hubbard wrote:
>> Actually, I should have been much more specific, and should have
>> asked, "what git repository and branch should I be using for linux-firmware?".
>>
>> Because none of the branches in either of these seem to have any file named
>> gen_bootloader-570.144.bin:
>>
>>         https://github.com/NVIDIA/linux-firmware.git
>>         git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
> 
> It's right there, in the `main` branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia/tu102/gsp
> 
> Did you forget that you need to run the installer script, and not just blindly copy the files
> over?

ummm, maybe. OK, definitely. :)

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31 17:29         ` John Hubbard
@ 2025-12-31 19:15           ` John Hubbard
  2025-12-31 19:23             ` Timur Tabi
  0 siblings, 1 reply; 46+ messages in thread
From: John Hubbard @ 2025-12-31 19:15 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Alexandre Courbot,
	dakr@kernel.org, Joel Fernandes, rust-for-linux@vger.kernel.org

On 12/31/25 9:29 AM, John Hubbard wrote:
> On 12/31/25 8:33 AM, Timur Tabi wrote:
>> On Tue, 2025-12-30 at 22:17 -0800, John Hubbard wrote:
>>> Actually, I should have been much more specific, and should have
>>> asked, "what git repository and branch should I be using for linux-firmware?".
>>>
>>> Because none of the branches in either of these seem to have any file named
>>> gen_bootloader-570.144.bin:
>>>
>>>         https://github.com/NVIDIA/linux-firmware.git
>>>         git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
>>
>> It's right there, in the `main` branch:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia/tu102/gsp
>>
>> Did you forget that you need to run the installer script, and not just blindly copy the files
>> over?
> 
> ummm, maybe. OK, definitely. :)
> 

So now that I have the right linux-firmware is installed, it's still
broken on TU117, but this fixes it:

<blueforge> linux-github (v4_20251217_ttabi_nvidia_com)$ git d
diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
index 112488216bff..60d657f4d204 100644
--- a/drivers/gpu/nova-core/firmware/gsp.rs
+++ b/drivers/gpu/nova-core/firmware/gsp.rs
@@ -163,6 +163,9 @@ pub(crate) fn new<'a, 'b>(
         let fw_section = elf::elf64_section(fw.data(), ".fwimage").ok_or(EINVAL)?;
 
         let sigs_section = match chipset.arch() {
+            Architecture::Turing if chipset == Chipset::TU116 || chipset == Chipset::TU117 => {
+                ".fwsignature_tu11x"
+            }
             Architecture::Turing => ".fwsignature_tu10x",
             // GA100 uses the same firmware as Turing
             Architecture::Ampere if chipset == Chipset::GA100 => ".fwsignature_tu10x",

You can merge that in somewhere, or let me know if you prefer a formal
patch from me, whatever is easiest for you.


thanks,
-- 
John Hubbard


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31 19:15           ` John Hubbard
@ 2025-12-31 19:23             ` Timur Tabi
  0 siblings, 0 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-31 19:23 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org
  Cc: ewan.chorynski@ik.me

On Wed, 2025-12-31 at 11:15 -0800, John Hubbard wrote:
> --- a/drivers/gpu/nova-core/firmware/gsp.rs
> +++ b/drivers/gpu/nova-core/firmware/gsp.rs
> @@ -163,6 +163,9 @@ pub(crate) fn new<'a, 'b>(
>          let fw_section = elf::elf64_section(fw.data(), ".fwimage").ok_or(EINVAL)?;
>  
>          let sigs_section = match chipset.arch() {
> +            Architecture::Turing if chipset == Chipset::TU116 || chipset == Chipset::TU117 => {
> +                ".fwsignature_tu11x"
> +            }
>              Architecture::Turing => ".fwsignature_tu10x",
>              // GA100 uses the same firmware as Turing
>              Architecture::Ampere if chipset == Chipset::GA100 => ".fwsignature_tu10x",
> 
> You can merge that in somewhere, or let me know if you prefer a formal
> patch from me, whatever is easiest for you.

Interesting, thanks.  I will merge it into my patchset.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature
  2025-12-18  3:29 ` [PATCH v4 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
@ 2025-12-31 19:28   ` John Hubbard
  0 siblings, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 19:28 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> Turing and GA100 share the same GSP-RM firmware binary, and the
> signature ELF section is labeled ".fwsignature_tu10x".
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware/gsp.rs | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
> index 0549805282ab..112488216bff 100644
> --- a/drivers/gpu/nova-core/firmware/gsp.rs
> +++ b/drivers/gpu/nova-core/firmware/gsp.rs
> @@ -163,9 +163,11 @@ pub(crate) fn new<'a, 'b>(
>          let fw_section = elf::elf64_section(fw.data(), ".fwimage").ok_or(EINVAL)?;
>  
>          let sigs_section = match chipset.arch() {

After applying the fix that I mentioned in my other reply:

+            Architecture::Turing if chipset == Chipset::TU116 || chipset == Chipset::TU117 => {
+                ".fwsignature_tu11x"
+            }

, please feel free to add

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard

> +            Architecture::Turing => ".fwsignature_tu10x",
> +            // GA100 uses the same firmware as Turing
> +            Architecture::Ampere if chipset == Chipset::GA100 => ".fwsignature_tu10x",
>              Architecture::Ampere => ".fwsignature_ga10x",
>              Architecture::Ada => ".fwsignature_ad10x",
> -            _ => return Err(ENOTSUPP),
>          };
>          let signatures = elf::elf64_section(fw.data(), sigs_section)
>              .ok_or(EINVAL)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure
  2025-12-18  3:29 ` [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
@ 2025-12-31 19:29   ` John Hubbard
  0 siblings, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 19:29 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> Rename FalconMem::Imem to ImemSecure to indicate that it references
> Secure Instruction Memory.  This change has no functional impact.
> 
> On Falcon cores, pages in instruction memory can be tagged as Secure
> or Non-Secure.  For GA102 and later, only Secure is used, which is why
> FalconMem::Imem seems appropriate.  However, Turing firmware images
> can also contain non-secure sections, and so FalconMem needs to support
> that.  By renaming Imem to ImemSec now, future patches for Turing support
> will be simpler.
> 
> Nouveau uses the term "IMEM" to refer both to the Instruction Memory
> block on Falcon cores as well as to the images of secure firmware
> uploaded to part of IMEM.  OpenRM uses the terms "ImemSec" and "ImemNs"
> instead, and uses "IMEM" just to refer to the physical memory device.
> 
> Renaming these terms allows us to align with OpenRM, avoid confusion
> between IMEM and ImemSec, and makes future patches simpler.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs          | 20 +++++++++++++-------
>  drivers/gpu/nova-core/firmware/booter.rs | 12 ++++++------
>  drivers/gpu/nova-core/firmware/fwsec.rs  |  2 +-
>  3 files changed, 20 insertions(+), 14 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard

> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index 82c661aef594..0965cb50568b 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -237,8 +237,8 @@ fn from(value: PeregrineCoreSelect) -> Self {
>  /// Different types of memory present in a falcon core.
>  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
>  pub(crate) enum FalconMem {
> -    /// Instruction Memory.
> -    Imem,
> +    /// Secure Instruction Memory.
> +    ImemSecure,
>      /// Data Memory.
>      Dmem,
>  }
> @@ -345,8 +345,8 @@ pub(crate) struct FalconBromParams {
>  
>  /// Trait for providing load parameters of falcon firmwares.
>  pub(crate) trait FalconLoadParams {
> -    /// Returns the load parameters for `IMEM`.
> -    fn imem_load_params(&self) -> FalconLoadTarget;
> +    /// Returns the load parameters for Secure `IMEM`.
> +    fn imem_sec_load_params(&self) -> FalconLoadTarget;
>  
>      /// Returns the load parameters for `DMEM`.
>      fn dmem_load_params(&self) -> FalconLoadTarget;
> @@ -457,7 +457,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>          //
>          // For DMEM we can fold the start offset into the DMA handle.
>          let (src_start, dma_start) = match target_mem {
> -            FalconMem::Imem => (load_offsets.src_start, fw.dma_handle()),
> +            FalconMem::ImemSecure => (load_offsets.src_start, fw.dma_handle()),
>              FalconMem::Dmem => (
>                  0,
>                  fw.dma_handle_with_offset(load_offsets.src_start.into_safe_cast())?,
> @@ -508,7 +508,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>  
>          let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
>              .set_size(DmaTrfCmdSize::Size256B)
> -            .set_imem(target_mem == FalconMem::Imem)
> +            .set_imem(target_mem == FalconMem::ImemSecure)
>              .set_sec(if sec { 1 } else { 0 });
>  
>          for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
> @@ -543,7 +543,13 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
>                  .set_mem_type(FalconFbifMemType::Physical)
>          });
>  
> -        self.dma_wr(bar, fw, FalconMem::Imem, fw.imem_load_params(), true)?;
> +        self.dma_wr(
> +            bar,
> +            fw,
> +            FalconMem::ImemSecure,
> +            fw.imem_sec_load_params(),
> +            true,
> +        )?;
>          self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
>  
>          self.hal.program_brom(self, bar, &fw.brom_params())?;
> diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
> index f107f753214a..096cd01dbc9d 100644
> --- a/drivers/gpu/nova-core/firmware/booter.rs
> +++ b/drivers/gpu/nova-core/firmware/booter.rs
> @@ -251,8 +251,8 @@ impl<'a> FirmwareSignature<BooterFirmware> for BooterSignature<'a> {}
>  
>  /// The `Booter` loader firmware, responsible for loading the GSP.
>  pub(crate) struct BooterFirmware {
> -    // Load parameters for `IMEM` falcon memory.
> -    imem_load_target: FalconLoadTarget,
> +    // Load parameters for Secure `IMEM` falcon memory.
> +    imem_sec_load_target: FalconLoadTarget,
>      // Load parameters for `DMEM` falcon memory.
>      dmem_load_target: FalconLoadTarget,
>      // BROM falcon parameters.
> @@ -354,7 +354,7 @@ pub(crate) fn new(
>          };
>  
>          Ok(Self {
> -            imem_load_target: FalconLoadTarget {
> +            imem_sec_load_target: FalconLoadTarget {
>                  src_start: app0.offset,
>                  dst_start: 0,
>                  len: app0.len,
> @@ -371,8 +371,8 @@ pub(crate) fn new(
>  }
>  
>  impl FalconLoadParams for BooterFirmware {
> -    fn imem_load_params(&self) -> FalconLoadTarget {
> -        self.imem_load_target.clone()
> +    fn imem_sec_load_params(&self) -> FalconLoadTarget {
> +        self.imem_sec_load_target.clone()
>      }
>  
>      fn dmem_load_params(&self) -> FalconLoadTarget {
> @@ -384,7 +384,7 @@ fn brom_params(&self) -> FalconBromParams {
>      }
>  
>      fn boot_addr(&self) -> u32 {
> -        self.imem_load_target.src_start
> +        self.imem_sec_load_target.src_start
>      }
>  }
>  
> diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
> index b28e34d279f4..6a2f5a0d4b15 100644
> --- a/drivers/gpu/nova-core/firmware/fwsec.rs
> +++ b/drivers/gpu/nova-core/firmware/fwsec.rs
> @@ -224,7 +224,7 @@ pub(crate) struct FwsecFirmware {
>  }
>  
>  impl FalconLoadParams for FwsecFirmware {
> -    fn imem_load_params(&self) -> FalconLoadTarget {
> +    fn imem_sec_load_params(&self) -> FalconLoadTarget {
>          FalconLoadTarget {
>              src_start: 0,
>              dst_start: self.desc.imem_phys_base,



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 02/11] gpu: nova-core: add ImemNonSecure section infrastructure
  2025-12-18  3:29 ` [PATCH v4 02/11] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
@ 2025-12-31 19:35   ` John Hubbard
  0 siblings, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 19:35 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> The GSP booter firmware in Turing and GA100 includes a third memory
> section called ImemNonSecure, which is non-secure IMEM.  This section
> must be loaded separately from DMEM and secure IMEM, but only if it
> actually exists.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs          | 20 ++++++++++++++++++--
>  drivers/gpu/nova-core/firmware/booter.rs |  9 +++++++++
>  drivers/gpu/nova-core/firmware/fwsec.rs  |  5 +++++
>  3 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index 0965cb50568b..b92152277d00 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -239,6 +239,8 @@ fn from(value: PeregrineCoreSelect) -> Self {
>  pub(crate) enum FalconMem {
>      /// Secure Instruction Memory.
>      ImemSecure,
> +    /// Non-Secure Instruction Memory.
> +    ImemNonSecure,
>      /// Data Memory.
>      Dmem,
>  }
> @@ -348,6 +350,10 @@ pub(crate) trait FalconLoadParams {
>      /// Returns the load parameters for Secure `IMEM`.
>      fn imem_sec_load_params(&self) -> FalconLoadTarget;
>  
> +    /// Returns the load parameters for Non-Secure `IMEM`,
> +    /// used only on Turing and GA100.
> +    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget>;
> +
>      /// Returns the load parameters for `DMEM`.
>      fn dmem_load_params(&self) -> FalconLoadTarget;
>  
> @@ -457,7 +463,9 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>          //
>          // For DMEM we can fold the start offset into the DMA handle.
>          let (src_start, dma_start) = match target_mem {
> -            FalconMem::ImemSecure => (load_offsets.src_start, fw.dma_handle()),
> +            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
> +                (load_offsets.src_start, fw.dma_handle())
> +            }
>              FalconMem::Dmem => (
>                  0,
>                  fw.dma_handle_with_offset(load_offsets.src_start.into_safe_cast())?,
> @@ -508,7 +516,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>  
>          let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
>              .set_size(DmaTrfCmdSize::Size256B)
> -            .set_imem(target_mem == FalconMem::ImemSecure)
> +            .set_imem(target_mem != FalconMem::Dmem)
>              .set_sec(if sec { 1 } else { 0 });
>  
>          for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
> @@ -552,6 +560,14 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
>          )?;
>          self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
>  
> +        if let Some(nmem) = fw.imem_ns_load_params() {
> +            // This code should never actual get executed, because the Non-Secure
> +            // section only exists on firmware used by Turing and GA100, and
> +            // those platforms do not use DMA.  But we include this code for
> +            // consistency.
> +            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
> +        }

Let's please delete that whole "if let" block, and instead just return with
an error in this case. This part of the firmware story will never change,
because it's only for older chips, so we don't need any kind of "what if"
code. 

With that, please feel free to add:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard

> +
>          self.hal.program_brom(self, bar, &fw.brom_params())?;
>  
>          // Set `BootVec` to start of non-secure code.
> diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
> index 096cd01dbc9d..1b98bb47424c 100644
> --- a/drivers/gpu/nova-core/firmware/booter.rs
> +++ b/drivers/gpu/nova-core/firmware/booter.rs
> @@ -253,6 +253,9 @@ impl<'a> FirmwareSignature<BooterFirmware> for BooterSignature<'a> {}
>  pub(crate) struct BooterFirmware {
>      // Load parameters for Secure `IMEM` falcon memory.
>      imem_sec_load_target: FalconLoadTarget,
> +    // Load parameters for Non-Secure `IMEM` falcon memory,
> +    // used only on Turing and GA100
> +    imem_ns_load_target: Option<FalconLoadTarget>,
>      // Load parameters for `DMEM` falcon memory.
>      dmem_load_target: FalconLoadTarget,
>      // BROM falcon parameters.
> @@ -359,6 +362,8 @@ pub(crate) fn new(
>                  dst_start: 0,
>                  len: app0.len,
>              },
> +            // Exists only in the booter image for Turing and GA100
> +            imem_ns_load_target: None,
>              dmem_load_target: FalconLoadTarget {
>                  src_start: load_hdr.os_data_offset,
>                  dst_start: 0,
> @@ -375,6 +380,10 @@ fn imem_sec_load_params(&self) -> FalconLoadTarget {
>          self.imem_sec_load_target.clone()
>      }
>  
> +    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
> +        self.imem_ns_load_target.clone()
> +    }
> +
>      fn dmem_load_params(&self) -> FalconLoadTarget {
>          self.dmem_load_target.clone()
>      }
> diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
> index 6a2f5a0d4b15..e4009faba6c5 100644
> --- a/drivers/gpu/nova-core/firmware/fwsec.rs
> +++ b/drivers/gpu/nova-core/firmware/fwsec.rs
> @@ -232,6 +232,11 @@ fn imem_sec_load_params(&self) -> FalconLoadTarget {
>          }
>      }
>  
> +    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
> +        // Only used on Turing and GA100, so return None for now
> +        None
> +    }
> +
>      fn dmem_load_params(&self) -> FalconLoadTarget {
>          FalconLoadTarget {
>              src_start: self.desc.imem_load_size,



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 03/11] gpu: nova-core: support header parsing on Turing/GA100
  2025-12-18  3:29 ` [PATCH v4 03/11] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
@ 2025-12-31 19:58   ` John Hubbard
  0 siblings, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 19:58 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> The Turing/GA100 version of Booter is slightly different from the
> GA102+ version.  The headers are the same, but different fields of
> the headers are used to identify the IMEM section.  In addition,
> there is an NMEM section on Turing/GA100.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware/booter.rs | 28 ++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
> index 1b98bb47424c..86556cee8e67 100644
> --- a/drivers/gpu/nova-core/firmware/booter.rs
> +++ b/drivers/gpu/nova-core/firmware/booter.rs
> @@ -356,14 +356,30 @@ pub(crate) fn new(
>              }
>          };
>  
> +        // There are two versions of Booter, one for Turing/GA100, and another for
> +        // GA102+.  The extraction of the IMEM sections differs between the two
> +        // versions.  Unfortunately, the file names are the same, and the headers
> +        // don't indicate the versions.  The only way to differentiate is by the Chipset.

Very helpful comment, I would not have figured that out without it.

> +        let (imem_sec_dst_start, imem_ns_load_target) = if chipset <= Chipset::GA100 {
> +            (
> +                app0.offset,
> +                Some(FalconLoadTarget {
> +                    src_start: 0,
> +                    dst_start: load_hdr.os_code_offset,
> +                    len: load_hdr.os_code_size,
> +                }),
> +            )
> +        } else {
> +            (0, None)
> +        };
> +
>          Ok(Self {
>              imem_sec_load_target: FalconLoadTarget {
>                  src_start: app0.offset,
> -                dst_start: 0,
> +                dst_start: imem_sec_dst_start,
>                  len: app0.len,
>              },
> -            // Exists only in the booter image for Turing and GA100
> -            imem_ns_load_target: None,
> +            imem_ns_load_target,
>              dmem_load_target: FalconLoadTarget {
>                  src_start: load_hdr.os_data_offset,
>                  dst_start: 0,
> @@ -393,7 +409,11 @@ fn brom_params(&self) -> FalconBromParams {
>      }
>  
>      fn boot_addr(&self) -> u32 {
> -        self.imem_sec_load_target.src_start
> +        if let Some(ns_target) = &self.imem_ns_load_target {
> +            ns_target.dst_start
> +        } else {
> +            self.imem_sec_load_target.src_start
> +        }
>      }
>  }
>  

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
                   ` (12 preceding siblings ...)
  2025-12-31  2:58 ` John Hubbard
@ 2025-12-31 20:06 ` John Hubbard
  2025-12-31 20:11   ` Timur Tabi
  13 siblings, 1 reply; 46+ messages in thread
From: John Hubbard @ 2025-12-31 20:06 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> This patch set adds basic support for pre-booting GSP-RM
> on Turing.
> 
> There is also partial support for GA100, but it's currently not
> fully implemented.  GA100 is considered experimental in Nouveau,
> and so it hasn't been tested with NovaCore either.

Hi Timur,

Are you also working on rebasing this onto drm-rust-next? There are
some changes and conflicts to work through, but I'm hoping that we
are very close to getting this ready for merging, so that's the
next step.


thanks,
-- 
John Hubbard

> 
> That latest linux-firmware.git is required because it contains the
> Generic Bootloader image that has not yet been propogated to
> distros.
> 
> Summary of changes:
> 
> 1. Introduce non-secure IMEM support.  For GA102 and later, only secure IMEM
> is used.
> 2. Because of non-secure IMEM, Turing booter firmware images need some of
> the headers parsed differently for stuff like the load target address.
> 3. Add support the tu10x firmware signature section in the ELF image.
> 4. Add several new registers used only on Turing.
> 5. Some functions that were considered generic Falcon operations are
> actually different on Turing vs GA102+, so they are moved to the HAL.
> 6. The FRTS FWSEC firmware in VBIOS uses a different version of the
> descriptor header.
> 7. On Turing/GA100 LIBOS args struct needs to have its 'size' field
> aligned to 4KB.  So pad the struct to make it 4K.
> 8. Turing Falcons do not support DMA, so PIO is used to copy images
> into IMEM/DMEM.
> 
> Changes from v3:
> 1. Fixed Rust formatting issues
> 2. Misc style changes as suggested by Alex
> 3. Merged regs.rs changes into commits that use them
> 4. Used Zeroable::zeroed()
> 
> Timur Tabi (11):
>   gpu: nova-core: rename Imem to ImemSecure
>   gpu: nova-core: add ImemNonSecure section infrastructure
>   gpu: nova-core: support header parsing on Turing/GA100
>   gpu: nova-core: add support for Turing/GA100 fwsignature
>   gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
>   gpu: nova-core: move some functions into the HAL
>   gpu: nova-core: Add basic Turing HAL
>   gpu: nova-core: add Falcon HAL method supports_dma()
>   gpu: nova-core: add FalconUCodeDescV2 support
>   gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
>   gpu: nova-core: add PIO support for loading firmware images
> 
>  drivers/gpu/nova-core/falcon.rs           | 246 +++++++++++++++++-----
>  drivers/gpu/nova-core/falcon/hal.rs       |  17 ++
>  drivers/gpu/nova-core/falcon/hal/ga102.rs |  47 +++++
>  drivers/gpu/nova-core/falcon/hal/tu102.rs |  77 +++++++
>  drivers/gpu/nova-core/firmware.rs         | 181 +++++++++++++++-
>  drivers/gpu/nova-core/firmware/booter.rs  |  43 +++-
>  drivers/gpu/nova-core/firmware/fwsec.rs   | 215 ++++++++++++++++---
>  drivers/gpu/nova-core/firmware/gsp.rs     |   4 +-
>  drivers/gpu/nova-core/gsp/boot.rs         |  10 +-
>  drivers/gpu/nova-core/gsp/fw.rs           |  24 ++-
>  drivers/gpu/nova-core/regs.rs             |  53 +++++
>  drivers/gpu/nova-core/vbios.rs            |  75 ++++---
>  12 files changed, 864 insertions(+), 128 deletions(-)
>  create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs
> 
> 
> base-commit: 60c7398bded2e11f0db40a409a241b8be5910ee2
> prerequisite-patch-id: a3e23917ec535263604af95194422382f14c2f21



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-31 20:06 ` John Hubbard
@ 2025-12-31 20:11   ` Timur Tabi
  0 siblings, 0 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-31 20:11 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Wed, 2025-12-31 at 12:06 -0800, John Hubbard wrote:
> > There is also partial support for GA100, but it's currently not
> > fully implemented.  GA100 is considered experimental in Nouveau,
> > and so it hasn't been tested with NovaCore either.
> 
> Hi Timur,
> 
> Are you also working on rebasing this onto drm-rust-next? There are
> some changes and conflicts to work through, but I'm hoping that we
> are very close to getting this ready for merging, so that's the
> next step.

Every time I post a new version, it is rebased onto drm-rust-next.  Unfortunately, Alex has asked
for some significant changes, so it'll be a while before I can post v5.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images
  2025-12-18 12:59   ` Alexandre Courbot
  2025-12-30 22:56     ` Timur Tabi
@ 2025-12-31 21:30     ` Timur Tabi
  1 sibling, 0 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-31 21:30 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Thu, 2025-12-18 at 21:59 +0900, Alexandre Courbot wrote:
> So `FwsecFirmware` could have an optional member that contains both the
> generic bootloader and the `BootloaderDmemDescV2` corresponding to it.
> If that optional member is `Some`, then it returns the `FalconLoadTarget`s
> corresponding to the generic bootloader. Otherwise, it behaves as
> before.

Unfortunately, this won't work because the BootloaderDmemDescV2() depends on properties of the
FwsecFirmware, i.e. it is self-referential.  

So I tried this:

pub(crate) struct GenericBootloader {
    pub desc: BootloaderDesc,
    pub desc2: BootloaderDmemDescV2,   <--- new field
    pub ucode: Vec<u8, kernel::alloc::allocator::Kmalloc>,
}

and we already have this:

pub(crate) struct FwsecFirmware {
    /// Descriptor of the firmware.
    desc: FalconUCodeDesc,
    /// GPU-accessible DMA object containing the firmware.
    ucode: FirmwareDmaObject<Self, Signed>,
    /// Generic bootloader
    gen_bootloader: Option<GenericBootloader>,
}

So in the constructor for FwsecFirmware, I tried to do this:

                let imem_sec = self.imem_sec_load_params();
                let imem_ns = self.imem_ns_load_params().ok_or(EINVAL)?;
                let dmem = self.dmem_load_params();

                // This structure tells the generic bootloader where to find the
                // FWSEC image.
                let desc2 = BootloaderDmemDescV2 {
                    reserved: [0; 4],
                    signature: [0; 4],
                    ctx_dma: 4, // FALCON_DMAIDX_PHYS_SYS_NCOH
                    code_dma_base: ucode_dma.0.dma_handle(),
                    non_sec_code_off: imem_ns.dst_start,
                    non_sec_code_size: imem_ns.len,
                    sec_code_off: imem_sec.dst_start,
                    sec_code_size: imem_sec.len,
                    code_entry_point: 0,
                    data_dma_base: fw.dma_handle() + u64::from(dmem.src_start),
                    data_size: dmem.len,
                    argc: 0,
                    argv: 0,
                };

This obviously doesn't work because `self` doesn't exist it.  It is the output of the contructor
(FwsecFirmware::new()), so I can't use it in the constructor.  The self.imem_sec_load_params(), etc
functions all depend on whether it's a v2 or a v3 descriptor (local variable `desc` in ::new()).  I
would need to dismantle all that code and manually re-implement it in this constructor.  The truth
of the matter is that the BootloaderDmemDescV2 object needs to be built *after* the FwsecFirmware
object exists, because it references several fields inside FwsecFirmware descriptor, which could
technically be either a v2 or a v3.

Now I could assume that it's always a v2 descriptor, but that would just ignore the entire
infrastructure we created to support both versions transparently.

I will post a v5 that has only your minor suggestions.  I recommend that we collaborate offline to
implement the major ones, if you still feel they are necessary.  However, I would prefer that we
implement them as follow-on patches, in order to have Turing support make the 6.20-rc6 deadline.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
  2025-12-18  3:29 ` [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
  2025-12-18 11:24   ` Alexandre Courbot
@ 2025-12-31 21:35   ` John Hubbard
  1 sibling, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 21:35 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> The with_falcon_mem() method initializes the 'imem' and 'sec' fields of
> the NV_PFALCON_FALCON_DMATRFCMD register based on the value of
> the FalconMem type.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs | 16 ++++------------
>  drivers/gpu/nova-core/regs.rs   |  9 +++++++++
>  2 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index b92152277d00..44a1a531a361 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -454,7 +454,6 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>          fw: &F,
>          target_mem: FalconMem,
>          load_offsets: FalconLoadTarget,
> -        sec: bool,
>      ) -> Result {
>          const DMA_LEN: u32 = 256;
>  
> @@ -516,8 +515,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
>  
>          let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
>              .set_size(DmaTrfCmdSize::Size256B)
> -            .set_imem(target_mem != FalconMem::Dmem)
> -            .set_sec(if sec { 1 } else { 0 });
> +            .with_falcon_mem(target_mem);
>  
>          for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
>              // Perform a transfer of size `DMA_LEN`.
> @@ -551,21 +549,15 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
>                  .set_mem_type(FalconFbifMemType::Physical)
>          });
>  
> -        self.dma_wr(
> -            bar,
> -            fw,
> -            FalconMem::ImemSecure,
> -            fw.imem_sec_load_params(),
> -            true,
> -        )?;
> -        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
> +        self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params())?;
> +        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params())?;
>  
>          if let Some(nmem) = fw.imem_ns_load_params() {
>              // This code should never actual get executed, because the Non-Secure
>              // section only exists on firmware used by Turing and GA100, and
>              // those platforms do not use DMA.  But we include this code for
>              // consistency.
> -            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
> +            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem)?;
>          }
>  
>          self.hal.program_brom(self, bar, &fw.brom_params())?;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 82cc6c0790e5..c049f5aaf2f2 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -16,6 +16,7 @@
>          FalconCoreRevSubversion,
>          FalconFbifMemType,
>          FalconFbifTarget,
> +        FalconMem,
>          FalconModSelAlgo,
>          FalconSecurityModel,
>          PFalcon2Base,
> @@ -325,6 +326,14 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
>      16:16   set_dmtag as u8;
>  });
>  
> +impl NV_PFALCON_FALCON_DMATRFCMD {
> +    /// Programs the 'imem' and 'sec' fields for the given FalconMem
> +    pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
> +        self.set_imem(mem != FalconMem::Dmem)
> +            .set_sec(if mem == FalconMem::ImemSecure { 1 } else { 0 })

I went down a deep rabbit hole trying to understand why these methods
are chained, even though setting is infallible. And I found a major
API design choice that I passionately disagree with, in bitfield!(),
so I'm posting a separate patch to "fix" it. ha. :)

Anyway, this is all fine here, so:


Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard



> +    }
> +}
> +
>  register!(NV_PFALCON_FALCON_DMATRFFBOFFS @ PFalconBase[0x0000011c] {
>      31:0    offs as u32;
>  });



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL
  2025-12-18  3:29 ` [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL Timur Tabi
  2025-12-18 11:25   ` Alexandre Courbot
@ 2025-12-31 22:07   ` John Hubbard
  1 sibling, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 22:07 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> A few Falcon methods are actually GPU-specific, so move them
> into the HAL.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs           | 45 ++---------------------
>  drivers/gpu/nova-core/falcon/hal.rs       | 10 +++++
>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 43 ++++++++++++++++++++++
>  3 files changed, 56 insertions(+), 42 deletions(-)


Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard

> 
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> index 44a1a531a361..6b54c0ca458a 100644
> --- a/drivers/gpu/nova-core/falcon.rs
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -13,7 +13,6 @@
>      prelude::*,
>      sync::aref::ARef,
>      time::{
> -        delay::fsleep,
>          Delta, //
>      },
>  };
> @@ -394,48 +393,11 @@ pub(crate) fn dma_reset(&self, bar: &Bar0) {
>          regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, &E::ID);
>      }
>  
> -    /// Wait for memory scrubbing to complete.
> -    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
> -        // TIMEOUT: memory scrubbing should complete in less than 20ms.
> -        read_poll_timeout(
> -            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> -            |r| r.mem_scrubbing_done(),
> -            Delta::ZERO,
> -            Delta::from_millis(20),
> -        )
> -        .map(|_| ())
> -    }
> -
> -    /// Reset the falcon engine.
> -    fn reset_eng(&self, bar: &Bar0) -> Result {
> -        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
> -
> -        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
> -        // RESET_READY so a non-failing timeout is used.
> -        let _ = read_poll_timeout(
> -            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> -            |r| r.reset_ready(),
> -            Delta::ZERO,
> -            Delta::from_micros(150),
> -        );
> -
> -        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
> -
> -        // TIMEOUT: falcon engine should not take more than 10us to reset.
> -        fsleep(Delta::from_micros(10));
> -
> -        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
> -
> -        self.reset_wait_mem_scrubbing(bar)?;
> -
> -        Ok(())
> -    }
> -
>      /// Reset the controller, select the falcon core, and wait for memory scrubbing to complete.
>      pub(crate) fn reset(&self, bar: &Bar0) -> Result {
> -        self.reset_eng(bar)?;
> +        self.hal.reset_eng(bar)?;
>          self.hal.select_core(self, bar)?;
> -        self.reset_wait_mem_scrubbing(bar)?;
> +        self.hal.reset_wait_mem_scrubbing(bar)?;
>  
>          regs::NV_PFALCON_FALCON_RM::default()
>              .set_value(regs::NV_PMC_BOOT_0::read(bar).into())
> @@ -665,8 +627,7 @@ pub(crate) fn signature_reg_fuse_version(
>      ///
>      /// Returns `true` if the RISC-V core is active, `false` otherwise.
>      pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
> -        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
> -        cpuctl.active_stat()
> +        self.hal.is_riscv_active(bar)
>      }
>  
>      /// Write the application version to the OS register.
> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
> index 8dc56a28ad65..c77a1568ea96 100644
> --- a/drivers/gpu/nova-core/falcon/hal.rs
> +++ b/drivers/gpu/nova-core/falcon/hal.rs
> @@ -37,6 +37,16 @@ fn signature_reg_fuse_version(
>  
>      /// Program the boot ROM registers prior to starting a secure firmware.
>      fn program_brom(&self, falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result;
> +
> +    /// Check if the RISC-V core is active.
> +    /// Returns `true` if the RISC-V core is active, `false` otherwise.
> +    fn is_riscv_active(&self, bar: &Bar0) -> bool;
> +
> +    /// Wait for memory scrubbing to complete.
> +    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result;
> +
> +    /// Reset the falcon engine.
> +    fn reset_eng(&self, bar: &Bar0) -> Result;
>  }
>  
>  /// Returns a boxed falcon HAL adequate for `chipset`.
> diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
> index 69a7a95cac16..232d51a4921f 100644
> --- a/drivers/gpu/nova-core/falcon/hal/ga102.rs
> +++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
> @@ -6,6 +6,7 @@
>      device,
>      io::poll::read_poll_timeout,
>      prelude::*,
> +    time::delay::fsleep,
>      time::Delta, //
>  };
>  
> @@ -117,4 +118,46 @@ fn signature_reg_fuse_version(
>      fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
>          program_brom_ga102::<E>(bar, params)
>      }
> +
> +    fn is_riscv_active(&self, bar: &Bar0) -> bool {
> +        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
> +        cpuctl.active_stat()
> +    }
> +
> +    /// Wait for memory scrubbing to complete.
> +    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
> +        // TIMEOUT: memory scrubbing should complete in less than 20ms.
> +        read_poll_timeout(
> +            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> +            |r| r.mem_scrubbing_done(),
> +            Delta::ZERO,
> +            Delta::from_millis(20),
> +        )
> +        .map(|_| ())
> +    }
> +
> +    /// Reset the falcon engine.
> +    fn reset_eng(&self, bar: &Bar0) -> Result {
> +        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
> +
> +        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
> +        // RESET_READY so a non-failing timeout is used.
> +        let _ = read_poll_timeout(
> +            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
> +            |r| r.reset_ready(),
> +            Delta::ZERO,
> +            Delta::from_micros(150),
> +        );
> +
> +        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
> +
> +        // TIMEOUT: falcon engine should not take more than 10us to reset.
> +        fsleep(Delta::from_micros(10));
> +
> +        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
> +
> +        self.reset_wait_mem_scrubbing(bar)?;
> +
> +        Ok(())
> +    }
>  }



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 07/11] gpu: nova-core: Add basic Turing HAL
  2025-12-18  3:29 ` [PATCH v4 07/11] gpu: nova-core: Add basic Turing HAL Timur Tabi
@ 2025-12-31 22:54   ` John Hubbard
  0 siblings, 0 replies; 46+ messages in thread
From: John Hubbard @ 2025-12-31 22:54 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	nouveau, rust-for-linux

On 12/17/25 7:29 PM, Timur Tabi wrote:
> Add the basic HAL for recognizing Turing GPUs.  This isn't enough
> to support booting GSP-RM on Turing, but it's a start.
> 
> Note that GA100, which boots using the same method as Turing, is not
> supported yet.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/hal.rs       |  4 ++
>  drivers/gpu/nova-core/falcon/hal/tu102.rs | 73 +++++++++++++++++++++++
>  drivers/gpu/nova-core/regs.rs             | 14 +++++
>  3 files changed, 91 insertions(+)
>  create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs
> 
> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
> index c77a1568ea96..c886ba03d1f6 100644
> --- a/drivers/gpu/nova-core/falcon/hal.rs
> +++ b/drivers/gpu/nova-core/falcon/hal.rs
> @@ -13,6 +13,7 @@
>  };
>  
>  mod ga102;
> +mod tu102;
>  
>  /// Hardware Abstraction Layer for Falcon cores.
>  ///
> @@ -60,6 +61,9 @@ pub(super) fn falcon_hal<E: FalconEngine + 'static>(
>      use Chipset::*;
>  
>      let hal = match chipset {
> +        TU102 | TU104 | TU106 | TU116 | TU117 => {
> +            KBox::new(tu102::Tu102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
> +        }
>          GA102 | GA103 | GA104 | GA106 | GA107 | AD102 | AD103 | AD104 | AD106 | AD107 => {
>              KBox::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
>          }
> diff --git a/drivers/gpu/nova-core/falcon/hal/tu102.rs b/drivers/gpu/nova-core/falcon/hal/tu102.rs
> new file mode 100644
> index 000000000000..ac8f58ef6789
> --- /dev/null
> +++ b/drivers/gpu/nova-core/falcon/hal/tu102.rs
> @@ -0,0 +1,73 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use core::marker::PhantomData;
> +
> +use kernel::{
> +    io::poll::read_poll_timeout,
> +    prelude::*,
> +    time::delay::fsleep,
> +    time::Delta, //
> +};
> +
> +use crate::driver::Bar0;
> +use crate::falcon::{Falcon, FalconBromParams, FalconEngine};
> +use crate::regs;

This needs to be formatted like this:

use crate::{
    driver::Bar0,
    falcon::{
        Falcon,
        FalconBromParams,
        FalconEngine, //
    },
    regs, //
};


With that applied, please feel free to add:


Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard


> +
> +use super::FalconHal;
> +
> +pub(super) struct Tu102<E: FalconEngine>(PhantomData<E>);
> +
> +impl<E: FalconEngine> Tu102<E> {
> +    pub(super) fn new() -> Self {
> +        Self(PhantomData)
> +    }
> +}
> +
> +impl<E: FalconEngine> FalconHal<E> for Tu102<E> {
> +    fn select_core(&self, _falcon: &Falcon<E>, _bar: &Bar0) -> Result {
> +        Ok(())
> +    }
> +
> +    fn signature_reg_fuse_version(
> +        &self,
> +        _falcon: &Falcon<E>,
> +        _bar: &Bar0,
> +        _engine_id_mask: u16,
> +        _ucode_id: u8,
> +    ) -> Result<u32> {
> +        Ok(0)
> +    }
> +
> +    fn program_brom(&self, _falcon: &Falcon<E>, _bar: &Bar0, _params: &FalconBromParams) -> Result {
> +        Ok(())
> +    }
> +
> +    fn is_riscv_active(&self, bar: &Bar0) -> bool {
> +        let cpuctl = regs::NV_PRISCV_RISCV_CORE_SWITCH_RISCV_STATUS::read(bar, &E::ID);
> +        cpuctl.active_stat()
> +    }
> +
> +    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
> +        // TIMEOUT: memory scrubbing should complete in less than 10ms.
> +        read_poll_timeout(
> +            || Ok(regs::NV_PFALCON_FALCON_DMACTL::read(bar, &E::ID)),
> +            |r| r.mem_scrubbing_done(),
> +            Delta::ZERO,
> +            Delta::from_millis(10),
> +        )
> +        .map(|_| ())
> +    }
> +
> +    fn reset_eng(&self, bar: &Bar0) -> Result {
> +        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
> +
> +        // TIMEOUT: falcon engine should not take more than 10us to reset.
> +        fsleep(Delta::from_micros(10));
> +
> +        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
> +
> +        self.reset_wait_mem_scrubbing(bar)?;
> +
> +        Ok(())
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index c049f5aaf2f2..8188a566e1ae 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -307,6 +307,13 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
>      7:7     secure_stat as bool;
>  });
>  
> +impl NV_PFALCON_FALCON_DMACTL {
> +    /// Returns `true` if memory scrubbing is completed.
> +    pub(crate) fn mem_scrubbing_done(self) -> bool {
> +        !self.dmem_scrubbing() && !self.imem_scrubbing()
> +    }
> +}
> +
>  register!(NV_PFALCON_FALCON_DMATRFBASE @ PFalconBase[0x00000110] {
>      31:0    base as u32;
>  });
> @@ -389,6 +396,13 @@ pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
>  
>  // PRISCV
>  
> +// RISC-V status register for debug (Turing and GA100 only).
> +// Reflects current RISC-V core status.
> +register!(NV_PRISCV_RISCV_CORE_SWITCH_RISCV_STATUS @ PFalcon2Base[0x00000240] {
> +    0:0     active_stat as bool, "RISC-V core active/inactive status";
> +});
> +
> +// GA102 and later
>  register!(NV_PRISCV_RISCV_CPUCTL @ PFalcon2Base[0x00000388] {
>      0:0     halted as bool;
>      7:7     active_stat as bool;



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images
  2025-12-30 22:56     ` Timur Tabi
@ 2025-12-31 23:22       ` Timur Tabi
  0 siblings, 0 replies; 46+ messages in thread
From: Timur Tabi @ 2025-12-31 23:22 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Alexandre Courbot, dakr@kernel.org,
	Joel Fernandes, John Hubbard, rust-for-linux@vger.kernel.org

On Tue, 2025-12-30 at 22:56 +0000, Timur Tabi wrote:
> Hmmm... that would require that I implement FalconLoadParams for GenericBootloader.  That's not a
> bad idea.  I'm not sure how I would build the Dmem FalconLoadTarget, however, given that it needs
> to
> reference data from the FRTS FalconFirmware object.
> 
> impl FalconLoadParams for GenericBootloader {
>     fn imem_sec_load_params(&self) -> FalconLoadTarget {
>         FalconLoadTarget {
>         }
>     }
> 
>     fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
>         FalconLoadTarget {
>         }
>     }
> 
>     fn dmem_load_params(&self) -> FalconLoadTarget {
>         FalconLoadTarget {
> 	   // How do I extract data from  the FRTS firmware image here?
>         }
>     }
> }

So I'm having doubts about this approach now.  It turns out that we would need to make everything
except dmem_load_params return an Option<>, including boot_args.  I would also need to add a
start_tag() method for PIO, and that would also return an Option<>.  Does it really make sense to
have FalconLoadParams look like this:

pub(crate) trait FalconLoadParams {
    fn imem_sec_load_params(&self) -> Option<FalconLoadTarget>;
    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget>;
    fn dmem_load_params(&self) -> FalconLoadTarget;
    fn brom_params(&self) -> Option<FalconBromParams>;
    fn boot_addr(&self) -> Option<u32>;
    fn start_tag(&self) -> Option<u16>;
}

This seems to defeat the purpose of having a unified FalconLoadParams, if almost everything in it is
optional.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2025-12-30 21:42   ` Timur Tabi
@ 2026-01-04 10:21     ` Ewan Chorynski
  2026-01-04 15:01       ` Timur Tabi
  0 siblings, 1 reply; 46+ messages in thread
From: Ewan Chorynski @ 2026-01-04 10:21 UTC (permalink / raw)
  To: Timur Tabi, ewan.chorynski@ik.me, Alexandre Courbot,
	dakr@kernel.org, nouveau@lists.freedesktop.org, Joel Fernandes,
	John Hubbard, rust-for-linux@vger.kernel.org

On Tue Dec 30, 2025 at 10:42 PM CET, Timur Tabi wrote:
> On Sun, 2025-12-28 at 18:45 +0100, Ewan Chorynski wrote:
>> 
>> [    3.623000] NovaCore 0000:01:00.0: GSP MBOX0: 0xffffe000, MBOX1: 0x0
>> [    3.623026] NovaCore 0000:01:00.0: Using SEC2 to load and run the booter_load firmware...
>> [    3.626236] NovaCore 0000:01:00.0: SEC2 MBOX0: 0x31, MBOX10x0
>> [    3.626265] NovaCore 0000:01:00.0: Booter-load failed with error 0x31
>> 
>> Is this expected to happen ? Here is the full device description from
>> lspci :
>> 
>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q]
>> (rev a1)
>
> This should work on your GPU with my patches.  Error 0x31 is a generic failure error code that
> indicates that booter_load failed to initialize, and that could be for any number of reasons. 
> Unfortunately, the only way I've been able to debug such issues is to reproduce them in-house with a
> custom build of booter_load.
>
> I suggest that for now, you hold off until these commits are merged, along with my debugfs patches,
> and then I can try to find a similar GPU internally that exhibits the failure and see if there's
> something else missing.  Turing is the most complicated of all GSP-capable GPUs to boot, partly
> because the firmware does a terrible job of reporting errors.
>
> I'm assuming that Nouveau boots just fine with the same firmware images?  If you turn on debug
> logging in Nouveau, it should say that it's booting with 570.144.  You can force it by deleting all
> the *535.113.01* images in /lib/firmware/nvidia/

Nouveau is not probing either with 570.114.

[    5.619048] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/gsp-570.144.bin" loaded - 28542040 byte(s)
[    5.619431] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/bootloader-570.144.bin" loaded - 4196 byte(s)
[    5.619944] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/booter_load-570.144.bin" loaded - 59272 byte(s)
[    5.620351] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/booter_unload-570.144.bin" loaded - 39304 byte(s)
[    5.620356] nouveau 0000:01:00.0: gsp: RM version: 570.144
...
[    5.860631] nouveau 0000:01:00.0: sec2(gsp):booter-load: booting
[    5.860943] nouveau 0000:01:00.0: sec2(gsp): mbox 00000031 00000000
[    5.860961] nouveau 0000:01:00.0: sec2(gsp):booter-load: boot failed: -5
[    5.860982] nouveau 0000:01:00.0: gsp: released sec2 falcon

This suggests that the firmware itself is part of the issue. I can send
the full boot logs from nouveau in debug if needed.

I also tried the fix given by John at [1] but got the same error so it
may be a different issue.

[1]: https://lore.kernel.org/rust-for-linux/4beba5c3-a117-4cb7-8fed-2f1c133dfec2@nvidia.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] gpu: nova-core: add Turing support
  2026-01-04 10:21     ` Ewan Chorynski
@ 2026-01-04 15:01       ` Timur Tabi
  0 siblings, 0 replies; 46+ messages in thread
From: Timur Tabi @ 2026-01-04 15:01 UTC (permalink / raw)
  To: ewan.chorynski@ik.me, Alexandre Courbot, dakr@kernel.org,
	nouveau@lists.freedesktop.org, Joel Fernandes, John Hubbard,
	rust-for-linux@vger.kernel.org

On Sun, 2026-01-04 at 11:21 +0100, Ewan Chorynski wrote:
> Nouveau is not probing either with 570.114.
> 
> [    5.619048] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/gsp-570.144.bin" loaded -
> 28542040 byte(s)
> [    5.619431] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/bootloader-570.144.bin"
> loaded - 4196 byte(s)
> [    5.619944] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/booter_load-570.144.bin"
> loaded - 59272 byte(s)
> [    5.620351] nouveau 0000:01:00.0: gsp: firmware "nvidia/tu117/gsp/booter_unload-
> 570.144.bin" loaded - 39304 byte(s)
> [    5.620356] nouveau 0000:01:00.0: gsp: RM version: 570.144
> ...
> [    5.860631] nouveau 0000:01:00.0: sec2(gsp):booter-load: booting
> [    5.860943] nouveau 0000:01:00.0: sec2(gsp): mbox 00000031 00000000
> [    5.860961] nouveau 0000:01:00.0: sec2(gsp):booter-load: boot failed: -5
> [    5.860982] nouveau 0000:01:00.0: gsp: released sec2 falcon
> 
> This suggests that the firmware itself is part of the issue. I can send
> the full boot logs from nouveau in debug if needed.

If Nouveau doesn't boot, then this is a more serious problem.  Unfortunately, logs won't help,
because there are none.  Either booter_load boots, or it doesn't, and error 0x31 simply means
that it failed very early, before anything useful was done.  The only way to debug this is with
an internal repro.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2026-01-04 15:01 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-18  3:29 [PATCH v4 00/11] gpu: nova-core: add Turing support Timur Tabi
2025-12-18  3:29 ` [PATCH v4 01/11] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
2025-12-31 19:29   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 02/11] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
2025-12-31 19:35   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 03/11] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
2025-12-31 19:58   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
2025-12-31 19:28   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
2025-12-18 11:24   ` Alexandre Courbot
2025-12-31 21:35   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 06/11] gpu: nova-core: move some functions into the HAL Timur Tabi
2025-12-18 11:25   ` Alexandre Courbot
2025-12-31 22:07   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 07/11] gpu: nova-core: Add basic Turing HAL Timur Tabi
2025-12-31 22:54   ` John Hubbard
2025-12-18  3:29 ` [PATCH v4 08/11] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
2025-12-18  7:48   ` Alexandre Courbot
2025-12-19 12:49     ` Alexandre Courbot
2025-12-18  3:29 ` [PATCH v4 09/11] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
2025-12-18  8:01   ` Alexandre Courbot
2025-12-18  3:29 ` [PATCH v4 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
2025-12-18 11:54   ` Alexandre Courbot
2025-12-18 22:51     ` Timur Tabi
2025-12-19  3:43       ` Alexandre Courbot
2025-12-19  4:34         ` Timur Tabi
2025-12-19  5:55           ` Alexandre Courbot
2025-12-18  3:29 ` [PATCH v4 11/11] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
2025-12-18 12:59   ` Alexandre Courbot
2025-12-30 22:56     ` Timur Tabi
2025-12-31 23:22       ` Timur Tabi
2025-12-31 21:30     ` Timur Tabi
2025-12-28 17:45 ` [PATCH v4 00/11] gpu: nova-core: add Turing support Ewan Chorynski
2025-12-30 21:42   ` Timur Tabi
2026-01-04 10:21     ` Ewan Chorynski
2026-01-04 15:01       ` Timur Tabi
2025-12-31  2:58 ` John Hubbard
2025-12-31  4:26   ` Timur Tabi
2025-12-31  6:17     ` John Hubbard
2025-12-31 16:33       ` Timur Tabi
2025-12-31 17:29         ` John Hubbard
2025-12-31 19:15           ` John Hubbard
2025-12-31 19:23             ` Timur Tabi
2025-12-31 20:06 ` John Hubbard
2025-12-31 20:11   ` Timur Tabi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).