[PATCH v2 00/13] gpu: nova-core: add Turing support

rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 00/13] gpu: nova-core: add Turing support
@ 2025-12-01 23:39 Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 01/13] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

This patch set adds basic support for pre-booting GSP-RM
on Turing.

There is also partial support for GA100, but it's currently not
fully implemented.  GA100 is considered experimental in Nouveau,
and so it hasn't been tested with NovaCore either.

That latest linux-firmware.git is required because it contains the
Generic Bootloader image that has not yet been propogated to
distros.

Summary of changes:

1. Introduce non-secure IMEM support.  For GA102 and later, only secure IMEM
is used.
2. Because of non-secure IMEM, Turing booter firmware images need some of
the headers parsed differently for stuff like the load target address.
3. Add support the tu10x firmware signature section in the ELF image.
4. Add several new registers used only on Turing.
5. Some functions that were considered generic Falcon operations are
actually different on Turing vs GA102+, so they are moved to the HAL.
6. The FRTS FWSEC firmware in VBIOS uses a different version of the
descriptor header.
7. I don't know why this isn't necessary on GA102+, but GSP-RM
LIBOS args struct needs to have its 'size' field aligned to 4KB.
8. Turing Falcons do not support DMA, so PIO is used to copy images
into IMEM/DMEM.

Changes from v1:
1. Replaced pointer/length with slice in PIO code.
2. Added RFC patch to implement trait object FalconUCodeDescriptor .
3. Added comments to new registers, structs, and other places.
4. Fixed all CLIPPY complaints.
5. Added supports_dma() method for Falcon HAL
6. Renamed ImemSec and ImemNs to ImemSecure and ImemNonSecure
7. Several other miscellaneous fixes based on review comments.

Timur Tabi (13):
  gpu: nova-core: rename Imem to ImemSecure
  gpu: nova-core: add ImemNonSecure section infrastructure
  gpu: nova-core: support header parsing on Turing/GA100
  gpu: nova-core: add support for Turing/GA100 fwsignature
  gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
  gpu: nova-core: add Turing boot registers
  gpu: nova-core: move some functions into the HAL
  gpu: nova-core: Add basic Turing HAL
  gpu: nova-core: add Falcon HAL method supports_dma()
  gpu: nova-core: add FalconUCodeDescV2 support
  gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  gpu: nova-core: add PIO support for loading firmware images
  [RFC] gpu: nova: implement trait object FalconUCodeDescriptor

 drivers/gpu/nova-core/falcon.rs           | 226 +++++++++++++++++-----
 drivers/gpu/nova-core/falcon/hal.rs       |  19 +-
 drivers/gpu/nova-core/falcon/hal/ga102.rs |  47 +++++
 drivers/gpu/nova-core/falcon/hal/tu102.rs |  78 ++++++++
 drivers/gpu/nova-core/firmware.rs         | 135 ++++++++++++-
 drivers/gpu/nova-core/firmware/booter.rs  |  46 ++++-
 drivers/gpu/nova-core/firmware/fwsec.rs   | 215 +++++++++++++++++---
 drivers/gpu/nova-core/firmware/gsp.rs     |   9 +-
 drivers/gpu/nova-core/gsp/boot.rs         |  10 +-
 drivers/gpu/nova-core/gsp/fw.rs           |   2 +-
 drivers/gpu/nova-core/regs.rs             |  73 +++++++
 drivers/gpu/nova-core/vbios.rs            |  73 ++++---
 12 files changed, 805 insertions(+), 128 deletions(-)
 create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs

base-commit: 57dc2ea0b7bdb828c5d966d9135c28fe854933a4
prerequisite-patch-id: fcf54aca59a74f7ca677919565b427d18406462c
-- 
2.52.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 01/13] gpu: nova-core: rename Imem to ImemSecure
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 02/13] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Rename FalconMem::Imem to ImemSecure to indicate that it references
Secure Instruction Memory.  This change has no functional impact.

On Falcon cores, pages in instruction memory can be tagged as Secure
or Non-Secure.  For GA102 and later, only Secure is used, which is why
FalconMem::Imem seems appropriate.  However, Turing firmware images
can also contain non-secure sections, and so FalconMem needs to support
that.  By renaming Imem to ImemSec now, future patches for Turing support
will be simpler.

Nouveau uses the term "IMEM" to refer both to the Instruction Memory
block on Falcon cores as well as to the images of secure firmware
uploaded to part of IMEM.  OpenRM uses the terms "ImemSec" and "ImemNs"
instead, and uses "IMEM" just to refer to the physical memory device.

Renaming these terms allows us to align with OpenRM, avoid confusion
between IMEM and ImemSec, and makes future patches simpler.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs          | 14 +++++++-------
 drivers/gpu/nova-core/firmware/booter.rs | 12 ++++++------
 drivers/gpu/nova-core/firmware/fwsec.rs  |  2 +-
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 82c661aef594..618e3962d83a 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -237,8 +237,8 @@ fn from(value: PeregrineCoreSelect) -> Self {
 /// Different types of memory present in a falcon core.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub(crate) enum FalconMem {
-    /// Instruction Memory.
-    Imem,
+    /// Secure Instruction Memory.
+    ImemSecure,
     /// Data Memory.
     Dmem,
 }
@@ -345,8 +345,8 @@ pub(crate) struct FalconBromParams {
 
 /// Trait for providing load parameters of falcon firmwares.
 pub(crate) trait FalconLoadParams {
-    /// Returns the load parameters for `IMEM`.
-    fn imem_load_params(&self) -> FalconLoadTarget;
+    /// Returns the load parameters for Secure `IMEM`.
+    fn imem_sec_load_params(&self) -> FalconLoadTarget;
 
     /// Returns the load parameters for `DMEM`.
     fn dmem_load_params(&self) -> FalconLoadTarget;
@@ -457,7 +457,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
         //
         // For DMEM we can fold the start offset into the DMA handle.
         let (src_start, dma_start) = match target_mem {
-            FalconMem::Imem => (load_offsets.src_start, fw.dma_handle()),
+            FalconMem::ImemSecure => (load_offsets.src_start, fw.dma_handle()),
             FalconMem::Dmem => (
                 0,
                 fw.dma_handle_with_offset(load_offsets.src_start.into_safe_cast())?,
@@ -508,7 +508,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
 
         let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
             .set_size(DmaTrfCmdSize::Size256B)
-            .set_imem(target_mem == FalconMem::Imem)
+            .set_imem(target_mem == FalconMem::ImemSecure)
             .set_sec(if sec { 1 } else { 0 });
 
         for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
@@ -543,7 +543,7 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
                 .set_mem_type(FalconFbifMemType::Physical)
         });
 
-        self.dma_wr(bar, fw, FalconMem::Imem, fw.imem_load_params(), true)?;
+        self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params(), true)?;
         self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
 
         self.hal.program_brom(self, bar, &fw.brom_params())?;
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index f107f753214a..096cd01dbc9d 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -251,8 +251,8 @@ impl<'a> FirmwareSignature<BooterFirmware> for BooterSignature<'a> {}
 
 /// The `Booter` loader firmware, responsible for loading the GSP.
 pub(crate) struct BooterFirmware {
-    // Load parameters for `IMEM` falcon memory.
-    imem_load_target: FalconLoadTarget,
+    // Load parameters for Secure `IMEM` falcon memory.
+    imem_sec_load_target: FalconLoadTarget,
     // Load parameters for `DMEM` falcon memory.
     dmem_load_target: FalconLoadTarget,
     // BROM falcon parameters.
@@ -354,7 +354,7 @@ pub(crate) fn new(
         };
 
         Ok(Self {
-            imem_load_target: FalconLoadTarget {
+            imem_sec_load_target: FalconLoadTarget {
                 src_start: app0.offset,
                 dst_start: 0,
                 len: app0.len,
@@ -371,8 +371,8 @@ pub(crate) fn new(
 }
 
 impl FalconLoadParams for BooterFirmware {
-    fn imem_load_params(&self) -> FalconLoadTarget {
-        self.imem_load_target.clone()
+    fn imem_sec_load_params(&self) -> FalconLoadTarget {
+        self.imem_sec_load_target.clone()
     }
 
     fn dmem_load_params(&self) -> FalconLoadTarget {
@@ -384,7 +384,7 @@ fn brom_params(&self) -> FalconBromParams {
     }
 
     fn boot_addr(&self) -> u32 {
-        self.imem_load_target.src_start
+        self.imem_sec_load_target.src_start
     }
 }
 
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index b28e34d279f4..6a2f5a0d4b15 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -224,7 +224,7 @@ pub(crate) struct FwsecFirmware {
 }
 
 impl FalconLoadParams for FwsecFirmware {
-    fn imem_load_params(&self) -> FalconLoadTarget {
+    fn imem_sec_load_params(&self) -> FalconLoadTarget {
         FalconLoadTarget {
             src_start: 0,
             dst_start: self.desc.imem_phys_base,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 02/13] gpu: nova-core: add ImemNonSecure section infrastructure
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 01/13] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 03/13] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

The GSP booter firmware in Turing and GA100 includes a third memory
section called ImemNonSecure, which is non-secure IMEM.  This section
must be loaded separately from DMEM and secure IMEM, but only if it
actually exists.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs          | 21 +++++++++++++++++++--
 drivers/gpu/nova-core/firmware/booter.rs |  9 +++++++++
 drivers/gpu/nova-core/firmware/fwsec.rs  |  5 +++++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 618e3962d83a..fed597a29fa4 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -239,6 +239,8 @@ fn from(value: PeregrineCoreSelect) -> Self {
 pub(crate) enum FalconMem {
     /// Secure Instruction Memory.
     ImemSecure,
+    /// Non-Secure Instruction Memory.
+    ImemNonSecure,
     /// Data Memory.
     Dmem,
 }
@@ -348,6 +350,10 @@ pub(crate) trait FalconLoadParams {
     /// Returns the load parameters for Secure `IMEM`.
     fn imem_sec_load_params(&self) -> FalconLoadTarget;
 
+    /// Returns the load parameters for Non-Secure `IMEM`,
+    /// used only on Turing and GA100.
+    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget>;
+
     /// Returns the load parameters for `DMEM`.
     fn dmem_load_params(&self) -> FalconLoadTarget;
 
@@ -457,7 +463,10 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
         //
         // For DMEM we can fold the start offset into the DMA handle.
         let (src_start, dma_start) = match target_mem {
-            FalconMem::ImemSecure => (load_offsets.src_start, fw.dma_handle()),
+            FalconMem::ImemSecure | FalconMem::ImemNonSecure => (
+                load_offsets.src_start,
+                fw.dma_handle()
+            ),
             FalconMem::Dmem => (
                 0,
                 fw.dma_handle_with_offset(load_offsets.src_start.into_safe_cast())?,
@@ -508,7 +517,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
 
         let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
             .set_size(DmaTrfCmdSize::Size256B)
-            .set_imem(target_mem == FalconMem::ImemSecure)
+            .set_imem(target_mem != FalconMem::Dmem)
             .set_sec(if sec { 1 } else { 0 });
 
         for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
@@ -546,6 +555,14 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
         self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params(), true)?;
         self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
 
+        if let Some(nmem) = fw.imem_ns_load_params() {
+            // This code should never actual get executed, because the Non-Secure
+            // section only exists on firmware used by Turing and GA100, and
+            // those platforms do not use DMA.  But we include this code for
+            // consistency.
+            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
+        }
+
         self.hal.program_brom(self, bar, &fw.brom_params())?;
 
         // Set `BootVec` to start of non-secure code.
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index 096cd01dbc9d..1b98bb47424c 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -253,6 +253,9 @@ impl<'a> FirmwareSignature<BooterFirmware> for BooterSignature<'a> {}
 pub(crate) struct BooterFirmware {
     // Load parameters for Secure `IMEM` falcon memory.
     imem_sec_load_target: FalconLoadTarget,
+    // Load parameters for Non-Secure `IMEM` falcon memory,
+    // used only on Turing and GA100
+    imem_ns_load_target: Option<FalconLoadTarget>,
     // Load parameters for `DMEM` falcon memory.
     dmem_load_target: FalconLoadTarget,
     // BROM falcon parameters.
@@ -359,6 +362,8 @@ pub(crate) fn new(
                 dst_start: 0,
                 len: app0.len,
             },
+            // Exists only in the booter image for Turing and GA100
+            imem_ns_load_target: None,
             dmem_load_target: FalconLoadTarget {
                 src_start: load_hdr.os_data_offset,
                 dst_start: 0,
@@ -375,6 +380,10 @@ fn imem_sec_load_params(&self) -> FalconLoadTarget {
         self.imem_sec_load_target.clone()
     }
 
+    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
+        self.imem_ns_load_target.clone()
+    }
+
     fn dmem_load_params(&self) -> FalconLoadTarget {
         self.dmem_load_target.clone()
     }
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 6a2f5a0d4b15..e4009faba6c5 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -232,6 +232,11 @@ fn imem_sec_load_params(&self) -> FalconLoadTarget {
         }
     }
 
+    fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
+        // Only used on Turing and GA100, so return None for now
+        None
+    }
+
     fn dmem_load_params(&self) -> FalconLoadTarget {
         FalconLoadTarget {
             src_start: self.desc.imem_load_size,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 03/13] gpu: nova-core: support header parsing on Turing/GA100
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 01/13] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 02/13] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 04/13] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

The Turing/GA100 version of Booter is slightly different from the
GA102+ version.  The headers are the same, but different fields of
the headers are used to identify the IMEM section.  In addition,
there is an NMEM section on Turing/GA100.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware/booter.rs | 33 +++++++++++++++++++-----
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index 1b98bb47424c..7ceea7cc9a87 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -356,14 +356,27 @@ pub(crate) fn new(
             }
         };
 
+        // There are two versions of Booter, one for Turing/GA100, and another for
+        // GA102+.  The extraction of the IMEM sections differs between the two
+        // versions.  Unfortunately, the file names are the same, and the headers
+        // don't indicate the versions.  The only way to differentiate is by the Chipset.
+
         Ok(Self {
-            imem_sec_load_target: FalconLoadTarget {
-                src_start: app0.offset,
-                dst_start: 0,
-                len: app0.len,
+            imem_sec_load_target:
+                FalconLoadTarget {
+                    src_start: app0.offset,
+                    dst_start: if chipset > Chipset::GA100 { 0 } else { app0.offset },
+                    len: app0.len,
+                },
+            imem_ns_load_target: if chipset > Chipset::GA100 {
+                None
+            } else {
+                Some(FalconLoadTarget {
+                    src_start: 0,
+                    dst_start: load_hdr.os_code_offset,
+                    len: load_hdr.os_code_size,
+                })
             },
-            // Exists only in the booter image for Turing and GA100
-            imem_ns_load_target: None,
             dmem_load_target: FalconLoadTarget {
                 src_start: load_hdr.os_data_offset,
                 dst_start: 0,
@@ -393,7 +406,13 @@ fn brom_params(&self) -> FalconBromParams {
     }
 
     fn boot_addr(&self) -> u32 {
-        self.imem_sec_load_target.src_start
+        if let Some(ns_target) = &self.imem_ns_load_target {
+            // Turing and GA100 - use non-secure load target
+            ns_target.dst_start
+        } else {
+            // GA102 and later - use secure load target
+            self.imem_sec_load_target.src_start
+        }
     }
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 04/13] gpu: nova-core: add support for Turing/GA100 fwsignature
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (2 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 03/13] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 05/13] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Turing and GA100 share the same GSP-RM firmware binary, and the
signature ELF section is labeled ".fwsignature_tu10x".

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware/gsp.rs | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
index 0549805282ab..aa5a6433c377 100644
--- a/drivers/gpu/nova-core/firmware/gsp.rs
+++ b/drivers/gpu/nova-core/firmware/gsp.rs
@@ -163,9 +163,14 @@ pub(crate) fn new<'a, 'b>(
         let fw_section = elf::elf64_section(fw.data(), ".fwimage").ok_or(EINVAL)?;
 
         let sigs_section = match chipset.arch() {
-            Architecture::Ampere => ".fwsignature_ga10x",
+            Architecture::Turing | Architecture::Ampere =>
+                if chipset > Chipset::GA100 {
+                    ".fwsignature_ga10x"
+                } else {
+                    // GA100 uses the same firmware as Turing
+                    ".fwsignature_tu10x"
+                },
             Architecture::Ada => ".fwsignature_ad10x",
-            _ => return Err(ENOTSUPP),
         };
         let signatures = elf::elf64_section(fw.data(), sigs_section)
             .ok_or(EINVAL)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 05/13] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (3 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 04/13] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 06/13] gpu: nova-core: add Turing boot registers Timur Tabi
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

The with_falcon_mem() method initializes the 'imem' and 'sec' fields of
the NV_PFALCON_FALCON_DMATRFCMD register based on the value of
the FalconMem type.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs | 10 ++++------
 drivers/gpu/nova-core/regs.rs   | 10 ++++++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index fed597a29fa4..a480f8bbeb8e 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -454,7 +454,6 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
         fw: &F,
         target_mem: FalconMem,
         load_offsets: FalconLoadTarget,
-        sec: bool,
     ) -> Result {
         const DMA_LEN: u32 = 256;
 
@@ -517,8 +516,7 @@ fn dma_wr<F: FalconFirmware<Target = E>>(
 
         let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
             .set_size(DmaTrfCmdSize::Size256B)
-            .set_imem(target_mem != FalconMem::Dmem)
-            .set_sec(if sec { 1 } else { 0 });
+            .with_falcon_mem(target_mem);
 
         for pos in (0..num_transfers).map(|i| i * DMA_LEN) {
             // Perform a transfer of size `DMA_LEN`.
@@ -552,15 +550,15 @@ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F)
                 .set_mem_type(FalconFbifMemType::Physical)
         });
 
-        self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params(), true)?;
-        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params(), true)?;
+        self.dma_wr(bar, fw, FalconMem::ImemSecure, fw.imem_sec_load_params())?;
+        self.dma_wr(bar, fw, FalconMem::Dmem, fw.dmem_load_params())?;
 
         if let Some(nmem) = fw.imem_ns_load_params() {
             // This code should never actual get executed, because the Non-Secure
             // section only exists on firmware used by Turing and GA100, and
             // those platforms do not use DMA.  But we include this code for
             // consistency.
-            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem, false)?;
+            self.dma_wr(bar, fw, FalconMem::ImemNonSecure, nmem)?;
         }
 
         self.hal.program_brom(self, bar, &fw.brom_params())?;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 82cc6c0790e5..88bec1d4830b 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -16,6 +16,7 @@
         FalconCoreRevSubversion,
         FalconFbifMemType,
         FalconFbifTarget,
+        FalconMem,
         FalconModSelAlgo,
         FalconSecurityModel,
         PFalcon2Base,
@@ -325,6 +326,15 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
     16:16   set_dmtag as u8;
 });
 
+impl NV_PFALCON_FALCON_DMATRFCMD {
+    /// Programs the 'imem' and 'sec' fields for the given FalconMem
+    pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
+        self
+            .set_imem(mem != FalconMem::Dmem)
+            .set_sec(if mem == FalconMem::ImemSecure { 1 } else { 0 })
+    }
+}
+
 register!(NV_PFALCON_FALCON_DMATRFFBOFFS @ PFalconBase[0x0000011c] {
     31:0    offs as u32;
 });
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 06/13] gpu: nova-core: add Turing boot registers
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (4 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 05/13] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 07/13] gpu: nova-core: move some functions into the HAL Timur Tabi
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Define some more GPU registers used to boot GSP-RM on Turing and GA100.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/regs.rs | 63 +++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 88bec1d4830b..2143869d27ba 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -258,6 +258,11 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
     6:6     swgen0 as bool;
 });
 
+// Interrupt mask clear register. Writing 1 to a bit clears the corresponding interrupt mask.
+register!(NV_PFALCON_FALCON_IRQMCLR @ PFalconBase[0x00000014] {
+    31:0    value as u32;
+});
+
 register!(NV_PFALCON_FALCON_MAILBOX0 @ PFalconBase[0x00000040] {
     31:0    value as u32;
 });
@@ -266,6 +271,14 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
     31:0    value as u32;
 });
 
+// Interface enable register.
+register!(NV_PFALCON_FALCON_ITFEN @ PFalconBase[0x00000048] {
+    0:0     ctxen as bool, "Context interface enable";
+    1:1     mthden as bool, "Method interface enable";
+    2:2     postwr as bool;
+    4:4     secwl_cpuctl_alias as bool;
+});
+
 // Used to store version information about the firmware running
 // on the Falcon processor.
 register!(NV_PFALCON_FALCON_OS @ PFalconBase[0x00000080] {
@@ -307,6 +320,13 @@ pub(crate) fn mem_scrubbing_done(self) -> bool {
     7:7     secure_stat as bool;
 });
 
+impl NV_PFALCON_FALCON_DMACTL {
+    /// Returns `true` if memory scrubbing is completed.
+    pub(crate) fn mem_scrubbing_done(self) -> bool {
+        !self.dmem_scrubbing() && !self.imem_scrubbing()
+    }
+}
+
 register!(NV_PFALCON_FALCON_DMATRFBASE @ PFalconBase[0x00000110] {
     31:0    base as u32;
 });
@@ -353,6 +373,42 @@ pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
     1:1     startcpu as bool;
 });
 
+// Config memory base address. Specifies the upper address bits that must be matched
+// to access the config aperture. The base may not be zero as that would conflict with DMEM.
+register!(NV_PFALCON2_FALCON_CMEMBASE @ PFalcon2Base[0x00000160] {
+    31:18     value as u16;
+});
+
+// IMEM access control register. Up to 4 ports are available for IMEM access.
+register!(NV_PFALCON_FALCON_IMEMC @ PFalconBase[0x00000180[4; 16]] {
+    15:0      offs as u16, "IMEM block and word offset";
+    24:24     aincw as bool, "Auto-increment on write";
+    28:28     secure as bool, "Access secure IMEM";
+});
+
+// IMEM data register. Reading/writing this register accesses IMEM at the address
+// specified by the corresponding IMEMC register.
+register!(NV_PFALCON_FALCON_IMEMD @ PFalconBase[0x00000184[4; 16]] {
+    31:0      data as u32;
+});
+
+// IMEM tag register. Used to set the tag for the current IMEM block.
+register!(NV_PFALCON_FALCON_IMEMT @ PFalconBase[0x00000188[4; 16]] {
+    15:0      tag as u16;
+});
+
+// DMEM access control register. Up to 8 ports are available for DMEM access.
+register!(NV_PFALCON_FALCON_DMEMC @ PFalconBase[0x000001c0[8; 8]] {
+    15:0      offs as u16, "DMEM block and word offset";
+    24:24     aincw as bool, "Auto-increment on write";
+});
+
+// DMEM data register. Reading/writing this register accesses DMEM at the address
+// specified by the corresponding DMEMC register.
+register!(NV_PFALCON_FALCON_DMEMD @ PFalconBase[0x000001c4[8; 8]] {
+    31:0      data as u32;
+});
+
 // Actually known as `NV_PSEC_FALCON_ENGINE` and `NV_PGSP_FALCON_ENGINE` depending on the falcon
 // instance.
 register!(NV_PFALCON_FALCON_ENGINE @ PFalconBase[0x000003c0] {
@@ -390,6 +446,13 @@ pub(crate) fn with_falcon_mem(self, mem: FalconMem) -> Self {
 
 // PRISCV
 
+// RISC-V status register for debug (Turing and GA100 only).
+// Reflects current RISC-V core status.
+register!(NV_PRISCV_RISCV_CORE_SWITCH_RISCV_STATUS @ PFalcon2Base[0x00000240] {
+    0:0     active_stat as bool, "RISC-V core active/inactive status";
+});
+
+// GA102 and later
 register!(NV_PRISCV_RISCV_CPUCTL @ PFalcon2Base[0x00000388] {
     0:0     halted as bool;
     7:7     active_stat as bool;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 07/13] gpu: nova-core: move some functions into the HAL
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (5 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 06/13] gpu: nova-core: add Turing boot registers Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 08/13] gpu: nova-core: Add basic Turing HAL Timur Tabi
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

A few Falcon methods are actually GPU-specific, so move them
into the HAL.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs           | 45 ++---------------------
 drivers/gpu/nova-core/falcon/hal.rs       | 10 +++++
 drivers/gpu/nova-core/falcon/hal/ga102.rs | 43 ++++++++++++++++++++++
 3 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index a480f8bbeb8e..f2da6a4a9d36 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -13,7 +13,6 @@
     prelude::*,
     sync::aref::ARef,
     time::{
-        delay::fsleep,
         Delta, //
     },
 };
@@ -394,48 +393,11 @@ pub(crate) fn dma_reset(&self, bar: &Bar0) {
         regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, &E::ID);
     }
 
-    /// Wait for memory scrubbing to complete.
-    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
-        // TIMEOUT: memory scrubbing should complete in less than 20ms.
-        read_poll_timeout(
-            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
-            |r| r.mem_scrubbing_done(),
-            Delta::ZERO,
-            Delta::from_millis(20),
-        )
-        .map(|_| ())
-    }
-
-    /// Reset the falcon engine.
-    fn reset_eng(&self, bar: &Bar0) -> Result {
-        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
-
-        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
-        // RESET_READY so a non-failing timeout is used.
-        let _ = read_poll_timeout(
-            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
-            |r| r.reset_ready(),
-            Delta::ZERO,
-            Delta::from_micros(150),
-        );
-
-        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
-
-        // TIMEOUT: falcon engine should not take more than 10us to reset.
-        fsleep(Delta::from_micros(10));
-
-        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
-
-        self.reset_wait_mem_scrubbing(bar)?;
-
-        Ok(())
-    }
-
     /// Reset the controller, select the falcon core, and wait for memory scrubbing to complete.
     pub(crate) fn reset(&self, bar: &Bar0) -> Result {
-        self.reset_eng(bar)?;
+        self.hal.reset_eng(bar)?;
         self.hal.select_core(self, bar)?;
-        self.reset_wait_mem_scrubbing(bar)?;
+        self.hal.reset_wait_mem_scrubbing(bar)?;
 
         regs::NV_PFALCON_FALCON_RM::default()
             .set_value(regs::NV_PMC_BOOT_0::read(bar).into())
@@ -666,8 +628,7 @@ pub(crate) fn signature_reg_fuse_version(
     ///
     /// Returns `true` if the RISC-V core is active, `false` otherwise.
     pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
-        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
-        cpuctl.active_stat()
+        self.hal.is_riscv_active(bar)
     }
 
     /// Write the application version to the OS register.
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
index 8dc56a28ad65..c77a1568ea96 100644
--- a/drivers/gpu/nova-core/falcon/hal.rs
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -37,6 +37,16 @@ fn signature_reg_fuse_version(
 
     /// Program the boot ROM registers prior to starting a secure firmware.
     fn program_brom(&self, falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result;
+
+    /// Check if the RISC-V core is active.
+    /// Returns `true` if the RISC-V core is active, `false` otherwise.
+    fn is_riscv_active(&self, bar: &Bar0) -> bool;
+
+    /// Wait for memory scrubbing to complete.
+    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result;
+
+    /// Reset the falcon engine.
+    fn reset_eng(&self, bar: &Bar0) -> Result;
 }
 
 /// Returns a boxed falcon HAL adequate for `chipset`.
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
index 69a7a95cac16..232d51a4921f 100644
--- a/drivers/gpu/nova-core/falcon/hal/ga102.rs
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -6,6 +6,7 @@
     device,
     io::poll::read_poll_timeout,
     prelude::*,
+    time::delay::fsleep,
     time::Delta, //
 };
 
@@ -117,4 +118,46 @@ fn signature_reg_fuse_version(
     fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
         program_brom_ga102::<E>(bar, params)
     }
+
+    fn is_riscv_active(&self, bar: &Bar0) -> bool {
+        let cpuctl = regs::NV_PRISCV_RISCV_CPUCTL::read(bar, &E::ID);
+        cpuctl.active_stat()
+    }
+
+    /// Wait for memory scrubbing to complete.
+    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
+        // TIMEOUT: memory scrubbing should complete in less than 20ms.
+        read_poll_timeout(
+            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
+            |r| r.mem_scrubbing_done(),
+            Delta::ZERO,
+            Delta::from_millis(20),
+        )
+        .map(|_| ())
+    }
+
+    /// Reset the falcon engine.
+    fn reset_eng(&self, bar: &Bar0) -> Result {
+        let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID);
+
+        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
+        // RESET_READY so a non-failing timeout is used.
+        let _ = read_poll_timeout(
+            || Ok(regs::NV_PFALCON_FALCON_HWCFG2::read(bar, &E::ID)),
+            |r| r.reset_ready(),
+            Delta::ZERO,
+            Delta::from_micros(150),
+        );
+
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
+
+        // TIMEOUT: falcon engine should not take more than 10us to reset.
+        fsleep(Delta::from_micros(10));
+
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
+
+        self.reset_wait_mem_scrubbing(bar)?;
+
+        Ok(())
+    }
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 08/13] gpu: nova-core: Add basic Turing HAL
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (6 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 07/13] gpu: nova-core: move some functions into the HAL Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 09/13] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Add the basic HAL for recognizing Turing GPUs.  This isn't enough
to support booting GSP-RM on Turing, but it's a start.

Note that GA100, which boots using the same method as Turing, is not
supported yet.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon/hal.rs       |  6 +-
 drivers/gpu/nova-core/falcon/hal/tu102.rs | 74 +++++++++++++++++++++++
 2 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs

diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
index c77a1568ea96..7a6610e9d0a8 100644
--- a/drivers/gpu/nova-core/falcon/hal.rs
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -13,6 +13,7 @@
 };
 
 mod ga102;
+mod tu102;
 
 /// Hardware Abstraction Layer for Falcon cores.
 ///
@@ -60,9 +61,12 @@ pub(super) fn falcon_hal<E: FalconEngine + 'static>(
     use Chipset::*;
 
     let hal = match chipset {
+        TU102 | TU104 | TU106 | TU116 | TU117 => {
+            KBox::new(tu102::Tu102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
+        },
         GA102 | GA103 | GA104 | GA106 | GA107 | AD102 | AD103 | AD104 | AD106 | AD107 => {
             KBox::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
-        }
+        },
         _ => return Err(ENOTSUPP),
     };
 
diff --git a/drivers/gpu/nova-core/falcon/hal/tu102.rs b/drivers/gpu/nova-core/falcon/hal/tu102.rs
new file mode 100644
index 000000000000..19a8b49ca2cd
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal/tu102.rs
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::marker::PhantomData;
+
+use kernel::{
+    io::poll::read_poll_timeout,
+    prelude::*,
+    time::delay::fsleep,
+    time::Delta, //
+};
+
+use crate::driver::Bar0;
+use crate::falcon::{
+    Falcon, FalconBromParams, FalconEngine
+};
+use crate::regs;
+
+use super::FalconHal;
+
+pub(super) struct Tu102<E: FalconEngine>(PhantomData<E>);
+
+impl<E: FalconEngine> Tu102<E> {
+    pub(super) fn new() -> Self {
+        Self(PhantomData)
+    }
+}
+
+impl<E: FalconEngine> FalconHal<E> for Tu102<E> {
+    fn select_core(&self, _falcon: &Falcon<E>, _bar: &Bar0) -> Result {
+        Ok(())
+    }
+
+    fn signature_reg_fuse_version(
+        &self,
+        _falcon: &Falcon<E>,
+        _bar: &Bar0,
+        _engine_id_mask: u16,
+        _ucode_id: u8,
+    ) -> Result<u32> {
+        Ok(0)
+    }
+
+    fn program_brom(&self, _falcon: &Falcon<E>, _bar: &Bar0, _params: &FalconBromParams) -> Result {
+        Ok(())
+    }
+
+    fn is_riscv_active(&self, bar: &Bar0) -> bool {
+        let cpuctl = regs::NV_PRISCV_RISCV_CORE_SWITCH_RISCV_STATUS::read(bar, &E::ID);
+        cpuctl.active_stat()
+    }
+
+    fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
+        read_poll_timeout(
+            || Ok(regs::NV_PFALCON_FALCON_DMACTL::read(bar, &E::ID)),
+            |r| r.mem_scrubbing_done(),
+            Delta::ZERO,
+            Delta::from_millis(10),
+        )
+        .map(|_| ())
+    }
+
+    fn reset_eng(&self, bar: &Bar0) -> Result {
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(true));
+
+        // TIMEOUT: falcon engine should not take more than 10us to reset.
+        fsleep(Delta::from_micros(10));
+
+        regs::NV_PFALCON_FALCON_ENGINE::update(bar, &E::ID, |v| v.set_reset(false));
+
+        self.reset_wait_mem_scrubbing(bar)?;
+
+        Ok(())
+    }
+}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 09/13] gpu: nova-core: add Falcon HAL method supports_dma()
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (7 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 08/13] gpu: nova-core: Add basic Turing HAL Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 10/13] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Some GPUs do not support using DMA to transfer code/data from system
memory to Falcon memory, and instead must use programmed I/O (PIO).
Add a function to the Falcon HAL to indicate whether a given GPU's
Falcons support DMA for this purpose.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs           | 7 +++++++
 drivers/gpu/nova-core/falcon/hal.rs       | 3 +++
 drivers/gpu/nova-core/falcon/hal/ga102.rs | 4 ++++
 drivers/gpu/nova-core/falcon/hal/tu102.rs | 4 ++++
 4 files changed, 18 insertions(+)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index f2da6a4a9d36..2770d608a2cf 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -631,6 +631,13 @@ pub(crate) fn is_riscv_active(&self, bar: &Bar0) -> bool {
         self.hal.is_riscv_active(bar)
     }
 
+    /// Check if this Falcon supports DMA for loading firmware.
+    ///
+    /// Returns `true` if DMA is supported, `false` if PIO must be used instead.
+    pub(crate) fn supports_dma(&self) -> bool {
+        self.hal.supports_dma()
+    }
+
     /// Write the application version to the OS register.
     pub(crate) fn write_os_version(&self, bar: &Bar0, app_version: u32) {
         regs::NV_PFALCON_FALCON_OS::default()
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
index 7a6610e9d0a8..a760620221c3 100644
--- a/drivers/gpu/nova-core/falcon/hal.rs
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -48,6 +48,9 @@ fn signature_reg_fuse_version(
 
     /// Reset the falcon engine.
     fn reset_eng(&self, bar: &Bar0) -> Result;
+
+    /// Returns true of this Falcon supports DMA transfer from system memory to Falcon memory
+    fn supports_dma(&self) -> bool;
 }
 
 /// Returns a boxed falcon HAL adequate for `chipset`.
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
index 232d51a4921f..684ca72b7cfe 100644
--- a/drivers/gpu/nova-core/falcon/hal/ga102.rs
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -160,4 +160,8 @@ fn reset_eng(&self, bar: &Bar0) -> Result {
 
         Ok(())
     }
+
+    fn supports_dma(&self) -> bool {
+        true
+    }
 }
diff --git a/drivers/gpu/nova-core/falcon/hal/tu102.rs b/drivers/gpu/nova-core/falcon/hal/tu102.rs
index 19a8b49ca2cd..d8ec914bd3f8 100644
--- a/drivers/gpu/nova-core/falcon/hal/tu102.rs
+++ b/drivers/gpu/nova-core/falcon/hal/tu102.rs
@@ -71,4 +71,8 @@ fn reset_eng(&self, bar: &Bar0) -> Result {
 
         Ok(())
     }
+
+    fn supports_dma(&self) -> bool {
+        false
+    }
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 10/13] gpu: nova-core: add FalconUCodeDescV2 support
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (8 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 09/13] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 11/13] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

The FRTS firmware in Turing and GA100 VBIOS has an older header
format (v2 instead of v3).  To support both v2 and v3 at runtime,
add the FalconUCodeDescV2 struct, and update code that references
the FalconUCodeDescV3 directly with a FalconUCodeDesc enum that
encapsulates both.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs       | 114 +++++++++++++++++++++++-
 drivers/gpu/nova-core/firmware/fwsec.rs |  72 +++++++++------
 drivers/gpu/nova-core/vbios.rs          |  73 ++++++++-------
 3 files changed, 201 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 2d2008b33fb4..169b07ca340a 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -43,6 +43,46 @@ fn request_firmware(
         .and_then(|path| firmware::Firmware::request(&path, dev))
 }
 
+/// Structure used to describe some firmwares, notably FWSEC-FRTS.
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct FalconUCodeDescV2 {
+    /// Header defined by 'NV_BIT_FALCON_UCODE_DESC_HEADER_VDESC*' in OpenRM.
+    hdr: u32,
+    /// Stored size of the ucode after the header, compressed or uncompressed
+    stored_size: u32,
+    /// Uncompressed size of the ucode.  If store_size == uncompressed_size, then the ucode
+    /// is not compressed.
+    pub(crate) uncompressed_size: u32,
+    /// Code entry point
+    pub(crate) virtual_entry: u32,
+    /// Offset after the code segment at which the Application Interface Table headers are located.
+    pub(crate) interface_offset: u32,
+    /// Base address at which to load the code segment into 'IMEM'.
+    pub(crate) imem_phys_base: u32,
+    /// Size in bytes of the code to copy into 'IMEM'.
+    pub(crate) imem_load_size: u32,
+    /// Virtual 'IMEM' address (i.e. 'tag') at which the code should start.
+    pub(crate) imem_virt_base: u32,
+    /// Virtual address of secure IMEM segment.
+    pub(crate) imem_sec_base: u32,
+    /// Size of secure IMEM segment.
+    pub(crate) imem_sec_size: u32,
+    /// Offset into stored (uncompressed) image at which DMEM begins.
+    pub(crate) dmem_offset: u32,
+    /// Base address at which to load the data segment into 'DMEM'.
+    pub(crate) dmem_phys_base: u32,
+    /// Size in bytes of the data to copy into 'DMEM'.
+    pub(crate) dmem_load_size: u32,
+    /// "Alternate" Size of data to load into IMEM.
+    pub(crate) alt_imem_load_size: u32,
+    /// "Alternate" Size of data to load into DMEM.
+    pub(crate) alt_dmem_load_size: u32,
+}
+
+// SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+unsafe impl FromBytes for FalconUCodeDescV2 {}
+
 /// Structure used to describe some firmwares, notably FWSEC-FRTS.
 #[repr(C)]
 #[derive(Debug, Clone)]
@@ -76,13 +116,83 @@ pub(crate) struct FalconUCodeDescV3 {
     _reserved: u16,
 }
 
-impl FalconUCodeDescV3 {
+// SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+unsafe impl FromBytes for FalconUCodeDescV3 {}
+
+#[derive(Debug, Clone)]
+pub(crate) enum FalconUCodeDesc {
+    V2(FalconUCodeDescV2),
+    V3(FalconUCodeDescV3),
+}
+
+impl FalconUCodeDesc {
     /// Returns the size in bytes of the header.
     pub(crate) fn size(&self) -> usize {
+        let hdr = match self {
+            FalconUCodeDesc::V2(v2) => v2.hdr,
+            FalconUCodeDesc::V3(v3) => v3.hdr,
+        };
+
         const HDR_SIZE_SHIFT: u32 = 16;
         const HDR_SIZE_MASK: u32 = 0xffff0000;
+        ((hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT).into_safe_cast()
+    }
 
-        ((self.hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT).into_safe_cast()
+    pub(crate) fn imem_load_size(&self) -> u32 {
+        match self {
+            FalconUCodeDesc::V2(v2) => v2.imem_load_size,
+            FalconUCodeDesc::V3(v3) => v3.imem_load_size,
+        }
+    }
+
+    pub(crate) fn interface_offset(&self) -> u32 {
+        match self {
+            FalconUCodeDesc::V2(v2) => v2.interface_offset,
+            FalconUCodeDesc::V3(v3) => v3.interface_offset,
+        }
+    }
+
+
+    pub(crate) fn dmem_load_size(&self) -> u32 {
+        match self {
+            FalconUCodeDesc::V2(v2) => v2.dmem_load_size,
+            FalconUCodeDesc::V3(v3) => v3.dmem_load_size,
+        }
+    }
+
+    pub(crate) fn pkc_data_offset(&self) -> u32 {
+        match self {
+            FalconUCodeDesc::V2(_v2) => 0,
+            FalconUCodeDesc::V3(v3) => v3.pkc_data_offset,
+        }
+    }
+
+    pub(crate) fn engine_id_mask(&self) -> u16 {
+        match self {
+            FalconUCodeDesc::V2(_v2) => 0,
+            FalconUCodeDesc::V3(v3) => v3.engine_id_mask,
+        }
+    }
+
+    pub(crate) fn ucode_id(&self) -> u8 {
+        match self {
+            FalconUCodeDesc::V2(_v2) => 0,
+            FalconUCodeDesc::V3(v3) => v3.ucode_id,
+        }
+    }
+
+    pub(crate) fn signature_count(&self) -> u8 {
+        match self {
+            FalconUCodeDesc::V2(_v2) => 0,
+            FalconUCodeDesc::V3(v3) => v3.signature_count,
+        }
+    }
+
+    pub(crate) fn signature_versions(&self) -> u16 {
+        match self {
+            FalconUCodeDesc::V2(_v2) => 0,
+            FalconUCodeDesc::V3(v3) => v3.signature_versions,
+        }
     }
 }
 
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index e4009faba6c5..36ff8ed51c23 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -40,7 +40,7 @@
         FalconLoadTarget, //
     },
     firmware::{
-        FalconUCodeDescV3,
+        FalconUCodeDesc,
         FirmwareDmaObject,
         FirmwareSignature,
         Signed,
@@ -218,38 +218,59 @@ unsafe fn transmute_mut<T: Sized + FromBytes + AsBytes>(
 /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
 pub(crate) struct FwsecFirmware {
     /// Descriptor of the firmware.
-    desc: FalconUCodeDescV3,
+    desc: FalconUCodeDesc,
     /// GPU-accessible DMA object containing the firmware.
     ucode: FirmwareDmaObject<Self, Signed>,
 }
 
 impl FalconLoadParams for FwsecFirmware {
     fn imem_sec_load_params(&self) -> FalconLoadTarget {
-        FalconLoadTarget {
-            src_start: 0,
-            dst_start: self.desc.imem_phys_base,
-            len: self.desc.imem_load_size,
+        match &self.desc {
+            FalconUCodeDesc::V2(v2) => FalconLoadTarget {
+                src_start: 0,
+                dst_start: v2.imem_sec_base,
+                len: v2.imem_sec_size,
+            },
+            FalconUCodeDesc::V3(v3) => FalconLoadTarget {
+                src_start: 0,
+                dst_start: v3.imem_phys_base,
+                len: v3.imem_load_size,
+            }
         }
     }
 
     fn imem_ns_load_params(&self) -> Option<FalconLoadTarget> {
-        // Only used on Turing and GA100, so return None for now
-        None
+        match &self.desc {
+            FalconUCodeDesc::V2(v2) => Some(FalconLoadTarget {
+                src_start: 0,
+                dst_start: v2.imem_phys_base,
+                len: v2.imem_load_size - v2.imem_sec_size,
+            }),
+            // Not used on V3 platforms
+            FalconUCodeDesc::V3(_v3) => None,
+        }
     }
 
     fn dmem_load_params(&self) -> FalconLoadTarget {
-        FalconLoadTarget {
-            src_start: self.desc.imem_load_size,
-            dst_start: self.desc.dmem_phys_base,
-            len: self.desc.dmem_load_size,
+        match &self.desc {
+            FalconUCodeDesc::V2(v2) => FalconLoadTarget {
+                src_start: v2.dmem_offset,
+                dst_start: v2.dmem_phys_base,
+                len: v2.dmem_load_size,
+            },
+            FalconUCodeDesc::V3(v3) => FalconLoadTarget {
+                src_start: v3.imem_load_size,
+                dst_start: v3.dmem_phys_base,
+                len: v3.dmem_load_size,
+            }
         }
     }
 
     fn brom_params(&self) -> FalconBromParams {
         FalconBromParams {
-            pkc_data_offset: self.desc.pkc_data_offset,
-            engine_id_mask: self.desc.engine_id_mask,
-            ucode_id: self.desc.ucode_id,
+            pkc_data_offset: self.desc.pkc_data_offset(),
+            engine_id_mask: self.desc.engine_id_mask(),
+            ucode_id: self.desc.ucode_id(),
         }
     }
 
@@ -273,10 +294,10 @@ impl FalconFirmware for FwsecFirmware {
 impl FirmwareDmaObject<FwsecFirmware, Unsigned> {
     fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Result<Self> {
         let desc = bios.fwsec_image().header()?;
-        let ucode = bios.fwsec_image().ucode(desc)?;
+        let ucode = bios.fwsec_image().ucode(&desc)?;
         let mut dma_object = DmaObject::from_data(dev, ucode)?;
 
-        let hdr_offset = usize::from_safe_cast(desc.imem_load_size + desc.interface_offset);
+        let hdr_offset = usize::from_safe_cast(desc.imem_load_size() + desc.interface_offset());
         // SAFETY: we have exclusive access to `dma_object`.
         let hdr: &FalconAppifHdrV1 = unsafe { transmute(&dma_object, hdr_offset) }?;
 
@@ -303,7 +324,7 @@ fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Re
             let dmem_mapper: &mut FalconAppifDmemmapperV3 = unsafe {
                 transmute_mut(
                     &mut dma_object,
-                    (desc.imem_load_size + dmem_base).into_safe_cast(),
+                    (desc.imem_load_size() + dmem_base).into_safe_cast(),
                 )
             }?;
 
@@ -317,7 +338,7 @@ fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Re
             let frts_cmd: &mut FrtsCmd = unsafe {
                 transmute_mut(
                     &mut dma_object,
-                    (desc.imem_load_size + cmd_in_buffer_offset).into_safe_cast(),
+                    (desc.imem_load_size() + cmd_in_buffer_offset).into_safe_cast(),
                 )
             }?;
 
@@ -364,11 +385,12 @@ pub(crate) fn new(
 
         // Patch signature if needed.
         let desc = bios.fwsec_image().header()?;
-        let ucode_signed = if desc.signature_count != 0 {
-            let sig_base_img = usize::from_safe_cast(desc.imem_load_size + desc.pkc_data_offset);
-            let desc_sig_versions = u32::from(desc.signature_versions);
+        let ucode_signed = if desc.signature_count() != 0 {
+            let sig_base_img =
+                usize::from_safe_cast(desc.imem_load_size() + desc.pkc_data_offset());
+            let desc_sig_versions = u32::from(desc.signature_versions());
             let reg_fuse_version =
-                falcon.signature_reg_fuse_version(bar, desc.engine_id_mask, desc.ucode_id)?;
+                falcon.signature_reg_fuse_version(bar, desc.engine_id_mask(), desc.ucode_id())?;
             dev_dbg!(
                 dev,
                 "desc_sig_versions: {:#x}, reg_fuse_version: {}\n",
@@ -402,7 +424,7 @@ pub(crate) fn new(
             dev_dbg!(dev, "patching signature with index {}\n", signature_idx);
             let signature = bios
                 .fwsec_image()
-                .sigs(desc)
+                .sigs(&desc)
                 .and_then(|sigs| sigs.get(signature_idx).ok_or(EINVAL))?;
 
             ucode_dma.patch_signature(signature, sig_base_img)?
@@ -411,7 +433,7 @@ pub(crate) fn new(
         };
 
         Ok(FwsecFirmware {
-            desc: desc.clone(),
+            desc: desc,
             ucode: ucode_signed,
         })
     }
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index abf423560ff4..8144079f64f3 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -19,6 +19,8 @@
     driver::Bar0,
     firmware::{
         fwsec::Bcrt30Rsa3kSignature,
+        FalconUCodeDesc,
+        FalconUCodeDescV2,
         FalconUCodeDescV3, //
     },
     num::FromSafeCast,
@@ -1004,19 +1006,10 @@ fn build(self) -> Result<FwSecBiosImage> {
 
 impl FwSecBiosImage {
     /// Get the FwSec header ([`FalconUCodeDescV3`]).
-    pub(crate) fn header(&self) -> Result<&FalconUCodeDescV3> {
+    pub(crate) fn header(&self) -> Result<FalconUCodeDesc> {
         // Get the falcon ucode offset that was found in setup_falcon_data.
         let falcon_ucode_offset = self.falcon_ucode_offset;
 
-        // Make sure the offset is within the data bounds.
-        if falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>() > self.base.data.len() {
-            dev_err!(
-                self.base.dev,
-                "fwsec-frts header not contained within BIOS bounds\n"
-            );
-            return Err(ERANGE);
-        }
-
         // Read the first 4 bytes to get the version.
         let hdr_bytes: [u8; 4] = self.base.data[falcon_ucode_offset..falcon_ucode_offset + 4]
             .try_into()
@@ -1024,33 +1017,47 @@ pub(crate) fn header(&self) -> Result<&FalconUCodeDescV3> {
         let hdr = u32::from_le_bytes(hdr_bytes);
         let ver = (hdr & 0xff00) >> 8;
 
-        if ver != 3 {
-            dev_err!(self.base.dev, "invalid fwsec firmware version: {:?}\n", ver);
-            return Err(EINVAL);
+        let hdr_size = match ver {
+            2 => core::mem::size_of::<FalconUCodeDescV2>(),
+            3 => core::mem::size_of::<FalconUCodeDescV3>(),
+            _ => {
+                dev_err!(self.base.dev, "invalid fwsec firmware version: {:?}\n", ver);
+                return Err(EINVAL);
+            }
+        };
+        // Make sure the offset is within the data bounds
+        if falcon_ucode_offset + hdr_size > self.base.data.len() {
+            dev_err!(
+                self.base.dev,
+                "fwsec-frts header not contained within BIOS bounds\n"
+            );
+            return Err(ERANGE);
         }
 
-        // Return a reference to the FalconUCodeDescV3 structure.
-        //
-        // SAFETY: We have checked that `falcon_ucode_offset + size_of::<FalconUCodeDescV3>` is
-        // within the bounds of `data`. Also, this data vector is from ROM, and the `data` field
-        // in `BiosImageBase` is immutable after construction.
-        Ok(unsafe {
-            &*(self
-                .base
-                .data
-                .as_ptr()
-                .add(falcon_ucode_offset)
-                .cast::<FalconUCodeDescV3>())
-        })
+        let data = self.base.data
+            .get(falcon_ucode_offset..falcon_ucode_offset + hdr_size)
+            .ok_or(EINVAL)?;
+
+        match ver {
+            2 => {
+                let v2 = FalconUCodeDescV2::from_bytes(data).ok_or(EINVAL)?;
+                Ok(FalconUCodeDesc::V2(v2.clone()))
+            }
+            3 => {
+                let v3 = FalconUCodeDescV3::from_bytes(data).ok_or(EINVAL)?;
+                Ok(FalconUCodeDesc::V3(v3.clone()))
+            }
+            _ => Err(EINVAL),
+        }
     }
 
     /// Get the ucode data as a byte slice
-    pub(crate) fn ucode(&self, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+    pub(crate) fn ucode(&self, desc: &FalconUCodeDesc) -> Result<&[u8]> {
         let falcon_ucode_offset = self.falcon_ucode_offset;
 
         // The ucode data follows the descriptor.
         let ucode_data_offset = falcon_ucode_offset + desc.size();
-        let size = usize::from_safe_cast(desc.imem_load_size + desc.dmem_load_size);
+        let size = usize::from_safe_cast(desc.imem_load_size() + desc.dmem_load_size());
 
         // Get the data slice, checking bounds in a single operation.
         self.base
@@ -1066,10 +1073,14 @@ pub(crate) fn ucode(&self, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
     }
 
     /// Get the signatures as a byte slice
-    pub(crate) fn sigs(&self, desc: &FalconUCodeDescV3) -> Result<&[Bcrt30Rsa3kSignature]> {
+    pub(crate) fn sigs(&self, desc: &FalconUCodeDesc) -> Result<&[Bcrt30Rsa3kSignature]> {
+        let hdr_size = match desc {
+            FalconUCodeDesc::V2(_v2) => core::mem::size_of::<FalconUCodeDescV2>(),
+            FalconUCodeDesc::V3(_v3) => core::mem::size_of::<FalconUCodeDescV3>(),
+        };
         // The signatures data follows the descriptor.
-        let sigs_data_offset = self.falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
-        let sigs_count = usize::from(desc.signature_count);
+        let sigs_data_offset = self.falcon_ucode_offset + hdr_size;
+        let sigs_count = usize::from(desc.signature_count());
         let sigs_size = sigs_count * core::mem::size_of::<Bcrt30Rsa3kSignature>();
 
         // Make sure the data is within bounds.
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 11/13] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (9 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 10/13] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-01 23:39 ` [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

GSP-RM insists that the 'size' parameter of the
LibosMemoryRegionInitArgument struct be aligned to 4KB.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index abffd6beec65..71964ee0dec9 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -670,7 +670,7 @@ fn id8(name: &str) -> u64 {
         Self(bindings::LibosMemoryRegionInitArgument {
             id8: id8(name),
             pa: obj.dma_handle(),
-            size: num::usize_as_u64(obj.size()),
+            size: num::usize_as_u64(obj.size().next_multiple_of(GSP_PAGE_SIZE)),
             kind: num::u32_into_u8::<
                 { bindings::LibosMemoryRegionKind_LIBOS_MEMORY_REGION_CONTIGUOUS },
             >(),
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (10 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 11/13] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-02 21:23   ` Joel Fernandes
  2025-12-02 21:28   ` Joel Fernandes
  2025-12-01 23:39 ` [PATCH v2 13/13] [RFC] gpu: nova: implement trait object FalconUCodeDescriptor Timur Tabi
  2025-12-02  2:55 ` [PATCH v2 00/13] gpu: nova-core: add Turing support John Hubbard
  13 siblings, 2 replies; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Turing and GA100 use programmed I/O (PIO) instead of DMA to upload
firmware images into Falcon memory.

A new firmware called the Generic Bootloader (as opposed to the
GSP Bootloader) is used to upload FWSEC.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs         | 149 +++++++++++++++++++++++-
 drivers/gpu/nova-core/firmware.rs       |   4 +-
 drivers/gpu/nova-core/firmware/fwsec.rs | 142 +++++++++++++++++++++-
 drivers/gpu/nova-core/gsp/boot.rs       |  10 +-
 4 files changed, 293 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 2770d608a2cf..88f65ee7805a 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -12,14 +12,17 @@
     io::poll::read_poll_timeout,
     prelude::*,
     sync::aref::ARef,
-    time::{
-        Delta, //
-    },
+    time::Delta,
+    transmute::AsBytes, //
 };
 
 use crate::{
     dma::DmaObject,
     driver::Bar0,
+    firmware::fwsec::{
+        BootloaderDmemDescV2,
+        GenericBootloader, //
+    },
     gpu::Chipset,
     num::{
         FromSafeCast,
@@ -406,6 +409,146 @@ pub(crate) fn reset(&self, bar: &Bar0) -> Result {
         Ok(())
     }
 
+
+    /// See nvkm_falcon_pio_wr - takes a byte array instead of a FalconFirmware
+    fn pio_wr_bytes(
+        &self,
+        bar: &Bar0,
+        img: &[u8],
+        mem_base: u16,
+        target_mem: FalconMem,
+        port: u8,
+        tag: u16
+    ) {
+        let port = usize::from(port);
+
+        match target_mem {
+            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
+                regs::NV_PFALCON_FALCON_IMEMC::default()
+                    .set_secure(target_mem == FalconMem::ImemSecure)
+                    .set_aincw(true)
+                    .set_offs(mem_base)
+                    .write(bar, &E::ID, port);
+
+                let mut tag = tag;
+                for block in img.chunks(256) {
+                    regs::NV_PFALCON_FALCON_IMEMT::default()
+                        .set_tag(tag)
+                        .write(bar, &E::ID, port);
+                    for word in block.chunks(4) {
+                        let w = u32::from_le_bytes(word.try_into().unwrap());
+                        regs::NV_PFALCON_FALCON_IMEMD::default()
+                            .set_data(w)
+                            .write(bar, &E::ID, port);
+                    }
+                    tag += 1;
+                }
+            },
+            FalconMem::Dmem => {
+                regs::NV_PFALCON_FALCON_DMEMC::default()
+                    .set_aincw(true)
+                    .set_offs(mem_base)
+                    .write(bar, &E::ID, port);
+
+                for block in img.chunks(256) {
+                    for word in block.chunks_exact(4) {
+                        regs::NV_PFALCON_FALCON_DMEMD::default()
+                            .set_data(u32::from_le_bytes(word.try_into().unwrap()))
+                            .write(bar, &E::ID, port);
+                    }
+                }
+            },
+        }
+    }
+
+    fn pio_wr<F: FalconFirmware<Target = E>>(
+        &self,
+        bar: &Bar0,
+        fw: &F,
+        target_mem: FalconMem,
+        load_offsets: &FalconLoadTarget,
+        port: u8,
+        tag: u16,
+    ) -> Result {
+        let start = usize::from_safe_cast(load_offsets.src_start);
+        let len = usize::from_safe_cast(load_offsets.len);
+
+        // SAFETY: as_slice() ensures that start+len is within range
+        let data = unsafe { fw.as_slice(start, len)? };
+
+        self.pio_wr_bytes(bar, data, u16::try_from(load_offsets.dst_start)?, target_mem, port, tag);
+
+        Ok(())
+    }
+
+    /// Perform a PIO copy into `IMEM` and `DMEM` of `fw`, and prepare the falcon to run it.
+    pub(crate) fn pio_load<F: FalconFirmware<Target = E>>(
+        &self,
+        bar: &Bar0,
+        fw: &F,
+        gbl: Option<&GenericBootloader>
+    ) -> Result {
+        let imem_sec = fw.imem_sec_load_params();
+        let imem_ns = fw.imem_ns_load_params().unwrap();
+        let dmem = fw.dmem_load_params();
+
+        regs::NV_PFALCON_FBIF_CTL::read(bar, &E::ID)
+            .set_allow_phys_no_ctx(true)
+            .write(bar, &E::ID);
+
+        regs::NV_PFALCON_FALCON_DMACTL::default()
+            .write(bar, &E::ID);
+
+        // If the Generic Bootloader was passed, then use it to boot FRTS
+        if let Some(gbl) =  gbl {
+            let dst_start = u16::try_from(0x10000 - gbl.desc.code_size)?;
+            let data = &gbl.ucode[..usize::from_safe_cast(gbl.desc.code_size)];
+            let tag = u16::try_from(gbl.desc.start_tag)?;
+
+            self.pio_wr_bytes(bar, data, dst_start, FalconMem::ImemNonSecure, 0, tag);
+
+            // This structure tells the generic bootloader where to find the FWSEC
+            // image.
+            let dmem_desc = BootloaderDmemDescV2 {
+                reserved: [0; 4],
+                signature: [0; 4],
+                ctx_dma: 4, // FALCON_DMAIDX_PHYS_SYS_NCOH
+                code_dma_base: fw.dma_handle(),
+                non_sec_code_off: imem_ns.dst_start,
+                non_sec_code_size: imem_ns.len,
+                sec_code_off: imem_sec.dst_start,
+                sec_code_size: imem_sec.len,
+                code_entry_point: 0,
+                data_dma_base: fw.dma_handle() + u64::from(dmem.src_start),
+                data_size: dmem.len,
+                argc: 0,
+                argv: 0,
+            };
+
+            regs::NV_PFALCON_FBIF_TRANSCFG::update(bar, &E::ID, 4, |v| {
+                v.set_target(FalconFbifTarget::CoherentSysmem)
+                    .set_mem_type(FalconFbifMemType::Physical)
+            });
+
+            self.pio_wr_bytes(bar, dmem_desc.as_bytes(), 0, FalconMem::Dmem, 0, 0);
+        } else {
+            self.pio_wr(bar, fw, FalconMem::ImemNonSecure, &imem_ns, 0,
+                u16::try_from(imem_ns.dst_start >> 8)?)?;
+            self.pio_wr(bar, fw, FalconMem::ImemSecure, &imem_sec, 0,
+                u16::try_from(imem_sec.dst_start >> 8)?)?;
+            self.pio_wr(bar, fw, FalconMem::Dmem, &dmem, 0, 0)?;
+        }
+
+        self.hal.program_brom(self, bar, &fw.brom_params())?;
+
+        // Set `BootVec` to start of non-secure code.
+        regs::NV_PFALCON_FALCON_BOOTVEC::default()
+            .set_value(fw.boot_addr())
+            .write(bar, &E::ID);
+
+        Ok(())
+    }
+
     /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
     /// `target_mem`.
     ///
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 169b07ca340a..3008d18f9313 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -31,7 +31,7 @@
 pub(crate) const FIRMWARE_VERSION: &str = "570.144";
 
 /// Requests the GPU firmware `name` suitable for `chipset`, with version `ver`.
-fn request_firmware(
+pub(crate) fn request_firmware(
     dev: &device::Device,
     chipset: gpu::Chipset,
     name: &str,
@@ -258,7 +258,7 @@ fn no_patch_signature(self) -> FirmwareDmaObject<F, Signed> {
 /// Header common to most firmware files.
 #[repr(C)]
 #[derive(Debug, Clone)]
-struct BinHdr {
+pub(crate) struct BinHdr {
     /// Magic number, must be `0x10de`.
     bin_magic: u32,
     /// Version of the header.
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 36ff8ed51c23..159aedd221e8 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -40,12 +40,15 @@
         FalconLoadTarget, //
     },
     firmware::{
+        FIRMWARE_VERSION,
+        BinHdr,
         FalconUCodeDesc,
         FirmwareDmaObject,
         FirmwareSignature,
         Signed,
         Unsigned, //
     },
+    gpu::Chipset,
     num::{
         FromSafeCast,
         IntoSafeCast, //
@@ -213,6 +216,72 @@ unsafe fn transmute_mut<T: Sized + FromBytes + AsBytes>(
     T::from_bytes_mut(unsafe { fw.as_slice_mut(offset, size_of::<T>())? }).ok_or(EINVAL)
 }
 
+/// Descriptor used by RM to figure out the requirements of the boot loader.
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct BootloaderDesc {
+	/// Starting tag of bootloader.
+	pub start_tag: u32,
+	/// DMEM offset where [`BootloaderDmemDescV2`] is to be loaded.
+	pub dmem_load_off: u32,
+	/// Offset of code section in the image.
+	pub code_off: u32,
+	/// Size of code section in the image.
+	pub code_size: u32,
+	/// Offset of data section in the image.
+	pub data_off: u32,
+	/// Size of data section in the image.
+	pub data_size: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for BootloaderDesc {}
+// SAFETY: This struct doesn't contain uninitialized bytes and doesn't have interior mutability.
+unsafe impl AsBytes for BootloaderDesc {}
+
+/// Structure used by the boot-loader to load the rest of the code.
+///
+/// This has to be filled by the GPU driver and copied into DMEM at offset
+/// [`BootloaderDesc.dmem_load_off`].
+#[repr(C, packed)]
+#[derive(Debug, Clone)]
+pub(crate) struct BootloaderDmemDescV2 {
+	/// Reserved, should always be first element.
+	pub reserved: [u32; 4],
+	/// 16B signature for secure code, 0s if no secure code.
+	pub signature: [u32; 4],
+	/// DMA context used by the bootloader while loading code/data.
+	pub ctx_dma: u32,
+	/// 256B-aligned physical FB address where code is located.
+	pub code_dma_base: u64,
+	/// Offset from `code_dma_base` where the non-secure code is located (must be multiple of 256).
+	pub non_sec_code_off: u32,
+	/// Size of the non-secure code part.
+	pub non_sec_code_size: u32,
+	/// Offset from `code_dma_base` where the secure code is located (must be multiple of 256).
+	pub sec_code_off: u32,
+	/// Size of the secure code part.
+	pub sec_code_size: u32,
+	/// Code entry point invoked by the bootloader after code is loaded.
+	pub code_entry_point: u32,
+	/// 256B-aligned physical FB address where data is located.
+	pub data_dma_base: u64,
+	/// Size of data block (should be multiple of 256B).
+	pub data_size: u32,
+	/// Arguments to be passed to the target firmware being loaded.
+	pub argc: u32,
+	/// Number of arguments to be passed to the target firmware being loaded.
+	pub argv: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for BootloaderDmemDescV2 {}
+// SAFETY: This struct doesn't contain uninitialized bytes and doesn't have interior mutability.
+unsafe impl AsBytes for BootloaderDmemDescV2 {}
+
+pub(crate) struct GenericBootloader {
+    pub desc: BootloaderDesc,
+    pub ucode: Vec<u8, kernel::alloc::allocator::Kmalloc>,
+}
+
 /// The FWSEC microcode, extracted from the BIOS and to be run on the GSP falcon.
 ///
 /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
@@ -221,6 +290,8 @@ pub(crate) struct FwsecFirmware {
     desc: FalconUCodeDesc,
     /// GPU-accessible DMA object containing the firmware.
     ucode: FirmwareDmaObject<Self, Signed>,
+    /// Generic bootloader
+    gen_bootloader: Option<GenericBootloader>,
 }
 
 impl FalconLoadParams for FwsecFirmware {
@@ -275,7 +346,19 @@ fn brom_params(&self) -> FalconBromParams {
     }
 
     fn boot_addr(&self) -> u32 {
-        0
+        match &self.desc {
+            FalconUCodeDesc::V2(_v2) => {
+                // On V2 platforms, the boot address is extracted from the
+                // generic bootloader, because the gbl is what actually copies
+                // FWSEC into memory, so that is what needs to be booted.
+                if let Some(ref gbl) = self.gen_bootloader {
+                    gbl.desc.start_tag << 8
+                } else {
+                    0
+                }
+            },
+            FalconUCodeDesc::V3(_v3) => 0,
+        }
     }
 }
 
@@ -376,6 +459,7 @@ impl FwsecFirmware {
     /// command.
     pub(crate) fn new(
         dev: &Device<device::Bound>,
+        chipset: Chipset,
         falcon: &Falcon<Gsp>,
         bar: &Bar0,
         bios: &Vbios,
@@ -432,9 +516,49 @@ pub(crate) fn new(
             ucode_dma.no_patch_signature()
         };
 
+        // The Generic Bootloader exists only on Turing and GA100.  To avoid a bogus
+        // console error message on other platforms, only try to load it if it's
+        // supposed to be there.
+        let gbl_fw = if chipset < Chipset::GA102 {
+            super::request_firmware(dev, chipset, "gen_bootloader", FIRMWARE_VERSION)
+        } else {
+            Err(ENOENT)
+        };
+
+        let gbl = match gbl_fw {
+            Ok(fw) => {
+                let hdr = fw.data()
+                    .get(0..size_of::<BinHdr>())
+                    .and_then(BinHdr::from_bytes_copy)
+                    .ok_or(EINVAL)?;
+
+                let desc_offset = usize::from_safe_cast(hdr.header_offset);
+                let desc = fw.data()
+                    .get(desc_offset..desc_offset + size_of::<BootloaderDesc>())
+                    .and_then(BootloaderDesc::from_bytes_copy)
+                    .ok_or(EINVAL)?;
+
+                let ucode_start = usize::from_safe_cast(hdr.data_offset);
+                let ucode_size = usize::from_safe_cast(hdr.data_size);
+                let ucode_data = fw.data()
+                    .get(ucode_start..ucode_start + ucode_size)
+                    .ok_or(EINVAL)?;
+
+                let mut ucode = KVec::new();
+                ucode.extend_from_slice(ucode_data, GFP_KERNEL)?;
+
+                Some(GenericBootloader {
+                    desc,
+                    ucode,
+                })
+            },
+            Err(_) => None,
+        };
+
         Ok(FwsecFirmware {
-            desc: desc,
+            desc,
             ucode: ucode_signed,
+            gen_bootloader: gbl,
         })
     }
 
@@ -449,9 +573,17 @@ pub(crate) fn run(
         falcon
             .reset(bar)
             .inspect_err(|e| dev_err!(dev, "Failed to reset GSP falcon: {:?}\n", e))?;
-        falcon
-            .dma_load(bar, self)
-            .inspect_err(|e| dev_err!(dev, "Failed to load FWSEC firmware: {:?}\n", e))?;
+
+        // If the Generic Bootloader was found, then upload it via PIO , otherwise
+        if let Some(ref gbl) = self.gen_bootloader {
+            falcon
+                .pio_load(bar, self, Some(gbl))
+                .inspect_err(|e| dev_err!(dev, "Failed to load FWSEC firmware: {:?}\n", e))?;
+        } else {
+            falcon
+                .dma_load(bar, self)
+                .inspect_err(|e| dev_err!(dev, "Failed to load FWSEC firmware: {:?}\n", e))?;
+        }
         let (mbox0, _) = falcon
             .boot(bar, Some(0), None)
             .inspect_err(|e| dev_err!(dev, "Failed to boot FWSEC firmware: {:?}\n", e))?;
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 54937606b5b0..fda01afda9ed 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -48,6 +48,7 @@ impl super::Gsp {
     /// created the WPR2 region.
     fn run_fwsec_frts(
         dev: &device::Device<device::Bound>,
+        chipset: Chipset,
         falcon: &Falcon<Gsp>,
         bar: &Bar0,
         bios: &Vbios,
@@ -65,6 +66,7 @@ fn run_fwsec_frts(
 
         let fwsec_frts = FwsecFirmware::new(
             dev,
+            chipset,
             falcon,
             bar,
             bios,
@@ -147,7 +149,7 @@ pub(crate) fn boot(
         let fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
         dev_dbg!(dev, "{:#x?}\n", fb_layout);
 
-        Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;
+        Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
 
         let booter_loader = BooterFirmware::new(
             dev,
@@ -186,7 +188,11 @@ pub(crate) fn boot(
         );
 
         sec2_falcon.reset(bar)?;
-        sec2_falcon.dma_load(bar, &booter_loader)?;
+        if sec2_falcon.supports_dma() {
+            sec2_falcon.dma_load(bar, &booter_loader)?;
+        } else {
+            sec2_falcon.pio_load(bar, &booter_loader, None)?;
+        }
         let wpr_handle = wpr_meta.dma_handle();
         let (mbox0, mbox1) = sec2_falcon.boot(
             bar,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 13/13] [RFC] gpu: nova: implement trait object FalconUCodeDescriptor
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (11 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
@ 2025-12-01 23:39 ` Timur Tabi
  2025-12-02  3:00   ` John Hubbard
  2025-12-02  2:55 ` [PATCH v2 00/13] gpu: nova-core: add Turing support John Hubbard
  13 siblings, 1 reply; 27+ messages in thread
From: Timur Tabi @ 2025-12-01 23:39 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Lyude Paul, Joel Fernandes,
	John Hubbard, nouveau, rust-for-linux

Implement the trait object FalconUCodeDescriptor to handle the two
versions of the Falcon microcode descriptor.

Introduce the FalconUCodeDescriptor trait to provide a unified interface
for accessing fields in both V2 and V3 Falcon microcode descriptor formats.
This replaces repetitive match statements in each accessor method with a
single as_descriptor() method that returns a trait object, reducing boilerplate
and making it easier to add new accessors in the future.

However, not all match states can be eliminated.  The FalconLoadParams
implementation still needs to match on the version because different fields
of the descriptor are used depending on the version.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 91 ++++++++++++++++++-------------
 1 file changed, 54 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 3008d18f9313..2ad56a387a79 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -125,13 +125,55 @@ pub(crate) enum FalconUCodeDesc {
     V3(FalconUCodeDescV3),
 }
 
+// First define trait
+pub(crate) trait FalconUCodeDescriptor {
+    fn hdr(&self) -> u32;
+    fn imem_load_size(&self) -> u32;
+    fn interface_offset(&self) -> u32;
+    fn dmem_load_size(&self) -> u32;
+    fn pkc_data_offset(&self) -> u32;
+    fn engine_id_mask(&self) -> u16;
+    fn ucode_id(&self) -> u8;
+    fn signature_count(&self) -> u8;
+    fn signature_versions(&self) -> u16;
+}
+
+impl FalconUCodeDescriptor for FalconUCodeDescV2 {
+    fn hdr(&self) -> u32  { self.hdr }
+    fn imem_load_size(&self) -> u32  { self.imem_load_size }
+    fn interface_offset(&self) -> u32  { self.interface_offset }
+    fn dmem_load_size(&self) -> u32 { self.dmem_load_size }
+    fn pkc_data_offset(&self) -> u32 { 0 }
+    fn engine_id_mask(&self) -> u16 { 0 }
+    fn ucode_id(&self) -> u8 { 0 }
+    fn signature_count(&self) -> u8 { 0 }
+    fn signature_versions(&self) -> u16 { 0 }
+}
+
+impl FalconUCodeDescriptor for FalconUCodeDescV3 {
+    fn hdr(&self) -> u32  { self.hdr }
+    fn imem_load_size(&self) -> u32  { self.imem_load_size }
+    fn interface_offset(&self) -> u32  { self.interface_offset }
+    fn dmem_load_size(&self) -> u32 { self.dmem_load_size }
+    fn pkc_data_offset(&self) -> u32 { self.pkc_data_offset }
+    fn engine_id_mask(&self) -> u16 { self.engine_id_mask }
+    fn ucode_id(&self) -> u8 { self.ucode_id }
+    fn signature_count(&self) -> u8 { self.signature_count }
+    fn signature_versions(&self) -> u16 { self.signature_versions }
+}
+
 impl FalconUCodeDesc {
+    // Return trait object, the only match needed.
+    pub(crate) fn as_descriptor(&self) -> &dyn FalconUCodeDescriptor {
+        match self {
+            FalconUCodeDesc::V2(v2) => v2,
+            FalconUCodeDesc::V3(v3) => v3,
+        }
+    }
+
     /// Returns the size in bytes of the header.
     pub(crate) fn size(&self) -> usize {
-        let hdr = match self {
-            FalconUCodeDesc::V2(v2) => v2.hdr,
-            FalconUCodeDesc::V3(v3) => v3.hdr,
-        };
+        let hdr = self.as_descriptor().hdr();
 
         const HDR_SIZE_SHIFT: u32 = 16;
         const HDR_SIZE_MASK: u32 = 0xffff0000;
@@ -139,60 +181,35 @@ pub(crate) fn size(&self) -> usize {
     }
 
     pub(crate) fn imem_load_size(&self) -> u32 {
-        match self {
-            FalconUCodeDesc::V2(v2) => v2.imem_load_size,
-            FalconUCodeDesc::V3(v3) => v3.imem_load_size,
-        }
+        self.as_descriptor().imem_load_size()
     }
 
     pub(crate) fn interface_offset(&self) -> u32 {
-        match self {
-            FalconUCodeDesc::V2(v2) => v2.interface_offset,
-            FalconUCodeDesc::V3(v3) => v3.interface_offset,
-        }
+        self.as_descriptor().interface_offset()
     }
 
-
     pub(crate) fn dmem_load_size(&self) -> u32 {
-        match self {
-            FalconUCodeDesc::V2(v2) => v2.dmem_load_size,
-            FalconUCodeDesc::V3(v3) => v3.dmem_load_size,
-        }
+        self.as_descriptor().dmem_load_size()
     }
 
     pub(crate) fn pkc_data_offset(&self) -> u32 {
-        match self {
-            FalconUCodeDesc::V2(_v2) => 0,
-            FalconUCodeDesc::V3(v3) => v3.pkc_data_offset,
-        }
+        self.as_descriptor().pkc_data_offset()
     }
 
     pub(crate) fn engine_id_mask(&self) -> u16 {
-        match self {
-            FalconUCodeDesc::V2(_v2) => 0,
-            FalconUCodeDesc::V3(v3) => v3.engine_id_mask,
-        }
+        self.as_descriptor().engine_id_mask()
     }
 
     pub(crate) fn ucode_id(&self) -> u8 {
-        match self {
-            FalconUCodeDesc::V2(_v2) => 0,
-            FalconUCodeDesc::V3(v3) => v3.ucode_id,
-        }
+        self.as_descriptor().ucode_id()
     }
 
     pub(crate) fn signature_count(&self) -> u8 {
-        match self {
-            FalconUCodeDesc::V2(_v2) => 0,
-            FalconUCodeDesc::V3(v3) => v3.signature_count,
-        }
+        self.as_descriptor().signature_count()
     }
 
     pub(crate) fn signature_versions(&self) -> u16 {
-        match self {
-            FalconUCodeDesc::V2(_v2) => 0,
-            FalconUCodeDesc::V3(v3) => v3.signature_versions,
-        }
+        self.as_descriptor().signature_versions()
     }
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] gpu: nova-core: add Turing support
  2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
                   ` (12 preceding siblings ...)
  2025-12-01 23:39 ` [PATCH v2 13/13] [RFC] gpu: nova: implement trait object FalconUCodeDescriptor Timur Tabi
@ 2025-12-02  2:55 ` John Hubbard
  13 siblings, 0 replies; 27+ messages in thread
From: John Hubbard @ 2025-12-02  2:55 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Lyude Paul,
	Joel Fernandes, nouveau, rust-for-linux

On 12/1/25 3:39 PM, Timur Tabi wrote:
> This patch set adds basic support for pre-booting GSP-RM
> on Turing.
> 
> There is also partial support for GA100, but it's currently not
> fully implemented.  GA100 is considered experimental in Nouveau,
> and so it hasn't been tested with NovaCore either.
> 
> That latest linux-firmware.git is required because it contains the
> Generic Bootloader image that has not yet been propogated to
> distros.
> 

From an initial skim read of the patches, this is looking pretty clean.
I'll try to do a full review in the next day or two


thanks,
-- 
John Hubbard

> Summary of changes:
> 
> 1. Introduce non-secure IMEM support.  For GA102 and later, only secure IMEM
> is used.
> 2. Because of non-secure IMEM, Turing booter firmware images need some of
> the headers parsed differently for stuff like the load target address.
> 3. Add support the tu10x firmware signature section in the ELF image.
> 4. Add several new registers used only on Turing.
> 5. Some functions that were considered generic Falcon operations are
> actually different on Turing vs GA102+, so they are moved to the HAL.
> 6. The FRTS FWSEC firmware in VBIOS uses a different version of the
> descriptor header.
> 7. I don't know why this isn't necessary on GA102+, but GSP-RM
> LIBOS args struct needs to have its 'size' field aligned to 4KB.
> 8. Turing Falcons do not support DMA, so PIO is used to copy images
> into IMEM/DMEM.
> 
> Changes from v1:
> 1. Replaced pointer/length with slice in PIO code.
> 2. Added RFC patch to implement trait object FalconUCodeDescriptor .
> 3. Added comments to new registers, structs, and other places.
> 4. Fixed all CLIPPY complaints.
> 5. Added supports_dma() method for Falcon HAL
> 6. Renamed ImemSec and ImemNs to ImemSecure and ImemNonSecure
> 7. Several other miscellaneous fixes based on review comments.
> 
> Timur Tabi (13):
>   gpu: nova-core: rename Imem to ImemSecure
>   gpu: nova-core: add ImemNonSecure section infrastructure
>   gpu: nova-core: support header parsing on Turing/GA100
>   gpu: nova-core: add support for Turing/GA100 fwsignature
>   gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem()
>   gpu: nova-core: add Turing boot registers
>   gpu: nova-core: move some functions into the HAL
>   gpu: nova-core: Add basic Turing HAL
>   gpu: nova-core: add Falcon HAL method supports_dma()
>   gpu: nova-core: add FalconUCodeDescV2 support
>   gpu: nova-core: align LibosMemoryRegionInitArgument size to page size
>   gpu: nova-core: add PIO support for loading firmware images
>   [RFC] gpu: nova: implement trait object FalconUCodeDescriptor
> 
>  drivers/gpu/nova-core/falcon.rs           | 226 +++++++++++++++++-----
>  drivers/gpu/nova-core/falcon/hal.rs       |  19 +-
>  drivers/gpu/nova-core/falcon/hal/ga102.rs |  47 +++++
>  drivers/gpu/nova-core/falcon/hal/tu102.rs |  78 ++++++++
>  drivers/gpu/nova-core/firmware.rs         | 135 ++++++++++++-
>  drivers/gpu/nova-core/firmware/booter.rs  |  46 ++++-
>  drivers/gpu/nova-core/firmware/fwsec.rs   | 215 +++++++++++++++++---
>  drivers/gpu/nova-core/firmware/gsp.rs     |   9 +-
>  drivers/gpu/nova-core/gsp/boot.rs         |  10 +-
>  drivers/gpu/nova-core/gsp/fw.rs           |   2 +-
>  drivers/gpu/nova-core/regs.rs             |  73 +++++++
>  drivers/gpu/nova-core/vbios.rs            |  73 ++++---
>  12 files changed, 805 insertions(+), 128 deletions(-)
>  create mode 100644 drivers/gpu/nova-core/falcon/hal/tu102.rs
> 
> 
> base-commit: 57dc2ea0b7bdb828c5d966d9135c28fe854933a4
> prerequisite-patch-id: fcf54aca59a74f7ca677919565b427d18406462c



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 13/13] [RFC] gpu: nova: implement trait object FalconUCodeDescriptor
  2025-12-01 23:39 ` [PATCH v2 13/13] [RFC] gpu: nova: implement trait object FalconUCodeDescriptor Timur Tabi
@ 2025-12-02  3:00   ` John Hubbard
  0 siblings, 0 replies; 27+ messages in thread
From: John Hubbard @ 2025-12-02  3:00 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Lyude Paul,
	Joel Fernandes, nouveau, rust-for-linux

On 12/1/25 3:39 PM, Timur Tabi wrote:
> Implement the trait object FalconUCodeDescriptor to handle the two
> versions of the Falcon microcode descriptor.
> 
> Introduce the FalconUCodeDescriptor trait to provide a unified interface
> for accessing fields in both V2 and V3 Falcon microcode descriptor formats.
> This replaces repetitive match statements in each accessor method with a
> single as_descriptor() method that returns a trait object, reducing boilerplate
> and making it easier to add new accessors in the future.
> 
> However, not all match states can be eliminated.  The FalconLoadParams
> implementation still needs to match on the version because different fields
> of the descriptor are used depending on the version.
> 
> Signed-off-by: Timur Tabi <ttabi@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware.rs | 91 ++++++++++++++++++-------------
>  1 file changed, 54 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
> index 3008d18f9313..2ad56a387a79 100644
> --- a/drivers/gpu/nova-core/firmware.rs
> +++ b/drivers/gpu/nova-core/firmware.rs
> @@ -125,13 +125,55 @@ pub(crate) enum FalconUCodeDesc {
>      V3(FalconUCodeDescV3),
>  }
>  
> +// First define trait
> +pub(crate) trait FalconUCodeDescriptor {
> +    fn hdr(&self) -> u32;
> +    fn imem_load_size(&self) -> u32;
> +    fn interface_offset(&self) -> u32;
> +    fn dmem_load_size(&self) -> u32;
> +    fn pkc_data_offset(&self) -> u32;
> +    fn engine_id_mask(&self) -> u16;
> +    fn ucode_id(&self) -> u8;
> +    fn signature_count(&self) -> u8;
> +    fn signature_versions(&self) -> u16;
> +}
> +
> +impl FalconUCodeDescriptor for FalconUCodeDescV2 {
> +    fn hdr(&self) -> u32  { self.hdr }
> +    fn imem_load_size(&self) -> u32  { self.imem_load_size }
> +    fn interface_offset(&self) -> u32  { self.interface_offset }
> +    fn dmem_load_size(&self) -> u32 { self.dmem_load_size }
> +    fn pkc_data_offset(&self) -> u32 { 0 }
> +    fn engine_id_mask(&self) -> u16 { 0 }
> +    fn ucode_id(&self) -> u8 { 0 }
> +    fn signature_count(&self) -> u8 { 0 }
> +    fn signature_versions(&self) -> u16 { 0 }
> +}
> +
> +impl FalconUCodeDescriptor for FalconUCodeDescV3 {
> +    fn hdr(&self) -> u32  { self.hdr }
> +    fn imem_load_size(&self) -> u32  { self.imem_load_size }
> +    fn interface_offset(&self) -> u32  { self.interface_offset }
> +    fn dmem_load_size(&self) -> u32 { self.dmem_load_size }
> +    fn pkc_data_offset(&self) -> u32 { self.pkc_data_offset }
> +    fn engine_id_mask(&self) -> u16 { self.engine_id_mask }
> +    fn ucode_id(&self) -> u8 { self.ucode_id }
> +    fn signature_count(&self) -> u8 { self.signature_count }
> +    fn signature_versions(&self) -> u16 { self.signature_versions }
> +}

Being able to see the differences between v2 and v3 in just 20 lines
of code is surprisingly helpful. I like this much more than I expected
to like it. :)

> +
>  impl FalconUCodeDesc {
> +    // Return trait object, the only match needed.
> +    pub(crate) fn as_descriptor(&self) -> &dyn FalconUCodeDescriptor {
> +        match self {
> +            FalconUCodeDesc::V2(v2) => v2,
> +            FalconUCodeDesc::V3(v3) => v3,
> +        }
> +    }
> +
>      /// Returns the size in bytes of the header.
>      pub(crate) fn size(&self) -> usize {
> -        let hdr = match self {
> -            FalconUCodeDesc::V2(v2) => v2.hdr,
> -            FalconUCodeDesc::V3(v3) => v3.hdr,
> -        };
> +        let hdr = self.as_descriptor().hdr();
>  
>          const HDR_SIZE_SHIFT: u32 = 16;
>          const HDR_SIZE_MASK: u32 = 0xffff0000;
> @@ -139,60 +181,35 @@ pub(crate) fn size(&self) -> usize {
>      }
>  
>      pub(crate) fn imem_load_size(&self) -> u32 {
> -        match self {
> -            FalconUCodeDesc::V2(v2) => v2.imem_load_size,
> -            FalconUCodeDesc::V3(v3) => v3.imem_load_size,
> -        }
> +        self.as_descriptor().imem_load_size()
>      }
>  
>      pub(crate) fn interface_offset(&self) -> u32 {
> -        match self {
> -            FalconUCodeDesc::V2(v2) => v2.interface_offset,
> -            FalconUCodeDesc::V3(v3) => v3.interface_offset,
> -        }
> +        self.as_descriptor().interface_offset()
>      }
>  
> -
>      pub(crate) fn dmem_load_size(&self) -> u32 {
> -        match self {
> -            FalconUCodeDesc::V2(v2) => v2.dmem_load_size,
> -            FalconUCodeDesc::V3(v3) => v3.dmem_load_size,
> -        }
> +        self.as_descriptor().dmem_load_size()
>      }
>  
>      pub(crate) fn pkc_data_offset(&self) -> u32 {
> -        match self {
> -            FalconUCodeDesc::V2(_v2) => 0,
> -            FalconUCodeDesc::V3(v3) => v3.pkc_data_offset,
> -        }
> +        self.as_descriptor().pkc_data_offset()
>      }
>  
>      pub(crate) fn engine_id_mask(&self) -> u16 {
> -        match self {
> -            FalconUCodeDesc::V2(_v2) => 0,
> -            FalconUCodeDesc::V3(v3) => v3.engine_id_mask,
> -        }
> +        self.as_descriptor().engine_id_mask()
>      }
>  
>      pub(crate) fn ucode_id(&self) -> u8 {
> -        match self {
> -            FalconUCodeDesc::V2(_v2) => 0,
> -            FalconUCodeDesc::V3(v3) => v3.ucode_id,
> -        }
> +        self.as_descriptor().ucode_id()
>      }
>  
>      pub(crate) fn signature_count(&self) -> u8 {
> -        match self {
> -            FalconUCodeDesc::V2(_v2) => 0,
> -            FalconUCodeDesc::V3(v3) => v3.signature_count,
> -        }
> +        self.as_descriptor().signature_count()
>      }
>  
>      pub(crate) fn signature_versions(&self) -> u16 {
> -        match self {
> -            FalconUCodeDesc::V2(_v2) => 0,
> -            FalconUCodeDesc::V3(v3) => v3.signature_versions,
> -        }
> +        self.as_descriptor().signature_versions()
>      }
>  }
>  

Overall, I'm really preferring this end result. The differences
between v2 and v3 are concisely encapsulated, and the remaining
code is clear and less error prone.


thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-01 23:39 ` [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
@ 2025-12-02 21:23   ` Joel Fernandes
  2025-12-02 22:51     ` Timur Tabi
  2025-12-02 21:28   ` Joel Fernandes
  1 sibling, 1 reply; 27+ messages in thread
From: Joel Fernandes @ 2025-12-02 21:23 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Lyude Paul,
	John Hubbard, nouveau, rust-for-linux



On 12/1/2025 6:39 PM, Timur Tabi wrote:
>  
> +
> +    /// See nvkm_falcon_pio_wr - takes a byte array instead of a FalconFirmware
> +    fn pio_wr_bytes(
> +        &self,
> +        bar: &Bar0,
> +        img: &[u8],
> +        mem_base: u16,
> +        target_mem: FalconMem,
> +        port: u8,
> +        tag: u16
> +    ) {
> +        let port = usize::from(port);
> +
> +        match target_mem {
> +            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
> +                regs::NV_PFALCON_FALCON_IMEMC::default()
> +                    .set_secure(target_mem == FalconMem::ImemSecure)
> +                    .set_aincw(true)
> +                    .set_offs(mem_base)
> +                    .write(bar, &E::ID, port);
> +
> +                let mut tag = tag;
> +                for block in img.chunks(256) {
> +                    regs::NV_PFALCON_FALCON_IMEMT::default()
> +                        .set_tag(tag)
> +                        .write(bar, &E::ID, port);
> +                    for word in block.chunks(4) {
> +                        let w = u32::from_le_bytes(word.try_into().unwrap());

If img.size is not a multiple of 4 bytes, this can panic right?

Even if it is unlikely, unwrap() is quite frowned up due to possibility of
panic. I'd recommend something like the following since the function cannot
return an error:

                        let w = if let Ok(bytes) = word.try_into() {
                            u32::from_le_bytes(bytes)
                        } else {
			    // can print a warning here too if needed.
                            let mut buf = [0u8; 4];
                            buf[..word.len()].copy_from_slice(word);
                            u32::from_le_bytes(buf)
                        };

Btw, I wish we could encode the slice length constraint in the slice type itself
(i.e., the slice length ought to be a certain multiple). But I think there's no
way to do that without introducing a new type.

Thanks.






^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-01 23:39 ` [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
  2025-12-02 21:23   ` Joel Fernandes
@ 2025-12-02 21:28   ` Joel Fernandes
  1 sibling, 0 replies; 27+ messages in thread
From: Joel Fernandes @ 2025-12-02 21:28 UTC (permalink / raw)
  To: Timur Tabi, Danilo Krummrich, Alexandre Courbot, Lyude Paul,
	John Hubbard, nouveau, rust-for-linux



On 12/1/2025 6:39 PM, Timur Tabi wrote:
> +
> +    /// Perform a PIO copy into `IMEM` and `DMEM` of `fw`, and prepare the falcon to run it.
> +    pub(crate) fn pio_load<F: FalconFirmware<Target = E>>(
> +        &self,
> +        bar: &Bar0,
> +        fw: &F,
> +        gbl: Option<&GenericBootloader>
> +    ) -> Result {
> +        let imem_sec = fw.imem_sec_load_params();
> +        let imem_ns = fw.imem_ns_load_params().unwrap();

Same unwrap() is here too.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-02 21:23   ` Joel Fernandes
@ 2025-12-02 22:51     ` Timur Tabi
  2025-12-02 23:20       ` Joel Fernandes
  0 siblings, 1 reply; 27+ messages in thread
From: Timur Tabi @ 2025-12-02 22:51 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Joel Fernandes, dakr@kernel.org,
	lyude@redhat.com, Alexandre Courbot, John Hubbard,
	rust-for-linux@vger.kernel.org

On Tue, 2025-12-02 at 16:23 -0500, Joel Fernandes wrote:
> 
> 
> On 12/1/2025 6:39 PM, Timur Tabi wrote:
> >  
> > +
> > +    /// See nvkm_falcon_pio_wr - takes a byte array instead of a FalconFirmware
> > +    fn pio_wr_bytes(
> > +        &self,
> > +        bar: &Bar0,
> > +        img: &[u8],
> > +        mem_base: u16,
> > +        target_mem: FalconMem,
> > +        port: u8,
> > +        tag: u16
> > +    ) {
> > +        let port = usize::from(port);
> > +
> > +        match target_mem {
> > +            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
> > +                regs::NV_PFALCON_FALCON_IMEMC::default()
> > +                    .set_secure(target_mem == FalconMem::ImemSecure)
> > +                    .set_aincw(true)
> > +                    .set_offs(mem_base)
> > +                    .write(bar, &E::ID, port);
> > +
> > +                let mut tag = tag;
> > +                for block in img.chunks(256) {
> > +                    regs::NV_PFALCON_FALCON_IMEMT::default()
> > +                        .set_tag(tag)
> > +                        .write(bar, &E::ID, port);
> > +                    for word in block.chunks(4) {
> > +                        let w = u32::from_le_bytes(word.try_into().unwrap());
> 
> If img.size is not a multiple of 4 bytes, this can panic right?

I think so.  I just noticed that I used chunks(4) here and chunks_exact(4) in the Dmem loop below. 
I need to make it consistent.

chunks(4) will return &[u8; 3] if the buffer is shy one byte.  chunks_exact(4) will simply skip the
last 3 bytes.

The problem is that it is an error for these images to not be a multiple of 4.  Such an image is
just not valid.

So it's a lot simpler to just reject these misaligned images.  The previous version of this function
did return a Result, maybe I should put that back.  It just seems wasteful to test for misalignment
on every pass of the loop.

What we really need is for from_le_bytes() to be less picky about the slice size.  If I give it
&[u8; 3], then it should be able to handle that.

> Even if it is unlikely, unwrap() is quite frowned up due to possibility of
> panic. I'd recommend something like the following since the function cannot
> return an error:
> 
>                         let w = if let Ok(bytes) = word.try_into() {
>                             u32::from_le_bytes(bytes)
>                         } else {
> 			    // can print a warning here too if needed.
>                             let mut buf = [0u8; 4];
>                             buf[..word.len()].copy_from_slice(word);
>                             u32::from_le_bytes(buf)
>                         };

Wouldn't it be simpler to use chunks_exact() and then remainder()?  That way, we wouldn't need a
test inside the loop?

> Btw, I wish we could encode the slice length constraint in the slice type itself
> (i.e., the slice length ought to be a certain multiple). But I think there's no
> way to do that without introducing a new type.

Wouldn't it be a run-time constraint anyway? With the exception of the BootloaderDmemDescV2 write, 
all of the calls to pio_wr_bytes() have lengths only known at runtime.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-02 22:51     ` Timur Tabi
@ 2025-12-02 23:20       ` Joel Fernandes
  2025-12-02 23:40         ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: Joel Fernandes @ 2025-12-02 23:20 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, dakr@kernel.org,
	lyude@redhat.com, Alexandre Courbot, John Hubbard,
	rust-for-linux@vger.kernel.org



On 12/2/2025 5:51 PM, Timur Tabi wrote:
> On Tue, 2025-12-02 at 16:23 -0500, Joel Fernandes wrote:
>>
>>
>> On 12/1/2025 6:39 PM, Timur Tabi wrote:
>>>  
>>> +
>>> +    /// See nvkm_falcon_pio_wr - takes a byte array instead of a FalconFirmware
>>> +    fn pio_wr_bytes(
>>> +        &self,
>>> +        bar: &Bar0,
>>> +        img: &[u8],
>>> +        mem_base: u16,
>>> +        target_mem: FalconMem,
>>> +        port: u8,
>>> +        tag: u16
>>> +    ) {
>>> +        let port = usize::from(port);
>>> +
>>> +        match target_mem {
>>> +            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
>>> +                regs::NV_PFALCON_FALCON_IMEMC::default()
>>> +                    .set_secure(target_mem == FalconMem::ImemSecure)
>>> +                    .set_aincw(true)
>>> +                    .set_offs(mem_base)
>>> +                    .write(bar, &E::ID, port);
>>> +
>>> +                let mut tag = tag;
>>> +                for block in img.chunks(256) {
>>> +                    regs::NV_PFALCON_FALCON_IMEMT::default()
>>> +                        .set_tag(tag)
>>> +                        .write(bar, &E::ID, port);
>>> +                    for word in block.chunks(4) {
>>> +                        let w = u32::from_le_bytes(word.try_into().unwrap());
>>
>> If img.size is not a multiple of 4 bytes, this can panic right?
> 
> I think so.  I just noticed that I used chunks(4) here and chunks_exact(4) in the Dmem loop below. 
> I need to make it consistent.
> 
> chunks(4) will return &[u8; 3] if the buffer is shy one byte.  chunks_exact(4) will simply skip the
> last 3 bytes.
> 
> The problem is that it is an error for these images to not be a multiple of 4.  Such an image is
> just not valid.
> 
> So it's a lot simpler to just reject these misaligned images.  The previous version of this function
> did return a Result, maybe I should put that back.  It just seems wasteful to test for misalignment
> on every pass of the loop.
> 
> What we really need is for from_le_bytes() to be less picky about the slice size.  If I give it
> &[u8; 3], then it should be able to handle that.

Yeah, but I guess whether to pad with zero's or error out when the slice is not
a multiple of 4 is questionable.

> 
>> Even if it is unlikely, unwrap() is quite frowned up due to possibility of
>> panic. I'd recommend something like the following since the function cannot
>> return an error:
>>
>>                         let w = if let Ok(bytes) = word.try_into() {
>>                             u32::from_le_bytes(bytes)
>>                         } else {
>> 			    // can print a warning here too if needed.
>>                             let mut buf = [0u8; 4];
>>                             buf[..word.len()].copy_from_slice(word);
>>                             u32::from_le_bytes(buf)
>>                         };
> 
> Wouldn't it be simpler to use chunks_exact() and then remainder()?  That way, we wouldn't need a
> test inside the loop?

Sure.

> 
>> Btw, I wish we could encode the slice length constraint in the slice type itself
>> (i.e., the slice length ought to be a certain multiple). But I think there's no
>> way to do that without introducing a new type.
> 
> Wouldn't it be a run-time constraint anyway? With the exception of the BootloaderDmemDescV2 write, 
> all of the calls to pio_wr_bytes() have lengths only known at runtime.

I am not sure but I think rust code is expected to not panic and handle
situations gracefully even in the face of runtime constraints being violated,
you could argue that the image length being violated is UB but I don't think
that'd be enough to justify the unwrap(). But perhaps someone from the rust core
team can chime in about that because I also have that question. Can a "FW image
corruption" type of scenarios be considered something that safe rust code not
need to worry about since it falls under the UB umbrella (similar to memory
corruption)?

Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-02 23:20       ` Joel Fernandes
@ 2025-12-02 23:40         ` John Hubbard
  2025-12-02 23:48           ` Timur Tabi
  0 siblings, 1 reply; 27+ messages in thread
From: John Hubbard @ 2025-12-02 23:40 UTC (permalink / raw)
  To: Joel Fernandes, Timur Tabi, nouveau@lists.freedesktop.org,
	dakr@kernel.org, lyude@redhat.com, Alexandre Courbot,
	rust-for-linux@vger.kernel.org

On 12/2/25 3:20 PM, Joel Fernandes wrote:
> On 12/2/2025 5:51 PM, Timur Tabi wrote:
>> On Tue, 2025-12-02 at 16:23 -0500, Joel Fernandes wrote:
...
>>> If img.size is not a multiple of 4 bytes, this can panic right?

Rust for Linux avoids .unwrap() for similar reasons that we prefer WARN*()
over BUG*() these days, on the C side: avoid killing the machine if at
all possible. Because it changes a routine bug into a harder-to-work-with
bug.

...
>> Wouldn't it be a run-time constraint anyway? With the exception of the BootloaderDmemDescV2 write, 
>> all of the calls to pio_wr_bytes() have lengths only known at runtime.
> 
> I am not sure but I think rust code is expected to not panic and handle
> situations gracefully even in the face of runtime constraints being violated,
> you could argue that the image length being violated is UB but I don't think
> that'd be enough to justify the unwrap(). But perhaps someone from the rust core

Agreed. This situation should return an -EINVAL Result, approximately.

In fact, I just finished looking through my Hopper/Blackwell PIO code, which
also needs 4-byte alignment, and concluded that returning -EINVAL for misaligned
data seems to be the appropriate way to handle things.

> team can chime in about that because I also have that question. Can a "FW image
> corruption" type of scenarios be considered something that safe rust code not
> need to worry about since it falls under the UB umbrella (similar to memory
> corruption)?
> 

I'm not the core Rust team, but I will chime in anyway: misaligned or
corrupted firmware should not *directly* cause a panic. We should detect
and error out.

thanks,
-- 
John Hubbard

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-02 23:40         ` John Hubbard
@ 2025-12-02 23:48           ` Timur Tabi
  2025-12-03  0:35             ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: Timur Tabi @ 2025-12-02 23:48 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Joel Fernandes, dakr@kernel.org,
	lyude@redhat.com, Alexandre Courbot, John Hubbard,
	rust-for-linux@vger.kernel.org

On Tue, 2025-12-02 at 15:40 -0800, John Hubbard wrote:
> In fact, I just finished looking through my Hopper/Blackwell PIO code, which
> also needs 4-byte alignment, and concluded that returning -EINVAL for misaligned
> data seems to be the appropriate way to handle things.

I've added this for v3:

        // Rejecting misaligned images here allows us to avoid checking
        // inside the loops.
        if img.len() % 4 != 0 {
            return Err(EINVAL);
        }

And I manually create the &[u8; 4] now:

            for word in block.chunks_exact(4) {
                let w = [word[0], word[1], word[2], word[3]];
                regs::NV_PFALCON_FALCON_IMEMD::default()
                    .set_data(u32::from_le_bytes(w))
                    .write(bar, &E::ID, port);

word[3] will always exist because of chunks_exact(4).


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-02 23:48           ` Timur Tabi
@ 2025-12-03  0:35             ` John Hubbard
  2025-12-03  0:42               ` Timur Tabi
  0 siblings, 1 reply; 27+ messages in thread
From: John Hubbard @ 2025-12-03  0:35 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Joel Fernandes,
	dakr@kernel.org, lyude@redhat.com, Alexandre Courbot,
	rust-for-linux@vger.kernel.org

On 12/2/25 3:48 PM, Timur Tabi wrote:
> On Tue, 2025-12-02 at 15:40 -0800, John Hubbard wrote:
>> In fact, I just finished looking through my Hopper/Blackwell PIO code, which
>> also needs 4-byte alignment, and concluded that returning -EINVAL for misaligned
>> data seems to be the appropriate way to handle things.
> 
> I've added this for v3:
> 
>         // Rejecting misaligned images here allows us to avoid checking
>         // inside the loops.
>         if img.len() % 4 != 0 {
>             return Err(EINVAL);
>         }

Looks good.

> 
> And I manually create the &[u8; 4] now:
> 
>             for word in block.chunks_exact(4) {
>                 let w = [word[0], word[1], word[2], word[3]];

Yes, this is probably the best way. Although...

>                 regs::NV_PFALCON_FALCON_IMEMD::default()
>                     .set_data(u32::from_le_bytes(w))
>                     .write(bar, &E::ID, port);
> 
> word[3] will always exist because of chunks_exact(4).
> 

Interesting, I was just looking at this, and the 4-byte manual
construction bothered me a little ("why must I do this?"), so I'm
currently wondering if "// PANIC..." plus an "infallible" .unwrap()
is reasonable, for example:

impl Falcon<Fsp> {
...
    pub(crate) fn write_emem(&self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
        if offset % 4 != 0 || data.len() % 4 != 0 {
            return Err(EINVAL);
        }
...
        for chunk in data.chunks_exact(4) {
            // PANIC: `chunks_exact(4)` guarantees each chunk is exactly 4 bytes.
            let word = u32::from_le_bytes(chunk.try_into().unwrap());
            regs::NV_PFALCON_FALCON_EMEM_DATA::default()
                .set_data(word)
                .write(bar, &Fsp::ID);
        }

...but actually, I think your way is better, because you don't have
just justify an .unwrap().

What do you think?

I figured you'd enjoy this, coming as it does just one email after I
wrote "never .unwrap()". haha :)

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-03  0:35             ` John Hubbard
@ 2025-12-03  0:42               ` Timur Tabi
  2025-12-03  0:45                 ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: Timur Tabi @ 2025-12-03  0:42 UTC (permalink / raw)
  To: nouveau@lists.freedesktop.org, Joel Fernandes, dakr@kernel.org,
	lyude@redhat.com, Alexandre Courbot, John Hubbard,
	rust-for-linux@vger.kernel.org

On Tue, 2025-12-02 at 16:35 -0800, John Hubbard wrote:
>         for chunk in data.chunks_exact(4) {
>             // PANIC: `chunks_exact(4)` guarantees each chunk is exactly 4 bytes.
>             let word = u32::from_le_bytes(chunk.try_into().unwrap());
>             regs::NV_PFALCON_FALCON_EMEM_DATA::default()
>                 .set_data(word)
>                 .write(bar, &Fsp::ID);
>         }
> 
> ...but actually, I think your way is better, because you don't have
> just justify an .unwrap().
> 
> What do you think?

I agree.  Avoiding unwrap(), even with a comment, is always a good idea.

> I figured you'd enjoy this, coming as it does just one email after I
> wrote "never .unwrap()". haha :)

I think your code is effectively identical to mine, except that I don't need the PANIC comment.  I
suspect that in both cases, the compiler cannot tell that each chunk is always 4 bytes and
try_into() will never panic.  In my case, word[3] always exists and will never panic either.  So I'm
guess that the compiler will still emit code to check for panic.  I don't know.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-03  0:42               ` Timur Tabi
@ 2025-12-03  0:45                 ` John Hubbard
  2025-12-03  2:14                   ` Joel Fernandes
  0 siblings, 1 reply; 27+ messages in thread
From: John Hubbard @ 2025-12-03  0:45 UTC (permalink / raw)
  To: Timur Tabi, nouveau@lists.freedesktop.org, Joel Fernandes,
	dakr@kernel.org, lyude@redhat.com, Alexandre Courbot,
	rust-for-linux@vger.kernel.org

On 12/2/25 4:42 PM, Timur Tabi wrote:
> On Tue, 2025-12-02 at 16:35 -0800, John Hubbard wrote:
>>         for chunk in data.chunks_exact(4) {
>>             // PANIC: `chunks_exact(4)` guarantees each chunk is exactly 4 bytes.
>>             let word = u32::from_le_bytes(chunk.try_into().unwrap());
>>             regs::NV_PFALCON_FALCON_EMEM_DATA::default()
>>                 .set_data(word)
>>                 .write(bar, &Fsp::ID);
>>         }
>>
>> ...but actually, I think your way is better, because you don't have
>> just justify an .unwrap().
>>
>> What do you think?
> 
> I agree.  Avoiding unwrap(), even with a comment, is always a good idea.
> 
>> I figured you'd enjoy this, coming as it does just one email after I
>> wrote "never .unwrap()". haha :)
> 
> I think your code is effectively identical to mine, except that I don't need the PANIC comment.  I

Yes. I'm changing my code over to the non-unwrap approach now. That
really is clearly better.


> suspect that in both cases, the compiler cannot tell that each chunk is always 4 bytes and
> try_into() will never panic.  In my case, word[3] always exists and will never panic either.  So I'm
> guess that the compiler will still emit code to check for panic.  I don't know.
> 

Good question. Maybe someone with Rust experience can enlighten us on
that one.

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-03  0:45                 ` John Hubbard
@ 2025-12-03  2:14                   ` Joel Fernandes
  2025-12-03  2:21                     ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: Joel Fernandes @ 2025-12-03  2:14 UTC (permalink / raw)
  To: John Hubbard
  Cc: Timur Tabi, nouveau@lists.freedesktop.org, dakr@kernel.org,
	lyude@redhat.com, Alexandre Courbot,
	rust-for-linux@vger.kernel.org



> On Dec 2, 2025, at 7:46 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 12/2/25 4:42 PM, Timur Tabi wrote:
>>> On Tue, 2025-12-02 at 16:35 -0800, John Hubbard wrote:
>>>         for chunk in data.chunks_exact(4) {
>>>             // PANIC: `chunks_exact(4)` guarantees each chunk is exactly 4 bytes.
>>>             let word = u32::from_le_bytes(chunk.try_into().unwrap());
>>>             regs::NV_PFALCON_FALCON_EMEM_DATA::default()
>>>                 .set_data(word)
>>>                 .write(bar, &Fsp::ID);
>>>         }
>>> 
>>> ...but actually, I think your way is better, because you don't have
>>> just justify an .unwrap().
>>> 
>>> What do you think?
>> 
>> I agree.  Avoiding unwrap(), even with a comment, is always a good idea.
>> 
>>> I figured you'd enjoy this, coming as it does just one email after I
>>> wrote "never .unwrap()". haha :)
>> 
>> I think your code is effectively identical to mine, except that I don't need the PANIC comment.  I
> 
> Yes. I'm changing my code over to the non-unwrap approach now. That
> really is clearly better.
> 
> 
>> suspect that in both cases, the compiler cannot tell that each chunk is always 4 bytes and
>> try_into() will never panic.  In my case, word[3] always exists and will never panic either.  So I'm
>> guess that the compiler will still emit code to check for panic.  I don't know.
>> 
> 
> Good question. Maybe someone with Rust experience can enlighten us on
> that one.

If the optimizer could not eliminate the dead code, IMO that is a bug, pretty much like how we rely on dead code elimination optimization for build_assert.

It is worth compiling and checking but I am almost certain the dead code for an unreachable case (in this case, panicking, will be eliminated).

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images
  2025-12-03  2:14                   ` Joel Fernandes
@ 2025-12-03  2:21                     ` John Hubbard
  0 siblings, 0 replies; 27+ messages in thread
From: John Hubbard @ 2025-12-03  2:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Timur Tabi, nouveau@lists.freedesktop.org, dakr@kernel.org,
	lyude@redhat.com, Alexandre Courbot,
	rust-for-linux@vger.kernel.org

On 12/2/25 6:14 PM, Joel Fernandes wrote:
>> On Dec 2, 2025, at 7:46 PM, John Hubbard <jhubbard@nvidia.com> wrote:
...
>>> suspect that in both cases, the compiler cannot tell that each chunk is always 4 bytes and
>>> try_into() will never panic.  In my case, word[3] always exists and will never panic either.  So I'm
>>> guess that the compiler will still emit code to check for panic.  I don't know.
>>>
>>
>> Good question. Maybe someone with Rust experience can enlighten us on
>> that one.
> 
> If the optimizer could not eliminate the dead code, IMO that is a bug, pretty much like how we rely on dead code elimination optimization for build_assert.
> 
> It is worth compiling and checking but I am almost certain the dead code for an unreachable case (in this case, panicking, will be eliminated).
> 

Well yes, but the question remains: is the compiler able to even identify
this as dead code? Because it depends on the early -EINVAL return, in
order to be correct. Is rustc able to figure out that kind of situation?

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-12-03  2:22 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-01 23:39 [PATCH v2 00/13] gpu: nova-core: add Turing support Timur Tabi
2025-12-01 23:39 ` [PATCH v2 01/13] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
2025-12-01 23:39 ` [PATCH v2 02/13] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
2025-12-01 23:39 ` [PATCH v2 03/13] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
2025-12-01 23:39 ` [PATCH v2 04/13] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
2025-12-01 23:39 ` [PATCH v2 05/13] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
2025-12-01 23:39 ` [PATCH v2 06/13] gpu: nova-core: add Turing boot registers Timur Tabi
2025-12-01 23:39 ` [PATCH v2 07/13] gpu: nova-core: move some functions into the HAL Timur Tabi
2025-12-01 23:39 ` [PATCH v2 08/13] gpu: nova-core: Add basic Turing HAL Timur Tabi
2025-12-01 23:39 ` [PATCH v2 09/13] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
2025-12-01 23:39 ` [PATCH v2 10/13] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
2025-12-01 23:39 ` [PATCH v2 11/13] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
2025-12-01 23:39 ` [PATCH v2 12/13] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
2025-12-02 21:23   ` Joel Fernandes
2025-12-02 22:51     ` Timur Tabi
2025-12-02 23:20       ` Joel Fernandes
2025-12-02 23:40         ` John Hubbard
2025-12-02 23:48           ` Timur Tabi
2025-12-03  0:35             ` John Hubbard
2025-12-03  0:42               ` Timur Tabi
2025-12-03  0:45                 ` John Hubbard
2025-12-03  2:14                   ` Joel Fernandes
2025-12-03  2:21                     ` John Hubbard
2025-12-02 21:28   ` Joel Fernandes
2025-12-01 23:39 ` [PATCH v2 13/13] [RFC] gpu: nova: implement trait object FalconUCodeDescriptor Timur Tabi
2025-12-02  3:00   ` John Hubbard
2025-12-02  2:55 ` [PATCH v2 00/13] gpu: nova-core: add Turing support John Hubbard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).