rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
@ 2025-05-01 12:58 Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 01/21] rust: devres: allow to borrow a reference to the resource's Device Alexandre Courbot
                   ` (20 more replies)
  0 siblings, 21 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot, Shirish Baskaran

Hi everyone,

Second revision of this continuation of my previous RFCs [1] to complete
the first step of GSP booting (running the FWSEC-FRTS firmware extracted
from the BIOS) on Ampere devices. Thanks for all the feedback on the
first version.

While this series is still far from bringing the GPU into a state where
it can do anything useful, it sets up the basic layout of the driver
upon which we can build in order to continue with the next steps of GSP
booting, as well as supporting more chipsets.

Upon successful probe, the driver will display the range of the WPR2
region constructed by FWSEC-FRTS:

  [   95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
  [   95.436002] NovaCore 0000:01:00.0: GPU instance built

This code is based on nova-next with the try_access_with patch [2].

There are bits of documentation still missing, these are addressed by
Joel in his own documentation patch series. I'll also double-check and
send follow-up patches if anything is still missing after that.

I have also tried to look at ways to split the patch adding falcon
support, but couldn't find any that would not be awkward. Starting
review from `falcon.rs` and going down to the HAL should be the logical
order for a smooth review.

[1] https://lore.kernel.org/rust-for-linux/20250320-nova_timer-v3-0-79aa2ad25a79@nvidia.com/
[2] https://lore.kernel.org/rust-for-linux/20250411-try_with-v4-0-f470ac79e2e2@nvidia.com/
[3] https://lore.kernel.org/lkml/20250423192857.199712-1-fujita.tomonori@gmail.com/

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v2:
- Rebased on latest nova-next.
- Fixed all clippy warnings.
- Added `count` and `size` methods to `CoherentAllocation`.
- Added method to obtain a reference to the `Device` from a `Devres`
  (this is super convenient).
- Split `DmaObject` into its own patch and added `Deref` implementation.
- Squashed field names from [3] into "extract FWSEC from BIOS".
- Fixed erroneous use of `ERANGE` error.
- Reworked `register!()` macro towards a more intuitive syntax, moved
  its helper macros into internal rules to avoid polluting the macro
  namespace.
- Renamed all registers to capital snake case to better match OpenRM.
- Removed declarations for registers that are not used yet.
- Added more documentation for items not covered by Joel's documentation
  patches.
- Removed timer device and replaced it with a helper function using
  `Ktime`. This also made [4] unneeded so it is dropped.
- Unregister the sysmem flush page upon device destruction.
- ... probably more that I forgot. >_<
- Link to v1: https://lore.kernel.org/r/20250420-nova-frts-v1-0-ecd1cca23963@nvidia.com

[3] https://lore.kernel.org/all/20250423225405.139613-6-joelagnelf@nvidia.com/
[4] https://lore.kernel.org/lkml/20250420-nova-frts-v1-1-ecd1cca23963@nvidia.com/

---
Alexandre Courbot (19):
      rust: devres: allow to borrow a reference to the resource's Device
      rust: dma: expose the count and size of CoherentAllocation
      gpu: nova-core: derive useful traits for Chipset
      gpu: nova-core: add missing GA100 definition
      gpu: nova-core: take bound device in Gpu::new
      gpu: nova-core: define registers layout using helper macro
      gpu: nova-core: fix layout of NV_PMC_BOOT_0
      gpu: nova-core: introduce helper macro for register access
      gpu: nova-core: move Firmware to firmware module
      rust: make ETIMEDOUT error available
      gpu: nova-core: wait for GFW_BOOT completion
      gpu: nova-core: add DMA object struct
      gpu: nova-core: register sysmem flush page
      gpu: nova-core: add helper function to wait on condition
      gpu: nova-core: add falcon register definitions and base code
      gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
      gpu: nova-core: compute layout of the FRTS region
      gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
      gpu: nova-core: load and run FWSEC-FRTS

Joel Fernandes (2):
      rust: num: Add an upward alignment helper for usize
      nova-core: Add support for VBIOS ucode extraction for boot

 Documentation/gpu/nova/core/todo.rst      |    6 +
 drivers/gpu/nova-core/devinit.rs          |   43 ++
 drivers/gpu/nova-core/dma.rs              |   57 ++
 drivers/gpu/nova-core/driver.rs           |    2 +-
 drivers/gpu/nova-core/falcon.rs           |  543 ++++++++++++++
 drivers/gpu/nova-core/falcon/gsp.rs       |   25 +
 drivers/gpu/nova-core/falcon/hal.rs       |   55 ++
 drivers/gpu/nova-core/falcon/hal/ga102.rs |  115 +++
 drivers/gpu/nova-core/falcon/sec2.rs      |    8 +
 drivers/gpu/nova-core/firmware.rs         |  105 ++-
 drivers/gpu/nova-core/firmware/fwsec.rs   |  360 +++++++++
 drivers/gpu/nova-core/gpu.rs              |  223 ++++--
 drivers/gpu/nova-core/gsp.rs              |    3 +
 drivers/gpu/nova-core/gsp/fb.rs           |  109 +++
 drivers/gpu/nova-core/nova_core.rs        |   23 +
 drivers/gpu/nova-core/regs.rs             |  259 +++++--
 drivers/gpu/nova-core/regs/macros.rs      |  380 ++++++++++
 drivers/gpu/nova-core/util.rs             |   29 +
 drivers/gpu/nova-core/vbios.rs            | 1149 +++++++++++++++++++++++++++++
 rust/kernel/devres.rs                     |    6 +
 rust/kernel/dma.rs                        |   14 +
 rust/kernel/error.rs                      |    1 +
 rust/kernel/lib.rs                        |    1 +
 rust/kernel/num.rs                        |   21 +
 24 files changed, 3445 insertions(+), 92 deletions(-)
---
base-commit: fc55584e00fc8409cbaef4bcd984016dca6f1b6b
change-id: 20250417-nova-frts-96ef299abe2c

Best regards,
-- 
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 01/21] rust: devres: allow to borrow a reference to the resource's Device
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 02/21] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Functions that take a &Devres as argument might need to print error
messages related on behalf the device. Providing the reference here
removes the need for drivers to store their own for that sole purpose.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/devres.rs | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/rust/kernel/devres.rs b/rust/kernel/devres.rs
index 1e58f5d22044192449658a5a763c972ef11f1790..cc9702e31e8c60999ec873d9290c69be7843e774 100644
--- a/rust/kernel/devres.rs
+++ b/rust/kernel/devres.rs
@@ -196,3 +196,9 @@ fn drop(&mut self) {
         DevresInner::remove_action(&self.0);
     }
 }
+
+impl<T> AsRef<Device> for Devres<T> {
+    fn as_ref(&self) -> &Device {
+        &self.0.dev
+    }
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 02/21] rust: dma: expose the count and size of CoherentAllocation
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 01/21] rust: devres: allow to borrow a reference to the resource's Device Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 03/21] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

These properties are very useful to have and should be accessible.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/dma.rs | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 605e01e35715667f93297fd9ec49d8e7032e0910..18602d771054fceb80c29278b1945254312ed7c6 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -201,6 +201,20 @@ pub fn alloc_coherent(
         CoherentAllocation::alloc_attrs(dev, count, gfp_flags, Attrs(0))
     }
 
+    /// Returns the number of elements `T` in this allocation.
+    ///
+    /// Note that this is not the size of the allocation in bytes, which is provided by
+    /// [`Self::size`].
+    pub fn count(&self) -> usize {
+        self.count
+    }
+
+    /// Returns the size in bytes of this allocation.
+    pub fn size(&self) -> usize {
+        // This is guaranteed not to overflow as the same operation has been done in `alloc_attrs`.
+        self.count * core::mem::size_of::<T>()
+    }
+
     /// Returns the base address to the allocated region in the CPU's virtual address space.
     pub fn start_ptr(&self) -> *const T {
         self.cpu_addr

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 03/21] gpu: nova-core: derive useful traits for Chipset
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 01/21] rust: devres: allow to borrow a reference to the resource's Device Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 02/21] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 04/21] gpu: nova-core: add missing GA100 definition Alexandre Courbot
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will commonly need to compare chipset versions, so derive the
ordering traits to make that possible. Also derive Copy and Clone since
passing Chipset by value will be more efficient than by reference.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 17c9660da45034762edaa78e372d8821144cdeb7..4de67a2dc16302c00530026156d7264cbc7e5b32 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -13,7 +13,7 @@ macro_rules! define_chipset {
     ({ $($variant:ident = $value:expr),* $(,)* }) =>
     {
         /// Enum representation of the GPU chipset.
-        #[derive(fmt::Debug)]
+        #[derive(fmt::Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
         pub(crate) enum Chipset {
             $($variant = $value),*,
         }

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 04/21] gpu: nova-core: add missing GA100 definition
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (2 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 03/21] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 05/21] gpu: nova-core: take bound device in Gpu::new Alexandre Courbot
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

linux-firmware contains a directory for GA100, and it is a defined
chipset in Nouveau.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 4de67a2dc16302c00530026156d7264cbc7e5b32..9fe6aedaa9563799c2624d461d4e37ee9b094909 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -54,6 +54,7 @@ fn try_from(value: u32) -> Result<Self, Self::Error> {
     TU117 = 0x167,
     TU116 = 0x168,
     // Ampere
+    GA100 = 0x170,
     GA102 = 0x172,
     GA103 = 0x173,
     GA104 = 0x174,
@@ -73,7 +74,7 @@ pub(crate) fn arch(&self) -> Architecture {
             Self::TU102 | Self::TU104 | Self::TU106 | Self::TU117 | Self::TU116 => {
                 Architecture::Turing
             }
-            Self::GA102 | Self::GA103 | Self::GA104 | Self::GA106 | Self::GA107 => {
+            Self::GA100 | Self::GA102 | Self::GA103 | Self::GA104 | Self::GA106 | Self::GA107 => {
                 Architecture::Ampere
             }
             Self::AD102 | Self::AD103 | Self::AD104 | Self::AD106 | Self::AD107 => {

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 05/21] gpu: nova-core: take bound device in Gpu::new
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (3 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 04/21] gpu: nova-core: add missing GA100 definition Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 06/21] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will need to perform things like allocating DMA memory during device
creation, so make sure to take the device context that will allow us to
perform these actions.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 9fe6aedaa9563799c2624d461d4e37ee9b094909..19a17cdc204b013482c0d307c5838cf3044c8cc8 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -183,7 +183,10 @@ pub(crate) struct Gpu {
 }
 
 impl Gpu {
-    pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<Self>> {
+    pub(crate) fn new(
+        pdev: &pci::Device<device::Bound>,
+        bar: Devres<Bar0>,
+    ) -> Result<impl PinInit<Self>> {
         let spec = Spec::new(&bar)?;
         let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;
 

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 06/21] gpu: nova-core: define registers layout using helper macro
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (4 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 05/21] gpu: nova-core: take bound device in Gpu::new Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 07/21] gpu: nova-core: fix layout of NV_PMC_BOOT_0 Alexandre Courbot
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add the register!() macro, which defines a given register's layout and
provide bit-field accessors with a way to convert them to a given type.
This macro will allow us to make clear definitions of the registers and
manipulate their fields safely.

The long-term goal is to eventually move it to the kernel crate so it
can be used my other drivers as well, but it was agreed to first land it
into nova-core and make it mature there.

To illustrate its usage, use it to define the layout for the Boot0
(renamed to NV_PMC_BOOT_0 to match OpenRM's naming scheme) and take
advantage of its accessors.

Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 Documentation/gpu/nova/core/todo.rst |   6 +
 drivers/gpu/nova-core/gpu.rs         |  10 +-
 drivers/gpu/nova-core/regs.rs        |  61 ++----
 drivers/gpu/nova-core/regs/macros.rs | 379 +++++++++++++++++++++++++++++++++++
 4 files changed, 402 insertions(+), 54 deletions(-)

diff --git a/Documentation/gpu/nova/core/todo.rst b/Documentation/gpu/nova/core/todo.rst
index 234d753d3eacc709b928b1ccbfc9750ef36ec4ed..8a459fc088121f770bfcda5dfb4ef51c712793ce 100644
--- a/Documentation/gpu/nova/core/todo.rst
+++ b/Documentation/gpu/nova/core/todo.rst
@@ -102,7 +102,13 @@ Usage:
 	let boot0 = Boot0::read(&bar);
 	pr_info!("Revision: {}\n", boot0.revision());
 
+Note: a work-in-progress implementation currently resides in
+`drivers/gpu/nova-core/regs/macros.rs` and is used in nova-core. It would be
+nice to improve it (possibly using proc macros) and move it to the `kernel`
+crate so it can be used by other components as well.
+
 | Complexity: Advanced
+| Contact: Alexandre Courbot
 
 Delay / Sleep abstractions
 --------------------------
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 19a17cdc204b013482c0d307c5838cf3044c8cc8..f95f095baa68c9f7ffe3b1e615548aac5c2a0c6c 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -112,10 +112,10 @@ pub(crate) struct Revision {
 }
 
 impl Revision {
-    fn from_boot0(boot0: regs::Boot0) -> Self {
+    fn from_boot0(boot0: regs::NV_PMC_BOOT_0) -> Self {
         Self {
-            major: boot0.major_rev(),
-            minor: boot0.minor_rev(),
+            major: boot0.major_revision(),
+            minor: boot0.minor_revision(),
         }
     }
 }
@@ -136,10 +136,10 @@ pub(crate) struct Spec {
 impl Spec {
     fn new(bar: &Devres<Bar0>) -> Result<Spec> {
         let bar = bar.try_access().ok_or(ENXIO)?;
-        let boot0 = regs::Boot0::read(&bar);
+        let boot0 = regs::NV_PMC_BOOT_0::read(&*bar);
 
         Ok(Self {
-            chipset: boot0.chipset().try_into()?,
+            chipset: boot0.chipset()?,
             revision: Revision::from_boot0(boot0),
         })
     }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index b1a25b86ef17a6710e6236d5e7f1f26cd4407ce3..498fefb52f33bf01518f19d32287962f1fdc3224 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -1,55 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use crate::driver::Bar0;
+// Required to retain the original register names used by OpenRM, which are all capital snake case
+// but are mapped to types.
+#![allow(non_camel_case_types)]
 
-// TODO
-//
-// Create register definitions via generic macros. See task "Generic register
-// abstraction" in Documentation/gpu/nova/core/todo.rst.
+#[macro_use]
+mod macros;
 
-const BOOT0_OFFSET: usize = 0x00000000;
+use crate::gpu::Chipset;
 
-// 3:0 - chipset minor revision
-const BOOT0_MINOR_REV_SHIFT: u8 = 0;
-const BOOT0_MINOR_REV_MASK: u32 = 0x0000000f;
+/* PMC */
 
-// 7:4 - chipset major revision
-const BOOT0_MAJOR_REV_SHIFT: u8 = 4;
-const BOOT0_MAJOR_REV_MASK: u32 = 0x000000f0;
-
-// 23:20 - chipset implementation Identifier (depends on architecture)
-const BOOT0_IMPL_SHIFT: u8 = 20;
-const BOOT0_IMPL_MASK: u32 = 0x00f00000;
-
-// 28:24 - chipset architecture identifier
-const BOOT0_ARCH_MASK: u32 = 0x1f000000;
-
-// 28:20 - chipset identifier (virtual register field combining BOOT0_IMPL and
-//         BOOT0_ARCH)
-const BOOT0_CHIPSET_SHIFT: u8 = BOOT0_IMPL_SHIFT;
-const BOOT0_CHIPSET_MASK: u32 = BOOT0_IMPL_MASK | BOOT0_ARCH_MASK;
-
-#[derive(Copy, Clone)]
-pub(crate) struct Boot0(u32);
-
-impl Boot0 {
-    #[inline]
-    pub(crate) fn read(bar: &Bar0) -> Self {
-        Self(bar.read32(BOOT0_OFFSET))
-    }
-
-    #[inline]
-    pub(crate) fn chipset(&self) -> u32 {
-        (self.0 & BOOT0_CHIPSET_MASK) >> BOOT0_CHIPSET_SHIFT
-    }
-
-    #[inline]
-    pub(crate) fn minor_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MINOR_REV_MASK) >> BOOT0_MINOR_REV_SHIFT) as u8
-    }
-
-    #[inline]
-    pub(crate) fn major_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
-    }
-}
+register!(NV_PMC_BOOT_0 @ 0x00000000, "Basic revision information about the GPU" {
+    3:0     minor_revision as u8, "Minor revision of the chip";
+    7:4     major_revision as u8, "Major revision of the chip";
+    28:20   chipset as u32 ?=> Chipset, "Chipset model";
+});
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
new file mode 100644
index 0000000000000000000000000000000000000000..a08d4f2aa0a32b00e80dae4e6b2c79d072241734
--- /dev/null
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -0,0 +1,379 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Macro to define register layout and accessors.
+//!
+//! A single register typically includes several fields, which are accessed through a combination
+//! of bit-shift and mask operations that introduce a class of potential mistakes, notably because
+//! not all possible field values are necessarily valid.
+//!
+//! The macro in this module allow to define, using an intruitive and readable syntax, a dedicated
+//! type for each register with its own field accessors that can return an error is a field's value
+//! is invalid.
+
+/// Defines a dedicated type for a register with an absolute offset, alongside with getter and
+/// setter methods for its fields and methods to read and write it from an `Io` region.
+///
+/// Example:
+///
+/// ```no_run
+/// register!(BOOT_0 @ 0x00000100, "Basic revision information about the GPU" {
+///    3:0     minor_revision as u8, "Minor revision of the chip";
+///    7:4     major_revision as u8, "Major revision of the chip";
+///    28:20   chipset as u32 ?=> Chipset, "Chipset model";
+/// });
+/// ```
+///
+/// This defines a `BOOT_0` type which can be read or written from offset `0x100` of an `Io`
+/// region. It is composed of 3 fields, for instance `minor_revision` is made of the 4 less
+/// significant bits of the register. Each field can be accessed and modified using accessor
+/// methods:
+///
+/// ```no_run
+/// // Read from the register's defined offset (0x100).
+/// let boot0 = BOOT_0::read(&bar);
+/// pr_info!("chip revision: {}.{}", boot0.major_revision(), boot0.minor_revision());
+///
+/// // `Chipset::try_from` will be called with the value of the field and returns an error if the
+/// // value is invalid.
+/// let chipset = boot0.chipset()?;
+///
+/// // Update some fields and write the value back.
+/// boot0.set_major_revision(3).set_minor_revision(10).write(&bar);
+///
+/// // Or just read and update the register in a single step:
+/// BOOT_0::alter(&bar, |r| r.set_major_revision(3).set_minor_revision(10));
+/// ```
+///
+/// Fields can be defined as follows:
+///
+/// - `as <type>` simply returns the field value casted as the requested integer type, typically
+///   `u32`, `u16`, `u8` or `bool`. Note that `bool` fields must have a range of 1 bit.
+/// - `as <type> => <into_type>` calls `<into_type>`'s `From::<<type>>` implementation and returns
+///   the result.
+/// - `as <type> ?=> <try_into_type>` calls `<try_into_type>`'s `TryFrom::<<type>>` implementation
+///   and returns the result. This is useful on fields for which not all values are value.
+///
+/// The documentation strings are optional. If present, they will be added to the type's
+/// definition, or the field getter and setter methods they are attached to.
+///
+/// Putting a `+` before the address of the register makes it relative to a base: the `read` and
+/// `write` methods take a `base` argument that is added to the specified address before access,
+/// and `try_read` and `try_write` methods are also created, allowing access with offsets unknown
+/// at compile-time:
+///
+/// ```no_run
+/// register!(CPU_CTL @ +0x0000010, "CPU core control" {
+///    0:0     start as bool, "Start the CPU core";
+/// });
+///
+/// // Flip the `start` switch for the CPU core which base address is at `CPU_BASE`.
+/// let cpuctl = CPU_CTL::read(&bar, CPU_BASE);
+/// cpuctl.set_start(true).write(&bar, CPU_BASE);
+/// ```
+macro_rules! register {
+    // Creates a register at a fixed offset of the MMIO space.
+    (
+        $name:ident @ $offset:literal $(, $comment:literal)? {
+            $($fields:tt)*
+        }
+    ) => {
+        register!(@common $name $(, $comment)?);
+        register!(@field_accessors $name { $($fields)* });
+        register!(@io $name @ $offset);
+    };
+
+    // Creates a register at a relative offset from a base address.
+    (
+        $name:ident @ + $offset:literal $(, $comment:literal)? {
+            $($fields:tt)*
+        }
+    ) => {
+        register!(@common $name $(, $comment)?);
+        register!(@field_accessors $name { $($fields)* });
+        register!(@io$name @ + $offset);
+    };
+
+    // Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
+    // and conversion to regular `u32`).
+    (@common $name:ident $(, $comment:literal)?) => {
+        $(
+        #[doc=$comment]
+        )?
+        #[repr(transparent)]
+        #[derive(Clone, Copy, Default)]
+        pub(crate) struct $name(u32);
+
+        // TODO: display the raw hex value, then the value of all the fields. This requires
+        // matching the fields, which will complexify the syntax considerably...
+        impl ::core::fmt::Debug for $name {
+            fn fmt(&self, f: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
+                f.debug_tuple(stringify!($name))
+                    .field(&format_args!("0x{0:x}", &self.0))
+                    .finish()
+            }
+        }
+
+        impl core::ops::BitOr for $name {
+            type Output = Self;
+
+            fn bitor(self, rhs: Self) -> Self::Output {
+                Self(self.0 | rhs.0)
+            }
+        }
+
+        impl ::core::convert::From<$name> for u32 {
+            fn from(reg: $name) -> u32 {
+                reg.0
+            }
+        }
+    };
+
+    // Defines all the field getter/methods methods for `$name`.
+    (
+        @field_accessors $name:ident {
+        $($hi:tt:$lo:tt $field:ident as $type:tt
+            $(?=> $try_into_type:ty)?
+            $(=> $into_type:ty)?
+            $(, $comment:literal)?
+        ;
+        )*
+        }
+    ) => {
+        $(
+            register!(@check_field_bounds $hi:$lo $field as $type);
+        )*
+
+        #[allow(dead_code)]
+        impl $name {
+            $(
+            register!(@field_accessor $name $hi:$lo $field as $type
+                $(?=> $try_into_type)?
+                $(=> $into_type)?
+                $(, $comment)?
+                ;
+            );
+            )*
+        }
+    };
+
+    // Boolean fields must have `$hi == $lo`.
+    (@check_field_bounds $hi:tt:$lo:tt $field:ident as bool) => {
+        #[allow(clippy::eq_op)]
+        const _: () = {
+            kernel::build_assert!(
+                $hi == $lo,
+                concat!("boolean field `", stringify!($field), "` covers more than one bit")
+            );
+        };
+    };
+
+    // Non-boolean fields must have `$hi >= $lo`.
+    (@check_field_bounds $hi:tt:$lo:tt $field:ident as $type:tt) => {
+        #[allow(clippy::eq_op)]
+        const _: () = {
+            kernel::build_assert!(
+                $hi >= $lo,
+                concat!("field `", stringify!($field), "`'s MSB is smaller than its LSB")
+            );
+        };
+    };
+
+    // Catches fields defined as `bool` and convert them into a boolean value.
+    (
+        @field_accessor $name:ident $hi:tt:$lo:tt $field:ident as bool => $into_type:ty
+            $(, $comment:literal)?;
+    ) => {
+        register!(
+            @leaf_accessor $name $hi:$lo $field as bool
+            { |f| <$into_type>::from(if f != 0 { true } else { false }) }
+            $into_type => $into_type $(, $comment)?;
+        );
+    };
+
+    // Shortcut for fields defined as `bool` without the `=>` syntax.
+    (
+        @field_accessor $name:ident $hi:tt:$lo:tt $field:ident as bool $(, $comment:literal)?;
+    ) => {
+        register!(@field_accessor $name $hi:$lo $field as bool => bool $(, $comment)?;);
+    };
+
+    // Catches the `?=>` syntax for non-boolean fields.
+    (
+        @field_accessor $name:ident $hi:tt:$lo:tt $field:ident as $type:tt ?=> $try_into_type:ty
+            $(, $comment:literal)?;
+    ) => {
+        register!(@leaf_accessor $name $hi:$lo $field as $type
+            { |f| <$try_into_type>::try_from(f as $type) } $try_into_type =>
+            ::core::result::Result<
+                $try_into_type,
+                <$try_into_type as ::core::convert::TryFrom<$type>>::Error
+            >
+            $(, $comment)?;);
+    };
+
+    // Catches the `=>` syntax for non-boolean fields.
+    (
+        @field_accessor $name:ident $hi:tt:$lo:tt $field:ident as $type:tt => $into_type:ty
+            $(, $comment:literal)?;
+    ) => {
+        register!(@leaf_accessor $name $hi:$lo $field as $type
+            { |f| <$into_type>::from(f as $type) } $into_type => $into_type $(, $comment)?;);
+    };
+
+    // Shortcut for fields defined as non-`bool` without the `=>` or `?=>` syntax.
+    (
+        @field_accessor $name:ident $hi:tt:$lo:tt $field:ident as $type:tt
+            $(, $comment:literal)?;
+    ) => {
+        register!(@field_accessor $name $hi:$lo $field as $type => $type $(, $comment)?;);
+    };
+
+    // Generates the accessor methods for a single field.
+    (
+        @leaf_accessor $name:ident $hi:tt:$lo:tt $field:ident as $type:ty
+            { $process:expr } $to_type:ty => $res_type:ty $(, $comment:literal)?;
+    ) => {
+        kernel::macros::paste!(
+        const [<$field:upper>]: ::core::ops::RangeInclusive<u8> = $lo..=$hi;
+        const [<$field:upper _MASK>]: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+        const [<$field:upper _SHIFT>]: u32 = Self::[<$field:upper _MASK>].trailing_zeros();
+        );
+
+        $(
+        #[doc="Returns the value of this field:"]
+        #[doc=$comment]
+        )?
+        #[inline]
+        pub(crate) fn $field(self) -> $res_type {
+            kernel::macros::paste!(
+            const MASK: u32 = $name::[<$field:upper _MASK>];
+            const SHIFT: u32 = $name::[<$field:upper _SHIFT>];
+            );
+            let field = ((self.0 & MASK) >> SHIFT);
+
+            $process(field)
+        }
+
+        kernel::macros::paste!(
+        $(
+        #[doc="Sets the value of this field:"]
+        #[doc=$comment]
+        )?
+        #[inline]
+        pub(crate) fn [<set_ $field>](mut self, value: $to_type) -> Self {
+            const MASK: u32 = $name::[<$field:upper _MASK>];
+            const SHIFT: u32 = $name::[<$field:upper _SHIFT>];
+            let value = ((value as u32) << SHIFT) & MASK;
+            self.0 = (self.0 & !MASK) | value;
+
+            self
+        }
+        );
+    };
+
+    // Creates the IO accessors for a fixed offset register.
+    (@io $name:ident @ $offset:literal) => {
+        #[allow(dead_code)]
+        impl $name {
+            #[inline]
+            pub(crate) fn read<const SIZE: usize, T>(io: &T) -> Self where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+            {
+                Self(io.read32($offset))
+            }
+
+            #[inline]
+            pub(crate) fn write<const SIZE: usize, T>(self, io: &T) where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+            {
+                io.write32(self.0, $offset)
+            }
+
+            #[inline]
+            pub(crate) fn alter<const SIZE: usize, T, F>(
+                io: &T,
+                f: F,
+            ) where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+                F: ::core::ops::FnOnce(Self) -> Self,
+            {
+                let reg = f(Self::read(io));
+                reg.write(io);
+            }
+        }
+    };
+
+    // Create the IO accessors for a relative offset register.
+    (@io $name:ident @ + $offset:literal) => {
+        #[allow(dead_code)]
+        impl $name {
+            #[inline]
+            pub(crate) fn read<const SIZE: usize, T>(
+                io: &T,
+                base: usize,
+            ) -> Self where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+            {
+                Self(io.read32(base + $offset))
+            }
+
+            #[inline]
+            pub(crate) fn write<const SIZE: usize, T>(
+                self,
+                io: &T,
+                base: usize,
+            ) where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+            {
+                io.write32(self.0, base + $offset)
+            }
+
+            #[inline]
+            pub(crate) fn alter<const SIZE: usize, T, F>(
+                io: &T,
+                base: usize,
+                f: F,
+            ) where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+                F: ::core::ops::FnOnce(Self) -> Self,
+            {
+                let reg = f(Self::read(io, base));
+                reg.write(io, base);
+            }
+
+            #[inline]
+            pub(crate) fn try_read<const SIZE: usize, T>(
+                io: &T,
+                base: usize,
+            ) -> ::kernel::error::Result<Self> where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+            {
+                io.try_read32(base + $offset).map(Self)
+            }
+
+            #[inline]
+            pub(crate) fn try_write<const SIZE: usize, T>(
+                self,
+                io: &T,
+                base: usize,
+            ) -> ::kernel::error::Result<()> where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+            {
+                io.try_write32(self.0, base + $offset)
+            }
+
+            #[inline]
+            pub(crate) fn try_alter<const SIZE: usize, T, F>(
+                io: &T,
+                base: usize,
+                f: F,
+            ) -> ::kernel::error::Result<()> where
+                T: ::core::ops::Deref<Target = ::kernel::io::Io<SIZE>>,
+                F: ::core::ops::FnOnce(Self) -> Self,
+            {
+                let reg = f(Self::try_read(io, base)?);
+                reg.try_write(io, base)
+            }
+        }
+    };
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 07/21] gpu: nova-core: fix layout of NV_PMC_BOOT_0
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (5 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 06/21] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 08/21] gpu: nova-core: introduce helper macro for register access Alexandre Courbot
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

The layout of NV_PMC_BOOT_0 has two small issues:

- The "chipset" field, while useful to identify a chip, is actually an
  aggregate of two distinct fields named "architecture" and
  "implementation".
- The "architecture" field is split, with its MSB being at a different
  location than the rest of its bits.

Redefine the register layout to match its actual definition as provided
by OpenRM and expose the fully-constructed "architecture" field through
our own "Architecture" type. The "chipset" pseudo-field is also useful
to have, so keep providing it.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs  | 19 ++++++++++++++++---
 drivers/gpu/nova-core/regs.rs | 26 ++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index f95f095baa68c9f7ffe3b1e615548aac5c2a0c6c..64c38425098c19360a7c938f2b86a55ca3c48880 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -101,9 +101,22 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 /// Enum representation of the GPU generation.
 #[derive(fmt::Debug)]
 pub(crate) enum Architecture {
-    Turing,
-    Ampere,
-    Ada,
+    Turing = 0x16,
+    Ampere = 0x17,
+    Ada = 0x19,
+}
+
+impl TryFrom<u8> for Architecture {
+    type Error = Error;
+
+    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
+        match value {
+            0x16 => Ok(Self::Turing),
+            0x17 => Ok(Self::Ampere),
+            0x19 => Ok(Self::Ada),
+            _ => Err(ENODEV),
+        }
+    }
 }
 
 pub(crate) struct Revision {
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 498fefb52f33bf01518f19d32287962f1fdc3224..bfb9555b203ff880c0fc373bb22e5ce6048015d4 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -4,15 +4,37 @@
 // but are mapped to types.
 #![allow(non_camel_case_types)]
 
+use kernel::error::Error;
+
 #[macro_use]
 mod macros;
 
-use crate::gpu::Chipset;
+use crate::gpu::{Architecture, Chipset};
 
 /* PMC */
 
 register!(NV_PMC_BOOT_0 @ 0x00000000, "Basic revision information about the GPU" {
     3:0     minor_revision as u8, "Minor revision of the chip";
     7:4     major_revision as u8, "Major revision of the chip";
-    28:20   chipset as u32 ?=> Chipset, "Chipset model";
+    8:8     architecture_1 as u8, "MSB of the architecture";
+    23:20   implementation as u8, "Implementation version of the architecture";
+    28:24   architecture_0 as u8, "Lower bits of the architecture";
 });
+
+impl NV_PMC_BOOT_0 {
+    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
+    pub(crate) fn architecture(self) -> Result<Architecture, Error> {
+        Architecture::try_from(
+            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0.len()),
+        )
+    }
+
+    /// Combines `architecture` and `implementation` to obtain a code unique to the chipset.
+    pub(crate) fn chipset(self) -> Result<Chipset, Error> {
+        self.architecture()
+            .map(|arch| {
+                ((arch as u32) << Self::IMPLEMENTATION.len()) | self.implementation() as u32
+            })
+            .and_then(Chipset::try_from)
+    }
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 08/21] gpu: nova-core: introduce helper macro for register access
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (6 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 07/21] gpu: nova-core: fix layout of NV_PMC_BOOT_0 Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Acquiring the BAR temporarily to access is going to be a very common
pattern, that typically takes two lines of code and introduces a
short-lived local variable.

Add the Nova-local convenience with_bar!() macro, which uses
Revocable::try_access_with() and converts its returned Option into the
proper error as needed.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs         |  3 +--
 drivers/gpu/nova-core/nova_core.rs   | 18 ++++++++++++++++++
 drivers/gpu/nova-core/regs/macros.rs |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 64c38425098c19360a7c938f2b86a55ca3c48880..275b005d262e0a01a9ef1498836ef3c3019cb497 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -148,8 +148,7 @@ pub(crate) struct Spec {
 
 impl Spec {
     fn new(bar: &Devres<Bar0>) -> Result<Spec> {
-        let bar = bar.try_access().ok_or(ENXIO)?;
-        let boot0 = regs::NV_PMC_BOOT_0::read(&*bar);
+        let boot0 = with_bar!(bar, regs::NV_PMC_BOOT_0::read)?;
 
         Ok(Self {
             chipset: boot0.chipset()?,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index a91cd924054b49966937a8db6aab9cd0614f10de..0eecd612e34efc046dad852e6239de6ffa5fdd62 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -2,6 +2,24 @@
 
 //! Nova Core GPU Driver
 
+#[macro_use]
+mod macros {
+    /// Convenience macro to run a closure while holding [`crate::driver::Bar0`].
+    ///
+    /// If the bar cannot be acquired, then `ENXIO` is returned.
+    ///
+    /// If a `?` is present before the `bar` argument, then the `Result` returned by the closure is
+    /// merged into the `Result` of the macro itself to avoid having a `Result<Result<>>`.
+    macro_rules! with_bar {
+        ($bar:expr, $closure:expr) => {
+            $bar.try_access_with($closure).ok_or(ENXIO)
+        };
+        (? $bar:expr, $closure:expr) => {
+            with_bar!($bar, $closure).and_then(|r| r)
+        };
+    }
+}
+
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
index a08d4f2aa0a32b00e80dae4e6b2c79d072241734..7ecc70efb3cd723b673cd72915e72b8a4a009f06 100644
--- a/drivers/gpu/nova-core/regs/macros.rs
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -68,6 +68,7 @@
 ///
 /// // Flip the `start` switch for the CPU core which base address is at `CPU_BASE`.
 /// let cpuctl = CPU_CTL::read(&bar, CPU_BASE);
+/// pr_info!("CPU CTL: {:#x}", cpuctl);
 /// cpuctl.set_start(true).write(&bar, CPU_BASE);
 /// ```
 macro_rules! register {

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (7 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 08/21] gpu: nova-core: introduce helper macro for register access Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-02 21:14   ` Timur Tabi
  2025-05-01 12:58 ` [PATCH v2 10/21] rust: make ETIMEDOUT error available Alexandre Courbot
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will extend the firmware methods, so move it to its own module
instead to keep gpu.rs focused.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 46 +++++++++++++++++++++++++++++++++++++--
 drivers/gpu/nova-core/gpu.rs      | 35 +++--------------------------
 2 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 6e6361c59ca1ae9a52185e66e850ba1db93eb8ce..cb79d039948858e657c9a23a62ed27ff780ac169 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -1,12 +1,54 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use crate::gpu;
+//! Contains structures and functions dedicated to the parsing, building and patching of firmwares
+//! to be loaded into a given execution unit.
+
+use kernel::device;
 use kernel::firmware;
+use kernel::prelude::*;
+use kernel::str::CString;
+
+use crate::gpu;
+use crate::gpu::Chipset;
+
+pub(crate) const FIRMWARE_VERSION: &str = "535.113.01";
+
+/// Structure encapsulating the firmware blobs required for the GPU to operate.
+#[expect(dead_code)]
+pub(crate) struct Firmware {
+    pub booter_load: firmware::Firmware,
+    pub booter_unload: firmware::Firmware,
+    pub bootloader: firmware::Firmware,
+    pub gsp: firmware::Firmware,
+}
+
+impl Firmware {
+    pub(crate) fn new(
+        dev: &device::Device<device::Bound>,
+        chipset: Chipset,
+        ver: &str,
+    ) -> Result<Firmware> {
+        let mut chip_name = CString::try_from_fmt(fmt!("{}", chipset))?;
+        chip_name.make_ascii_lowercase();
+
+        let request = |name_| {
+            CString::try_from_fmt(fmt!("nvidia/{}/gsp/{}-{}.bin", &*chip_name, name_, ver))
+                .and_then(|path| firmware::Firmware::request(&path, dev))
+        };
+
+        Ok(Firmware {
+            booter_load: request("booter_load")?,
+            booter_unload: request("booter_unload")?,
+            bootloader: request("bootloader")?,
+            gsp: request("gsp")?,
+        })
+    }
+}
 
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
 impl<const N: usize> ModInfoBuilder<N> {
-    const VERSION: &'static str = "535.113.01";
+    const VERSION: &'static str = FIRMWARE_VERSION;
 
     const fn make_entry_file(self, chipset: &str, fw: &str) -> Self {
         ModInfoBuilder(
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 275b005d262e0a01a9ef1498836ef3c3019cb497..b1ee465de907432325c4d0e0dead3a52e81ed067 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,10 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use kernel::{
-    device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::CString,
-};
+use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
 use crate::driver::Bar0;
+use crate::firmware::Firmware;
 use crate::regs;
 use crate::util;
 use core::fmt;
@@ -157,34 +156,6 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
     }
 }
 
-/// Structure encapsulating the firmware blobs required for the GPU to operate.
-#[expect(dead_code)]
-pub(crate) struct Firmware {
-    booter_load: firmware::Firmware,
-    booter_unload: firmware::Firmware,
-    bootloader: firmware::Firmware,
-    gsp: firmware::Firmware,
-}
-
-impl Firmware {
-    fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
-        let mut chip_name = CString::try_from_fmt(fmt!("{}", spec.chipset))?;
-        chip_name.make_ascii_lowercase();
-
-        let request = |name_| {
-            CString::try_from_fmt(fmt!("nvidia/{}/gsp/{}-{}.bin", &*chip_name, name_, ver))
-                .and_then(|path| firmware::Firmware::request(&path, dev))
-        };
-
-        Ok(Firmware {
-            booter_load: request("booter_load")?,
-            booter_unload: request("booter_unload")?,
-            bootloader: request("bootloader")?,
-            gsp: request("gsp")?,
-        })
-    }
-}
-
 /// Structure holding the resources required to operate the GPU.
 #[pin_data]
 pub(crate) struct Gpu {
@@ -200,7 +171,7 @@ pub(crate) fn new(
         bar: Devres<Bar0>,
     ) -> Result<impl PinInit<Self>> {
         let spec = Spec::new(&bar)?;
-        let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;
+        let fw = Firmware::new(pdev.as_ref(), spec.chipset, "535.113.01")?;
 
         dev_info!(
             pdev.as_ref(),

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 10/21] rust: make ETIMEDOUT error available
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (8 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 11/21] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will use this error in the nova-core driver.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 3dee3139fcd4379b94748c0ba1965f4e1865b633..083c7b068cf4e185100de96e520c54437898ee72 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -65,6 +65,7 @@ macro_rules! declare_err {
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
     declare_err!(EOVERFLOW, "Value too large for defined data type.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 11/21] gpu: nova-core: wait for GFW_BOOT completion
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (9 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 10/21] rust: make ETIMEDOUT error available Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 12/21] gpu: nova-core: add DMA object struct Alexandre Courbot
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Upon reset, the GPU executes the GFW_BOOT firmware in order to
initialize its base parameters such as clocks. The driver must ensure
that this step is completed before using the hardware.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/devinit.rs   | 43 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/driver.rs    |  2 +-
 drivers/gpu/nova-core/gpu.rs       |  5 +++++
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 11 ++++++++++
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
new file mode 100644
index 0000000000000000000000000000000000000000..31a313a0652cfb16c490a346d06c47ca5b0edde9
--- /dev/null
+++ b/drivers/gpu/nova-core/devinit.rs
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Methods for device initialization.
+
+use kernel::bindings;
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// Wait for devinit FW completion.
+///
+/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
+/// considered unusable until this step is completed, so it must be waited on very early during
+/// driver initialization.
+pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
+    let mut timeout = 2000;
+
+    loop {
+        let gfw_booted =
+            with_bar!(
+                bar,
+                |b| regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK::read(b)
+                    .read_protection_level0()
+                    && (regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_05::read(b).value() & 0xff) == 0xff
+            )?;
+
+        if gfw_booted {
+            return Ok(());
+        }
+
+        if timeout == 0 {
+            return Err(ETIMEDOUT);
+        }
+        timeout -= 1;
+
+        // TODO: use `read_poll_timeout` once it is available.
+        // (https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/)
+        // SAFETY: msleep should be safe to call with any parameter.
+        unsafe { bindings::msleep(2) };
+    }
+}
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x1000000;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index b1ee465de907432325c4d0e0dead3a52e81ed067..ac246d08141e95927a25cada4ecb7e23156ac6b4 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -2,6 +2,7 @@
 
 use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
+use crate::devinit;
 use crate::driver::Bar0;
 use crate::firmware::Firmware;
 use crate::regs;
@@ -181,6 +182,10 @@ pub(crate) fn new(
             spec.revision
         );
 
+        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
+        devinit::wait_gfw_boot_completion(&bar)
+            .inspect_err(|_| dev_err!(pdev.as_ref(), "GFW boot did not complete"))?;
+
         Ok(pin_init!(Self { spec, bar, fw }))
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -20,6 +20,7 @@ macro_rules! with_bar {
     }
 }
 
+mod devinit;
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index bfb9555b203ff880c0fc373bb22e5ce6048015d4..401d885539cee03cbe732102f5e2233785a7b284 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -38,3 +38,14 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
             .and_then(Chipset::try_from)
     }
 }
+
+/* PGC6 */
+
+register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {
+    0:0     read_protection_level0 as bool;
+});
+
+// TODO: This is an array of registers.
+register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05 @ 0x00118234 {
+    31:0    value as u32;
+});

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 12/21] gpu: nova-core: add DMA object struct
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (10 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 11/21] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 13/21] gpu: nova-core: register sysmem flush page Alexandre Courbot
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Since we will need to allocate lots of distinct memory chunks to be
shared between GPU and CPU, introduce a type dedicated to that. It is a
light wrapper around CoherentAllocation.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/dma.rs       | 60 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |  1 +
 2 files changed, 61 insertions(+)

diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
new file mode 100644
index 0000000000000000000000000000000000000000..9d90ae01d0044eaab4ddbc3eba216741d7a623ef
--- /dev/null
+++ b/drivers/gpu/nova-core/dma.rs
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Simple DMA object wrapper.
+
+// To be removed when all code is used.
+#![expect(dead_code)]
+
+use core::ops::{Deref, DerefMut};
+
+use kernel::device;
+use kernel::dma::CoherentAllocation;
+use kernel::page::PAGE_SIZE;
+use kernel::prelude::*;
+
+pub(crate) struct DmaObject {
+    dma: CoherentAllocation<u8>,
+}
+
+impl DmaObject {
+    pub(crate) fn new(dev: &device::Device<device::Bound>, len: usize) -> Result<Self> {
+        let len = core::alloc::Layout::from_size_align(len, PAGE_SIZE)
+            .map_err(|_| EINVAL)?
+            .pad_to_align()
+            .size();
+        let dma = CoherentAllocation::alloc_coherent(dev, len, GFP_KERNEL | __GFP_ZERO)?;
+
+        Ok(Self { dma })
+    }
+
+    pub(crate) fn from_data(dev: &device::Device<device::Bound>, data: &[u8]) -> Result<Self> {
+        Self::new(dev, data.len()).map(|mut dma_obj| {
+            // SAFETY:
+            // - The copied data fits within the size of the allocated object.
+            // - We have just created this object and there is no other user at this stage.
+            unsafe {
+                core::ptr::copy_nonoverlapping(
+                    data.as_ptr(),
+                    dma_obj.dma.start_ptr_mut(),
+                    data.len(),
+                );
+            }
+
+            dma_obj
+        })
+    }
+}
+
+impl Deref for DmaObject {
+    type Target = CoherentAllocation<u8>;
+
+    fn deref(&self) -> &Self::Target {
+        &self.dma
+    }
+}
+
+impl DerefMut for DmaObject {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        &mut self.dma
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 878161e060f54da7738c656f6098936a62dcaa93..37c7eb0ea7a926bee4e3c661028847291bf07fa2 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -21,6 +21,7 @@ macro_rules! with_bar {
 }
 
 mod devinit;
+mod dma;
 mod driver;
 mod firmware;
 mod gpu;

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 13/21] gpu: nova-core: register sysmem flush page
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (11 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 12/21] gpu: nova-core: add DMA object struct Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 14/21] gpu: nova-core: add helper function to wait on condition Alexandre Courbot
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

A page of system memory is reserved so sysmembar can perform a read on
it if a system write occurred since the last flush. Do this early as it
can be required to e.g. reset the GPU falcons.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs  | 51 +++++++++++++++++++++++++++++++++++++++++--
 drivers/gpu/nova-core/regs.rs | 10 +++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index ac246d08141e95927a25cada4ecb7e23156ac6b4..08302967375274fd88bb9b4fef4969e79e82f3b4 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -3,6 +3,7 @@
 use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
 use crate::devinit;
+use crate::dma::DmaObject;
 use crate::driver::Bar0;
 use crate::firmware::Firmware;
 use crate::regs;
@@ -158,12 +159,32 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
 }
 
 /// Structure holding the resources required to operate the GPU.
-#[pin_data]
+#[pin_data(PinnedDrop)]
 pub(crate) struct Gpu {
     spec: Spec,
     /// MMIO mapping of PCI BAR 0
     bar: Devres<Bar0>,
     fw: Firmware,
+    // System memory page required for flushing all pending GPU-side memory writes done through
+    // PCIE into system memory.
+    sysmem_flush: DmaObject,
+}
+
+#[pinned_drop]
+impl PinnedDrop for Gpu {
+    fn drop(self: Pin<&mut Self>) {
+        // Unregister the sysmem flush page before we release it.
+        let _ = with_bar!(&self.bar, |b| {
+            regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::default()
+                .set_adr_39_08(0)
+                .write(b);
+            if self.spec.chipset >= Chipset::GA102 {
+                regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::default()
+                    .set_adr_63_40(0)
+                    .write(b);
+            }
+        });
+    }
 }
 
 impl Gpu {
@@ -186,6 +207,32 @@ pub(crate) fn new(
         devinit::wait_gfw_boot_completion(&bar)
             .inspect_err(|_| dev_err!(pdev.as_ref(), "GFW boot did not complete"))?;
 
-        Ok(pin_init!(Self { spec, bar, fw }))
+        // System memory page required for sysmembar to properly flush into system memory.
+        let sysmem_flush = {
+            let page = DmaObject::new(pdev.as_ref(), kernel::bindings::PAGE_SIZE)?;
+
+            // Register the sysmem flush page.
+            with_bar!(bar, |b| {
+                let handle = page.dma_handle();
+
+                regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::default()
+                    .set_adr_39_08((handle >> 8) as u32)
+                    .write(b);
+                if spec.chipset >= Chipset::GA102 {
+                    regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::default()
+                        .set_adr_63_40((handle >> 40) as u32)
+                        .write(b);
+                }
+            })?;
+
+            page
+        };
+
+        Ok(pin_init!(Self {
+            spec,
+            bar,
+            fw,
+            sysmem_flush,
+        }))
     }
 }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 401d885539cee03cbe732102f5e2233785a7b284..218cb6441eb0e5c6e5b52eabba006163eec0c8b4 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -39,6 +39,16 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
     }
 }
 
+/* PFB */
+
+register!(NV_PFB_NISO_FLUSH_SYSMEM_ADDR @ 0x00100c10 {
+    31:0    adr_39_08 as u32;
+});
+
+register!(NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI @ 0x00100c40 {
+    23:0    adr_63_40 as u32;
+});
+
 /* PGC6 */
 
 register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 14/21] gpu: nova-core: add helper function to wait on condition
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (12 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 13/21] gpu: nova-core: register sysmem flush page Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

While programming the hardware, we frequently need to busy-wait until
a condition (like a given bit of a register to switch value) happens.

Add a basic `wait_on` helper function to wait on such conditions
expressed as a closure, with a timeout argument.

This is temporary as we will switch to `read_poll_timeout` [1] once it
is available.

[1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/util.rs | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/nova-core/util.rs b/drivers/gpu/nova-core/util.rs
index 332a64cfc6a9d7d787fbdc228887c0be53a97160..afb525228431a2645afe7bb34988e9537757b1d7 100644
--- a/drivers/gpu/nova-core/util.rs
+++ b/drivers/gpu/nova-core/util.rs
@@ -1,5 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::time::Duration;
+
+use kernel::prelude::*;
+use kernel::time::Ktime;
+
 pub(crate) const fn to_lowercase_bytes<const N: usize>(s: &str) -> [u8; N] {
     let src = s.as_bytes();
     let mut dst = [0; N];
@@ -19,3 +24,28 @@ pub(crate) const fn const_bytes_to_str(bytes: &[u8]) -> &str {
         Err(_) => kernel::build_error!("Bytes are not valid UTF-8."),
     }
 }
+
+/// Wait until `cond` is true or `timeout` elapsed.
+///
+/// When `cond` evaluates to `Some`, its return value is returned.
+///
+/// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+/// `Some`.
+///
+/// TODO: replace with `read_poll_timeout` once it is available.
+/// (https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/)
+#[expect(dead_code)]
+pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(timeout: Duration, cond: F) -> Result<R> {
+    let start_time = Ktime::ktime_get();
+
+    loop {
+        if let Some(ret) = cond() {
+            return Ok(ret);
+        }
+
+        let cur_time = Ktime::ktime_get();
+        if (cur_time - start_time).to_ns() > timeout.as_nanos() as i64 {
+            return Err(ETIMEDOUT);
+        }
+    }
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (13 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 14/21] gpu: nova-core: add helper function to wait on condition Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 13:52   ` Joel Fernandes
  2025-05-01 12:58 ` [PATCH v2 16/21] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add the common Falcon code and HAL for Ampere GPUs, and instantiate the
GSP and SEC2 Falcons that will be required to boot the GSP.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs           | 546 ++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/falcon/gsp.rs       |  25 ++
 drivers/gpu/nova-core/falcon/hal.rs       |  55 +++
 drivers/gpu/nova-core/falcon/hal/ga102.rs | 115 +++++++
 drivers/gpu/nova-core/falcon/sec2.rs      |   8 +
 drivers/gpu/nova-core/gpu.rs              |  11 +
 drivers/gpu/nova-core/nova_core.rs        |   1 +
 drivers/gpu/nova-core/regs.rs             | 125 +++++++
 drivers/gpu/nova-core/util.rs             |   1 -
 9 files changed, 886 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
new file mode 100644
index 0000000000000000000000000000000000000000..7cae45645e548bab5b85cb53880898cedbae778a
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -0,0 +1,546 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Falcon microprocessor base support
+
+// To be removed when all code is used.
+#![expect(dead_code)]
+
+use core::time::Duration;
+use hal::FalconHal;
+use kernel::bindings;
+use kernel::device;
+use kernel::devres::Devres;
+use kernel::prelude::*;
+use kernel::sync::Arc;
+
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+use crate::regs;
+use crate::util;
+
+pub(crate) mod gsp;
+mod hal;
+pub(crate) mod sec2;
+
+/// Revision number of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
+/// register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRev {
+    #[default]
+    Rev1 = 1,
+    Rev2 = 2,
+    Rev3 = 3,
+    Rev4 = 4,
+    Rev5 = 5,
+    Rev6 = 6,
+    Rev7 = 7,
+}
+
+impl TryFrom<u8> for FalconCoreRev {
+    type Error = Error;
+
+    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
+        use FalconCoreRev::*;
+
+        let rev = match value {
+            1 => Rev1,
+            2 => Rev2,
+            3 => Rev3,
+            4 => Rev4,
+            5 => Rev5,
+            6 => Rev6,
+            7 => Rev7,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(rev)
+    }
+}
+
+/// Revision subversion number of a falcon core, used in the
+/// [`crate::regs::NV_PFALCON_FALCON_HWCFG1`] register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRevSubversion {
+    #[default]
+    Subversion0 = 0,
+    Subversion1 = 1,
+    Subversion2 = 2,
+    Subversion3 = 3,
+}
+
+impl TryFrom<u8> for FalconCoreRevSubversion {
+    type Error = Error;
+
+    fn try_from(value: u8) -> Result<Self> {
+        use FalconCoreRevSubversion::*;
+
+        let sub_version = match value & 0b11 {
+            0 => Subversion0,
+            1 => Subversion1,
+            2 => Subversion2,
+            3 => Subversion3,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(sub_version)
+    }
+}
+
+/// Security model of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
+/// register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone)]
+pub(crate) enum FalconSecurityModel {
+    /// Non-Secure: runs unsigned code without privileges.
+    #[default]
+    None = 0,
+    /// Low-secure: runs unsigned code with some privileges. Can only be entered from `Heavy` mode.
+    Light = 2,
+    /// High-Secure: runs signed code with full privileges.
+    Heavy = 3,
+}
+
+impl TryFrom<u8> for FalconSecurityModel {
+    type Error = Error;
+
+    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
+        use FalconSecurityModel::*;
+
+        let sec_model = match value {
+            0 => None,
+            2 => Light,
+            3 => Heavy,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(sec_model)
+    }
+}
+
+/// Signing algorithm for a given firmware, used in the [`crate::regs::NV_PFALCON2_FALCON_MOD_SEL`]
+/// register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq)]
+pub(crate) enum FalconModSelAlgo {
+    /// RSA3K.
+    #[default]
+    Rsa3k = 1,
+}
+
+impl TryFrom<u8> for FalconModSelAlgo {
+    type Error = Error;
+
+    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
+        match value {
+            1 => Ok(FalconModSelAlgo::Rsa3k),
+            _ => Err(EINVAL),
+        }
+    }
+}
+
+/// Valid values for the `size` field of the [`crate::regs::NV_PFALCON_FALCON_DMATRFCMD`] register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq)]
+pub(crate) enum DmaTrfCmdSize {
+    /// 256 bytes transfer.
+    #[default]
+    Size256B = 0x6,
+}
+
+impl TryFrom<u8> for DmaTrfCmdSize {
+    type Error = Error;
+
+    fn try_from(value: u8) -> Result<Self> {
+        match value {
+            0x6 => Ok(Self::Size256B),
+            _ => Err(EINVAL),
+        }
+    }
+}
+
+/// Currently active core on a dual falcon/riscv (Peregrine) controller.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum PeregrineCoreSelect {
+    /// Falcon core is active.
+    Falcon = 0,
+    /// RISC-V core is active.
+    Riscv = 1,
+}
+
+impl From<bool> for PeregrineCoreSelect {
+    fn from(value: bool) -> Self {
+        match value {
+            false => PeregrineCoreSelect::Falcon,
+            true => PeregrineCoreSelect::Riscv,
+        }
+    }
+}
+
+/// Different types of memory present in a falcon core.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum FalconMem {
+    /// Instruction Memory.
+    Imem,
+    /// Data Memory.
+    Dmem,
+}
+
+/// Target/source of a DMA transfer to/from falcon memory.
+#[derive(Debug, Clone, Default)]
+pub(crate) enum FalconFbifTarget {
+    /// VRAM.
+    #[default]
+    LocalFb = 0,
+    /// Coherent system memory.
+    CoherentSysmem = 1,
+    /// Non-coherent system memory.
+    NoncoherentSysmem = 2,
+}
+
+impl TryFrom<u8> for FalconFbifTarget {
+    type Error = Error;
+
+    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
+        let res = match value {
+            0 => Self::LocalFb,
+            1 => Self::CoherentSysmem,
+            2 => Self::NoncoherentSysmem,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(res)
+    }
+}
+
+/// Type of memory addresses to use.
+#[derive(Debug, Clone, Default)]
+pub(crate) enum FalconFbifMemType {
+    /// Physical memory addresses.
+    #[default]
+    Virtual = 0,
+    /// Virtual memory addresses.
+    Physical = 1,
+}
+
+impl From<bool> for FalconFbifMemType {
+    fn from(value: bool) -> Self {
+        match value {
+            false => Self::Virtual,
+            true => Self::Physical,
+        }
+    }
+}
+
+/// Trait defining the parameters of a given Falcon instance.
+pub(crate) trait FalconEngine: Sync {
+    /// Base I/O address for the falcon, relative from which its registers are accessed.
+    const BASE: usize;
+}
+
+/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
+#[derive(Debug)]
+pub(crate) struct FalconLoadTarget {
+    /// Offset from the start of the source object to copy from.
+    pub(crate) src_start: u32,
+    /// Offset from the start of the destination memory to copy into.
+    pub(crate) dst_start: u32,
+    /// Number of bytes to copy.
+    pub(crate) len: u32,
+}
+
+/// Parameters for the falcon boot ROM.
+#[derive(Debug)]
+pub(crate) struct FalconBromParams {
+    /// Offset in `DMEM`` of the firmware's signature.
+    pub(crate) pkc_data_offset: u32,
+    /// Mask of engines valid for this firmware.
+    pub(crate) engine_id_mask: u16,
+    /// ID of the ucode used to infer a fuse register to validate the signature.
+    pub(crate) ucode_id: u8,
+}
+
+/// Trait for a falcon firmware.
+pub(crate) trait FalconFirmware {
+    /// Engine on which this firmware is to be loaded.
+    type Target: FalconEngine;
+
+    /// Returns the DMA handle of the object containing the firmware.
+    fn dma_handle(&self) -> bindings::dma_addr_t;
+
+    /// Returns the load parameters for `IMEM`.
+    fn imem_load(&self) -> FalconLoadTarget;
+
+    /// Returns the load parameters for `DMEM`.
+    fn dmem_load(&self) -> FalconLoadTarget;
+
+    /// Returns the parameters to write into the BROM registers.
+    fn brom_params(&self) -> FalconBromParams;
+
+    /// Returns the start address of the firmware.
+    fn boot_addr(&self) -> u32;
+}
+
+/// Contains the base parameters common to all Falcon instances.
+pub(crate) struct Falcon<E: FalconEngine> {
+    hal: Arc<dyn FalconHal<E>>,
+}
+
+impl<E: FalconEngine + 'static> Falcon<E> {
+    /// Create a new falcon instance.
+    ///
+    /// `need_riscv` is set to `true` if the caller expects the falcon to be a dual falcon/riscv
+    /// controller.
+    pub(crate) fn new(
+        dev: &device::Device,
+        chipset: Chipset,
+        bar: &Devres<Bar0>,
+        need_riscv: bool,
+    ) -> Result<Self> {
+        let hwcfg1 = with_bar!(bar, |b| regs::NV_PFALCON_FALCON_HWCFG1::read(b, E::BASE))?;
+        // Ensure that the revision and security model contain valid values.
+        let _rev = hwcfg1.core_rev()?;
+        let _sec_model = hwcfg1.security_model()?;
+
+        if need_riscv {
+            let hwcfg2 = with_bar!(bar, |b| regs::NV_PFALCON_FALCON_HWCFG2::read(b, E::BASE))?;
+            if !hwcfg2.riscv() {
+                dev_err!(
+                    dev,
+                    "riscv support requested on a controller that does not support it\n"
+                );
+                return Err(EINVAL);
+            }
+        }
+
+        Ok(Self {
+            hal: hal::create_falcon_hal(chipset)?,
+        })
+    }
+
+    /// Wait for memory scrubbing to complete.
+    fn reset_wait_mem_scrubbing(&self, bar: &Devres<Bar0>) -> Result<()> {
+        util::wait_on(Duration::from_millis(20), || {
+            bar.try_access_with(|b| regs::NV_PFALCON_FALCON_HWCFG2::read(b, E::BASE))
+                .and_then(|r| if r.mem_scrubbing() { Some(()) } else { None })
+        })
+    }
+
+    /// Reset the falcon engine.
+    fn reset_eng(&self, bar: &Devres<Bar0>) -> Result<()> {
+        let _ = with_bar!(bar, |b| regs::NV_PFALCON_FALCON_HWCFG2::read(b, E::BASE))?;
+
+        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
+        // RESET_READY so a non-failing timeout is used.
+        let _ = util::wait_on(Duration::from_micros(150), || {
+            bar.try_access_with(|b| regs::NV_PFALCON_FALCON_HWCFG2::read(b, E::BASE))
+                .and_then(|r| if r.reset_ready() { Some(()) } else { None })
+        });
+
+        with_bar!(bar, |b| regs::NV_PFALCON_FALCON_ENGINE::alter(
+            b,
+            E::BASE,
+            |v| v.set_reset(true)
+        ))?;
+
+        let _: Result<()> = util::wait_on(Duration::from_micros(10), || None);
+
+        with_bar!(bar, |b| regs::NV_PFALCON_FALCON_ENGINE::alter(
+            b,
+            E::BASE,
+            |v| v.set_reset(false)
+        ))?;
+
+        self.reset_wait_mem_scrubbing(bar)?;
+
+        Ok(())
+    }
+
+    /// Reset the controller, select the falcon core, and wait for memory scrubbing to complete.
+    pub(crate) fn reset(&self, bar: &Devres<Bar0>) -> Result<()> {
+        self.reset_eng(bar)?;
+        self.hal.select_core(bar)?;
+        self.reset_wait_mem_scrubbing(bar)?;
+
+        with_bar!(bar, |b| {
+            regs::NV_PFALCON_FALCON_RM::default()
+                .set_value(regs::NV_PMC_BOOT_0::read(b).into())
+                .write(b, E::BASE)
+        })
+    }
+
+    /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
+    /// `target_mem`.
+    ///
+    /// `sec` is set if the loaded firmware is expected to run in secure mode.
+    fn dma_wr(
+        &self,
+        bar: &Devres<Bar0>,
+        dma_handle: bindings::dma_addr_t,
+        target_mem: FalconMem,
+        load_offsets: FalconLoadTarget,
+        sec: bool,
+    ) -> Result<()> {
+        const DMA_LEN: u32 = 256;
+
+        // For IMEM, we want to use the start offset as a virtual address tag for each page, since
+        // code addresses in the firmware (and the boot vector) are virtual.
+        //
+        // For DMEM we can fold the start offset into the DMA handle.
+        let (src_start, dma_start) = match target_mem {
+            FalconMem::Imem => (load_offsets.src_start, dma_handle),
+            FalconMem::Dmem => (
+                0,
+                dma_handle + load_offsets.src_start as bindings::dma_addr_t,
+            ),
+        };
+        if dma_start % DMA_LEN as bindings::dma_addr_t > 0 {
+            dev_err!(
+                bar.as_ref(),
+                "DMA transfer start addresses must be a multiple of {}",
+                DMA_LEN
+            );
+            return Err(EINVAL);
+        }
+        if load_offsets.len % DMA_LEN > 0 {
+            dev_err!(
+                bar.as_ref(),
+                "DMA transfer length must be a multiple of {}",
+                DMA_LEN
+            );
+            return Err(EINVAL);
+        }
+
+        // Set up the base source DMA address.
+        with_bar!(bar, |b| {
+            regs::NV_PFALCON_FALCON_DMATRFBASE::default()
+                .set_base((dma_start >> 8) as u32)
+                .write(b, E::BASE);
+            regs::NV_PFALCON_FALCON_DMATRFBASE1::default()
+                .set_base((dma_start >> 40) as u16)
+                .write(b, E::BASE)
+        })?;
+
+        let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
+            .set_size(DmaTrfCmdSize::Size256B)
+            .set_imem(target_mem == FalconMem::Imem)
+            .set_sec(if sec { 1 } else { 0 });
+
+        for pos in (0..load_offsets.len).step_by(DMA_LEN as usize) {
+            // Perform a transfer of size `DMA_LEN`.
+            with_bar!(bar, |b| {
+                regs::NV_PFALCON_FALCON_DMATRFMOFFS::default()
+                    .set_offs(load_offsets.dst_start + pos)
+                    .write(b, E::BASE);
+                regs::NV_PFALCON_FALCON_DMATRFFBOFFS::default()
+                    .set_offs(src_start + pos)
+                    .write(b, E::BASE);
+                cmd.write(b, E::BASE)
+            })?;
+
+            // Wait for the transfer to complete.
+            util::wait_on(Duration::from_millis(2000), || {
+                bar.try_access_with(|b| regs::NV_PFALCON_FALCON_DMATRFCMD::read(b, E::BASE))
+                    .and_then(|v| if v.idle() { Some(()) } else { None })
+            })?;
+        }
+
+        Ok(())
+    }
+
+    /// Perform a DMA load into `IMEM` and `DMEM` of `fw`, and prepare the falcon to run it.
+    pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(
+        &self,
+        bar: &Devres<Bar0>,
+        fw: &F,
+    ) -> Result<()> {
+        let dma_handle = fw.dma_handle();
+
+        with_bar!(bar, |b| {
+            regs::NV_PFALCON_FBIF_CTL::alter(b, E::BASE, |v| v.set_allow_phys_no_ctx(true));
+            regs::NV_PFALCON_FALCON_DMACTL::default().write(b, E::BASE);
+            regs::NV_PFALCON_FBIF_TRANSCFG::alter(b, E::BASE, |v| {
+                v.set_target(FalconFbifTarget::CoherentSysmem)
+                    .set_mem_type(FalconFbifMemType::Physical)
+            });
+        })?;
+
+        self.dma_wr(bar, dma_handle, FalconMem::Imem, fw.imem_load(), true)?;
+        self.dma_wr(bar, dma_handle, FalconMem::Dmem, fw.dmem_load(), true)?;
+
+        self.hal.program_brom(bar, &fw.brom_params())?;
+
+        with_bar!(bar, |b| {
+            // Set `BootVec` to start of non-secure code.
+            regs::NV_PFALCON_FALCON_BOOTVEC::default()
+                .set_value(fw.boot_addr())
+                .write(b, E::BASE);
+        })?;
+
+        Ok(())
+    }
+
+    /// Start running the loaded firmware.
+    ///
+    /// `mbox0` and `mbox1` are optional parameters to write into the `MBOX0` and `MBOX1` registers
+    /// prior to running.
+    ///
+    /// Returns `MBOX0` and `MBOX1` after the firmware has stopped running.
+    pub(crate) fn boot(
+        &self,
+        bar: &Devres<Bar0>,
+        mbox0: Option<u32>,
+        mbox1: Option<u32>,
+    ) -> Result<(u32, u32)> {
+        with_bar!(bar, |b| {
+            if let Some(mbox0) = mbox0 {
+                regs::NV_PFALCON_FALCON_MAILBOX0::default()
+                    .set_value(mbox0)
+                    .write(b, E::BASE);
+            }
+
+            if let Some(mbox1) = mbox1 {
+                regs::NV_PFALCON_FALCON_MAILBOX1::default()
+                    .set_value(mbox1)
+                    .write(b, E::BASE);
+            }
+
+            match regs::NV_PFALCON_FALCON_CPUCTL::read(b, E::BASE).alias_en() {
+                true => regs::NV_PFALCON_FALCON_CPUCTL_ALIAS::default()
+                    .set_startcpu(true)
+                    .write(b, E::BASE),
+                false => regs::NV_PFALCON_FALCON_CPUCTL::default()
+                    .set_startcpu(true)
+                    .write(b, E::BASE),
+            }
+        })?;
+
+        util::wait_on(Duration::from_secs(2), || {
+            bar.try_access()
+                .map(|b| regs::NV_PFALCON_FALCON_CPUCTL::read(&*b, E::BASE))
+                .and_then(|v| if v.halted() { Some(()) } else { None })
+        })?;
+
+        let (mbox0, mbox1) = with_bar!(bar, |b| {
+            let mbox0 = regs::NV_PFALCON_FALCON_MAILBOX0::read(b, E::BASE).value();
+            let mbox1 = regs::NV_PFALCON_FALCON_MAILBOX1::read(b, E::BASE).value();
+
+            (mbox0, mbox1)
+        })?;
+
+        Ok((mbox0, mbox1))
+    }
+
+    /// Returns the fused version of the signature to use in order to run a HS firmware on this
+    /// falcon instance. `engine_id_mask` and `ucode_id` are obtained from the firmware header.
+    pub(crate) fn get_signature_reg_fuse_version(
+        &self,
+        bar: &Devres<Bar0>,
+        engine_id_mask: u16,
+        ucode_id: u8,
+    ) -> Result<u32> {
+        self.hal
+            .get_signature_reg_fuse_version(bar, engine_id_mask, ucode_id)
+    }
+}
diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..008e672315df5def72caf930403ef7487c46b525
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/gsp.rs
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::{
+    driver::Bar0,
+    falcon::{Falcon, FalconEngine},
+    regs,
+};
+
+pub(crate) struct Gsp;
+impl FalconEngine for Gsp {
+    const BASE: usize = 0x00110000;
+}
+
+impl Falcon<Gsp> {
+    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
+    /// allow GSP to signal CPU for processing new messages in message queue.
+    pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
+        with_bar!(bar, |b| regs::NV_PFALCON_FALCON_IRQSCLR::default()
+            .set_swgen0(true)
+            .write(b, Gsp::BASE))
+    }
+}
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
new file mode 100644
index 0000000000000000000000000000000000000000..0d21a6fc52786d99f3e71e5d72f4dcd0be7d4b38
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+use kernel::sync::Arc;
+
+use crate::driver::Bar0;
+use crate::falcon::{FalconBromParams, FalconEngine};
+use crate::gpu::Chipset;
+
+mod ga102;
+
+/// Hardware Abstraction Layer for Falcon cores.
+///
+/// Implements chipset-specific low-level operations. The trait is generic against [`FalconEngine`]
+/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
+/// registers.
+pub(crate) trait FalconHal<E: FalconEngine>: Sync {
+    // Activates the Falcon core if the engine is a risvc/falcon dual engine.
+    fn select_core(&self, _bar: &Devres<Bar0>) -> Result<()> {
+        Ok(())
+    }
+
+    /// Returns the fused version of the signature to use in order to run a HS firmware on this
+    /// falcon instance. `engine_id_mask` and `ucode_id` are obtained from the firmware header.
+    fn get_signature_reg_fuse_version(
+        &self,
+        bar: &Devres<Bar0>,
+        engine_id_mask: u16,
+        ucode_id: u8,
+    ) -> Result<u32>;
+
+    // Program the boot ROM registers prior to starting a secure firmware.
+    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
+}
+
+/// Returns a boxed falcon HAL adequate for the passed `chipset`.
+///
+/// We use this function and a heap-allocated trait object instead of statically defined trait
+/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
+/// requested HAL.
+///
+/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
+pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
+    chipset: Chipset,
+) -> Result<Arc<dyn FalconHal<E>>> {
+    let hal = match chipset {
+        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
+            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>
+        }
+        _ => return Err(ENOTSUPP),
+    };
+
+    Ok(hal)
+}
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
new file mode 100644
index 0000000000000000000000000000000000000000..d4177ad19a2589c12b1173a3e89c26ccc676142c
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::marker::PhantomData;
+use core::time::Duration;
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::falcon::{FalconBromParams, FalconEngine, FalconModSelAlgo, PeregrineCoreSelect};
+use crate::regs;
+use crate::util;
+
+use super::FalconHal;
+
+fn select_core_ga102<E: FalconEngine>(bar: &Devres<Bar0>) -> Result<()> {
+    let bcr_ctrl = with_bar!(bar, |b| regs::NV_PRISCV_RISCV_BCR_CTRL::read(b, E::BASE))?;
+    if bcr_ctrl.core_select() != PeregrineCoreSelect::Falcon {
+        with_bar!(bar, |b| regs::NV_PRISCV_RISCV_BCR_CTRL::default()
+            .set_core_select(PeregrineCoreSelect::Falcon)
+            .write(b, E::BASE))?;
+
+        util::wait_on(Duration::from_millis(10), || {
+            bar.try_access_with(|b| regs::NV_PRISCV_RISCV_BCR_CTRL::read(b, E::BASE))
+                .and_then(|v| if v.valid() { Some(()) } else { None })
+        })?;
+    }
+
+    Ok(())
+}
+
+fn get_signature_reg_fuse_version_ga102(
+    bar: &Devres<Bar0>,
+    engine_id_mask: u16,
+    ucode_id: u8,
+) -> Result<u32> {
+    // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
+    // registers, which are an array. Our register definition macros do not allow us to manage them
+    // properly, so we need to hardcode their addresses for now.
+
+    // Each engine has 16 ucode version registers numbered from 1 to 16.
+    if ucode_id == 0 || ucode_id > 16 {
+        dev_warn!(bar.as_ref(), "invalid ucode id {:#x}", ucode_id);
+        return Err(EINVAL);
+    }
+    let reg_fuse = if engine_id_mask & 0x0001 != 0 {
+        // NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION
+        0x824140
+    } else if engine_id_mask & 0x0004 != 0 {
+        // NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION
+        0x824100
+    } else if engine_id_mask & 0x0400 != 0 {
+        // NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION
+        0x8241c0
+    } else {
+        dev_warn!(
+            bar.as_ref(),
+            "unexpected engine_id_mask {:#x}",
+            engine_id_mask
+        );
+        return Err(EINVAL);
+    } + ((ucode_id - 1) as usize * core::mem::size_of::<u32>());
+
+    let reg_fuse_version = with_bar!(bar, |b| { b.read32(reg_fuse) })?;
+
+    // Equivalent of Find Last Set bit.
+    Ok(u32::BITS - reg_fuse_version.leading_zeros())
+}
+
+fn program_brom_ga102<E: FalconEngine>(
+    bar: &Devres<Bar0>,
+    params: &FalconBromParams,
+) -> Result<()> {
+    with_bar!(bar, |b| {
+        regs::NV_PFALCON2_FALCON_BROM_PARAADDR::default()
+            .set_value(params.pkc_data_offset)
+            .write(b, E::BASE);
+        regs::NV_PFALCON2_FALCON_BROM_ENGIDMASK::default()
+            .set_value(params.engine_id_mask as u32)
+            .write(b, E::BASE);
+        regs::NV_PFALCON2_FALCON_BROM_CURR_UCODE_ID::default()
+            .set_ucode_id(params.ucode_id)
+            .write(b, E::BASE);
+        regs::NV_PFALCON2_FALCON_MOD_SEL::default()
+            .set_algo(FalconModSelAlgo::Rsa3k)
+            .write(b, E::BASE)
+    })
+}
+
+pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
+
+impl<E: FalconEngine> Ga102<E> {
+    pub(super) fn new() -> Self {
+        Self(PhantomData)
+    }
+}
+
+impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
+    fn select_core(&self, bar: &Devres<Bar0>) -> Result<()> {
+        select_core_ga102::<E>(bar)
+    }
+
+    fn get_signature_reg_fuse_version(
+        &self,
+        bar: &Devres<Bar0>,
+        engine_id_mask: u16,
+        ucode_id: u8,
+    ) -> Result<u32> {
+        get_signature_reg_fuse_version_ga102(bar, engine_id_mask, ucode_id)
+    }
+
+    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()> {
+        program_brom_ga102::<E>(bar, params)
+    }
+}
diff --git a/drivers/gpu/nova-core/falcon/sec2.rs b/drivers/gpu/nova-core/falcon/sec2.rs
new file mode 100644
index 0000000000000000000000000000000000000000..c1efdaa7c4e1b8c04c4e041aae3b61a8b65f656b
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/sec2.rs
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use crate::falcon::FalconEngine;
+
+pub(crate) struct Sec2;
+impl FalconEngine for Sec2 {
+    const BASE: usize = 0x00840000;
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 08302967375274fd88bb9b4fef4969e79e82f3b4..48e4546da603c15c7426fbfd72677f23301f54d2 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -5,6 +5,7 @@
 use crate::devinit;
 use crate::dma::DmaObject;
 use crate::driver::Bar0;
+use crate::falcon::{gsp::Gsp, sec2::Sec2, Falcon};
 use crate::firmware::Firmware;
 use crate::regs;
 use crate::util;
@@ -228,6 +229,16 @@ pub(crate) fn new(
             page
         };
 
+        let gsp_falcon = Falcon::<Gsp>::new(
+            pdev.as_ref(),
+            spec.chipset,
+            &bar,
+            spec.chipset > Chipset::GA100,
+        )?;
+        gsp_falcon.clear_swgen0_intr(&bar)?;
+
+        let _sec2_falcon = Falcon::<Sec2>::new(pdev.as_ref(), spec.chipset, &bar, true)?;
+
         Ok(pin_init!(Self {
             spec,
             bar,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 37c7eb0ea7a926bee4e3c661028847291bf07fa2..f70b48a0a28f8c73b4f6e5223528138eda39f597 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -23,6 +23,7 @@ macro_rules! with_bar {
 mod devinit;
 mod dma;
 mod driver;
+mod falcon;
 mod firmware;
 mod gpu;
 mod regs;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 218cb6441eb0e5c6e5b52eabba006163eec0c8b4..b5c6eeb6ed873a06b4aefcb375f4944eb0b20597 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -9,6 +9,10 @@
 #[macro_use]
 mod macros;
 
+use crate::falcon::{
+    DmaTrfCmdSize, FalconCoreRev, FalconCoreRevSubversion, FalconFbifMemType, FalconFbifTarget,
+    FalconModSelAlgo, FalconSecurityModel, PeregrineCoreSelect,
+};
 use crate::gpu::{Architecture, Chipset};
 
 /* PMC */
@@ -59,3 +63,124 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
 register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05 @ 0x00118234 {
     31:0    value as u32;
 });
+
+/* PFALCON */
+
+register!(NV_PFALCON_FALCON_IRQSCLR @ +0x00000004 {
+    4:4     halt as bool;
+    6:6     swgen0 as bool;
+});
+
+register!(NV_PFALCON_FALCON_MAILBOX0 @ +0x00000040 {
+    31:0    value as u32;
+});
+
+register!(NV_PFALCON_FALCON_MAILBOX1 @ +0x00000044 {
+    31:0    value as u32;
+});
+
+register!(NV_PFALCON_FALCON_RM @ +0x00000084 {
+    31:0    value as u32;
+});
+
+register!(NV_PFALCON_FALCON_HWCFG2 @ +0x000000f4 {
+    10:10   riscv as bool;
+    12:12   mem_scrubbing as bool;
+    31:31   reset_ready as bool, "Signal indicating that reset is completed (GA102+)";
+});
+
+register!(NV_PFALCON_FALCON_CPUCTL @ +0x00000100 {
+    1:1     startcpu as bool;
+    4:4     halted as bool;
+    6:6     alias_en as bool;
+});
+
+register!(NV_PFALCON_FALCON_BOOTVEC @ +0x00000104 {
+    31:0    value as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMACTL @ +0x0000010c {
+    0:0     require_ctx as bool;
+    1:1     dmem_scrubbing as bool;
+    2:2     imem_scrubbing as bool;
+    6:3     dmaq_num as u8;
+    7:7     secure_stat as bool;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFBASE @ +0x00000110 {
+    31:0    base as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFMOFFS @ +0x00000114 {
+    23:0    offs as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFCMD @ +0x00000118 {
+    0:0     full as bool;
+    1:1     idle as bool;
+    3:2     sec as u8;
+    4:4     imem as bool;
+    5:5     is_write as bool;
+    10:8    size as u8 ?=> DmaTrfCmdSize;
+    14:12   ctxdma as u8;
+    16:16   set_dmtag as u8;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFFBOFFS @ +0x0000011c {
+    31:0    offs as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFBASE1 @ +0x00000128 {
+    8:0     base as u16;
+});
+
+register!(NV_PFALCON_FALCON_HWCFG1 @ +0x0000012c {
+    3:0     core_rev as u8 ?=> FalconCoreRev, "Core revision";
+    5:4     security_model as u8 ?=> FalconSecurityModel, "Security model";
+    7:6     core_rev_subversion as u8 ?=> FalconCoreRevSubversion, "Core revision subversion";
+});
+
+register!(NV_PFALCON_FALCON_CPUCTL_ALIAS @ +0x00000130 {
+    1:1     startcpu as bool;
+});
+
+// Actually known as `NV_PSEC_FALCON_ENGINE` and `NV_PGSP_FALCON_ENGINE` depending on the falcon
+// instance.
+register!(NV_PFALCON_FALCON_ENGINE @ +0x000003c0 {
+    0:0     reset as bool;
+});
+
+// TODO: this is an array of registers.
+register!(NV_PFALCON_FBIF_TRANSCFG @ +0x00000600 {
+    1:0     target as u8 ?=> FalconFbifTarget;
+    2:2     mem_type as bool => FalconFbifMemType;
+});
+
+register!(NV_PFALCON_FBIF_CTL @ +0x00000624 {
+    7:7     allow_phys_no_ctx as bool;
+});
+
+register!(NV_PFALCON2_FALCON_MOD_SEL @ +0x00001180 {
+    7:0     algo as u8 ?=> FalconModSelAlgo;
+});
+
+register!(NV_PFALCON2_FALCON_BROM_CURR_UCODE_ID @ +0x00001198 {
+    7:0    ucode_id as u8;
+});
+
+register!(NV_PFALCON2_FALCON_BROM_ENGIDMASK @ +0x0000119c {
+    31:0    value as u32;
+});
+
+// TODO: this is an array of registers.
+register!(NV_PFALCON2_FALCON_BROM_PARAADDR @ +0x00001210 {
+    31:0    value as u32;
+});
+
+/* PRISCV */
+
+register!(NV_PRISCV_RISCV_BCR_CTRL @ +0x00001668 {
+    0:0     valid as bool;
+    4:4     core_select as bool => PeregrineCoreSelect;
+    8:8     br_fetch as bool;
+});
diff --git a/drivers/gpu/nova-core/util.rs b/drivers/gpu/nova-core/util.rs
index afb525228431a2645afe7bb34988e9537757b1d7..81fcfff1f6f437d2f6a2130ce2249fbf4c1501be 100644
--- a/drivers/gpu/nova-core/util.rs
+++ b/drivers/gpu/nova-core/util.rs
@@ -34,7 +34,6 @@ pub(crate) const fn const_bytes_to_str(bytes: &[u8]) -> &str {
 ///
 /// TODO: replace with `read_poll_timeout` once it is available.
 /// (https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/)
-#[expect(dead_code)]
 pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(timeout: Duration, cond: F) -> Result<R> {
     let start_time = Ktime::ktime_get();
 

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 16/21] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (14 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize Alexandre Courbot
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

FWSEC-FRTS is the first firmware we need to run on the GSP falcon in
order to initiate the GSP boot process. Introduce the structure that
describes it.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 43 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index cb79d039948858e657c9a23a62ed27ff780ac169..1eb216307cd01d975b3d5beda1dc516f34b4b3f2 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -45,6 +45,49 @@ pub(crate) fn new(
     }
 }
 
+/// Structure used to describe some firmwares, notably FWSEC-FRTS.
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct FalconUCodeDescV3 {
+    /// Header defined by `NV_BIT_FALCON_UCODE_DESC_HEADER_VDESC*` in OpenRM.
+    ///
+    /// Bits `31:16` contain the size of the header, after which the actual ucode data starts.
+    hdr: u32,
+    /// Stored size of the ucode after the header.
+    stored_size: u32,
+    /// Offset in `DMEM` at which the signature is expected to be found.
+    pub(crate) pkc_data_offset: u32,
+    /// Offset after the code segment at which the app headers are located.
+    pub(crate) interface_offset: u32,
+    /// Base address at which to load the code segment into `IMEM`.
+    pub(crate) imem_phys_base: u32,
+    /// Size in bytes of the code to copy into `IMEM`.
+    pub(crate) imem_load_size: u32,
+    /// Virtual `IMEM` address (i.e. `tag`) at which the code should start.
+    pub(crate) imem_virt_base: u32,
+    /// Base address at which to load the data segment into `DMEM`.
+    pub(crate) dmem_phys_base: u32,
+    /// Size in bytes of the data to copy into `DMEM`.
+    pub(crate) dmem_load_size: u32,
+    /// Mask of the falcon engines on which this firmware can run.
+    pub(crate) engine_id_mask: u16,
+    /// ID of the ucode used to infer a fuse register to validate the signature.
+    pub(crate) ucode_id: u8,
+    /// Number of signatures in this firmware.
+    pub(crate) signature_count: u8,
+    /// Versions of the signatures, used to infer a valid signature to use.
+    pub(crate) signature_versions: u16,
+    _reserved: u16,
+}
+
+// To be removed once that code is used.
+#[expect(dead_code)]
+impl FalconUCodeDescV3 {
+    pub(crate) fn size(&self) -> usize {
+        ((self.hdr & 0xffff0000) >> 16) as usize
+    }
+}
+
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
 impl<const N: usize> ModInfoBuilder<N> {

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (15 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 16/21] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 15:19   ` Timur Tabi
  2025-05-02  4:57   ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 18/21] nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
                   ` (3 subsequent siblings)
  20 siblings, 2 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

From: Joel Fernandes <joelagnelf@nvidia.com>

This will be used in the nova-core driver where we need to upward-align
the image size to get to the next image in the VBIOS ROM.

[acourbot@nvidia.com: handled conflicts due to removal of patch creating
num.rs]

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/num.rs | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index ab0286857061d2de1be0279cbd2cd3490e5a48c3..be75b196aa7a29cf3eed7c902ed8fb98689bbb50 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -67,6 +67,7 @@
 pub mod miscdevice;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod num;
 pub mod of;
 pub mod page;
 #[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..953c6ab012601efb3c9387b4b299e22233670c4b
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// A trait providing alignment operations for `usize`.
+pub trait UsizeAlign {
+    /// Aligns `self` upwards to the nearest multiple of `align`.
+    fn align_up(self, align: usize) -> usize;
+}
+
+impl UsizeAlign for usize {
+    fn align_up(mut self, align: usize) -> usize {
+        self = (self + align - 1) & !(align - 1);
+        self
+    }
+}
+
+/// Aligns `val` upwards to the nearest multiple of `align`.
+pub const fn usize_align_up(val: usize, align: usize) -> usize {
+    (val + align - 1) & !(align - 1)
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 18/21] nova-core: Add support for VBIOS ucode extraction for boot
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (16 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 19/21] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot, Shirish Baskaran

From: Joel Fernandes <joelagnelf@nvidia.com>

Add support for navigating and setting up vBIOS ucode data required for
GSP to boot. The main data extracted from the vBIOS is the FWSEC-FRTS
firmware which runs on the GSP processor. This firmware runs in high
secure mode, and sets up the WPR2 (Write protected region) before the
Booter runs on the SEC2 processor.

Also add log messages to show the BIOS images.

[102141.013287] NovaCore: Found BIOS image at offset 0x0, size: 0xfe00, type: PciAt
[102141.080692] NovaCore: Found BIOS image at offset 0xfe00, size: 0x14800, type: Efi
[102141.098443] NovaCore: Found BIOS image at offset 0x24600, size: 0x5600, type: FwSec
[102141.415095] NovaCore: Found BIOS image at offset 0x29c00, size: 0x60800, type: FwSec

Tested on my Ampere GA102 and boot is successful.

[applied changes by Alex Courbot for fwsec signatures]
[applied feedback from Alex Courbot and Timur Tabi]
[applied changes related to code reorg, prints etc from Danilo Krummrich]
[acourbot@nvidia.com: fix clippy warnings]

Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shirish Baskaran <sbaskaran@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs  |    2 -
 drivers/gpu/nova-core/gpu.rs       |    3 +
 drivers/gpu/nova-core/nova_core.rs |    1 +
 drivers/gpu/nova-core/vbios.rs     | 1152 ++++++++++++++++++++++++++++++++++++
 4 files changed, 1156 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 1eb216307cd01d975b3d5beda1dc516f34b4b3f2..960982174d834c7c66a47ecfb3a15bf47116b2c5 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -80,8 +80,6 @@ pub(crate) struct FalconUCodeDescV3 {
     _reserved: u16,
 }
 
-// To be removed once that code is used.
-#[expect(dead_code)]
 impl FalconUCodeDescV3 {
     pub(crate) fn size(&self) -> usize {
         ((self.hdr & 0xffff0000) >> 16) as usize
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 48e4546da603c15c7426fbfd72677f23301f54d2..2adef119618b05ef7e397e10dfeda1228f4521c2 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -9,6 +9,7 @@
 use crate::firmware::Firmware;
 use crate::regs;
 use crate::util;
+use crate::vbios::Vbios;
 use core::fmt;
 
 macro_rules! define_chipset {
@@ -239,6 +240,8 @@ pub(crate) fn new(
 
         let _sec2_falcon = Falcon::<Sec2>::new(pdev.as_ref(), spec.chipset, &bar, true)?;
 
+        let _bios = Vbios::new(pdev, &bar)?;
+
         Ok(pin_init!(Self {
             spec,
             bar,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index f70b48a0a28f8c73b4f6e5223528138eda39f597..dc5da9409477c62b1d466ed00c17baf3677a6a53 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -28,6 +28,7 @@ macro_rules! with_bar {
 mod gpu;
 mod regs;
 mod util;
+mod vbios;
 
 kernel::module_pci_driver! {
     type: driver::NovaCore,
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
new file mode 100644
index 0000000000000000000000000000000000000000..719752fc54f1ec4b8dcc9a3c044a2925a187dd2a
--- /dev/null
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -0,0 +1,1152 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! VBIOS extraction and parsing.
+
+// To be removed when all code is used.
+#![expect(dead_code)]
+
+use crate::driver::Bar0;
+use crate::firmware::FalconUCodeDescV3;
+use core::convert::TryFrom;
+use kernel::device;
+use kernel::devres::Devres;
+use kernel::error::Result;
+use kernel::num::UsizeAlign;
+use kernel::pci;
+use kernel::prelude::*;
+
+/// The offset of the VBIOS ROM in the BAR0 space.
+const ROM_OFFSET: usize = 0x300000;
+/// The maximum length of the VBIOS ROM to scan into.
+const BIOS_MAX_SCAN_LEN: usize = 0x100000;
+/// The size to read ahead when parsing initial BIOS image headers.
+const BIOS_READ_AHEAD_SIZE: usize = 1024;
+/// The bit in the last image indicator byte for the PCI Data Structure that
+/// indicates the last image. Bit 0-6 are reserved, bit 7 is last image bit.
+const LAST_IMAGE_BIT_MASK: u8 = 0x80;
+
+// PMU lookup table entry types. Used to locate PMU table entries
+// in the Fwsec image, corresponding to falcon ucodes.
+#[expect(dead_code)]
+const FALCON_UCODE_ENTRY_APPID_FIRMWARE_SEC_LIC: u8 = 0x05;
+#[expect(dead_code)]
+const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
+const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;
+
+/// Vbios Reader for constructing the VBIOS data
+struct VbiosIterator<'a> {
+    pdev: &'a pci::Device,
+    bar0: &'a Devres<Bar0>,
+    // VBIOS data vector: As BIOS images are scanned, they are added to this vector
+    // for reference or copying into other data structures. It is the entire
+    // scanned contents of the VBIOS which progressively extends. It is used
+    // so that we do not re-read any contents that are already read as we use
+    // the cumulative length read so far, and re-read any gaps as we extend
+    // the length.
+    data: KVec<u8>,
+    current_offset: usize, // Current offset for iterator
+    last_found: bool,      // Whether the last image has been found
+}
+
+impl<'a> VbiosIterator<'a> {
+    fn new(pdev: &'a pci::Device, bar0: &'a Devres<Bar0>) -> Result<Self> {
+        Ok(Self {
+            pdev,
+            bar0,
+            data: KVec::new(),
+            current_offset: 0,
+            last_found: false,
+        })
+    }
+
+    /// Read bytes from the ROM at the current end of the data vector
+    fn read_more(&mut self, len: usize) -> Result {
+        let current_len = self.data.len();
+        let start = ROM_OFFSET + current_len;
+
+        // Ensure length is a multiple of 4 for 32-bit reads
+        if len % core::mem::size_of::<u32>() != 0 {
+            dev_err!(
+                self.pdev.as_ref(),
+                "VBIOS read length {} is not a multiple of 4\n",
+                len
+            );
+            return Err(EINVAL);
+        }
+
+        self.data.reserve(len, GFP_KERNEL)?;
+        with_bar!(? self.bar0, |bar0_ref| {
+            // Read ROM data bytes and push directly to vector
+            for i in 0..len {
+                // Read 32-bit word from the VBIOS ROM
+                let rom_addr = start + i * core::mem::size_of::<u32>();
+                let word = bar0_ref.try_read32(rom_addr)?;
+
+                // Convert the u32 to a 4 byte array and push each byte
+                word.to_ne_bytes()
+                    .iter()
+                    .try_for_each(|&b| self.data.push(b, GFP_KERNEL))?;
+            }
+            Ok(())
+        })?;
+
+        Ok(())
+    }
+
+    /// Read bytes at a specific offset, filling any gap
+    fn read_more_at_offset(&mut self, offset: usize, len: usize) -> Result {
+        if offset > BIOS_MAX_SCAN_LEN {
+            dev_err!(self.pdev.as_ref(), "Error: exceeded BIOS scan limit.\n");
+            return Err(EINVAL);
+        }
+
+        // If offset is beyond current data size, fill the gap first
+        let current_len = self.data.len();
+        let gap_bytes = offset.saturating_sub(current_len);
+
+        // Now read the requested bytes at the offset
+        self.read_more(gap_bytes + len)
+    }
+
+    /// Read a BIOS image at a specific offset and create a BiosImage from it.
+    /// self.data is extended as needed and a new BiosImage is returned.
+    /// @context is a string describing the operation for error reporting
+    fn read_bios_image_at_offset(
+        &mut self,
+        offset: usize,
+        len: usize,
+        context: &str,
+    ) -> Result<BiosImage> {
+        let data_len = self.data.len();
+        if offset + len > data_len {
+            self.read_more_at_offset(offset, len).inspect_err(|e| {
+                dev_err!(
+                    self.pdev.as_ref(),
+                    "Failed to read more at offset {:#x}: {:?}\n",
+                    offset,
+                    e
+                )
+            })?;
+        }
+
+        BiosImage::new(self.pdev, &self.data[offset..offset + len]).inspect_err(|err| {
+            dev_err!(
+                self.pdev.as_ref(),
+                "Failed to {} at offset {:#x}: {:?}\n",
+                context,
+                offset,
+                err
+            )
+        })
+    }
+}
+
+impl<'a> Iterator for VbiosIterator<'a> {
+    type Item = Result<BiosImage>;
+
+    /// Iterate over all VBIOS images until the last image is detected or offset
+    /// exceeds scan limit.
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.last_found {
+            return None;
+        }
+
+        if self.current_offset > BIOS_MAX_SCAN_LEN {
+            dev_err!(
+                self.pdev.as_ref(),
+                "Error: exceeded BIOS scan limit, stopping scan\n"
+            );
+            return None;
+        }
+
+        // Parse image headers first to get image size
+        let image_size = match self
+            .read_bios_image_at_offset(
+                self.current_offset,
+                BIOS_READ_AHEAD_SIZE,
+                "parse initial BIOS image headers",
+            )
+            .and_then(|image| image.image_size_bytes())
+        {
+            Ok(size) => size,
+            Err(e) => return Some(Err(e)),
+        };
+
+        // Now create a new BiosImage with the full image data
+        let full_image = match self.read_bios_image_at_offset(
+            self.current_offset,
+            image_size,
+            "parse full BIOS image",
+        ) {
+            Ok(image) => image,
+            Err(e) => return Some(Err(e)),
+        };
+
+        self.last_found = full_image.is_last();
+
+        // Advance to next image (aligned to 512 bytes)
+        self.current_offset += image_size;
+        self.current_offset.align_up(512);
+
+        Some(Ok(full_image))
+    }
+}
+
+pub(crate) struct Vbios {
+    pub fwsec_image: Option<FwSecBiosImage>,
+}
+
+impl Vbios {
+    /// Probe for VBIOS extraction
+    /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
+    pub(crate) fn new(pdev: &pci::Device, bar0: &Devres<Bar0>) -> Result<Vbios> {
+        // Images to extract from iteration
+        let mut pci_at_image: Option<PciAtBiosImage> = None;
+        let mut first_fwsec_image: Option<FwSecBiosImage> = None;
+        let mut second_fwsec_image: Option<FwSecBiosImage> = None;
+
+        // Parse all VBIOS images in the ROM
+        for image_result in VbiosIterator::new(pdev, bar0)? {
+            let full_image = image_result?;
+
+            dev_info!(
+                pdev.as_ref(),
+                "Found BIOS image: size: {:#x}, type: {}, last: {}\n",
+                full_image.image_size_bytes()?,
+                full_image.image_type_str(),
+                full_image.is_last()
+            );
+
+            // Get references to images we will need after the loop, in order to
+            // setup the falcon data offset.
+            match full_image {
+                BiosImage::PciAt(image) => {
+                    pci_at_image = Some(image);
+                }
+                BiosImage::FwSec(image) => {
+                    if first_fwsec_image.is_none() {
+                        first_fwsec_image = Some(image);
+                    } else {
+                        second_fwsec_image = Some(image);
+                    }
+                }
+                // For now we don't need to handle these
+                BiosImage::Efi(_image) => {}
+                BiosImage::Nbsi(_image) => {}
+            }
+        }
+
+        // Using all the images, setup the falcon data pointer in Fwsec.
+        // We need mutable access here, so we handle the Option manually.
+        let final_fwsec_image = {
+            let mut second = second_fwsec_image; // Take ownership of the option
+
+            if let (Some(second), Some(first), Some(pci_at)) =
+                (second.as_mut(), first_fwsec_image, pci_at_image)
+            {
+                second
+                    .setup_falcon_data(pdev, &pci_at, &first)
+                    .inspect_err(|e| {
+                        dev_err!(pdev.as_ref(), "Falcon data setup failed: {:?}\n", e)
+                    })?;
+            } else {
+                dev_err!(
+                    pdev.as_ref(),
+                    "Missing required images for falcon data setup, skipping\n"
+                );
+                return Err(EINVAL);
+            }
+            second
+        };
+
+        Ok(Vbios {
+            fwsec_image: final_fwsec_image,
+        })
+    }
+
+    pub(crate) fn fwsec_header(&self, pdev: &device::Device) -> Result<&FalconUCodeDescV3> {
+        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
+        image.fwsec_header(pdev)
+    }
+
+    pub(crate) fn fwsec_ucode(&self, pdev: &device::Device) -> Result<&[u8]> {
+        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
+        image.fwsec_ucode(pdev, image.fwsec_header(pdev)?)
+    }
+
+    pub(crate) fn fwsec_sigs(&self, pdev: &device::Device) -> Result<&[u8]> {
+        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
+        image.fwsec_sigs(pdev, image.fwsec_header(pdev)?)
+    }
+}
+
+/// PCI Data Structure as defined in PCI Firmware Specification
+#[derive(Debug, Clone)]
+#[repr(C)]
+struct PcirStruct {
+    /// PCI Data Structure signature ("PCIR" or "NPDS")
+    pub signature: [u8; 4],
+    /// PCI Vendor ID (e.g., 0x10DE for NVIDIA)
+    pub vendor_id: u16,
+    /// PCI Device ID
+    pub device_id: u16,
+    /// Device List Pointer
+    pub device_list_ptr: u16,
+    /// PCI Data Structure Length
+    pub pci_data_struct_len: u16,
+    /// PCI Data Structure Revision
+    pub pci_data_struct_rev: u8,
+    /// Class code (3 bytes, 0x03 for display controller)
+    pub class_code: [u8; 3],
+    /// Size of this image in 512-byte blocks
+    pub image_len: u16,
+    /// Revision Level of the Vendor's ROM
+    pub vendor_rom_rev: u16,
+    /// ROM image type (0x00 = PC-AT compatible, 0x03 = EFI, 0x70 = NBSI)
+    pub code_type: u8,
+    /// Last image indicator (0x00 = Not last image, 0x80 = Last image)
+    pub last_image: u8,
+    /// Maximum Run-time Image Length (units of 512 bytes)
+    pub max_runtime_image_len: u16,
+}
+
+impl PcirStruct {
+    fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+        if data.len() < core::mem::size_of::<PcirStruct>() {
+            dev_err!(pdev.as_ref(), "Not enough data for PcirStruct\n");
+            return Err(EINVAL);
+        }
+
+        let mut signature = [0u8; 4];
+        signature.copy_from_slice(&data[0..4]);
+
+        // Signature should be "PCIR" (0x52494350) or "NPDS" (0x5344504e)
+        if &signature != b"PCIR" && &signature != b"NPDS" {
+            dev_err!(
+                pdev.as_ref(),
+                "Invalid signature for PcirStruct: {:?}\n",
+                signature
+            );
+            return Err(EINVAL);
+        }
+
+        let mut class_code = [0u8; 3];
+        class_code.copy_from_slice(&data[13..16]);
+
+        Ok(PcirStruct {
+            signature,
+            vendor_id: u16::from_le_bytes([data[4], data[5]]),
+            device_id: u16::from_le_bytes([data[6], data[7]]),
+            device_list_ptr: u16::from_le_bytes([data[8], data[9]]),
+            pci_data_struct_len: u16::from_le_bytes([data[10], data[11]]),
+            pci_data_struct_rev: data[12],
+            class_code,
+            image_len: u16::from_le_bytes([data[16], data[17]]),
+            vendor_rom_rev: u16::from_le_bytes([data[18], data[19]]),
+            code_type: data[20],
+            last_image: data[21],
+            max_runtime_image_len: u16::from_le_bytes([data[22], data[23]]),
+        })
+    }
+
+    /// Check if this is the last image in the ROM
+    fn is_last(&self) -> bool {
+        self.last_image & LAST_IMAGE_BIT_MASK != 0
+    }
+
+    /// Calculate image size in bytes
+    fn image_size_bytes(&self) -> Result<usize> {
+        if self.image_len > 0 {
+            // Image size is in 512-byte blocks
+            Ok(self.image_len as usize * 512)
+        } else {
+            Err(EINVAL)
+        }
+    }
+}
+
+/// BIOS Information Table (BIT) Header
+/// This is the head of the BIT table, that is used to locate the Falcon data.
+/// The BIT table (with its header) is in the PciAtBiosImage and the falcon data
+/// it is pointing to is in the FwSecBiosImage.
+#[derive(Debug, Clone, Copy)]
+#[expect(dead_code)]
+struct BitHeader {
+    /// 0h: BIT Header Identifier (BMP=0x7FFF/BIT=0xB8FF)
+    pub id: u16,
+    /// 2h: BIT Header Signature ("BIT\0")
+    pub signature: [u8; 4],
+    /// 6h: Binary Coded Decimal Version, ex: 0x0100 is 1.00.
+    pub bcd_version: u16,
+    /// 8h: Size of BIT Header (in bytes)
+    pub header_size: u8,
+    /// 9h: Size of BIT Tokens (in bytes)
+    pub token_size: u8,
+    /// 10h: Number of token entries that follow
+    pub token_entries: u8,
+    /// 11h: BIT Header Checksum
+    pub checksum: u8,
+}
+
+impl BitHeader {
+    fn new(data: &[u8]) -> Result<Self> {
+        if data.len() < 12 {
+            return Err(EINVAL);
+        }
+
+        let mut signature = [0u8; 4];
+        signature.copy_from_slice(&data[2..6]);
+
+        // Check header ID and signature
+        let id = u16::from_le_bytes([data[0], data[1]]);
+        if id != 0xB8FF || &signature != b"BIT\0" {
+            return Err(EINVAL);
+        }
+
+        Ok(BitHeader {
+            id,
+            signature,
+            bcd_version: u16::from_le_bytes([data[6], data[7]]),
+            header_size: data[8],
+            token_size: data[9],
+            token_entries: data[10],
+            checksum: data[11],
+        })
+    }
+}
+
+/// BIT Token Entry: Records in the BIT table followed by the BIT header
+#[derive(Debug, Clone, Copy)]
+#[expect(dead_code)]
+struct BitToken {
+    /// 00h: Token identifier
+    pub id: u8,
+    /// 01h: Version of the token data
+    pub data_version: u8,
+    /// 02h: Size of token data in bytes
+    pub data_size: u16,
+    /// 04h: Offset to the token data
+    pub data_offset: u16,
+}
+
+// Define the token ID for the Falcon data
+pub(in crate::vbios) const BIT_TOKEN_ID_FALCON_DATA: u8 = 0x70;
+
+impl BitToken {
+    /// Find a BIT token entry by BIT ID in a PciAtBiosImage
+    pub(in crate::vbios) fn from_id(image: &PciAtBiosImage, token_id: u8) -> Result<Self> {
+        let header = image.bit_header.as_ref().ok_or(EINVAL)?;
+
+        // Offset to the first token entry
+        let tokens_start = image.bit_offset.ok_or(EINVAL)? + header.header_size as usize;
+
+        for i in 0..header.token_entries as usize {
+            let entry_offset = tokens_start + (i * header.token_size as usize);
+
+            // Make sure we don't go out of bounds
+            if entry_offset + header.token_size as usize > image.base.data.len() {
+                return Err(EINVAL);
+            }
+
+            // Check if this token has the requested ID
+            if image.base.data[entry_offset] == token_id {
+                return Ok(BitToken {
+                    id: image.base.data[entry_offset],
+                    data_version: image.base.data[entry_offset + 1],
+                    data_size: u16::from_le_bytes([
+                        image.base.data[entry_offset + 2],
+                        image.base.data[entry_offset + 3],
+                    ]),
+                    data_offset: u16::from_le_bytes([
+                        image.base.data[entry_offset + 4],
+                        image.base.data[entry_offset + 5],
+                    ]),
+                });
+            }
+        }
+
+        // Token not found
+        Err(ENOENT)
+    }
+}
+
+/// PCI ROM Expansion Header as defined in PCI Firmware Specification.
+/// This is header is at the beginning of every image in the set of
+/// images in the ROM. It contains a pointer to the PCI Data Structure
+/// which describes the image.
+/// For "NBSI" images (NoteBook System Information), the ROM
+/// header deviates from the standard and contains an offset to the
+/// NBSI image however we do not yet parse that in this module and keep
+/// it for future reference.
+#[derive(Debug, Clone, Copy)]
+#[expect(dead_code)]
+struct PciRomHeader {
+    /// 00h: Signature (0xAA55)
+    pub signature: u16,
+    /// 02h: Reserved bytes for processor architecture unique data (20 bytes)
+    pub reserved: [u8; 20],
+    /// 16h: NBSI Data Offset (NBSI-specific, offset from header to NBSI image)
+    pub nbsi_data_offset: Option<u16>,
+    /// 18h: Pointer to PCI Data Structure (offset from start of ROM image)
+    pub pci_data_struct_offset: u16,
+    /// 1Ah: Size of block (this is NBSI-specific)
+    pub size_of_block: Option<u32>,
+}
+
+impl PciRomHeader {
+    fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+        if data.len() < 26 {
+            // Need at least 26 bytes to read pciDataStrucPtr and sizeOfBlock
+            return Err(EINVAL);
+        }
+
+        let signature = u16::from_le_bytes([data[0], data[1]]);
+
+        // Check for valid ROM signatures
+        match signature {
+            0xAA55 | 0xBB77 | 0x4E56 => {}
+            _ => {
+                dev_err!(pdev.as_ref(), "ROM signature unknown {:#x}\n", signature);
+                return Err(EINVAL);
+            }
+        }
+
+        // Read the pointer to the PCI Data Structure at offset 0x18
+        let pci_data_struct_ptr = u16::from_le_bytes([data[24], data[25]]);
+
+        // Try to read optional fields if enough data
+        let mut size_of_block = None;
+        let mut nbsi_data_offset = None;
+
+        if data.len() >= 30 {
+            // Read size_of_block at offset 0x1A
+            size_of_block = Some(
+                (data[29] as u32) << 24
+                    | (data[28] as u32) << 16
+                    | (data[27] as u32) << 8
+                    | (data[26] as u32),
+            );
+        }
+
+        // For NBSI images, try to read the nbsiDataOffset at offset 0x16
+        if data.len() >= 24 {
+            nbsi_data_offset = Some(u16::from_le_bytes([data[22], data[23]]));
+        }
+
+        Ok(PciRomHeader {
+            signature,
+            reserved: [0u8; 20],
+            pci_data_struct_offset: pci_data_struct_ptr,
+            size_of_block,
+            nbsi_data_offset,
+        })
+    }
+}
+
+/// NVIDIA PCI Data Extension Structure. This is similar to the
+/// PCI Data Structure, but is Nvidia-specific and is placed right after
+/// the PCI Data Structure. It contains some fields that are redundant
+/// with the PCI Data Structure, but are needed for traversing the
+/// BIOS images. It is expected to be present in all BIOS images except
+/// for NBSI images.
+#[derive(Debug, Clone)]
+#[expect(dead_code)]
+struct NpdeStruct {
+    /// 00h: Signature ("NPDE")
+    pub signature: [u8; 4],
+    /// 04h: NVIDIA PCI Data Extension Revision
+    pub npci_data_ext_rev: u16,
+    /// 06h: NVIDIA PCI Data Extension Length
+    pub npci_data_ext_len: u16,
+    /// 08h: Sub-image Length (in 512-byte units)
+    pub subimage_len: u16,
+    /// 0Ah: Last image indicator flag
+    pub last_image: u8,
+}
+
+impl NpdeStruct {
+    fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+        if data.len() < 11 {
+            dev_err!(pdev.as_ref(), "Not enough data for NpdeStruct\n");
+            return Err(EINVAL);
+        }
+
+        let mut signature = [0u8; 4];
+        signature.copy_from_slice(&data[0..4]);
+
+        // Signature should be "NPDE" (0x4544504E)
+        if &signature != b"NPDE" {
+            dev_err!(
+                pdev.as_ref(),
+                "Invalid signature for NpdeStruct: {:?}\n",
+                signature
+            );
+            return Err(EINVAL);
+        }
+
+        Ok(NpdeStruct {
+            signature,
+            npci_data_ext_rev: u16::from_le_bytes([data[4], data[5]]),
+            npci_data_ext_len: u16::from_le_bytes([data[6], data[7]]),
+            subimage_len: u16::from_le_bytes([data[8], data[9]]),
+            last_image: data[10],
+        })
+    }
+
+    /// Check if this is the last image in the ROM
+    fn is_last(&self) -> bool {
+        self.last_image & LAST_IMAGE_BIT_MASK != 0
+    }
+
+    /// Calculate image size in bytes
+    fn image_size_bytes(&self) -> Result<usize> {
+        if self.subimage_len > 0 {
+            // Image size is in 512-byte blocks
+            Ok(self.subimage_len as usize * 512)
+        } else {
+            Err(EINVAL)
+        }
+    }
+
+    /// Try to find NPDE in the data, the NPDE is right after the PCIR.
+    fn find_in_data(
+        pdev: &pci::Device,
+        data: &[u8],
+        rom_header: &PciRomHeader,
+        pcir: &PcirStruct,
+    ) -> Option<Self> {
+        // Calculate the offset where NPDE might be located
+        // NPDE should be right after the PCIR structure, aligned to 16 bytes
+        let pcir_offset = rom_header.pci_data_struct_offset as usize;
+        let npde_start = (pcir_offset + pcir.pci_data_struct_len as usize + 0x0F) & !0x0F;
+
+        // Check if we have enough data
+        if npde_start + 11 > data.len() {
+            dev_err!(pdev.as_ref(), "Not enough data for NPDE\n");
+            return None;
+        }
+
+        // Try to create NPDE from the data
+        NpdeStruct::new(pdev, &data[npde_start..])
+            .inspect_err(|e| {
+                dev_err!(pdev.as_ref(), "Error creating NpdeStruct: {:?}\n", e);
+            })
+            .ok()
+    }
+}
+
+// Use a macro to implement BiosImage enum and methods. This avoids having to
+// repeat each enum type when implementing functions like base() in BiosImage.
+macro_rules! bios_image {
+    (
+        $($variant:ident $class:ident),* $(,)?
+    ) => {
+        // BiosImage enum with variants for each image type
+        enum BiosImage {
+            $($variant($class)),*
+        }
+
+        impl BiosImage {
+            /// Get a reference to the common BIOS image data regardless of type
+            fn base(&self) -> &BiosImageBase {
+                match self {
+                    $(Self::$variant(img) => &img.base),*
+                }
+            }
+
+            /// Returns a string representing the type of BIOS image
+            fn image_type_str(&self) -> &'static str {
+                match self {
+                    $(Self::$variant(_) => stringify!($variant)),*
+                }
+            }
+        }
+    }
+}
+
+impl BiosImage {
+    /// Check if this is the last image
+    fn is_last(&self) -> bool {
+        let base = self.base();
+
+        // For NBSI images (type == 0x70), return true as they're
+        // considered the last image
+        if matches!(self, Self::Nbsi(_)) {
+            return true;
+        }
+
+        // For other image types, check the NPDE first if available
+        if let Some(ref npde) = base.npde {
+            return npde.is_last();
+        }
+
+        // Otherwise, fall back to checking the PCIR last_image flag
+        base.pcir.is_last()
+    }
+
+    /// Get the image size in bytes
+    fn image_size_bytes(&self) -> Result<usize> {
+        let base = self.base();
+
+        // Prefer NPDE image size if available
+        if let Some(ref npde) = base.npde {
+            return npde.image_size_bytes();
+        }
+
+        // Otherwise, fall back to the PCIR image size
+        base.pcir.image_size_bytes()
+    }
+
+    /// Create a BiosImageBase from a byte slice and convert it to a BiosImage
+    /// which triggers the constructor of the specific BiosImage enum variant.
+    fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+        let base = BiosImageBase::new(pdev, data)?;
+        let image = base.into_image().inspect_err(|e| {
+            dev_err!(pdev.as_ref(), "Failed to create BiosImage: {:?}\n", e);
+        })?;
+
+        image.image_size_bytes().inspect_err(|_| {
+            dev_err!(
+                pdev.as_ref(),
+                "Invalid image size computed during BiosImage creation\n"
+            )
+        })?;
+
+        Ok(image)
+    }
+}
+
+bios_image! {
+    PciAt PciAtBiosImage,   // PCI-AT compatible BIOS image
+    Efi EfiBiosImage,       // EFI (Extensible Firmware Interface)
+    Nbsi NbsiBiosImage,     // NBSI (Nvidia Bios System Interface)
+    FwSec FwSecBiosImage    // FWSEC (Firmware Security)
+}
+
+struct PciAtBiosImage {
+    base: BiosImageBase,
+    bit_header: Option<BitHeader>,
+    bit_offset: Option<usize>,
+}
+
+struct EfiBiosImage {
+    base: BiosImageBase,
+    // EFI-specific fields can be added here in the future.
+}
+
+struct NbsiBiosImage {
+    base: BiosImageBase,
+    // NBSI-specific fields can be added here in the future.
+}
+
+pub(crate) struct FwSecBiosImage {
+    base: BiosImageBase,
+    // FWSEC-specific fields
+    // The offset of the Falcon data from the start of Fwsec image
+    falcon_data_offset: Option<usize>,
+    // The PmuLookupTable starts at the offset of the falcon data pointer
+    pmu_lookup_table: Option<PmuLookupTable>,
+    // The offset of the Falcon ucode
+    falcon_ucode_offset: Option<usize>,
+}
+
+// Convert from BiosImageBase to BiosImage
+impl TryFrom<BiosImageBase> for BiosImage {
+    type Error = Error;
+
+    fn try_from(base: BiosImageBase) -> Result<Self> {
+        match base.pcir.code_type {
+            0x00 => Ok(BiosImage::PciAt(base.try_into()?)),
+            0x03 => Ok(BiosImage::Efi(EfiBiosImage { base })),
+            0x70 => Ok(BiosImage::Nbsi(NbsiBiosImage { base })),
+            0xE0 => Ok(BiosImage::FwSec(FwSecBiosImage {
+                base,
+                falcon_data_offset: None,
+                pmu_lookup_table: None,
+                falcon_ucode_offset: None,
+            })),
+            _ => Err(EINVAL),
+        }
+    }
+}
+
+/// BIOS Image structure containing various headers and references
+/// fields base to all BIOS images. Each BiosImage type has a
+/// BiosImageBase type along with other image-specific fields.
+/// Note that Rust favors composition of types over inheritance.
+#[derive(Debug)]
+#[expect(dead_code)]
+struct BiosImageBase {
+    /// PCI ROM Expansion Header
+    rom_header: PciRomHeader,
+    /// PCI Data Structure
+    pcir: PcirStruct,
+    /// NVIDIA PCI Data Extension (optional)
+    npde: Option<NpdeStruct>,
+    /// Image data (includes ROM header and PCIR)
+    data: KVec<u8>,
+}
+
+impl BiosImageBase {
+    fn into_image(self) -> Result<BiosImage> {
+        BiosImage::try_from(self)
+    }
+
+    /// Creates a new BiosImageBase from raw byte data.
+    fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+        // Ensure we have enough data for the ROM header
+        if data.len() < 26 {
+            dev_err!(pdev.as_ref(), "Not enough data for ROM header\n");
+            return Err(EINVAL);
+        }
+
+        // Parse the ROM header
+        let rom_header = PciRomHeader::new(pdev, &data[0..26])
+            .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to create PciRomHeader: {:?}\n", e))?;
+
+        // Get the PCI Data Structure using the pointer from the ROM header
+        let pcir_offset = rom_header.pci_data_struct_offset as usize;
+        let pcir_data = data
+            .get(pcir_offset..pcir_offset + core::mem::size_of::<PcirStruct>())
+            .ok_or(EINVAL)
+            .inspect_err(|_| {
+                dev_err!(
+                    pdev.as_ref(),
+                    "PCIR offset {:#x} out of bounds (data length: {})\n",
+                    pcir_offset,
+                    data.len()
+                );
+                dev_err!(
+                    pdev.as_ref(),
+                    "Consider reading more data for construction of BiosImage\n"
+                );
+            })?;
+
+        let pcir = PcirStruct::new(pdev, pcir_data)
+            .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to create PcirStruct: {:?}\n", e))?;
+
+        // Look for NPDE structure if this is not an NBSI image (type != 0x70)
+        let npde = NpdeStruct::find_in_data(pdev, data, &rom_header, &pcir);
+
+        // Create a copy of the data
+        let mut data_copy = KVec::new();
+        data_copy.extend_with(data.len(), 0, GFP_KERNEL)?;
+        data_copy.copy_from_slice(data);
+
+        Ok(BiosImageBase {
+            rom_header,
+            pcir,
+            npde,
+            data: data_copy,
+        })
+    }
+}
+
+/// The PciAt BIOS image is typically the first BIOS image type found in the
+/// BIOS image chain. It contains the BIT header and the BIT tokens.
+impl PciAtBiosImage {
+    /// Find a byte pattern in a slice
+    fn find_byte_pattern(haystack: &[u8], needle: &[u8]) -> Option<usize> {
+        haystack
+            .windows(needle.len())
+            .position(|window| window == needle)
+    }
+
+    /// Find the BIT header in the PciAtBiosImage
+    fn find_bit_header(data: &[u8]) -> Result<(BitHeader, usize)> {
+        let bit_pattern = [0xff, 0xb8, b'B', b'I', b'T', 0x00];
+        let bit_offset = Self::find_byte_pattern(data, &bit_pattern);
+        if bit_offset.is_none() {
+            return Err(EINVAL);
+        }
+
+        let bit_header = BitHeader::new(&data[bit_offset.ok_or(EINVAL)?..])?;
+        Ok((bit_header, bit_offset.ok_or(EINVAL)?))
+    }
+
+    /// Get a BIT token entry from the BIT table in the PciAtBiosImage
+    fn get_bit_token(&self, token_id: u8) -> Result<BitToken> {
+        BitToken::from_id(self, token_id)
+    }
+
+    /// Find the Falcon data pointer structure in the PciAtBiosImage
+    /// This is just a 4 byte structure that contains a pointer to the
+    /// Falcon data in the FWSEC image.
+    fn falcon_data_ptr(&self, pdev: &pci::Device) -> Result<u32> {
+        let token = self.get_bit_token(BIT_TOKEN_ID_FALCON_DATA)?;
+
+        // Make sure we don't go out of bounds
+        if token.data_offset as usize + 4 > self.base.data.len() {
+            return Err(EINVAL);
+        }
+
+        // read the 4 bytes at the offset specified in the token
+        let offset = token.data_offset as usize;
+        let bytes: [u8; 4] = self.base.data[offset..offset + 4].try_into().map_err(|_| {
+            dev_err!(pdev.as_ref(), "Failed to convert data slice to array");
+            EINVAL
+        })?;
+
+        let data_ptr = u32::from_le_bytes(bytes);
+
+        if (data_ptr as usize) < self.base.data.len() {
+            dev_err!(pdev.as_ref(), "Falcon data pointer out of bounds\n");
+            return Err(EINVAL);
+        }
+
+        Ok(data_ptr)
+    }
+}
+
+impl TryFrom<BiosImageBase> for PciAtBiosImage {
+    type Error = Error;
+
+    fn try_from(base: BiosImageBase) -> Result<Self> {
+        let data_slice = &base.data;
+        let (bit_header, bit_offset) = PciAtBiosImage::find_bit_header(data_slice)?;
+
+        Ok(PciAtBiosImage {
+            base,
+            bit_header: Some(bit_header),
+            bit_offset: Some(bit_offset),
+        })
+    }
+}
+
+/// The PmuLookupTableEntry structure is a single entry in the PmuLookupTable.
+/// See the PmuLookupTable description for more information.
+#[expect(dead_code)]
+struct PmuLookupTableEntry {
+    application_id: u8,
+    target_id: u8,
+    data: u32,
+}
+
+impl PmuLookupTableEntry {
+    fn new(data: &[u8]) -> Result<Self> {
+        if data.len() < 5 {
+            return Err(EINVAL);
+        }
+
+        Ok(PmuLookupTableEntry {
+            application_id: data[0],
+            target_id: data[1],
+            data: u32::from_le_bytes(data[2..6].try_into().map_err(|_| EINVAL)?),
+        })
+    }
+}
+
+/// The PmuLookupTableEntry structure is used to find the PmuLookupTableEntry
+/// for a given application ID. The table of entries is pointed to by the falcon
+/// data pointer in the BIT table, and is used to locate the Falcon Ucode.
+#[expect(dead_code)]
+struct PmuLookupTable {
+    version: u8,
+    header_len: u8,
+    entry_len: u8,
+    entry_count: u8,
+    table_data: KVec<u8>,
+}
+
+impl PmuLookupTable {
+    fn new(data: &[u8]) -> Result<Self> {
+        if data.len() < 4 {
+            return Err(EINVAL);
+        }
+
+        let header_len = data[1] as usize;
+        let entry_len = data[2] as usize;
+        let entry_count = data[3] as usize;
+
+        let required_bytes = header_len + (entry_count * entry_len);
+
+        if data.len() < required_bytes {
+            return Err(EINVAL);
+        }
+
+        // Create a copy of only the table data
+        let mut table_data = KVec::new();
+
+        // "last_entry_bytes" is a debugging aid.
+        let mut last_entry_bytes: Option<KVec<u8>> = if cfg!(debug_assertions) {
+            Some(KVec::new())
+        } else {
+            None
+        };
+
+        for &byte in &data[header_len..required_bytes] {
+            table_data.push(byte, GFP_KERNEL)?;
+
+            if cfg!(debug_assertions) {
+                // Debugging (dumps the table data to dmesg):
+                if let Some(ref mut last_entry_bytes) = last_entry_bytes {
+                    last_entry_bytes.push(byte, GFP_KERNEL)?;
+
+                    let last_entry_bytes_len = last_entry_bytes.len();
+                    if last_entry_bytes_len == entry_len {
+                        pr_info!("Last entry bytes: {:02x?}\n", &last_entry_bytes[..]);
+                        *last_entry_bytes = KVec::new();
+                    }
+                }
+            }
+        }
+
+        Ok(PmuLookupTable {
+            version: data[0],
+            header_len: header_len as u8,
+            entry_len: entry_len as u8,
+            entry_count: entry_count as u8,
+            table_data,
+        })
+    }
+
+    fn lookup_index(&self, idx: u8) -> Result<PmuLookupTableEntry> {
+        if idx >= self.entry_count {
+            return Err(EINVAL);
+        }
+
+        let index = (idx as usize) * self.entry_len as usize;
+        PmuLookupTableEntry::new(&self.table_data[index..])
+    }
+
+    // find entry by type value
+    fn find_entry_by_type(&self, entry_type: u8) -> Result<PmuLookupTableEntry> {
+        for i in 0..self.entry_count {
+            let entry = self.lookup_index(i)?;
+            if entry.application_id == entry_type {
+                return Ok(entry);
+            }
+        }
+
+        Err(EINVAL)
+    }
+}
+
+/// The FwSecBiosImage structure contains the PMU table and the Falcon Ucode.
+/// The PMU table contains voltage/frequency tables as well as a pointer to the
+/// Falcon Ucode.
+impl FwSecBiosImage {
+    fn setup_falcon_data(
+        &mut self,
+        pdev: &pci::Device,
+        pci_at_image: &PciAtBiosImage,
+        first_fwsec_image: &FwSecBiosImage,
+    ) -> Result<()> {
+        let mut offset = pci_at_image.falcon_data_ptr(pdev)? as usize;
+
+        // The falcon data pointer assumes that the PciAt and FWSEC images
+        // are contiguous in memory. However, testing shows the EFI image sits in
+        // between them. So calculate the offset from the end of the PciAt image
+        // rather than the start of it. Compensate.
+        offset -= pci_at_image.base.data.len();
+
+        // The offset is now from the start of the first Fwsec image, however
+        // the offset points to a location in the second Fwsec image. Since
+        // the fwsec images are contiguous, subtract the length of the first Fwsec
+        // image from the offset to get the offset to the start of the second
+        // Fwsec image.
+        offset -= first_fwsec_image.base.data.len();
+
+        self.falcon_data_offset = Some(offset);
+
+        // The PmuLookupTable starts at the offset of the falcon data pointer
+        self.pmu_lookup_table = Some(PmuLookupTable::new(&self.base.data[offset..])?);
+
+        match self
+            .pmu_lookup_table
+            .as_ref()
+            .ok_or(EINVAL)?
+            .find_entry_by_type(FALCON_UCODE_ENTRY_APPID_FWSEC_PROD)
+        {
+            Ok(entry) => {
+                let mut ucode_offset = entry.data as usize;
+                ucode_offset -= pci_at_image.base.data.len();
+                ucode_offset -= first_fwsec_image.base.data.len();
+                self.falcon_ucode_offset = Some(ucode_offset);
+                if cfg!(debug_assertions) {
+                    // Print the v3_desc header for debugging
+                    let v3_desc = self.fwsec_header(pdev.as_ref())?;
+                    pr_info!("PmuLookupTableEntry v3_desc: {:#?}\n", v3_desc);
+                }
+            }
+            Err(e) => {
+                dev_err!(
+                    pdev.as_ref(),
+                    "PmuLookupTableEntry not found, error: {:?}\n",
+                    e
+                );
+            }
+        }
+        Ok(())
+    }
+
+    /// TODO: These were borrowed from the old code for integrating this module
+    /// with the outside world. They should be cleaned up and integrated properly.
+    ///
+    /// Get the FwSec header (FalconUCodeDescV3)
+    fn fwsec_header(&self, dev: &device::Device) -> Result<&FalconUCodeDescV3> {
+        // Get the falcon ucode offset that was found in setup_falcon_data
+        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)?;
+
+        // Make sure the offset is within the data bounds
+        if falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>() > self.base.data.len() {
+            dev_err!(dev, "fwsec-frts header not contained within BIOS bounds\n");
+            return Err(ERANGE);
+        }
+
+        // Read the first 4 bytes to get the version
+        let hdr_bytes: [u8; 4] = self.base.data[falcon_ucode_offset..falcon_ucode_offset + 4]
+            .try_into()
+            .map_err(|_| EINVAL)?;
+        let hdr = u32::from_le_bytes(hdr_bytes);
+        let ver = (hdr & 0xff00) >> 8;
+
+        if ver != 3 {
+            dev_err!(dev, "invalid fwsec firmware version\n");
+            return Err(EINVAL);
+        }
+
+        // Return a reference to the FalconUCodeDescV3 structure SAFETY: we have checked that
+        // `falcon_ucode_offset + size_of::<FalconUCodeDescV3` is within the bounds of `data.`
+        Ok(unsafe {
+            &*(self.base.data.as_ptr().add(falcon_ucode_offset) as *const FalconUCodeDescV3)
+        })
+    }
+    /// Get the ucode data as a byte slice
+    fn fwsec_ucode(&self, dev: &device::Device, v3_desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)?;
+
+        // The ucode data follows the descriptor
+        let ucode_data_offset = falcon_ucode_offset + v3_desc.size();
+        let size = (v3_desc.imem_load_size + v3_desc.dmem_load_size) as usize;
+
+        // Get the data slice, checking bounds in a single operation
+        self.base
+            .data
+            .get(ucode_data_offset..ucode_data_offset + size)
+            .ok_or(ERANGE)
+            .inspect_err(|_| dev_err!(dev, "fwsec ucode data not contained within BIOS bounds\n"))
+    }
+
+    /// Get the signatures as a byte slice
+    fn fwsec_sigs(&self, dev: &device::Device, v3_desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+        const SIG_SIZE: usize = 96 * 4;
+
+        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)?;
+
+        // The signatures data follows the descriptor
+        let sigs_data_offset = falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
+        let size = v3_desc.signature_count as usize * SIG_SIZE;
+
+        // Make sure the data is within bounds
+        if sigs_data_offset + size > self.base.data.len() {
+            dev_err!(
+                dev,
+                "fwsec signatures data not contained within BIOS bounds\n"
+            );
+            return Err(ERANGE);
+        }
+
+        Ok(&self.base.data[sigs_data_offset..sigs_data_offset + size])
+    }
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 19/21] gpu: nova-core: compute layout of the FRTS region
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (17 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 18/21] nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 20/21] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 21/21] gpu: nova-core: load and " Alexandre Courbot
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

FWSEC-FRTS is run with the desired address of the FRTS region as
parameter, which we need to compute depending on some hardware
parameters.

Do this in a `FbLayout` structure, that will be later extended to
describe more memory regions used to boot the GSP.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs       |   4 ++
 drivers/gpu/nova-core/gsp.rs       |   3 +
 drivers/gpu/nova-core/gsp/fb.rs    | 109 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  27 +++++++++
 5 files changed, 144 insertions(+)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 2adef119618b05ef7e397e10dfeda1228f4521c2..8d470a810ec7a04cbee1645fc9c32607d2ad8b8c 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -7,6 +7,7 @@
 use crate::driver::Bar0;
 use crate::falcon::{gsp::Gsp, sec2::Sec2, Falcon};
 use crate::firmware::Firmware;
+use crate::gsp::fb::FbLayout;
 use crate::regs;
 use crate::util;
 use crate::vbios::Vbios;
@@ -240,6 +241,9 @@ pub(crate) fn new(
 
         let _sec2_falcon = Falcon::<Sec2>::new(pdev.as_ref(), spec.chipset, &bar, true)?;
 
+        let fb_layout = FbLayout::new(spec.chipset, &bar)?;
+        dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
+
         let _bios = Vbios::new(pdev, &bar)?;
 
         Ok(pin_init!(Self {
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..27616a9d2b7069b18661fc97811fa1cac285b8f8
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+
+pub(crate) mod fb;
diff --git a/drivers/gpu/nova-core/gsp/fb.rs b/drivers/gpu/nova-core/gsp/fb.rs
new file mode 100644
index 0000000000000000000000000000000000000000..c4f5f36e143e698843d15510b95a8abd80550c0c
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/fb.rs
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::ops::Range;
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+use crate::regs;
+
+fn align_down(value: u64, align: u64) -> u64 {
+    value & !(align - 1)
+}
+
+/// Layout of the GPU framebuffer memory.
+///
+/// Contains ranges of GPU memory reserved for a given purpose during the GSP bootup process.
+#[derive(Debug)]
+#[expect(dead_code)]
+pub(crate) struct FbLayout {
+    pub fb: Range<u64>,
+
+    pub vga_workspace: Range<u64>,
+    pub bios: Range<u64>,
+
+    pub frts: Range<u64>,
+}
+
+impl FbLayout {
+    pub(crate) fn new(chipset: Chipset, bar: &Devres<Bar0>) -> Result<Self> {
+        let fb = {
+            let fb_size = with_bar!(bar, |b| vidmem_size(b, chipset))?;
+
+            0..fb_size
+        };
+        let fb_len = fb.end - fb.start;
+
+        let vga_workspace = {
+            let vga_base = with_bar!(bar, |b| vga_workspace_addr(b, fb_len, chipset,))?;
+
+            vga_base..fb.end
+        };
+
+        let bios = vga_workspace.clone();
+
+        let frts = {
+            const FRTS_DOWN_ALIGN: u64 = 0x20000;
+            const FRTS_SIZE: u64 = 0x100000;
+            let frts_base = align_down(vga_workspace.start, FRTS_DOWN_ALIGN) - FRTS_SIZE;
+
+            frts_base..frts_base + FRTS_SIZE
+        };
+
+        Ok(Self {
+            fb,
+            vga_workspace,
+            bios,
+            frts,
+        })
+    }
+}
+
+/// Returns `true` if the display is disabled.
+fn display_disabled(bar: &Bar0, chipset: Chipset) -> bool {
+    if chipset >= Chipset::GA100 {
+        regs::NV_FUSE_STATUS_OPT_DISPLAY_MAXWELL::read(bar).display_disabled()
+    } else {
+        regs::NV_FUSE_STATUS_OPT_DISPLAY_AMPERE::read(bar).display_disabled()
+    }
+}
+
+/// Returns the video memory size in bytes.
+fn vidmem_size(bar: &Bar0, chipset: Chipset) -> u64 {
+    if chipset >= Chipset::GA102 {
+        (regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_42::read(bar).value() as u64) << 20
+    } else {
+        let local_mem_range = regs::NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE::read(bar);
+        let size =
+            (local_mem_range.lower_mag() as u64) << ((local_mem_range.lower_scale() as u64) + 20);
+
+        if local_mem_range.ecc_mode_enabled() {
+            size / 16 * 15
+        } else {
+            size
+        }
+    }
+}
+
+/// Returns the vga workspace address.
+fn vga_workspace_addr(bar: &Bar0, fb_size: u64, chipset: Chipset) -> u64 {
+    let base = fb_size - 0x100000;
+    let vga_workspace_base = if display_disabled(bar, chipset) {
+        regs::NV_PDISP_VGA_WORKSPACE_BASE::read(bar)
+    } else {
+        return base;
+    };
+
+    if !vga_workspace_base.status_valid() {
+        return base;
+    }
+
+    let addr = (vga_workspace_base.addr() as u64) << 16;
+    if addr < base {
+        fb_size - 0x20000
+    } else {
+        addr
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index dc5da9409477c62b1d466ed00c17baf3677a6a53..e48e9757e54b3b05adcfa05fdc9d2c34c789e950 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -26,6 +26,7 @@ macro_rules! with_bar {
 mod falcon;
 mod firmware;
 mod gpu;
+mod gsp;
 mod regs;
 mod util;
 mod vbios;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index b5c6eeb6ed873a06b4aefcb375f4944eb0b20597..15ec9b7e69694ff198b5353d562fc1aff5eefd3f 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -53,6 +53,12 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
     23:0    adr_63_40 as u32;
 });
 
+register!(NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE @ 0x00100ce0 {
+    3:0     lower_scale as u8;
+    9:4     lower_mag as u8;
+    30:30   ecc_mode_enabled as bool;
+});
+
 /* PGC6 */
 
 register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {
@@ -64,6 +70,27 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
     31:0    value as u32;
 });
 
+register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_42 @ 0x001183a4 {
+    31:0    value as u32;
+});
+
+/* PDISP */
+
+register!(NV_PDISP_VGA_WORKSPACE_BASE @ 0x00625f04 {
+    3:3     status_valid as bool;
+    31:8    addr as u32;
+});
+
+/* FUSE */
+
+register!(NV_FUSE_STATUS_OPT_DISPLAY_MAXWELL @ 0x00021c04 {
+    0:0     display_disabled as bool;
+});
+
+register!(NV_FUSE_STATUS_OPT_DISPLAY_AMPERE @ 0x00820c04 {
+    0:0     display_disabled as bool;
+});
+
 /* PFALCON */
 
 register!(NV_PFALCON_FALCON_IRQSCLR @ +0x00000004 {

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 20/21] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (18 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 19/21] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  2025-05-01 12:58 ` [PATCH v2 21/21] gpu: nova-core: load and " Alexandre Courbot
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

The FWSEC firmware needs to be extracted from the VBIOS and patched with
the desired command, as well as the right signature. Do this so we are
ready to load and run this firmware into the GSP falcon and create the
FRTS region.

[joelagnelf@nvidia.com: give better names to FalconAppifHdrV1's fields]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/dma.rs            |   3 -
 drivers/gpu/nova-core/firmware.rs       |  18 ++
 drivers/gpu/nova-core/firmware/fwsec.rs | 360 ++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs            |  20 +-
 drivers/gpu/nova-core/vbios.rs          |   3 -
 5 files changed, 396 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
index 9d90ae01d0044eaab4ddbc3eba216741d7a623ef..a12d0dff574aa38fb5eb8f4d759611af2f8ba3ec 100644
--- a/drivers/gpu/nova-core/dma.rs
+++ b/drivers/gpu/nova-core/dma.rs
@@ -2,9 +2,6 @@
 
 //! Simple DMA object wrapper.
 
-// To be removed when all code is used.
-#![expect(dead_code)]
-
 use core::ops::{Deref, DerefMut};
 
 use kernel::device;
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 960982174d834c7c66a47ecfb3a15bf47116b2c5..a2037bfebeac70ced592502fc92af64b9f9aed5c 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -8,9 +8,12 @@
 use kernel::prelude::*;
 use kernel::str::CString;
 
+use crate::dma::DmaObject;
 use crate::gpu;
 use crate::gpu::Chipset;
 
+pub(crate) mod fwsec;
+
 pub(crate) const FIRMWARE_VERSION: &str = "535.113.01";
 
 /// Structure encapsulating the firmware blobs required for the GPU to operate.
@@ -86,6 +89,21 @@ pub(crate) fn size(&self) -> usize {
     }
 }
 
+/// Patch the `ucode_dma` firmware at offset `sig_base_img` with `signature`.
+fn patch_signature(ucode_dma: &mut DmaObject, signature: &[u8], sig_base_img: usize) -> Result<()> {
+    if sig_base_img + signature.len() > ucode_dma.size() {
+        return Err(ERANGE);
+    }
+
+    // SAFETY: we are the only user of this object, so there cannot be any race.
+    let dst = unsafe { ucode_dma.start_ptr_mut().add(sig_base_img) };
+
+    // SAFETY: `signature` and `dst` are valid, properly aligned, and do not overlap.
+    unsafe { core::ptr::copy_nonoverlapping(signature.as_ptr(), dst, signature.len()) };
+
+    Ok(())
+}
+
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
 impl<const N: usize> ModInfoBuilder<N> {
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
new file mode 100644
index 0000000000000000000000000000000000000000..ed5801df4fd5730d74ee59cbdeff35115846ce3b
--- /dev/null
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -0,0 +1,360 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! FWSEC is a High Secure firmware that is extracted from the BIOS and performs the first step of
+//! the GSP startup by creating the WPR2 memory region and copying critical areas of the VBIOS into
+//! it after authenticating them, ensuring they haven't been tampered with. It runs on the GSP
+//! falcon.
+//!
+//! Before being run, it needs to be patched in two areas:
+//!
+//! - The command to be run, as this firmware can perform several tasks ;
+//! - The ucode signature, so the GSP falcon can run FWSEC in HS mode.
+
+use core::alloc::Layout;
+
+use kernel::bindings;
+use kernel::device::{self, Device};
+use kernel::devres::Devres;
+use kernel::prelude::*;
+use kernel::transmute::FromBytes;
+
+use crate::dma::DmaObject;
+use crate::driver::Bar0;
+use crate::falcon::gsp::Gsp;
+use crate::falcon::{Falcon, FalconBromParams, FalconFirmware, FalconLoadTarget};
+use crate::firmware::FalconUCodeDescV3;
+use crate::vbios::Vbios;
+
+const NVFW_FALCON_APPIF_ID_DMEMMAPPER: u32 = 0x4;
+
+#[repr(C)]
+#[derive(Debug)]
+struct FalconAppifHdrV1 {
+    version: u8,
+    header_size: u8,
+    entry_size: u8,
+    entry_count: u8,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifHdrV1 {}
+
+#[repr(C, packed)]
+#[derive(Debug)]
+struct FalconAppifV1 {
+    id: u32,
+    dmem_base: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifV1 {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct FalconAppifDmemmapperV3 {
+    signature: u32,
+    version: u16,
+    size: u16,
+    cmd_in_buffer_offset: u32,
+    cmd_in_buffer_size: u32,
+    cmd_out_buffer_offset: u32,
+    cmd_out_buffer_size: u32,
+    nvf_img_data_buffer_offset: u32,
+    nvf_img_data_buffer_size: u32,
+    printf_buffer_hdr: u32,
+    ucode_build_time_stamp: u32,
+    ucode_signature: u32,
+    init_cmd: u32,
+    ucode_feature: u32,
+    ucode_cmd_mask0: u32,
+    ucode_cmd_mask1: u32,
+    multi_tgt_tbl: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifDmemmapperV3 {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct ReadVbios {
+    ver: u32,
+    hdr: u32,
+    addr: u64,
+    size: u32,
+    flags: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for ReadVbios {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct FrtsRegion {
+    ver: u32,
+    hdr: u32,
+    addr: u32,
+    size: u32,
+    ftype: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FrtsRegion {}
+
+const NVFW_FRTS_CMD_REGION_TYPE_FB: u32 = 2;
+
+#[repr(C, packed)]
+struct FrtsCmd {
+    read_vbios: ReadVbios,
+    frts_region: FrtsRegion,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FrtsCmd {}
+
+const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS: u32 = 0x15;
+const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_SB: u32 = 0x19;
+
+/// Command for the [`FwsecFirmware`] to execute.
+pub(crate) enum FwsecCommand {
+    /// Asks [`FwsecFirmware`] to carve out the WPR2 area and place a verified copy of the VBIOS
+    /// image into it.
+    Frts { frts_addr: u64, frts_size: u64 },
+    /// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
+    #[expect(dead_code)]
+    Sb,
+}
+
+/// Reinterpret the area starting from `offset` in `fw` as an instance of `T` (which must implement
+/// [`FromBytes`]) and return a reference to it.
+///
+/// # Safety
+///
+/// Callers must ensure that the region of memory returned is not written for as long as the
+/// returned reference is alive.
+///
+/// TODO: Remove this and `transmute_mut` once we have a way to transmute objects implementing
+/// FromBytes, e.g.:
+/// https://lore.kernel.org/lkml/20250330234039.29814-1-christiansantoslima21@gmail.com/
+unsafe fn transmute<'a, 'b, T: Sized + FromBytes>(
+    fw: &'a DmaObject,
+    offset: usize,
+) -> Result<&'b T> {
+    if offset + core::mem::size_of::<T>() > fw.size() {
+        return Err(ERANGE);
+    }
+    if (fw.start_ptr() as usize + offset) % core::mem::align_of::<T>() != 0 {
+        return Err(EINVAL);
+    }
+
+    // SAFETY: we have checked that the pointer is properly aligned that its pointed memory is
+    // large enough the contains an instance of `T`, which implements `FromBytes`.
+    Ok(unsafe { &*(fw.start_ptr().add(offset) as *const T) })
+}
+
+/// Reinterpret the area starting from `offset` in `fw` as a mutable instance of `T` (which must
+/// implement [`FromBytes`]) and return a reference to it.
+///
+/// # Safety
+///
+/// Callers must ensure that the region of memory returned is not read or written for as long as
+/// the returned reference is alive.
+unsafe fn transmute_mut<'a, 'b, T: Sized + FromBytes>(
+    fw: &'a mut DmaObject,
+    offset: usize,
+) -> Result<&'b mut T> {
+    if offset + core::mem::size_of::<T>() > fw.size() {
+        return Err(ERANGE);
+    }
+    if (fw.start_ptr_mut() as usize + offset) % core::mem::align_of::<T>() != 0 {
+        return Err(EINVAL);
+    }
+
+    // SAFETY: we have checked that the pointer is properly aligned that its pointed memory is
+    // large enough the contains an instance of `T`, which implements `FromBytes`.
+    Ok(unsafe { &mut *(fw.start_ptr_mut().add(offset) as *mut T) })
+}
+
+/// Patch the Fwsec firmware image in `fw` to run the command `cmd`.
+fn patch_command(fw: &mut DmaObject, v3_desc: &FalconUCodeDescV3, cmd: FwsecCommand) -> Result<()> {
+    let hdr_offset = (v3_desc.imem_load_size + v3_desc.interface_offset) as usize;
+    // SAFETY: we have an exclusive reference to `fw`, and no caller should have shared `fw` with
+    // the hardware yet.
+    let hdr: &FalconAppifHdrV1 = unsafe { transmute(fw, hdr_offset) }?;
+
+    if hdr.version != 1 {
+        return Err(EINVAL);
+    }
+
+    // Find the DMEM mapper section in the firmware.
+    for i in 0..hdr.entry_count as usize {
+        let app: &FalconAppifV1 =
+            // SAFETY: we have an exclusive reference to `fw`, and no caller should have shared
+            // `fw` with the hardware yet.
+            unsafe {
+                transmute(
+                    fw,
+                    hdr_offset + hdr.header_size as usize + i * hdr.entry_size as usize
+                )
+            }?;
+
+        if app.id != NVFW_FALCON_APPIF_ID_DMEMMAPPER {
+            continue;
+        }
+
+        let dmem_mapper: &mut FalconAppifDmemmapperV3 =
+            // SAFETY: we have an exclusive reference to `fw`, and no caller should have shared
+            // `fw` with the hardware yet.
+            unsafe { transmute_mut(fw, (v3_desc.imem_load_size + app.dmem_base) as usize) }?;
+
+        // SAFETY: we have an exclusive reference to `fw`, and no caller should have shared `fw`
+        // with the hardware yet.
+        let frts_cmd: &mut FrtsCmd = unsafe {
+            transmute_mut(
+                fw,
+                (v3_desc.imem_load_size + dmem_mapper.cmd_in_buffer_offset) as usize,
+            )
+        }?;
+
+        frts_cmd.read_vbios = ReadVbios {
+            ver: 1,
+            hdr: core::mem::size_of::<ReadVbios>() as u32,
+            addr: 0,
+            size: 0,
+            flags: 2,
+        };
+
+        dmem_mapper.init_cmd = match cmd {
+            FwsecCommand::Frts {
+                frts_addr,
+                frts_size,
+            } => {
+                frts_cmd.frts_region = FrtsRegion {
+                    ver: 1,
+                    hdr: core::mem::size_of::<FrtsRegion>() as u32,
+                    addr: (frts_addr >> 12) as u32,
+                    size: (frts_size >> 12) as u32,
+                    ftype: NVFW_FRTS_CMD_REGION_TYPE_FB,
+                };
+
+                NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS
+            }
+            FwsecCommand::Sb => NVFW_FALCON_APPIF_DMEMMAPPER_CMD_SB,
+        };
+
+        // Return early as we found and patched the DMEMMAPPER region.
+        return Ok(());
+    }
+
+    Err(ENOTSUPP)
+}
+
+/// Firmware extracted from the VBIOS and responsible for e.g. carving out the WPR2 region as the
+/// first step of the GSP bootflow.
+pub(crate) struct FwsecFirmware {
+    desc: FalconUCodeDescV3,
+    ucode: DmaObject,
+}
+
+impl FalconFirmware for FwsecFirmware {
+    type Target = Gsp;
+
+    fn dma_handle(&self) -> bindings::dma_addr_t {
+        self.ucode.dma_handle()
+    }
+
+    fn imem_load(&self) -> FalconLoadTarget {
+        FalconLoadTarget {
+            src_start: 0,
+            dst_start: self.desc.imem_phys_base,
+            len: self.desc.imem_load_size,
+        }
+    }
+
+    fn dmem_load(&self) -> FalconLoadTarget {
+        FalconLoadTarget {
+            src_start: self.desc.imem_load_size,
+            dst_start: self.desc.dmem_phys_base,
+            len: Layout::from_size_align(self.desc.dmem_load_size as usize, 256)
+                // Cannot panic, as 256 is non-zero and a power of 2.
+                .unwrap()
+                .pad_to_align()
+                .size() as u32,
+        }
+    }
+
+    fn brom_params(&self) -> FalconBromParams {
+        FalconBromParams {
+            pkc_data_offset: self.desc.pkc_data_offset,
+            engine_id_mask: self.desc.engine_id_mask,
+            ucode_id: self.desc.ucode_id,
+        }
+    }
+
+    fn boot_addr(&self) -> u32 {
+        0
+    }
+}
+
+impl FwsecFirmware {
+    /// Extract the Fwsec firmware from `bios` and patch it to run with the `cmd` command.
+    pub(crate) fn new(
+        falcon: &Falcon<Gsp>,
+        dev: &Device<device::Bound>,
+        bar: &Devres<Bar0>,
+        bios: &Vbios,
+        cmd: FwsecCommand,
+    ) -> Result<Self> {
+        let v3_desc = bios.fwsec_header(dev)?;
+        let ucode = bios.fwsec_ucode(dev)?;
+
+        let mut ucode_dma = DmaObject::from_data(dev, ucode)?;
+        patch_command(&mut ucode_dma, v3_desc, cmd)?;
+
+        const SIG_SIZE: usize = 96 * 4;
+        let signatures = bios.fwsec_sigs(dev)?;
+        let sig_base_img = (v3_desc.imem_load_size + v3_desc.pkc_data_offset) as usize;
+
+        if v3_desc.signature_count != 0 {
+            // Patch signature.
+            let desc_sig_versions = v3_desc.signature_versions as u32;
+            let reg_fuse_version = falcon.get_signature_reg_fuse_version(
+                bar,
+                v3_desc.engine_id_mask,
+                v3_desc.ucode_id,
+            )?;
+            dev_dbg!(
+                dev,
+                "desc_sig_versions: {:#x}, reg_fuse_version: {}\n",
+                desc_sig_versions,
+                reg_fuse_version
+            );
+            let signature_idx = {
+                let reg_fuse_version_bit = 1 << reg_fuse_version;
+
+                // Check if the fuse version is supported by the firmware.
+                if desc_sig_versions & reg_fuse_version_bit == 0 {
+                    dev_warn!(
+                        dev,
+                        "no matching signature: {:#x} {:#x}\n",
+                        reg_fuse_version_bit,
+                        v3_desc.signature_versions
+                    );
+                    return Err(EINVAL);
+                }
+
+                // `desc_sig_versions` has one bit set per included signature. Thus, the index of
+                // the signature to patch is the number of bits in `desc_sig_versions` set to `1`
+                // before `reg_fuse_version_bit`.
+
+                // Mask of the bits of `desc_sig_versions` to preserve.
+                let reg_fuse_version_mask = reg_fuse_version_bit.wrapping_sub(1);
+
+                (desc_sig_versions & reg_fuse_version_mask).count_ones()
+            };
+
+            dev_dbg!(dev, "patching signature with index {}\n", signature_idx);
+            let signature_start = signature_idx as usize * SIG_SIZE;
+            let signature = &signatures[signature_start..signature_start + SIG_SIZE];
+            super::patch_signature(&mut ucode_dma, signature, sig_base_img)?;
+        }
+
+        Ok(FwsecFirmware {
+            desc: v3_desc.clone(),
+            ucode: ucode_dma,
+        })
+    }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 8d470a810ec7a04cbee1645fc9c32607d2ad8b8c..f5ff319db39521cd8ea331bedc27536f7562a5f7 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -6,6 +6,7 @@
 use crate::dma::DmaObject;
 use crate::driver::Bar0;
 use crate::falcon::{gsp::Gsp, sec2::Sec2, Falcon};
+use crate::firmware::fwsec::{FwsecCommand, FwsecFirmware};
 use crate::firmware::Firmware;
 use crate::gsp::fb::FbLayout;
 use crate::regs;
@@ -196,7 +197,11 @@ pub(crate) fn new(
         bar: Devres<Bar0>,
     ) -> Result<impl PinInit<Self>> {
         let spec = Spec::new(&bar)?;
-        let fw = Firmware::new(pdev.as_ref(), spec.chipset, "535.113.01")?;
+        let fw = Firmware::new(
+            pdev.as_ref(),
+            spec.chipset,
+            crate::firmware::FIRMWARE_VERSION,
+        )?;
 
         dev_info!(
             pdev.as_ref(),
@@ -244,7 +249,18 @@ pub(crate) fn new(
         let fb_layout = FbLayout::new(spec.chipset, &bar)?;
         dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
 
-        let _bios = Vbios::new(pdev, &bar)?;
+        let bios = Vbios::new(pdev, &bar)?;
+
+        let _fwsec_frts = FwsecFirmware::new(
+            &gsp_falcon,
+            pdev.as_ref(),
+            &bar,
+            &bios,
+            FwsecCommand::Frts {
+                frts_addr: fb_layout.frts.start,
+                frts_size: fb_layout.frts.end - fb_layout.frts.start,
+            },
+        )?;
 
         Ok(pin_init!(Self {
             spec,
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index 719752fc54f1ec4b8dcc9a3c044a2925a187dd2a..880f77182691b71dec449448b60d939adb01c2dd 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -2,9 +2,6 @@
 
 //! VBIOS extraction and parsing.
 
-// To be removed when all code is used.
-#![expect(dead_code)]
-
 use crate::driver::Bar0;
 use crate::firmware::FalconUCodeDescV3;
 use core::convert::TryFrom;

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 21/21] gpu: nova-core: load and run FWSEC-FRTS
  2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (19 preceding siblings ...)
  2025-05-01 12:58 ` [PATCH v2 20/21] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
@ 2025-05-01 12:58 ` Alexandre Courbot
  20 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 12:58 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

With all the required pieces in place, load FWSEC-FRTS onto the GSP
falcon, run it, and check that it completed successfully by carving out
the WPR2 region out of framebuffer memory.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs |  3 ---
 drivers/gpu/nova-core/gpu.rs    | 60 ++++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs   | 15 +++++++++++
 3 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 7cae45645e548bab5b85cb53880898cedbae778a..e9ee0c83dfc521db4d48dd92f963daa9fbae0cb2 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -2,9 +2,6 @@
 
 //! Falcon microprocessor base support
 
-// To be removed when all code is used.
-#![expect(dead_code)]
-
 use core::time::Duration;
 use hal::FalconHal;
 use kernel::bindings;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index f5ff319db39521cd8ea331bedc27536f7562a5f7..a46768d18ac35158292c3bc2d25c8aca82833391 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -251,7 +251,7 @@ pub(crate) fn new(
 
         let bios = Vbios::new(pdev, &bar)?;
 
-        let _fwsec_frts = FwsecFirmware::new(
+        let fwsec_frts = FwsecFirmware::new(
             &gsp_falcon,
             pdev.as_ref(),
             &bar,
@@ -262,6 +262,64 @@ pub(crate) fn new(
             },
         )?;
 
+        // Check that the WPR2 region does not already exists - if it does, the GPU needs to be
+        // reset.
+        if with_bar!(bar, |b| regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(b).hi_val())? != 0 {
+            dev_err!(
+                pdev.as_ref(),
+                "WPR2 region already exists - GPU needs to be reset to proceed\n"
+            );
+            return Err(EBUSY);
+        }
+
+        // Reset falcon, load FWSEC-FRTS, and run it.
+        gsp_falcon.reset(&bar)?;
+        gsp_falcon.dma_load(&bar, &fwsec_frts)?;
+        let (mbox0, _) = gsp_falcon.boot(&bar, Some(0), None)?;
+        if mbox0 != 0 {
+            dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
+            return Err(EINVAL);
+        }
+
+        // SCRATCH_E contains FWSEC-FRTS' error code, if any.
+        let frts_status = with_bar!(bar, |b| regs::NV_PBUS_SW_SCRATCH_0E::read(b)
+            .frts_err_code())?;
+        if frts_status != 0 {
+            dev_err!(
+                pdev.as_ref(),
+                "FWSEC-FRTS returned with error code {:#x}",
+                frts_status
+            );
+            return Err(EINVAL);
+        }
+
+        // Check the WPR2 has been created as we requested.
+        let (wpr2_lo, wpr2_hi) = with_bar!(bar, |b| {
+            (
+                (regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO::read(b).lo_val() as u64) << 12,
+                (regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(b).hi_val() as u64) << 12,
+            )
+        })?;
+        if wpr2_hi == 0 {
+            dev_err!(
+                pdev.as_ref(),
+                "WPR2 region not created after running FWSEC-FRTS\n"
+            );
+
+            return Err(ENOTTY);
+        } else if wpr2_lo != fb_layout.frts.start {
+            dev_err!(
+                pdev.as_ref(),
+                "WPR2 region created at unexpected address {:#x} ; expected {:#x}\n",
+                wpr2_lo,
+                fb_layout.frts.start,
+            );
+            return Err(EINVAL);
+        }
+
+        dev_info!(pdev.as_ref(), "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
+        dev_info!(pdev.as_ref(), "GPU instance built\n");
+
         Ok(pin_init!(Self {
             spec,
             bar,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 15ec9b7e69694ff198b5353d562fc1aff5eefd3f..3acec36f2d5701af4a752808e86d71d5f200359b 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -43,6 +43,13 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
     }
 }
 
+/* PBUS */
+
+// TODO: this is an array of registers.
+register!(NV_PBUS_SW_SCRATCH_0E@0x00001438  {
+    31:16   frts_err_code as u16;
+});
+
 /* PFB */
 
 register!(NV_PFB_NISO_FLUSH_SYSMEM_ADDR @ 0x00100c10 {
@@ -59,6 +66,14 @@ pub(crate) fn chipset(self) -> Result<Chipset, Error> {
     30:30   ecc_mode_enabled as bool;
 });
 
+register!(NV_PFB_PRI_MMU_WPR2_ADDR_LO@0x001fa824  {
+    31:4    lo_val as u32;
+});
+
+register!(NV_PFB_PRI_MMU_WPR2_ADDR_HI@0x001fa828  {
+    31:4    hi_val as u32;
+});
+
 /* PGC6 */
 
 register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code
  2025-05-01 12:58 ` [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
@ 2025-05-01 13:52   ` Joel Fernandes
  2025-05-01 14:18     ` Alexandre Courbot
  0 siblings, 1 reply; 41+ messages in thread
From: Joel Fernandes @ 2025-05-01 13:52 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

Hello Alex,

On Thu, May 01, 2025 at 09:58:33PM +0900, Alexandre Courbot wrote:
> Add the common Falcon code and HAL for Ampere GPUs, and instantiate the
> GSP and SEC2 Falcons that will be required to boot the GSP.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon.rs           | 546 ++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/falcon/gsp.rs       |  25 ++
>  drivers/gpu/nova-core/falcon/hal.rs       |  55 +++
>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 115 +++++++
>  drivers/gpu/nova-core/falcon/sec2.rs      |   8 +
>  drivers/gpu/nova-core/gpu.rs              |  11 +
>  drivers/gpu/nova-core/nova_core.rs        |   1 +
>  drivers/gpu/nova-core/regs.rs             | 125 +++++++
>  drivers/gpu/nova-core/util.rs             |   1 -
>  9 files changed, 886 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..7cae45645e548bab5b85cb53880898cedbae778a
> --- /dev/null
> +++ b/drivers/gpu/nova-core/falcon.rs
> @@ -0,0 +1,546 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Falcon microprocessor base support
> +
> +// To be removed when all code is used.
> +#![expect(dead_code)]
> +
> +use core::time::Duration;
> +use hal::FalconHal;
> +use kernel::bindings;
> +use kernel::device;
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +use kernel::sync::Arc;
> +
> +use crate::driver::Bar0;
> +use crate::gpu::Chipset;
> +use crate::regs;
> +use crate::util;
> +
> +pub(crate) mod gsp;
> +mod hal;
> +pub(crate) mod sec2;
> +
> +/// Revision number of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
> +/// register.
> +#[repr(u8)]
> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
> +pub(crate) enum FalconCoreRev {
> +    #[default]
> +    Rev1 = 1,
> +    Rev2 = 2,
> +    Rev3 = 3,
> +    Rev4 = 4,
> +    Rev5 = 5,
> +    Rev6 = 6,
> +    Rev7 = 7,
> +}
> +
> +impl TryFrom<u8> for FalconCoreRev {
> +    type Error = Error;
> +
> +    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
> +        use FalconCoreRev::*;
> +
> +        let rev = match value {
> +            1 => Rev1,
> +            2 => Rev2,
> +            3 => Rev3,
> +            4 => Rev4,
> +            5 => Rev5,
> +            6 => Rev6,
> +            7 => Rev7,
> +            _ => return Err(EINVAL),
> +        };
> +
> +        Ok(rev)
> +    }
> +}
> +
> +/// Revision subversion number of a falcon core, used in the
> +/// [`crate::regs::NV_PFALCON_FALCON_HWCFG1`] register.
> +#[repr(u8)]
> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
> +pub(crate) enum FalconCoreRevSubversion {
> +    #[default]
> +    Subversion0 = 0,
> +    Subversion1 = 1,
> +    Subversion2 = 2,
> +    Subversion3 = 3,
> +}
> +
> +impl TryFrom<u8> for FalconCoreRevSubversion {
> +    type Error = Error;
> +
> +    fn try_from(value: u8) -> Result<Self> {
> +        use FalconCoreRevSubversion::*;
> +
> +        let sub_version = match value & 0b11 {
> +            0 => Subversion0,
> +            1 => Subversion1,
> +            2 => Subversion2,
> +            3 => Subversion3,
> +            _ => return Err(EINVAL),
> +        };
> +
> +        Ok(sub_version)
> +    }
> +}
> +
> +/// Security model of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
> +/// register.
> +#[repr(u8)]
> +#[derive(Debug, Default, Copy, Clone)]
> +pub(crate) enum FalconSecurityModel {
> +    /// Non-Secure: runs unsigned code without privileges.
> +    #[default]
> +    None = 0,
> +    /// Low-secure: runs unsigned code with some privileges. Can only be entered from `Heavy` mode.

This is not true. Low-secure is also (has to be) signed and the signatures
are verified by High-secure code. I can/will go fix that up in my follow-up doc
patches.

> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
> +///
> +/// We use this function and a heap-allocated trait object instead of statically defined trait
> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
> +/// requested HAL.
> +///
> +/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
> +pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
> +    chipset: Chipset,
> +) -> Result<Arc<dyn FalconHal<E>>> {
> +    let hal = match chipset {
> +        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
> +            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>

I am guessing macro-fication of this did not pan out? i.e. I think we
discussed:
1. Seeing if we can reduce/get rid of Arc in favor of static allocation.
2. Simplify the chain of GAxx | GAyy..
But nothing that cannot be done as a follow-up improvement..

(Also it is a bit weird that the namespace for chipsets for > GA10x is
ga102::GA102::). Example, Chipset::GA104 uses the HAL in Ga102).

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code
  2025-05-01 13:52   ` Joel Fernandes
@ 2025-05-01 14:18     ` Alexandre Courbot
  2025-05-01 14:41       ` Joel Fernandes
  0 siblings, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 14:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Thu May 1, 2025 at 10:52 PM JST, Joel Fernandes wrote:
> Hello Alex,
>
> On Thu, May 01, 2025 at 09:58:33PM +0900, Alexandre Courbot wrote:
>> Add the common Falcon code and HAL for Ampere GPUs, and instantiate the
>> GSP and SEC2 Falcons that will be required to boot the GSP.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/falcon.rs           | 546 ++++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/falcon/gsp.rs       |  25 ++
>>  drivers/gpu/nova-core/falcon/hal.rs       |  55 +++
>>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 115 +++++++
>>  drivers/gpu/nova-core/falcon/sec2.rs      |   8 +
>>  drivers/gpu/nova-core/gpu.rs              |  11 +
>>  drivers/gpu/nova-core/nova_core.rs        |   1 +
>>  drivers/gpu/nova-core/regs.rs             | 125 +++++++
>>  drivers/gpu/nova-core/util.rs             |   1 -
>>  9 files changed, 886 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..7cae45645e548bab5b85cb53880898cedbae778a
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/falcon.rs
>> @@ -0,0 +1,546 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Falcon microprocessor base support
>> +
>> +// To be removed when all code is used.
>> +#![expect(dead_code)]
>> +
>> +use core::time::Duration;
>> +use hal::FalconHal;
>> +use kernel::bindings;
>> +use kernel::device;
>> +use kernel::devres::Devres;
>> +use kernel::prelude::*;
>> +use kernel::sync::Arc;
>> +
>> +use crate::driver::Bar0;
>> +use crate::gpu::Chipset;
>> +use crate::regs;
>> +use crate::util;
>> +
>> +pub(crate) mod gsp;
>> +mod hal;
>> +pub(crate) mod sec2;
>> +
>> +/// Revision number of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
>> +/// register.
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>> +pub(crate) enum FalconCoreRev {
>> +    #[default]
>> +    Rev1 = 1,
>> +    Rev2 = 2,
>> +    Rev3 = 3,
>> +    Rev4 = 4,
>> +    Rev5 = 5,
>> +    Rev6 = 6,
>> +    Rev7 = 7,
>> +}
>> +
>> +impl TryFrom<u8> for FalconCoreRev {
>> +    type Error = Error;
>> +
>> +    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
>> +        use FalconCoreRev::*;
>> +
>> +        let rev = match value {
>> +            1 => Rev1,
>> +            2 => Rev2,
>> +            3 => Rev3,
>> +            4 => Rev4,
>> +            5 => Rev5,
>> +            6 => Rev6,
>> +            7 => Rev7,
>> +            _ => return Err(EINVAL),
>> +        };
>> +
>> +        Ok(rev)
>> +    }
>> +}
>> +
>> +/// Revision subversion number of a falcon core, used in the
>> +/// [`crate::regs::NV_PFALCON_FALCON_HWCFG1`] register.
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>> +pub(crate) enum FalconCoreRevSubversion {
>> +    #[default]
>> +    Subversion0 = 0,
>> +    Subversion1 = 1,
>> +    Subversion2 = 2,
>> +    Subversion3 = 3,
>> +}
>> +
>> +impl TryFrom<u8> for FalconCoreRevSubversion {
>> +    type Error = Error;
>> +
>> +    fn try_from(value: u8) -> Result<Self> {
>> +        use FalconCoreRevSubversion::*;
>> +
>> +        let sub_version = match value & 0b11 {
>> +            0 => Subversion0,
>> +            1 => Subversion1,
>> +            2 => Subversion2,
>> +            3 => Subversion3,
>> +            _ => return Err(EINVAL),
>> +        };
>> +
>> +        Ok(sub_version)
>> +    }
>> +}
>> +
>> +/// Security model of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
>> +/// register.
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone)]
>> +pub(crate) enum FalconSecurityModel {
>> +    /// Non-Secure: runs unsigned code without privileges.
>> +    #[default]
>> +    None = 0,
>> +    /// Low-secure: runs unsigned code with some privileges. Can only be entered from `Heavy` mode.
>
> This is not true. Low-secure is also (has to be) signed and the signatures
> are verified by High-secure code. I can/will go fix that up in my follow-up doc
> patches.

True, but contrary to HS code, the signature in the LS code is not a
hardware (or rather boot ROM) requirement - it's just that the HS code
decides to implement this policy and you could very well have a HS
loader that loads some code and switches to LS without further
verification. The point being that you cannot enter LS mode directly and
need to go through a HS loader first.

Nonetheless, you are right that in practice the HS code will not switch
to LS without due verification, and my use of "unsigned" is confusing.
Let me reword this.

>
>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>> +///
>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>> +/// requested HAL.
>> +///
>> +/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
>> +pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
>> +    chipset: Chipset,
>> +) -> Result<Arc<dyn FalconHal<E>>> {
>> +    let hal = match chipset {
>> +        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
>> +            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>
>
> I am guessing macro-fication of this did not pan out? i.e. I think we
> discussed:
> 1. Seeing if we can reduce/get rid of Arc in favor of static allocation.
> 2. Simplify the chain of GAxx | GAyy..
> But nothing that cannot be done as a follow-up improvement..

Yeah, my macro-fu is still lacking it seems. ^_^;

> (Also it is a bit weird that the namespace for chipsets for > GA10x is
> ga102::GA102::). Example, Chipset::GA104 uses the HAL in Ga102).

It is a convention since Nouveau (but I believe OpenRM as well?) to name
a HAL after the chipset with the lowest number when subsequent chipsets
can also use it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code
  2025-05-01 14:18     ` Alexandre Courbot
@ 2025-05-01 14:41       ` Joel Fernandes
  0 siblings, 0 replies; 41+ messages in thread
From: Joel Fernandes @ 2025-05-01 14:41 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On 5/1/2025 10:18 AM, Alexandre Courbot wrote:
[..]
>> On Thu, May 01, 2025 at 09:58:33PM +0900, Alexandre Courbot wrote:
>>> Add the common Falcon code and HAL for Ampere GPUs, and instantiate the
>>> GSP and SEC2 Falcons that will be required to boot the GSP.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>>  drivers/gpu/nova-core/falcon.rs           | 546 ++++++++++++++++++++++++++++++
>>>  drivers/gpu/nova-core/falcon/gsp.rs       |  25 ++
>>>  drivers/gpu/nova-core/falcon/hal.rs       |  55 +++
>>>  drivers/gpu/nova-core/falcon/hal/ga102.rs | 115 +++++++
>>>  drivers/gpu/nova-core/falcon/sec2.rs      |   8 +
>>>  drivers/gpu/nova-core/gpu.rs              |  11 +
>>>  drivers/gpu/nova-core/nova_core.rs        |   1 +
>>>  drivers/gpu/nova-core/regs.rs             | 125 +++++++
>>>  drivers/gpu/nova-core/util.rs             |   1 -
>>>  9 files changed, 886 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..7cae45645e548bab5b85cb53880898cedbae778a
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/falcon.rs
>>> @@ -0,0 +1,546 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Falcon microprocessor base support
>>> +
>>> +// To be removed when all code is used.
>>> +#![expect(dead_code)]
>>> +
>>> +use core::time::Duration;
>>> +use hal::FalconHal;
>>> +use kernel::bindings;
>>> +use kernel::device;
>>> +use kernel::devres::Devres;
>>> +use kernel::prelude::*;
>>> +use kernel::sync::Arc;
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::gpu::Chipset;
>>> +use crate::regs;
>>> +use crate::util;
>>> +
>>> +pub(crate) mod gsp;
>>> +mod hal;
>>> +pub(crate) mod sec2;
>>> +
>>> +/// Revision number of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
>>> +/// register.
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>>> +pub(crate) enum FalconCoreRev {
>>> +    #[default]
>>> +    Rev1 = 1,
>>> +    Rev2 = 2,
>>> +    Rev3 = 3,
>>> +    Rev4 = 4,
>>> +    Rev5 = 5,
>>> +    Rev6 = 6,
>>> +    Rev7 = 7,
>>> +}
>>> +
>>> +impl TryFrom<u8> for FalconCoreRev {
>>> +    type Error = Error;
>>> +
>>> +    fn try_from(value: u8) -> core::result::Result<Self, Self::Error> {
>>> +        use FalconCoreRev::*;
>>> +
>>> +        let rev = match value {
>>> +            1 => Rev1,
>>> +            2 => Rev2,
>>> +            3 => Rev3,
>>> +            4 => Rev4,
>>> +            5 => Rev5,
>>> +            6 => Rev6,
>>> +            7 => Rev7,
>>> +            _ => return Err(EINVAL),
>>> +        };
>>> +
>>> +        Ok(rev)
>>> +    }
>>> +}
>>> +
>>> +/// Revision subversion number of a falcon core, used in the
>>> +/// [`crate::regs::NV_PFALCON_FALCON_HWCFG1`] register.
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>>> +pub(crate) enum FalconCoreRevSubversion {
>>> +    #[default]
>>> +    Subversion0 = 0,
>>> +    Subversion1 = 1,
>>> +    Subversion2 = 2,
>>> +    Subversion3 = 3,
>>> +}
>>> +
>>> +impl TryFrom<u8> for FalconCoreRevSubversion {
>>> +    type Error = Error;
>>> +
>>> +    fn try_from(value: u8) -> Result<Self> {
>>> +        use FalconCoreRevSubversion::*;
>>> +
>>> +        let sub_version = match value & 0b11 {
>>> +            0 => Subversion0,
>>> +            1 => Subversion1,
>>> +            2 => Subversion2,
>>> +            3 => Subversion3,
>>> +            _ => return Err(EINVAL),
>>> +        };
>>> +
>>> +        Ok(sub_version)
>>> +    }
>>> +}
>>> +
>>> +/// Security model of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
>>> +/// register.
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone)]
>>> +pub(crate) enum FalconSecurityModel {
>>> +    /// Non-Secure: runs unsigned code without privileges.
>>> +    #[default]
>>> +    None = 0,
>>> +    /// Low-secure: runs unsigned code with some privileges. Can only be entered from `Heavy` mode.
>>
>> This is not true. Low-secure is also (has to be) signed and the signatures
>> are verified by High-secure code. I can/will go fix that up in my follow-up doc
>> patches.
> 
> True, but contrary to HS code, the signature in the LS code is not a
> hardware (or rather boot ROM) requirement - it's just that the HS code
> decides to implement this policy and you could very well have a HS
> loader that loads some code and switches to LS without further
> verification. The point being that you cannot enter LS mode directly and
> need to go through a HS loader first.
> 
> Nonetheless, you are right that in practice the HS code will not switch
> to LS without due verification, and my use of "unsigned" is confusing.
> Let me reword this.

Thanks, I wonder if there is any current example of such unsigned LS code. IIUC,
all the LS code is either coming from either the VBIOS or the firmware binaries,
both of which can be modified/re-flashed. Since LS still has some privileges, it
means that it is a bit of security issue. I think you're right though, in theory
LS can be run unverified but I'd think it is atypical.

>>
>>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>>> +///
>>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>>> +/// requested HAL.
>>> +///
>>> +/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
>>> +pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
>>> +    chipset: Chipset,
>>> +) -> Result<Arc<dyn FalconHal<E>>> {
>>> +    let hal = match chipset {
>>> +        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
>>> +            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>
>>
>> I am guessing macro-fication of this did not pan out? i.e. I think we
>> discussed:
>> 1. Seeing if we can reduce/get rid of Arc in favor of static allocation.
>> 2. Simplify the chain of GAxx | GAyy..
>> But nothing that cannot be done as a follow-up improvement..
> 
> Yeah, my macro-fu is still lacking it seems. ^_^;

:-D. It may or may not be worth complicating it though, but I was considering in
the future, to minimize the number of places that need to be modified (and thus
more possible room for errors) for new chipset additions.

>> (Also it is a bit weird that the namespace for chipsets for > GA10x is
>> ga102::GA102::). Example, Chipset::GA104 uses the HAL in Ga102).
> 
> It is a convention since Nouveau (but I believe OpenRM as well?) to name
> a HAL after the chipset with the lowest number when subsequent chipsets
> can also use it.

Ah ok! Maybe worth a documentation comment somewhere as well? Or maybe not. ;-)

thanks,

 - Joel




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 12:58 ` [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize Alexandre Courbot
@ 2025-05-01 15:19   ` Timur Tabi
  2025-05-01 15:22     ` Joel Fernandes
  2025-05-01 21:02     ` Alexandre Courbot
  2025-05-02  4:57   ` Alexandre Courbot
  1 sibling, 2 replies; 41+ messages in thread
From: Timur Tabi @ 2025-05-01 15:19 UTC (permalink / raw)
  To: dakr@kernel.org, a.hindborg@kernel.org, ojeda@kernel.org,
	boqun.feng@gmail.com, simona@ffwll.ch, tmgross@umich.edu,
	alex.gaynor@gmail.com, tzimmermann@suse.de, corbet@lwn.net,
	mripard@kernel.org, maarten.lankhorst@linux.intel.com,
	benno.lossin@proton.me, bjorn3_gh@protonmail.com,
	airlied@gmail.com, aliceryhl@google.com, Alexandre Courbot,
	gary@garyguo.net
  Cc: Alistair Popple, John Hubbard, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Joel Fernandes, Ben Skeggs

On Thu, 2025-05-01 at 21:58 +0900, Alexandre Courbot wrote:


> +impl UsizeAlign for usize {
> +    fn align_up(mut self, align: usize) -> usize {
> +        self = (self + align - 1) & !(align - 1);
> +        self
> +    }
> +}
> +
> +/// Aligns `val` upwards to the nearest multiple of `align`.
> +pub const fn usize_align_up(val: usize, align: usize) -> usize {
> +    (val + align - 1) & !(align - 1)
> +}

Why not have usize_align_up() just return "val.align_up(align)"?

But why why two versions at all?  Is there any context where you could use one
and not the other?




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 15:19   ` Timur Tabi
@ 2025-05-01 15:22     ` Joel Fernandes
  2025-05-01 15:31       ` Timur Tabi
  2025-05-01 21:02     ` Alexandre Courbot
  1 sibling, 1 reply; 41+ messages in thread
From: Joel Fernandes @ 2025-05-01 15:22 UTC (permalink / raw)
  To: Timur Tabi, dakr@kernel.org, a.hindborg@kernel.org,
	ojeda@kernel.org, boqun.feng@gmail.com, simona@ffwll.ch,
	tmgross@umich.edu, alex.gaynor@gmail.com, tzimmermann@suse.de,
	corbet@lwn.net, mripard@kernel.org,
	maarten.lankhorst@linux.intel.com, benno.lossin@proton.me,
	bjorn3_gh@protonmail.com, airlied@gmail.com, aliceryhl@google.com,
	Alexandre Courbot, gary@garyguo.net
  Cc: Alistair Popple, John Hubbard, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Ben Skeggs



On 5/1/2025 11:19 AM, Timur Tabi wrote:
> On Thu, 2025-05-01 at 21:58 +0900, Alexandre Courbot wrote:
> 
> 
>> +impl UsizeAlign for usize {
>> +    fn align_up(mut self, align: usize) -> usize {
>> +        self = (self + align - 1) & !(align - 1);
>> +        self
>> +    }
>> +}
>> +
>> +/// Aligns `val` upwards to the nearest multiple of `align`.
>> +pub const fn usize_align_up(val: usize, align: usize) -> usize {
>> +    (val + align - 1) & !(align - 1)
>> +}
> 
> Why not have usize_align_up() just return "val.align_up(align)"?
> 
> But why why two versions at all?  Is there any context where you could use one
> and not the other?
> 
I can't remember now but when I tried that, I got compiler errors (because val
is immutable?).

Also not mutating it like that matches the pattern in the rest of this file so
I'd leave it as-is.

Thanks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 15:22     ` Joel Fernandes
@ 2025-05-01 15:31       ` Timur Tabi
  2025-05-01 15:31         ` Joel Fernandes
  0 siblings, 1 reply; 41+ messages in thread
From: Timur Tabi @ 2025-05-01 15:31 UTC (permalink / raw)
  To: corbet@lwn.net, a.hindborg@kernel.org, ojeda@kernel.org,
	boqun.feng@gmail.com, simona@ffwll.ch, tmgross@umich.edu,
	mripard@kernel.org, tzimmermann@suse.de, alex.gaynor@gmail.com,
	maarten.lankhorst@linux.intel.com, benno.lossin@proton.me,
	gary@garyguo.net, bjorn3_gh@protonmail.com, Joel Fernandes,
	airlied@gmail.com, aliceryhl@google.com, dakr@kernel.org,
	Alexandre Courbot
  Cc: dri-devel@lists.freedesktop.org, Alistair Popple,
	nouveau@lists.freedesktop.org, Ben Skeggs,
	linux-kernel@vger.kernel.org, John Hubbard,
	rust-for-linux@vger.kernel.org

On Thu, 2025-05-01 at 11:22 -0400, Joel Fernandes wrote:
> Also not mutating it like that matches the pattern in the rest of this file
> so
> I'd leave it as-is.

Oh I see now.  One version changes a variable, and the other returns a new
value.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 15:31       ` Timur Tabi
@ 2025-05-01 15:31         ` Joel Fernandes
  0 siblings, 0 replies; 41+ messages in thread
From: Joel Fernandes @ 2025-05-01 15:31 UTC (permalink / raw)
  To: Timur Tabi, corbet@lwn.net, a.hindborg@kernel.org,
	ojeda@kernel.org, boqun.feng@gmail.com, simona@ffwll.ch,
	tmgross@umich.edu, mripard@kernel.org, tzimmermann@suse.de,
	alex.gaynor@gmail.com, maarten.lankhorst@linux.intel.com,
	benno.lossin@proton.me, gary@garyguo.net,
	bjorn3_gh@protonmail.com, airlied@gmail.com, aliceryhl@google.com,
	dakr@kernel.org, Alexandre Courbot
  Cc: dri-devel@lists.freedesktop.org, Alistair Popple,
	nouveau@lists.freedesktop.org, Ben Skeggs,
	linux-kernel@vger.kernel.org, John Hubbard,
	rust-for-linux@vger.kernel.org



On 5/1/2025 11:31 AM, Timur Tabi wrote:
> On Thu, 2025-05-01 at 11:22 -0400, Joel Fernandes wrote:
>> Also not mutating it like that matches the pattern in the rest of this file
>> so
>> I'd leave it as-is.
> 
> Oh I see now.  One version changes a variable, and the other returns a new
> value.

Yes, exactly! Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 15:19   ` Timur Tabi
  2025-05-01 15:22     ` Joel Fernandes
@ 2025-05-01 21:02     ` Alexandre Courbot
  2025-05-01 21:52       ` Joel Fernandes
  1 sibling, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-01 21:02 UTC (permalink / raw)
  To: Timur Tabi, dakr@kernel.org, a.hindborg@kernel.org,
	ojeda@kernel.org, boqun.feng@gmail.com, simona@ffwll.ch,
	tmgross@umich.edu, alex.gaynor@gmail.com, tzimmermann@suse.de,
	corbet@lwn.net, mripard@kernel.org,
	maarten.lankhorst@linux.intel.com, benno.lossin@proton.me,
	bjorn3_gh@protonmail.com, airlied@gmail.com, aliceryhl@google.com,
	gary@garyguo.net
  Cc: Alistair Popple, John Hubbard, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Joel Fernandes, Ben Skeggs

On Fri May 2, 2025 at 12:19 AM JST, Timur Tabi wrote:
> On Thu, 2025-05-01 at 21:58 +0900, Alexandre Courbot wrote:
>
>
>> +impl UsizeAlign for usize {
>> +    fn align_up(mut self, align: usize) -> usize {
>> +        self = (self + align - 1) & !(align - 1);
>> +        self
>> +    }
>> +}
>> +
>> +/// Aligns `val` upwards to the nearest multiple of `align`.
>> +pub const fn usize_align_up(val: usize, align: usize) -> usize {
>> +    (val + align - 1) & !(align - 1)
>> +}
>
> Why not have usize_align_up() just return "val.align_up(align)"?
>
> But why why two versions at all?  Is there any context where you could use one
> and not the other?

The second version can be used in const context to create values at
compile-time, something the first one cannot do. If we want to factorize
things out we can probably make the first version call the second one
though.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 21:02     ` Alexandre Courbot
@ 2025-05-01 21:52       ` Joel Fernandes
  0 siblings, 0 replies; 41+ messages in thread
From: Joel Fernandes @ 2025-05-01 21:52 UTC (permalink / raw)
  To: Alexandre Courbot, Timur Tabi, dakr@kernel.org,
	a.hindborg@kernel.org, ojeda@kernel.org, boqun.feng@gmail.com,
	simona@ffwll.ch, tmgross@umich.edu, alex.gaynor@gmail.com,
	tzimmermann@suse.de, corbet@lwn.net, mripard@kernel.org,
	maarten.lankhorst@linux.intel.com, benno.lossin@proton.me,
	bjorn3_gh@protonmail.com, airlied@gmail.com, aliceryhl@google.com,
	gary@garyguo.net
  Cc: Alistair Popple, John Hubbard, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Ben Skeggs



On 5/1/2025 5:02 PM, Alexandre Courbot wrote:
> On Fri May 2, 2025 at 12:19 AM JST, Timur Tabi wrote:
>> On Thu, 2025-05-01 at 21:58 +0900, Alexandre Courbot wrote:
>>
>>
>>> +impl UsizeAlign for usize {
>>> +    fn align_up(mut self, align: usize) -> usize {
>>> +        self = (self + align - 1) & !(align - 1);
>>> +        self
>>> +    }
>>> +}
>>> +
>>> +/// Aligns `val` upwards to the nearest multiple of `align`.
>>> +pub const fn usize_align_up(val: usize, align: usize) -> usize {
>>> +    (val + align - 1) & !(align - 1)
>>> +}
>>
>> Why not have usize_align_up() just return "val.align_up(align)"?
>>
>> But why why two versions at all?  Is there any context where you could use one
>> and not the other?
> 
> The second version can be used in const context to create values at
> compile-time, something the first one cannot do. If we want to factorize
> things out we can probably make the first version call the second one
> though.

True. By the way, Timur suggested me to factor it out in a chat so I already did
that in my tree. thanks,

 - Joel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-01 12:58 ` [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize Alexandre Courbot
  2025-05-01 15:19   ` Timur Tabi
@ 2025-05-02  4:57   ` Alexandre Courbot
  2025-05-02 19:59     ` Joel Fernandes
  1 sibling, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-02  4:57 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Thu May 1, 2025 at 9:58 PM JST, Alexandre Courbot wrote:
> From: Joel Fernandes <joelagnelf@nvidia.com>
>
> This will be used in the nova-core driver where we need to upward-align
> the image size to get to the next image in the VBIOS ROM.
>
> [acourbot@nvidia.com: handled conflicts due to removal of patch creating
> num.rs]
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  rust/kernel/lib.rs |  1 +
>  rust/kernel/num.rs | 21 +++++++++++++++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index ab0286857061d2de1be0279cbd2cd3490e5a48c3..be75b196aa7a29cf3eed7c902ed8fb98689bbb50 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -67,6 +67,7 @@
>  pub mod miscdevice;
>  #[cfg(CONFIG_NET)]
>  pub mod net;
> +pub mod num;
>  pub mod of;
>  pub mod page;
>  #[cfg(CONFIG_PCI)]
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..953c6ab012601efb3c9387b4b299e22233670c4b
> --- /dev/null
> +++ b/rust/kernel/num.rs
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Numerical and binary utilities for primitive types.
> +
> +/// A trait providing alignment operations for `usize`.
> +pub trait UsizeAlign {
> +    /// Aligns `self` upwards to the nearest multiple of `align`.
> +    fn align_up(self, align: usize) -> usize;
> +}

As it turns out I will also need the same functionality for u64 in a
future patch. :) Could we turn this into a more generic trait? E.g:

    /// A trait providing alignment operations for `usize`.
    pub trait Align {
        /// Aligns `self` upwards to the nearest multiple of `align`.
        fn align_up(self, align: Self) -> Self;
    }

That way it can be implemented for all basic types. I'd suggest having a local
macro that takes an arbitrary number of arguments and generates the impl blocks
for each of them, which would be invoked as:

    impl_align!(i8, u8, i16, u16, ...);

> +impl UsizeAlign for usize {
> +    fn align_up(mut self, align: usize) -> usize {
> +        self = (self + align - 1) & !(align - 1);
> +        self
> +    }
> +}

Note that there is no need to mutate `self`, the body can just be:

    (self + align - 1) & !(align - 1)

This operation can trigger overflows/underflows, so I guess for safety it
should return a `Result` and use `checked_add` and `checked_sub`, after moving
`align - 1` into a local variable to avoid checking it twice.

There is also the interesting question of, what if `align` is not a power of 2.
We should probably document the fact that it is expected to be.

> +/// Aligns `val` upwards to the nearest multiple of `align`.
> +pub const fn usize_align_up(val: usize, align: usize) -> usize {
> +    (val + align - 1) & !(align - 1)
> +}

Let's add a statement in the documentation saying that this exists for use in
`const` contexts. The `impl` version can maybe call this one directly to avoid
repeating the same operation twice.

Actually I'm not sure whether we want the `impl` block at all since this
provides the same functionality, albeit in a slightly less cosmetic way. Can
core R4L folks provide guidance about that?

Cheers,
Alex.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-02  4:57   ` Alexandre Courbot
@ 2025-05-02 19:59     ` Joel Fernandes
  2025-05-03  1:59       ` Alexandre Courbot
  0 siblings, 1 reply; 41+ messages in thread
From: Joel Fernandes @ 2025-05-02 19:59 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hello, Alex,

On 5/2/2025 12:57 AM, Alexandre Courbot wrote:
> On Thu May 1, 2025 at 9:58 PM JST, Alexandre Courbot wrote:
>> From: Joel Fernandes <joelagnelf@nvidia.com>
>>
>> This will be used in the nova-core driver where we need to upward-align
>> the image size to get to the next image in the VBIOS ROM.
>>
>> [acourbot@nvidia.com: handled conflicts due to removal of patch creating
>> num.rs]
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  rust/kernel/lib.rs |  1 +
>>  rust/kernel/num.rs | 21 +++++++++++++++++++++
>>  2 files changed, 22 insertions(+)
>>
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index ab0286857061d2de1be0279cbd2cd3490e5a48c3..be75b196aa7a29cf3eed7c902ed8fb98689bbb50 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -67,6 +67,7 @@
>>  pub mod miscdevice;
>>  #[cfg(CONFIG_NET)]
>>  pub mod net;
>> +pub mod num;
>>  pub mod of;
>>  pub mod page;
>>  #[cfg(CONFIG_PCI)]
>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..953c6ab012601efb3c9387b4b299e22233670c4b
>> --- /dev/null
>> +++ b/rust/kernel/num.rs
>> @@ -0,0 +1,21 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Numerical and binary utilities for primitive types.
>> +
>> +/// A trait providing alignment operations for `usize`.
>> +pub trait UsizeAlign {
>> +    /// Aligns `self` upwards to the nearest multiple of `align`.
>> +    fn align_up(self, align: usize) -> usize;
>> +}
> 
> As it turns out I will also need the same functionality for u64 in a
> future patch. :) Could we turn this into a more generic trait? E.g:
> 
>     /// A trait providing alignment operations for `usize`.
>     pub trait Align {
>         /// Aligns `self` upwards to the nearest multiple of `align`.
>         fn align_up(self, align: Self) -> Self;
>     }
> 
> That way it can be implemented for all basic types. I'd suggest having a local
> macro that takes an arbitrary number of arguments and generates the impl blocks
> for each of them, which would be invoked as:
> 
>     impl_align!(i8, u8, i16, u16, ...);

I actually tried this initially before settling for the simpler solution.

We can do this in 3 ways I think, generics, macros or both.

Unlike the last attempt, this time I did get generics to work. So how about this?

use core::ops::{Add, Sub, BitAnd, Not};

pub trait AlignUp {
    fn align_up(self, alignment: Self) -> Self;
}

impl<T> AlignUp for T
where
    T: Copy
        + Add<Output = T>
        + Sub<Output = T>
        + BitAnd<Output = T>
        + Not<Output = T>
        + From<u8>,
{
    fn align_up(self, alignment: Self) -> Self {
        let one = T::from(1u8);
        (self + alignment - one) & !(alignment - one)
    }
}

I know you must be screaming the word monomorphization, but it is clean code and
I'm willing to see just how much code the compiler generates and does not
requiring specifying any specific types at all!

This also could be simpler if we had num_traits in r4L like userspace, because
num_trait's PrimInt encapsulates all the needed ops.

use num_traits::PrimInt;

pub trait RoundUp {
    fn round_up(self, alignment: Self) -> Self;
}

impl<T> RoundUp for T
where
    T: PrimInt,
{
    fn round_up(self, alignment: Self) -> Self {
        (self + alignment - T::one()) & !(alignment - T::one())
    }
}

fn main() {
    let x: u32 = 1234;
    let y: usize = 1234;

    // Output 1536
    println!("u32: {}", x.round_up(512));
    println!("usize: {}", y.round_up(512));
}

For the monomorphization issues, I do wish Rust in general supported restricting
types further so if we could say "where T is u32, usize etc." but it does not
afaics, so maybe we can do this then?

/// This bit can go into separate module if we want to call it
/// 'num_traits' something.

ppub trait Alignable:
    Copy
    + Add<Output = Self>
    + Sub<Output = Self>
    + BitAnd<Output = Self>
    + Not<Output = Self>
    + From<u8>
{
}

impl Alignable for usize {}
impl Alignable for u8 {}
impl Alignable for u16 {}
impl Alignable for u32 {}
impl Alignable for u64 {}
impl Alignable for u128 {}

//////////////////////

And then num.rs remains simple but restricted to those types:

pub trait AlignUp {
    fn align_up(self, alignment: Self) -> Self;
}

impl<T> AlignUp for T
where
    T: Alignable ,
{
    fn align_up(self, alignment: Self) -> Self {
        let one = T::from(1u8);
        (self + alignment - one) & !(alignment - one)
    }
}

If we dislike that, we could go with the pure macro implementation as well. But
I do like the 'num_traits' approach better, since it keeps the align_up
implementation quite clean.

pub trait AlignUp {
    fn align_up(self, alignment: Self) -> Self;
}

macro_rules! align_up_impl {
    ($($t:ty),+) => {
        $(
            impl AlignUp for $t {
                fn align_up(self, alignment: Self) -> Self {
                    (self + alignment - 1) & !(alignment - 1)
                }
            }
        )+
    }
}

align_up_impl!(usize, u8, u16, u32, u64, u128);

Or, we can even combine the 2 approaches. Use macros for the "impl Alignable"
and use generics on the Alignable trait.

macro_rules! impl_alignable {
    ($($t:ty),+) => {
        $(
            impl Alignable for $t {}
        )+
    };
}

impl_alignable!(usize, u8, u16, u32, u64, u128);

pub trait AlignUp {
    fn align_up(self, alignment: Self) -> Self;
}

impl<T> AlignUp for T
where
    T: Alignable,
{
    fn align_up(self, alignment: Self) -> Self {
        let one = T::from(1u8);
        (self + alignment - one) & !(alignment - one)
    }
}

Thoughts?

>
>> +impl UsizeAlign for usize {
>> +    fn align_up(mut self, align: usize) -> usize {
>> +        self = (self + align - 1) & !(align - 1);
>> +        self
>> +    }
>> +}
> 
> Note that there is no need to mutate `self`, the body can just be:
> 
>     (self + align - 1) & !(align - 1)

Sure, that's fair enough. I am Ok with either.

> 
> This operation can trigger overflows/underflows, so I guess for safety it
> should return a `Result` and use `checked_add` and `checked_sub`, after moving
> `align - 1` into a local variable to avoid checking it twice.
> 
> There is also the interesting question of, what if `align` is not a power of 2.
> We should probably document the fact that it is expected to be.

Good idea, makes sense to return Result and also check for power-of-2.

> 
>> +/// Aligns `val` upwards to the nearest multiple of `align`.
>> +pub const fn usize_align_up(val: usize, align: usize) -> usize {
>> +    (val + align - 1) & !(align - 1)
>> +}
> 
> Let's add a statement in the documentation saying that this exists for use in
> `const` contexts. The `impl` version can maybe call this one directly to avoid
> repeating the same operation twice.

Sure.

> Actually I'm not sure whether we want the `impl` block at all since this>
provides the same functionality, albeit in a slightly less cosmetic way. Can
> core R4L folks provide guidance about that?

Yeah I am Ok with this as well, however as you noted it is less cosmetic.

To clarify for r4l folks, Alex is asking do we add an impl on the types for
aligning them up as done by this patch, or do we provide a bunch of helpers
(possibly using generics or macros to keep the code clean)?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module
  2025-05-01 12:58 ` [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
@ 2025-05-02 21:14   ` Timur Tabi
  2025-05-07 13:42     ` Alexandre Courbot
  0 siblings, 1 reply; 41+ messages in thread
From: Timur Tabi @ 2025-05-02 21:14 UTC (permalink / raw)
  To: dakr@kernel.org, a.hindborg@kernel.org, ojeda@kernel.org,
	boqun.feng@gmail.com, simona@ffwll.ch, tmgross@umich.edu,
	alex.gaynor@gmail.com, tzimmermann@suse.de, corbet@lwn.net,
	mripard@kernel.org, maarten.lankhorst@linux.intel.com,
	benno.lossin@proton.me, bjorn3_gh@protonmail.com,
	airlied@gmail.com, aliceryhl@google.com, Alexandre Courbot,
	gary@garyguo.net
  Cc: Alistair Popple, John Hubbard, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Joel Fernandes, Ben Skeggs

On Thu, 2025-05-01 at 21:58 +0900, Alexandre Courbot wrote:
> +pub(crate) const FIRMWARE_VERSION: &str = "535.113.01";

This needs to change to 570.144.  You can find images to use here:

https://github.com/NVIDIA/linux-firmware/commits/nvidia-staging/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-02 19:59     ` Joel Fernandes
@ 2025-05-03  1:59       ` Alexandre Courbot
  2025-05-03  3:02         ` Joel Fernandes
  0 siblings, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-03  1:59 UTC (permalink / raw)
  To: Joel Fernandes, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sat May 3, 2025 at 4:59 AM JST, Joel Fernandes wrote:
> Hello, Alex,
>
> On 5/2/2025 12:57 AM, Alexandre Courbot wrote:
>> On Thu May 1, 2025 at 9:58 PM JST, Alexandre Courbot wrote:
>>> From: Joel Fernandes <joelagnelf@nvidia.com>
>>>
>>> This will be used in the nova-core driver where we need to upward-align
>>> the image size to get to the next image in the VBIOS ROM.
>>>
>>> [acourbot@nvidia.com: handled conflicts due to removal of patch creating
>>> num.rs]
>>>
>>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>>  rust/kernel/lib.rs |  1 +
>>>  rust/kernel/num.rs | 21 +++++++++++++++++++++
>>>  2 files changed, 22 insertions(+)
>>>
>>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>>> index ab0286857061d2de1be0279cbd2cd3490e5a48c3..be75b196aa7a29cf3eed7c902ed8fb98689bbb50 100644
>>> --- a/rust/kernel/lib.rs
>>> +++ b/rust/kernel/lib.rs
>>> @@ -67,6 +67,7 @@
>>>  pub mod miscdevice;
>>>  #[cfg(CONFIG_NET)]
>>>  pub mod net;
>>> +pub mod num;
>>>  pub mod of;
>>>  pub mod page;
>>>  #[cfg(CONFIG_PCI)]
>>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..953c6ab012601efb3c9387b4b299e22233670c4b
>>> --- /dev/null
>>> +++ b/rust/kernel/num.rs
>>> @@ -0,0 +1,21 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Numerical and binary utilities for primitive types.
>>> +
>>> +/// A trait providing alignment operations for `usize`.
>>> +pub trait UsizeAlign {
>>> +    /// Aligns `self` upwards to the nearest multiple of `align`.
>>> +    fn align_up(self, align: usize) -> usize;
>>> +}
>> 
>> As it turns out I will also need the same functionality for u64 in a
>> future patch. :) Could we turn this into a more generic trait? E.g:
>> 
>>     /// A trait providing alignment operations for `usize`.
>>     pub trait Align {
>>         /// Aligns `self` upwards to the nearest multiple of `align`.
>>         fn align_up(self, align: Self) -> Self;
>>     }
>> 
>> That way it can be implemented for all basic types. I'd suggest having a local
>> macro that takes an arbitrary number of arguments and generates the impl blocks
>> for each of them, which would be invoked as:
>> 
>>     impl_align!(i8, u8, i16, u16, ...);
>
> I actually tried this initially before settling for the simpler solution.
>
> We can do this in 3 ways I think, generics, macros or both.
>
> Unlike the last attempt, this time I did get generics to work. So how about this?
>
> use core::ops::{Add, Sub, BitAnd, Not};
>
> pub trait AlignUp {
>     fn align_up(self, alignment: Self) -> Self;
> }
>
> impl<T> AlignUp for T
> where
>     T: Copy
>         + Add<Output = T>
>         + Sub<Output = T>
>         + BitAnd<Output = T>
>         + Not<Output = T>
>         + From<u8>,
> {
>     fn align_up(self, alignment: Self) -> Self {
>         let one = T::from(1u8);
>         (self + alignment - one) & !(alignment - one)
>     }
> }
>
> I know you must be screaming the word monomorphization, but it is clean code and
> I'm willing to see just how much code the compiler generates and does not
> requiring specifying any specific types at all!

No, I think this is great - monomorphization only happens as the code is
used, so it won't be a problem. And the compiler should just inline that
anyway (let's add an `#[inline]` to be sure although I'm not convinced
it is entirely necessary).

>
> This also could be simpler if we had num_traits in r4L like userspace, because
> num_trait's PrimInt encapsulates all the needed ops.
>
> use num_traits::PrimInt;
>
> pub trait RoundUp {
>     fn round_up(self, alignment: Self) -> Self;
> }
>
> impl<T> RoundUp for T
> where
>     T: PrimInt,
> {
>     fn round_up(self, alignment: Self) -> Self {
>         (self + alignment - T::one()) & !(alignment - T::one())
>     }
> }
>
> fn main() {
>     let x: u32 = 1234;
>     let y: usize = 1234;
>
>     // Output 1536
>     println!("u32: {}", x.round_up(512));
>     println!("usize: {}", y.round_up(512));
> }
>
> For the monomorphization issues, I do wish Rust in general supported restricting
> types further so if we could say "where T is u32, usize etc." but it does not
> afaics, so maybe we can do this then?
>
> /// This bit can go into separate module if we want to call it
> /// 'num_traits' something.
>
> ppub trait Alignable:
>     Copy
>     + Add<Output = Self>
>     + Sub<Output = Self>
>     + BitAnd<Output = Self>
>     + Not<Output = Self>
>     + From<u8>
> {
> }
>
> impl Alignable for usize {}
> impl Alignable for u8 {}
> impl Alignable for u16 {}
> impl Alignable for u32 {}
> impl Alignable for u64 {}
> impl Alignable for u128 {}
>
> //////////////////////
>
> And then num.rs remains simple but restricted to those types:
>
> pub trait AlignUp {
>     fn align_up(self, alignment: Self) -> Self;
> }
>
> impl<T> AlignUp for T
> where
>     T: Alignable ,
> {
>     fn align_up(self, alignment: Self) -> Self {
>         let one = T::from(1u8);
>         (self + alignment - one) & !(alignment - one)
>     }
> }
>
> If we dislike that, we could go with the pure macro implementation as well. But
> I do like the 'num_traits' approach better, since it keeps the align_up
> implementation quite clean.

This looks very rust-y and I like it. I guess Alignable (maybe with a
more generic name because I suspect these properties can be used for
other things than aligning as well) could be in the `num` module without
any problem.

>
> pub trait AlignUp {
>     fn align_up(self, alignment: Self) -> Self;
> }
>
> macro_rules! align_up_impl {
>     ($($t:ty),+) => {
>         $(
>             impl AlignUp for $t {
>                 fn align_up(self, alignment: Self) -> Self {
>                     (self + alignment - 1) & !(alignment - 1)
>                 }
>             }
>         )+
>     }
> }
>
> align_up_impl!(usize, u8, u16, u32, u64, u128);
>
> Or, we can even combine the 2 approaches. Use macros for the "impl Alignable"
> and use generics on the Alignable trait.
>
> macro_rules! impl_alignable {
>     ($($t:ty),+) => {
>         $(
>             impl Alignable for $t {}
>         )+
>     };
> }
>
> impl_alignable!(usize, u8, u16, u32, u64, u128);
>
> pub trait AlignUp {
>     fn align_up(self, alignment: Self) -> Self;
> }
>
> impl<T> AlignUp for T
> where
>     T: Alignable,
> {
>     fn align_up(self, alignment: Self) -> Self {
>         let one = T::from(1u8);
>         (self + alignment - one) & !(alignment - one)
>     }
> }
>
> Thoughts?

I think that's the correct way to do it and am fully on board with this
approach.

The only thing this doesn't solve is that it doesn't provide `const`
functions. But maybe for that purpose we can use a single macro that
nicely panics at build-time should an overflow occur.

Cheers,
Alex.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-03  1:59       ` Alexandre Courbot
@ 2025-05-03  3:02         ` Joel Fernandes
  2025-05-03 14:37           ` Alexandre Courbot
  2025-05-03 14:47           ` Alexandre Courbot
  0 siblings, 2 replies; 41+ messages in thread
From: Joel Fernandes @ 2025-05-03  3:02 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel



On 5/2/2025 9:59 PM, Alexandre Courbot wrote:
>> pub trait AlignUp {
>>     fn align_up(self, alignment: Self) -> Self;
>> }
>>
>> macro_rules! align_up_impl {
>>     ($($t:ty),+) => {
>>         $(
>>             impl AlignUp for $t {
>>                 fn align_up(self, alignment: Self) -> Self {
>>                     (self + alignment - 1) & !(alignment - 1)
>>                 }
>>             }
>>         )+
>>     }
>> }
>>
>> align_up_impl!(usize, u8, u16, u32, u64, u128);
>>
>> Or, we can even combine the 2 approaches. Use macros for the "impl Alignable"
>> and use generics on the Alignable trait.
>>
>> macro_rules! impl_alignable {
>>     ($($t:ty),+) => {
>>         $(
>>             impl Alignable for $t {}
>>         )+
>>     };
>> }
>>
>> impl_alignable!(usize, u8, u16, u32, u64, u128);
>>
>> pub trait AlignUp {
>>     fn align_up(self, alignment: Self) -> Self;
>> }
>>
>> impl<T> AlignUp for T
>> where
>>     T: Alignable,
>> {
>>     fn align_up(self, alignment: Self) -> Self {
>>         let one = T::from(1u8);
>>         (self + alignment - one) & !(alignment - one)
>>     }
>> }
>>
>> Thoughts?
> I think that's the correct way to do it and am fully on board with this
> approach.
> 
> The only thing this doesn't solve is that it doesn't provide `const`
> functions. But maybe for that purpose we can use a single macro that
> nicely panics at build-time should an overflow occur.

Great, thanks. I split the traits as follows and it is cleaner and works. I will
look into the build-time overflow check and the returning of Result change on
Monday. Let me know if any objections. I added the #[inline] and hopefully that
gives similar benefits to const that you're seeking:

use core::ops::{BitAnd, BitOr, Not, Add, Sub};
pub trait BitOps:
    Copy
    + BitAnd<Output = Self>
    + BitOr<Output = Self>
    + Not<Output = Self>
{
}

pub trait Unsigned:
    Copy
    + Add<Output = Self>
    + Sub<Output = Self>
    + From<u8>
{
}

macro_rules! impl_unsigned_traits {
    ($($t:ty),+) => {
        $(
            impl Unsigned for $t {}
            impl BitOps for $t {}
        )+
    };
}

impl_unsigned_traits!(usize, u8, u16, u32, u64, u128);

pub trait AlignUp {
    fn align_up(self, alignment: Self) -> Self;
}

impl<T> AlignUp for T
where
    T: BitOps + Unsigned,
{
    #[inline]
    fn align_up(self, alignment: Self) -> Self {
        let one = T::from(1u8);
        (self + alignment - one) & !(alignment - one)
    }
}

Thanks.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-03  3:02         ` Joel Fernandes
@ 2025-05-03 14:37           ` Alexandre Courbot
  2025-05-05 15:25             ` Joel Fernandes
  2025-05-03 14:47           ` Alexandre Courbot
  1 sibling, 1 reply; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-03 14:37 UTC (permalink / raw)
  To: Joel Fernandes, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sat May 3, 2025 at 12:02 PM JST, Joel Fernandes wrote:
>
>
> On 5/2/2025 9:59 PM, Alexandre Courbot wrote:
>>> pub trait AlignUp {
>>>     fn align_up(self, alignment: Self) -> Self;
>>> }
>>>
>>> macro_rules! align_up_impl {
>>>     ($($t:ty),+) => {
>>>         $(
>>>             impl AlignUp for $t {
>>>                 fn align_up(self, alignment: Self) -> Self {
>>>                     (self + alignment - 1) & !(alignment - 1)
>>>                 }
>>>             }
>>>         )+
>>>     }
>>> }
>>>
>>> align_up_impl!(usize, u8, u16, u32, u64, u128);
>>>
>>> Or, we can even combine the 2 approaches. Use macros for the "impl Alignable"
>>> and use generics on the Alignable trait.
>>>
>>> macro_rules! impl_alignable {
>>>     ($($t:ty),+) => {
>>>         $(
>>>             impl Alignable for $t {}
>>>         )+
>>>     };
>>> }
>>>
>>> impl_alignable!(usize, u8, u16, u32, u64, u128);
>>>
>>> pub trait AlignUp {
>>>     fn align_up(self, alignment: Self) -> Self;
>>> }
>>>
>>> impl<T> AlignUp for T
>>> where
>>>     T: Alignable,
>>> {
>>>     fn align_up(self, alignment: Self) -> Self {
>>>         let one = T::from(1u8);
>>>         (self + alignment - one) & !(alignment - one)
>>>     }
>>> }
>>>
>>> Thoughts?
>> I think that's the correct way to do it and am fully on board with this
>> approach.
>> 
>> The only thing this doesn't solve is that it doesn't provide `const`
>> functions. But maybe for that purpose we can use a single macro that
>> nicely panics at build-time should an overflow occur.
>
> Great, thanks. I split the traits as follows and it is cleaner and works. I will
> look into the build-time overflow check and the returning of Result change on
> Monday. Let me know if any objections.

Looking good IMHO, apart maybe from the names of the `BitOps` and
`Unsigned` traits that are not super descriptive and don't need to be
split for the moment anyway.

Actually it may be a good idea to move this into its own patch/series so
it gets more attention as this is starting to look like the `num` or
`num_integer` crates and we might be well-advised to take more
inspiration from them in order to avoid reinventing the wheel. It is
basically asking the question "how do we want to extend the integer
types in a useful way for the kernel", so it's actually pretty important
that we get our answer right. :)

To address our immediate needs of an `align_up`, it just occurred to me
that we could simply use the `next_multiple_of` method, at least
temporarily. It is implemented with a modulo and will therefore probably
result in less efficient code than a version optimized for powers of
two, but it will do the trick until we figure out how we want to extend
the primitive types for the kernel, which is really what this patch is
about - we will also need an `align_down` for instance, and I don't know
of a standard library equivalent for it...

> I added the #[inline] and hopefully that
> gives similar benefits to const that you're seeking:

A `const` version is still going to be needed, `#[inline]` encourages the
compiler to try and inline the function, but AFAIK it doesn't allow use
in const context.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-03  3:02         ` Joel Fernandes
  2025-05-03 14:37           ` Alexandre Courbot
@ 2025-05-03 14:47           ` Alexandre Courbot
  1 sibling, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-03 14:47 UTC (permalink / raw)
  To: Joel Fernandes, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sat May 3, 2025 at 12:02 PM JST, Joel Fernandes wrote:
>
>
> On 5/2/2025 9:59 PM, Alexandre Courbot wrote:
>>> pub trait AlignUp {
>>>     fn align_up(self, alignment: Self) -> Self;
>>> }
>>>
>>> macro_rules! align_up_impl {
>>>     ($($t:ty),+) => {
>>>         $(
>>>             impl AlignUp for $t {
>>>                 fn align_up(self, alignment: Self) -> Self {
>>>                     (self + alignment - 1) & !(alignment - 1)
>>>                 }
>>>             }
>>>         )+
>>>     }
>>> }
>>>
>>> align_up_impl!(usize, u8, u16, u32, u64, u128);
>>>
>>> Or, we can even combine the 2 approaches. Use macros for the "impl Alignable"
>>> and use generics on the Alignable trait.
>>>
>>> macro_rules! impl_alignable {
>>>     ($($t:ty),+) => {
>>>         $(
>>>             impl Alignable for $t {}
>>>         )+
>>>     };
>>> }
>>>
>>> impl_alignable!(usize, u8, u16, u32, u64, u128);
>>>
>>> pub trait AlignUp {
>>>     fn align_up(self, alignment: Self) -> Self;
>>> }
>>>
>>> impl<T> AlignUp for T
>>> where
>>>     T: Alignable,
>>> {
>>>     fn align_up(self, alignment: Self) -> Self {
>>>         let one = T::from(1u8);
>>>         (self + alignment - one) & !(alignment - one)
>>>     }
>>> }
>>>
>>> Thoughts?
>> I think that's the correct way to do it and am fully on board with this
>> approach.
>> 
>> The only thing this doesn't solve is that it doesn't provide `const`
>> functions. But maybe for that purpose we can use a single macro that
>> nicely panics at build-time should an overflow occur.
>
> Great, thanks. I split the traits as follows and it is cleaner and works. I will
> look into the build-time overflow check and the returning of Result change on
> Monday. Let me know if any objections.

Looking good IMHO, apart maybe from the names of the `BitOps` and
`Unsigned` traits that are not super descriptive and don't need to be
split for the moment anyway.

Actually it may be a good idea to split this into its own patch/series
so it gets more attention as this is starting to look like the `num` or
`num_integer` crates and we might be advised to take more inspiration
from them in order to avoid reinventing the wheel.

To address our immediate needs of an `align_up`, it just occurred to me
that we could simply use the `next_multiple_of` method, at least
temporarily. It is implemented with a modulo and will therefore probably
result in less efficient code than a version optimized for powers of
two, but it will do the trick until we figure out how we want to extend
the primitive types for the kernel, which is really what this patch is
about - we will also need an `align_down` for instance.

> I added the #[inline] and hopefully that
> gives similar benefits to const that you're seeking:

A `const` version is still going to be needed, `#[inline]` encourages the
compiler to try and inline the function, but AFAIK it doesn't allow use
in const context.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-03 14:37           ` Alexandre Courbot
@ 2025-05-05 15:25             ` Joel Fernandes
  2025-05-07 14:11               ` Alexandre Courbot
  0 siblings, 1 reply; 41+ messages in thread
From: Joel Fernandes @ 2025-05-05 15:25 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hello, Alexandre,

On 5/3/2025 10:37 AM, Alexandre Courbot wrote:
> On Sat May 3, 2025 at 12:02 PM JST, Joel Fernandes wrote:
>>
>>
>> On 5/2/2025 9:59 PM, Alexandre Courbot wrote:
>>>> pub trait AlignUp {
>>>>     fn align_up(self, alignment: Self) -> Self;
>>>> }
>>>>
>>>> macro_rules! align_up_impl {
>>>>     ($($t:ty),+) => {
>>>>         $(
>>>>             impl AlignUp for $t {
>>>>                 fn align_up(self, alignment: Self) -> Self {
>>>>                     (self + alignment - 1) & !(alignment - 1)
>>>>                 }
>>>>             }
>>>>         )+
>>>>     }
>>>> }
>>>>
>>>> align_up_impl!(usize, u8, u16, u32, u64, u128);
>>>>
>>>> Or, we can even combine the 2 approaches. Use macros for the "impl Alignable"
>>>> and use generics on the Alignable trait.
>>>>
>>>> macro_rules! impl_alignable {
>>>>     ($($t:ty),+) => {
>>>>         $(
>>>>             impl Alignable for $t {}
>>>>         )+
>>>>     };
>>>> }
>>>>
>>>> impl_alignable!(usize, u8, u16, u32, u64, u128);
>>>>
>>>> pub trait AlignUp {
>>>>     fn align_up(self, alignment: Self) -> Self;
>>>> }
>>>>
>>>> impl<T> AlignUp for T
>>>> where
>>>>     T: Alignable,
>>>> {
>>>>     fn align_up(self, alignment: Self) -> Self {
>>>>         let one = T::from(1u8);
>>>>         (self + alignment - one) & !(alignment - one)
>>>>     }
>>>> }
>>>>
>>>> Thoughts?
>>> I think that's the correct way to do it and am fully on board with this
>>> approach.
>>>
>>> The only thing this doesn't solve is that it doesn't provide `const`
>>> functions. But maybe for that purpose we can use a single macro that
>>> nicely panics at build-time should an overflow occur.
>>
>> Great, thanks. I split the traits as follows and it is cleaner and works. I will
>> look into the build-time overflow check and the returning of Result change on
>> Monday. Let me know if any objections.
> 
> Looking good IMHO, apart maybe from the names of the `BitOps` and
> `Unsigned` traits that are not super descriptive and don't need to be
> split for the moment anyway.

Sounds good, actually I already switched to keeping them in one trait
"Unsigned". I agree that is cleaner (see below).

> Actually it may be a good idea to move this into its own patch/series so
> it gets more attention as this is starting to look like the `num` or
> `num_integer` crates and we might be well-advised to take more
> inspiration from them in order to avoid reinventing the wheel. It is
> basically asking the question "how do we want to extend the integer
> types in a useful way for the kernel", so it's actually pretty important
> that we get our answer right. :)

I am not sure if we want to split the series for a simple change like this,
because then the whole series gets blocked? It may also be better to pair the
user of the function with the function itself IMHO since the function is also
quite small. I am also Ok with keeping the original patch in the series and
extending on that in the future (with just usize) to not block the series.

Regarding for the full blown num module, I looked over the weekend and its
actually a bunch of modules working together, with dozens of numeric APIs, so I
am not sure if we should pull everything or try to copy parts of it. The R4l
guidelines have something to say here. A good approach IMO is to just do it
incrementally, like I'm doing with this patch.

I think defining a "Unsigned" trait does make sense, and then for future
expansion, it can be expanded on in the new num module?

> 
> To address our immediate needs of an `align_up`, it just occurred to me
> that we could simply use the `next_multiple_of` method, at least
> temporarily. It is implemented with a modulo and will therefore probably
> result in less efficient code than a version optimized for powers of
> two, but it will do the trick until we figure out how we want to extend
> the primitive types for the kernel, which is really what this patch is
> about - we will also need an `align_down` for instance, and I don't know
> of a standard library equivalent for it...

Why do we want to trade off for "less efficient code"? :) I think that's worse
than the original change (before this series) I had which had no function call
at all, but hardcoded the expression at the call site. The suggestion is also
less desirable than having a local helper in the vbios module itself. I am not
much a fan of the idea "lets call this temporarily and have sub optimal code"
when the alternative is to just do it in-place, in-module, or via a num module
extension :)

> 
>> I added the #[inline] and hopefully that
>> gives similar benefits to const that you're seeking:
> 
> A `const` version is still going to be needed, `#[inline]` encourages the
> compiler to try and inline the function, but AFAIK it doesn't allow use
> in const context.

Right, so for the vbios use case there is no use of a const function. The only
reason I added it is because there were other functions at the time which were
used (by the now dropped timer module). I suggest let us add the const function
once there is a user of it, I also don't know right how to do it. Like if I use
generics for the const fn, I get this:

const fn align_up_unsigned<T: Unsigned>(value: T, alignment: T) -> T {
    let one = T::from(1u8);
    (value + alignment - one) & !(alignment - one)
}

error[E0658]: cannot call conditionally-const method `<T as Add>::add` in
constant functions

I tried to do this with macros as well, but no luck. If you can share a macro, I
can incorporate it into the patch.

I can respin this patch again on conclusion of the discussion, but any guidance
from rust-for-linux folks is also much appreciated. Below is currently the patch
that I am considering so far (without the const function and Result changes).

// num.rs
//! Numerical and binary utilities for primitive types.

/// A trait providing alignment operations for `usize`.
use core::ops::{Add, BitAnd, BitOr, Not, Sub};

/// Traits for unsigned integers
pub trait Unsigned:
    Copy
    + BitAnd<Output = Self>
    + BitOr<Output = Self>
    + Not<Output = Self>
    + Add<Output = Self>
    + Sub<Output = Self>
    + From<u8>
{
}

macro_rules! unsigned_trait_impl {
    ($($t:ty),+) => {
        $(
            impl Unsigned for $t {}
        )+
    };
}
unsigned_trait_impl!(usize, u8, u16, u32, u64, u128);

/// Trait for unsigned integer alignment
pub trait UnsignedAlign {
    /// Implement upward power-of-2 alignment for unsigned ints
    fn align_up(self, alignment: Self) -> Self;
}

impl<T> UnsignedAlign for T
where
    T: Unsigned,
{
    #[inline]
    fn align_up(self, alignment: Self) -> Self {
        let one = T::from(1u8);
        (self + alignment - one) & !(alignment - one)
    }
}

Thanks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module
  2025-05-02 21:14   ` Timur Tabi
@ 2025-05-07 13:42     ` Alexandre Courbot
  0 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-07 13:42 UTC (permalink / raw)
  To: Timur Tabi, dakr@kernel.org, a.hindborg@kernel.org,
	ojeda@kernel.org, boqun.feng@gmail.com, simona@ffwll.ch,
	tmgross@umich.edu, alex.gaynor@gmail.com, tzimmermann@suse.de,
	corbet@lwn.net, mripard@kernel.org,
	maarten.lankhorst@linux.intel.com, benno.lossin@proton.me,
	bjorn3_gh@protonmail.com, airlied@gmail.com, aliceryhl@google.com,
	gary@garyguo.net
  Cc: Alistair Popple, John Hubbard, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Joel Fernandes, Ben Skeggs

On Sat May 3, 2025 at 6:14 AM JST, Timur Tabi wrote:
> On Thu, 2025-05-01 at 21:58 +0900, Alexandre Courbot wrote:
>> +pub(crate) const FIRMWARE_VERSION: &str = "535.113.01";
>
> This needs to change to 570.144.  You can find images to use here:
>
> https://github.com/NVIDIA/linux-firmware/commits/nvidia-staging/

I have the patch ready for that, just holding on to it for now because
nothing it done with the loaded firmware yet, and getting r570 adds one
extra step to testing this patchset.

Once we start loading and running Booter I will switch the version to
r570.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize
  2025-05-05 15:25             ` Joel Fernandes
@ 2025-05-07 14:11               ` Alexandre Courbot
  0 siblings, 0 replies; 41+ messages in thread
From: Alexandre Courbot @ 2025-05-07 14:11 UTC (permalink / raw)
  To: Joel Fernandes, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue May 6, 2025 at 12:25 AM JST, Joel Fernandes wrote:
>> Actually it may be a good idea to move this into its own patch/series so
>> it gets more attention as this is starting to look like the `num` or
>> `num_integer` crates and we might be well-advised to take more
>> inspiration from them in order to avoid reinventing the wheel. It is
>> basically asking the question "how do we want to extend the integer
>> types in a useful way for the kernel", so it's actually pretty important
>> that we get our answer right. :)
>
> I am not sure if we want to split the series for a simple change like this,
> because then the whole series gets blocked? It may also be better to pair the
> user of the function with the function itself IMHO since the function is also
> quite small. I am also Ok with keeping the original patch in the series and
> extending on that in the future (with just usize) to not block the series.
>
> Regarding for the full blown num module, I looked over the weekend and its
> actually a bunch of modules working together, with dozens of numeric APIs, so I
> am not sure if we should pull everything or try to copy parts of it. The R4l
> guidelines have something to say here. A good approach IMO is to just do it
> incrementally, like I'm doing with this patch.
>
> I think defining a "Unsigned" trait does make sense, and then for future
> expansion, it can be expanded on in the new num module?

Yeah maybe I was looking too far ahead. This can definitely grow
gradually.

>> To address our immediate needs of an `align_up`, it just occurred to me
>> that we could simply use the `next_multiple_of` method, at least
>> temporarily. It is implemented with a modulo and will therefore probably
>> result in less efficient code than a version optimized for powers of
>> two, but it will do the trick until we figure out how we want to extend
>> the primitive types for the kernel, which is really what this patch is
>> about - we will also need an `align_down` for instance, and I don't know
>> of a standard library equivalent for it...
>
> Why do we want to trade off for "less efficient code"? :) I think that's worse
> than the original change (before this series) I had which had no function call
> at all, but hardcoded the expression at the call site. The suggestion is also
> less desirable than having a local helper in the vbios module itself. I am not
> much a fan of the idea "lets call this temporarily and have sub optimal code"
> when the alternative is to just do it in-place, in-module, or via a num module
> extension :)

`next_multiple_of` has the benefit of returning the correct result even
for non-powers of 2, but at the same time trying to align to something
that is not a power of 2 is probably a defect in the code itself. ^_^;

Another reason for not using it is to have things properly named, so
agreed that an extension trait with the functionality we need, with a
name that clearly carries our intent and implemented as efficiently as
the C equivalent is better than reusing standard library methods that
happen to provide the correct result.

>>> I added the #[inline] and hopefully that
>>> gives similar benefits to const that you're seeking:
>> 
>> A `const` version is still going to be needed, `#[inline]` encourages the
>> compiler to try and inline the function, but AFAIK it doesn't allow use
>> in const context.
>
> Right, so for the vbios use case there is no use of a const function. The only
> reason I added it is because there were other functions at the time which were
> used (by the now dropped timer module). I suggest let us add the const function
> once there is a user of it, I also don't know right how to do it. Like if I use
> generics for the const fn, I get this:
>
> const fn align_up_unsigned<T: Unsigned>(value: T, alignment: T) -> T {
>     let one = T::from(1u8);
>     (value + alignment - one) & !(alignment - one)
> }
>
> error[E0658]: cannot call conditionally-const method `<T as Add>::add` in
> constant functions

Interesting, I would expect that to fail but "conditionally-const"?
After looking that up is appears we can constraint a generic type
against a const trait, but that feature is still experimental and not
enabled in the kernel. So agreed, let's consider that later.


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2025-05-07 14:12 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-01 12:58 [PATCH v2 00/21] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 01/21] rust: devres: allow to borrow a reference to the resource's Device Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 02/21] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 03/21] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 04/21] gpu: nova-core: add missing GA100 definition Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 05/21] gpu: nova-core: take bound device in Gpu::new Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 06/21] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 07/21] gpu: nova-core: fix layout of NV_PMC_BOOT_0 Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 08/21] gpu: nova-core: introduce helper macro for register access Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 09/21] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
2025-05-02 21:14   ` Timur Tabi
2025-05-07 13:42     ` Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 10/21] rust: make ETIMEDOUT error available Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 11/21] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 12/21] gpu: nova-core: add DMA object struct Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 13/21] gpu: nova-core: register sysmem flush page Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 14/21] gpu: nova-core: add helper function to wait on condition Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 15/21] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
2025-05-01 13:52   ` Joel Fernandes
2025-05-01 14:18     ` Alexandre Courbot
2025-05-01 14:41       ` Joel Fernandes
2025-05-01 12:58 ` [PATCH v2 16/21] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 17/21] rust: num: Add an upward alignment helper for usize Alexandre Courbot
2025-05-01 15:19   ` Timur Tabi
2025-05-01 15:22     ` Joel Fernandes
2025-05-01 15:31       ` Timur Tabi
2025-05-01 15:31         ` Joel Fernandes
2025-05-01 21:02     ` Alexandre Courbot
2025-05-01 21:52       ` Joel Fernandes
2025-05-02  4:57   ` Alexandre Courbot
2025-05-02 19:59     ` Joel Fernandes
2025-05-03  1:59       ` Alexandre Courbot
2025-05-03  3:02         ` Joel Fernandes
2025-05-03 14:37           ` Alexandre Courbot
2025-05-05 15:25             ` Joel Fernandes
2025-05-07 14:11               ` Alexandre Courbot
2025-05-03 14:47           ` Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 18/21] nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 19/21] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 20/21] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
2025-05-01 12:58 ` [PATCH v2 21/21] gpu: nova-core: load and " Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).