* [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
@ 2025-04-20 12:19 Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 01/16] rust: add useful ops for u64 Alexandre Courbot
                   ` (16 more replies)
  0 siblings, 17 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot, Sergio González Collado

Hi everyone,

This series is a continuation of my previous RFCs [1] to complete the
first step of GSP booting (running the FWSEC-FRTS firmware extracted
from the BIOS) on Ampere devices. While it is still far from bringing
the GPU into a state where it can do anything useful, it sets up the
basic layout of the driver upon which we can build in order to continue
with the next steps of GSP booting, as well as supporting more chipsets.

Upon successful probe, the driver will display the range of the WPR2
region constructed by FWSEC-FRTS:

  [   95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
  [   95.436002] NovaCore 0000:01:00.0: GPU instance built

This code is based on nova-next with the try_access_with patch [2].

There is still a bit of unsafe code where it is not desired, notably to
transmute byte slices into types that implement FromBytes. This is
because support for doing such transmute operations safely is not in
the kernel crate yet.

[1] https://lore.kernel.org/rust-for-linux/20250320-nova_timer-v3-0-79aa2ad25a79@nvidia.com/
[2] https://lore.kernel.org/rust-for-linux/20250411-try_with-v4-0-f470ac79e2e2@nvidia.com/

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Alexandre Courbot (15):
      rust: add useful ops for u64
      rust: make ETIMEDOUT error available
      gpu: nova-core: derive useful traits for Chipset
      gpu: nova-core: add missing GA100 definition
      gpu: nova-core: take bound device in Gpu::new
      gpu: nova-core: define registers layout using helper macro
      gpu: nova-core: move Firmware to firmware module
      gpu: nova-core: wait for GFW_BOOT completion
      gpu: nova-core: register sysmem flush page
      gpu: nova-core: add basic timer device
      gpu: nova-core: add falcon register definitions and base code
      gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
      gpu: nova-core: compute layout of the FRTS region
      gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
      gpu: nova-core: load and run FWSEC-FRTS

Joel Fernandes (1):
      gpu: nova-core: Add support for VBIOS ucode extraction for boot

 Documentation/gpu/nova/core/todo.rst      |    6 +
 drivers/gpu/nova-core/devinit.rs          |   40 ++
 drivers/gpu/nova-core/dma.rs              |   54 ++
 drivers/gpu/nova-core/driver.rs           |    2 +-
 drivers/gpu/nova-core/falcon.rs           |  466 ++++++++++++
 drivers/gpu/nova-core/falcon/gsp.rs       |   27 +
 drivers/gpu/nova-core/falcon/hal.rs       |   54 ++
 drivers/gpu/nova-core/falcon/hal/ga102.rs |  111 +++
 drivers/gpu/nova-core/falcon/sec2.rs      |    9 +
 drivers/gpu/nova-core/firmware.rs         |   90 ++-
 drivers/gpu/nova-core/firmware/fwsec.rs   |  340 +++++++++
 drivers/gpu/nova-core/gpu.rs              |  211 ++++--
 drivers/gpu/nova-core/gsp.rs              |    3 +
 drivers/gpu/nova-core/gsp/fb.rs           |  109 +++
 drivers/gpu/nova-core/nova_core.rs        |   24 +
 drivers/gpu/nova-core/regs.rs             |  304 ++++++--
 drivers/gpu/nova-core/regs/macros.rs      |  297 ++++++++
 drivers/gpu/nova-core/timer.rs            |  130 ++++
 drivers/gpu/nova-core/vbios.rs            | 1100 +++++++++++++++++++++++++++++
 rust/kernel/error.rs                      |    1 +
 rust/kernel/lib.rs                        |    1 +
 rust/kernel/num.rs                        |   52 ++
 22 files changed, 3347 insertions(+), 84 deletions(-)
---
base-commit: 96609a1969f4ade45351ec368c65580c77592e8b
change-id: 20250417-nova-frts-96ef299abe2c
prerequisite-change-id: 20250313-try_with-cc9f91dd3b60:v4
prerequisite-patch-id: b0c2d08bdea8193307c43c04aa9ff96baf6b00e1
prerequisite-patch-id: b6d1232c2dfef24e4d3f8753a198eb6c427c3486

Best regards,
-- 
Alexandre Courbot <acourbot@nvidia.com>



* [PATCH 01/16] rust: add useful ops for u64
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 02/16] rust: make ETIMEDOUT error available Alexandre Courbot
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot, Sergio González Collado

It is common to build a u64 from its high and low parts obtained from
two 32-bit registers. Conversely, it is also common to split a u64 into
two u32s to write them into registers. Add an extension trait for u64
that implements these methods in a new `num` module.

It is expected that this trait will be extended with other useful
operations, and similar extension traits implemented for other types.
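
For illustration, here is a minimal standalone sketch of the register use
case described above. It is plain Rust outside the kernel tree: the three
free functions mirror the `const` helpers added by this patch, while
`AddrRegs` and its methods are hypothetical stand-ins for a pair of 32-bit
hardware registers and are not part of the series.

```rust
/// Combine two 32-bit halves into a `u64` (same body as the patch's helper).
pub const fn u64_from_u32s(high: u32, low: u32) -> u64 {
    ((high as u64) << u32::BITS) | low as u64
}

/// Upper 32 bits of `v`.
pub const fn upper_32_bits(v: u64) -> u32 {
    (v >> u32::BITS) as u32
}

/// Lower 32 bits of `v` (the cast truncates the upper half).
pub const fn lower_32_bits(v: u64) -> u32 {
    v as u32
}

/// Hypothetical model of a 64-bit address exposed as two 32-bit registers.
#[derive(Default)]
pub struct AddrRegs {
    pub lo: u32,
    pub hi: u32,
}

impl AddrRegs {
    /// Split `addr` and store each half in its own register.
    pub fn set(&mut self, addr: u64) {
        self.lo = lower_32_bits(addr);
        self.hi = upper_32_bits(addr);
    }

    /// Rebuild the 64-bit address from the two registers.
    pub fn get(&self) -> u64 {
        u64_from_u32s(self.hi, self.lo)
    }
}
```

Writing an address with `set()` and reading it back with `get()` should
round-trip unchanged, which is the property the trait methods are meant to
make easy to express.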

Reviewed-by: Sergio González Collado <sergio.collado@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/num.rs | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 55a8dfeece0b27f188456a9eaebd1045c4cafbcb..e30d2c075a3607f6ea40c901b3281e8798e81260 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -65,6 +65,7 @@
 pub mod miscdevice;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod num;
 pub mod of;
 pub mod page;
 #[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..9b93db6528eef131fb74c1289f1e152cc2a13168
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// Useful operations for `u64`.
+pub trait U64Ext {
+    /// Build a `u64` by combining its `high` and `low` parts.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
+    /// ```
+    fn from_u32s(high: u32, low: u32) -> Self;
+
+    /// Returns the upper 32 bits of `self`.
+    fn upper_32_bits(self) -> u32;
+
+    /// Returns the lower 32 bits of `self`.
+    fn lower_32_bits(self) -> u32;
+}
+
+impl U64Ext for u64 {
+    fn from_u32s(high: u32, low: u32) -> Self {
+        ((high as u64) << u32::BITS) | low as u64
+    }
+
+    fn upper_32_bits(self) -> u32 {
+        (self >> u32::BITS) as u32
+    }
+
+    fn lower_32_bits(self) -> u32 {
+        self as u32
+    }
+}
+
+/// Same as [`U64Ext::upper_32_bits`], but defined outside of the trait so it can be used in a
+/// `const` context.
+pub const fn upper_32_bits(v: u64) -> u32 {
+    (v >> u32::BITS) as u32
+}
+
+/// Same as [`U64Ext::lower_32_bits`], but defined outside of the trait so it can be used in a
+/// `const` context.
+pub const fn lower_32_bits(v: u64) -> u32 {
+    v as u32
+}
+
+/// Same as [`U64Ext::from_u32s`], but defined outside of the trait so it can be used in a `const`
+/// context.
+pub const fn u64_from_u32s(high: u32, low: u32) -> u64 {
+    ((high as u64) << u32::BITS) | low as u64
+}

-- 
2.49.0



* [PATCH 02/16] rust: make ETIMEDOUT error available
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 01/16] rust: add useful ops for u64 Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will use this error in the nova-core driver.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 3dee3139fcd4379b94748c0ba1965f4e1865b633..083c7b068cf4e185100de96e520c54437898ee72 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -65,6 +65,7 @@ macro_rules! declare_err {
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
     declare_err!(EOVERFLOW, "Value too large for defined data type.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");

-- 
2.49.0



* [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 01/16] rust: add useful ops for u64 Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 02/16] rust: make ETIMEDOUT error available Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22 16:23   ` Joel Fernandes
  2025-04-20 12:19 ` [PATCH 04/16] gpu: nova-core: add missing GA100 definition Alexandre Courbot
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will commonly need to compare chipset versions, so derive the
ordering traits to make that possible. Also derive Copy and Clone since
passing Chipset by value will be more efficient than by reference.
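
As a rough illustration of what the derived traits enable, the sketch below
uses a trimmed-down copy of the enum in plain Rust. Only a few variants are
listed, with the values that appear in gpu.rs; `is_at_least_ga102` is a
hypothetical helper, not something the series defines.

```rust
/// Trimmed-down stand-in for the driver's `Chipset` enum.
/// Variants are declared in ascending value order, so the derived ordering
/// follows the hardware chipset numbering.
#[derive(Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
#[repr(u32)]
pub enum Chipset {
    TU117 = 0x167,
    TU116 = 0x168,
    GA100 = 0x170,
    GA102 = 0x172,
    GA104 = 0x174,
}

/// Hypothetical helper: select a code path by comparing chipsets.
/// `Chipset` is `Copy`, so it is taken by value.
pub fn is_at_least_ga102(chipset: Chipset) -> bool {
    chipset >= Chipset::GA102
}
```

Because the enum is `Copy`, a caller can keep using its own `Chipset` value
after passing it to such a helper.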

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 17c9660da45034762edaa78e372d8821144cdeb7..4de67a2dc16302c00530026156d7264cbc7e5b32 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -13,7 +13,7 @@ macro_rules! define_chipset {
     ({ $($variant:ident = $value:expr),* $(,)* }) =>
     {
         /// Enum representation of the GPU chipset.
-        #[derive(fmt::Debug)]
+        #[derive(fmt::Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
         pub(crate) enum Chipset {
             $($variant = $value),*,
         }

-- 
2.49.0



* [PATCH 04/16] gpu: nova-core: add missing GA100 definition
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (2 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 05/16] gpu: nova-core: take bound device in Gpu::new Alexandre Courbot
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

linux-firmware contains a directory for GA100, and it is a defined
chipset in Nouveau.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 4de67a2dc16302c00530026156d7264cbc7e5b32..9fe6aedaa9563799c2624d461d4e37ee9b094909 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -54,6 +54,7 @@ fn try_from(value: u32) -> Result<Self, Self::Error> {
     TU117 = 0x167,
     TU116 = 0x168,
     // Ampere
+    GA100 = 0x170,
     GA102 = 0x172,
     GA103 = 0x173,
     GA104 = 0x174,
@@ -73,7 +74,7 @@ pub(crate) fn arch(&self) -> Architecture {
             Self::TU102 | Self::TU104 | Self::TU106 | Self::TU117 | Self::TU116 => {
                 Architecture::Turing
             }
-            Self::GA102 | Self::GA103 | Self::GA104 | Self::GA106 | Self::GA107 => {
+            Self::GA100 | Self::GA102 | Self::GA103 | Self::GA104 | Self::GA106 | Self::GA107 => {
                 Architecture::Ampere
             }
             Self::AD102 | Self::AD103 | Self::AD104 | Self::AD106 | Self::AD107 => {

-- 
2.49.0



* [PATCH 05/16] gpu: nova-core: take bound device in Gpu::new
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (3 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 04/16] gpu: nova-core: add missing GA100 definition Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 06/16] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will need to perform things like allocating DMA memory during device
creation, so make sure to take the device context that will allow us to
perform these actions.
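
The idea of encoding the device context in the type can be sketched in plain
Rust as below. This is only a simplified model under stated assumptions:
`Device`, `Normal`, `Bound`, `with_bound`, `alloc_coherent` and the `Gpu`
fields here are all hypothetical and do not reproduce the kernel crate's
API. The point is that an operation available only in the bound context
cannot be called on a device reference that lacks that guarantee.

```rust
use std::marker::PhantomData;

/// Marker: no guarantee that the device is bound to a driver.
pub struct Normal;
/// Marker: the device is bound to a driver for the duration of the borrow.
pub struct Bound;

/// Hypothetical device type, parameterized by its context.
pub struct Device<Ctx = Normal> {
    id: u32,
    _ctx: PhantomData<Ctx>,
}

impl<Ctx> Device<Ctx> {
    /// Available in every context.
    pub fn id(&self) -> u32 {
        self.id
    }
}

impl Device<Bound> {
    /// Stand-in for a DMA allocation; only callable on a bound device.
    pub fn alloc_coherent(&self, len: usize) -> Vec<u8> {
        vec![0u8; len]
    }
}

impl Device<Normal> {
    pub fn new(id: u32) -> Self {
        Device { id, _ctx: PhantomData }
    }

    /// Models the bus handing a bound device reference to a probe callback.
    pub fn with_bound<R>(&self, f: impl FnOnce(&Device<Bound>) -> R) -> R {
        let bound = Device::<Bound> { id: self.id, _ctx: PhantomData };
        f(&bound)
    }
}

/// Hypothetical GPU object that needs DMA memory at construction time.
pub struct Gpu {
    scratch: Vec<u8>,
}

impl Gpu {
    /// Requiring `&Device<Bound>` makes the allocation below type-check.
    pub fn new(dev: &Device<Bound>) -> Gpu {
        Gpu { scratch: dev.alloc_coherent(4096) }
    }

    pub fn scratch_len(&self) -> usize {
        self.scratch.len()
    }
}
```

In this model, calling `alloc_coherent()` on a `Device<Normal>` is a compile
error, which is the kind of guarantee the signature change is after.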

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 9fe6aedaa9563799c2624d461d4e37ee9b094909..19a17cdc204b013482c0d307c5838cf3044c8cc8 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -183,7 +183,10 @@ pub(crate) struct Gpu {
 }
 
 impl Gpu {
-    pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<Self>> {
+    pub(crate) fn new(
+        pdev: &pci::Device<device::Bound>,
+        bar: Devres<Bar0>,
+    ) -> Result<impl PinInit<Self>> {
         let spec = Spec::new(&bar)?;
         let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;
 

-- 
2.49.0



* [PATCH 06/16] gpu: nova-core: define registers layout using helper macro
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (4 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 05/16] gpu: nova-core: take bound device in Gpu::new Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22 10:29   ` Danilo Krummrich
  2025-04-20 12:19 ` [PATCH 07/16] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add the register!() macro, which defines a given register's layout and
provides bit-field accessors with a way to convert them to a given type.
This macro will allow us to make clear definitions of the registers and
manipulate their fields safely.
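
To make the mask and shift handling concrete, here is a small standalone
sketch in plain Rust. `field_mask` uses the same expression as the MASK
constant in the generated accessors; `get_field` and `set_field` are
hypothetical free functions that only approximate what the generated getter
and setter methods do on the raw `u32`.

```rust
/// Mask covering bits `hi..=lo` of a 32-bit register.
/// Shifting by one and adding one, instead of computing `(1 << (hi + 1)) - 1`,
/// avoids an out-of-range shift when `hi` is 31.
pub const fn field_mask(hi: u32, lo: u32) -> u32 {
    ((((1u32 << hi) - 1) << 1) + 1) - ((1u32 << lo) - 1)
}

/// Extract the field `hi..=lo` from `raw`, shifted down to bit 0.
pub const fn get_field(raw: u32, hi: u32, lo: u32) -> u32 {
    let mask = field_mask(hi, lo);
    // The shift is the position of the lowest set bit of the mask.
    (raw & mask) >> mask.trailing_zeros()
}

/// Return `raw` with the field `hi..=lo` replaced by `value`.
/// Bits of `value` that do not fit in the field are discarded.
pub const fn set_field(raw: u32, hi: u32, lo: u32, value: u32) -> u32 {
    let mask = field_mask(hi, lo);
    (raw & !mask) | ((value << mask.trailing_zeros()) & mask)
}
```

For the 28:20 chipset field this yields the mask 0x1ff00000, the same value
as the BOOT0_CHIPSET_MASK constant that this patch removes from regs.rs.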

The long-term goal is to eventually move it to the kernel crate so it
can be used by other drivers as well, but it was agreed to first land it
in nova-core and let it mature there.

To illustrate its usage, use it to define the layout for the Boot0
register and use its accessors through the use of the convenience
with_bar!() macro, which uses Revocable::try_access() and converts its
returned Option into the proper error as needed.
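
The two forms of with_bar!() can be modeled outside the kernel as follows.
This is a sketch under stated assumptions: `Guarded`, the local `Error`
enum and the `with_res!` macro are hypothetical stand-ins for the revocable
bar, the kernel error type and with_bar!() respectively. The `?` arm is
listed first here purely to keep the matching order obvious; the expansion
is otherwise the same idea as in the patch.

```rust
/// Local stand-in for the kernel error codes used in this sketch.
#[derive(Debug, PartialEq)]
pub enum Error {
    ENXIO,
    EINVAL,
}

/// Hypothetical resource that may have been revoked.
pub struct Guarded<T> {
    inner: Option<T>,
}

impl<T> Guarded<T> {
    pub fn live(value: T) -> Self {
        Guarded { inner: Some(value) }
    }

    pub fn revoked() -> Self {
        Guarded { inner: None }
    }

    /// Run `f` on the resource if it is still available.
    pub fn try_access_with<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        self.inner.as_ref().map(f)
    }
}

/// Model of `with_bar!`: maps a revoked resource to `ENXIO`.
/// With a leading `?`, a `Result` returned by the closure is flattened so
/// the caller does not get a `Result<Result<_>>`.
macro_rules! with_res {
    (? $res:expr, $closure:expr) => {
        with_res!($res, $closure).and_then(|r| r)
    };
    ($res:expr, $closure:expr) => {
        $res.try_access_with($closure).ok_or(Error::ENXIO)
    };
}
```

A closure that cannot fail uses the plain form, while a closure that itself
returns a `Result` uses the `?` form so that a single `?` at the call site
handles both the revocation error and the closure's own error.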

Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 Documentation/gpu/nova/core/todo.rst |   6 +
 drivers/gpu/nova-core/gpu.rs         |   5 +-
 drivers/gpu/nova-core/nova_core.rs   |  18 +++
 drivers/gpu/nova-core/regs.rs        |  60 ++-----
 drivers/gpu/nova-core/regs/macros.rs | 297 +++++++++++++++++++++++++++++++++++
 5 files changed, 333 insertions(+), 53 deletions(-)

diff --git a/Documentation/gpu/nova/core/todo.rst b/Documentation/gpu/nova/core/todo.rst
index 234d753d3eacc709b928b1ccbfc9750ef36ec4ed..8a459fc088121f770bfcda5dfb4ef51c712793ce 100644
--- a/Documentation/gpu/nova/core/todo.rst
+++ b/Documentation/gpu/nova/core/todo.rst
@@ -102,7 +102,13 @@ Usage:
 	let boot0 = Boot0::read(&bar);
 	pr_info!("Revision: {}\n", boot0.revision());
 
+Note: a work-in-progress implementation currently resides in
+`drivers/gpu/nova-core/regs/macros.rs` and is used in nova-core. It would be
+nice to improve it (possibly using proc macros) and move it to the `kernel`
+crate so it can be used by other components as well.
+
 | Complexity: Advanced
+| Contact: Alexandre Courbot
 
 Delay / Sleep abstractions
 --------------------------
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 19a17cdc204b013482c0d307c5838cf3044c8cc8..891b59fe7255b3951962e30819145e686253706a 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -135,11 +135,10 @@ pub(crate) struct Spec {
 
 impl Spec {
     fn new(bar: &Devres<Bar0>) -> Result<Spec> {
-        let bar = bar.try_access().ok_or(ENXIO)?;
-        let boot0 = regs::Boot0::read(&bar);
+        let boot0 = with_bar!(bar, |b| regs::Boot0::read(b))?;
 
         Ok(Self {
-            chipset: boot0.chipset().try_into()?,
+            chipset: boot0.chipset()?,
             revision: Revision::from_boot0(boot0),
         })
     }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index a91cd924054b49966937a8db6aab9cd0614f10de..0eecd612e34efc046dad852e6239de6ffa5fdd62 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -2,6 +2,24 @@
 
 //! Nova Core GPU Driver
 
+#[macro_use]
+mod macros {
+    /// Convenience macro to run a closure while holding [`crate::driver::Bar0`].
+    ///
+    /// If the bar cannot be acquired, then `ENXIO` is returned.
+    ///
+    /// If a `?` is present before the `bar` argument, then the `Result` returned by the closure is
+    /// merged into the `Result` of the macro itself to avoid having a `Result<Result<>>`.
+    macro_rules! with_bar {
+        ($bar:expr, $closure:expr) => {
+            $bar.try_access_with($closure).ok_or(ENXIO)
+        };
+        (? $bar:expr, $closure:expr) => {
+            with_bar!($bar, $closure).and_then(|r| r)
+        };
+    }
+}
+
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index b1a25b86ef17a6710e6236d5e7f1f26cd4407ce3..e315a3011660df7f18c0a3e0582b5845545b36e2 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -1,55 +1,15 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use crate::driver::Bar0;
+use core::ops::Deref;
+use kernel::io::Io;
 
-// TODO
-//
-// Create register definitions via generic macros. See task "Generic register
-// abstraction" in Documentation/gpu/nova/core/todo.rst.
+#[macro_use]
+mod macros;
 
-const BOOT0_OFFSET: usize = 0x00000000;
+use crate::gpu::Chipset;
 
-// 3:0 - chipset minor revision
-const BOOT0_MINOR_REV_SHIFT: u8 = 0;
-const BOOT0_MINOR_REV_MASK: u32 = 0x0000000f;
-
-// 7:4 - chipset major revision
-const BOOT0_MAJOR_REV_SHIFT: u8 = 4;
-const BOOT0_MAJOR_REV_MASK: u32 = 0x000000f0;
-
-// 23:20 - chipset implementation Identifier (depends on architecture)
-const BOOT0_IMPL_SHIFT: u8 = 20;
-const BOOT0_IMPL_MASK: u32 = 0x00f00000;
-
-// 28:24 - chipset architecture identifier
-const BOOT0_ARCH_MASK: u32 = 0x1f000000;
-
-// 28:20 - chipset identifier (virtual register field combining BOOT0_IMPL and
-//         BOOT0_ARCH)
-const BOOT0_CHIPSET_SHIFT: u8 = BOOT0_IMPL_SHIFT;
-const BOOT0_CHIPSET_MASK: u32 = BOOT0_IMPL_MASK | BOOT0_ARCH_MASK;
-
-#[derive(Copy, Clone)]
-pub(crate) struct Boot0(u32);
-
-impl Boot0 {
-    #[inline]
-    pub(crate) fn read(bar: &Bar0) -> Self {
-        Self(bar.read32(BOOT0_OFFSET))
-    }
-
-    #[inline]
-    pub(crate) fn chipset(&self) -> u32 {
-        (self.0 & BOOT0_CHIPSET_MASK) >> BOOT0_CHIPSET_SHIFT
-    }
-
-    #[inline]
-    pub(crate) fn minor_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MINOR_REV_MASK) >> BOOT0_MINOR_REV_SHIFT) as u8
-    }
-
-    #[inline]
-    pub(crate) fn major_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
-    }
-}
+register!(Boot0@0x00000000, "Basic revision information about the GPU";
+    3:0     minor_rev => as u8, "minor revision of the chip";
+    7:4     major_rev => as u8, "major revision of the chip";
+    28:20   chipset => try_into Chipset, "chipset model"
+);
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
new file mode 100644
index 0000000000000000000000000000000000000000..fa9bd6b932048113de997658b112885666e694c9
--- /dev/null
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Types and macros to define register layout and accessors.
+//!
+//! A single register typically includes several fields, which are accessed through a combination
+//! of bit-shift and mask operations that introduce a class of potential mistakes, notably because
+//! not all possible field values are necessarily valid.
+//!
+//! The macros in this module make it possible to define, using an intuitive and readable syntax, a
+//! dedicated type for each register with its own field accessors that can return an error if a
+//! field's value is invalid. They also provide a builder type that allows constructing a register
+//! value to be written by combining valid values for its fields.
+
+/// Helper macro for the `register` macro.
+///
+/// Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
+/// and conversion to regular `u32`).
+macro_rules! __reg_def_common {
+    ($name:ident $(, $type_comment:expr)?) => {
+        $(
+        #[doc=$type_comment]
+        )?
+        #[repr(transparent)]
+        #[derive(Clone, Copy, Default)]
+        pub(crate) struct $name(u32);
+
+        // TODO: should we display the raw hex value, then the value of all its fields?
+        impl ::core::fmt::Debug for $name {
+            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+                f.debug_tuple(stringify!($name))
+                    .field(&format_args!("0x{0:x}", &self.0))
+                    .finish()
+            }
+        }
+
+        impl core::ops::BitOr for $name {
+            type Output = Self;
+
+            fn bitor(self, rhs: Self) -> Self::Output {
+                Self(self.0 | rhs.0)
+            }
+        }
+
+        impl From<$name> for u32 {
+            fn from(reg: $name) -> u32 {
+                reg.0
+            }
+        }
+    };
+}
+
+/// Helper macro for the `register` macro.
+///
+/// Defines the getter method for $field.
+macro_rules! __reg_def_field_getter {
+    (
+        $hi:tt:$lo:tt $field:ident
+            $(=> as $as_type:ty)?
+            $(=> as_bit $bit_type:ty)?
+            $(=> into $type:ty)?
+            $(=> try_into $try_type:ty)?
+        $(, $comment:expr)?
+    ) => {
+        $(
+        #[doc=concat!("Returns the ", $comment)]
+        )?
+        #[inline]
+        pub(crate) fn $field(self) -> $( $as_type )? $( $bit_type )? $( $type )? $( core::result::Result<$try_type, <$try_type as TryFrom<u32>>::Error> )? {
+            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+            const SHIFT: u32 = MASK.trailing_zeros();
+            let field = (self.0 & MASK) >> SHIFT;
+
+            $( field as $as_type )?
+            $(
+            // TODO: it would be nice to throw a compile-time error if $hi != $lo as this means we
+            // are considering more than one bit but returning a bool...
+            <$bit_type>::from(field != 0)
+            )?
+            $( <$type>::from(field) )?
+            $( <$try_type>::try_from(field) )?
+        }
+    }
+}
+
+/// Helper macro for the `register` macro.
+///
+/// Defines all the field getter methods for `$name`.
+macro_rules! __reg_def_getters {
+    (
+        $name:ident
+        $(; $hi:tt:$lo:tt $field:ident
+            $(=> as $as_type:ty)?
+            $(=> as_bit $bit_type:ty)?
+            $(=> into $type:ty)?
+            $(=> try_into $try_type:ty)?
+        $(, $field_comment:expr)?)* $(;)?
+    ) => {
+        #[allow(dead_code)]
+        impl $name {
+            $(
+            __reg_def_field_getter!($hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)?);
+            )*
+        }
+    };
+}
+
+/// Helper macro for the `register` macro.
+///
+/// Defines the setter method for $field.
+macro_rules! __reg_def_field_setter {
+    (
+        $hi:tt:$lo:tt $field:ident
+            $(=> as $as_type:ty)?
+            $(=> as_bit $bit_type:ty)?
+            $(=> into $type:ty)?
+            $(=> try_into $try_type:ty)?
+        $(, $comment:expr)?
+    ) => {
+        kernel::macros::paste! {
+        $(
+        #[doc=concat!("Sets the ", $comment)]
+        )?
+        #[inline]
+        pub(crate) fn [<set_ $field>](mut self, value: $( $as_type)? $( $bit_type )? $( $type )? $( $try_type)? ) -> Self {
+            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+            const SHIFT: u32 = MASK.trailing_zeros();
+
+            let value = ((value as u32) << SHIFT) & MASK;
+            self.0 = (self.0 & !MASK) | value;
+            self
+        }
+        }
+    };
+}
+
+/// Helper macro for the `register` macro.
+///
+/// Defines all the field setter methods for `$name`.
+macro_rules! __reg_def_setters {
+    (
+        $name:ident
+        $(; $hi:tt:$lo:tt $field:ident
+            $(=> as $as_type:ty)?
+            $(=> as_bit $bit_type:ty)?
+            $(=> into $type:ty)?
+            $(=> try_into $try_type:ty)?
+        $(, $field_comment:expr)?)* $(;)?
+    ) => {
+        #[allow(dead_code)]
+        impl $name {
+            $(
+            __reg_def_field_setter!($hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)?);
+            )*
+        }
+    };
+}
+
+/// Defines a dedicated type for a register with an absolute offset, along with getter and
+/// setter methods for its fields and methods to read and write it from an `Io` region.
+///
+/// Example:
+///
+/// ```no_run
+/// register!(Boot0@0x00000100, "Basic revision information about the chip";
+///     3:0     minor_rev => as u8, "minor revision of the chip";
+///     7:4     major_rev => as u8, "major revision of the chip";
+///     28:20   chipset => try_into Chipset, "chipset model"
+/// );
+/// ```
+///
+/// This defines a `Boot0` type which can be read or written from offset `0x100` of an `Io` region.
+/// It is composed of 3 fields, for instance `minor_rev` is made of the 4 least significant bits of
+/// the register. Each field can be accessed and modified using helper methods:
+///
+/// ```no_run
+/// // Read from offset 0x100.
+/// let boot0 = Boot0::read(&bar);
+/// pr_info!("chip revision: {}.{}\n", boot0.major_rev(), boot0.minor_rev());
+///
+/// // `Chipset::try_from` will be called with the value of the field and returns an error if the
+/// // value is invalid.
+/// let chipset = boot0.chipset()?;
+///
+/// // Update some fields and write the value back.
+/// boot0.set_major_rev(3).set_minor_rev(10).write(&bar);
+///
+/// // Or just update the register value in a single step:
+/// Boot0::alter(&bar, |r| r.set_major_rev(3).set_minor_rev(10));
+/// ```
+///
+/// Fields are made accessible using one of the following strategies:
+///
+/// - `as <type>` simply casts the field value to the requested type.
+/// - `as_bit <type>` turns the field into a boolean and calls `<type>::from()` with the obtained
+///   value. To be used with single-bit fields.
+/// - `into <type>` calls `<type>::from()` on the value of the field. It is expected to handle all
+///   the possible values for the bit range selected.
+/// - `try_into <type>` calls `<type>::try_from()` on the value of the field and returns its
+///   result.
+///
+/// The documentation strings are optional. If present, they will be added to the type or the field
+/// getter and setter methods they are attached to.
+///
+/// Putting a `+` before the address of the register makes it relative to a base: the `read` and
+/// `write` methods take a `base` argument that is added to the specified address before access,
+/// and `try_read` and `try_write` methods are added to allow access with offsets unknown at
+/// compile-time.
+///
+macro_rules! register {
+    // Create a register at a fixed offset of the MMIO space.
+    (
+        $name:ident@$offset:expr $(, $type_comment:expr)?
+        $(; $hi:tt:$lo:tt $field:ident
+            $(=> as $as_type:ty)?
+            $(=> as_bit $bit_type:ty)?
+            $(=> into $type:ty)?
+            $(=> try_into $try_type:ty)?
+        $(, $field_comment:expr)?)* $(;)?
+    ) => {
+        __reg_def_common!($name);
+
+        #[allow(dead_code)]
+        impl $name {
+            #[inline]
+            pub(crate) fn read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T) -> Self {
+                Self(bar.read32($offset))
+            }
+
+            #[inline]
+            pub(crate) fn write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T) {
+                bar.write32(self.0, $offset)
+            }
+
+            #[inline]
+            pub(crate) fn alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, f: F) {
+                let reg = f(Self::read(bar));
+                reg.write(bar);
+            }
+        }
+
+        __reg_def_getters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
+
+        __reg_def_setters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
+    };
+
+    // Create a register at a relative offset from a base address.
+    (
+        $name:ident@+$offset:expr $(, $type_comment:expr)?
+        $(; $hi:tt:$lo:tt $field:ident
+            $(=> as $as_type:ty)?
+            $(=> as_bit $bit_type:ty)?
+            $(=> into $type:ty)?
+            $(=> try_into $try_type:ty)?
+        $(, $field_comment:expr)?)* $(;)?
+    ) => {
+        __reg_def_common!($name);
+
+        #[allow(dead_code)]
+        impl $name {
+            #[inline]
+            pub(crate) fn read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T, base: usize) -> Self {
+                Self(bar.read32(base + $offset))
+            }
+
+            #[inline]
+            pub(crate) fn write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T, base: usize) {
+                bar.write32(self.0, base + $offset)
+            }
+
+            #[inline]
+            pub(crate) fn alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, base: usize, f: F) {
+                let reg = f(Self::read(bar, base));
+                reg.write(bar, base);
+            }
+
+            #[inline]
+            pub(crate) fn try_read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T, base: usize) -> ::kernel::error::Result<Self> {
+                bar.try_read32(base + $offset).map(Self)
+            }
+
+            #[inline]
+            pub(crate) fn try_write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T, base: usize) -> ::kernel::error::Result<()> {
+                bar.try_write32(self.0, base + $offset)
+            }
+
+            #[inline]
+            pub(crate) fn try_alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, base: usize, f: F) -> ::kernel::error::Result<()> {
+                let reg = f(Self::try_read(bar, base)?);
+                reg.try_write(bar, base)
+            }
+        }
+
+        __reg_def_getters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
+
+        __reg_def_setters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
+    };
+}

-- 
2.49.0
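
For readers unfamiliar with the pattern, here is a rough, hand-written
approximation of what a register type with three bit fields could look like
once expanded. It is only a sketch: the `Mmio` type, the `field_mask` helper
and the `get`/`set` internals are invented for illustration and are not the
code generated by the `register!` macro. The field layout (3:0, 7:4, 28:20)
and the 0x100 offset follow the `Boot0` example in the documentation above.

```rust
/// Fake 32-bit MMIO space backed by an array (hypothetical stand-in for a BAR).
pub struct Mmio {
    regs: [u32; 128],
}

impl Mmio {
    pub fn new() -> Self {
        Self { regs: [0; 128] }
    }

    /// Reads the 32-bit word at byte `offset`.
    pub fn read32(&self, offset: usize) -> u32 {
        self.regs[offset / 4]
    }

    /// Writes `value` to the 32-bit word at byte `offset`.
    pub fn write32(&mut self, value: u32, offset: usize) {
        self.regs[offset / 4] = value;
    }
}

/// Returns a mask with bits `lo..=hi` set (assumes `lo <= hi <= 31`).
pub const fn field_mask(hi: u32, lo: u32) -> u32 {
    (u32::MAX >> (31 - (hi - lo))) << lo
}

/// Register value wrapper; fields are accessed through getters and setters.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Boot0(u32);

impl Boot0 {
    const OFFSET: usize = 0x100;
    const MINOR_REV: (u32, u32) = (3, 0);
    const MAJOR_REV: (u32, u32) = (7, 4);
    const CHIPSET: (u32, u32) = (28, 20);

    fn get(self, (hi, lo): (u32, u32)) -> u32 {
        (self.0 & field_mask(hi, lo)) >> lo
    }

    fn set(self, (hi, lo): (u32, u32), value: u32) -> Self {
        let mask = field_mask(hi, lo);
        // Clear the field, then insert the new value, dropping excess bits.
        Self((self.0 & !mask) | ((value << lo) & mask))
    }

    pub fn read(bar: &Mmio) -> Self {
        Self(bar.read32(Self::OFFSET))
    }

    pub fn write(self, bar: &mut Mmio) {
        bar.write32(self.0, Self::OFFSET)
    }

    /// Read-modify-write in a single call.
    pub fn alter(bar: &mut Mmio, f: impl FnOnce(Self) -> Self) {
        let reg = f(Self::read(bar));
        reg.write(bar);
    }

    pub fn minor_rev(self) -> u8 {
        self.get(Self::MINOR_REV) as u8
    }

    pub fn major_rev(self) -> u8 {
        self.get(Self::MAJOR_REV) as u8
    }

    /// Raw chipset field; the real macro would convert it with `try_into`.
    pub fn chipset(self) -> u32 {
        self.get(Self::CHIPSET)
    }

    pub fn set_minor_rev(self, value: u8) -> Self {
        self.set(Self::MINOR_REV, u32::from(value))
    }

    pub fn set_major_rev(self, value: u8) -> Self {
        self.set(Self::MAJOR_REV, u32::from(value))
    }
}
```

Setters take and return the register by value, which is what allows them to
be chained, as in `Boot0::alter(&mut bar, |r| r.set_major_rev(3).set_minor_rev(10))`.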



* [PATCH 07/16] gpu: nova-core: move Firmware to firmware module
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (5 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 06/16] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will extend the firmware methods, so move `Firmware` to its own
module to keep gpu.rs focused.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 42 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/gpu.rs      | 35 +++-----------------------------
 2 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 6e6361c59ca1ae9a52185e66e850ba1db93eb8ce..9bad7a86382af7917b3dce7bf3087d0002bd5971 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -1,7 +1,47 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use crate::gpu;
+//! Contains structures and functions dedicated to the parsing, building and patching of firmware
+//! to be loaded into a given execution unit.
+
+use kernel::device;
 use kernel::firmware;
+use kernel::prelude::*;
+use kernel::str::CString;
+
+use crate::gpu;
+use crate::gpu::Chipset;
+
+/// Structure encapsulating the firmware blobs required for the GPU to operate.
+#[expect(dead_code)]
+pub(crate) struct Firmware {
+    pub booter_load: firmware::Firmware,
+    pub booter_unload: firmware::Firmware,
+    pub bootloader: firmware::Firmware,
+    pub gsp: firmware::Firmware,
+}
+
+impl Firmware {
+    pub(crate) fn new(
+        dev: &device::Device<device::Bound>,
+        chipset: Chipset,
+        ver: &str,
+    ) -> Result<Firmware> {
+        let mut chip_name = CString::try_from_fmt(fmt!("{}", chipset))?;
+        chip_name.make_ascii_lowercase();
+
+        let request = |name_| {
+            CString::try_from_fmt(fmt!("nvidia/{}/gsp/{}-{}.bin", &*chip_name, name_, ver))
+                .and_then(|path| firmware::Firmware::request(&path, dev))
+        };
+
+        Ok(Firmware {
+            booter_load: request("booter_load")?,
+            booter_unload: request("booter_unload")?,
+            bootloader: request("bootloader")?,
+            gsp: request("gsp")?,
+        })
+    }
+}
 
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 891b59fe7255b3951962e30819145e686253706a..866c5992b9eb27735975bb4948e522bc01fadaa2 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,10 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use kernel::{
-    device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::CString,
-};
+use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
 use crate::driver::Bar0;
+use crate::firmware::Firmware;
 use crate::regs;
 use crate::util;
 use core::fmt;
@@ -144,34 +143,6 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
     }
 }
 
-/// Structure encapsulating the firmware blobs required for the GPU to operate.
-#[expect(dead_code)]
-pub(crate) struct Firmware {
-    booter_load: firmware::Firmware,
-    booter_unload: firmware::Firmware,
-    bootloader: firmware::Firmware,
-    gsp: firmware::Firmware,
-}
-
-impl Firmware {
-    fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
-        let mut chip_name = CString::try_from_fmt(fmt!("{}", spec.chipset))?;
-        chip_name.make_ascii_lowercase();
-
-        let request = |name_| {
-            CString::try_from_fmt(fmt!("nvidia/{}/gsp/{}-{}.bin", &*chip_name, name_, ver))
-                .and_then(|path| firmware::Firmware::request(&path, dev))
-        };
-
-        Ok(Firmware {
-            booter_load: request("booter_load")?,
-            booter_unload: request("booter_unload")?,
-            bootloader: request("bootloader")?,
-            gsp: request("gsp")?,
-        })
-    }
-}
-
 /// Structure holding the resources required to operate the GPU.
 #[pin_data]
 pub(crate) struct Gpu {
@@ -187,7 +158,7 @@ pub(crate) fn new(
         bar: Devres<Bar0>,
     ) -> Result<impl PinInit<Self>> {
         let spec = Spec::new(&bar)?;
-        let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;
+        let fw = Firmware::new(pdev.as_ref(), spec.chipset, "535.113.01")?;
 
         dev_info!(
             pdev.as_ref(),

-- 
2.49.0



* [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (6 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 07/16] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-21 21:45   ` Joel Fernandes
  2025-04-22 11:36   ` Danilo Krummrich
  2025-04-20 12:19 ` [PATCH 09/16] gpu: nova-core: register sysmem flush page Alexandre Courbot
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Upon reset, the GPU executes the GFW_BOOT firmware in order to
initialize its base parameters such as clocks. The driver must ensure
that this step is completed before using the hardware.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/driver.rs    |  2 +-
 drivers/gpu/nova-core/gpu.rs       |  5 +++++
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
new file mode 100644
index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
--- /dev/null
+++ b/drivers/gpu/nova-core/devinit.rs
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Methods for device initialization.
+
+use kernel::bindings;
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// Wait for devinit FW completion.
+///
+/// Upon reset, the GPU runs some firmware code to set up its core parameters. Most of the GPU is
+/// considered unusable until this step is completed, so it must be waited on very early during
+/// driver initialization.
+pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
+    let mut timeout = 2000;
+
+    loop {
+        let gfw_booted = with_bar!(
+            bar,
+            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
+                .read_protection_level0_enabled()
+                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
+        )?;
+
+        if gfw_booted {
+            return Ok(());
+        }
+
+        if timeout == 0 {
+            return Err(ETIMEDOUT);
+        }
+        timeout -= 1;
+
+        // SAFETY: msleep should be safe to call with any parameter.
+        unsafe { bindings::msleep(2) };
+    }
+}
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x1000000;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -2,6 +2,7 @@
 
 use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
+use crate::devinit;
 use crate::driver::Bar0;
 use crate::firmware::Firmware;
 use crate::regs;
@@ -168,6 +169,10 @@ pub(crate) fn new(
             spec.revision
         );
 
+        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
+        devinit::wait_gfw_boot_completion(&bar)
+            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
+
         Ok(pin_init!(Self { spec, bar, fw }))
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -20,6 +20,7 @@ macro_rules! with_bar {
     }
 }
 
+mod devinit;
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -13,3 +13,14 @@
     7:4     major_rev => as u8, "major revision of the chip";
     28:20   chipset => try_into Chipset, "chipset model"
 );
+
+/* GC6 */
+
+register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
+    0:0     read_protection_level0_enabled => as_bit bool
+);
+
+/* TODO: This is an array of registers. */
+register!(Pgc6AonSecureScratchGroup05@0x00118234;
+    31:0    value => as u32
+);

-- 
2.49.0
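
The wait loop in `wait_gfw_boot_completion` is a bounded poll: check the
condition, give up once the retry budget is exhausted, otherwise sleep and
try again. The following userspace sketch shows the same control flow in
isolation. The names `poll_until`, `TimedOut` and `gfw_booted` are invented
for this illustration, and `std::thread::sleep` stands in for `msleep`; the
predicate mirrors the two register checks from the patch (bit 0 of the
privilege level mask set, low byte of the scratch register equal to 0xff).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Error returned when the condition did not become true in time.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Polls `ready` until it returns `true`, retrying at most `max_retries`
/// times and sleeping `interval` between attempts.
///
/// The condition is evaluated `max_retries + 1` times in the worst case.
pub fn poll_until<F: FnMut() -> bool>(
    mut ready: F,
    max_retries: u32,
    interval: Duration,
) -> Result<(), TimedOut> {
    let mut remaining = max_retries;

    loop {
        if ready() {
            return Ok(());
        }

        if remaining == 0 {
            return Err(TimedOut);
        }
        remaining -= 1;

        sleep(interval);
    }
}

/// Boot-complete predicate over raw register values (hypothetical helper).
pub fn gfw_booted(priv_level_mask: u32, scratch: u32) -> bool {
    (priv_level_mask & 0x1) != 0 && (scratch & 0xff) == 0xff
}
```

With the values used in the patch (2000 retries, 2 ms sleep), the nominal
wait is about 4 seconds before `ETIMEDOUT` is returned, not counting the
time spent reading registers or any oversleeping.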



* [PATCH 09/16] gpu: nova-core: register sysmem flush page
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (7 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22 11:45   ` Danilo Krummrich
  2025-04-22 18:50   ` Joel Fernandes
  2025-04-20 12:19 ` [PATCH 10/16] gpu: nova-core: add basic timer device Alexandre Courbot
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

A page of system memory is reserved so sysmembar can perform a read on
it if a system write occurred since the last flush. Do this early as it
can be required to e.g. reset the GPU falcons.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/dma.rs       | 54 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs       | 53 +++++++++++++++++++++++++++++++++++--
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 10 +++++++
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
new file mode 100644
index 0000000000000000000000000000000000000000..a4162bff597132a04e002b2b910a4537bbabc287
--- /dev/null
+++ b/drivers/gpu/nova-core/dma.rs
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Simple DMA object wrapper.
+
+// To be removed when all code is used.
+#![allow(dead_code)]
+
+use kernel::device;
+use kernel::dma::CoherentAllocation;
+use kernel::page::PAGE_SIZE;
+use kernel::prelude::*;
+
+pub(crate) struct DmaObject {
+    pub dma: CoherentAllocation<u8>,
+    pub len: usize,
+    #[allow(dead_code)]
+    pub name: &'static str,
+}
+
+impl DmaObject {
+    pub(crate) fn new(
+        dev: &device::Device<device::Bound>,
+        len: usize,
+        name: &'static str,
+    ) -> Result<Self> {
+        let len = core::alloc::Layout::from_size_align(len, PAGE_SIZE)
+            .map_err(|_| EINVAL)?
+            .pad_to_align()
+            .size();
+        let dma = CoherentAllocation::alloc_coherent(dev, len, GFP_KERNEL | __GFP_ZERO)?;
+
+        Ok(Self { dma, len, name })
+    }
+
+    pub(crate) fn from_data(
+        dev: &device::Device<device::Bound>,
+        data: &[u8],
+        name: &'static str,
+    ) -> Result<Self> {
+        Self::new(dev, data.len(), name).and_then(|mut dma_obj| {
+            // SAFETY:
+            // - The copied data fits within the size of the allocated object.
+            // - We have just created this object and there is no other user at this stage.
+            unsafe {
+                core::ptr::copy_nonoverlapping(
+                    data.as_ptr(),
+                    dma_obj.dma.start_ptr_mut(),
+                    data.len(),
+                );
+            }
+            Ok(dma_obj)
+        })
+    }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 1f7799692a0ab042f2540e01414f5ca347ae9ecc..d43e710cc983d51f053dacbd77cbbfb79fa882c3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -3,6 +3,7 @@
 use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
 use crate::devinit;
+use crate::dma::DmaObject;
 use crate::driver::Bar0;
 use crate::firmware::Firmware;
 use crate::regs;
@@ -145,12 +146,30 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
 }
 
 /// Structure holding the resources required to operate the GPU.
-#[pin_data]
+#[pin_data(PinnedDrop)]
 pub(crate) struct Gpu {
     spec: Spec,
     /// MMIO mapping of PCI BAR 0
     bar: Devres<Bar0>,
     fw: Firmware,
+    sysmem_flush: DmaObject,
+}
+
+#[pinned_drop]
+impl PinnedDrop for Gpu {
+    fn drop(self: Pin<&mut Self>) {
+        // Unregister the sysmem flush page before we release it.
+        let _ = with_bar!(&self.bar, |b| {
+            regs::PfbNisoFlushSysmemAddr::default()
+                .set_adr_39_08(0)
+                .write(b);
+            if self.spec.chipset >= Chipset::GA102 {
+                regs::PfbNisoFlushSysmemAddrHi::default()
+                    .set_adr_63_40(0)
+                    .write(b);
+            }
+        });
+    }
 }
 
 impl Gpu {
@@ -173,6 +192,36 @@ pub(crate) fn new(
         devinit::wait_gfw_boot_completion(&bar)
             .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
 
-        Ok(pin_init!(Self { spec, bar, fw }))
+        // System memory page required for sysmembar to properly flush into system memory.
+        let sysmem_flush = {
+            let page = DmaObject::new(
+                pdev.as_ref(),
+                kernel::bindings::PAGE_SIZE,
+                "sysmem flush page",
+            )?;
+
+            // Register the sysmem flush page.
+            with_bar!(bar, |b| {
+                let handle = page.dma.dma_handle();
+
+                regs::PfbNisoFlushSysmemAddr::default()
+                    .set_adr_39_08((handle >> 8) as u32)
+                    .write(b);
+                if spec.chipset >= Chipset::GA102 {
+                    regs::PfbNisoFlushSysmemAddrHi::default()
+                        .set_adr_63_40((handle >> 40) as u32)
+                        .write(b);
+                }
+            })?;
+
+            page
+        };
+
+        Ok(pin_init!(Self {
+            spec,
+            bar,
+            fw,
+            sysmem_flush,
+        }))
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 878161e060f54da7738c656f6098936a62dcaa93..37c7eb0ea7a926bee4e3c661028847291bf07fa2 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -21,6 +21,7 @@ macro_rules! with_bar {
 }
 
 mod devinit;
+mod dma;
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac..1e24787c4b5f432ac25fe399c8cb38b7350e44ae 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -14,6 +14,16 @@
     28:20   chipset => try_into Chipset, "chipset model"
 );
 
+/* PFB */
+
+register!(PfbNisoFlushSysmemAddr@0x00100c10;
+    31:0    adr_39_08 => as u32
+);
+
+register!(PfbNisoFlushSysmemAddrHi@0x00100c40;
+    23:0    adr_63_40 => as u32
+);
+
 /* GC6 */
 
 register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;

-- 
2.49.0
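
Two small pieces of arithmetic in this patch may be easier to follow in
isolation: rounding the allocation length up to a page multiple, and
splitting the 64-bit DMA handle across the `adr_39_08` and `adr_63_40`
fields. The sketch below reproduces both in plain Rust. The function names
and the 4096-byte `PAGE_SIZE` are assumptions made for this example, not
part of the driver.

```rust
use std::alloc::Layout;

/// Page size assumed for this sketch; the kernel value depends on the platform.
pub const PAGE_SIZE: usize = 4096;

/// Rounds `len` up to a multiple of `PAGE_SIZE`, using the same
/// `Layout::pad_to_align` approach as `DmaObject::new`.
pub fn round_up_to_page(len: usize) -> Option<usize> {
    Layout::from_size_align(len, PAGE_SIZE)
        .ok()
        .map(|layout| layout.pad_to_align().size())
}

/// Value for the low register: address bits 39:8.
pub fn sysmem_addr_lo(handle: u64) -> u32 {
    // The cast keeps the low 32 bits of the shifted value, i.e. bits 39:8.
    (handle >> 8) as u32
}

/// Value for the high register: address bits 63:40 (24 bits).
pub fn sysmem_addr_hi(handle: u64) -> u32 {
    (handle >> 40) as u32
}

/// Rebuilds the address from both register values. Bits 7:0 are not stored,
/// so the original address must be 256-byte aligned to survive the round trip.
pub fn reassemble(lo: u32, hi: u32) -> u64 {
    (u64::from(hi) << 40) | (u64::from(lo) << 8)
}
```

Since the flush page is page-aligned, its low 8 address bits are zero and
nothing is lost by dropping them.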



* [PATCH 10/16] gpu: nova-core: add basic timer device
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (8 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 09/16] gpu: nova-core: register sysmem flush page Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22 12:07   ` Danilo Krummrich
  2025-04-20 12:19 ` [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add a timer that works with GPU time and provides the ability to wait on
a condition with a specific timeout.

The `Duration` Rust type is used to keep track of differences between
timestamps; this will be replaced by the equivalent kernel type once it
lands.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs       |   5 ++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  10 +++
 drivers/gpu/nova-core/timer.rs     | 133 +++++++++++++++++++++++++++++++++++++
 4 files changed, 149 insertions(+)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index d43e710cc983d51f053dacbd77cbbfb79fa882c3..1b3e43e0412e2a2ea178c7404ea647c9e38d4e04 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -7,6 +7,7 @@
 use crate::driver::Bar0;
 use crate::firmware::Firmware;
 use crate::regs;
+use crate::timer::Timer;
 use crate::util;
 use core::fmt;
 
@@ -153,6 +154,7 @@ pub(crate) struct Gpu {
     bar: Devres<Bar0>,
     fw: Firmware,
     sysmem_flush: DmaObject,
+    timer: Timer,
 }
 
 #[pinned_drop]
@@ -217,11 +219,14 @@ pub(crate) fn new(
             page
         };
 
+        let timer = Timer::new();
+
         Ok(pin_init!(Self {
             spec,
             bar,
             fw,
             sysmem_flush,
+            timer,
         }))
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 37c7eb0ea7a926bee4e3c661028847291bf07fa2..df3468c92c6081b3e2db218d92fbe1c40a0a75c3 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -26,6 +26,7 @@ macro_rules! with_bar {
 mod firmware;
 mod gpu;
 mod regs;
+mod timer;
 mod util;
 
 kernel::module_pci_driver! {
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 1e24787c4b5f432ac25fe399c8cb38b7350e44ae..f191cf4eb44c2b950e5cfcc6d04f95c122ce29d3 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -14,6 +14,16 @@
     28:20   chipset => try_into Chipset, "chipset model"
 );
 
+/* PTIMER */
+
+register!(PtimerTime0@0x00009400;
+    31:0    lo => as u32, "low 32 bits of the timer"
+);
+
+register!(PtimerTime1@0x00009410;
+    31:0    hi => as u32, "high 32 bits of the timer"
+);
+
 /* PFB */
 
 register!(PfbNisoFlushSysmemAddr@0x00100c10;
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
new file mode 100644
index 0000000000000000000000000000000000000000..8987352f4192bc9b4b2fc0fb5f2e8e62ff27be68
--- /dev/null
+++ b/drivers/gpu/nova-core/timer.rs
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Nova Core Timer subdevice
+
+// To be removed when all code is used.
+#![allow(dead_code)]
+
+use core::fmt::Display;
+use core::ops::{Add, Sub};
+use core::time::Duration;
+
+use kernel::devres::Devres;
+use kernel::num::U64Ext;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// A timestamp with nanosecond granularity obtained from the GPU timer.
+///
+/// A timestamp can also be subtracted from another in order to obtain a [`Duration`].
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) struct Timestamp(u64);
+
+impl Display for Timestamp {
+    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+        write!(f, "{}", self.0)
+    }
+}
+
+impl Add<Duration> for Timestamp {
+    type Output = Self;
+
+    fn add(self, rhs: Duration) -> Self::Output {
+        // Timestamps wrap modulo 2^64, so only the low 64 bits of the duration in
+        // nanoseconds contribute to the result. The `as u64` cast truncates the
+        // `u128` value to exactly those bits, which makes a reduction loop
+        // unnecessary.
+        let nanos = rhs.as_nanos() as u64;
+
+        Timestamp(self.0.wrapping_add(nanos))
+    }
+}
+
+impl Sub for Timestamp {
+    type Output = Duration;
+
+    fn sub(self, rhs: Self) -> Self::Output {
+        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
+    }
+}
+
+pub(crate) struct Timer {}
+
+impl Timer {
+    pub(crate) fn new() -> Self {
+        Self {}
+    }
+
+    /// Read the current timer timestamp.
+    pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
+        loop {
+            let hi = regs::PtimerTime1::read(bar);
+            let lo = regs::PtimerTime0::read(bar);
+
+            if hi.hi() == regs::PtimerTime1::read(bar).hi() {
+                return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
+            }
+        }
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn time(bar: &Bar0, time: u64) {
+        regs::PtimerTime1::default()
+            .set_hi(time.upper_32_bits())
+            .write(bar);
+        regs::PtimerTime0::default()
+            .set_lo(time.lower_32_bits())
+            .write(bar);
+    }
+
+    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
+    ///
+    /// When `cond` evaluates to `Some`, its return value is returned.
+    ///
+    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+    /// `Some`, or if the timer device is stuck for some reason.
+    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
+        &self,
+        bar: &Devres<Bar0>,
+        timeout: Duration,
+        cond: F,
+    ) -> Result<R> {
+        // Number of consecutive time reads after which we consider the timer frozen if it hasn't
+        // moved forward.
+        const MAX_STALLED_READS: usize = 16;
+
+        let (mut cur_time, mut prev_time, deadline) = {
+            let cur_time = with_bar!(bar, |b| self.read(b))?;
+            let deadline = cur_time + timeout;
+
+            (cur_time, cur_time, deadline)
+        };
+        let mut num_reads = 0;
+
+        loop {
+            if let Some(ret) = cond() {
+                return Ok(ret);
+            }
+
+            (|| {
+                cur_time = with_bar!(bar, |b| self.read(b))?;
+
+                /* Check if the timer is frozen for some reason. */
+                if cur_time == prev_time {
+                    if num_reads >= MAX_STALLED_READS {
+                        return Err(ETIMEDOUT);
+                    }
+                    num_reads += 1;
+                } else {
+                    if cur_time >= deadline {
+                        return Err(ETIMEDOUT);
+                    }
+
+                    num_reads = 0;
+                    prev_time = cur_time;
+                }
+
+                Ok(())
+            })()?;
+        }
+    }
+}

-- 
2.49.0
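
`Timer::read` uses the usual high/low/high sequence to obtain a consistent
64-bit value from two 32-bit registers: if the high half changed while the
low half was being read, the low half rolled over and the read is retried.
The sketch below demonstrates this against a mock counter, together with
wrapping timestamp arithmetic. `MockTimer` and the free functions are
invented for this illustration. The addition is written as a single
truncating `wrapping_add`, which is the choice made for this sketch.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Mock 64-bit counter exposed as two 32-bit halves (hypothetical).
/// Every read of the low half advances the counter by `step`, which makes it
/// easy to provoke a rollover between the two reads.
pub struct MockTimer {
    now: Cell<u64>,
    step: u64,
}

impl MockTimer {
    pub fn new(start: u64, step: u64) -> Self {
        Self { now: Cell::new(start), step }
    }

    fn read_hi(&self) -> u32 {
        (self.now.get() >> 32) as u32
    }

    fn read_lo(&self) -> u32 {
        let value = self.now.get();
        self.now.set(value.wrapping_add(self.step));
        value as u32
    }
}

/// Reads a consistent 64-bit timestamp using the high/low/high sequence.
pub fn read_timestamp(timer: &MockTimer) -> u64 {
    loop {
        let hi = timer.read_hi();
        let lo = timer.read_lo();

        // If the high half is unchanged, `lo` belongs to the same epoch.
        if hi == timer.read_hi() {
            return (u64::from(hi) << 32) | u64::from(lo);
        }
    }
}

/// Adds a duration to a timestamp, wrapping modulo 2^64 nanoseconds.
pub fn add_duration(timestamp: u64, duration: Duration) -> u64 {
    timestamp.wrapping_add(duration.as_nanos() as u64)
}

/// Time elapsed between two timestamps, correct across a counter wrap.
pub fn elapsed(later: u64, earlier: u64) -> Duration {
    Duration::from_nanos(later.wrapping_sub(earlier))
}
```

Because subtraction wraps as well, `elapsed` still yields the right duration
when the counter has wrapped between the two timestamps, provided the real
interval is shorter than 2^64 nanoseconds.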



* [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (9 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 10/16] gpu: nova-core: add basic timer device Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22 14:44   ` Danilo Krummrich
  2025-04-20 12:19 ` [PATCH 12/16] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add the common Falcon code and HAL for Ampere GPUs, and instantiate the
GSP and SEC2 Falcons that will be required to boot the GSP.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs           | 469 ++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/falcon/gsp.rs       |  27 ++
 drivers/gpu/nova-core/falcon/hal.rs       |  54 ++++
 drivers/gpu/nova-core/falcon/hal/ga102.rs | 111 +++++++
 drivers/gpu/nova-core/falcon/sec2.rs      |   9 +
 drivers/gpu/nova-core/gpu.rs              |  16 +
 drivers/gpu/nova-core/nova_core.rs        |   1 +
 drivers/gpu/nova-core/regs.rs             | 189 ++++++++++++
 drivers/gpu/nova-core/timer.rs            |   3 -
 9 files changed, 876 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
new file mode 100644
index 0000000000000000000000000000000000000000..71f374445ff3277eac628e183942c79f557366d5
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -0,0 +1,469 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Falcon microprocessor base support
+
+// To be removed when all code is used.
+#![allow(dead_code)]
+
+use core::hint::unreachable_unchecked;
+use core::time::Duration;
+use hal::FalconHal;
+use kernel::bindings;
+use kernel::devres::Devres;
+use kernel::sync::Arc;
+use kernel::{pci, prelude::*};
+
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+use crate::regs;
+use crate::timer::Timer;
+
+pub(crate) mod gsp;
+mod hal;
+pub(crate) mod sec2;
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRev {
+    #[default]
+    Rev1 = 1,
+    Rev2 = 2,
+    Rev3 = 3,
+    Rev4 = 4,
+    Rev5 = 5,
+    Rev6 = 6,
+    Rev7 = 7,
+}
+
+impl TryFrom<u32> for FalconCoreRev {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        use FalconCoreRev::*;
+
+        let rev = match value {
+            1 => Rev1,
+            2 => Rev2,
+            3 => Rev3,
+            4 => Rev4,
+            5 => Rev5,
+            6 => Rev6,
+            7 => Rev7,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(rev)
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone)]
+pub(crate) enum FalconSecurityModel {
+    #[default]
+    None = 0,
+    Light = 2,
+    Heavy = 3,
+}
+
+impl TryFrom<u32> for FalconSecurityModel {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        use FalconSecurityModel::*;
+
+        let sec_model = match value {
+            0 => None,
+            2 => Light,
+            3 => Heavy,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(sec_model)
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRevSubversion {
+    #[default]
+    Subversion0 = 0,
+    Subversion1 = 1,
+    Subversion2 = 2,
+    Subversion3 = 3,
+}
+
+impl From<u32> for FalconCoreRevSubversion {
+    fn from(value: u32) -> Self {
+        use FalconCoreRevSubversion::*;
+
+        match value & 0b11 {
+            0 => Subversion0,
+            1 => Subversion1,
+            2 => Subversion2,
+            3 => Subversion3,
+            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
+            4..=u32::MAX => unsafe { unreachable_unchecked() },
+        }
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq)]
+pub(crate) enum FalconModSelAlgo {
+    #[default]
+    Rsa3k = 1,
+}
+
+impl TryFrom<u32> for FalconModSelAlgo {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        match value {
+            1 => Ok(FalconModSelAlgo::Rsa3k),
+            _ => Err(EINVAL),
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum RiscvCoreSelect {
+    Falcon = 0,
+    Riscv = 1,
+}
+
+impl From<bool> for RiscvCoreSelect {
+    fn from(value: bool) -> Self {
+        match value {
+            false => RiscvCoreSelect::Falcon,
+            true => RiscvCoreSelect::Riscv,
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum FalconMem {
+    Imem,
+    Dmem,
+}
+
+#[derive(Debug, Clone, Default)]
+pub(crate) enum FalconFbifTarget {
+    #[default]
+    LocalFb = 0,
+    CoherentSysmem = 1,
+    NoncoherentSysmem = 2,
+}
+
+impl TryFrom<u32> for FalconFbifTarget {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        let res = match value {
+            0 => Self::LocalFb,
+            1 => Self::CoherentSysmem,
+            2 => Self::NoncoherentSysmem,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(res)
+    }
+}
+
+#[derive(Debug, Clone, Default)]
+pub(crate) enum FalconFbifMemType {
+    #[default]
+    Virtual = 0,
+    Physical = 1,
+}
+
+impl From<bool> for FalconFbifMemType {
+    fn from(value: bool) -> Self {
+        match value {
+            false => Self::Virtual,
+            true => Self::Physical,
+        }
+    }
+}
+
+/// Trait defining the parameters of a given Falcon instance.
+pub(crate) trait FalconEngine: Sync {
+    /// Base I/O address for the falcon, relative to which its registers are accessed.
+    const BASE: usize;
+}
+
+/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
+#[derive(Debug)]
+pub(crate) struct FalconLoadTarget {
+    /// Offset from the start of the source object to copy from.
+    pub(crate) src_start: u32,
+    /// Offset from the start of the destination memory to copy into.
+    pub(crate) dst_start: u32,
+    /// Number of bytes to copy.
+    pub(crate) len: u32,
+}
+
+#[derive(Debug)]
+pub(crate) struct FalconBromParams {
+    pub(crate) pkc_data_offset: u32,
+    pub(crate) engine_id_mask: u16,
+    pub(crate) ucode_id: u8,
+}
+
+pub(crate) trait FalconFirmware {
+    type Target: FalconEngine;
+
+    /// Returns the DMA handle of the object containing the firmware.
+    fn dma_handle(&self) -> bindings::dma_addr_t;
+
+    /// Returns the load parameters for `IMEM`.
+    fn imem_load(&self) -> FalconLoadTarget;
+
+    /// Returns the load parameters for `DMEM`.
+    fn dmem_load(&self) -> FalconLoadTarget;
+
+    /// Returns the parameters to write into the BROM registers.
+    fn brom_params(&self) -> FalconBromParams;
+
+    /// Returns the start address of the firmware.
+    fn boot_addr(&self) -> u32;
+}
+
+/// Contains the base parameters common to all Falcon instances.
+pub(crate) struct Falcon<E: FalconEngine> {
+    pub hal: Arc<dyn FalconHal<E>>,
+}
+
+impl<E: FalconEngine + 'static> Falcon<E> {
+    pub(crate) fn new(
+        pdev: &pci::Device,
+        chipset: Chipset,
+        bar: &Devres<Bar0>,
+        need_riscv: bool,
+    ) -> Result<Self> {
+        let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, E::BASE))?;
+        // Ensure that the revision and security model contain valid values.
+        let _rev = hwcfg1.core_rev()?;
+        let _sec_model = hwcfg1.security_model()?;
+
+        if need_riscv {
+            let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
+            if !hwcfg2.riscv() {
+                dev_err!(
+                    pdev.as_ref(),
+                    "riscv support requested on falcon that does not support it\n"
+                );
+                return Err(EINVAL);
+            }
+        }
+
+        Ok(Self {
+            hal: hal::create_falcon_hal(chipset)?,
+        })
+    }
+
+    fn reset_wait_mem_scrubbing(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        timer.wait_on(bar, Duration::from_millis(20), || {
+            bar.try_access_with(|b| regs::FalconHwcfg2::read(b, E::BASE))
+                .and_then(|r| if r.mem_scrubbing() { Some(()) } else { None })
+        })
+    }
+
+    fn reset_eng(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        let _ = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
+
+        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
+        // RESET_READY so a non-failing timeout is used.
+        let _ = timer.wait_on(bar, Duration::from_micros(150), || {
+            bar.try_access_with(|b| regs::FalconHwcfg2::read(b, E::BASE))
+                .and_then(|r| if r.reset_ready() { Some(()) } else { None })
+        });
+
+        with_bar!(bar, |b| regs::FalconEngine::alter(b, E::BASE, |v| v
+            .set_reset(true)))?;
+
+        let _: Result<()> = timer.wait_on(bar, Duration::from_micros(10), || None);
+
+        with_bar!(bar, |b| regs::FalconEngine::alter(b, E::BASE, |v| v
+            .set_reset(false)))?;
+
+        self.reset_wait_mem_scrubbing(bar, timer)?;
+
+        Ok(())
+    }
+
+    pub(crate) fn reset(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        self.reset_eng(bar, timer)?;
+        self.hal.select_core(bar, timer)?;
+        self.reset_wait_mem_scrubbing(bar, timer)?;
+
+        with_bar!(bar, |b| {
+            regs::FalconRm::default()
+                .set_val(regs::Boot0::read(b).into())
+                .write(b, E::BASE)
+        })
+    }
+
+    fn dma_wr(
+        &self,
+        bar: &Devres<Bar0>,
+        timer: &Timer,
+        dma_handle: bindings::dma_addr_t,
+        target_mem: FalconMem,
+        load_offsets: FalconLoadTarget,
+        sec: bool,
+    ) -> Result<()> {
+        const DMA_LEN: u32 = 256;
+        const DMA_LEN_ILOG2_MINUS2: u8 = (DMA_LEN.ilog2() - 2) as u8;
+
+        // For IMEM, we want to use the start offset as a virtual address tag for each page, since
+        // code addresses in the firmware (and the boot vector) are virtual.
+        //
+        // For DMEM we can fold the start offset into the DMA handle.
+        let (src_start, dma_start) = match target_mem {
+            FalconMem::Imem => (load_offsets.src_start, dma_handle),
+            FalconMem::Dmem => (
+                0,
+                dma_handle + load_offsets.src_start as bindings::dma_addr_t,
+            ),
+        };
+        if dma_start % DMA_LEN as bindings::dma_addr_t > 0 {
+            pr_err!(
+                "DMA transfer start addresses must be a multiple of {}\n",
+                DMA_LEN
+            );
+            return Err(EINVAL);
+        }
+        if load_offsets.len % DMA_LEN > 0 {
+            pr_err!("DMA transfer length must be a multiple of {}\n", DMA_LEN);
+            return Err(EINVAL);
+        }
+
+        // Set up the base source DMA address.
+        with_bar!(bar, |b| {
+            regs::FalconDmaTrfBase::default()
+                .set_base((dma_start >> 8) as u32)
+                .write(b, E::BASE);
+            regs::FalconDmaTrfBase1::default()
+                .set_base((dma_start >> 40) as u16)
+                .write(b, E::BASE)
+        })?;
+
+        let cmd = regs::FalconDmaTrfCmd::default()
+            .set_size(DMA_LEN_ILOG2_MINUS2)
+            .set_imem(target_mem == FalconMem::Imem)
+            .set_sec(if sec { 1 } else { 0 });
+
+        for pos in (0..load_offsets.len).step_by(DMA_LEN as usize) {
+            // Perform a transfer of size `DMA_LEN`.
+            with_bar!(bar, |b| {
+                regs::FalconDmaTrfMOffs::default()
+                    .set_offs(load_offsets.dst_start + pos)
+                    .write(b, E::BASE);
+                regs::FalconDmaTrfBOffs::default()
+                    .set_offs(src_start + pos)
+                    .write(b, E::BASE);
+                cmd.write(b, E::BASE)
+            })?;
+
+            // Wait for the transfer to complete.
+            timer.wait_on(bar, Duration::from_millis(2000), || {
+                bar.try_access_with(|b| regs::FalconDmaTrfCmd::read(b, E::BASE))
+                    .and_then(|v| if v.idle() { Some(()) } else { None })
+            })?;
+        }
+
+        Ok(())
+    }
+
+    pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(
+        &self,
+        bar: &Devres<Bar0>,
+        timer: &Timer,
+        fw: &F,
+    ) -> Result<()> {
+        let dma_handle = fw.dma_handle();
+
+        with_bar!(bar, |b| {
+            regs::FalconFbifCtl::alter(b, E::BASE, |v| v.set_allow_phys_no_ctx(true));
+            regs::FalconDmaCtl::default().write(b, E::BASE);
+            regs::FalconFbifTranscfg::alter(b, E::BASE, |v| {
+                v.set_target(FalconFbifTarget::CoherentSysmem)
+                    .set_mem_type(FalconFbifMemType::Physical)
+            });
+        })?;
+
+        self.dma_wr(
+            bar,
+            timer,
+            dma_handle,
+            FalconMem::Imem,
+            fw.imem_load(),
+            true,
+        )?;
+        self.dma_wr(
+            bar,
+            timer,
+            dma_handle,
+            FalconMem::Dmem,
+            fw.dmem_load(),
+            true,
+        )?;
+
+        self.hal.program_brom(bar, &fw.brom_params())?;
+
+        with_bar!(bar, |b| {
+            // Set `BootVec` to start of non-secure code.
+            regs::FalconBootVec::default()
+                .set_boot_vec(fw.boot_addr())
+                .write(b, E::BASE);
+        })?;
+
+        Ok(())
+    }
+
+    pub(crate) fn boot(
+        &self,
+        bar: &Devres<Bar0>,
+        timer: &Timer,
+        mbox0: Option<u32>,
+        mbox1: Option<u32>,
+    ) -> Result<(u32, u32)> {
+        with_bar!(bar, |b| {
+            if let Some(mbox0) = mbox0 {
+                regs::FalconMailbox0::default()
+                    .set_mailbox0(mbox0)
+                    .write(b, E::BASE);
+            }
+
+            if let Some(mbox1) = mbox1 {
+                regs::FalconMailbox1::default()
+                    .set_mailbox1(mbox1)
+                    .write(b, E::BASE);
+            }
+
+            match regs::FalconCpuCtl::read(b, E::BASE).alias_en() {
+                true => regs::FalconCpuCtlAlias::default()
+                    .set_start_cpu(true)
+                    .write(b, E::BASE),
+                false => regs::FalconCpuCtl::default()
+                    .set_start_cpu(true)
+                    .write(b, E::BASE),
+            }
+        })?;
+
+        timer.wait_on(bar, Duration::from_secs(2), || {
+            bar.try_access()
+                .map(|b| regs::FalconCpuCtl::read(&*b, E::BASE))
+                .and_then(|v| if v.halted() { Some(()) } else { None })
+        })?;
+
+        let (mbox0, mbox1) = with_bar!(bar, |b| {
+            let mbox0 = regs::FalconMailbox0::read(b, E::BASE).mailbox0();
+            let mbox1 = regs::FalconMailbox1::read(b, E::BASE).mailbox1();
+
+            (mbox0, mbox1)
+        })?;
+
+        Ok((mbox0, mbox1))
+    }
+}
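
Editorial sketch, not part of the patch: `dma_wr()` above rejects
unaligned transfers, splits the 64-bit DMA address across two base
registers, and then issues one command per 256-byte block. The
standalone userspace example below mirrors that arithmetic only.
`Block`, `split_base()` and `plan_blocks()` are hypothetical names
introduced for illustration, and the sketch assumes the caller has
already resolved the IMEM/DMEM difference in how the source offset is
applied.

```rust
/// Transfer granularity used by `dma_wr()` in the patch.
pub const DMA_LEN: u32 = 256;

/// Value of the command register's `size` field: log2(DMA_LEN) - 2.
pub const DMA_SIZE_FIELD: u8 = (DMA_LEN.ilog2() - 2) as u8;

/// Offsets programmed for one block transfer (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct Block {
    /// Destination offset inside IMEM or DMEM.
    pub mem_offs: u32,
    /// Source offset relative to the DMA base address.
    pub dma_offs: u32,
}

/// Splits a DMA address the way the patch fills the two base registers:
/// bits 39:8 go to the first one, the bits above 40 to the second one.
pub fn split_base(dma_start: u64) -> (u32, u16) {
    ((dma_start >> 8) as u32, (dma_start >> 40) as u16)
}

/// Validates alignment, then lists every block that would be transferred.
pub fn plan_blocks(
    dma_start: u64,
    src_start: u32,
    dst_start: u32,
    len: u32,
) -> Result<Vec<Block>, &'static str> {
    if dma_start % u64::from(DMA_LEN) != 0 {
        return Err("DMA start address must be a multiple of DMA_LEN");
    }
    if len % DMA_LEN != 0 {
        return Err("DMA length must be a multiple of DMA_LEN");
    }

    Ok((0..len)
        .step_by(DMA_LEN as usize)
        .map(|pos| Block {
            mem_offs: dst_start + pos,
            dma_offs: src_start + pos,
        })
        .collect())
}
```

With these definitions, a 512-byte load to destination offset 0x200
yields two blocks, at memory offsets 0x200 and 0x300.
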
diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..44b8dc118eda1263eaede466efd55408c6e7cded
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/gsp.rs
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::{
+    driver::Bar0,
+    falcon::{Falcon, FalconEngine},
+    regs,
+};
+
+pub(crate) struct Gsp;
+impl FalconEngine for Gsp {
+    const BASE: usize = 0x00110000;
+}
+
+pub(crate) type GspFalcon = Falcon<Gsp>;
+
+impl Falcon<Gsp> {
+    /// Clears the SWGEN0 interrupt by writing to the Falcon's IRQ status clear register,
+    /// allowing the GSP to signal the CPU when new messages are in the message queue.
+    pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
+        with_bar!(bar, |b| regs::FalconIrqsclr::default()
+            .set_swgen0(true)
+            .write(b, Gsp::BASE))
+    }
+}
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
new file mode 100644
index 0000000000000000000000000000000000000000..5ebf4e88f1f25a13cf47859a53507be53e795d34
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+use kernel::sync::Arc;
+
+use crate::driver::Bar0;
+use crate::falcon::{FalconBromParams, FalconEngine};
+use crate::gpu::Chipset;
+use crate::timer::Timer;
+
+mod ga102;
+
+/// Hardware Abstraction Layer for Falcon cores.
+///
+/// Implements chipset-specific low-level operations. The trait is generic over [`FalconEngine`]
+/// so its `BASE` parameter can be used to avoid runtime bounds checks when accessing
+/// registers.
+pub(crate) trait FalconHal<E: FalconEngine>: Sync {
+    /// Activates the Falcon core if the engine is a RISC-V/Falcon dual engine.
+    fn select_core(&self, _bar: &Devres<Bar0>, _timer: &Timer) -> Result<()> {
+        Ok(())
+    }
+
+    fn get_signature_reg_fuse_version(
+        &self,
+        bar: &Devres<Bar0>,
+        engine_id_mask: u16,
+        ucode_id: u8,
+    ) -> Result<u32>;
+
+    /// Programs the BROM registers prior to starting a secure firmware.
+    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
+}
+
+/// Returns a heap-allocated falcon HAL suitable for the passed `chipset`.
+///
+/// We use this function and a heap-allocated trait object instead of statically defined trait
+/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
+/// requested HAL.
+///
+/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
+pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
+    chipset: Chipset,
+) -> Result<Arc<dyn FalconHal<E>>> {
+    let hal = match chipset {
+        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
+            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>
+        }
+        _ => return Err(ENOTSUPP),
+    };
+
+    Ok(hal)
+}
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
new file mode 100644
index 0000000000000000000000000000000000000000..747b02ca671f7d4a97142665a9ba64807c87391e
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::marker::PhantomData;
+use core::time::Duration;
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::falcon::{FalconBromParams, FalconEngine, FalconModSelAlgo, RiscvCoreSelect};
+use crate::regs;
+use crate::timer::Timer;
+
+use super::FalconHal;
+
+fn select_core_ga102<E: FalconEngine>(bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+    let bcr_ctrl = with_bar!(bar, |b| regs::RiscvBcrCtrl::read(b, E::BASE))?;
+    if bcr_ctrl.core_select() != RiscvCoreSelect::Falcon {
+        with_bar!(bar, |b| regs::RiscvBcrCtrl::default()
+            .set_core_select(RiscvCoreSelect::Falcon)
+            .write(b, E::BASE))?;
+
+        timer.wait_on(bar, Duration::from_millis(10), || {
+            bar.try_access_with(|b| regs::RiscvBcrCtrl::read(b, E::BASE))
+                .and_then(|v| if v.valid() { Some(()) } else { None })
+        })?;
+    }
+
+    Ok(())
+}
+
+fn get_signature_reg_fuse_version_ga102(
+    bar: &Devres<Bar0>,
+    engine_id_mask: u16,
+    ucode_id: u8,
+) -> Result<u32> {
+    // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
+    // registers, which are an array. Our register definition macros do not allow us to manage them
+    // properly, so we need to hardcode their addresses for now.
+
+    // Each engine has 16 ucode version registers numbered from 1 to 16.
+    if ucode_id == 0 || ucode_id > 16 {
+        pr_warn!("invalid ucode id {:#x}\n", ucode_id);
+        return Err(EINVAL);
+    }
+    let reg_fuse = if engine_id_mask & 0x0001 != 0 {
+        // NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION
+        0x824140
+    } else if engine_id_mask & 0x0004 != 0 {
+        // NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION
+        0x824100
+    } else if engine_id_mask & 0x0400 != 0 {
+        // NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION
+        0x8241c0
+    } else {
+        pr_warn!("unexpected engine_id_mask {:#x}\n", engine_id_mask);
+        return Err(EINVAL);
+    } + ((ucode_id - 1) as usize * core::mem::size_of::<u32>());
+
+    let reg_fuse_version = with_bar!(bar, |b| { b.read32(reg_fuse) })?;
+
+    // Equivalent of Find Last Set bit.
+    Ok(u32::BITS - reg_fuse_version.leading_zeros())
+}
+
+fn program_brom_ga102<E: FalconEngine>(
+    bar: &Devres<Bar0>,
+    params: &FalconBromParams,
+) -> Result<()> {
+    with_bar!(bar, |b| {
+        regs::FalconBromParaaddr0::default()
+            .set_addr(params.pkc_data_offset)
+            .write(b, E::BASE);
+        regs::FalconBromEngidmask::default()
+            .set_mask(params.engine_id_mask as u32)
+            .write(b, E::BASE);
+        regs::FalconBromCurrUcodeId::default()
+            .set_ucode_id(params.ucode_id as u32)
+            .write(b, E::BASE);
+        regs::FalconModSel::default()
+            .set_algo(FalconModSelAlgo::Rsa3k)
+            .write(b, E::BASE)
+    })
+}
+
+pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
+
+impl<E: FalconEngine> Ga102<E> {
+    pub(super) fn new() -> Self {
+        Self(PhantomData)
+    }
+}
+
+impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
+    fn select_core(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        select_core_ga102::<E>(bar, timer)
+    }
+
+    fn get_signature_reg_fuse_version(
+        &self,
+        bar: &Devres<Bar0>,
+        engine_id_mask: u16,
+        ucode_id: u8,
+    ) -> Result<u32> {
+        get_signature_reg_fuse_version_ga102(bar, engine_id_mask, ucode_id)
+    }
+
+    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()> {
+        program_brom_ga102::<E>(bar, params)
+    }
+}
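
Editorial sketch, not part of the patch: the fuse version lookup above
boils down to picking a per-engine base register, indexing it by
`ucode_id`, and taking the position of the highest set bit of the value
read. The standalone example below reproduces only that arithmetic with
the addresses hardcoded in the patch. `fuse_reg_addr()` and
`find_last_set()` are hypothetical helper names, and the register read
itself is left out.

```rust
/// First ucode version register of each engine, as hardcoded in the patch.
const SEC2_UCODE1_VERSION: usize = 0x824140;
const NVDEC_UCODE1_VERSION: usize = 0x824100;
const GSP_UCODE1_VERSION: usize = 0x8241c0;

/// Returns the address of the version register for `ucode_id` (1..=16),
/// or `None` if the id or the engine mask is not recognized.
pub fn fuse_reg_addr(engine_id_mask: u16, ucode_id: u8) -> Option<usize> {
    if ucode_id == 0 || ucode_id > 16 {
        return None;
    }

    let base = if engine_id_mask & 0x0001 != 0 {
        SEC2_UCODE1_VERSION
    } else if engine_id_mask & 0x0004 != 0 {
        NVDEC_UCODE1_VERSION
    } else if engine_id_mask & 0x0400 != 0 {
        GSP_UCODE1_VERSION
    } else {
        return None;
    };

    // Registers are consecutive 32-bit words, numbered starting from 1.
    Some(base + (usize::from(ucode_id) - 1) * std::mem::size_of::<u32>())
}

/// 1-based index of the most significant set bit, or 0 if no bit is set.
pub fn find_last_set(value: u32) -> u32 {
    u32::BITS - value.leading_zeros()
}
```

For example, a register value of 0b0111 gives a fuse version of 3, and a
value of 0 gives 0.
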
diff --git a/drivers/gpu/nova-core/falcon/sec2.rs b/drivers/gpu/nova-core/falcon/sec2.rs
new file mode 100644
index 0000000000000000000000000000000000000000..85dda3e8380a3d31d34c92c4236c6f81c63ce772
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/sec2.rs
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use crate::falcon::{Falcon, FalconEngine};
+
+pub(crate) struct Sec2;
+impl FalconEngine for Sec2 {
+    const BASE: usize = 0x00840000;
+}
+pub(crate) type Sec2Falcon = Falcon<Sec2>;
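
Editorial sketch, not part of the patch: the `Gsp` and `Sec2` marker
types above differ only by their `BASE` constant, which lets the shared
`Falcon<E>` code resolve register addresses at compile time. The
standalone userspace example below illustrates the pattern. The `Engine`
trait, `MAILBOX0_OFFSET` and `mailbox0_addr()` are hypothetical
stand-ins and not the driver's API; only the two base addresses and the
0x40 mailbox offset are taken from the patch.

```rust
use std::marker::PhantomData;

/// Stand-in for `FalconEngine`: an engine only supplies its register base.
pub trait Engine {
    const BASE: usize;
}

pub struct Gsp;
impl Engine for Gsp {
    const BASE: usize = 0x0011_0000;
}

pub struct Sec2;
impl Engine for Sec2 {
    const BASE: usize = 0x0084_0000;
}

/// Offset of the first mailbox register relative to the engine base.
pub const MAILBOX0_OFFSET: usize = 0x40;

/// Code shared by all engines. `E` carries no data, only the constant.
pub struct Falcon<E: Engine>(PhantomData<E>);

impl<E: Engine> Falcon<E> {
    pub fn new() -> Self {
        Self(PhantomData)
    }

    /// Absolute offset of the mailbox register for engine `E`.
    /// `E::BASE` is a constant, so the sum is known at compile time.
    pub fn mailbox0_addr(&self) -> usize {
        E::BASE + MAILBOX0_OFFSET
    }
}
```

Here `Falcon::<Gsp>::new().mailbox0_addr()` evaluates to 0x110040 and
the `Sec2` variant to 0x840040, without any per-instance state.
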
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 1b3e43e0412e2a2ea178c7404ea647c9e38d4e04..ec4c648c6e8b4aa7d06c627ed59c0e66a08c679e 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -5,6 +5,8 @@
 use crate::devinit;
 use crate::dma::DmaObject;
 use crate::driver::Bar0;
+use crate::falcon::gsp::GspFalcon;
+use crate::falcon::sec2::Sec2Falcon;
 use crate::firmware::Firmware;
 use crate::regs;
 use crate::timer::Timer;
@@ -221,6 +223,20 @@ pub(crate) fn new(
 
         let timer = Timer::new();
 
+        let gsp_falcon = GspFalcon::new(
+            pdev,
+            spec.chipset,
+            &bar,
+            if spec.chipset > Chipset::GA100 {
+                true
+            } else {
+                false
+            },
+        )?;
+        gsp_falcon.clear_swgen0_intr(&bar)?;
+
+        let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, true)?;
+
         Ok(pin_init!(Self {
             spec,
             bar,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index df3468c92c6081b3e2db218d92fbe1c40a0a75c3..4dde8004d24882c60669b5acd6af9d6988c66a9c 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -23,6 +23,7 @@ macro_rules! with_bar {
 mod devinit;
 mod dma;
 mod driver;
+mod falcon;
 mod firmware;
 mod gpu;
 mod regs;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index f191cf4eb44c2b950e5cfcc6d04f95c122ce29d3..c76a16dc8e7267a4eb54cb71e1cca6fb9e00188f 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -6,6 +6,10 @@
 #[macro_use]
 mod macros;
 
+use crate::falcon::{
+    FalconCoreRev, FalconCoreRevSubversion, FalconFbifMemType, FalconFbifTarget, FalconModSelAlgo,
+    FalconSecurityModel, RiscvCoreSelect,
+};
 use crate::gpu::Chipset;
 
 register!(Boot0@0x00000000, "Basic revision information about the GPU";
@@ -44,3 +48,188 @@
 register!(Pgc6AonSecureScratchGroup05@0x00118234;
     31:0    value => as u32
 );
+
+/* PFALCON */
+
+register!(FalconIrqsclr@+0x00000004;
+    4:4     halt => as_bit bool;
+    6:6     swgen0 => as_bit bool;
+);
+
+register!(FalconIrqstat@+0x00000008;
+    4:4     halt => as_bit bool;
+    6:6     swgen0 => as_bit bool;
+);
+
+register!(FalconIrqmclr@+0x00000014;
+    31:0    val => as u32
+);
+
+register!(FalconIrqmask@+0x00000018;
+    31:0    val => as u32
+);
+
+register!(FalconRm@+0x00000084;
+    31:0    val => as u32
+);
+
+register!(FalconIrqdest@+0x0000001c;
+    31:0    val => as u32
+);
+
+register!(FalconMailbox0@+0x00000040;
+    31:0    mailbox0 => as u32
+);
+register!(FalconMailbox1@+0x00000044;
+    31:0    mailbox1 => as u32
+);
+
+register!(FalconHwcfg2@+0x000000f4;
+    10:10   riscv => as_bit bool;
+    12:12   mem_scrubbing => as_bit bool;
+    31:31   reset_ready => as_bit bool;
+);
+
+register!(FalconCpuCtl@+0x00000100;
+    1:1     start_cpu => as_bit bool;
+    4:4     halted => as_bit bool;
+    6:6     alias_en => as_bit bool;
+);
+
+register!(FalconBootVec@+0x00000104;
+    31:0    boot_vec => as u32
+);
+
+register!(FalconHwCfg@+0x00000108;
+    8:0     imem_size => as u32;
+    17:9    dmem_size => as u32;
+);
+
+register!(FalconDmaCtl@+0x0000010c;
+    0:0     require_ctx => as_bit bool;
+    1:1     dmem_scrubbing  => as_bit bool;
+    2:2     imem_scrubbing => as_bit bool;
+    6:3     dmaq_num => as_bit u8;
+    7:7     secure_stat => as_bit bool;
+);
+
+register!(FalconDmaTrfBase@+0x00000110;
+    31:0    base => as u32;
+);
+
+register!(FalconDmaTrfMOffs@+0x00000114;
+    23:0    offs => as u32;
+);
+
+register!(FalconDmaTrfCmd@+0x00000118;
+    0:0     full => as_bit bool;
+    1:1     idle => as_bit bool;
+    3:2     sec => as_bit u8;
+    4:4     imem => as_bit bool;
+    5:5     is_write => as_bit bool;
+    10:8    size => as u8;
+    14:12   ctxdma => as u8;
+    16:16   set_dmtag => as u8;
+);
+
+register!(FalconDmaTrfBOffs@+0x0000011c;
+    31:0    offs => as u32;
+);
+
+register!(FalconDmaTrfBase1@+0x00000128;
+    8:0     base => as u16;
+);
+
+register!(FalconHwcfg1@+0x0000012c;
+    3:0     core_rev => try_into FalconCoreRev, "core revision of the falcon";
+    5:4     security_model => try_into FalconSecurityModel, "security model of the falcon";
+    7:6     core_rev_subversion => into FalconCoreRevSubversion;
+    11:8    imem_ports => as u8;
+    15:12   dmem_ports => as u8;
+);
+
+register!(FalconCpuCtlAlias@+0x00000130;
+    1:1     start_cpu => as_bit bool;
+);
+
+/* TODO: this is an array of registers */
+register!(FalconImemC@+0x00000180;
+    7:2     offs => as u8;
+    23:8    blk => as u8;
+    24:24   aincw => as_bit bool;
+    25:25   aincr => as_bit bool;
+    28:28   secure => as_bit bool;
+    29:29   sec_atomic => as_bit bool;
+);
+
+register!(FalconImemD@+0x00000184;
+    31:0    data => as u32;
+);
+
+register!(FalconImemT@+0x00000188;
+    15:0    data => as u16;
+);
+
+register!(FalconDmemC@+0x000001c0;
+    7:2     offs => as u8;
+    23:0    addr => as u32;
+    23:8    blk => as u8;
+    24:24   aincw => as_bit bool;
+    25:25   aincr => as_bit bool;
+    26:26   settag => as_bit bool;
+    27:27   setlvl => as_bit bool;
+    28:28   va => as_bit bool;
+    29:29   miss => as_bit bool;
+);
+
+register!(FalconDmemD@+0x000001c4;
+    31:0    data => as u32;
+);
+
+register!(FalconModSel@+0x00001180;
+    7:0     algo => try_into FalconModSelAlgo;
+);
+register!(FalconBromCurrUcodeId@+0x00001198;
+    31:0    ucode_id => as u32;
+);
+register!(FalconBromEngidmask@+0x0000119c;
+    31:0    mask => as u32;
+);
+register!(FalconBromParaaddr0@+0x00001210;
+    31:0    addr => as u32;
+);
+
+register!(RiscvCpuctl@+0x00000388;
+    0:0     startcpu => as_bit bool;
+    4:4     halted => as_bit bool;
+    5:5     stopped => as_bit bool;
+    7:7     active_stat => as_bit bool;
+);
+
+register!(FalconEngine@+0x000003c0;
+    0:0     reset => as_bit bool;
+);
+
+register!(RiscvIrqmask@+0x00000528;
+    31:0    mask => as u32;
+);
+
+register!(RiscvIrqdest@+0x0000052c;
+    31:0    dest => as u32;
+);
+
+/* TODO: this is an array of registers */
+register!(FalconFbifTranscfg@+0x00000600;
+    1:0     target => try_into FalconFbifTarget;
+    2:2     mem_type => as_bit FalconFbifMemType;
+);
+
+register!(FalconFbifCtl@+0x00000624;
+    7:7     allow_phys_no_ctx => as_bit bool;
+);
+
+register!(RiscvBcrCtrl@+0x00001668;
+    0:0     valid => as_bit bool;
+    4:4     core_select => as_bit RiscvCoreSelect;
+    8:8     br_fetch => as_bit bool;
+);
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
index 8987352f4192bc9b4b2fc0fb5f2e8e62ff27be68..c03a5c36d1230dfbf2bd6e02a793264280c6d509 100644
--- a/drivers/gpu/nova-core/timer.rs
+++ b/drivers/gpu/nova-core/timer.rs
@@ -2,9 +2,6 @@
 
 //! Nova Core Timer subdevice
 
-// To be removed when all code is used.
-#![allow(dead_code)]
-
 use core::fmt::Display;
 use core::ops::{Add, Sub};
 use core::time::Duration;

-- 
2.49.0
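
Editorial sketch, not part of the patch: the `try_into` fields of
`FalconHwcfg1` in the patch above depend on the `TryFrom<u32>`
implementations in falcon.rs to reject values the hardware should never
report. The standalone example below shows that decoding step for the
security model field (bits 5:4). `field()` and `security_model()` are
hypothetical helpers written for this illustration, they are not what
the `register!` macro generates, and the error type is simplified to the
rejected raw value.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecurityModel {
    None = 0,
    Light = 2,
    Heavy = 3,
}

impl TryFrom<u32> for SecurityModel {
    /// The rejected raw value (the patch returns `EINVAL` instead).
    type Error = u32;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::None),
            2 => Ok(Self::Light),
            3 => Ok(Self::Heavy),
            other => Err(other),
        }
    }
}

/// Extracts bits `hi:lo` (inclusive) from a raw 32-bit register value.
pub fn field(raw: u32, hi: u32, lo: u32) -> u32 {
    let width = hi - lo + 1;
    let mask = if width == 32 {
        u32::MAX
    } else {
        (1u32 << width) - 1
    };
    (raw >> lo) & mask
}

/// Decodes the security model from a raw HWCFG1 value (bits 5:4).
pub fn security_model(hwcfg1: u32) -> Result<SecurityModel, u32> {
    SecurityModel::try_from(field(hwcfg1, 5, 4))
}
```

A raw value of 0x30 decodes to `Heavy`, while 0x10 (field value 1) is
rejected, which is the case `Falcon::new()` guards against by checking
the field at construction time.
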


* [PATCH 12/16] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (10 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22 14:46   ` Danilo Krummrich
  2025-04-20 12:19 ` [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

FWSEC-FRTS is the first firmware we need to run on the GSP falcon in
order to initiate the GSP boot process. Introduce the structure that
describes it.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 9bad7a86382af7917b3dce7bf3087d0002bd5971..4ef5ba934b9d255635aa9a902e1d3a732d6e5568 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -43,6 +43,34 @@ pub(crate) fn new(
     }
 }
 
+/// Structure used to describe some firmware images, notably FWSEC-FRTS.
+#[allow(dead_code)]
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct FalconUCodeDescV3 {
+    pub(crate) hdr: u32,
+    pub(crate) stored_size: u32,
+    pub(crate) pkc_data_offset: u32,
+    pub(crate) interface_offset: u32,
+    pub(crate) imem_phys_base: u32,
+    pub(crate) imem_load_size: u32,
+    pub(crate) imem_virt_base: u32,
+    pub(crate) dmem_phys_base: u32,
+    pub(crate) dmem_load_size: u32,
+    pub(crate) engine_id_mask: u16,
+    pub(crate) ucode_id: u8,
+    pub(crate) signature_count: u8,
+    pub(crate) signature_versions: u16,
+    _reserved: u16,
+}
+
+#[allow(dead_code)]
+impl FalconUCodeDescV3 {
+    pub(crate) fn size(&self) -> usize {
+        ((self.hdr & 0xffff0000) >> 16) as usize
+    }
+}
+
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
 impl<const N: usize> ModInfoBuilder<N> {

-- 
2.49.0
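
Editorial sketch, not part of the patch: `FalconUCodeDescV3::size()`
above reads the descriptor size from the upper 16 bits of the `hdr`
word. The standalone example below shows the same extraction, plus a
variant that starts from raw bytes. `desc_size()` and
`desc_size_from_bytes()` are hypothetical helpers, and the little-endian
byte order is an assumption made for the illustration rather than
something this patch states.

```rust
use std::convert::TryInto;

/// Descriptor size stored in bits 31:16 of the header word.
pub fn desc_size(hdr: u32) -> usize {
    ((hdr & 0xffff_0000) >> 16) as usize
}

/// Reads the header word from the start of `bytes` (assumed
/// little-endian) and returns the descriptor size, or `None` if fewer
/// than four bytes are available.
pub fn desc_size_from_bytes(bytes: &[u8]) -> Option<usize> {
    let word: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some(desc_size(u32::from_le_bytes(word)))
}
```

For a header word of 0x002c0001 both helpers report a size of 0x2c
bytes.
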


* [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (11 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 12/16] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-23 14:06   ` Danilo Krummrich
  2025-04-20 12:19 ` [PATCH 14/16] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

From: Joel Fernandes <joelagnelf@nvidia.com>

Add support for navigating and setting up vBIOS ucode data required for
GSP to boot. The main data extracted from the vBIOS is the FWSEC-FRTS
firmware which runs on the GSP processor. This firmware runs in high
secure mode, and sets up the WPR2 (Write protected region) before the
Booter runs on the SEC2 processor.

Also add log messages showing the BIOS images found:

[102141.013287] NovaCore: Found BIOS image at offset 0x0, size: 0xfe00, type: PciAt
[102141.080692] NovaCore: Found BIOS image at offset 0xfe00, size: 0x14800, type: Efi
[102141.098443] NovaCore: Found BIOS image at offset 0x24600, size: 0x5600, type: FwSec
[102141.415095] NovaCore: Found BIOS image at offset 0x29c00, size: 0x60800, type: FwSec
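
The offsets in the log follow from the sizes: each image starts where
the previous one ends, rounded up to a 512-byte boundary. A small
standalone sketch of that arithmetic, assuming the helper names
next_image_offset and image_offsets (both hypothetical) and using the
sizes copied from the log above:

```rust
/// Size of one ROM block; image lengths are expressed in these units.
const BLOCK_SIZE: usize = 512;

/// Offset of the image following one that starts at `cur_offset` and is
/// `image_size` bytes long, rounded up to the next 512-byte boundary.
fn next_image_offset(cur_offset: usize, image_size: usize) -> usize {
    (cur_offset + image_size + (BLOCK_SIZE - 1)) & !(BLOCK_SIZE - 1)
}

/// Start offsets of consecutive images, given their sizes in bytes.
fn image_offsets(sizes: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut cur = 0;
    for &size in sizes {
        offsets.push(cur);
        cur = next_image_offset(cur, size);
    }
    offsets
}

fn main() {
    // Image sizes taken from the log above.
    let sizes: [usize; 4] = [0xfe00, 0x14800, 0x5600, 0x60800];
    for (offset, size) in image_offsets(&sizes).iter().zip(sizes.iter()) {
        println!("offset {:#x}, size {:#x}", offset, size);
    }
}
```

Since all four sizes are already multiples of 512, the rounding is a
no-op here and the computed offsets match the ones in the log.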

Tested on my Ampere GA102 and boot is successful.

[applied changes by Alex Courbot for fwsec signatures]
[applied feedback from Alex Courbot and Timur Tabi]

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs  |    2 -
 drivers/gpu/nova-core/gpu.rs       |    5 +
 drivers/gpu/nova-core/nova_core.rs |    1 +
 drivers/gpu/nova-core/vbios.rs     | 1103 ++++++++++++++++++++++++++++++++++++
 4 files changed, 1109 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 4ef5ba934b9d255635aa9a902e1d3a732d6e5568..58c0513d49e9a0cef36917c8e2b25c414f6fc596 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -44,7 +44,6 @@ pub(crate) fn new(
 }
 
 /// Structure used to describe some firmwares, notable fwsec-frts.
-#[allow(dead_code)]
 #[repr(C)]
 #[derive(Debug, Clone)]
 pub(crate) struct FalconUCodeDescV3 {
@@ -64,7 +63,6 @@ pub(crate) struct FalconUCodeDescV3 {
     _reserved: u16,
 }
 
-#[allow(dead_code)]
 impl FalconUCodeDescV3 {
     pub(crate) fn size(&self) -> usize {
         ((self.hdr & 0xffff0000) >> 16) as usize
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index ec4c648c6e8b4aa7d06c627ed59c0e66a08c679e..2344dfc69fe4246644437d70572680a4450b5bd7 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -11,6 +11,7 @@
 use crate::regs;
 use crate::timer::Timer;
 use crate::util;
+use crate::vbios::Vbios;
 use core::fmt;
 
 macro_rules! define_chipset {
@@ -157,6 +158,7 @@ pub(crate) struct Gpu {
     fw: Firmware,
     sysmem_flush: DmaObject,
     timer: Timer,
+    bios: Vbios,
 }
 
 #[pinned_drop]
@@ -237,12 +239,15 @@ pub(crate) fn new(
 
         let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, true)?;
 
+        let bios = Vbios::probe(&bar)?;
+
         Ok(pin_init!(Self {
             spec,
             bar,
             fw,
             sysmem_flush,
             timer,
+            bios,
         }))
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 4dde8004d24882c60669b5acd6af9d6988c66a9c..2858f4a0dc35eb9d6547d5cbd81de44c8fc47bae 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -29,6 +29,7 @@ macro_rules! with_bar {
 mod regs;
 mod timer;
 mod util;
+mod vbios;
 
 kernel::module_pci_driver! {
     type: driver::NovaCore,
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
new file mode 100644
index 0000000000000000000000000000000000000000..534107b708cab0eb8d0accf7daa5718edf030358
--- /dev/null
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -0,0 +1,1103 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// To be removed when all code is used.
+#![allow(dead_code)]
+
+//! VBIOS extraction and parsing.
+
+use crate::driver::Bar0;
+use crate::firmware::FalconUCodeDescV3;
+use core::convert::TryFrom;
+use kernel::devres::Devres;
+use kernel::error::Result;
+use kernel::prelude::*;
+
+/// The offset of the VBIOS ROM in the BAR0 space.
+const ROM_OFFSET: usize = 0x300000;
+/// The maximum length of the VBIOS ROM to scan into.
+const BIOS_MAX_SCAN_LEN: usize = 0x100000;
+/// The size to read ahead when parsing initial BIOS image headers.
+const BIOS_READ_AHEAD_SIZE: usize = 1024;
+
+// PMU lookup table entry types. Used to locate PMU table entries
+// in the Fwsec image, corresponding to falcon ucodes.
+#[allow(dead_code)]
+const FALCON_UCODE_ENTRY_APPID_FIRMWARE_SEC_LIC: u8 = 0x05;
+#[allow(dead_code)]
+const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
+const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;
+
+pub(crate) struct Vbios {
+    pub fwsec_image: Option<FwSecBiosImage>,
+}
+
+impl Vbios {
+    /// Read bytes from the ROM at the current end of the data vector
+    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
+        let current_len = data.len();
+        let start = ROM_OFFSET + current_len;
+
+        // Ensure length is a multiple of 4 for 32-bit reads
+        if len % core::mem::size_of::<u32>() != 0 {
+            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
+            return Err(EINVAL);
+        }
+
+        // Allocate and zero-initialize the required memory
+        data.extend_with(len, 0, GFP_KERNEL)?;
+        with_bar!(?bar0, |bar0_ref| {
+            let dst = &mut data[current_len..current_len + len];
+            for (idx, chunk) in dst
+                .chunks_exact_mut(core::mem::size_of::<u32>())
+                .enumerate()
+            {
+                let addr = start + (idx * core::mem::size_of::<u32>());
+                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
+                // method out of convenience to convert the 32-bit integer as it
+                // is in memory into a byte array without any endianness
+                // conversion or byte-swapping.
+                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
+            }
+            Ok(())
+        })?;
+
+        Ok(())
+    }
+
+    /// Read bytes at a specific offset, filling any gap
+    fn read_more_at_offset(
+        bar0: &Devres<Bar0>,
+        data: &mut KVec<u8>,
+        offset: usize,
+        len: usize,
+    ) -> Result {
+        if offset > BIOS_MAX_SCAN_LEN {
+            pr_err!("Error: exceeded BIOS scan limit.\n");
+            return Err(EINVAL);
+        }
+
+        // If offset is beyond current data size, fill the gap first
+        let current_len = data.len();
+        let gap_bytes = if offset > current_len {
+            offset - current_len
+        } else {
+            0
+        };
+
+        // Now read the requested bytes at the offset
+        Self::read_more(bar0, data, gap_bytes + len)
+    }
+
+    /// Read a BIOS image at a specific offset and create a BiosImage from it.
+    /// @data is extended as needed and a new BiosImage is returned.
+    fn read_bios_image_at_offset(
+        bar0: &Devres<Bar0>,
+        data: &mut KVec<u8>,
+        offset: usize,
+        len: usize,
+    ) -> Result<BiosImage> {
+        if offset + len > data.len() {
+            Self::read_more_at_offset(bar0, data, offset, len).inspect_err(|e| {
+                pr_err!("Failed to read more at offset {:#x}: {:?}\n", offset, e)
+            })?;
+        }
+
+        BiosImage::try_from(&data[offset..offset + len]).inspect_err(|e| {
+            pr_err!(
+                "Failed to create BiosImage at offset {:#x}: {:?}\n",
+                offset,
+                e
+            )
+        })
+    }
+
+    /// Probe for VBIOS extraction
+    /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
+    pub(crate) fn probe(bar0: &Devres<Bar0>) -> Result<Self> {
+        // VBIOS data vector: as BIOS images are scanned, their bytes are appended
+        // to this vector for reference or copying into other data structures. It
+        // holds the entire scanned contents of the VBIOS and grows progressively.
+        // Tracking the cumulative length read so far avoids re-reading contents
+        // that were already read; only gaps and new ranges are read as the
+        // vector is extended.
+        let mut data = KVec::new();
+
+        // Loop through all the BIOS images, keeping the relevant ones and their data
+        let mut cur_offset = 0;
+        let mut pci_at_image: Option<PciAtBiosImage> = None;
+        let mut first_fwsec_image: Option<FwSecBiosImage> = None;
+        let mut second_fwsec_image: Option<FwSecBiosImage> = None;
+
+        // loop till break
+        loop {
+            // Try to parse a BIOS image at the current offset.
+            // This checks for all valid ROM signatures (0xAA55, 0xBB77, 0x4E56)
+            let image_size =
+                Self::read_bios_image_at_offset(bar0, &mut data, cur_offset, BIOS_READ_AHEAD_SIZE)
+                    .and_then(|image| image.image_size_bytes())
+                    .inspect_err(|e| {
+                        pr_err!(
+                            "Failed to parse initial BIOS image headers at offset {:#x}: {:?}\n",
+                            cur_offset,
+                            e
+                        );
+                    })?;
+
+            // Create a new BiosImage with the full image data
+            let full_image =
+                Self::read_bios_image_at_offset(bar0, &mut data, cur_offset, image_size)
+                    .inspect_err(|e| {
+                        pr_err!(
+                            "Failed to parse full BIOS image at offset {:#x}: {:?}\n",
+                            cur_offset,
+                            e
+                        );
+                    })?;
+
+            // Determine the image type
+            let image_type = full_image.image_type_str();
+
+            pr_info!(
+                "Found BIOS image at offset {:#x}, size: {:#x}, type: {}\n",
+                cur_offset,
+                image_size,
+                image_type
+            );
+
+            let is_last = full_image.is_last();
+            // Get references to images we will need after the loop, in order to
+            // set up the falcon data offset.
+            match full_image {
+                BiosImage::PciAt(image) => {
+                    pci_at_image = Some(image);
+                }
+                BiosImage::FwSec(image) => {
+                    if first_fwsec_image.is_none() {
+                        first_fwsec_image = Some(image);
+                    } else {
+                        second_fwsec_image = Some(image);
+                    }
+                }
+                // For now we don't need to handle these
+                BiosImage::Efi(_image) => {}
+                BiosImage::Nbsi(_image) => {}
+            }
+
+            // Break if this is the last image
+            if is_last {
+                break;
+            }
+
+            // Move to the next image (aligned to 512 bytes)
+            cur_offset += image_size;
+            cur_offset = (cur_offset + 511) & !511;
+
+            // Safety check - don't go beyond BIOS_MAX_SCAN_LEN (1MB)
+            if cur_offset > BIOS_MAX_SCAN_LEN {
+                pr_err!("Error: exceeded BIOS scan limit, stopping scan\n");
+                break;
+            }
+        } // end of loop
+
+        // Using all the images, set up the falcon data pointer in Fwsec.
+        // We need mutable access here, so we handle the Option manually.
+        let final_fwsec_image = {
+            let mut second = second_fwsec_image; // Take ownership of the option
+            let first_ref = first_fwsec_image.as_ref();
+            let pci_at_ref = pci_at_image.as_ref();
+
+            if let (Some(second), Some(first), Some(pci_at)) =
+                (second.as_mut(), first_ref, pci_at_ref)
+            {
+                second
+                    .setup_falcon_data(pci_at, first)
+                    .inspect_err(|e| pr_err!("Falcon data setup failed: {:?}\n", e))?;
+            } else {
+                pr_err!("Missing required images for falcon data setup, skipping\n");
+            }
+            second // Return the potentially modified second image
+        };
+
+        Ok(Self {
+            fwsec_image: final_fwsec_image,
+        })
+    }
+
+    pub(crate) fn fwsec_header(&self) -> Result<&FalconUCodeDescV3> {
+        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
+        image.fwsec_header()
+    }
+
+    pub(crate) fn fwsec_ucode(&self) -> Result<&[u8]> {
+        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
+        image.fwsec_ucode(image.fwsec_header()?)
+    }
+
+    pub(crate) fn fwsec_sigs(&self) -> Result<&[u8]> {
+        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
+        image.fwsec_sigs(image.fwsec_header()?)
+    }
+}
+
+/// PCI Data Structure as defined in PCI Firmware Specification
+#[derive(Debug, Clone)]
+#[repr(C)]
+#[allow(dead_code)]
+struct PcirStruct {
+    /// PCI Data Structure signature ("PCIR" or "NPDS")
+    pub signature: [u8; 4],
+    /// PCI Vendor ID (e.g., 0x10DE for NVIDIA)
+    pub vendor_id: u16,
+    /// PCI Device ID
+    pub device_id: u16,
+    /// Device List Pointer
+    pub device_list_ptr: u16,
+    /// PCI Data Structure Length
+    pub pci_data_struct_len: u16,
+    /// PCI Data Structure Revision
+    pub pci_data_struct_rev: u8,
+    /// Class code (3 bytes, 0x03 for display controller)
+    pub class_code: [u8; 3],
+    /// Size of this image in 512-byte blocks
+    pub image_len: u16,
+    /// Revision Level of the Vendor's ROM
+    pub vendor_rom_rev: u16,
+    /// ROM image type (0x00 = PC-AT compatible, 0x03 = EFI, 0x70 = NBSI)
+    pub code_type: u8,
+    /// Last image indicator (0x00 = Not last image, 0x80 = Last image)
+    pub last_image: u8,
+    /// Maximum Run-time Image Length (units of 512 bytes)
+    pub max_runtime_image_len: u16,
+}
+
+impl TryFrom<&[u8]> for PcirStruct {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        if data.len() < core::mem::size_of::<PcirStruct>() {
+            pr_err!("Not enough data for PcirStruct\n");
+            return Err(EINVAL);
+        }
+
+        let mut signature = [0u8; 4];
+        signature.copy_from_slice(&data[0..4]);
+
+        // Signature should be "PCIR" (0x52494350) or "NPDS" (0x5344504e)
+        if &signature != b"PCIR" && &signature != b"NPDS" {
+            pr_err!("Invalid signature for PcirStruct: {:?}\n", signature);
+            return Err(EINVAL);
+        }
+
+        let mut class_code = [0u8; 3];
+        class_code.copy_from_slice(&data[13..16]);
+
+        Ok(PcirStruct {
+            signature,
+            vendor_id: u16::from_le_bytes([data[4], data[5]]),
+            device_id: u16::from_le_bytes([data[6], data[7]]),
+            device_list_ptr: u16::from_le_bytes([data[8], data[9]]),
+            pci_data_struct_len: u16::from_le_bytes([data[10], data[11]]),
+            pci_data_struct_rev: data[12],
+            class_code,
+            image_len: u16::from_le_bytes([data[16], data[17]]),
+            vendor_rom_rev: u16::from_le_bytes([data[18], data[19]]),
+            code_type: data[20],
+            last_image: data[21],
+            max_runtime_image_len: u16::from_le_bytes([data[22], data[23]]),
+        })
+    }
+}
+
+impl PcirStruct {
+    /// Check if this is the last image in the ROM
+    fn is_last(&self) -> bool {
+        self.last_image & 0x80 != 0
+    }
+
+    /// Calculate image size in bytes
+    fn image_size_bytes(&self) -> Result<usize> {
+        if self.image_len > 0 {
+            // Image size is in 512-byte blocks
+            Ok(self.image_len as usize * 512)
+        } else {
+            Err(EINVAL)
+        }
+    }
+}
+
+/// BIOS Information Table (BIT) Header
+/// This is the head of the BIT table, that is used to locate the Falcon data.
+/// The BIT table (with its header) is in the PciAtBiosImage and the falcon data
+/// it is pointing to is in the FwSecBiosImage.
+#[derive(Debug, Clone, Copy)]
+#[allow(dead_code)]
+struct BitHeader {
+    /// 0h: BIT Header Identifier (BMP=0x7FFF/BIT=0xB8FF)
+    pub id: u16,
+    /// 2h: BIT Header Signature ("BIT\0")
+    pub signature: [u8; 4],
+    /// 6h: Binary Coded Decimal Version, ex: 0x0100 is 1.00.
+    pub bcd_version: u16,
+    /// 8h: Size of BIT Header (in bytes)
+    pub header_size: u8,
+    /// 9h: Size of BIT Tokens (in bytes)
+    pub token_size: u8,
+    /// 0Ah: Number of token entries that follow
+    pub token_entries: u8,
+    /// 0Bh: BIT Header Checksum
+    pub checksum: u8,
+}
+
+impl TryFrom<&[u8]> for BitHeader {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        if data.len() < 12 {
+            return Err(EINVAL);
+        }
+
+        let mut signature = [0u8; 4];
+        signature.copy_from_slice(&data[2..6]);
+
+        // Check header ID and signature
+        let id = u16::from_le_bytes([data[0], data[1]]);
+        if id != 0xB8FF || &signature != b"BIT\0" {
+            return Err(EINVAL);
+        }
+
+        Ok(BitHeader {
+            id,
+            signature,
+            bcd_version: u16::from_le_bytes([data[6], data[7]]),
+            header_size: data[8],
+            token_size: data[9],
+            token_entries: data[10],
+            checksum: data[11],
+        })
+    }
+}
+
+/// BIT Token Entry: Records in the BIT table that follow the BIT header
+#[derive(Debug, Clone, Copy)]
+#[allow(dead_code)]
+struct BitToken {
+    /// 00h: Token identifier
+    pub id: u8,
+    /// 01h: Version of the token data
+    pub data_version: u8,
+    /// 02h: Size of token data in bytes
+    pub data_size: u16,
+    /// 04h: Offset to the token data
+    pub data_offset: u16,
+}
+
+// Define the token ID for the Falcon data
+pub(in crate::vbios) const BIT_TOKEN_ID_FALCON_DATA: u8 = 0x70;
+
+impl BitToken {
+    /// Find a BIT token entry by BIT ID in a PciAtBiosImage
+    pub(in crate::vbios) fn from_id(image: &PciAtBiosImage, token_id: u8) -> Result<Self> {
+        let header = image.bit_header.as_ref().ok_or(EINVAL)?;
+
+        // Offset to the first token entry
+        let tokens_start = image.bit_offset.unwrap() + header.header_size as usize;
+
+        for i in 0..header.token_entries as usize {
+            let entry_offset = tokens_start + (i * header.token_size as usize);
+
+            // Make sure we don't go out of bounds
+            if entry_offset + header.token_size as usize > image.base.data.len() {
+                return Err(EINVAL);
+            }
+
+            // Check if this token has the requested ID
+            if image.base.data[entry_offset] == token_id {
+                return Ok(BitToken {
+                    id: image.base.data[entry_offset],
+                    data_version: image.base.data[entry_offset + 1],
+                    data_size: u16::from_le_bytes([
+                        image.base.data[entry_offset + 2],
+                        image.base.data[entry_offset + 3],
+                    ]),
+                    data_offset: u16::from_le_bytes([
+                        image.base.data[entry_offset + 4],
+                        image.base.data[entry_offset + 5],
+                    ]),
+                });
+            }
+        }
+
+        // Token not found
+        Err(ENOENT)
+    }
+}
+
+/// PCI ROM Expansion Header as defined in PCI Firmware Specification.
+/// This header is at the beginning of every image in the set of
+/// images in the ROM. It contains a pointer to the PCI Data Structure
+/// which describes the image.
+/// For "NBSI" images (NoteBook System Information), the ROM
+/// header deviates from the standard and contains an offset to the
+/// NBSI image. We do not parse that offset in this module yet; it is
+/// kept for future reference.
+#[derive(Debug, Clone, Copy)]
+#[allow(dead_code)]
+struct PciRomHeader {
+    /// 00h: Signature (0xAA55)
+    pub signature: u16,
+    /// 02h: Reserved bytes for processor architecture unique data (20 bytes)
+    pub reserved: [u8; 20],
+    /// 16h: NBSI Data Offset (NBSI-specific, offset from header to NBSI image)
+    pub nbsi_data_offset: Option<u16>,
+    /// 18h: Pointer to PCI Data Structure (offset from start of ROM image)
+    pub pci_data_struct_offset: u16,
+    /// 1Ah: Size of block (this is NBSI-specific)
+    pub size_of_block: Option<u32>,
+}
+
+impl TryFrom<&[u8]> for PciRomHeader {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        if data.len() < 26 {
+            // Need at least 26 bytes to read pciDataStrucPtr; sizeOfBlock needs 30
+            return Err(EINVAL);
+        }
+
+        let signature = u16::from_le_bytes([data[0], data[1]]);
+
+        // Check for valid ROM signatures
+        match signature {
+            0xAA55 | 0xBB77 | 0x4E56 => {}
+            _ => {
+                pr_err!("ROM signature unknown {:#x}\n", signature);
+                return Err(EINVAL);
+            }
+        }
+
+        // Read the pointer to the PCI Data Structure at offset 0x18
+        let pci_data_struct_ptr = u16::from_le_bytes([data[24], data[25]]);
+
+        // Try to read optional fields if enough data
+        let mut size_of_block = None;
+        let mut nbsi_data_offset = None;
+
+        if data.len() >= 30 {
+            // Read size_of_block at offset 0x1A
+            size_of_block = Some(
+                (data[29] as u32) << 24
+                    | (data[28] as u32) << 16
+                    | (data[27] as u32) << 8
+                    | (data[26] as u32),
+            );
+        }
+
+        // For NBSI images, try to read the nbsiDataOffset at offset 0x16
+        if data.len() >= 24 {
+            nbsi_data_offset = Some(u16::from_le_bytes([data[22], data[23]]));
+        }
+
+        Ok(PciRomHeader {
+            signature,
+            reserved: [0u8; 20],
+            pci_data_struct_offset: pci_data_struct_ptr,
+            size_of_block,
+            nbsi_data_offset,
+        })
+    }
+}
+
+/// NVIDIA PCI Data Extension Structure. This is similar to the
+/// PCI Data Structure, but is Nvidia-specific and is placed right after
+/// the PCI Data Structure. It contains some fields that are redundant
+/// with the PCI Data Structure, but are needed for traversing the
+/// BIOS images. It is expected to be present in all BIOS images except
+/// for NBSI images.
+#[derive(Debug, Clone)]
+#[allow(dead_code)]
+struct NpdeStruct {
+    /// 00h: Signature ("NPDE")
+    pub signature: [u8; 4],
+    /// 04h: NVIDIA PCI Data Extension Revision
+    pub npci_data_ext_rev: u16,
+    /// 06h: NVIDIA PCI Data Extension Length
+    pub npci_data_ext_len: u16,
+    /// 08h: Sub-image Length (in 512-byte units)
+    pub subimage_len: u16,
+    /// 0Ah: Last image indicator flag
+    pub last_image: u8,
+}
+
+impl TryFrom<&[u8]> for NpdeStruct {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        if data.len() < 11 {
+            pr_err!("Not enough data for NpdeStruct\n");
+            return Err(EINVAL);
+        }
+
+        let mut signature = [0u8; 4];
+        signature.copy_from_slice(&data[0..4]);
+
+        // Signature should be "NPDE" (0x4544504E)
+        if &signature != b"NPDE" {
+            pr_err!("Invalid signature for NpdeStruct: {:?}\n", signature);
+            return Err(EINVAL);
+        }
+
+        Ok(NpdeStruct {
+            signature,
+            npci_data_ext_rev: u16::from_le_bytes([data[4], data[5]]),
+            npci_data_ext_len: u16::from_le_bytes([data[6], data[7]]),
+            subimage_len: u16::from_le_bytes([data[8], data[9]]),
+            last_image: data[10],
+        })
+    }
+}
+
+impl NpdeStruct {
+    /// Check if this is the last image in the ROM
+    fn is_last(&self) -> bool {
+        self.last_image & 0x80 != 0
+    }
+
+    /// Calculate image size in bytes
+    fn image_size_bytes(&self) -> Result<usize> {
+        if self.subimage_len > 0 {
+            // Image size is in 512-byte blocks
+            Ok(self.subimage_len as usize * 512)
+        } else {
+            Err(EINVAL)
+        }
+    }
+
+    /// Try to find NPDE in the data, the NPDE is right after the PCIR.
+    fn find_in_data(data: &[u8], rom_header: &PciRomHeader, pcir: &PcirStruct) -> Option<Self> {
+        // Calculate the offset where NPDE might be located
+        // NPDE should be right after the PCIR structure, aligned to 16 bytes
+        let pcir_offset = rom_header.pci_data_struct_offset as usize;
+        let npde_start = (pcir_offset + pcir.pci_data_struct_len as usize + 0x0F) & !0x0F;
+
+        // Check if we have enough data
+        if npde_start + 11 > data.len() {
+            pr_err!("Not enough data for NPDE\n");
+            return None;
+        }
+
+        // Try to create NPDE from the data
+        NpdeStruct::try_from(&data[npde_start..])
+            .inspect_err(|e| {
+                pr_err!("Error creating NpdeStruct: {:?}\n", e);
+            })
+            .ok()
+    }
+}
+// Use a macro to implement BiosImage enum and methods. This avoids having to
+// repeat each enum type when implementing functions like base() in BiosImage.
+macro_rules! bios_image {
+    (
+        $($variant:ident $class:ident),* $(,)?
+    ) => {
+        // BiosImage enum with variants for each image type
+        enum BiosImage {
+            $($variant($class)),*
+        }
+
+        impl BiosImage {
+            /// Get a reference to the common BIOS image data regardless of type
+            fn base(&self) -> &BiosImageBase {
+                match self {
+                    $(Self::$variant(img) => &img.base),*
+                }
+            }
+
+            /// Returns a string representing the type of BIOS image
+            fn image_type_str(&self) -> &'static str {
+                match self {
+                    $(Self::$variant(_) => stringify!($variant)),*
+                }
+            }
+        }
+    }
+}
+
+impl BiosImage {
+    /// Check if this is the last image
+    fn is_last(&self) -> bool {
+        let base = self.base();
+
+        // For NBSI images (type == 0x70), return true as they're
+        // considered the last image
+        if matches!(self, Self::Nbsi(_)) {
+            return true;
+        }
+
+        // For other image types, check NPDE first if available
+        if let Some(ref npde) = base.npde {
+            return npde.is_last();
+        }
+
+        // Otherwise, fall back to checking the PCIR last_image flag
+        base.pcir.is_last()
+    }
+
+    /// Get the image size in bytes
+    fn image_size_bytes(&self) -> Result<usize> {
+        let base = self.base();
+
+        // Prefer NPDE image size if available
+        if let Some(ref npde) = base.npde {
+            return npde.image_size_bytes();
+        }
+
+        // Otherwise, fall back to the PCIR image size
+        base.pcir.image_size_bytes()
+    }
+}
+
+bios_image! {
+    PciAt PciAtBiosImage,   // PCI-AT compatible BIOS image
+    Efi EfiBiosImage,       // EFI (Extensible Firmware Interface)
+    Nbsi NbsiBiosImage,     // NBSI (NoteBook System Information)
+    FwSec FwSecBiosImage    // FWSEC (Firmware Security)
+}
+
+struct PciAtBiosImage {
+    base: BiosImageBase,
+    bit_header: Option<BitHeader>,
+    bit_offset: Option<usize>,
+}
+
+struct EfiBiosImage {
+    base: BiosImageBase,
+    // EFI-specific fields can be added here in the future.
+}
+
+struct NbsiBiosImage {
+    base: BiosImageBase,
+    // NBSI-specific fields can be added here in the future.
+}
+
+pub(crate) struct FwSecBiosImage {
+    base: BiosImageBase,
+    // FWSEC-specific fields
+    // The offset of the Falcon data from the start of Fwsec image
+    falcon_data_offset: Option<usize>,
+    // The PmuLookupTable starts at the offset of the falcon data pointer
+    pmu_lookup_table: Option<PmuLookupTable>,
+    // The offset of the Falcon ucode
+    falcon_ucode_offset: Option<usize>,
+}
+
+// Convert from BiosImageBase to BiosImage
+impl TryFrom<BiosImageBase> for BiosImage {
+    type Error = Error;
+
+    fn try_from(base: BiosImageBase) -> Result<Self> {
+        match base.pcir.code_type {
+            0x00 => Ok(BiosImage::PciAt(base.try_into()?)),
+            0x03 => Ok(BiosImage::Efi(EfiBiosImage { base })),
+            0x70 => Ok(BiosImage::Nbsi(NbsiBiosImage { base })),
+            0xE0 => Ok(BiosImage::FwSec(FwSecBiosImage {
+                base,
+                falcon_data_offset: None,
+                pmu_lookup_table: None,
+                falcon_ucode_offset: None,
+            })),
+            _ => {
+                pr_err!("Unknown BIOS image type {:#x}\n", base.pcir.code_type);
+                Err(EINVAL)
+            }
+        }
+    }
+}
+
+/// BiosImage creation from a byte slice. This creates a BiosImageBase
+/// and then converts it to a BiosImage which triggers the constructor of
+/// the specific BiosImage enum variant.
+impl TryFrom<&[u8]> for BiosImage {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        let base = BiosImageBase::try_from(data)?;
+        let image = base.to_image()?;
+
+        image
+            .image_size_bytes()
+            .inspect_err(|_| pr_err!("Invalid image size computed during BiosImage creation\n"))?;
+
+        Ok(image)
+    }
+}
+
+/// BIOS image base structure containing the headers and fields common
+/// to all BIOS images. Each BiosImage type has a
+/// BiosImageBase along with other image-specific fields.
+/// Note that Rust favors composition of types over inheritance.
+#[derive(Debug)]
+#[allow(dead_code)]
+struct BiosImageBase {
+    /// PCI ROM Expansion Header
+    pub rom_header: PciRomHeader,
+    /// PCI Data Structure
+    pub pcir: PcirStruct,
+    /// NVIDIA PCI Data Extension (optional)
+    pub npde: Option<NpdeStruct>,
+    /// Image data (includes ROM header and PCIR)
+    pub data: KVec<u8>,
+}
+
+impl BiosImageBase {
+    fn to_image(self) -> Result<BiosImage> {
+        BiosImage::try_from(self)
+    }
+}
+
+impl TryFrom<&[u8]> for BiosImageBase {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        // Ensure we have enough data for the ROM header
+        if data.len() < 26 {
+            pr_err!("Not enough data for ROM header\n");
+            return Err(EINVAL);
+        }
+
+        // Parse the ROM header
+        let rom_header = PciRomHeader::try_from(&data[0..26])
+            .inspect_err(|e| pr_err!("Failed to create PciRomHeader: {:?}\n", e))?;
+
+        // Get the PCI Data Structure using the pointer from the ROM header
+        let pcir_offset = rom_header.pci_data_struct_offset as usize;
+        let pcir_data = data
+            .get(pcir_offset..pcir_offset + core::mem::size_of::<PcirStruct>())
+            .ok_or(EINVAL)
+            .inspect_err(|_| {
+                pr_err!(
+                    "PCIR offset {:#x} out of bounds (data length: {})\n",
+                    pcir_offset,
+                    data.len()
+                );
+                pr_err!("Consider reading more data for construction of BiosImage\n");
+            })?;
+
+        let pcir = PcirStruct::try_from(pcir_data)
+            .inspect_err(|e| pr_err!("Failed to create PcirStruct: {:?}\n", e))?;
+
+        // Look for the optional NPDE structure (NBSI images, type 0x70, lack one)
+        let npde = NpdeStruct::find_in_data(data, &rom_header, &pcir);
+
+        // Create a copy of the data
+        let mut data_copy = KVec::new();
+        data_copy.extend_with(data.len(), 0, GFP_KERNEL)?;
+        data_copy.copy_from_slice(data);
+
+        Ok(BiosImageBase {
+            rom_header,
+            pcir,
+            npde,
+            data: data_copy,
+        })
+    }
+}
+
+/// The PciAt BIOS image is typically the first BIOS image type found in the
+/// BIOS image chain. It contains the BIT header and the BIT tokens.
+impl PciAtBiosImage {
+    /// Find a byte pattern in a slice
+    fn find_byte_pattern(haystack: &[u8], needle: &[u8]) -> Option<usize> {
+        haystack
+            .windows(needle.len())
+            .position(|window| window == needle)
+    }
+
+    /// Find the BIT header in the PciAtBiosImage
+    fn find_bit_header(data: &[u8]) -> Result<(BitHeader, usize)> {
+        let bit_pattern = [0xff, 0xb8, b'B', b'I', b'T', 0x00];
+        let bit_offset = Self::find_byte_pattern(data, &bit_pattern);
+        if bit_offset.is_none() {
+            return Err(EINVAL);
+        }
+
+        let bit_header = BitHeader::try_from(&data[bit_offset.unwrap()..])?;
+        Ok((bit_header, bit_offset.unwrap()))
+    }
+
+    /// Get a BIT token entry from the BIT table in the PciAtBiosImage
+    fn get_bit_token(&self, token_id: u8) -> Result<BitToken> {
+        BitToken::from_id(self, token_id)
+    }
+
+    /// Find the Falcon data pointer structure in the PciAtBiosImage.
+    /// This is just a 4-byte structure that contains a pointer to the
+    /// Falcon data in the FWSEC image.
+    fn falcon_data_ptr(&self) -> Result<u32> {
+        let token = self.get_bit_token(BIT_TOKEN_ID_FALCON_DATA)?;
+
+        // Make sure we don't go out of bounds
+        if token.data_offset as usize + 4 > self.base.data.len() {
+            return Err(EINVAL);
+        }
+
+        // read the 4 bytes at the offset specified in the token
+        let offset = token.data_offset as usize;
+        let bytes: [u8; 4] = self.base.data[offset..offset + 4].try_into().map_err(|_| {
+            pr_err!("Failed to convert data slice to array\n");
+            EINVAL
+        })?;
+
+        let data_ptr = u32::from_le_bytes(bytes);
+
+        if (data_ptr as usize) < self.base.data.len() {
+            pr_err!("Falcon data pointer out of bounds\n");
+            return Err(EINVAL);
+        }
+
+        Ok(data_ptr)
+    }
+}
+
+impl TryFrom<BiosImageBase> for PciAtBiosImage {
+    type Error = Error;
+
+    fn try_from(base: BiosImageBase) -> Result<Self> {
+        let data_slice = &base.data;
+        let (bit_header, bit_offset) = PciAtBiosImage::find_bit_header(data_slice)?;
+
+        Ok(PciAtBiosImage {
+            base,
+            bit_header: Some(bit_header),
+            bit_offset: Some(bit_offset),
+        })
+    }
+}
+
+/// The PmuLookupTableEntry structure is a single entry in the PmuLookupTable.
+/// See the PmuLookupTable description for more information.
+#[allow(dead_code)]
+struct PmuLookupTableEntry {
+    application_id: u8,
+    target_id: u8,
+    data: u32,
+}
+
+impl TryFrom<&[u8]> for PmuLookupTableEntry {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        if data.len() < 6 {
+            return Err(EINVAL);
+        }
+
+        Ok(PmuLookupTableEntry {
+            application_id: data[0],
+            target_id: data[1],
+            data: u32::from_le_bytes(data[2..6].try_into().map_err(|_| EINVAL)?),
+        })
+    }
+}
+
+/// The PmuLookupTable structure is used to find the PmuLookupTableEntry
+/// for a given application ID. The table of entries is pointed to by the falcon
+/// data pointer in the BIT table, and is used to locate the Falcon Ucode.
+#[allow(dead_code)]
+struct PmuLookupTable {
+    version: u8,
+    header_len: u8,
+    entry_len: u8,
+    entry_count: u8,
+    table_data: KVec<u8>,
+}
+
+impl TryFrom<&[u8]> for PmuLookupTable {
+    type Error = Error;
+
+    fn try_from(data: &[u8]) -> Result<Self> {
+        if data.len() < 4 {
+            return Err(EINVAL);
+        }
+
+        let header_len = data[1] as usize;
+        let entry_len = data[2] as usize;
+        let entry_count = data[3] as usize;
+
+        let required_bytes = header_len + (entry_count * entry_len);
+
+        if data.len() < required_bytes {
+            return Err(EINVAL);
+        }
+
+        // Create a copy of only the table data
+        let mut table_data = KVec::new();
+
+        // "last_entry_bytes" is a debugging aid.
+        // let mut last_entry_bytes: Option<KVec<u8>> = Some(KVec::new());
+
+        for &byte in &data[header_len..required_bytes] {
+            table_data.push(byte, GFP_KERNEL)?;
+            /*
+             * Uncomment for debugging (dumps the table data to dmesg):
+             * last_entry_bytes.as_mut().ok_or(EINVAL)?.push(byte, GFP_KERNEL)?;
+             *
+             * let last_entry_bytes_len = last_entry_bytes.as_ref().ok_or(EINVAL)?.len();
+             * if last_entry_bytes_len == entry_len {
+             *     pr_info!("Last entry bytes: {:02x?}\n", &last_entry_bytes.as_ref().ok_or(EINVAL)?[..]);
+             *     last_entry_bytes = Some(KVec::new());
+             * }
+             */
+        }
+
+        Ok(PmuLookupTable {
+            version: data[0],
+            header_len: header_len as u8,
+            entry_len: entry_len as u8,
+            entry_count: entry_count as u8,
+            table_data,
+        })
+    }
+}
+
+impl PmuLookupTable {
+    fn lookup_index(&self, idx: u8) -> Result<PmuLookupTableEntry> {
+        if idx >= self.entry_count {
+            return Err(EINVAL);
+        }
+
+        let index = (idx as usize) * self.entry_len as usize;
+        PmuLookupTableEntry::try_from(&self.table_data[index..])
+    }
+
+    // find entry by type value
+    fn find_entry_by_type(&self, entry_type: u8) -> Result<PmuLookupTableEntry> {
+        for i in 0..self.entry_count {
+            let entry = self.lookup_index(i)?;
+            if entry.application_id == entry_type {
+                return Ok(entry);
+            }
+        }
+
+        Err(EINVAL)
+    }
+}
+
+/// The FwSecBiosImage structure contains the PMU table and the Falcon Ucode.
+/// The PMU table contains voltage/frequency tables as well as a pointer to the
+/// Falcon Ucode.
+impl FwSecBiosImage {
+    fn setup_falcon_data(
+        &mut self,
+        pci_at_image: &PciAtBiosImage,
+        first_fwsec_image: &FwSecBiosImage,
+    ) -> Result<()> {
+        let mut offset = pci_at_image.falcon_data_ptr()? as usize;
+
+        // The falcon data pointer assumes that the PciAt and FWSEC images
+        // are contiguous in memory. However, testing shows that the EFI image
+        // sits in between them. Compensate by making the offset relative to
+        // the end of the PciAt image rather than its start.
+        offset -= pci_at_image.base.data.len();
+
+        // The offset is now from the start of the first Fwsec image, however
+        // the offset points to a location in the second Fwsec image. Since
+        // the fwsec images are contiguous, subtract the length of the first Fwsec
+        // image from the offset to get the offset to the start of the second
+        // Fwsec image.
+        offset -= first_fwsec_image.base.data.len();
+
+        self.falcon_data_offset = Some(offset);
+
+        // The PmuLookupTable starts at the offset of the falcon data pointer
+        self.pmu_lookup_table = Some(PmuLookupTable::try_from(&self.base.data[offset..])?);
+
+        match self
+            .pmu_lookup_table
+            .as_ref()
+            .ok_or(EINVAL)?
+            .find_entry_by_type(FALCON_UCODE_ENTRY_APPID_FWSEC_PROD)
+        {
+            Ok(entry) => {
+                let mut ucode_offset = entry.data as usize;
+                ucode_offset -= pci_at_image.base.data.len();
+                ucode_offset -= first_fwsec_image.base.data.len();
+                self.falcon_ucode_offset = Some(ucode_offset);
+
+                /*
+                 * Uncomment for debug: print the v3_desc header
+                 * let v3_desc = self.fwsec_header()?;
+                 * pr_info!("PmuLookupTableEntry v3_desc: {:#?}\n", v3_desc);
+                 */
+            }
+            Err(e) => {
+                pr_err!("PmuLookupTableEntry not found, error: {:?}\n", e);
+            }
+        }
+        Ok(())
+    }
+
+    /// TODO: These were borrowed from the old code for integrating this module
+    /// with the outside world. They should be cleaned up and integrated properly.
+    ///
+    /// Get the FwSec header (FalconUCodeDescV3)
+    fn fwsec_header(&self) -> Result<&FalconUCodeDescV3> {
+        // Get the falcon ucode offset that was found in setup_falcon_data
+        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)? as usize;
+
+        // Make sure the offset is within the data bounds
+        if falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>() > self.base.data.len() {
+            pr_err!("fwsec-frts header not contained within BIOS bounds\n");
+            return Err(ERANGE);
+        }
+
+        // Read the first 4 bytes to get the version
+        let hdr_bytes: [u8; 4] = self.base.data[falcon_ucode_offset..falcon_ucode_offset + 4]
+            .try_into()
+            .map_err(|_| EINVAL)?;
+        let hdr = u32::from_le_bytes(hdr_bytes);
+        let ver = (hdr & 0xff00) >> 8;
+
+        if ver != 3 {
+            pr_err!("invalid fwsec firmware version\n");
+            return Err(EINVAL);
+        }
+
+        // Return a reference to the FalconUCodeDescV3 structure
+        Ok(unsafe {
+            &*(self.base.data.as_ptr().add(falcon_ucode_offset) as *const FalconUCodeDescV3)
+        })
+    }
+    /// Get the ucode data as a byte slice
+    fn fwsec_ucode(&self, v3_desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)? as usize;
+
+        // The ucode data follows the descriptor
+        let ucode_data_offset = falcon_ucode_offset + v3_desc.size();
+        let size = (v3_desc.imem_load_size + v3_desc.dmem_load_size) as usize;
+
+        // Get the data slice, checking bounds in a single operation
+        self.base
+            .data
+            .get(ucode_data_offset..ucode_data_offset + size)
+            .ok_or(ERANGE)
+            .inspect_err(|_| pr_err!("fwsec ucode data not contained within BIOS bounds\n"))
+    }
+
+    /// Get the signatures as a byte slice
+    fn fwsec_sigs(&self, v3_desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+        const SIG_SIZE: usize = 96 * 4;
+
+        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)? as usize;
+
+        // The signatures data follows the descriptor
+        let sigs_data_offset = falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
+        let size = v3_desc.signature_count as usize * SIG_SIZE;
+
+        // Make sure the data is within bounds
+        if sigs_data_offset + size > self.base.data.len() {
+            pr_err!("fwsec signatures data not contained within BIOS bounds\n");
+            return Err(ERANGE);
+        }
+
+        Ok(&self.base.data[sigs_data_offset..sigs_data_offset + size])
+    }
+}

-- 
2.49.0



* [PATCH 14/16] gpu: nova-core: compute layout of the FRTS region
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (12 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 15/16] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

FWSEC-FRTS is run with the desired address of the FRTS region as a
parameter, which we need to compute from a few hardware parameters.

Do this in a `FbLayout` structure, which will later be extended to
describe more memory regions used to boot the GSP.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
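Note for reviewers: the FRTS placement computed by `FbLayout::new()` can
be checked in isolation. The following is a standalone userspace sketch,
not part of the patch. `frts_range` is a hypothetical helper name, and
the framebuffer size and VGA workspace base used in `main` are made-up
example values rather than register reads.

```rust
use std::ops::Range;

/// Round `value` down to a multiple of `align`, which must be a power of two.
fn align_down(value: u64, align: u64) -> u64 {
    value & !(align - 1)
}

/// Hypothetical helper mirroring the FRTS computation done in `FbLayout::new()`:
/// the region is 1 MiB long and ends at the VGA workspace start, rounded down
/// to a 128 KiB boundary.
fn frts_range(vga_workspace_start: u64) -> Range<u64> {
    const FRTS_DOWN_ALIGN: u64 = 0x20000;
    const FRTS_SIZE: u64 = 0x100000;
    let frts_base = align_down(vga_workspace_start, FRTS_DOWN_ALIGN) - FRTS_SIZE;

    frts_base..frts_base + FRTS_SIZE
}

fn main() {
    // Example values only: a 4 GiB framebuffer whose VGA workspace occupies
    // the last 1 MiB.
    let fb_size: u64 = 4 << 30;
    let vga_base = fb_size - 0x100000;

    let frts = frts_range(vga_base);
    println!("FRTS: {:#x}..{:#x}", frts.start, frts.end);
}
```

The sketch assumes the workspace start is at least 1 MiB above the
bottom of the framebuffer; otherwise the subtraction would underflow.
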
 drivers/gpu/nova-core/gpu.rs       |   4 ++
 drivers/gpu/nova-core/gsp.rs       |   3 +
 drivers/gpu/nova-core/gsp/fb.rs    | 109 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  27 +++++++++
 5 files changed, 144 insertions(+)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 2344dfc69fe4246644437d70572680a4450b5bd7..b43d1fc6bba15ffd76d564eccdb9e2afe239a3a4 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -8,6 +8,7 @@
 use crate::falcon::gsp::GspFalcon;
 use crate::falcon::sec2::Sec2Falcon;
 use crate::firmware::Firmware;
+use crate::gsp::fb::FbLayout;
 use crate::regs;
 use crate::timer::Timer;
 use crate::util;
@@ -241,6 +242,9 @@ pub(crate) fn new(
 
         let bios = Vbios::probe(&bar)?;
 
+        let fb_layout = FbLayout::new(spec.chipset, &bar)?;
+        dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
+
         Ok(pin_init!(Self {
             spec,
             bar,
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..27616a9d2b7069b18661fc97811fa1cac285b8f8
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+
+pub(crate) mod fb;
diff --git a/drivers/gpu/nova-core/gsp/fb.rs b/drivers/gpu/nova-core/gsp/fb.rs
new file mode 100644
index 0000000000000000000000000000000000000000..63f41dfa184c434aa4eb7d4cb1f5f1e6f0552563
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/fb.rs
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::ops::Range;
+
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+use crate::regs;
+
+fn align_down(value: u64, align: u64) -> u64 {
+    value & !(align - 1)
+}
+
+/// Layout of the GPU framebuffer memory.
+///
+/// Contains ranges of GPU memory reserved for a given purpose during the GSP bootup process.
+#[derive(Debug)]
+#[allow(dead_code)]
+pub(crate) struct FbLayout {
+    pub fb: Range<u64>,
+
+    pub vga_workspace: Range<u64>,
+    pub bios: Range<u64>,
+
+    pub frts: Range<u64>,
+}
+
+impl FbLayout {
+    pub(crate) fn new(chipset: Chipset, bar: &Devres<Bar0>) -> Result<Self> {
+        let fb = {
+            let fb_size = with_bar!(bar, |b| vidmem_size(b, chipset))?;
+
+            0..fb_size
+        };
+        let fb_len = fb.end - fb.start;
+
+        let vga_workspace = {
+            let vga_base = with_bar!(bar, |b| vga_workspace_addr(&b, fb_len, chipset))?;
+
+            vga_base..fb.end
+        };
+
+        let bios = vga_workspace.clone();
+
+        let frts = {
+            const FRTS_DOWN_ALIGN: u64 = 0x20000;
+            const FRTS_SIZE: u64 = 0x100000;
+            let frts_base = align_down(vga_workspace.start, FRTS_DOWN_ALIGN) - FRTS_SIZE;
+
+            frts_base..frts_base + FRTS_SIZE
+        };
+
+        Ok(Self {
+            fb,
+            vga_workspace,
+            bios,
+            frts,
+        })
+    }
+}
+
+/// Returns `true` if the display is disabled.
+fn display_disabled(bar: &Bar0, chipset: Chipset) -> bool {
+    if chipset >= Chipset::GA100 {
+        regs::FuseStatusOptDisplayAmpere::read(bar).display_disabled()
+    } else {
+        regs::FuseStatusOptDisplayMaxwell::read(bar).display_disabled()
+    }
+}
+
+/// Returns the video memory size in bytes.
+fn vidmem_size(bar: &Bar0, chipset: Chipset) -> u64 {
+    if chipset >= Chipset::GA102 {
+        (regs::Pgc6AonSecureScratchGroup42::read(bar).value() as u64) << 20
+    } else {
+        let local_mem_range = regs::PfbPriMmuLocalMemoryRange::read(bar);
+        let size =
+            (local_mem_range.lower_mag() as u64) << ((local_mem_range.lower_scale() as u64) + 20);
+
+        if local_mem_range.ecc_mode_enabled() {
+            size / 16 * 15
+        } else {
+            size
+        }
+    }
+}
+
+/// Returns the VGA workspace address.
+fn vga_workspace_addr(bar: &Bar0, fb_size: u64, chipset: Chipset) -> u64 {
+    let base = fb_size - 0x100000;
+    let vga_workspace_base = if !display_disabled(bar, chipset) {
+        regs::PdispVgaWorkspaceBase::read(bar)
+    } else {
+        return base;
+    };
+
+    if !vga_workspace_base.status_valid() {
+        return base;
+    }
+
+    let addr = (vga_workspace_base.addr() as u64) << 16;
+    if addr < base {
+        fb_size - 0x20000
+    } else {
+        addr
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 2858f4a0dc35eb9d6547d5cbd81de44c8fc47bae..b78a71dea6e10707dc594fdc070b71dbb663e505 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -26,6 +26,7 @@ macro_rules! with_bar {
 mod falcon;
 mod firmware;
 mod gpu;
+mod gsp;
 mod regs;
 mod timer;
 mod util;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index c76a16dc8e7267a4eb54cb71e1cca6fb9e00188f..3954542fdd77debd8f96d111ddd231d72dbf5b5a 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -38,6 +38,12 @@
     23:0    adr_63_40 => as u32
 );
 
+register!(PfbPriMmuLocalMemoryRange@0x00100ce0;
+    3:0     lower_scale => as u8;
+    9:4     lower_mag => as u8;
+    30:30   ecc_mode_enabled => as_bit bool;
+);
+
 /* GC6 */
 
 register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
@@ -49,6 +55,27 @@
     31:0    value => as u32
 );
 
+register!(Pgc6AonSecureScratchGroup42@0x001183a4;
+    31:0    value => as u32
+);
+
+/* PDISP */
+
+register!(PdispVgaWorkspaceBase@0x00625f04;
+    3:3     status_valid => as_bit bool;
+    31:8    addr => as u32;
+);
+
+/* FUSE */
+
+register!(FuseStatusOptDisplayMaxwell@0x00021c04;
+    0:0     display_disabled => as_bit bool;
+);
+
+register!(FuseStatusOptDisplayAmpere@0x00820c04;
+    0:0     display_disabled => as_bit bool;
+);
+
 /* PFALCON */
 
 register!(FalconIrqsclr@+0x00000004;

-- 
2.49.0



* [PATCH 15/16] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (13 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 14/16] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-20 12:19 ` [PATCH 16/16] gpu: nova-core: load and " Alexandre Courbot
  2025-04-22  8:40 ` [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

The FWSEC firmware needs to be extracted from the VBIOS and patched with
the desired command, as well as the right signature. Do this so we are
ready to load this firmware into the GSP falcon and run it to create
the FRTS region.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
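Note for reviewers: the signature selection in `FwsecFirmware::new()` is
easier to follow outside the driver. The following is a standalone
userspace sketch, not part of the patch. `signature_index` is a
hypothetical name, and the sketch assumes that `signature_versions` is a
bitmask with one bit per fuse version for which a signature is present,
which is how the loop in this patch treats it. It uses a popcount
instead of the bit-by-bit loop, which should give the same index.

```rust
/// Size of one signature in bytes, as in the patch.
const SIG_SIZE: usize = 96 * 4;

/// Hypothetical helper: returns the index of the signature matching
/// `reg_fuse_version`, or `None` if `signature_versions` has no bit set for it.
fn signature_index(signature_versions: u32, reg_fuse_version: u32) -> Option<u32> {
    // Bit corresponding to the fuse version; `None` if the shift would overflow.
    let fuse_bit = 1u32.checked_shl(reg_fuse_version)?;

    if signature_versions & fuse_bit == 0 {
        return None;
    }

    // Signatures are stored in bit order, so the index is the number of
    // version bits set below the matching one.
    Some((signature_versions & (fuse_bit - 1)).count_ones())
}

fn main() {
    // Example values only: signatures present for fuse versions 0, 1 and 3.
    let signature_versions = 0b1011;

    match signature_index(signature_versions, 3) {
        Some(idx) => println!("index {}, byte offset {}", idx, idx as usize * SIG_SIZE),
        None => println!("no matching signature"),
    }
}
```

The returned index is then multiplied by `SIG_SIZE` to locate the
signature inside the signatures blob.
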
 drivers/gpu/nova-core/firmware.rs       |  22 ++-
 drivers/gpu/nova-core/firmware/fwsec.rs | 340 ++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs            |  18 +-
 3 files changed, 378 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 58c0513d49e9a0cef36917c8e2b25c414f6fc596..010763afdd74e92a4380d739a17319e05781007f 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -8,9 +8,14 @@
 use kernel::prelude::*;
 use kernel::str::CString;
 
+use crate::dma::DmaObject;
 use crate::gpu;
 use crate::gpu::Chipset;
 
+pub(crate) mod fwsec;
+
+pub(crate) const FIRMWARE_VERSION: &'static str = "535.113.01";
+
 /// Structure encapsulating the firmware blobs required for the GPU to operate.
 #[expect(dead_code)]
 pub(crate) struct Firmware {
@@ -69,10 +74,25 @@ pub(crate) fn size(&self) -> usize {
     }
 }
 
+/// Patch the `ucode_dma` firmware at offset `sig_base_img` with `signature`.
+fn patch_signature(ucode_dma: &mut DmaObject, signature: &[u8], sig_base_img: usize) -> Result<()> {
+    if sig_base_img + signature.len() > ucode_dma.len {
+        return Err(ERANGE);
+    }
+
+    // SAFETY: we are the only user of this object, so there cannot be any race.
+    let dst = unsafe { ucode_dma.dma.start_ptr_mut().add(sig_base_img) };
+
+    // SAFETY: `signature` and `dst` are valid, properly aligned, and do not overlap.
+    unsafe { core::ptr::copy_nonoverlapping(signature.as_ptr(), dst, signature.len()) };
+
+    Ok(())
+}
+
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
 impl<const N: usize> ModInfoBuilder<N> {
-    const VERSION: &'static str = "535.113.01";
+    const VERSION: &'static str = FIRMWARE_VERSION;
 
     const fn make_entry_file(self, chipset: &str, fw: &str) -> Self {
         ModInfoBuilder(
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
new file mode 100644
index 0000000000000000000000000000000000000000..664319d1d31c9727bb830100641c53b5d914be5a
--- /dev/null
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! FWSEC is a High Secure firmware that is extracted from the BIOS and performs the first step of
+//! the GSP startup by creating the WPR2 memory region and copying critical areas of the VBIOS into
+//! it after authenticating them, ensuring they haven't been tampered with. It runs on the GSP
+//! falcon.
+//!
+//! Before being run, it needs to be patched in two areas:
+//!
+//! - The command to be run, as this firmware can perform several tasks;
+//! - The ucode signature, so the GSP falcon can run FWSEC in HS mode.
+
+use core::alloc::Layout;
+
+use kernel::bindings;
+use kernel::device::{self, Device};
+use kernel::devres::Devres;
+use kernel::prelude::*;
+use kernel::transmute::FromBytes;
+
+use crate::dma::DmaObject;
+use crate::driver::Bar0;
+use crate::falcon::gsp::Gsp;
+use crate::falcon::{Falcon, FalconBromParams, FalconFirmware, FalconLoadTarget};
+use crate::firmware::FalconUCodeDescV3;
+use crate::vbios::Vbios;
+
+const NVFW_FALCON_APPIF_ID_DMEMMAPPER: u32 = 0x4;
+
+#[repr(C)]
+#[derive(Debug)]
+struct FalconAppifHdrV1 {
+    ver: u8,
+    hdr: u8,
+    len: u8,
+    cnt: u8,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifHdrV1 {}
+
+#[repr(C, packed)]
+#[derive(Debug)]
+struct FalconAppifV1 {
+    id: u32,
+    dmem_base: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifV1 {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct FalconAppifDmemmapperV3 {
+    signature: u32,
+    version: u16,
+    size: u16,
+    cmd_in_buffer_offset: u32,
+    cmd_in_buffer_size: u32,
+    cmd_out_buffer_offset: u32,
+    cmd_out_buffer_size: u32,
+    nvf_img_data_buffer_offset: u32,
+    nvf_img_data_buffer_size: u32,
+    printf_buffer_hdr: u32,
+    ucode_build_time_stamp: u32,
+    ucode_signature: u32,
+    init_cmd: u32,
+    ucode_feature: u32,
+    ucode_cmd_mask0: u32,
+    ucode_cmd_mask1: u32,
+    multi_tgt_tbl: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifDmemmapperV3 {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct ReadVbios {
+    ver: u32,
+    hdr: u32,
+    addr: u64,
+    size: u32,
+    flags: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for ReadVbios {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct FrtsRegion {
+    ver: u32,
+    hdr: u32,
+    addr: u32,
+    size: u32,
+    ftype: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FrtsRegion {}
+
+const NVFW_FRTS_CMD_REGION_TYPE_FB: u32 = 2;
+
+#[repr(C, packed)]
+struct FrtsCmd {
+    read_vbios: ReadVbios,
+    frts_region: FrtsRegion,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FrtsCmd {}
+
+const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS: u32 = 0x15;
+const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_SB: u32 = 0x19;
+
+/// Command for the [`FwsecFirmware`] to execute.
+pub(crate) enum FwsecCommand {
+    /// Asks [`FwsecFirmware`] to carve out the WPR2 area and place a verified copy of the VBIOS
+    /// image into it.
+    Frts { frts_addr: u64, frts_size: u64 },
+    /// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
+    #[allow(dead_code)]
+    Sb,
+}
+
+/// Reinterpret the area starting from `offset` in `fw` as an instance of `T` (which must implement
+/// [`FromBytes`]) and return a reference to it.
+///
+/// # Safety
+///
+/// Callers must ensure that the region of memory returned is not written for as long as the
+/// returned reference is alive.
+unsafe fn transmute<'a, 'b, T: Sized + FromBytes>(
+    fw: &'a DmaObject,
+    offset: usize,
+) -> Result<&'b T> {
+    if offset + core::mem::size_of::<T>() > fw.len {
+        return Err(ERANGE);
+    }
+    if (fw.dma.start_ptr() as usize + offset) % core::mem::align_of::<T>() != 0 {
+        return Err(EINVAL);
+    }
+
+    // SAFETY: we have checked that the pointer is properly aligned and that its pointed-to
+    // memory is large enough to contain an instance of `T`, which implements `FromBytes`.
+    Ok(unsafe { &*(fw.dma.start_ptr().offset(offset as isize) as *const T) })
+}
+
+/// Reinterpret the area starting from `offset` in `fw` as a mutable instance of `T` (which must
+/// implement [`FromBytes`]) and return a reference to it.
+///
+/// # Safety
+///
+/// Callers must ensure that the region of memory returned is not read or written for as long as
+/// the returned reference is alive.
+unsafe fn transmute_mut<'a, 'b, T: Sized + FromBytes>(
+    fw: &'a mut DmaObject,
+    offset: usize,
+) -> Result<&'b mut T> {
+    if offset + core::mem::size_of::<T>() > fw.len {
+        return Err(ERANGE);
+    }
+    if (fw.dma.start_ptr_mut() as usize + offset) % core::mem::align_of::<T>() != 0 {
+        return Err(EINVAL);
+    }
+
+    // SAFETY: we have checked that the pointer is properly aligned and that its pointed-to
+    // memory is large enough to contain an instance of `T`, which implements `FromBytes`.
+    Ok(unsafe { &mut *(fw.dma.start_ptr_mut().offset(offset as isize) as *mut T) })
+}
+
+/// Patch the Fwsec firmware image in `fw` to run the command `cmd`.
+fn patch_command(fw: &mut DmaObject, v3_desc: &FalconUCodeDescV3, cmd: FwsecCommand) -> Result<()> {
+    let hdr_offset = (v3_desc.imem_load_size + v3_desc.interface_offset) as usize;
+    let hdr: &FalconAppifHdrV1 = unsafe { transmute(fw, hdr_offset) }?;
+
+    if hdr.ver != 1 {
+        return Err(EINVAL);
+    }
+
+    // Find the DMEM mapper section in the firmware.
+    for i in 0..hdr.cnt as usize {
+        let app: &FalconAppifV1 =
+            unsafe { transmute(fw, hdr_offset + hdr.hdr as usize + i * hdr.len as usize) }?;
+
+        if app.id != NVFW_FALCON_APPIF_ID_DMEMMAPPER {
+            continue;
+        }
+
+        let dmem_mapper: &mut FalconAppifDmemmapperV3 =
+            unsafe { transmute_mut(fw, (v3_desc.imem_load_size + app.dmem_base) as usize) }?;
+
+        let frts_cmd: &mut FrtsCmd = unsafe {
+            transmute_mut(
+                fw,
+                (v3_desc.imem_load_size + dmem_mapper.cmd_in_buffer_offset) as usize,
+            )
+        }?;
+
+        frts_cmd.read_vbios = ReadVbios {
+            ver: 1,
+            hdr: core::mem::size_of::<ReadVbios>() as u32,
+            addr: 0,
+            size: 0,
+            flags: 2,
+        };
+
+        dmem_mapper.init_cmd = match cmd {
+            FwsecCommand::Frts {
+                frts_addr,
+                frts_size,
+            } => {
+                frts_cmd.frts_region = FrtsRegion {
+                    ver: 1,
+                    hdr: core::mem::size_of::<FrtsRegion>() as u32,
+                    addr: (frts_addr >> 12) as u32,
+                    size: (frts_size >> 12) as u32,
+                    ftype: NVFW_FRTS_CMD_REGION_TYPE_FB,
+                };
+
+                NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS
+            }
+            FwsecCommand::Sb => NVFW_FALCON_APPIF_DMEMMAPPER_CMD_SB,
+        };
+
+        // Return early as we found and patched the DMEMMAPPER region.
+        return Ok(());
+    }
+
+    Err(ENOTSUPP)
+}
+
+/// Firmware extracted from the VBIOS and responsible for e.g. carving out the WPR2 region as the
+/// first step of the GSP bootflow.
+pub(crate) struct FwsecFirmware {
+    desc: FalconUCodeDescV3,
+    ucode: DmaObject,
+}
+
+impl FalconFirmware for FwsecFirmware {
+    type Target = Gsp;
+
+    fn dma_handle(&self) -> bindings::dma_addr_t {
+        self.ucode.dma.dma_handle()
+    }
+
+    fn imem_load(&self) -> FalconLoadTarget {
+        FalconLoadTarget {
+            src_start: 0,
+            dst_start: self.desc.imem_phys_base,
+            len: self.desc.imem_load_size,
+        }
+    }
+
+    fn dmem_load(&self) -> FalconLoadTarget {
+        FalconLoadTarget {
+            src_start: self.desc.imem_load_size,
+            dst_start: self.desc.dmem_phys_base,
+            len: Layout::from_size_align(self.desc.dmem_load_size as usize, 256)
+                // Cannot panic, as 256 is non-zero and a power of 2.
+                .unwrap()
+                .pad_to_align()
+                .size() as u32,
+        }
+    }
+
+    fn brom_params(&self) -> FalconBromParams {
+        FalconBromParams {
+            pkc_data_offset: self.desc.pkc_data_offset,
+            engine_id_mask: self.desc.engine_id_mask,
+            ucode_id: self.desc.ucode_id,
+        }
+    }
+
+    fn boot_addr(&self) -> u32 {
+        0
+    }
+}
+
+impl FwsecFirmware {
+    /// Extract the Fwsec firmware from `bios` and patch it to run with the `cmd` command.
+    pub(crate) fn new(
+        falcon: &Falcon<Gsp>,
+        pdev: &Device<device::Bound>,
+        bar: &Devres<Bar0>,
+        bios: &Vbios,
+        cmd: FwsecCommand,
+    ) -> Result<Self> {
+        let v3_desc = bios.fwsec_header()?;
+        let ucode = bios.fwsec_ucode()?;
+
+        let mut ucode_dma = DmaObject::from_data(pdev, ucode, "fwsec-frts")?;
+        patch_command(&mut ucode_dma, v3_desc, cmd)?;
+
+        const SIG_SIZE: usize = 96 * 4;
+        let signatures = bios.fwsec_sigs()?;
+        let sig_base_img = (v3_desc.imem_load_size + v3_desc.pkc_data_offset) as usize;
+
+        if v3_desc.signature_count != 0 {
+            // Patch signature.
+            let mut sig_fuse_version = v3_desc.signature_versions as u32;
+            pr_debug!("sig_fuse_version: {}\n", sig_fuse_version);
+            let reg_fuse_version = falcon.hal.get_signature_reg_fuse_version(
+                bar,
+                v3_desc.engine_id_mask,
+                v3_desc.ucode_id,
+            )?;
+            let idx = {
+                let mut reg_fuse_version = 1 << reg_fuse_version;
+                pr_debug!("reg_fuse_version: {:#x}\n", reg_fuse_version);
+                if (reg_fuse_version & sig_fuse_version) == 0 {
+                    pr_warn!(
+                        "no matching signature: {:#x} {:#x}\n",
+                        reg_fuse_version,
+                        v3_desc.signature_versions
+                    );
+                    return Err(EINVAL);
+                }
+
+                let mut idx = 0;
+                while (reg_fuse_version & sig_fuse_version & 1) == 0 {
+                    idx += sig_fuse_version & 1;
+                    reg_fuse_version >>= 1;
+                    sig_fuse_version >>= 1;
+
+                    if reg_fuse_version == 0 || sig_fuse_version == 0 {
+                        return Err(EINVAL);
+                    }
+                }
+
+                idx
+            };
+
+            pr_debug!("patching signature with idx {}\n", idx);
+            let signature_start = idx as usize * SIG_SIZE;
+            let signature = &signatures[signature_start..signature_start + SIG_SIZE];
+            super::patch_signature(&mut ucode_dma, signature, sig_base_img)?;
+        }
+
+        Ok(FwsecFirmware {
+            desc: v3_desc.clone(),
+            ucode: ucode_dma,
+        })
+    }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index b43d1fc6bba15ffd76d564eccdb9e2afe239a3a4..5d15a99f8d1eec3c2e1f6d119eb521361733c709 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -7,6 +7,7 @@
 use crate::driver::Bar0;
 use crate::falcon::gsp::GspFalcon;
 use crate::falcon::sec2::Sec2Falcon;
+use crate::firmware::fwsec::{FwsecCommand, FwsecFirmware};
 use crate::firmware::Firmware;
 use crate::gsp::fb::FbLayout;
 use crate::regs;
@@ -185,7 +186,11 @@ pub(crate) fn new(
         bar: Devres<Bar0>,
     ) -> Result<impl PinInit<Self>> {
         let spec = Spec::new(&bar)?;
-        let fw = Firmware::new(pdev.as_ref(), spec.chipset, "535.113.01")?;
+        let fw = Firmware::new(
+            pdev.as_ref(),
+            spec.chipset,
+            crate::firmware::FIRMWARE_VERSION,
+        )?;
 
         dev_info!(
             pdev.as_ref(),
@@ -245,6 +250,17 @@ pub(crate) fn new(
         let fb_layout = FbLayout::new(spec.chipset, &bar)?;
         dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
 
+        let _fwsec_frts = FwsecFirmware::new(
+            &gsp_falcon,
+            pdev.as_ref(),
+            &bar,
+            &bios,
+            FwsecCommand::Frts {
+                frts_addr: fb_layout.frts.start,
+                frts_size: fb_layout.frts.end - fb_layout.frts.start,
+            },
+        )?;
+
         Ok(pin_init!(Self {
             spec,
             bar,

-- 
2.49.0
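
For readers following the signature index computation in the patch above:
the loop appears to count how many bits are set in the signature-versions
mask below the bit selected by the fuse version. Here is a minimal
standalone sketch of that reading, assuming one mask bit per available
signature and signatures stored in ascending bit order. The name
`signature_index` and the `Option` return type are illustrative only and
are not part of the patch.

```rust
/// Returns the index of the signature matching `fuse_version`, or `None` if
/// the mask carries no signature for that version.
///
/// Assumption: `signature_versions` has one bit set per available signature,
/// and signatures are stored in ascending bit order.
fn signature_index(signature_versions: u32, fuse_version: u32) -> Option<u32> {
    // Bit selected by the fuse register; `None` if the shift would overflow.
    let fuse_bit = 1u32.checked_shl(fuse_version)?;

    // No signature was built for this fuse version.
    if signature_versions & fuse_bit == 0 {
        return None;
    }

    // Count the signatures that come before the matching one.
    Some((signature_versions & (fuse_bit - 1)).count_ones())
}
```

With a mask of 0b1011 and a fuse version of 3, two signatures precede the
matching one, so the sketch yields index 2, which is what stepping through
the loop in the patch by hand gives as well.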



* [PATCH 16/16] gpu: nova-core: load and run FWSEC-FRTS
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (14 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 15/16] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
@ 2025-04-20 12:19 ` Alexandre Courbot
  2025-04-22  8:40 ` [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
  16 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-20 12:19 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

With all the required pieces in place, load FWSEC-FRTS onto the GSP
falcon, run it, and check that it completed successfully by verifying
that the WPR2 region has been carved out of framebuffer memory.
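
The final WPR2 check can be sketched in isolation as follows. This is a
simplified model and not the driver code: it assumes the two inputs are the
already-extracted `lo_val` and `hi_val` register fields, and the names
`check_wpr2` and `Wpr2Error` are made up for the example.

```rust
/// Reasons the WPR2 check can fail in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum Wpr2Error {
    /// The upper bound is still zero: no region was created.
    NotCreated,
    /// A region exists, but not where it was requested.
    UnexpectedStart { found: u64, expected: u64 },
}

/// Converts the WPR2 register fields to addresses and validates them.
fn check_wpr2(lo_val: u32, hi_val: u32, expected_start: u64) -> Result<(u64, u64), Wpr2Error> {
    // The fields hold addresses shifted right by 12 bits, as in the patch.
    let lo = u64::from(lo_val) << 12;
    let hi = u64::from(hi_val) << 12;

    if hi == 0 {
        return Err(Wpr2Error::NotCreated);
    }
    if lo != expected_start {
        return Err(Wpr2Error::UnexpectedStart {
            found: lo,
            expected: expected_start,
        });
    }

    Ok((lo, hi))
}
```

Field values of 0xffc00 and 0xffce0 correspond to the range
0xffc00000-0xffce0000 shown in the cover letter.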

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/falcon.rs |  3 ---
 drivers/gpu/nova-core/gpu.rs    | 59 ++++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs   | 15 +++++++++++
 drivers/gpu/nova-core/vbios.rs  |  3 ---
 4 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 71f374445ff3277eac628e183942c79f557366d5..f90bb739cb9864d88e3427c7ec76953c69ec2c67 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -2,9 +2,6 @@
 
 //! Falcon microprocessor base support
 
-// To be removed when all code is used.
-#![allow(dead_code)]
-
 use core::hint::unreachable_unchecked;
 use core::time::Duration;
 use hal::FalconHal;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 5d15a99f8d1eec3c2e1f6d119eb521361733c709..4d03a0b11b6411e22a652183e975f6889446ed46 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -250,7 +250,7 @@ pub(crate) fn new(
         let fb_layout = FbLayout::new(spec.chipset, &bar)?;
         dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
 
-        let _fwsec_frts = FwsecFirmware::new(
+        let fwsec_frts = FwsecFirmware::new(
             &gsp_falcon,
             pdev.as_ref(),
             &bar,
@@ -261,6 +261,63 @@ pub(crate) fn new(
             },
         )?;
 
+        // Check that the WPR2 region does not already exist - if it does, the GPU needs to be
+        // reset.
+        if with_bar!(bar, |b| regs::PfbPriMmuWpr2AddrHi::read(b).hi_val())? != 0 {
+            dev_err!(
+                pdev.as_ref(),
+                "WPR2 region already exists - GPU needs to be reset to proceed\n"
+            );
+            return Err(EBUSY);
+        }
+
+        // Reset falcon, load FWSEC-FRTS, and run it.
+        gsp_falcon.reset(&bar, &timer)?;
+        gsp_falcon.dma_load(&bar, &timer, &fwsec_frts)?;
+        let (mbox0, _) = gsp_falcon.boot(&bar, &timer, Some(0), None)?;
+        if mbox0 != 0 {
+            dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
+            return Err(EINVAL);
+        }
+
+        // SCRATCH_E contains FWSEC-FRTS' error code, if any.
+        let frts_status = with_bar!(bar, |b| regs::PbusSwScratche::read(b).frts_err_code())?;
+        if frts_status != 0 {
+            dev_err!(
+                pdev.as_ref(),
+                "FWSEC-FRTS returned with error code {:#x}\n",
+                frts_status
+            );
+            return Err(EINVAL);
+        }
+
+        // Check the WPR2 has been created as we requested.
+        let (wpr2_lo, wpr2_hi) = with_bar!(bar, |b| {
+            (
+                (regs::PfbPriMmuWpr2AddrLo::read(b).lo_val() as u64) << 12,
+                (regs::PfbPriMmuWpr2AddrHi::read(b).hi_val() as u64) << 12,
+            )
+        })?;
+        if wpr2_hi == 0 {
+            dev_err!(
+                pdev.as_ref(),
+                "WPR2 region not created after running FWSEC-FRTS\n"
+            );
+
+            return Err(ENOTTY);
+        } else if wpr2_lo != fb_layout.frts.start {
+            dev_err!(
+                pdev.as_ref(),
+                "WPR2 region created at unexpected address {:#x} ; expected {:#x}\n",
+                wpr2_lo,
+                fb_layout.frts.start,
+            );
+            return Err(EINVAL);
+        }
+
+        dev_info!(pdev.as_ref(), "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
+        dev_info!(pdev.as_ref(), "GPU instance built\n");
+
         Ok(pin_init!(Self {
             spec,
             bar,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 3954542fdd77debd8f96d111ddd231d72dbf5b5a..eae5b7c13155d2da39f47661024ae52390e04366 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -18,6 +18,13 @@
     28:20   chipset => try_into Chipset, "chipset model"
 );
 
+/* PBUS */
+
+register!(PbusSwScratche@0x00001438;
+    15:0    sb_err_code => as u16;
+    31:16   frts_err_code => as u16;
+);
+
 /* PTIMER */
 
 register!(PtimerTime0@0x00009400;
@@ -44,6 +51,14 @@
     30:30   ecc_mode_enabled => as_bit bool;
 );
 
+register!(PfbPriMmuWpr2AddrLo@0x001fa824;
+    31:4    lo_val => as u32
+);
+
+register!(PfbPriMmuWpr2AddrHi@0x001fa828;
+    31:4    hi_val => as u32
+);
+
 /* GC6 */
 
 register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index 534107b708cab0eb8d0accf7daa5718edf030358..74735c083d472ce955d6d3afaabd46a8d354c792 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -1,8 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 
-// To be removed when all code is used.
-#![allow(dead_code)]
-
 //! VBIOS extraction and parsing.
 
 use crate::driver::Bar0;

-- 
2.49.0



* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-20 12:19 ` [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
@ 2025-04-21 21:45   ` Joel Fernandes
  2025-04-22 11:28     ` Danilo Krummrich
  2025-04-22 11:36   ` Danilo Krummrich
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-21 21:45 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hi, Alex,
Just some documentation-type comments and one Rust-naming-convention comment:

On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
> Upon reset, the GPU executes the GFW_BOOT firmware in order to
> initialize its base parameters such as clocks. The driver must ensure
> that this step is completed before using the hardware.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/driver.rs    |  2 +-
>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>  5 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
> --- /dev/null
> +++ b/drivers/gpu/nova-core/devinit.rs
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Methods for device initialization.

Let us clarify what devinit means.

devinit is a sequence of register read/writes after reset that performs tasks
such as:
1. Programming VRAM memory controller timings.
2. Power sequencing.
3. Clock and PLL configuration.
4. Thermal management.
5. VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
only part of memory and then kicks off 'async scrubbing'.

devinit itself is a 'script' which is interpreted by an interpreter program
running on the PMU microcontroller of the GPU.

Note that devinit also needs to run during suspend/resume at runtime.

I talked with Alex and I could add a new patch on top of this patch to add these
clarifying 'doc' comments as well. I will commit them to my git branch and send
on top of this as needed, but Alex can feel free to decide to squash them as well.

> +
> +use kernel::bindings;
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +
> +/// Wait for devinit FW completion.
> +///
> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
> +/// considered unusable until this step is completed, so it must be waited on very early during
> +/// driver initialization.
> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {

To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
term used for VBIOS ROM components several of which execute before the driver
loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.

> +    let mut timeout = 2000;
> +
> +    loop {
> +        let gfw_booted = with_bar!(
> +            bar,
> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)

Per my research, FWSEC is run beforehand on the GSP in 'high secure' mode,
before the driver even loads. This happens roughly around the time devinit is
also happening (not sure if it is before or after). This FWSEC is supposed to
lower the privilege level of the access to 'Pgc6AonSecureScratchGroup05' so that
the register is accessible by the CPU. I think we should mention that here as
rationale for why we need to read Pgc6AonSecureScratchGroup05PrivLevelMask first
before accessing Pgc6AonSecureScratchGroup05.

Here we should say we need to read the GFW_BOOT only once we know that the
privilege level has been reduced by the FWSEC.
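
To make the ordering explicit, here is a rough standalone sketch of the
check. It models the two register reads as closures so that the
short-circuit is visible. `gfw_boot_completed` and both closure parameters
are hypothetical names, not driver API.

```rust
/// Returns `true` once GFW_BOOT has completed.
///
/// `read_priv_mask` stands for reading the `read_protection_level0_enabled`
/// bit, `read_scratch` for reading the scratch register value. Thanks to
/// `&&` short-circuiting, the scratch register is only read after the
/// privilege mask reports that the CPU may access it.
fn gfw_boot_completed(
    read_priv_mask: impl FnOnce() -> bool,
    read_scratch: impl FnOnce() -> u32,
) -> bool {
    read_priv_mask() && (read_scratch() & 0xff) == 0xff
}
```

The low byte test mirrors the `& 0xff) == 0xff` comparison in the quoted
code. What that byte encodes is not spelled out in this thread.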

> +                .read_protection_level0_enabled()
> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff

I find this Rust convention for camel casing long constants very unreadable and
troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
for the sake of readability. Could the Rust community / maintainers provide some
input?

Apart from readability, it also makes searching for the same register name
across other code bases written in C a nightmare.

A couple of ideas discussed:

1. Maybe have a macro that converts
REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
regs::Pgc6AonSecureScratchGroup05 ?
But not sure what it takes on the Rust side to implement a macro like that.

2. Adding doc comments both in regs.rs where the register is defined, and
possibly at the call site. This still does not address the issue fully.
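
A rough sketch of how both ideas could look, purely for discussion. The
unit structs below are stand-ins for what `register!` generates, the
`OFFSET` constant and the `reg!` macro are hypothetical, and each C-style
name is mapped to the Rust type of the same name. `#[doc(alias)]` is the
stock rustdoc attribute; it only helps searches in the generated
documentation, not grep.

```rust
/// Stand-in for the type generated by `register!` (hypothetical).
#[doc(alias = "NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK")]
struct Pgc6AonSecureScratchGroup05PrivLevelMask;

impl Pgc6AonSecureScratchGroup05PrivLevelMask {
    const OFFSET: usize = 0x0011_8128;
}

/// Stand-in for the type generated by `register!` (hypothetical).
#[doc(alias = "NV_PGC6_AON_SECURE_SCRATCH_GROUP_05")]
struct Pgc6AonSecureScratchGroup05;

impl Pgc6AonSecureScratchGroup05 {
    const OFFSET: usize = 0x0011_8234;
}

/// Maps a C-style register name to the corresponding Rust type, so that the
/// C name stays greppable at the call site.
macro_rules! reg {
    (NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) => {
        Pgc6AonSecureScratchGroup05PrivLevelMask
    };
    (NV_PGC6_AON_SECURE_SCRATCH_GROUP_05) => {
        Pgc6AonSecureScratchGroup05
    };
}
```

A call site would then read
`<reg!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05)>::OFFSET`, at the cost of one
macro arm per register, which `register!` could perhaps emit itself.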


> +        )?;
> +
> +        if gfw_booted {
> +            return Ok(());
> +        }
> +
> +        if timeout == 0 {
> +            return Err(ETIMEDOUT);
> +        }
> +        timeout -= 1;
> +
> +        // SAFETY: msleep should be safe to call with any parameter.
> +        unsafe { bindings::msleep(2) };
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>      pub(crate) gpu: Gpu,
>  }
>  
> -const BAR0_SIZE: usize = 8;
> +const BAR0_SIZE: usize = 0x1000000;
>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>  
>  kernel::pci_device_table!(
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -2,6 +2,7 @@
>  
>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>  
> +use crate::devinit;
>  use crate::driver::Bar0;
>  use crate::firmware::Firmware;
>  use crate::regs;
> @@ -168,6 +169,10 @@ pub(crate) fn new(
>              spec.revision
>          );
>  
> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
> +        devinit::wait_gfw_boot_completion(&bar)
> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
> +
>          Ok(pin_init!(Self { spec, bar, fw }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>      }
>  }
>  
> +mod devinit;
>  mod driver;
>  mod firmware;
>  mod gpu;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -13,3 +13,14 @@
>      7:4     major_rev => as u8, "major revision of the chip";
>      28:20   chipset => try_into Chipset, "chipset model"
>  );
> +
> +/* GC6 */

GC6 is a GPU low-power state. The VRAM is in self-refresh and the GPU itself is
powered down (all power rails not required for self-refresh are off).

The following registers are exposed by the hardware unit in the GPU which
manages the GC6 state transitions:

> +
> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
> +    0:0     read_protection_level0_enabled => as_bit bool
> +);

This is a privilege-level-mask register, which dictates whether the host CPU can
access the register.

> +
> +/* TODO: This is an array of registers. */
> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
> +    31:0    value => as u32
> +);
> 

These are always-on registers, available even in the GC6 state (which makes
sense since we need to access them to know if we are far enough in the boot
process).

thanks,

 - Joel



* Re: [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
  2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
                   ` (15 preceding siblings ...)
  2025-04-20 12:19 ` [PATCH 16/16] gpu: nova-core: load and " Alexandre Courbot
@ 2025-04-22  8:40 ` Danilo Krummrich
  2025-04-22 14:12   ` Alexandre Courbot
  16 siblings, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22  8:40 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel,
	Sergio González Collado

On Sun, Apr 20, 2025 at 09:19:32PM +0900, Alexandre Courbot wrote:
> Hi everyone,
> 
> This series is a continuation of my previous RFCs [1] to complete the
> first step of GSP booting (running the FWSEC-FRTS firmware extracted
> from the BIOS) on Ampere devices. While it is still far from bringing
> the GPU into a state where it can do anything useful, it sets up the
> basic layout of the driver upon which we can build in order to continue
> with the next steps of GSP booting, as well as supporting more chipsets.
> 
> Upon successful probe, the driver will display the range of the WPR2
> region constructed by FWSEC-FRTS:
> 
>   [   95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
>   [   95.436002] NovaCore 0000:01:00.0: GPU instance built
> 
> This code is based on nova-next with the try_access_with patch [2].

Please make sure to compile with CLIPPY=1, the series has quite a few Clippy
warnings.

I also noticed that there are a lot of compiler warnings about unreachable pub
fields with rustc 1.78, whereas with the latest stable compiler there are none.

I'm not exactly sure why that is (and I haven't looked further), but the
corresponding fields indeed seem to have unnecessary pub visibility.

> There is still a bit of unsafe code where it is not desired, notably to
> transmute byte slices into types that implement FromBytes - this is
> because support for doing such transmute operations safely are not in
> the kernel crate yet.

I assume you refer to [3]? As long as we put a TODO and follow up once the
series lands, that's fine for me.

> 
> [1] https://lore.kernel.org/rust-for-linux/20250320-nova_timer-v3-0-79aa2ad25a79@nvidia.com/
> [2] https://lore.kernel.org/rust-for-linux/20250411-try_with-v4-0-f470ac79e2e2@nvidia.com/
[3] https://lore.kernel.org/lkml/20250330234039.29814-1-christiansantoslima21@gmail.com/


* Re: [PATCH 06/16] gpu: nova-core: define registers layout using helper macro
  2025-04-20 12:19 ` [PATCH 06/16] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
@ 2025-04-22 10:29   ` Danilo Krummrich
  2025-04-28 14:27     ` Alexandre Courbot
  0 siblings, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 10:29 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sun, Apr 20, 2025 at 09:19:38PM +0900, Alexandre Courbot wrote:
> Add the register!() macro, which defines a given register's layout and
> provide bit-field accessors with a way to convert them to a given type.
> This macro will allow us to make clear definitions of the registers and
> manipulate their fields safely.
> 
> The long-term goal is to eventually move it to the kernel crate so it
> can be used by other drivers as well, but it was agreed to first land it
> into nova-core and make it mature there.
> 
> To illustrate its usage, use it to define the layout for the Boot0
> register and use its accessors through the use of the convenience
> with_bar!() macro, which uses Revocable::try_access() and converts its

s/try_access/try_access_with/

> returned Option into the proper error as needed.

Using try_access_with() / with_bar! should be a separate patch.

> diff --git a/Documentation/gpu/nova/core/todo.rst b/Documentation/gpu/nova/core/todo.rst
> index 234d753d3eacc709b928b1ccbfc9750ef36ec4ed..8a459fc088121f770bfcda5dfb4ef51c712793ce 100644
> --- a/Documentation/gpu/nova/core/todo.rst
> +++ b/Documentation/gpu/nova/core/todo.rst
> @@ -102,7 +102,13 @@ Usage:
>  	let boot0 = Boot0::read(&bar);
>  	pr_info!("Revision: {}\n", boot0.revision());
>  
> +Note: a work-in-progress implementation currently resides in
> +`drivers/gpu/nova-core/regs/macros.rs` and is used in nova-core. It would be
> +nice to improve it (possibly using proc macros) and move it to the `kernel`
> +crate so it can be used by other components as well.
> +
>  | Complexity: Advanced
> +| Contact: Alexandre Courbot

This is good -- thanks for adding it.

> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index b1a25b86ef17a6710e6236d5e7f1f26cd4407ce3..e315a3011660df7f18c0a3e0582b5845545b36e2 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -1,55 +1,15 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -use crate::driver::Bar0;
> +use core::ops::Deref;
> +use kernel::io::Io;
>  
> -// TODO
> -//
> -// Create register definitions via generic macros. See task "Generic register
> -// abstraction" in Documentation/gpu/nova/core/todo.rst.
> +#[macro_use]
> +mod macros;
>  
> -const BOOT0_OFFSET: usize = 0x00000000;
> +use crate::gpu::Chipset;
>  
> -// 3:0 - chipset minor revision
> -const BOOT0_MINOR_REV_SHIFT: u8 = 0;
> -const BOOT0_MINOR_REV_MASK: u32 = 0x0000000f;
> -
> -// 7:4 - chipset major revision
> -const BOOT0_MAJOR_REV_SHIFT: u8 = 4;
> -const BOOT0_MAJOR_REV_MASK: u32 = 0x000000f0;
> -
> -// 23:20 - chipset implementation Identifier (depends on architecture)
> -const BOOT0_IMPL_SHIFT: u8 = 20;
> -const BOOT0_IMPL_MASK: u32 = 0x00f00000;
> -
> -// 28:24 - chipset architecture identifier
> -const BOOT0_ARCH_MASK: u32 = 0x1f000000;
> -
> -// 28:20 - chipset identifier (virtual register field combining BOOT0_IMPL and
> -//         BOOT0_ARCH)
> -const BOOT0_CHIPSET_SHIFT: u8 = BOOT0_IMPL_SHIFT;
> -const BOOT0_CHIPSET_MASK: u32 = BOOT0_IMPL_MASK | BOOT0_ARCH_MASK;
> -
> -#[derive(Copy, Clone)]
> -pub(crate) struct Boot0(u32);
> -
> -impl Boot0 {
> -    #[inline]
> -    pub(crate) fn read(bar: &Bar0) -> Self {
> -        Self(bar.read32(BOOT0_OFFSET))
> -    }
> -
> -    #[inline]
> -    pub(crate) fn chipset(&self) -> u32 {
> -        (self.0 & BOOT0_CHIPSET_MASK) >> BOOT0_CHIPSET_SHIFT
> -    }
> -
> -    #[inline]
> -    pub(crate) fn minor_rev(&self) -> u8 {
> -        ((self.0 & BOOT0_MINOR_REV_MASK) >> BOOT0_MINOR_REV_SHIFT) as u8
> -    }
> -
> -    #[inline]
> -    pub(crate) fn major_rev(&self) -> u8 {
> -        ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
> -    }
> -}
> +register!(Boot0@0x00000000, "Basic revision information about the GPU";
> +    3:0     minor_rev => as u8, "minor revision of the chip";
> +    7:4     major_rev => as u8, "major revision of the chip";
> +    28:20   chipset => try_into Chipset, "chipset model"

Should we preserve the information that this is the combination of two register
fields?
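
For reference, a small standalone sketch of how the combined field relates
to its two halves, using the bit ranges from the removed constants (23:20
for the implementation, 28:24 for the architecture). The helper `field` is
made up for the example and reuses the mask formula from the patch.

```rust
/// Extracts bits `hi:lo` (inclusive) from `value`.
const fn field(value: u32, hi: u32, lo: u32) -> u32 {
    // Same mask construction as in the patch; it avoids overflow for hi == 31.
    let mask = ((((1u32 << hi) - 1) << 1) + 1) - ((1u32 << lo) - 1);
    (value & mask) >> lo
}

/// Chipset identifier, bits 28:20.
const fn chipset(boot0: u32) -> u32 {
    field(boot0, 28, 20)
}

/// Architecture identifier, bits 28:24.
const fn architecture(boot0: u32) -> u32 {
    field(boot0, 28, 24)
}

/// Implementation identifier, bits 23:20.
const fn implementation(boot0: u32) -> u32 {
    field(boot0, 23, 20)
}
```

For any register value, `chipset(v)` equals
`(architecture(v) << 4) | implementation(v)`, so a doc string on the
`chipset` field could state exactly that.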

> +);
> diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..fa9bd6b932048113de997658b112885666e694c9
> --- /dev/null
> +++ b/drivers/gpu/nova-core/regs/macros.rs
> @@ -0,0 +1,297 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Types and macros to define register layout and accessors.
> +//!
> +//! A single register typically includes several fields, which are accessed through a combination
> +//! of bit-shift and mask operations that introduce a class of potential mistakes, notably because
> +//! not all possible field values are necessarily valid.
> +//!
> +//! The macros in this module allow to define, using an intuitive and readable syntax, a dedicated
> +//! type for each register with its own field accessors that can return an error if a field's value
> +//! is invalid. They also provide a builder type allowing to construct a register value to be
> +//! written by combining valid values for its fields.
> +
> +/// Helper macro for the `register` macro.
> +///
> +/// Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
> +/// and conversion to regular `u32`).
> +macro_rules! __reg_def_common {
> +    ($name:ident $(, $type_comment:expr)?) => {
> +        $(
> +        #[doc=$type_comment]
> +        )?
> +        #[repr(transparent)]
> +        #[derive(Clone, Copy, Default)]
> +        pub(crate) struct $name(u32);
> +
> +        // TODO: should we display the raw hex value, then the value of all its fields?

To me it seems useful to have both.
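
Something along these lines, as a hand-written sketch for a simplified
`Boot0` with only two fields. The macro would have to generate the
equivalent, and the `raw` label is just a suggestion.

```rust
use std::fmt;

/// Simplified stand-in for the generated register type.
#[derive(Clone, Copy)]
struct Boot0(u32);

impl Boot0 {
    /// Bits 3:0.
    fn minor_rev(self) -> u8 {
        (self.0 & 0xf) as u8
    }

    /// Bits 7:4.
    fn major_rev(self) -> u8 {
        ((self.0 >> 4) & 0xf) as u8
    }
}

impl fmt::Debug for Boot0 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Raw value first, then each decoded field.
        f.debug_struct("Boot0")
            .field("raw", &format_args!("{:#x}", self.0))
            .field("major_rev", &self.major_rev())
            .field("minor_rev", &self.minor_rev())
            .finish()
    }
}
```

Formatting `Boot0(0xa1)` with `{:?}` then gives
`Boot0 { raw: 0xa1, major_rev: 10, minor_rev: 1 }`.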

> +        impl ::core::fmt::Debug for $name {
> +            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
> +                f.debug_tuple(stringify!($name))
> +                    .field(&format_args!("0x{0:x}", &self.0))
> +                    .finish()
> +            }
> +        }
> +
> +        impl core::ops::BitOr for $name {
> +            type Output = Self;
> +
> +            fn bitor(self, rhs: Self) -> Self::Output {
> +                Self(self.0 | rhs.0)
> +            }
> +        }
> +
> +        impl From<$name> for u32 {

Here and in a few more cases below: This needs the full path; also remember to
use absolute paths everywhere.

> +            fn from(reg: $name) -> u32 {
> +                reg.0
> +            }
> +        }
> +    };
> +}
> +
> +/// Helper macro for the `register` macro.
> +///
> +/// Defines the getter method for $field.
> +macro_rules! __reg_def_field_getter {
> +    (
> +        $hi:tt:$lo:tt $field:ident
> +            $(=> as $as_type:ty)?
> +            $(=> as_bit $bit_type:ty)?
> +            $(=> into $type:ty)?
> +            $(=> try_into $try_type:ty)?
> +        $(, $comment:expr)?
> +    ) => {
> +        $(
> +        #[doc=concat!("Returns the ", $comment)]
> +        )?
> +        #[inline]
> +        pub(crate) fn $field(self) -> $( $as_type )? $( $bit_type )? $( $type )? $( core::result::Result<$try_type, <$try_type as TryFrom<u32>>::Error> )? {

Please make sure to wrap lines with a length > 100.

> +            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
> +            const SHIFT: u32 = MASK.trailing_zeros();
> +            let field = (self.0 & MASK) >> SHIFT;
> +
> +            $( field as $as_type )?
> +            $(
> +            // TODO: it would be nice to throw a compile-time error if $hi != $lo as this means we
> +            // are considering more than one bit but returning a bool...

Would the following work?

	const _: () = {
	   build_assert!($hi == $lo);
	   ()
	};

Though I guess, the above definition of MASK already guarantees that $hi and $lo
are known at compile time.
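
As a standalone illustration outside the kernel, a plain `assert!` in a
const item is enough when both bounds are literals. The sketch below
assumes the intent of the TODO is to reject `as_bit` fields wider than one
bit; `bit_field_getter!`, `field_mask` and `Reg` are made-up names.

```rust
/// Mask covering bits `hi:lo` (inclusive), same formula as in the patch.
const fn field_mask(hi: u32, lo: u32) -> u32 {
    ((((1u32 << hi) - 1) << 1) + 1) - ((1u32 << lo) - 1)
}

/// Defines a boolean getter and fails the build if the field is wider than
/// a single bit.
macro_rules! bit_field_getter {
    ($hi:tt:$lo:tt $field:ident) => {
        fn $field(self) -> bool {
            // Evaluated at compile time since both bounds are literals.
            const _: () = assert!($hi == $lo, "`as_bit` fields must span exactly one bit");
            const MASK: u32 = field_mask($hi, $lo);
            (self.0 & MASK) != 0
        }
    };
}

/// Hypothetical register type for the example.
#[derive(Clone, Copy)]
struct Reg(u32);

impl Reg {
    bit_field_getter!(30:30 ecc_mode_enabled);
}
```

Changing the invocation to `31:30` makes the build fail with the message
above.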

> +            <$bit_type>::from(if field != 0 { true } else { false }) as $bit_type
> +            )?
> +            $( <$type>::from(field) )?
> +            $( <$try_type>::try_from(field) )?
> +        }
> +    }
> +}
> +
> +/// Helper macro for the `register` macro.
> +///
> +/// Defines all the field getter methods for `$name`.
> +macro_rules! __reg_def_getters {
> +    (
> +        $name:ident
> +        $(; $hi:tt:$lo:tt $field:ident
> +            $(=> as $as_type:ty)?
> +            $(=> as_bit $bit_type:ty)?
> +            $(=> into $type:ty)?
> +            $(=> try_into $try_type:ty)?
> +        $(, $field_comment:expr)?)* $(;)?
> +    ) => {
> +        #[allow(dead_code)]
> +        impl $name {
> +            $(
> +            __reg_def_field_getter!($hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)?);
> +            )*
> +        }
> +    };
> +}
> +
> +/// Helper macro for the `register` macro.
> +///
> +/// Defines the setter method for $field.
> +macro_rules! __reg_def_field_setter {
> +    (
> +        $hi:tt:$lo:tt $field:ident
> +            $(=> as $as_type:ty)?
> +            $(=> as_bit $bit_type:ty)?
> +            $(=> into $type:ty)?
> +            $(=> try_into $try_type:ty)?
> +        $(, $comment:expr)?
> +    ) => {
> +        kernel::macros::paste! {
> +        $(
> +        #[doc=concat!("Sets the ", $comment)]
> +        )?
> +        #[inline]
> +        pub(crate) fn [<set_ $field>](mut self, value: $( $as_type)? $( $bit_type )? $( $type )? $( $try_type)? ) -> Self {
> +            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
> +            const SHIFT: u32 = MASK.trailing_zeros();
> +
> +            let value = ((value as u32) << SHIFT) & MASK;
> +            self.0 = (self.0 & !MASK) | value;
> +            self
> +        }
> +        }
> +    };
> +}
> +
> +/// Helper macro for the `register` macro.
> +///
> +/// Defines all the field setter methods for `$name`.
> +macro_rules! __reg_def_setters {
> +    (
> +        $name:ident
> +        $(; $hi:tt:$lo:tt $field:ident
> +            $(=> as $as_type:ty)?
> +            $(=> as_bit $bit_type:ty)?
> +            $(=> into $type:ty)?
> +            $(=> try_into $try_type:ty)?
> +        $(, $field_comment:expr)?)* $(;)?
> +    ) => {
> +        #[allow(dead_code)]
> +        impl $name {
> +            $(
> +            __reg_def_field_setter!($hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)?);
> +            )*
> +        }
> +    };
> +}
> +
> +/// Defines a dedicated type for a register with an absolute offset, alongside with getter and
> +/// setter methods for its fields and methods to read and write it from an `Io` region.
> +///
> +/// Example:
> +///
> +/// ```no_run
> +/// register!(Boot0@0x00000100, "Basic revision information about the chip";
> +///     3:0     minor_rev => as u8, "minor revision of the chip";
> +///     7:4     major_rev => as u8, "major revision of the chip";
> +///     28:20   chipset => try_into Chipset, "chipset model"
> +/// );
> +/// ```
> +///
> +/// This defines a `Boot0` type which can be read or written from offset `0x100` of an `Io` region.
> +/// It is composed of 3 fields, for instance `minor_rev` is made of the 4 less significant bits of
> +/// the register. Each field can be accessed and modified using helper methods:
> +///
> +/// ```no_run
> +/// // Read from offset 0x100.
> +/// let boot0 = Boot0::read(&bar);
> +/// pr_info!("chip revision: {}.{}", boot0.major_rev(), boot0.minor_rev());
> +///
> +/// // `Chipset::try_from` will be called with the value of the field and returns an error if the
> +/// // value is invalid.
> +/// let chipset = boot0.chipset()?;
> +///
> +/// // Update some fields and write the value back.
> +/// boot0.set_major_rev(3).set_minor_rev(10).write(&bar);
> +///
> +/// // Or just update the register value in a single step:
> +/// Boot0::alter(&bar, |r| r.set_major_rev(3).set_minor_rev(10));
> +/// ```
> +///
> +/// Fields are made accessible using one of the following strategies:
> +///
> +/// - `as <type>` simply casts the field value to the requested type.
> +/// - `as_bit <type>` turns the field into a boolean and calls `<type>::from()` with the obtained
> +///   value. To be used with single-bit fields.
> +/// - `into <type>` calls `<type>::from()` on the value of the field. It is expected to handle all
> +///   the possible values for the bit range selected.
> +/// - `try_into <type>` calls `<type>::try_from()` on the value of the field and returns its
> +///   result.
> +///
> +/// The documentation strings are optional. If present, they will be added to the type or the field
> +/// getter and setter methods they are attached to.
> +///
> +/// Putting a `+` before the address of the register makes it relative to a base: the `read` and
> +/// `write` methods take a `base` argument that is added to the specified address before access,
> +/// and adds `try_read` and `try_write` methods to allow access with offsets unknown at
> +/// compile-time.
> +///
> +macro_rules! register {
> +    // Create a register at a fixed offset of the MMIO space.
> +    (
> +        $name:ident@$offset:expr $(, $type_comment:expr)?

Can we use this as a doc-comment?
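
For illustration only: a minimal standalone sketch of how an optional `expr` comment can be forwarded as documentation from a `macro_rules!` macro. `reg_type!` is a hypothetical, heavily simplified stand-in for `register!`, not the macro from this patch, and the helper macros of the patch may already do something equivalent.

```rust
// Hypothetical, simplified stand-in for `register!`: it only shows how an
// optional `expr` comment can be turned into a doc attribute.
macro_rules! reg_type {
    ($name:ident $(, $type_comment:expr)?) => {
        // Emitted only when a comment was given; `#[doc = ...]` is what a
        // `///` doc-comment desugars to.
        $(#[doc = $type_comment])?
        #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
        pub struct $name(pub u32);
    };
}

// Expands to a `Boot0` type documented with the given string.
reg_type!(Boot0, "Basic revision information about the chip");

// The comment is optional, so this form is accepted too.
reg_type!(Scratch);
```

The same forwarding should work for the per-field comments, attached to the generated getter and setter methods.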

> +        $(; $hi:tt:$lo:tt $field:ident
> +            $(=> as $as_type:ty)?
> +            $(=> as_bit $bit_type:ty)?
> +            $(=> into $type:ty)?
> +            $(=> try_into $try_type:ty)?
> +        $(, $field_comment:expr)?)* $(;)?
> +    ) => {
> +        __reg_def_common!($name);
> +
> +        #[allow(dead_code)]
> +        impl $name {
> +            #[inline]
> +            pub(crate) fn read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T) -> Self {

Not necessarily a PCI bar, could be any I/O type.

> +                Self(bar.read32($offset))
> +            }
> +
> +            #[inline]
> +            pub(crate) fn write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T) {
> +                bar.write32(self.0, $offset)
> +            }
> +
> +            #[inline]
> +            pub(crate) fn alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, f: F) {
> +                let reg = f(Self::read(bar));
> +                reg.write(bar);
> +            }
> +        }
> +
> +        __reg_def_getters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
> +
> +        __reg_def_setters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
> +    };
> +
> +    // Create a register at a relative offset from a base address.
> +    (
> +        $name:ident@+$offset:expr $(, $type_comment:expr)?
> +        $(; $hi:tt:$lo:tt $field:ident
> +            $(=> as $as_type:ty)?
> +            $(=> as_bit $bit_type:ty)?
> +            $(=> into $type:ty)?
> +            $(=> try_into $try_type:ty)?
> +        $(, $field_comment:expr)?)* $(;)?
> +    ) => {

I assume this is for cases where we have multiple instances of the same
controller, engine, etc. I think it would be good to add a small example for
this one too.
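
A rough sketch of what such an example could show, written as standalone Rust so it can be run outside the kernel. Everything here is an assumption made for illustration: `MockIo` stands in for an `Io` region, `EngineCtl` is hand-written to resemble what a relative `register!(EngineCtl@+0x10; ...)` definition might expand to, and the two base addresses are made up.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for an `Io` region: 32-bit registers addressed by
/// byte offset.
struct MockIo {
    regs: RefCell<Vec<u32>>,
}

impl MockIo {
    fn new(size: usize) -> Self {
        Self { regs: RefCell::new(vec![0; size / 4]) }
    }

    fn read32(&self, offset: usize) -> u32 {
        self.regs.borrow()[offset / 4]
    }

    fn write32(&self, value: u32, offset: usize) {
        self.regs.borrow_mut()[offset / 4] = value;
    }
}

/// Hand-written approximation of a register defined relative to a base.
#[derive(Clone, Copy, Default)]
struct EngineCtl(u32);

impl EngineCtl {
    /// Offset of the register from the base of its engine (made up).
    const OFFSET: usize = 0x10;

    fn read(io: &MockIo, base: usize) -> Self {
        Self(io.read32(base + Self::OFFSET))
    }

    fn write(self, io: &MockIo, base: usize) {
        io.write32(self.0, base + Self::OFFSET)
    }

    /// Getter for the single-bit `enable` field (bit 0).
    fn enable(self) -> bool {
        self.0 & 1 != 0
    }

    /// Setter for the single-bit `enable` field (bit 0).
    fn set_enable(self, v: bool) -> Self {
        Self((self.0 & !1) | v as u32)
    }
}

// Hypothetical bases of two instances of the same engine.
const ENGINE0_BASE: usize = 0x100;
const ENGINE1_BASE: usize = 0x200;

/// Enables the second engine only and returns the `enable` bit of both.
fn enable_second_engine(io: &MockIo) -> (bool, bool) {
    EngineCtl::read(io, ENGINE1_BASE)
        .set_enable(true)
        .write(io, ENGINE1_BASE);

    (
        EngineCtl::read(io, ENGINE0_BASE).enable(),
        EngineCtl::read(io, ENGINE1_BASE).enable(),
    )
}
```

The point of the relative form is that one register definition serves every instance; only the `base` argument changes at the call site.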

> +        __reg_def_common!($name);
> +
> +        #[allow(dead_code)]
> +        impl $name {
> +            #[inline]
> +            pub(crate) fn read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T, base: usize) -> Self {
> +                Self(bar.read32(base + $offset))
> +            }
> +
> +            #[inline]
> +            pub(crate) fn write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T, base: usize) {
> +                bar.write32(self.0, base + $offset)
> +            }
> +
> +            #[inline]
> +            pub(crate) fn alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, base: usize, f: F) {
> +                let reg = f(Self::read(bar, base));
> +                reg.write(bar, base);
> +            }
> +
> +            #[inline]
> +            pub(crate) fn try_read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T, base: usize) -> ::kernel::error::Result<Self> {
> +                bar.try_read32(base + $offset).map(Self)
> +            }
> +
> +            #[inline]
> +            pub(crate) fn try_write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T, base: usize) -> ::kernel::error::Result<()> {
> +                bar.try_write32(self.0, base + $offset)
> +            }
> +
> +            #[inline]
> +            pub(crate) fn try_alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, base: usize, f: F) -> ::kernel::error::Result<()> {
> +                let reg = f(Self::try_read(bar, base)?);
> +                reg.try_write(bar, base)
> +            }
> +        }
> +
> +        __reg_def_getters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
> +
> +        __reg_def_setters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
> +    };
> +}
> 
> -- 
> 2.49.0
> 


* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-21 21:45   ` Joel Fernandes
@ 2025-04-22 11:28     ` Danilo Krummrich
  2025-04-22 13:06       ` Alexandre Courbot
  0 siblings, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 11:28 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Mon, Apr 21, 2025 at 05:45:33PM -0400, Joel Fernandes wrote:
> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
> > diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
> > --- /dev/null
> > +++ b/drivers/gpu/nova-core/devinit.rs
> > @@ -0,0 +1,40 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +//! Methods for device initialization.
> 
> Let us clarify what devinit means.
> 
> devinit is a sequence of register read/writes after reset that performs tasks
> such as:
> 1. Programming VRAM memory controller timings.
> 2. Power sequencing.
> 3. Clock and PLL configuration.
> 4. Thermal management.
> 5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
> only part of memory and then kicks off 'async scrubbing'.
> 
> devinit itself is a 'script' which is interpreted by the PMU microcontroller
> of the GPU by an interpreter program.
> 
> Note that devinit also needs to run during suspend/resume at runtime.

Thanks for writing this up. I fully agree that those things have to be
documented.

> 
> I talked with Alex and I could add a new patch on top of this patch to add these
> clarifying 'doc' comments as well. I will commit them to my git branch and send
> on top of this as needed, but Alex can feel free to decide to squash them as well.

Fine with both, whatever you guys prefer.

> 
> > +
> > +use kernel::bindings;
> > +use kernel::devres::Devres;
> > +use kernel::prelude::*;
> > +
> > +use crate::driver::Bar0;
> > +use crate::regs;
> > +
> > +/// Wait for devinit FW completion.
> > +///
> > +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
> > +/// considered unusable until this step is completed, so it must be waited on very early during
> > +/// driver initialization.
> > +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
> 
> To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
> term used for VBIOS ROM components several of which execute before the driver
> loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.

Yes, we should absolutely explain acronyms as well as use consistent and defined
terminology when referring to things.

I think we should put both into Documentation/gpu/nova/ and add the
corresponding pointers in the code.

> I find this Rust convention for camel casing long constants very unreadable and
> troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
> for sake of readability. Could the Rust community / maintainers provide some input?
> 
> Apart from readability, it also makes searching for the same register name a
> nightmare with other code bases written in C.
> 
> Couple of ideas discussed:
> 
> 1. May be have a macro that converts
> REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
> regs::Pgc6AonSecureScratchGroup05 ?
> But not sure what it takes on the rust side to implement a macro like that.
> 
> 2. Adding doc comments both in regs.rs during defining the register, and
> possibly at the caller site. This still does address the issue fully.

If that addresses your concern, it sounds totally reasonable to me.


* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-20 12:19 ` [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
  2025-04-21 21:45   ` Joel Fernandes
@ 2025-04-22 11:36   ` Danilo Krummrich
  2025-04-29 12:48     ` Alexandre Courbot
  1 sibling, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 11:36 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sun, Apr 20, 2025 at 09:19:40PM +0900, Alexandre Courbot wrote:
> Upon reset, the GPU executes the GFW_BOOT firmware in order to
> initialize its base parameters such as clocks. The driver must ensure
> that this step is completed before using the hardware.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/driver.rs    |  2 +-
>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>  5 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
> --- /dev/null
> +++ b/drivers/gpu/nova-core/devinit.rs
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Methods for device initialization.
> +
> +use kernel::bindings;
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +
> +/// Wait for devinit FW completion.
> +///
> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
> +/// considered unusable until this step is completed, so it must be waited on very early during
> +/// driver initialization.
> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
> +    let mut timeout = 2000;
> +
> +    loop {
> +        let gfw_booted = with_bar!(
> +            bar,
> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
> +                .read_protection_level0_enabled()
> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
> +        )?;
> +
> +        if gfw_booted {
> +            return Ok(());
> +        }
> +
> +        if timeout == 0 {
> +            return Err(ETIMEDOUT);
> +        }
> +        timeout -= 1;
> +
> +        // SAFETY: msleep should be safe to call with any parameter.
> +        unsafe { bindings::msleep(2) };

I assume this goes away with [1]? Can we please add a corresponding TODO? Also,
do you mind preparing the follow-up patches for cases like this (there's also
the transmute one), so that we can apply them once the dependencies have landed
and verify that they suit our needs?

[1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/
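
To make the shape of such a follow-up concrete, here is a standalone sketch of the same bounded polling loop with the sleep passed in as a safe closure. This is not the abstraction from [1]: `poll_until` and `TimedOut` are hypothetical names, and `TimedOut` merely stands in for `ETIMEDOUT`.

```rust
use std::time::Duration;

/// Error returned when the condition never became true (stands in for
/// `ETIMEDOUT`).
#[derive(Debug, PartialEq, Eq)]
struct TimedOut;

/// Polls `cond` up to `retries + 1` times, calling `sleep(delay)` between
/// attempts.
fn poll_until<C, S>(
    mut cond: C,
    retries: u32,
    delay: Duration,
    mut sleep: S,
) -> Result<(), TimedOut>
where
    C: FnMut() -> bool,
    S: FnMut(Duration),
{
    for attempt in 0..=retries {
        if cond() {
            return Ok(());
        }

        // Do not sleep after the final failed attempt.
        if attempt < retries {
            sleep(delay);
        }
    }

    Err(TimedOut)
}
```

With `retries = 2000` and a 2 ms delay this performs the same number of checks and sleeps as the loop in the patch, and the only place that would need to change once a safe sleep helper is available is the closure passed as `sleep`.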

> +    }
> +}
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>      pub(crate) gpu: Gpu,
>  }
>  
> -const BAR0_SIZE: usize = 8;
> +const BAR0_SIZE: usize = 0x1000000;
>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>  
>  kernel::pci_device_table!(
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -2,6 +2,7 @@
>  
>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>  
> +use crate::devinit;
>  use crate::driver::Bar0;
>  use crate::firmware::Firmware;
>  use crate::regs;
> @@ -168,6 +169,10 @@ pub(crate) fn new(
>              spec.revision
>          );
>  
> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
> +        devinit::wait_gfw_boot_completion(&bar)
> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
> +
>          Ok(pin_init!(Self { spec, bar, fw }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>      }
>  }
>  
> +mod devinit;
>  mod driver;
>  mod firmware;
>  mod gpu;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -13,3 +13,14 @@
>      7:4     major_rev => as u8, "major revision of the chip";
>      28:20   chipset => try_into Chipset, "chipset model"
>  );
> +
> +/* GC6 */
> +
> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
> +    0:0     read_protection_level0_enabled => as_bit bool
> +);
> +
> +/* TODO: This is an array of registers. */
> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
> +    31:0    value => as u32
> +);

Please also document new register definitions.


* Re: [PATCH 09/16] gpu: nova-core: register sysmem flush page
  2025-04-20 12:19 ` [PATCH 09/16] gpu: nova-core: register sysmem flush page Alexandre Courbot
@ 2025-04-22 11:45   ` Danilo Krummrich
  2025-04-23 13:03     ` Alexandre Courbot
  2025-04-22 18:50   ` Joel Fernandes
  1 sibling, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 11:45 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sun, Apr 20, 2025 at 09:19:41PM +0900, Alexandre Courbot wrote:
> A page of system memory is reserved so sysmembar can perform a read on
> it if a system write occurred since the last flush. Do this early as it
> can be required to e.g. reset the GPU falcons.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/dma.rs       | 54 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/gpu.rs       | 53 +++++++++++++++++++++++++++++++++++--
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 10 +++++++
>  4 files changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..a4162bff597132a04e002b2b910a4537bbabc287
> --- /dev/null
> +++ b/drivers/gpu/nova-core/dma.rs
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Simple DMA object wrapper.
> +
> +// To be removed when all code is used.
> +#![allow(dead_code)]
> +
> +use kernel::device;
> +use kernel::dma::CoherentAllocation;
> +use kernel::page::PAGE_SIZE;
> +use kernel::prelude::*;
> +
> +pub(crate) struct DmaObject {
> +    pub dma: CoherentAllocation<u8>,
> +    pub len: usize,

This should be covered by CoherentAllocation already, no? If it does not have a
public accessor for its size, please add it for CoherentAllocation instead. I
can take the corresponding patch through the nova tree.

> +    #[allow(dead_code)]

Please prefer #[expect(dead_code)], such that we are forced to remove it once
it's subsequently used.
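
A small standalone sketch of the difference, assuming a toolchain where `#[expect]` is available. The struct below is a trimmed-down, hypothetical version of the one in the patch, with private fields so that the lint applies in a plain crate.

```rust
/// Hypothetical, trimmed-down version of the structure under review.
pub struct DmaObject {
    len: usize,
    // Nothing reads `name` yet, so `dead_code` should fire here and the
    // expectation is fulfilled. Once a reader is added, the compiler is
    // expected to flag the attribute as stale
    // (`unfulfilled_lint_expectations`), whereas `#[allow(dead_code)]`
    // would stay around silently.
    #[expect(dead_code)]
    name: &'static str,
}

impl DmaObject {
    pub fn new(len: usize, name: &'static str) -> Self {
        Self { len, name }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
```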

> +    pub name: &'static str,
> +}
> +
> +impl DmaObject {
> +    pub(crate) fn new(
> +        dev: &device::Device<device::Bound>,
> +        len: usize,
> +        name: &'static str,
> +    ) -> Result<Self> {
> +        let len = core::alloc::Layout::from_size_align(len, PAGE_SIZE)
> +            .map_err(|_| EINVAL)?
> +            .pad_to_align()
> +            .size();
> +        let dma = CoherentAllocation::alloc_coherent(dev, len, GFP_KERNEL | __GFP_ZERO)?;
> +
> +        Ok(Self { dma, len, name })
> +    }
> +
> +    pub(crate) fn from_data(
> +        dev: &device::Device<device::Bound>,
> +        data: &[u8],
> +        name: &'static str,
> +    ) -> Result<Self> {
> +        Self::new(dev, data.len(), name).and_then(|mut dma_obj| {
> +            // SAFETY:
> +            // - The copied data fits within the size of the allocated object.
> +            // - We have just created this object and there is no other user at this stage.
> +            unsafe {
> +                core::ptr::copy_nonoverlapping(
> +                    data.as_ptr(),
> +                    dma_obj.dma.start_ptr_mut(),
> +                    data.len(),
> +                );
> +            }
> +            Ok(dma_obj)
> +        })
> +    }
> +}

The DMA wrapper should probably be added in a separate patch.

> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 1f7799692a0ab042f2540e01414f5ca347ae9ecc..d43e710cc983d51f053dacbd77cbbfb79fa882c3 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -3,6 +3,7 @@
>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>  
>  use crate::devinit;
> +use crate::dma::DmaObject;
>  use crate::driver::Bar0;
>  use crate::firmware::Firmware;
>  use crate::regs;
> @@ -145,12 +146,30 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
>  }
>  
>  /// Structure holding the resources required to operate the GPU.
> -#[pin_data]
> +#[pin_data(PinnedDrop)]
>  pub(crate) struct Gpu {
>      spec: Spec,
>      /// MMIO mapping of PCI BAR 0
>      bar: Devres<Bar0>,
>      fw: Firmware,
> +    sysmem_flush: DmaObject,

Please add a doc-comment for this DmaObject explaining what it is used for by
the driver and why it is needed.

> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for Gpu {
> +    fn drop(self: Pin<&mut Self>) {
> +        // Unregister the sysmem flush page before we release it.
> +        let _ = with_bar!(&self.bar, |b| {
> +            regs::PfbNisoFlushSysmemAddr::default()
> +                .set_adr_39_08(0)
> +                .write(b);
> +            if self.spec.chipset >= Chipset::GA102 {
> +                regs::PfbNisoFlushSysmemAddrHi::default()
> +                    .set_adr_63_40(0)
> +                    .write(b);
> +            }
> +        });
> +    }
>  }
>  
>  impl Gpu {
> @@ -173,6 +192,36 @@ pub(crate) fn new(
>          devinit::wait_gfw_boot_completion(&bar)
>              .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>  
> -        Ok(pin_init!(Self { spec, bar, fw }))
> +        // System memory page required for sysmembar to properly flush into system memory.
> +        let sysmem_flush = {
> +            let page = DmaObject::new(
> +                pdev.as_ref(),
> +                kernel::bindings::PAGE_SIZE,
> +                "sysmem flush page",
> +            )?;
> +
> +            // Register the sysmem flush page.
> +            with_bar!(bar, |b| {
> +                let handle = page.dma.dma_handle();
> +
> +                regs::PfbNisoFlushSysmemAddr::default()
> +                    .set_adr_39_08((handle >> 8) as u32)
> +                    .write(b);
> +                if spec.chipset >= Chipset::GA102 {
> +                    regs::PfbNisoFlushSysmemAddrHi::default()
> +                        .set_adr_63_40((handle >> 40) as u32)
> +                        .write(b);
> +                }
> +            })?;
> +
> +            page
> +        };
> +
> +        Ok(pin_init!(Self {
> +            spec,
> +            bar,
> +            fw,
> +            sysmem_flush,
> +        }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 878161e060f54da7738c656f6098936a62dcaa93..37c7eb0ea7a926bee4e3c661028847291bf07fa2 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -21,6 +21,7 @@ macro_rules! with_bar {
>  }
>  
>  mod devinit;
> +mod dma;
>  mod driver;
>  mod firmware;
>  mod gpu;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac..1e24787c4b5f432ac25fe399c8cb38b7350e44ae 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -14,6 +14,16 @@
>      28:20   chipset => try_into Chipset, "chipset model"
>  );
>  
> +/* PFB */
> +
> +register!(PfbNisoFlushSysmemAddr@0x00100c10;
> +    31:0    adr_39_08 => as u32
> +);
> +
> +register!(PfbNisoFlushSysmemAddrHi@0x00100c40;
> +    23:0    adr_63_40 => as u32
> +);

Please add some documentation for the register and its fields.


* Re: [PATCH 10/16] gpu: nova-core: add basic timer device
  2025-04-20 12:19 ` [PATCH 10/16] gpu: nova-core: add basic timer device Alexandre Courbot
@ 2025-04-22 12:07   ` Danilo Krummrich
  2025-04-29 13:13     ` Alexandre Courbot
  0 siblings, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 12:07 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sun, Apr 20, 2025 at 09:19:42PM +0900, Alexandre Courbot wrote:
> Add a timer that works with GPU time and provides the ability to wait on
> a condition with a specific timeout.

What can this timer do for us that an HrTimer can't?

> 
> The `Duration` Rust type is used to keep track of differences between
> timestamps; this will be replaced by the equivalent kernel type once it
> lands.

Fine for me -- can you please add a corresponding TODO and add it to your list
of follow-up patches?

> diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..8987352f4192bc9b4b2fc0fb5f2e8e62ff27be68
> --- /dev/null
> +++ b/drivers/gpu/nova-core/timer.rs
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Nova Core Timer subdevice
> +
> +// To be removed when all code is used.
> +#![allow(dead_code)]

Please prefer 'expect'.

> +
> +use core::fmt::Display;
> +use core::ops::{Add, Sub};
> +use core::time::Duration;
> +
> +use kernel::devres::Devres;
> +use kernel::num::U64Ext;
> +use kernel::prelude::*;
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +
> +/// A timestamp with nanosecond granularity obtained from the GPU timer.
> +///
> +/// A timestamp can also be subtracted from another in order to obtain a [`Duration`].
> +#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
> +pub(crate) struct Timestamp(u64);
> +
> +impl Display for Timestamp {
> +    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
> +        write!(f, "{}", self.0)
> +    }
> +}
> +
> +impl Add<Duration> for Timestamp {
> +    type Output = Self;
> +
> +    fn add(mut self, rhs: Duration) -> Self::Output {
> +        let mut nanos = rhs.as_nanos();
> +        while nanos > u64::MAX as u128 {
> +            self.0 = self.0.wrapping_add(nanos as u64);
> +            nanos -= u64::MAX as u128;
> +        }
> +
> +        Timestamp(self.0.wrapping_add(nanos as u64))
> +    }
> +}
> +
> +impl Sub for Timestamp {
> +    type Output = Duration;
> +
> +    fn sub(self, rhs: Self) -> Self::Output {
> +        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
> +    }
> +}
> +
> +pub(crate) struct Timer {}
> +
> +impl Timer {
> +    pub(crate) fn new() -> Self {
> +        Self {}
> +    }
> +
> +    /// Read the current timer timestamp.
> +    pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
> +        loop {
> +            let hi = regs::PtimerTime1::read(bar);
> +            let lo = regs::PtimerTime0::read(bar);
> +
> +            if hi.hi() == regs::PtimerTime1::read(bar).hi() {
> +                return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
> +            }

So, if hi did not change while we read both hi and lo, we can trust both
values. Probably worth adding a brief comment.

Additionally, we may want to mention that if we get unlucky, it takes around 4s
to get unlucky again, even though that's rather obvious.
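
A standalone sketch of the read loop with the kind of comment meant here. `read_hi` and `read_lo` are hypothetical closures standing in for the `PtimerTime1` and `PtimerTime0` register reads, so that the pattern can be exercised without hardware.

```rust
/// Reads a 64-bit counter exposed as two 32-bit halves.
///
/// `read_hi` and `read_lo` are stand-ins for the two register reads.
fn read_timestamp(
    mut read_hi: impl FnMut() -> u32,
    mut read_lo: impl FnMut() -> u32,
) -> u64 {
    loop {
        let hi = read_hi();
        let lo = read_lo();

        // If the high word is unchanged after the low word was read, the low
        // word did not wrap in between, so `hi` and `lo` belong together.
        if hi == read_hi() {
            return (u64::from(hi) << 32) | u64::from(lo);
        }

        // Otherwise the low word wrapped between the reads: try again. With
        // nanosecond granularity the low word wraps about every 4.29 s
        // (2^32 ns), so hitting this path twice in a row is very unlikely.
    }
}
```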

> +        }
> +    }
> +
> +    #[allow(dead_code)]
> +    pub(crate) fn time(bar: &Bar0, time: u64) {
> +        regs::PtimerTime1::default()
> +            .set_hi(time.upper_32_bits())
> +            .write(bar);
> +        regs::PtimerTime0::default()
> +            .set_lo(time.lower_32_bits())
> +            .write(bar);
> +    }
> +
> +    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
> +    ///
> +    /// When `cond` evaluates to `Some`, its return value is returned.
> +    ///
> +    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
> +    /// `Some`, or if the timer device is stuck for some reason.
> +    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
> +        &self,
> +        bar: &Devres<Bar0>,
> +        timeout: Duration,
> +        cond: F,
> +    ) -> Result<R> {
> +        // Number of consecutive time reads after which we consider the timer frozen if it hasn't
> +        // moved forward.
> +        const MAX_STALLED_READS: usize = 16;

Huh! Can't we trust the timer hardware? Probably one more reason to use HrTimer?

> +
> +        let (mut cur_time, mut prev_time, deadline) = {
> +            let cur_time = with_bar!(bar, |b| self.read(b))?;
> +            let deadline = cur_time + timeout;
> +
> +            (cur_time, cur_time, deadline)
> +        };
> +        let mut num_reads = 0;
> +
> +        loop {
> +            if let Some(ret) = cond() {
> +                return Ok(ret);
> +            }
> +
> +            (|| {
> +                cur_time = with_bar!(bar, |b| self.read(b))?;
> +
> +                /* Check if the timer is frozen for some reason. */
> +                if cur_time == prev_time {
> +                    if num_reads >= MAX_STALLED_READS {
> +                        return Err(ETIMEDOUT);
> +                    }
> +                    num_reads += 1;
> +                } else {
> +                    if cur_time >= deadline {
> +                        return Err(ETIMEDOUT);
> +                    }
> +
> +                    num_reads = 0;
> +                    prev_time = cur_time;
> +                }
> +
> +                Ok(())
> +            })()?;
> +        }
> +    }
> +}
> 
> -- 
> 2.49.0
> 


* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-22 11:28     ` Danilo Krummrich
@ 2025-04-22 13:06       ` Alexandre Courbot
  2025-04-22 13:46         ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-22 13:06 UTC (permalink / raw)
  To: Danilo Krummrich, Joel Fernandes
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel

Hi Joel, Danilo,

On Tue Apr 22, 2025 at 8:28 PM JST, Danilo Krummrich wrote:
> On Mon, Apr 21, 2025 at 05:45:33PM -0400, Joel Fernandes wrote:
>> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
>> > diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>> > new file mode 100644
>> > index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>> > --- /dev/null
>> > +++ b/drivers/gpu/nova-core/devinit.rs
>> > @@ -0,0 +1,40 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +
>> > +//! Methods for device initialization.
>> 
>> Let us clarify what devinit means.
>> 
>> devinit is a sequence of register read/writes after reset that performs tasks
>> such as:
>> 1. Programming VRAM memory controller timings.
>> 2. Power sequencing.
>> 3. Clock and PLL configuration.
>> 4. Thermal management.
>> 5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
>> only part of memory and then kicks off 'async scrubbing'.
>> 
>> devinit itself is a 'script' which is interpreted by the PMU microcontroller
>> of the GPU by an interpreter program.
>> 
>> Note that devinit also needs to run during suspend/resume at runtime.
>
> Thanks for writing this up. I fully agree that those things have to be
> documented.
>
>> 
>> I talked with Alex and I could add a new patch on top of this patch to add these
>> clarifying 'doc' comments as well. I will commit them to my git branch and send
>> on top of this as needed, but Alex can feel free to decide to squash them as well.
>
> Fine with both, whatever you guys prefer.

If that works for you, I will put Joel's patches improving the
documentation right after mine adding the code in the next revision. I
know this ideally should be a single patch, but researching this stuff
(and producing a proper writeup) is quite involved and a separate kind
of task from the quickly-translate-code-while-peeking-at-OpenRM work
that I did.

>
>> 
>> > +
>> > +use kernel::bindings;
>> > +use kernel::devres::Devres;
>> > +use kernel::prelude::*;
>> > +
>> > +use crate::driver::Bar0;
>> > +use crate::regs;
>> > +
>> > +/// Wait for devinit FW completion.
>> > +///
>> > +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>> > +/// considered unusable until this step is completed, so it must be waited on very early during
>> > +/// driver initialization.
>> > +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>> 
>> To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
>> term used for VBIOS ROM components several of which execute before the driver
>> loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.
>
> Yes, we should absolutely explain acronyms as well as use consistent and defined
> terminology when referring to things.
>
> I think we should put both into Documentation/gpu/nova/ and add the
> corresponding pointers in the code.

SGTM.

>
>> I find this Rust convention for camel casing long constants very unreadable and
>> troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
>> for sake of readability. Could the Rust community / maintainers provide some input?
>> 
>> Apart from readability, it also makes searching for the same register name a
>> nightmare with other code bases written in C.
>> 
>> Couple of ideas discussed:
>> 
>> 1. May be have a macro that converts
>> REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
>> regs::Pgc6AonSecureScratchGroup05 ?
>> But not sure what it takes on the rust side to implement a macro like that.
>> 
>> 2. Adding doc comments both in regs.rs during defining the register, and
>> possibly at the caller site. This still does address the issue fully.
>
> If that addresses your concern, it sounds totally reasonable to me.

Sorry, I'm having trouble understanding what you guys are agreeing on.
:)

The most radical option would be to define the registers directly as
capital snake-case (NV_PGC6_...), basically a 1:1 match with OpenRM.
This would be the easiest way to cross-reference, but goes against the
Rust naming conventions. If we go all the way, this also means the field
accessors would be capital snake-case, unless we figure out a smart
macro to work this around...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-22 13:06       ` Alexandre Courbot
@ 2025-04-22 13:46         ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-22 13:46 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel

On 4/22/2025 9:06 AM, Alexandre Courbot wrote:
> On Tue Apr 22, 2025 at 8:28 PM JST, Danilo Krummrich wrote:
>> On Mon, Apr 21, 2025 at 05:45:33PM -0400, Joel Fernandes wrote:
>>> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
>>>> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>>>> new file mode 100644
>>>> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/nova-core/devinit.rs
>>>> @@ -0,0 +1,40 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +
>>>> +//! Methods for device initialization.
>>>
>>> Let us clarify what devinit means.
>>>
>>> devinit is a sequence of register read/writes after reset that performs tasks
>>> such as:
>>> 1. Programming VRAM memory controller timings.
>>> 2. Power sequencing.
>>> 3. Clock and PLL configuration.
>>> 4. Thermal management.
>>> 5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
>>> only part of memory and then kicks of 'async scrubbing'.
>>>
>>> devinit itself is a 'script' which is interpreted by the PMU microcontroller of
>>> of the GPU by an interpreter program.
>>>
>>> Note that devinit also needs to run during suspend/resume at runtime.
>>
>> Thanks for writing this up. I fully agree that those things have to be
>> documented.

Thanks.

>>> I talked with Alex and I could add a new patch on top of this patch to add these
>>> clarifying 'doc' comments as well. I will commit them to my git branch and send
>>> on top of this as needed, but Alex can feel free to decide to squash them as well.
>>
>> Fine with both, whatever you guys prefer.
> 
> If that works with you, I will put Joel's patches improving the
> documentation right after mines adding the code in the next revision. I
> know this ideally should be a single patch, but researching this stuff
> (and producing a proper writeup) is quite involved and a separate kind
> of task from the quickly-translate-code-while-peeking-at-OpenRM work
> that I did. 

From my side, this makes sense to me.

>>>> +
>>>> +use kernel::bindings;
>>>> +use kernel::devres::Devres;
>>>> +use kernel::prelude::*;
>>>> +
>>>> +use crate::driver::Bar0;
>>>> +use crate::regs;
>>>> +
>>>> +/// Wait for devinit FW completion.
>>>> +///
>>>> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>>>> +/// considered unusable until this step is completed, so it must be waited on very early during
>>>> +/// driver initialization.
>>>> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>>>
>>> To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
>>> term used for VBIOS ROM components several of which execute before the driver
>>> loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.
>>
>> Yes, we should absolutely explain acronyms as well as use consistent and defined
>> terminology when referring to things.
>>
>> I think we should put both into Documentation/gpu/nova/ and add the
>> corresponding pointers in the code.
> 
> SGTM.

Ack.

>>> I find this Rust convention for camel casing long constants very unreadable and
>>> troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
>>> for sake of readability. Could the Rust community / maintainers provide some input?
>>>
>>> Apart from readability, it also makes searching for the same register name a
>>> nightmare with other code bases written in C.
>>>
>>> Couple of ideas discussed:
>>>
>>> 1. May be have a macro that converts
>>> REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
>>> regs::Pgc6AonSecureScratchGroup05 ?
>>> But not sure what it takes on the rust side to implement a macro like that.
>>>
>>> 2. Adding doc comments both in regs.rs during defining the register, and
>>> possibly at the caller site. This still does address the issue fully.
>>
>> If that addresses your concern, it sounds totally reasonable to me.
> 
> Sorry, I'm having trouble understanding what you guys are agreeing on.
> :)
> 
> The most radical option would be to define the registers directly as
> capital snake-case (NV_PGC6_...), basically a 1:1 match with OpenRM.
> This would be the easiest way to cross-reference, but goes against the
> Rust naming conventions. If we go all the way, this also means the field
> accessors would be capital snake-case, unless we figure out a smart
> macro to work this around...

I think the accessors can still be lower case, because we can do something like:

pgc6_reg = NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK;

pgc6_reg.field_accessor();

?

Since Rust convention already allows capital snake-case for statics and
constants, I think we should aim for this exception for Nova register defs
before discussing other options (i.e. directly replacing the definition of the
register from camel case to capital snake case), as it is consistent with
Nouveau, Open RM etc. I think it will make things so much easier (and probably
less error-prone and more maintainable), such as translation of GSP headers to
Rust, string search across Nouveau, Open RM repositories etc.
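
To make the idea a bit more concrete, here is a rough standalone sketch (not
actual Nova code). It assumes the register is a type rather than a constant,
so it needs a lint override; the from_raw() helper, the field name and the bit
position are all made up for illustration:

```rust
/// Register type named exactly like the OpenRM define. The lint override is
/// needed because Rust expects type names to be camel case.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK(u32);

impl NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK {
    /// Wraps a raw value (hypothetical helper; a real driver would read BAR0).
    pub const fn from_raw(raw: u32) -> Self {
        Self(raw)
    }

    /// Field accessor, still lower snake-case. Bit 0 is a made-up position.
    pub const fn read_protection_level0(self) -> bool {
        self.0 & 0x1 != 0
    }

    /// Builder-style setter for the same made-up field.
    pub const fn set_read_protection_level0(self, value: bool) -> Self {
        Self((self.0 & !0x1) | value as u32)
    }
}
```

With something like this, a string search for the NV_PGC6_... name hits both
the C and the Rust code, while call sites keep the usual lower-case accessors.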

Thoughts?

thanks,

 - Joel





^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
  2025-04-22  8:40 ` [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
@ 2025-04-22 14:12   ` Alexandre Courbot
  0 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-22 14:12 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel,
	Sergio González Collado

Hi Danilo,

On Tue Apr 22, 2025 at 5:40 PM JST, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:32PM +0900, Alexandre Courbot wrote:
>> Hi everyone,
>> 
>> This series is a continuation of my previous RFCs [1] to complete the
>> first step of GSP booting (running the FWSEC-FRTS firmware extracted
>> from the BIOS) on Ampere devices. While it is still far from bringing
>> the GPU into a state where it can do anything useful, it sets up the
>> basic layout of the driver upon which we can build in order to continue
>> with the next steps of GSP booting, as well as supporting more chipsets.
>> 
>> Upon successful probe, the driver will display the range of the WPR2
>> region constructed by FWSEC-FRTS:
>> 
>>   [   95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
>>   [   95.436002] NovaCore 0000:01:00.0: GPU instance built
>> 
>> This code is based on nova-next with the try_access_with patch [2].
>
> Please make sure to compile with CLIPPY=1, the series has quite some clippy
> warnings.

Indeed, I just tried and it wasn't pretty - sorry for the omission.

>
> I also noticed that there are a lot of compiler warnings about unreachable pub
> fields with rustc 1.78, whereas with the latest stable compiler there are none.
>
> I'm not exactly sure why that is (and I haven't looked further), but the
> corresponding fields indeed seem to have unnecessary pub visibility.

I'll try building with 1.78 and fix that.

>
>> There is still a bit of unsafe code where it is not desired, notably to
>> transmute byte slices into types that implement FromBytes - this is
>> because support for doing such transmute operations safely are not in
>> the kernel crate yet.
>
> I assume you refer to [3]? As long as we put a TODO and follow up once the
> series lands, that's fine for me.

Yes, that's the idea. Will do.
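
For context, a rough standalone sketch of the kind of helper that would let
the driver drop its own unsafe blocks. This is not the API from [3]: the
FromBytes trait below is a local stand-in for the kernel one, and
from_bytes_copy() and Header are hypothetical names used for illustration
only:

```rust
use core::mem::size_of;

/// Local stand-in for the kernel's `FromBytes` marker: implementors promise
/// that every bit pattern is a valid value of the type.
pub unsafe trait FromBytes: Copy {}

/// Made-up firmware header used only for this example.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Header {
    pub magic: u32,
    pub len: u32,
}

// SAFETY: `Header` is `repr(C)`, has no padding and only contains integers.
unsafe impl FromBytes for Header {}

/// Copies a `T` out of the start of `bytes`, or returns `None` if the slice
/// is too short to contain one.
pub fn from_bytes_copy<T: FromBytes>(bytes: &[u8]) -> Option<T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    // SAFETY: at least `size_of::<T>()` bytes are readable, `read_unaligned`
    // has no alignment requirement, and `T: FromBytes` accepts any bit pattern.
    Some(unsafe { core::ptr::read_unaligned(bytes.as_ptr().cast::<T>()) })
}
```

The unsafe part then lives in one audited place, and callers only deal with
an Option.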

>
>> 
>> [1] https://lore.kernel.org/rust-for-linux/20250320-nova_timer-v3-0-79aa2ad25a79@nvidia.com/
>> [2] https://lore.kernel.org/rust-for-linux/20250411-try_with-v4-0-f470ac79e2e2@nvidia.com/
> [3] https://lore.kernel.org/lkml/20250330234039.29814-1-christiansantoslima21@gmail.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-20 12:19 ` [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
@ 2025-04-22 14:44   ` Danilo Krummrich
  2025-04-30  6:58     ` Joel Fernandes
  2025-04-30 13:25     ` Alexandre Courbot
  0 siblings, 2 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 14:44 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

This patch could probably be split up a bit, to make it more pleasant to review. :)

On Sun, Apr 20, 2025 at 09:19:43PM +0900, Alexandre Courbot wrote:
> 
> +#[repr(u8)]
> +#[derive(Debug, Default, Copy, Clone)]
> +pub(crate) enum FalconSecurityModel {
> +    #[default]
> +    None = 0,
> +    Light = 2,
> +    Heavy = 3,
> +}

Please add an explanation for the different security models. What are the
differences?

I think most of the structures, registers, abbreviations, etc. introduced in
this patch need some documentation.

Please see https://docs.kernel.org/gpu/nova/guidelines.html#documentation.

> +
> +impl TryFrom<u32> for FalconSecurityModel {
> +    type Error = Error;
> +
> +    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
> +        use FalconSecurityModel::*;
> +
> +        let sec_model = match value {
> +            0 => None,
> +            2 => Light,
> +            3 => Heavy,
> +            _ => return Err(EINVAL),
> +        };
> +
> +        Ok(sec_model)
> +    }
> +}
> +
> +#[repr(u8)]
> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
> +pub(crate) enum FalconCoreRevSubversion {
> +    #[default]
> +    Subversion0 = 0,
> +    Subversion1 = 1,
> +    Subversion2 = 2,
> +    Subversion3 = 3,
> +}
> +
> +impl From<u32> for FalconCoreRevSubversion {
> +    fn from(value: u32) -> Self {
> +        use FalconCoreRevSubversion::*;
> +
> +        match value & 0b11 {
> +            0 => Subversion0,
> +            1 => Subversion1,
> +            2 => Subversion2,
> +            3 => Subversion3,
> +            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
> +            4..=u32::MAX => unsafe { unreachable_unchecked() },
> +        }

FalconCoreRev uses TryFrom to avoid unsafe code, I think FalconCoreRevSubversion
should do the same thing.
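
Something along these lines, as a standalone sketch: InvalidValue is a
stand-in for the kernel's Error / EINVAL so that it builds outside the tree.
Note that, unlike the version above, it rejects out-of-range values instead
of masking them, so the caller would have to extract the two-bit field first:

```rust
use core::convert::TryFrom;

/// Stand-in for the kernel's `Error` / `EINVAL` (assumption for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidValue;

#[repr(u8)]
#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum FalconCoreRevSubversion {
    #[default]
    Subversion0 = 0,
    Subversion1 = 1,
    Subversion2 = 2,
    Subversion3 = 3,
}

impl TryFrom<u32> for FalconCoreRevSubversion {
    type Error = InvalidValue;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        use FalconCoreRevSubversion::*;

        // No mask and no `unreachable_unchecked()`: anything outside `0..=3`
        // is reported to the caller as an error.
        match value {
            0 => Ok(Subversion0),
            1 => Ok(Subversion1),
            2 => Ok(Subversion2),
            3 => Ok(Subversion3),
            _ => Err(InvalidValue),
        }
    }
}
```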

> +/// Trait defining the parameters of a given Falcon instance.
> +pub(crate) trait FalconEngine: Sync {
> +    /// Base I/O address for the falcon, relative from which its registers are accessed.
> +    const BASE: usize;
> +}
> +
> +/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
> +#[derive(Debug)]
> +pub(crate) struct FalconLoadTarget {
> +    /// Offset from the start of the source object to copy from.
> +    pub(crate) src_start: u32,
> +    /// Offset from the start of the destination memory to copy into.
> +    pub(crate) dst_start: u32,
> +    /// Number of bytes to copy.
> +    pub(crate) len: u32,
> +}
> +
> +#[derive(Debug)]
> +pub(crate) struct FalconBromParams {
> +    pub(crate) pkc_data_offset: u32,
> +    pub(crate) engine_id_mask: u16,
> +    pub(crate) ucode_id: u8,
> +}
> +
> +pub(crate) trait FalconFirmware {
> +    type Target: FalconEngine;
> +
> +    /// Returns the DMA handle of the object containing the firmware.
> +    fn dma_handle(&self) -> bindings::dma_addr_t;
> +
> +    /// Returns the load parameters for `IMEM`.
> +    fn imem_load(&self) -> FalconLoadTarget;
> +
> +    /// Returns the load parameters for `DMEM`.
> +    fn dmem_load(&self) -> FalconLoadTarget;
> +
> +    /// Returns the parameters to write into the BROM registers.
> +    fn brom_params(&self) -> FalconBromParams;
> +
> +    /// Returns the start address of the firmware.
> +    fn boot_addr(&self) -> u32;
> +}
> +
> +/// Contains the base parameters common to all Falcon instances.
> +pub(crate) struct Falcon<E: FalconEngine> {
> +    pub hal: Arc<dyn FalconHal<E>>,

This should probably be private and instead should be exposed via Deref.

Also, please see my comment at create_falcon_hal() regarding the dynamic
dispatch.

> +}
> +
> +impl<E: FalconEngine + 'static> Falcon<E> {
> +    pub(crate) fn new(
> +        pdev: &pci::Device,
> +        chipset: Chipset,
> +        bar: &Devres<Bar0>,
> +        need_riscv: bool,
> +    ) -> Result<Self> {
> +        let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, E::BASE))?;
> +        // Ensure that the revision and security model contain valid values.
> +        let _rev = hwcfg1.core_rev()?;
> +        let _sec_model = hwcfg1.security_model()?;
> +
> +        if need_riscv {
> +            let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
> +            if !hwcfg2.riscv() {
> +                dev_err!(
> +                    pdev.as_ref(),
> +                    "riscv support requested on falcon that does not support it\n"
> +                );
> +                return Err(EINVAL);
> +            }
> +        }
> +
> +        Ok(Self {
> +            hal: hal::create_falcon_hal(chipset)?,

I'd prefer to move the contents of create_falcon_hal() into this constructor.

> +        })
> +    }
> +
> +    fn reset_wait_mem_scrubbing(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
> +        timer.wait_on(bar, Duration::from_millis(20), || {
> +            bar.try_access_with(|b| regs::FalconHwcfg2::read(b, E::BASE))
> +                .and_then(|r| if r.mem_scrubbing() { Some(()) } else { None })
> +        })
> +    }
> +
> +    fn reset_eng(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
> +        let _ = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
> +
> +        // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
> +        // RESET_READY so a non-failing timeout is used.
> +        let _ = timer.wait_on(bar, Duration::from_micros(150), || {
> +            bar.try_access_with(|b| regs::FalconHwcfg2::read(b, E::BASE))
> +                .and_then(|r| if r.reset_ready() { Some(()) } else { None })
> +        });
> +
> +        with_bar!(bar, |b| regs::FalconEngine::alter(b, E::BASE, |v| v
> +            .set_reset(true)))?;
> +
> +        let _: Result<()> = timer.wait_on(bar, Duration::from_micros(10), || None);
> +
> +        with_bar!(bar, |b| regs::FalconEngine::alter(b, E::BASE, |v| v
> +            .set_reset(false)))?;
> +
> +        self.reset_wait_mem_scrubbing(bar, timer)?;
> +
> +        Ok(())
> +    }
> +
> +    pub(crate) fn reset(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
> +        self.reset_eng(bar, timer)?;
> +        self.hal.select_core(bar, timer)?;
> +        self.reset_wait_mem_scrubbing(bar, timer)?;
> +
> +        with_bar!(bar, |b| {
> +            regs::FalconRm::default()
> +                .set_val(regs::Boot0::read(b).into())
> +                .write(b, E::BASE)
> +        })
> +    }
> +
> +    fn dma_wr(
> +        &self,
> +        bar: &Devres<Bar0>,
> +        timer: &Timer,
> +        dma_handle: bindings::dma_addr_t,
> +        target_mem: FalconMem,
> +        load_offsets: FalconLoadTarget,
> +        sec: bool,
> +    ) -> Result<()> {
> +        const DMA_LEN: u32 = 256;
> +        const DMA_LEN_ILOG2_MINUS2: u8 = (DMA_LEN.ilog2() - 2) as u8;
> +
> +        // For IMEM, we want to use the start offset as a virtual address tag for each page, since
> +        // code addresses in the firmware (and the boot vector) are virtual.
> +        //
> +        // For DMEM we can fold the start offset into the DMA handle.
> +        let (src_start, dma_start) = match target_mem {
> +            FalconMem::Imem => (load_offsets.src_start, dma_handle),
> +            FalconMem::Dmem => (
> +                0,
> +                dma_handle + load_offsets.src_start as bindings::dma_addr_t,
> +            ),
> +        };
> +        if dma_start % DMA_LEN as bindings::dma_addr_t > 0 {
> +            pr_err!(
> +                "DMA transfer start addresses must be a multiple of {}",
> +                DMA_LEN
> +            );
> +            return Err(EINVAL);
> +        }
> +        if load_offsets.len % DMA_LEN > 0 {
> +            pr_err!("DMA transfer length must be a multiple of {}", DMA_LEN);
> +            return Err(EINVAL);
> +        }
> +
> +        // Set up the base source DMA address.
> +        with_bar!(bar, |b| {
> +            regs::FalconDmaTrfBase::default()
> +                .set_base((dma_start >> 8) as u32)
> +                .write(b, E::BASE);
> +            regs::FalconDmaTrfBase1::default()
> +                .set_base((dma_start >> 40) as u16)
> +                .write(b, E::BASE)
> +        })?;
> +
> +        let cmd = regs::FalconDmaTrfCmd::default()
> +            .set_size(DMA_LEN_ILOG2_MINUS2)
> +            .set_imem(target_mem == FalconMem::Imem)
> +            .set_sec(if sec { 1 } else { 0 });
> +
> +        for pos in (0..load_offsets.len).step_by(DMA_LEN as usize) {
> +            // Perform a transfer of size `DMA_LEN`.
> +            with_bar!(bar, |b| {
> +                regs::FalconDmaTrfMOffs::default()
> +                    .set_offs(load_offsets.dst_start + pos)
> +                    .write(b, E::BASE);
> +                regs::FalconDmaTrfBOffs::default()
> +                    .set_offs(src_start + pos)
> +                    .write(b, E::BASE);
> +                cmd.write(b, E::BASE)
> +            })?;
> +
> +            // Wait for the transfer to complete.
> +            timer.wait_on(bar, Duration::from_millis(2000), || {
> +                bar.try_access_with(|b| regs::FalconDmaTrfCmd::read(b, E::BASE))
> +                    .and_then(|v| if v.idle() { Some(()) } else { None })
> +            })?;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(
> +        &self,
> +        bar: &Devres<Bar0>,
> +        timer: &Timer,
> +        fw: &F,
> +    ) -> Result<()> {
> +        let dma_handle = fw.dma_handle();
> +
> +        with_bar!(bar, |b| {
> +            regs::FalconFbifCtl::alter(b, E::BASE, |v| v.set_allow_phys_no_ctx(true));
> +            regs::FalconDmaCtl::default().write(b, E::BASE);
> +            regs::FalconFbifTranscfg::alter(b, E::BASE, |v| {
> +                v.set_target(FalconFbifTarget::CoherentSysmem)
> +                    .set_mem_type(FalconFbifMemType::Physical)
> +            });
> +        })?;
> +
> +        self.dma_wr(
> +            bar,
> +            timer,
> +            dma_handle,
> +            FalconMem::Imem,
> +            fw.imem_load(),
> +            true,
> +        )?;
> +        self.dma_wr(
> +            bar,
> +            timer,
> +            dma_handle,
> +            FalconMem::Dmem,
> +            fw.dmem_load(),
> +            true,
> +        )?;
> +
> +        self.hal.program_brom(bar, &fw.brom_params())?;
> +
> +        with_bar!(bar, |b| {
> +            // Set `BootVec` to start of non-secure code.
> +            regs::FalconBootVec::default()
> +                .set_boot_vec(fw.boot_addr())
> +                .write(b, E::BASE);
> +        })?;
> +
> +        Ok(())
> +    }
> +
> +    pub(crate) fn boot(
> +        &self,
> +        bar: &Devres<Bar0>,
> +        timer: &Timer,
> +        mbox0: Option<u32>,
> +        mbox1: Option<u32>,
> +    ) -> Result<(u32, u32)> {
> +        with_bar!(bar, |b| {
> +            if let Some(mbox0) = mbox0 {
> +                regs::FalconMailbox0::default()
> +                    .set_mailbox0(mbox0)
> +                    .write(b, E::BASE);
> +            }
> +
> +            if let Some(mbox1) = mbox1 {
> +                regs::FalconMailbox1::default()
> +                    .set_mailbox1(mbox1)
> +                    .write(b, E::BASE);
> +            }
> +
> +            match regs::FalconCpuCtl::read(b, E::BASE).alias_en() {
> +                true => regs::FalconCpuCtlAlias::default()
> +                    .set_start_cpu(true)
> +                    .write(b, E::BASE),
> +                false => regs::FalconCpuCtl::default()
> +                    .set_start_cpu(true)
> +                    .write(b, E::BASE),
> +            }
> +        })?;
> +
> +        timer.wait_on(bar, Duration::from_secs(2), || {
> +            bar.try_access()
> +                .map(|b| regs::FalconCpuCtl::read(&*b, E::BASE))
> +                .and_then(|v| if v.halted() { Some(()) } else { None })
> +        })?;
> +
> +        let (mbox0, mbox1) = with_bar!(bar, |b| {
> +            let mbox0 = regs::FalconMailbox0::read(b, E::BASE).mailbox0();
> +            let mbox1 = regs::FalconMailbox1::read(b, E::BASE).mailbox1();
> +
> +            (mbox0, mbox1)
> +        })?;
> +
> +        Ok((mbox0, mbox1))
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..44b8dc118eda1263eaede466efd55408c6e7cded
> --- /dev/null
> +++ b/drivers/gpu/nova-core/falcon/gsp.rs
> @@ -0,0 +1,27 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +
> +use crate::{
> +    driver::Bar0,
> +    falcon::{Falcon, FalconEngine},
> +    regs,
> +};
> +
> +pub(crate) struct Gsp;
> +impl FalconEngine for Gsp {
> +    const BASE: usize = 0x00110000;
> +}
> +
> +pub(crate) type GspFalcon = Falcon<Gsp>;

Please drop this type alias, Falcon<Gsp> seems simple enough and is much more
obvious IMHO.

> +
> +impl Falcon<Gsp> {
> +    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
> +    /// allow GSP to signal CPU for processing new messages in message queue.
> +    pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
> +        with_bar!(bar, |b| regs::FalconIrqsclr::default()
> +            .set_swgen0(true)
> +            .write(b, Gsp::BASE))
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..5ebf4e88f1f25a13cf47859a53507be53e795d34
> --- /dev/null
> +++ b/drivers/gpu/nova-core/falcon/hal.rs
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +use kernel::sync::Arc;
> +
> +use crate::driver::Bar0;
> +use crate::falcon::{FalconBromParams, FalconEngine};
> +use crate::gpu::Chipset;
> +use crate::timer::Timer;
> +
> +mod ga102;
> +
> +/// Hardware Abstraction Layer for Falcon cores.
> +///
> +/// Implements chipset-specific low-level operations. The trait is generic against [`FalconEngine`]
> +/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
> +/// registers.
> +pub(crate) trait FalconHal<E: FalconEngine>: Sync {
> +    // Activates the Falcon core if the engine is a risvc/falcon dual engine.
> +    fn select_core(&self, _bar: &Devres<Bar0>, _timer: &Timer) -> Result<()> {
> +        Ok(())
> +    }
> +
> +    fn get_signature_reg_fuse_version(
> +        &self,
> +        bar: &Devres<Bar0>,
> +        engine_id_mask: u16,
> +        ucode_id: u8,
> +    ) -> Result<u32>;
> +
> +    // Program the BROM registers prior to starting a secure firmware.
> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
> +}
> +
> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
> +///
> +/// We use this function and a heap-allocated trait object instead of statically defined trait
> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
> +/// requested HAL.

Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
relevant to FalconHal impls?

Can't we do something like I do in the following example [1]?

```
use std::marker::PhantomData;
use std::ops::Deref;

trait Engine {
    const BASE: u32;
}

trait Hal<E: Engine> {
    fn access(&self);
}

struct Gsp;

impl Engine for Gsp {
    const BASE: u32 = 0x1;
}

struct Sec2;

impl Engine for Sec2 {
    const BASE: u32 = 0x2;
}

struct GA100<E: Engine>(PhantomData<E>);

impl<E: Engine> Hal<E> for GA100<E> {
    fn access(&self) {
        println!("Base: {}", E::BASE);
    }
}

impl<E: Engine> GA100<E> {
    fn new() -> Self {
        Self(PhantomData)
    }
}

//struct Falcon<E: Engine>(GA100<E>);

struct Falcon<H: Hal<E>, E: Engine>(H, PhantomData<E>);

impl<H: Hal<E>, E: Engine> Falcon<H, E> {
    fn new(hal: H) -> Self {
        Self(hal, PhantomData)
    }
}

impl<H: Hal<E>, E: Engine> Deref for Falcon<H, E> {
    type Target = H;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let gsp = Falcon::new(GA100::<Gsp>::new());
    let sec2 = Falcon::new(GA100::<Sec2>::new());

    gsp.access();
    sec2.access();
}
```

[1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2

> +///
> +/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
> +pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
> +    chipset: Chipset,
> +) -> Result<Arc<dyn FalconHal<E>>> {
> +    let hal = match chipset {
> +        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
> +            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>
> +        }
> +        _ => return Err(ENOTSUPP),
> +    };
> +
> +    Ok(hal)
> +}
> diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..747b02ca671f7d4a97142665a9ba64807c87391e
> --- /dev/null
> +++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use core::marker::PhantomData;
> +use core::time::Duration;
> +
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +
> +use crate::driver::Bar0;
> +use crate::falcon::{FalconBromParams, FalconEngine, FalconModSelAlgo, RiscvCoreSelect};
> +use crate::regs;
> +use crate::timer::Timer;
> +
> +use super::FalconHal;
> +
> +fn select_core_ga102<E: FalconEngine>(bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
> +    let bcr_ctrl = with_bar!(bar, |b| regs::RiscvBcrCtrl::read(b, E::BASE))?;
> +    if bcr_ctrl.core_select() != RiscvCoreSelect::Falcon {
> +        with_bar!(bar, |b| regs::RiscvBcrCtrl::default()
> +            .set_core_select(RiscvCoreSelect::Falcon)
> +            .write(b, E::BASE))?;
> +
> +        timer.wait_on(bar, Duration::from_millis(10), || {
> +            bar.try_access_with(|b| regs::RiscvBcrCtrl::read(b, E::BASE))
> +                .and_then(|v| if v.valid() { Some(()) } else { None })
> +        })?;
> +    }
> +
> +    Ok(())
> +}
> +
> +fn get_signature_reg_fuse_version_ga102(
> +    bar: &Devres<Bar0>,
> +    engine_id_mask: u16,
> +    ucode_id: u8,
> +) -> Result<u32> {
> +    // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
> +    // registers, which are an array. Our register definition macros do not allow us to manage them
> +    // properly, so we need to hardcode their addresses for now.
> +
> +    // Each engine has 16 ucode version registers numbered from 1 to 16.
> +    if ucode_id == 0 || ucode_id > 16 {
> +        pr_warn!("invalid ucode id {:#x}", ucode_id);
> +        return Err(EINVAL);
> +    }
> +    let reg_fuse = if engine_id_mask & 0x0001 != 0 {
> +        // NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION
> +        0x824140
> +    } else if engine_id_mask & 0x0004 != 0 {
> +        // NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION
> +        0x824100
> +    } else if engine_id_mask & 0x0400 != 0 {
> +        // NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION
> +        0x8241c0
> +    } else {
> +        pr_warn!("unexpected engine_id_mask {:#x}", engine_id_mask);
> +        return Err(EINVAL);
> +    } + ((ucode_id - 1) as usize * core::mem::size_of::<u32>());
> +
> +    let reg_fuse_version = with_bar!(bar, |b| { b.read32(reg_fuse) })?;
> +
> +    // Equivalent of Find Last Set bit.
> +    Ok(u32::BITS - reg_fuse_version.leading_zeros())
> +}
> +
> +fn program_brom_ga102<E: FalconEngine>(
> +    bar: &Devres<Bar0>,
> +    params: &FalconBromParams,
> +) -> Result<()> {
> +    with_bar!(bar, |b| {
> +        regs::FalconBromParaaddr0::default()
> +            .set_addr(params.pkc_data_offset)
> +            .write(b, E::BASE);
> +        regs::FalconBromEngidmask::default()
> +            .set_mask(params.engine_id_mask as u32)
> +            .write(b, E::BASE);
> +        regs::FalconBromCurrUcodeId::default()
> +            .set_ucode_id(params.ucode_id as u32)
> +            .write(b, E::BASE);
> +        regs::FalconModSel::default()
> +            .set_algo(FalconModSelAlgo::Rsa3k)
> +            .write(b, E::BASE)
> +    })
> +}
> +
> +pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
> +
> +impl<E: FalconEngine> Ga102<E> {
> +    pub(super) fn new() -> Self {
> +        Self(PhantomData)
> +    }
> +}
> +
> +impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
> +    fn select_core(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
> +        select_core_ga102::<E>(bar, timer)
> +    }
> +
> +    fn get_signature_reg_fuse_version(
> +        &self,
> +        bar: &Devres<Bar0>,
> +        engine_id_mask: u16,
> +        ucode_id: u8,
> +    ) -> Result<u32> {
> +        get_signature_reg_fuse_version_ga102(bar, engine_id_mask, ucode_id)
> +    }
> +
> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()> {
> +        program_brom_ga102::<E>(bar, params)
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/falcon/sec2.rs b/drivers/gpu/nova-core/falcon/sec2.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..85dda3e8380a3d31d34c92c4236c6f81c63ce772
> --- /dev/null
> +++ b/drivers/gpu/nova-core/falcon/sec2.rs
> @@ -0,0 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use crate::falcon::{Falcon, FalconEngine};
> +
> +pub(crate) struct Sec2;
> +impl FalconEngine for Sec2 {
> +    const BASE: usize = 0x00840000;
> +}
> +pub(crate) type Sec2Falcon = Falcon<Sec2>;
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 1b3e43e0412e2a2ea178c7404ea647c9e38d4e04..ec4c648c6e8b4aa7d06c627ed59c0e66a08c679e 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -5,6 +5,8 @@
>  use crate::devinit;
>  use crate::dma::DmaObject;
>  use crate::driver::Bar0;
> +use crate::falcon::gsp::GspFalcon;
> +use crate::falcon::sec2::Sec2Falcon;
>  use crate::firmware::Firmware;
>  use crate::regs;
>  use crate::timer::Timer;
> @@ -221,6 +223,20 @@ pub(crate) fn new(
>  
>          let timer = Timer::new();
>  
> +        let gsp_falcon = GspFalcon::new(
> +            pdev,
> +            spec.chipset,
> +            &bar,
> +            if spec.chipset > Chipset::GA100 {
> +                true
> +            } else {
> +                false
> +            },
> +        )?;
> +        gsp_falcon.clear_swgen0_intr(&bar)?;
> +
> +        let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, true)?;
> +
>          Ok(pin_init!(Self {
>              spec,
>              bar,
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index df3468c92c6081b3e2db218d92fbe1c40a0a75c3..4dde8004d24882c60669b5acd6af9d6988c66a9c 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -23,6 +23,7 @@ macro_rules! with_bar {
>  mod devinit;
>  mod dma;
>  mod driver;
> +mod falcon;
>  mod firmware;
>  mod gpu;
>  mod regs;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index f191cf4eb44c2b950e5cfcc6d04f95c122ce29d3..c76a16dc8e7267a4eb54cb71e1cca6fb9e00188f 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -6,6 +6,10 @@
>  #[macro_use]
>  mod macros;
>  
> +use crate::falcon::{
> +    FalconCoreRev, FalconCoreRevSubversion, FalconFbifMemType, FalconFbifTarget, FalconModSelAlgo,
> +    FalconSecurityModel, RiscvCoreSelect,
> +};
>  use crate::gpu::Chipset;
>  
>  register!(Boot0@0x00000000, "Basic revision information about the GPU";
> @@ -44,3 +48,188 @@
>  register!(Pgc6AonSecureScratchGroup05@0x00118234;
>      31:0    value => as u32
>  );
> +
> +/* PFALCON */
> +
> +register!(FalconIrqsclr@+0x00000004;
> +    4:4     halt => as_bit bool;
> +    6:6     swgen0 => as_bit bool;
> +);
> +
> +register!(FalconIrqstat@+0x00000008;
> +    4:4     halt => as_bit bool;
> +    6:6     swgen0 => as_bit bool;
> +);
> +
> +register!(FalconIrqmclr@+0x00000014;
> +    31:0    val => as u32
> +);
> +
> +register!(FalconIrqmask@+0x00000018;
> +    31:0    val => as u32
> +);
> +
> +register!(FalconRm@+0x00000084;
> +    31:0    val => as u32
> +);
> +
> +register!(FalconIrqdest@+0x0000001c;
> +    31:0    val => as u32
> +);
> +
> +register!(FalconMailbox0@+0x00000040;
> +    31:0    mailbox0 => as u32
> +);
> +register!(FalconMailbox1@+0x00000044;
> +    31:0    mailbox1 => as u32
> +);
> +
> +register!(FalconHwcfg2@+0x000000f4;
> +    10:10   riscv => as_bit bool;
> +    12:12   mem_scrubbing => as_bit bool;
> +    31:31   reset_ready => as_bit bool;
> +);
> +
> +register!(FalconCpuCtl@+0x00000100;
> +    1:1     start_cpu => as_bit bool;
> +    4:4     halted => as_bit bool;
> +    6:6     alias_en => as_bit bool;
> +);
> +
> +register!(FalconBootVec@+0x00000104;
> +    31:0    boot_vec => as u32
> +);
> +
> +register!(FalconHwCfg@+0x00000108;
> +    8:0     imem_size => as u32;
> +    17:9    dmem_size => as u32;
> +);
> +
> +register!(FalconDmaCtl@+0x0000010c;
> +    0:0     require_ctx => as_bit bool;
> +    1:1     dmem_scrubbing  => as_bit bool;
> +    2:2     imem_scrubbing => as_bit bool;
> +    6:3     dmaq_num => as_bit u8;
> +    7:7     secure_stat => as_bit bool;
> +);
> +
> +register!(FalconDmaTrfBase@+0x00000110;
> +    31:0    base => as u32;
> +);
> +
> +register!(FalconDmaTrfMOffs@+0x00000114;
> +    23:0    offs => as u32;
> +);
> +
> +register!(FalconDmaTrfCmd@+0x00000118;
> +    0:0     full => as_bit bool;
> +    1:1     idle => as_bit bool;
> +    3:2     sec => as_bit u8;
> +    4:4     imem => as_bit bool;
> +    5:5     is_write => as_bit bool;
> +    10:8    size => as u8;
> +    14:12   ctxdma => as u8;
> +    16:16   set_dmtag => as u8;
> +);
> +
> +register!(FalconDmaTrfBOffs@+0x0000011c;
> +    31:0    offs => as u32;
> +);
> +
> +register!(FalconDmaTrfBase1@+0x00000128;
> +    8:0     base => as u16;
> +);
> +
> +register!(FalconHwcfg1@+0x0000012c;
> +    3:0     core_rev => try_into FalconCoreRev, "core revision of the falcon";
> +    5:4     security_model => try_into FalconSecurityModel, "security model of the falcon";
> +    7:6     core_rev_subversion => into FalconCoreRevSubversion;
> +    11:8    imem_ports => as u8;
> +    15:12   dmem_ports => as u8;
> +);
> +
> +register!(FalconCpuCtlAlias@+0x00000130;
> +    1:1     start_cpu => as_bit bool;
> +);
> +
> +/* TODO: this is an array of registers */
> +register!(FalconImemC@+0x00000180;
> +    7:2     offs => as u8;
> +    23:8    blk => as u8;
> +    24:24   aincw => as_bit bool;
> +    25:25   aincr => as_bit bool;
> +    28:28   secure => as_bit bool;
> +    29:29   sec_atomic => as_bit bool;
> +);
> +
> +register!(FalconImemD@+0x00000184;
> +    31:0    data => as u32;
> +);
> +
> +register!(FalconImemT@+0x00000188;
> +    15:0    data => as u16;
> +);
> +
> +register!(FalconDmemC@+0x000001c0;
> +    7:2     offs => as u8;
> +    23:0    addr => as u32;
> +    23:8    blk => as u8;
> +    24:24   aincw => as_bit bool;
> +    25:25   aincr => as_bit bool;
> +    26:26   settag => as_bit bool;
> +    27:27   setlvl => as_bit bool;
> +    28:28   va => as_bit bool;
> +    29:29   miss => as_bit bool;
> +);
> +
> +register!(FalconDmemD@+0x000001c4;
> +    31:0    data => as u32;
> +);
> +
> +register!(FalconModSel@+0x00001180;
> +    7:0     algo => try_into FalconModSelAlgo;
> +);
> +register!(FalconBromCurrUcodeId@+0x00001198;
> +    31:0    ucode_id => as u32;
> +);
> +register!(FalconBromEngidmask@+0x0000119c;
> +    31:0    mask => as u32;
> +);
> +register!(FalconBromParaaddr0@+0x00001210;
> +    31:0    addr => as u32;
> +);
> +
> +register!(RiscvCpuctl@+0x00000388;
> +    0:0     startcpu => as_bit bool;
> +    4:4     halted => as_bit bool;
> +    5:5     stopped => as_bit bool;
> +    7:7     active_stat => as_bit bool;
> +);
> +
> +register!(FalconEngine@+0x000003c0;
> +    0:0     reset => as_bit bool;
> +);
> +
> +register!(RiscvIrqmask@+0x00000528;
> +    31:0    mask => as u32;
> +);
> +
> +register!(RiscvIrqdest@+0x0000052c;
> +    31:0    dest => as u32;
> +);
> +
> +/* TODO: this is an array of registers */
> +register!(FalconFbifTranscfg@+0x00000600;
> +    1:0     target => try_into FalconFbifTarget;
> +    2:2     mem_type => as_bit FalconFbifMemType;
> +);
> +
> +register!(FalconFbifCtl@+0x00000624;
> +    7:7     allow_phys_no_ctx => as_bit bool;
> +);
> +
> +register!(RiscvBcrCtrl@+0x00001668;
> +    0:0     valid => as_bit bool;
> +    4:4     core_select => as_bit RiscvCoreSelect;
> +    8:8     br_fetch => as_bit bool;
> +);
> diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
> index 8987352f4192bc9b4b2fc0fb5f2e8e62ff27be68..c03a5c36d1230dfbf2bd6e02a793264280c6d509 100644
> --- a/drivers/gpu/nova-core/timer.rs
> +++ b/drivers/gpu/nova-core/timer.rs
> @@ -2,9 +2,6 @@
>  
>  //! Nova Core Timer subdevice
>  
> -// To be removed when all code is used.
> -#![allow(dead_code)]
> -
>  use core::fmt::Display;
>  use core::ops::{Add, Sub};
>  use core::time::Duration;
> 
> -- 
> 2.49.0
> 
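
Note: the fuse register lookup and the "Find Last Set" computation in the
quoted get_signature_reg_fuse_version_ga102() can be sketched in standalone
Rust. This is only an illustration under assumptions, not the driver code:
`fuse_reg_offset` and `find_last_set` are hypothetical names, the base address
is passed in directly instead of being derived from the engine ID mask, and a
plain `Option` stands in for the kernel `Result`.

```rust
/// Hypothetical helper: offset of the fuse version register for `ucode_id`,
/// given the offset of the first register of the engine's array.
/// Each engine has 16 registers of 4 bytes each, numbered from 1 to 16.
fn fuse_reg_offset(base: usize, ucode_id: u8) -> Option<usize> {
    if ucode_id == 0 || ucode_id > 16 {
        return None;
    }
    Some(base + (ucode_id as usize - 1) * core::mem::size_of::<u32>())
}

/// Hypothetical helper: 1-based position of the highest set bit, 0 if no bit
/// is set. Same expression as in the quoted patch.
fn find_last_set(value: u32) -> u32 {
    u32::BITS - value.leading_zeros()
}
```

With the SEC2 base quoted above (0x824140), ucode ID 1 maps to the base itself
and ucode ID 16 to the last register of the array, 60 bytes further.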


* Re: [PATCH 12/16] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
  2025-04-20 12:19 ` [PATCH 12/16] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
@ 2025-04-22 14:46   ` Danilo Krummrich
  0 siblings, 0 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-22 14:46 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sun, Apr 20, 2025 at 09:19:44PM +0900, Alexandre Courbot wrote:
> FWSEC-FRTS is the first firmware we need to run on the GSP falcon in
> order to initiate the GSP boot process. Introduce the structure that
> describes it.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware.rs | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
> index 9bad7a86382af7917b3dce7bf3087d0002bd5971..4ef5ba934b9d255635aa9a902e1d3a732d6e5568 100644
> --- a/drivers/gpu/nova-core/firmware.rs
> +++ b/drivers/gpu/nova-core/firmware.rs
> @@ -43,6 +43,34 @@ pub(crate) fn new(
>      }
>  }
>  
> +/// Structure used to describe some firmwares, notable fwsec-frts.
> +#[allow(dead_code)]

Please use #[expect(dead_code)] instead.

> +#[repr(C)]
> +#[derive(Debug, Clone)]
> +pub(crate) struct FalconUCodeDescV3 {

Can we get some more documentation on the fields please? :)

> +    pub(crate) hdr: u32,
> +    pub(crate) stored_size: u32,
> +    pub(crate) pkc_data_offset: u32,
> +    pub(crate) interface_offset: u32,
> +    pub(crate) imem_phys_base: u32,
> +    pub(crate) imem_load_size: u32,
> +    pub(crate) imem_virt_base: u32,
> +    pub(crate) dmem_phys_base: u32,
> +    pub(crate) dmem_load_size: u32,
> +    pub(crate) engine_id_mask: u16,
> +    pub(crate) ucode_id: u8,
> +    pub(crate) signature_count: u8,
> +    pub(crate) signature_versions: u16,
> +    _reserved: u16,
> +}
> +
> +#[allow(dead_code)]
> +impl FalconUCodeDescV3 {
> +    pub(crate) fn size(&self) -> usize {
> +        ((self.hdr & 0xffff0000) >> 16) as usize

What's this magic number?

> +    }
> +}
> +
>  pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
>  
>  impl<const N: usize> ModInfoBuilder<N> {
> 
> -- 
> 2.49.0
> 
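
Note: one way to address the "magic number" remark is to name the mask and the
shift. The sketch below assumes, based only on the quoted size() method, that
bits 31:16 of `hdr` carry the descriptor size; `HDR_SIZE_MASK`,
`HDR_SIZE_SHIFT` and `desc_size` are hypothetical names that do not appear in
the patch.

```rust
/// Bits 31:16 of the header word (assumed to hold the descriptor size).
const HDR_SIZE_MASK: u32 = 0xffff_0000;
/// Position of the lowest bit of the size field.
const HDR_SIZE_SHIFT: u32 = 16;

/// Extracts the size field from a descriptor header word.
fn desc_size(hdr: u32) -> usize {
    ((hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT) as usize
}
```

The computation is unchanged from the patch; only the constants gain names, so
that a later doc-comment can explain the layout of `hdr` in one place.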


* Re: [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset
  2025-04-20 12:19 ` [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
@ 2025-04-22 16:23   ` Joel Fernandes
  2025-04-24  7:50     ` Alexandre Courbot
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-22 16:23 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel



On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
> We will commonly need to compare chipset versions, so derive the
> ordering traits to make that possible. Also derive Copy and Clone since
> passing Chipset by value will be more efficient than by reference.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gpu.rs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 17c9660da45034762edaa78e372d8821144cdeb7..4de67a2dc16302c00530026156d7264cbc7e5b32 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -13,7 +13,7 @@ macro_rules! define_chipset {
>      ({ $($variant:ident = $value:expr),* $(,)* }) =>
>      {
>          /// Enum representation of the GPU chipset.
> -        #[derive(fmt::Debug)]
> +        #[derive(fmt::Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]

Since Ord implies PartialOrd, do you need both? Same for Eq.

Also under which scenario does Chipset require PartialOrd?

thanks,

 - Joel


>          pub(crate) enum Chipset {
>              $($variant = $value),*,
>          }
> 
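
Note on the derive question: in Rust, `Ord` has `Eq` and `PartialOrd` as
supertraits, and `Eq` has `PartialEq`. `#[derive(Ord)]` does not implement the
supertraits by itself, so all four names have to be listed. The comparison
operators (`<`, `>`, `>=`, ...) are resolved through `PartialOrd`, which is
what a check such as `spec.chipset > Chipset::GA100` relies on. A minimal
sketch follows; the enum is a reduced stand-in for the macro-generated one, the
discriminant values are illustrative, and `is_newer_than_ga100` is a
hypothetical helper.

```rust
/// Reduced stand-in for the macro-generated enum (illustrative values).
#[derive(Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
enum Chipset {
    GA100 = 0x170,
    GA102 = 0x172,
    GA103 = 0x173,
}

/// Hypothetical helper: uses `>` (from `PartialOrd`) on chipsets.
fn is_newer_than_ga100(chipset: Chipset) -> bool {
    chipset > Chipset::GA100
}

/// `Ord` is what provides total-order helpers such as `max`.
fn newest(a: Chipset, b: Chipset) -> Chipset {
    a.max(b)
}
```

Dropping `PartialOrd` or `Eq` from the derive list while keeping `Ord` fails to
compile, because the supertrait bounds are no longer satisfied.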



* Re: [PATCH 09/16] gpu: nova-core: register sysmem flush page
  2025-04-20 12:19 ` [PATCH 09/16] gpu: nova-core: register sysmem flush page Alexandre Courbot
  2025-04-22 11:45   ` Danilo Krummrich
@ 2025-04-22 18:50   ` Joel Fernandes
  1 sibling, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-22 18:50 UTC (permalink / raw)
  To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel



On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
> A page of system memory is reserved so sysmembar can perform a read on
> it if a system write occurred since the last flush. Do this early as it
> can be required to e.g. reset the GPU falcons.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/dma.rs       | 54 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/gpu.rs       | 53 +++++++++++++++++++++++++++++++++++--
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 10 +++++++
>  4 files changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..a4162bff597132a04e002b2b910a4537bbabc287
> --- /dev/null
> +++ b/drivers/gpu/nova-core/dma.rs
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Simple DMA object wrapper.
> +
> +// To be removed when all code is used.
> +#![allow(dead_code)]
> +
> +use kernel::device;
> +use kernel::dma::CoherentAllocation;
> +use kernel::page::PAGE_SIZE;
> +use kernel::prelude::*;
> +
> +pub(crate) struct DmaObject {
> +    pub dma: CoherentAllocation<u8>,
> +    pub len: usize,
> +    #[allow(dead_code)]
> +    pub name: &'static str,
> +}
> +
> +impl DmaObject {
> +    pub(crate) fn new(
> +        dev: &device::Device<device::Bound>,
> +        len: usize,
> +        name: &'static str,
> +    ) -> Result<Self> {
> +        let len = core::alloc::Layout::from_size_align(len, PAGE_SIZE)
> +            .map_err(|_| EINVAL)?
> +            .pad_to_align()
> +            .size();
> +        let dma = CoherentAllocation::alloc_coherent(dev, len, GFP_KERNEL | __GFP_ZERO)?;
> +
> +        Ok(Self { dma, len, name })
> +    }
> +
> +    pub(crate) fn from_data(
> +        dev: &device::Device<device::Bound>,
> +        data: &[u8],
> +        name: &'static str,
> +    ) -> Result<Self> {
> +        Self::new(dev, data.len(), name).and_then(|mut dma_obj| {
> +            // SAFETY:
> +            // - The copied data fits within the size of the allocated object.
> +            // - We have just created this object and there is no other user at this stage.
> +            unsafe {
> +                core::ptr::copy_nonoverlapping(
> +                    data.as_ptr(),
> +                    dma_obj.dma.start_ptr_mut(),
> +                    data.len(),
> +                );
> +            }
> +            Ok(dma_obj)
> +        })
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 1f7799692a0ab042f2540e01414f5ca347ae9ecc..d43e710cc983d51f053dacbd77cbbfb79fa882c3 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -3,6 +3,7 @@
>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>  
>  use crate::devinit;
> +use crate::dma::DmaObject;
>  use crate::driver::Bar0;
>  use crate::firmware::Firmware;
>  use crate::regs;
> @@ -145,12 +146,30 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
>  }
>  
>  /// Structure holding the resources required to operate the GPU.
> -#[pin_data]
> +#[pin_data(PinnedDrop)]
>  pub(crate) struct Gpu {
>      spec: Spec,
>      /// MMIO mapping of PCI BAR 0
>      bar: Devres<Bar0>,
>      fw: Firmware,

You could add here:
  // System memory page required for sysmembar, which is a GPU-initiated hardware
  // memory-barrier operation that flushes all pending GPU-side memory writes
  // that were done through PCIe to system memory.

Will add to my git tree as well (but feel free to squash as needed).

> +    sysmem_flush: DmaObject,
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for Gpu {
> +    fn drop(self: Pin<&mut Self>) {
> +        // Unregister the sysmem flush page before we release it.
> +        let _ = with_bar!(&self.bar, |b| {
> +            regs::PfbNisoFlushSysmemAddr::default()
> +                .set_adr_39_08(0)
> +                .write(b);
> +            if self.spec.chipset >= Chipset::GA102 {
> +                regs::PfbNisoFlushSysmemAddrHi::default()
> +                    .set_adr_63_40(0)
> +                    .write(b);
> +            }
> +        });
> +    }
>  }
>  
>  impl Gpu {
> @@ -173,6 +192,36 @@ pub(crate) fn new(
>          devinit::wait_gfw_boot_completion(&bar)
>              .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>  
> -        Ok(pin_init!(Self { spec, bar, fw }))
> +        // System memory page required for sysmembar to properly flush into system memory.

This could be elaborated further:

  // System memory page required for sysmembar, which is a GPU-initiated hardware
  // memory-barrier operation that flushes all GPU-side memory writes that were
  // done through PCIe to system memory. It is required for a falcon to be reset,
  // as the reset operation involves a reset handshake. When the falcon acks the
  // reset, it writes its acknowledgement into system memory, but for this write
  // to be visible to the host, it needs to do a sysmembar to flush the write and
  // prevent the driver from timing out.

> +        let sysmem_flush = {
> +            let page = DmaObject::new(
> +                pdev.as_ref(),
> +                kernel::bindings::PAGE_SIZE,
> +                "sysmem flush page",
> +            )?;
> +
> +            // Register the sysmem flush page.
> +            with_bar!(bar, |b| {
> +                let handle = page.dma.dma_handle();
> +
> +                regs::PfbNisoFlushSysmemAddr::default()
> +                    .set_adr_39_08((handle >> 8) as u32)
> +                    .write(b);
> +                if spec.chipset >= Chipset::GA102 {
> +                    regs::PfbNisoFlushSysmemAddrHi::default()
> +                        .set_adr_63_40((handle >> 40) as u32)
> +                        .write(b);
> +                }
> +            })?;
> +
> +            page
> +        };
> +
> +        Ok(pin_init!(Self {
> +            spec,
> +            bar,
> +            fw,
> +            sysmem_flush,
> +        }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 878161e060f54da7738c656f6098936a62dcaa93..37c7eb0ea7a926bee4e3c661028847291bf07fa2 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -21,6 +21,7 @@ macro_rules! with_bar {
>  }
>  
>  mod devinit;
> +mod dma;
>  mod driver;
>  mod firmware;
>  mod gpu;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac..1e24787c4b5f432ac25fe399c8cb38b7350e44ae 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -14,6 +14,16 @@
>      28:20   chipset => try_into Chipset, "chipset model"
>  );
>  
> +/* PFB */

You could also add:

/// These two registers together hold the physical system memory address
/// that is used by the GPU to perform the sysmembar operation (see gpu.rs).

> +
> +register!(PfbNisoFlushSysmemAddr@0x00100c10;
> +    31:0    adr_39_08 => as u32
> +);
> +
> +register!(PfbNisoFlushSysmemAddrHi@0x00100c40;
> +    23:0    adr_63_40 => as u32
> +);
> +

Thanks.
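
Note: the way the DMA handle is spread over the two registers can be sketched
as below. This is an illustration under assumptions, not driver code:
`split_flush_addr` and `join_flush_addr` are hypothetical helpers, and the
handle is assumed to be at least 256-byte aligned (a page-aligned allocation
is), since bits 7:0 are not stored anywhere.

```rust
/// Splits a 64-bit DMA address into the `adr_39_08` and `adr_63_40` fields.
fn split_flush_addr(handle: u64) -> (u32, u32) {
    // Bits 39:8 fill the 32-bit low register; the cast drops bits 63:40.
    let adr_39_08 = (handle >> 8) as u32;
    // Bits 63:40 fit in the 24-bit field of the high register.
    let adr_63_40 = (handle >> 40) as u32;
    (adr_39_08, adr_63_40)
}

/// Rebuilds the address from both fields (bits 7:0 come back as zero).
fn join_flush_addr(adr_39_08: u32, adr_63_40: u32) -> u64 {
    ((adr_63_40 as u64) << 40) | ((adr_39_08 as u64) << 8)
}
```

Writing zero to both fields, as the quoted drop() does, corresponds to
unregistering the page.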





* Re: [PATCH 09/16] gpu: nova-core: register sysmem flush page
  2025-04-22 11:45   ` Danilo Krummrich
@ 2025-04-23 13:03     ` Alexandre Courbot
  0 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-23 13:03 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue Apr 22, 2025 at 8:45 PM JST, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:41PM +0900, Alexandre Courbot wrote:
>> A page of system memory is reserved so sysmembar can perform a read on
>> it if a system write occurred since the last flush. Do this early as it
>> can be required to e.g. reset the GPU falcons.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/dma.rs       | 54 ++++++++++++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/gpu.rs       | 53 +++++++++++++++++++++++++++++++++++--
>>  drivers/gpu/nova-core/nova_core.rs |  1 +
>>  drivers/gpu/nova-core/regs.rs      | 10 +++++++
>>  4 files changed, 116 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..a4162bff597132a04e002b2b910a4537bbabc287
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/dma.rs
>> @@ -0,0 +1,54 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Simple DMA object wrapper.
>> +
>> +// To be removed when all code is used.
>> +#![allow(dead_code)]
>> +
>> +use kernel::device;
>> +use kernel::dma::CoherentAllocation;
>> +use kernel::page::PAGE_SIZE;
>> +use kernel::prelude::*;
>> +
>> +pub(crate) struct DmaObject {
>> +    pub dma: CoherentAllocation<u8>,
>> +    pub len: usize,
>
> This should be covered by CoherentAllocation already, no? If it does not have a
> public accessor for its size, please add it for CoherentAllocation instead. I
> can take the corresponding patch through the nova tree.

`CoherentAllocation::count` is currently not accessible publicly. I
agree that exposing it would make sense; let me add a patch doing that.

>
>> +    #[allow(dead_code)]
>
> Please prefer #[expect(dead_code)], such that we are forced to remove it once
> it's subsequently used.

Ah, that's indeed more suitable, thanks!

>
>> +    pub name: &'static str,
>> +}
>> +
>> +impl DmaObject {
>> +    pub(crate) fn new(
>> +        dev: &device::Device<device::Bound>,
>> +        len: usize,
>> +        name: &'static str,
>> +    ) -> Result<Self> {
>> +        let len = core::alloc::Layout::from_size_align(len, PAGE_SIZE)
>> +            .map_err(|_| EINVAL)?
>> +            .pad_to_align()
>> +            .size();
>> +        let dma = CoherentAllocation::alloc_coherent(dev, len, GFP_KERNEL | __GFP_ZERO)?;
>> +
>> +        Ok(Self { dma, len, name })
>> +    }
>> +
>> +    pub(crate) fn from_data(
>> +        dev: &device::Device<device::Bound>,
>> +        data: &[u8],
>> +        name: &'static str,
>> +    ) -> Result<Self> {
>> +        Self::new(dev, data.len(), name).and_then(|mut dma_obj| {
>> +            // SAFETY:
>> +            // - The copied data fits within the size of the allocated object.
>> +            // - We have just created this object and there is no other user at this stage.
>> +            unsafe {
>> +                core::ptr::copy_nonoverlapping(
>> +                    data.as_ptr(),
>> +                    dma_obj.dma.start_ptr_mut(),
>> +                    data.len(),
>> +                );
>> +            }
>> +            Ok(dma_obj)
>> +        })
>> +    }
>> +}
>
> The DMA wrapper should probably be added in a separate patch.

Sure.
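
Note: the length rounding done in the quoted DmaObject::new() can be tried
outside the kernel. The sketch below assumes a 4 KiB page size and uses
`Option` instead of the kernel `Result`; `PAGE_SIZE` is defined locally and
`page_padded_len` is a hypothetical name.

```rust
use core::alloc::Layout;

/// Assumed page size for this sketch (the kernel provides its own constant).
const PAGE_SIZE: usize = 4096;

/// Rounds `len` up to a multiple of `PAGE_SIZE`, or returns `None` if the
/// padded size cannot be represented as a valid `Layout`.
fn page_padded_len(len: usize) -> Option<usize> {
    Layout::from_size_align(len, PAGE_SIZE)
        .ok()
        .map(|layout| layout.pad_to_align().size())
}
```

A request for 1 byte is padded to one full page, and a request for 4097 bytes
to two pages; sizes that would overflow are rejected rather than wrapped.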

>
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 1f7799692a0ab042f2540e01414f5ca347ae9ecc..d43e710cc983d51f053dacbd77cbbfb79fa882c3 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -3,6 +3,7 @@
>>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>>  
>>  use crate::devinit;
>> +use crate::dma::DmaObject;
>>  use crate::driver::Bar0;
>>  use crate::firmware::Firmware;
>>  use crate::regs;
>> @@ -145,12 +146,30 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
>>  }
>>  
>>  /// Structure holding the resources required to operate the GPU.
>> -#[pin_data]
>> +#[pin_data(PinnedDrop)]
>>  pub(crate) struct Gpu {
>>      spec: Spec,
>>      /// MMIO mapping of PCI BAR 0
>>      bar: Devres<Bar0>,
>>      fw: Firmware,
>> +    sysmem_flush: DmaObject,
>
> Please add a doc-comment for this DmaObject explaining what it is used for by
> the driver and why it is needed.

Will do.

>
>> +}
>> +
>> +#[pinned_drop]
>> +impl PinnedDrop for Gpu {
>> +    fn drop(self: Pin<&mut Self>) {
>> +        // Unregister the sysmem flush page before we release it.
>> +        let _ = with_bar!(&self.bar, |b| {
>> +            regs::PfbNisoFlushSysmemAddr::default()
>> +                .set_adr_39_08(0)
>> +                .write(b);
>> +            if self.spec.chipset >= Chipset::GA102 {
>> +                regs::PfbNisoFlushSysmemAddrHi::default()
>> +                    .set_adr_63_40(0)
>> +                    .write(b);
>> +            }
>> +        });
>> +    }
>>  }
>>  
>>  impl Gpu {
>> @@ -173,6 +192,36 @@ pub(crate) fn new(
>>          devinit::wait_gfw_boot_completion(&bar)
>>              .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>>  
>> -        Ok(pin_init!(Self { spec, bar, fw }))
>> +        // System memory page required for sysmembar to properly flush into system memory.
>> +        let sysmem_flush = {
>> +            let page = DmaObject::new(
>> +                pdev.as_ref(),
>> +                kernel::bindings::PAGE_SIZE,
>> +                "sysmem flush page",
>> +            )?;
>> +
>> +            // Register the sysmem flush page.
>> +            with_bar!(bar, |b| {
>> +                let handle = page.dma.dma_handle();
>> +
>> +                regs::PfbNisoFlushSysmemAddr::default()
>> +                    .set_adr_39_08((handle >> 8) as u32)
>> +                    .write(b);
>> +                if spec.chipset >= Chipset::GA102 {
>> +                    regs::PfbNisoFlushSysmemAddrHi::default()
>> +                        .set_adr_63_40((handle >> 40) as u32)
>> +                        .write(b);
>> +                }
>> +            })?;
>> +
>> +            page
>> +        };
>> +
>> +        Ok(pin_init!(Self {
>> +            spec,
>> +            bar,
>> +            fw,
>> +            sysmem_flush,
>> +        }))
>>      }
>>  }
>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>> index 878161e060f54da7738c656f6098936a62dcaa93..37c7eb0ea7a926bee4e3c661028847291bf07fa2 100644
>> --- a/drivers/gpu/nova-core/nova_core.rs
>> +++ b/drivers/gpu/nova-core/nova_core.rs
>> @@ -21,6 +21,7 @@ macro_rules! with_bar {
>>  }
>>  
>>  mod devinit;
>> +mod dma;
>>  mod driver;
>>  mod firmware;
>>  mod gpu;
>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>> index fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac..1e24787c4b5f432ac25fe399c8cb38b7350e44ae 100644
>> --- a/drivers/gpu/nova-core/regs.rs
>> +++ b/drivers/gpu/nova-core/regs.rs
>> @@ -14,6 +14,16 @@
>>      28:20   chipset => try_into Chipset, "chipset model"
>>  );
>>  
>> +/* PFB */
>> +
>> +register!(PfbNisoFlushSysmemAddr@0x00100c10;
>> +    31:0    adr_39_08 => as u32
>> +);
>> +
>> +register!(PfbNisoFlushSysmemAddrHi@0x00100c40;
>> +    23:0    adr_63_40 => as u32
>> +);
>
> Please add some documentation for the register and its fields.

Ack.

Thanks,
Alex.
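
Note on the #[expect(dead_code)] suggestion above: unlike `allow`, an `expect`
attribute makes the compiler report an unfulfilled expectation once the item
is actually used, which is the reminder to drop the attribute. A minimal
sketch, assuming a toolchain where `#[expect]` is available (Rust 1.81 or
later); `DmaObjectSketch` is a hypothetical stand-in, not the driver type.

```rust
/// Hypothetical stand-in for a wrapper whose `name` field is not read yet.
pub struct DmaObjectSketch {
    pub len: usize,
    // The field is stored but never read, so the dead_code lint fires and the
    // expectation is fulfilled. Once some code reads `name`, the compiler
    // warns about the unfulfilled expectation instead.
    #[expect(dead_code)]
    name: &'static str,
}

impl DmaObjectSketch {
    pub fn new(len: usize, name: &'static str) -> Self {
        Self { len, name }
    }
}
```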


* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-20 12:19 ` [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
@ 2025-04-23 14:06   ` Danilo Krummrich
  2025-04-23 14:52     ` Joel Fernandes
                       ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-23 14:06 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Sun, Apr 20, 2025 at 09:19:45PM +0900, Alexandre Courbot wrote:
> From: Joel Fernandes <joelagnelf@nvidia.com>
> 
> Add support for navigating and setting up vBIOS ucode data required for
> GSP to boot. The main data extracted from the vBIOS is the FWSEC-FRTS
> firmware which runs on the GSP processor. This firmware runs in high
> secure mode, and sets up the WPR2 (Write protected region) before the
> Booter runs on the SEC2 processor.
> 
> Also add log messages to show the BIOS images.
> 
> [102141.013287] NovaCore: Found BIOS image at offset 0x0, size: 0xfe00, type: PciAt
> [102141.080692] NovaCore: Found BIOS image at offset 0xfe00, size: 0x14800, type: Efi
> [102141.098443] NovaCore: Found BIOS image at offset 0x24600, size: 0x5600, type: FwSec
> [102141.415095] NovaCore: Found BIOS image at offset 0x29c00, size: 0x60800, type: FwSec
> 
> Tested on my Ampere GA102 and boot is successful.
> 
> [applied changes by Alex Courbot for fwsec signatures]
> [applied feedback from Alex Courbot and Timur Tabi]
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware.rs  |    2 -
>  drivers/gpu/nova-core/gpu.rs       |    5 +
>  drivers/gpu/nova-core/nova_core.rs |    1 +
>  drivers/gpu/nova-core/vbios.rs     | 1103 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 1109 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
> index 4ef5ba934b9d255635aa9a902e1d3a732d6e5568..58c0513d49e9a0cef36917c8e2b25c414f6fc596 100644
> --- a/drivers/gpu/nova-core/firmware.rs
> +++ b/drivers/gpu/nova-core/firmware.rs
> @@ -44,7 +44,6 @@ pub(crate) fn new(
>  }
>  
>  /// Structure used to describe some firmwares, notable fwsec-frts.
> -#[allow(dead_code)]
>  #[repr(C)]
>  #[derive(Debug, Clone)]
>  pub(crate) struct FalconUCodeDescV3 {
> @@ -64,7 +63,6 @@ pub(crate) struct FalconUCodeDescV3 {
>      _reserved: u16,
>  }
>  
> -#[allow(dead_code)]
>  impl FalconUCodeDescV3 {
>      pub(crate) fn size(&self) -> usize {
>          ((self.hdr & 0xffff0000) >> 16) as usize
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index ec4c648c6e8b4aa7d06c627ed59c0e66a08c679e..2344dfc69fe4246644437d70572680a4450b5bd7 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -11,6 +11,7 @@
>  use crate::regs;
>  use crate::timer::Timer;
>  use crate::util;
> +use crate::vbios::Vbios;
>  use core::fmt;
>  
>  macro_rules! define_chipset {
> @@ -157,6 +158,7 @@ pub(crate) struct Gpu {
>      fw: Firmware,
>      sysmem_flush: DmaObject,
>      timer: Timer,
> +    bios: Vbios,
>  }
>  
>  #[pinned_drop]
> @@ -237,12 +239,15 @@ pub(crate) fn new(
>  
>          let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, true)?;
>  
> +        let bios = Vbios::probe(&bar)?;
> +
>          Ok(pin_init!(Self {
>              spec,
>              bar,
>              fw,
>              sysmem_flush,
>              timer,
> +            bios,
>          }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 4dde8004d24882c60669b5acd6af9d6988c66a9c..2858f4a0dc35eb9d6547d5cbd81de44c8fc47bae 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -29,6 +29,7 @@ macro_rules! with_bar {
>  mod regs;
>  mod timer;
>  mod util;
> +mod vbios;
>  
>  kernel::module_pci_driver! {
>      type: driver::NovaCore,
> diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..534107b708cab0eb8d0accf7daa5718edf030358
> --- /dev/null
> +++ b/drivers/gpu/nova-core/vbios.rs
> @@ -0,0 +1,1103 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +// To be removed when all code is used.
> +#![allow(dead_code)]

Please don't; use 'expect' instead, and only where needed. If that turns out to
be too many annotations, it's probably a good indicator that we want to reduce
the size of the patch for now.
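
As a rough illustration (a standalone sketch, assuming the debug AppID is the
only constant without a user for now; is_prod_entry() is a made-up helper and
not something from the patch), the annotation would sit on the individual item
rather than on the whole module:

```rust
// Only the item that is really unused carries the attribute.
#[expect(dead_code)]
const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;

/// Hypothetical user of the production AppID, for illustration only.
fn is_prod_entry(app_id: u8) -> bool {
    app_id == FALCON_UCODE_ENTRY_APPID_FWSEC_PROD
}
```

Unlike 'allow', 'expect' makes the compiler warn once the item gains a user, so
stale annotations don't linger.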

> +
> +//! VBIOS extraction and parsing.
> +
> +use crate::driver::Bar0;
> +use crate::firmware::FalconUCodeDescV3;
> +use core::convert::TryFrom;
> +use kernel::devres::Devres;
> +use kernel::error::Result;
> +use kernel::prelude::*;
> +
> +/// The offset of the VBIOS ROM in the BAR0 space.
> +const ROM_OFFSET: usize = 0x300000;
> +/// The maximum length of the VBIOS ROM to scan into.
> +const BIOS_MAX_SCAN_LEN: usize = 0x100000;
> +/// The size to read ahead when parsing initial BIOS image headers.
> +const BIOS_READ_AHEAD_SIZE: usize = 1024;
> +
> +// PMU lookup table entry types. Used to locate PMU table entries
> +// in the Fwsec image, corresponding to falcon ucodes.
> +#[allow(dead_code)]
> +const FALCON_UCODE_ENTRY_APPID_FIRMWARE_SEC_LIC: u8 = 0x05;
> +#[allow(dead_code)]
> +const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
> +const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;
> +
> +pub(crate) struct Vbios {
> +    pub fwsec_image: Option<FwSecBiosImage>,
> +}
> +
> +impl Vbios {
> +    /// Read bytes from the ROM at the current end of the data vector
> +    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
> +        let current_len = data.len();
> +        let start = ROM_OFFSET + current_len;
> +
> +        // Ensure length is a multiple of 4 for 32-bit reads
> +        if len % core::mem::size_of::<u32>() != 0 {
> +            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);

Please don't use any of the pr_*() print macros within a driver, use the dev_*()
ones instead.

> +            return Err(EINVAL);
> +        }
> +
> +        // Allocate and zero-initialize the required memory

That's obvious from the code. If you feel this needs a comment, better explain
what we need it for, why we zero-initialize, etc.

> +        data.extend_with(len, 0, GFP_KERNEL)?;
> +        with_bar!(?bar0, |bar0_ref| {
> +            let dst = &mut data[current_len..current_len + len];
> +            for (idx, chunk) in dst
> +                .chunks_exact_mut(core::mem::size_of::<u32>())
> +                .enumerate()
> +            {
> +                let addr = start + (idx * core::mem::size_of::<u32>());
> +                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
> +                // method out of convenience to convert the 32-bit integer as it
> +                // is in memory into a byte array without any endianness
> +                // conversion or byte-swapping.
> +                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
> +            }
> +            Ok(())
> +        })?;
> +
> +        Ok(())
> +    }
> +
> +    /// Read bytes at a specific offset, filling any gap
> +    fn read_more_at_offset(
> +        bar0: &Devres<Bar0>,
> +        data: &mut KVec<u8>,
> +        offset: usize,
> +        len: usize,
> +    ) -> Result {
> +        if offset > BIOS_MAX_SCAN_LEN {
> +            pr_err!("Error: exceeded BIOS scan limit.\n");
> +            return Err(EINVAL);
> +        }
> +
> +        // If offset is beyond current data size, fill the gap first
> +        let current_len = data.len();
> +        let gap_bytes = if offset > current_len {
> +            offset - current_len
> +        } else {
> +            0
> +        };
> +
> +        // Now read the requested bytes at the offset
> +        Self::read_more(bar0, data, gap_bytes + len)
> +    }
> +
> +    /// Read a BIOS image at a specific offset and create a BiosImage from it.
> +    /// @data is extended as needed and a new BiosImage is returned.
> +    fn read_bios_image_at_offset(
> +        bar0: &Devres<Bar0>,
> +        data: &mut KVec<u8>,
> +        offset: usize,
> +        len: usize,
> +    ) -> Result<BiosImage> {
> +        if offset + len > data.len() {
> +            Self::read_more_at_offset(bar0, data, offset, len).inspect_err(|e| {
> +                pr_err!("Failed to read more at offset {:#x}: {:?}\n", offset, e)
> +            })?;
> +        }
> +
> +        BiosImage::try_from(&data[offset..offset + len]).inspect_err(|e| {
> +            pr_err!(
> +                "Failed to create BiosImage at offset {:#x}: {:?}\n",
> +                offset,
> +                e
> +            )
> +        })
> +    }
> +
> +    /// Probe for VBIOS extraction
> +    /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
> +    pub(crate) fn probe(bar0: &Devres<Bar0>) -> Result<Self> {

Let's not call it probe(); what about Vbios::parse(), or simply Vbios::new()?

> +        // VBIOS data vector: As BIOS images are scanned, they are added to this vector
> +        // for reference or copying into other data structures. It is the entire
> +        // scanned contents of the VBIOS which progressively extends. It is used
> +        // so that we do not re-read any contents that are already read as we use
> +        // the cumulative length read so far, and re-read any gaps as we extend
> +        // the length
> +        let mut data = KVec::new();
> +
> +        // Loop through all the BiosImage and extract relevant ones and relevant data from them
> +        let mut cur_offset = 0;

I suggest to create a new type that contains data and offset and implement
read_bios_image_at_offset() and friends as methods of this type. I think this
would turn out much cleaner.
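
To make the idea a bit more concrete, here is a standalone userspace sketch.
Everything in it is an assumption for illustration: RomReader stands in for the
Devres<Bar0> access, VbiosData, peek() and advance() are made-up names, and
Vec/Option replace KVec/Result.

```rust
/// Hypothetical stand-in for 32-bit reads from the ROM window in BAR0.
trait RomReader {
    /// Reads the 32-bit word at `offset` (relative to the ROM start), if any.
    fn read32(&self, offset: usize) -> Option<u32>;
}

/// Test backend: a plain byte vector pretending to be the ROM.
impl RomReader for Vec<u8> {
    fn read32(&self, offset: usize) -> Option<u32> {
        let end = offset.checked_add(4)?;
        let b = self.get(offset..end)?;
        Some(u32::from_ne_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Owns the bytes read so far together with the current scan position.
struct VbiosData<'a, R: RomReader> {
    rom: &'a R,
    data: Vec<u8>,
    cur_offset: usize,
}

impl<'a, R: RomReader> VbiosData<'a, R> {
    fn new(rom: &'a R) -> Self {
        Self { rom, data: Vec::new(), cur_offset: 0 }
    }

    /// Extends `data` in 4-byte steps until it covers at least `end` bytes.
    fn read_up_to(&mut self, end: usize) -> Option<()> {
        while self.data.len() < end {
            let word = self.rom.read32(self.data.len())?;
            self.data.extend_from_slice(&word.to_ne_bytes());
        }
        Some(())
    }

    /// Returns `len` bytes at the current offset, reading more if needed.
    fn peek(&mut self, len: usize) -> Option<&[u8]> {
        let end = self.cur_offset.checked_add(len)?;
        self.read_up_to(end)?;
        self.data.get(self.cur_offset..end)
    }

    /// Moves the scan position `size` bytes further.
    fn advance(&mut self, size: usize) {
        self.cur_offset += size;
    }
}
```

With the buffer and the offset owned by one type, the callers no longer have to
pass bar0, data and offset around separately.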

> +        let mut pci_at_image: Option<PciAtBiosImage> = None;
> +        let mut first_fwsec_image: Option<FwSecBiosImage> = None;
> +        let mut second_fwsec_image: Option<FwSecBiosImage> = None;

I don't really like that we need those mutable Option types because of the below
match, but I can't really see a better option, so I won't object.

> +
> +        // loop till break

This comment seems unnecessary. Better explain what we loop over and why.

> +        loop {
> +            // Try to parse a BIOS image at the current offset
> +            // This will now check for all valid ROM signatures (0xAA55, 0xBB77, 0x4E56)
> +            let image_size =
> +                Self::read_bios_image_at_offset(bar0, &mut data, cur_offset, BIOS_READ_AHEAD_SIZE)
> +                    .and_then(|image| image.image_size_bytes())
> +                    .inspect_err(|e| {
> +                        pr_err!(
> +                            "Failed to parse initial BIOS image headers at offset {:#x}: {:?}\n",
> +                            cur_offset,
> +                            e
> +                        );
> +                    })?;
> +
> +            // Create a new BiosImage with the full image data
> +            let full_image =
> +                Self::read_bios_image_at_offset(bar0, &mut data, cur_offset, image_size)
> +                    .inspect_err(|e| {
> +                        pr_err!(
> +                            "Failed to parse full BIOS image at offset {:#x}: {:?}\n",
> +                            cur_offset,
> +                            e
> +                        );
> +                    })?;
> +
> +            // Determine the image type
> +            let image_type = full_image.image_type_str();
> +
> +            pr_info!(

I think this should be a debug print.

> +                "Found BIOS image at offset {:#x}, size: {:#x}, type: {}\n",
> +                cur_offset,
> +                image_size,
> +                image_type
> +            );
> +
> +            let is_last = full_image.is_last();
> +            // Get references to images we will need after the loop, in order to
> +            // setup the falcon data offset.
> +            match full_image {
> +                BiosImage::PciAt(image) => {
> +                    pci_at_image = Some(image);
> +                }
> +                BiosImage::FwSec(image) => {
> +                    if first_fwsec_image.is_none() {
> +                        first_fwsec_image = Some(image);
> +                    } else {
> +                        second_fwsec_image = Some(image);
> +                    }
> +                }
> +                // For now we don't need to handle these
> +                BiosImage::Efi(_image) => {}
> +                BiosImage::Nbsi(_image) => {}
> +            }
> +
> +            // Break if this is the last image
> +            if is_last {
> +                break;
> +            }
> +
> +            // Move to the next image (aligned to 512 bytes)
> +            cur_offset += image_size;
> +            cur_offset = (cur_offset + 511) & !511;

This looks like we want some align_up() helper that should go into the kernel
crate.

Alternatively you can use Layout, but that doesn't really seem to match well.
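
For reference, such a helper could look roughly like this. It is a standalone
sketch; the name align_up() and the Option-based signature are assumptions, not
an existing kernel crate API.

```rust
/// Rounds `value` up to the next multiple of `align`.
///
/// Returns `None` if `align` is not a power of two or if the result would
/// overflow.
fn align_up(value: usize, align: usize) -> Option<usize> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    value.checked_add(mask).map(|v| v & !mask)
}
```

The open-coded `(cur_offset + 511) & !511` above would then become
`align_up(cur_offset, 512)`, with the overflow case handled explicitly.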

> +
> +            // Safety check - don't go beyond BIOS_MAX_SCAN_LEN (1MB)
> +            if cur_offset > BIOS_MAX_SCAN_LEN {
> +                pr_err!("Error: exceeded BIOS scan limit, stopping scan\n");
> +                break;
> +            }
> +        } // end of loop

That's a good indicator that the loop is too long, can we please break it down a
bit? There seems to be some potential for moving things into subroutines.

> +
> +        // Using all the images, setup the falcon data pointer in Fwsec.
> +        // We need mutable access here, so we handle the Option manually.
> +        let final_fwsec_image = {
> +            let mut second = second_fwsec_image; // Take ownership of the option
> +            let first_ref = first_fwsec_image.as_ref();
> +            let pci_at_ref = pci_at_image.as_ref();

You could change this as follows, since first_fwsec_image and pci_at_image
aren't used afterwards anyways.

diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index 74735c083d47..62e2da576161 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -200,14 +200,12 @@ pub(crate) fn probe(bar0: &Devres<Bar0>) -> Result<Self> {
         // We need mutable access here, so we handle the Option manually.
         let final_fwsec_image = {
             let mut second = second_fwsec_image; // Take ownership of the option
-            let first_ref = first_fwsec_image.as_ref();
-            let pci_at_ref = pci_at_image.as_ref();

             if let (Some(second), Some(first), Some(pci_at)) =
-                (second.as_mut(), first_ref, pci_at_ref)
+                (second.as_mut(), first_fwsec_image, pci_at_image)
             {
                 second
-                    .setup_falcon_data(pci_at, first)
+                    .setup_falcon_data(&pci_at, &first)
                     .inspect_err(|e| pr_err!("Falcon data setup failed: {:?}\n", e))?;
             } else {
                 pr_err!("Missing required images for falcon data setup, skipping\n");

> +
> +            if let (Some(second), Some(first), Some(pci_at)) =
> +                (second.as_mut(), first_ref, pci_at_ref)
> +            {
> +                second
> +                    .setup_falcon_data(pci_at, first)
> +                    .inspect_err(|e| pr_err!("Falcon data setup failed: {:?}\n", e))?;
> +            } else {
> +                pr_err!("Missing required images for falcon data setup, skipping\n");
> +            }
> +            second // Return the potentially modified second image

What happens if we hit the else case above? Should this method be fallible
instead?
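
If the answer is that we can't continue without all three images, the else
branch could simply bail out. A standalone sketch of that shape, where Errno
and EINVAL are stand-ins for the kernel's Error type and require_images() is a
made-up name:

```rust
/// Stand-in for the kernel's `Error`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Errno(i32);
const EINVAL: Errno = Errno(22);

/// Returns all three images, or `EINVAL` if any of them is missing.
fn require_images<P, F>(
    pci_at: Option<P>,
    first: Option<F>,
    second: Option<F>,
) -> Result<(P, F, F), Errno> {
    match (pci_at, first, second) {
        (Some(pci_at), Some(first), Some(second)) => Ok((pci_at, first, second)),
        // Any missing image is reported to the caller instead of being skipped.
        _ => Err(EINVAL),
    }
}
```

That would also let fwsec_image become a plain FwSecBiosImage rather than an
Option, assuming nothing relies on it being absent.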

> +        };
> +
> +        Ok(Self {
> +            fwsec_image: final_fwsec_image,
> +        })
> +    }
> +
> +    pub(crate) fn fwsec_header(&self) -> Result<&FalconUCodeDescV3> {
> +        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
> +        image.fwsec_header()
> +    }
> +
> +    pub(crate) fn fwsec_ucode(&self) -> Result<&[u8]> {
> +        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
> +        image.fwsec_ucode(image.fwsec_header()?)
> +    }
> +
> +    pub(crate) fn fwsec_sigs(&self) -> Result<&[u8]> {
> +        let image = self.fwsec_image.as_ref().ok_or(EINVAL)?;
> +        image.fwsec_sigs(image.fwsec_header()?)
> +    }
> +}
> +
> +/// PCI Data Structure as defined in PCI Firmware Specification
> +#[derive(Debug, Clone)]
> +#[repr(C)]
> +#[allow(dead_code)]
> +struct PcirStruct {
> +    /// PCI Data Structure signature ("PCIR" or "NPDS")
> +    pub signature: [u8; 4],
> +    /// PCI Vendor ID (e.g., 0x10DE for NVIDIA)
> +    pub vendor_id: u16,
> +    /// PCI Device ID
> +    pub device_id: u16,
> +    /// Device List Pointer
> +    pub device_list_ptr: u16,
> +    /// PCI Data Structure Length
> +    pub pci_data_struct_len: u16,
> +    /// PCI Data Structure Revision
> +    pub pci_data_struct_rev: u8,
> +    /// Class code (3 bytes, 0x03 for display controller)
> +    pub class_code: [u8; 3],
> +    /// Size of this image in 512-byte blocks
> +    pub image_len: u16,
> +    /// Revision Level of the Vendor's ROM
> +    pub vendor_rom_rev: u16,
> +    /// ROM image type (0x00 = PC-AT compatible, 0x03 = EFI, 0x70 = NBSI)
> +    pub code_type: u8,
> +    /// Last image indicator (0x00 = Not last image, 0x80 = Last image)
> +    pub last_image: u8,
> +    /// Maximum Run-time Image Length (units of 512 bytes)
> +    pub max_runtime_image_len: u16,
> +}
> +
> +impl TryFrom<&[u8]> for PcirStruct {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        if data.len() < core::mem::size_of::<PcirStruct>() {
> +            pr_err!("Not enough data for PcirStruct\n");
> +            return Err(EINVAL);
> +        }
> +
> +        let mut signature = [0u8; 4];
> +        signature.copy_from_slice(&data[0..4]);
> +
> +        // Signature should be "PCIR" (0x52494350) or "NPDS" (0x5344504e)
> +        if &signature != b"PCIR" && &signature != b"NPDS" {
> +            pr_err!("Invalid signature for PcirStruct: {:?}\n", signature);
> +            return Err(EINVAL);
> +        }
> +
> +        let mut class_code = [0u8; 3];
> +        class_code.copy_from_slice(&data[13..16]);
> +
> +        Ok(PcirStruct {
> +            signature,
> +            vendor_id: u16::from_le_bytes([data[4], data[5]]),
> +            device_id: u16::from_le_bytes([data[6], data[7]]),
> +            device_list_ptr: u16::from_le_bytes([data[8], data[9]]),
> +            pci_data_struct_len: u16::from_le_bytes([data[10], data[11]]),
> +            pci_data_struct_rev: data[12],
> +            class_code,
> +            image_len: u16::from_le_bytes([data[16], data[17]]),
> +            vendor_rom_rev: u16::from_le_bytes([data[18], data[19]]),
> +            code_type: data[20],
> +            last_image: data[21],
> +            max_runtime_image_len: u16::from_le_bytes([data[22], data[23]]),
> +        })
> +    }
> +}
> +
> +impl PcirStruct {
> +    /// Check if this is the last image in the ROM
> +    fn is_last(&self) -> bool {
> +        self.last_image & 0x80 != 0
> +    }
> +
> +    /// Calculate image size in bytes
> +    fn image_size_bytes(&self) -> Result<usize> {
> +        if self.image_len > 0 {
> +            // Image size is in 512-byte blocks
> +            Ok(self.image_len as usize * 512)
> +        } else {
> +            Err(EINVAL)
> +        }
> +    }
> +}
> +
> +/// BIOS Information Table (BIT) Header
> +/// This is the head of the BIT table, that is used to locate the Falcon data.
> +/// The BIT table (with its header) is in the PciAtBiosImage and the falcon data
> +/// it is pointing to is in the FwSecBiosImage.
> +#[derive(Debug, Clone, Copy)]
> +#[allow(dead_code)]
> +struct BitHeader {
> +    /// 0h: BIT Header Identifier (BMP=0x7FFF/BIT=0xB8FF)
> +    pub id: u16,
> +    /// 2h: BIT Header Signature ("BIT\0")
> +    pub signature: [u8; 4],
> +    /// 6h: Binary Coded Decimal Version, ex: 0x0100 is 1.00.
> +    pub bcd_version: u16,
> +    /// 8h: Size of BIT Header (in bytes)
> +    pub header_size: u8,
> +    /// 9h: Size of BIT Tokens (in bytes)
> +    pub token_size: u8,
> +    /// 10h: Number of token entries that follow
> +    pub token_entries: u8,
> +    /// 11h: BIT Header Checksum
> +    pub checksum: u8,
> +}
> +
> +impl TryFrom<&[u8]> for BitHeader {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        if data.len() < 12 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut signature = [0u8; 4];
> +        signature.copy_from_slice(&data[2..6]);
> +
> +        // Check header ID and signature
> +        let id = u16::from_le_bytes([data[0], data[1]]);
> +        if id != 0xB8FF || &signature != b"BIT\0" {
> +            return Err(EINVAL);
> +        }
> +
> +        Ok(BitHeader {
> +            id,
> +            signature,
> +            bcd_version: u16::from_le_bytes([data[6], data[7]]),
> +            header_size: data[8],
> +            token_size: data[9],
> +            token_entries: data[10],
> +            checksum: data[11],
> +        })
> +    }
> +}
> +
> +/// BIT Token Entry: Records in the BIT table followed by the BIT header
> +#[derive(Debug, Clone, Copy)]
> +#[allow(dead_code)]
> +struct BitToken {
> +    /// 00h: Token identifier
> +    pub id: u8,
> +    /// 01h: Version of the token data
> +    pub data_version: u8,
> +    /// 02h: Size of token data in bytes
> +    pub data_size: u16,
> +    /// 04h: Offset to the token data
> +    pub data_offset: u16,
> +}
> +
> +// Define the token ID for the Falcon data
> +pub(in crate::vbios) const BIT_TOKEN_ID_FALCON_DATA: u8 = 0x70;
> +
> +impl BitToken {
> +    /// Find a BIT token entry by BIT ID in a PciAtBiosImage
> +    pub(in crate::vbios) fn from_id(image: &PciAtBiosImage, token_id: u8) -> Result<Self> {
> +        let header = image.bit_header.as_ref().ok_or(EINVAL)?;
> +
> +        // Offset to the first token entry
> +        let tokens_start = image.bit_offset.unwrap() + header.header_size as usize;

Please don't use unwrap(). In case it is None this panics the kernel. Please
handle it properly and return an Error if it has an unexpected value.
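
Something along these lines, as a standalone sketch. Errno and EINVAL stand in
for the kernel's Error type, and PciAtImage and tokens_start() are simplified,
made-up names that only carry the two fields involved here.

```rust
/// Stand-in for the kernel's `Error`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Errno(i32);
const EINVAL: Errno = Errno(22);

/// Reduced version of the image, holding only what the computation needs.
struct PciAtImage {
    bit_offset: Option<usize>,
    header_size: u8,
}

/// Offset of the first BIT token, or `EINVAL` if no BIT header was found.
fn tokens_start(image: &PciAtImage) -> Result<usize, Errno> {
    // A missing offset becomes an error instead of a panic.
    let bit_offset = image.bit_offset.ok_or(EINVAL)?;
    bit_offset
        .checked_add(usize::from(image.header_size))
        .ok_or(EINVAL)
}
```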

> +
> +        for i in 0..header.token_entries as usize {
> +            let entry_offset = tokens_start + (i * header.token_size as usize);
> +
> +            // Make sure we don't go out of bounds
> +            if entry_offset + header.token_size as usize > image.base.data.len() {
> +                return Err(EINVAL);
> +            }
> +
> +            // Check if this token has the requested ID
> +            if image.base.data[entry_offset] == token_id {
> +                return Ok(BitToken {
> +                    id: image.base.data[entry_offset],
> +                    data_version: image.base.data[entry_offset + 1],
> +                    data_size: u16::from_le_bytes([
> +                        image.base.data[entry_offset + 2],
> +                        image.base.data[entry_offset + 3],
> +                    ]),
> +                    data_offset: u16::from_le_bytes([
> +                        image.base.data[entry_offset + 4],
> +                        image.base.data[entry_offset + 5],
> +                    ]),
> +                });
> +            }
> +        }
> +
> +        // Token not found
> +        Err(ENOENT)
> +    }
> +}
> +
> +/// PCI ROM Expansion Header as defined in PCI Firmware Specification.
> +/// This is header is at the beginning of every image in the set of
> +/// images in the ROM. It contains a pointer to the PCI Data Structure
> +/// which describes the image.
> +/// For "NBSI" images (NoteBook System Information), the ROM
> +/// header deviates from the standard and contains an offset to the
> +/// NBSI image however we do not yet parse that in this module and keep
> +/// it for future reference.
> +#[derive(Debug, Clone, Copy)]
> +#[allow(dead_code)]
> +struct PciRomHeader {
> +    /// 00h: Signature (0xAA55)
> +    pub signature: u16,
> +    /// 02h: Reserved bytes for processor architecture unique data (20 bytes)
> +    pub reserved: [u8; 20],
> +    /// 16h: NBSI Data Offset (NBSI-specific, offset from header to NBSI image)
> +    pub nbsi_data_offset: Option<u16>,
> +    /// 18h: Pointer to PCI Data Structure (offset from start of ROM image)
> +    pub pci_data_struct_offset: u16,
> +    /// 1Ah: Size of block (this is NBSI-specific)
> +    pub size_of_block: Option<u32>,
> +}
> +
> +impl TryFrom<&[u8]> for PciRomHeader {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        if data.len() < 26 {
> +            // Need at least 26 bytes to read pciDataStrucPtr and sizeOfBlock
> +            return Err(EINVAL);
> +        }
> +
> +        let signature = u16::from_le_bytes([data[0], data[1]]);
> +
> +        // Check for valid ROM signatures
> +        match signature {
> +            0xAA55 | 0xBB77 | 0x4E56 => {}
> +            _ => {
> +                pr_err!("ROM signature unknown {:#x}\n", signature);
> +                return Err(EINVAL);
> +            }
> +        }
> +
> +        // Read the pointer to the PCI Data Structure at offset 0x18
> +        let pci_data_struct_ptr = u16::from_le_bytes([data[24], data[25]]);
> +
> +        // Try to read optional fields if enough data
> +        let mut size_of_block = None;
> +        let mut nbsi_data_offset = None;
> +
> +        if data.len() >= 30 {
> +            // Read size_of_block at offset 0x1A
> +            size_of_block = Some(
> +                (data[29] as u32) << 24
> +                    | (data[28] as u32) << 16
> +                    | (data[27] as u32) << 8
> +                    | (data[26] as u32),
> +            );
> +        }
> +
> +        // For NBSI images, try to read the nbsiDataOffset at offset 0x16
> +        if data.len() >= 24 {
> +            nbsi_data_offset = Some(u16::from_le_bytes([data[22], data[23]]));
> +        }
> +
> +        Ok(PciRomHeader {
> +            signature,
> +            reserved: [0u8; 20],
> +            pci_data_struct_offset: pci_data_struct_ptr,
> +            size_of_block,
> +            nbsi_data_offset,
> +        })
> +    }
> +}
> +
> +/// NVIDIA PCI Data Extension Structure. This is similar to the
> +/// PCI Data Structure, but is Nvidia-specific and is placed right after
> +/// the PCI Data Structure. It contains some fields that are redundant
> +/// with the PCI Data Structure, but are needed for traversing the
> +/// BIOS images. It is expected to be present in all BIOS images except
> +/// for NBSI images.
> +#[derive(Debug, Clone)]
> +#[allow(dead_code)]
> +struct NpdeStruct {
> +    /// 00h: Signature ("NPDE")
> +    pub signature: [u8; 4],
> +    /// 04h: NVIDIA PCI Data Extension Revision
> +    pub npci_data_ext_rev: u16,
> +    /// 06h: NVIDIA PCI Data Extension Length
> +    pub npci_data_ext_len: u16,
> +    /// 08h: Sub-image Length (in 512-byte units)
> +    pub subimage_len: u16,
> +    /// 0Ah: Last image indicator flag
> +    pub last_image: u8,
> +}
> +
> +impl TryFrom<&[u8]> for NpdeStruct {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        if data.len() < 11 {
> +            pr_err!("Not enough data for NpdeStruct\n");
> +            return Err(EINVAL);
> +        }
> +
> +        let mut signature = [0u8; 4];
> +        signature.copy_from_slice(&data[0..4]);
> +
> +        // Signature should be "NPDE" (0x4544504E)
> +        if &signature != b"NPDE" {
> +            pr_err!("Invalid signature for NpdeStruct: {:?}\n", signature);
> +            return Err(EINVAL);
> +        }
> +
> +        Ok(NpdeStruct {
> +            signature,
> +            npci_data_ext_rev: u16::from_le_bytes([data[4], data[5]]),
> +            npci_data_ext_len: u16::from_le_bytes([data[6], data[7]]),
> +            subimage_len: u16::from_le_bytes([data[8], data[9]]),
> +            last_image: data[10],
> +        })
> +    }
> +}
> +
> +impl NpdeStruct {
> +    /// Check if this is the last image in the ROM
> +    fn is_last(&self) -> bool {
> +        self.last_image & 0x80 != 0

What's the magic number for?
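
A named constant would already help. Going by the PcirStruct field comment
above (0x00 = not last image, 0x80 = last image), a standalone sketch could be
the following, where LAST_IMAGE_BIT_MASK is a made-up name:

```rust
/// Bit 7 of the "last image" indicator byte marks the final image in the ROM.
const LAST_IMAGE_BIT_MASK: u8 = 0x80;

/// Returns true if the indicator byte flags the last image.
fn is_last(last_image: u8) -> bool {
    last_image & LAST_IMAGE_BIT_MASK != 0
}
```

The same constant could then be shared by PcirStruct::is_last() and
NpdeStruct::is_last().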

> +    }
> +
> +    /// Calculate image size in bytes
> +    fn image_size_bytes(&self) -> Result<usize> {
> +        if self.subimage_len > 0 {
> +            // Image size is in 512-byte blocks
> +            Ok(self.subimage_len as usize * 512)
> +        } else {
> +            Err(EINVAL)
> +        }
> +    }
> +
> +    /// Try to find NPDE in the data, the NPDE is right after the PCIR.
> +    fn find_in_data(data: &[u8], rom_header: &PciRomHeader, pcir: &PcirStruct) -> Option<Self> {
> +        // Calculate the offset where NPDE might be located
> +        // NPDE should be right after the PCIR structure, aligned to 16 bytes
> +        let pcir_offset = rom_header.pci_data_struct_offset as usize;
> +        let npde_start = (pcir_offset + pcir.pci_data_struct_len as usize + 0x0F) & !0x0F;
> +
> +        // Check if we have enough data
> +        if npde_start + 11 > data.len() {
> +            pr_err!("Not enough data for NPDE\n");
> +            return None;
> +        }
> +
> +        // Try to create NPDE from the data
> +        NpdeStruct::try_from(&data[npde_start..])
> +            .inspect_err(|e| {
> +                pr_err!("Error creating NpdeStruct: {:?}\n", e);
> +            })
> +            .ok()
> +    }
> +}
> +// Use a macro to implement BiosImage enum and methods. This avoids having to
> +// repeat each enum type when implementing functions like base() in BiosImage.
> +macro_rules! bios_image {
> +    (
> +        $($variant:ident $class:ident),* $(,)?
> +    ) => {
> +        // BiosImage enum with variants for each image type
> +        enum BiosImage {
> +            $($variant($class)),*
> +        }
> +
> +        impl BiosImage {
> +            /// Get a reference to the common BIOS image data regardless of type
> +            fn base(&self) -> &BiosImageBase {
> +                match self {
> +                    $(Self::$variant(img) => &img.base),*
> +                }
> +            }
> +
> +            /// Returns a string representing the type of BIOS image
> +            fn image_type_str(&self) -> &'static str {
> +                match self {
> +                    $(Self::$variant(_) => stringify!($variant)),*
> +                }
> +            }
> +        }
> +    }
> +}
> +
> +impl BiosImage {
> +    /// Check if this is the last image
> +    fn is_last(&self) -> bool {
> +        let base = self.base();
> +
> +        // For NBSI images (type == 0x70), return true as they're
> +        // considered the last image
> +        if matches!(self, Self::Nbsi(_)) {
> +            return true;
> +        }
> +
> +        // For other image types, check NPDE first if available
> +        if let Some(ref npde) = base.npde {
> +            return npde.is_last();
> +        }
> +
> +        // Otherwise, fall back to checking the PCIR last_image flag
> +        base.pcir.is_last()
> +    }
> +
> +    /// Get the image size in bytes
> +    fn image_size_bytes(&self) -> Result<usize> {
> +        let base = self.base();
> +
> +        // Prefer NPDE image size if available
> +        if let Some(ref npde) = base.npde {
> +            return npde.image_size_bytes();
> +        }
> +
> +        // Otherwise, fall back to the PCIR image size
> +        base.pcir.image_size_bytes()
> +    }
> +}
> +
> +bios_image! {
> +    PciAt PciAtBiosImage,   // PCI-AT compatible BIOS image
> +    Efi EfiBiosImage,       // EFI (Extensible Firmware Interface)
> +    Nbsi NbsiBiosImage,     // NBSI (Nvidia Bios System Interface)
> +    FwSec FwSecBiosImage    // FWSEC (Firmware Security)
> +}
> +
> +struct PciAtBiosImage {
> +    base: BiosImageBase,
> +    bit_header: Option<BitHeader>,
> +    bit_offset: Option<usize>,
> +}
> +
> +struct EfiBiosImage {
> +    base: BiosImageBase,
> +    // EFI-specific fields can be added here in the future.
> +}
> +
> +struct NbsiBiosImage {
> +    base: BiosImageBase,
> +    // NBSI-specific fields can be added here in the future.
> +}
> +
> +pub(crate) struct FwSecBiosImage {
> +    base: BiosImageBase,
> +    // FWSEC-specific fields
> +    // The offset of the Falcon data from the start of Fwsec image
> +    falcon_data_offset: Option<usize>,
> +    // The PmuLookupTable starts at the offset of the falcon data pointer
> +    pmu_lookup_table: Option<PmuLookupTable>,
> +    // The offset of the Falcon ucode
> +    falcon_ucode_offset: Option<usize>,
> +}
> +
> +// Convert from BiosImageBase to BiosImage
> +impl TryFrom<BiosImageBase> for BiosImage {
> +    type Error = Error;
> +
> +    fn try_from(base: BiosImageBase) -> Result<Self> {
> +        match base.pcir.code_type {
> +            0x00 => Ok(BiosImage::PciAt(base.try_into()?)),
> +            0x03 => Ok(BiosImage::Efi(EfiBiosImage { base })),
> +            0x70 => Ok(BiosImage::Nbsi(NbsiBiosImage { base })),
> +            0xE0 => Ok(BiosImage::FwSec(FwSecBiosImage {
> +                base,
> +                falcon_data_offset: None,
> +                pmu_lookup_table: None,
> +                falcon_ucode_offset: None,
> +            })),
> +            _ => {
> +                pr_err!("Unknown BIOS image type {:#x}\n", base.pcir.code_type);
> +                Err(EINVAL)
> +            }
> +        }
> +    }
> +}
> +
> +/// BiosImage creation from a byte slice. This creates a BiosImageBase
> +/// and then converts it to a BiosImage which triggers the constructor of
> +/// the specific BiosImage enum variant.
> +impl TryFrom<&[u8]> for BiosImage {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        let base = BiosImageBase::try_from(data)?;
> +        let image = base.to_image()?;
> +
> +        image
> +            .image_size_bytes()
> +            .inspect_err(|_| pr_err!("Invalid image size computed during BiosImage creation\n"))?;
> +
> +        Ok(image)
> +    }
> +}
> +
> +/// BIOS Image structure containing various headers and references
> +/// fields base to all BIOS images. Each BiosImage type has a
> +/// BiosImageBase type along with other image-specific fields.
> +/// Note that Rust favors composition of types over inheritance.
> +#[derive(Debug)]
> +#[allow(dead_code)]
> +struct BiosImageBase {
> +    /// PCI ROM Expansion Header
> +    pub rom_header: PciRomHeader,
> +    /// PCI Data Structure
> +    pub pcir: PcirStruct,
> +    /// NVIDIA PCI Data Extension (optional)
> +    pub npde: Option<NpdeStruct>,
> +    /// Image data (includes ROM header and PCIR)
> +    pub data: KVec<u8>,

I think those fields don't need to have public visibility, given that the
structure has private visibility.

> +}
> +
> +impl BiosImageBase {
> +    fn to_image(self) -> Result<BiosImage> {
> +        BiosImage::try_from(self)
> +    }
> +}
> +
> +impl TryFrom<&[u8]> for BiosImageBase {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        // Ensure we have enough data for the ROM header
> +        if data.len() < 26 {
> +            pr_err!("Not enough data for ROM header\n");
> +            return Err(EINVAL);
> +        }
> +
> +        // Parse the ROM header
> +        let rom_header = PciRomHeader::try_from(&data[0..26])
> +            .inspect_err(|e| pr_err!("Failed to create PciRomHeader: {:?}\n", e))?;
> +
> +        // Get the PCI Data Structure using the pointer from the ROM header
> +        let pcir_offset = rom_header.pci_data_struct_offset as usize;
> +        let pcir_data = data
> +            .get(pcir_offset..pcir_offset + core::mem::size_of::<PcirStruct>())
> +            .ok_or(EINVAL)
> +            .inspect_err(|_| {
> +                pr_err!(
> +                    "PCIR offset {:#x} out of bounds (data length: {})\n",
> +                    pcir_offset,
> +                    data.len()
> +                );
> +                pr_err!("Consider reading more data for construction of BiosImage\n");
> +            })?;
> +
> +        let pcir = PcirStruct::try_from(pcir_data)
> +            .inspect_err(|e| pr_err!("Failed to create PcirStruct: {:?}\n", e))?;
> +
> +        // Look for NPDE structure if this is not an NBSI image (type != 0x70)
> +        let npde = NpdeStruct::find_in_data(data, &rom_header, &pcir);
> +
> +        // Create a copy of the data
> +        let mut data_copy = KVec::new();
> +        data_copy.extend_with(data.len(), 0, GFP_KERNEL)?;
> +        data_copy.copy_from_slice(data);
> +
> +        Ok(BiosImageBase {
> +            rom_header,
> +            pcir,
> +            npde,
> +            data: data_copy,
> +        })
> +    }
> +}
> +
> +/// The PciAt BIOS image is typically the first BIOS image type found in the
> +/// BIOS image chain. It contains the BIT header and the BIT tokens.
> +impl PciAtBiosImage {
> +    /// Find a byte pattern in a slice
> +    fn find_byte_pattern(haystack: &[u8], needle: &[u8]) -> Option<usize> {
> +        haystack
> +            .windows(needle.len())
> +            .position(|window| window == needle)
> +    }
> +
> +    /// Find the BIT header in the PciAtBiosImage
> +    fn find_bit_header(data: &[u8]) -> Result<(BitHeader, usize)> {
> +        let bit_pattern = [0xff, 0xb8, b'B', b'I', b'T', 0x00];
> +        let bit_offset = Self::find_byte_pattern(data, &bit_pattern);
> +        if bit_offset.is_none() {
> +            return Err(EINVAL);
> +        }
> +
> +        let bit_header = BitHeader::try_from(&data[bit_offset.unwrap()..])?;
> +        Ok((bit_header, bit_offset.unwrap()))
> +    }
> +
> +    /// Get a BIT token entry from the BIT table in the PciAtBiosImage
> +    fn get_bit_token(&self, token_id: u8) -> Result<BitToken> {
> +        BitToken::from_id(self, token_id)
> +    }
> +
> +    /// Find the Falcon data pointer structure in the PciAtBiosImage
> +    /// This is just a 4 byte structure that contains a pointer to the
> +    /// Falcon data in the FWSEC image.
> +    fn falcon_data_ptr(&self) -> Result<u32> {
> +        let token = self.get_bit_token(BIT_TOKEN_ID_FALCON_DATA)?;
> +
> +        // Make sure we don't go out of bounds
> +        if token.data_offset as usize + 4 > self.base.data.len() {
> +            return Err(EINVAL);
> +        }
> +
> +        // read the 4 bytes at the offset specified in the token
> +        let offset = token.data_offset as usize;
> +        let bytes: [u8; 4] = self.base.data[offset..offset + 4].try_into().map_err(|_| {
> +            pr_err!("Failed to convert data slice to array");
> +            EINVAL
> +        })?;
> +
> +        let data_ptr = u32::from_le_bytes(bytes);
> +
> +        if (data_ptr as usize) < self.base.data.len() {
> +            pr_err!("Falcon data pointer out of bounds\n");
> +            return Err(EINVAL);
> +        }
> +
> +        Ok(data_ptr)
> +    }
> +}
> +
> +impl TryFrom<BiosImageBase> for PciAtBiosImage {
> +    type Error = Error;
> +
> +    fn try_from(base: BiosImageBase) -> Result<Self> {
> +        let data_slice = &base.data;
> +        let (bit_header, bit_offset) = PciAtBiosImage::find_bit_header(data_slice)?;
> +
> +        Ok(PciAtBiosImage {
> +            base,
> +            bit_header: Some(bit_header),
> +            bit_offset: Some(bit_offset),
> +        })
> +    }
> +}
> +
> +/// The PmuLookupTableEntry structure is a single entry in the PmuLookupTable.
> +/// See the PmuLookupTable description for more information.
> +#[allow(dead_code)]
> +struct PmuLookupTableEntry {
> +    application_id: u8,
> +    target_id: u8,
> +    data: u32,
> +}
> +
> +impl TryFrom<&[u8]> for PmuLookupTableEntry {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        if data.len() < 5 {
> +            return Err(EINVAL);
> +        }
> +
> +        Ok(PmuLookupTableEntry {
> +            application_id: data[0],
> +            target_id: data[1],
> +            data: u32::from_le_bytes(data[2..6].try_into().map_err(|_| EINVAL)?),
> +        })
> +    }
> +}
> +
> +/// The PmuLookupTableEntry structure is used to find the PmuLookupTableEntry
> +/// for a given application ID. The table of entries is pointed to by the falcon
> +/// data pointer in the BIT table, and is used to locate the Falcon Ucode.
> +#[allow(dead_code)]
> +struct PmuLookupTable {
> +    version: u8,
> +    header_len: u8,
> +    entry_len: u8,
> +    entry_count: u8,
> +    table_data: KVec<u8>,
> +}
> +
> +impl TryFrom<&[u8]> for PmuLookupTable {
> +    type Error = Error;
> +
> +    fn try_from(data: &[u8]) -> Result<Self> {
> +        if data.len() < 4 {
> +            return Err(EINVAL);
> +        }
> +
> +        let header_len = data[1] as usize;
> +        let entry_len = data[2] as usize;
> +        let entry_count = data[3] as usize;
> +
> +        let required_bytes = header_len + (entry_count * entry_len);
> +
> +        if data.len() < required_bytes {
> +            return Err(EINVAL);
> +        }
> +
> +        // Create a copy of only the table data
> +        let mut table_data = KVec::new();
> +
> +        // "last_entry_bytes" is a debugging aid.
> +        // let mut last_entry_bytes: Option<KVec<u8>> = Some(KVec::new());
> +
> +        for &byte in &data[header_len..required_bytes] {
> +            table_data.push(byte, GFP_KERNEL)?;
> +            /*
> +             * Uncomment for debugging (dumps the table data to dmesg):
> +             * last_entry_bytes.as_mut().ok_or(EINVAL)?.push(byte, GFP_KERNEL)?;
> +             *
> +             * let last_entry_bytes_len = last_entry_bytes.as_ref().ok_or(EINVAL)?.len();
> +             * if last_entry_bytes_len == entry_len {
> +             *     pr_info!("Last entry bytes: {:02x?}\n", &last_entry_bytes.as_ref().ok_or(EINVAL)?[..]);
> +             *     last_entry_bytes = Some(KVec::new());
> +             * }
> +             */

You could hide this behind cfg!(debug_assertions).
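
As a rough sketch of that idea, in plain userspace Rust with std Vec standing
in for KVec (copy_table and its return type are hypothetical, not part of the
patch), the dump can sit behind the compile-time boolean instead of living in a
comment:

```rust
/// Copies the table bytes and, in debug builds only, collects a hex dump of
/// each complete entry. `copy_table` is a hypothetical helper for illustration.
fn copy_table(data: &[u8], entry_len: usize) -> (Vec<u8>, Vec<String>) {
    let mut table = Vec::with_capacity(data.len());
    let mut dump = Vec::new();

    for (i, &byte) in data.iter().enumerate() {
        table.push(byte);

        // `cfg!` expands to a compile-time `true`/`false`, so this branch is
        // still type-checked in release builds but can be optimized away.
        if cfg!(debug_assertions) && entry_len != 0 && (i + 1) % entry_len == 0 {
            let entry = &table[i + 1 - entry_len..=i];
            dump.push(format!("{:02x?}", entry));
        }
    }

    (table, dump)
}
```

Unlike commented-out code, the gated branch keeps compiling as the surrounding
code changes, so it cannot silently rot.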

> +        }
> +
> +        Ok(PmuLookupTable {
> +            version: data[0],
> +            header_len: header_len as u8,
> +            entry_len: entry_len as u8,
> +            entry_count: entry_count as u8,
> +            table_data,
> +        })
> +    }
> +}
> +
> +impl PmuLookupTable {
> +    fn lookup_index(&self, idx: u8) -> Result<PmuLookupTableEntry> {
> +        if idx >= self.entry_count {
> +            return Err(EINVAL);
> +        }
> +
> +        let index = (idx as usize) * self.entry_len as usize;
> +        Ok(PmuLookupTableEntry::try_from(&self.table_data[index..])?)
> +    }
> +
> +    // find entry by type value
> +    fn find_entry_by_type(&self, entry_type: u8) -> Result<PmuLookupTableEntry> {
> +        for i in 0..self.entry_count {
> +            let entry = self.lookup_index(i)?;
> +            if entry.application_id == entry_type {
> +                return Ok(entry);
> +            }
> +        }
> +
> +        Err(EINVAL)
> +    }
> +}
> +
> +/// The FwSecBiosImage structure contains the PMU table and the Falcon Ucode.
> +/// The PMU table contains voltage/frequency tables as well as a pointer to the
> +/// Falcon Ucode.
> +impl FwSecBiosImage {
> +    fn setup_falcon_data(
> +        &mut self,
> +        pci_at_image: &PciAtBiosImage,
> +        first_fwsec_image: &FwSecBiosImage,
> +    ) -> Result<()> {
> +        let mut offset = pci_at_image.falcon_data_ptr()? as usize;
> +
> +        // The falcon data pointer assumes that the PciAt and FWSEC images
> +        // are contiguous in memory. However, testing shows the EFI image sits in
> +        // between them. So calculate the offset from the end of the PciAt image
> +        // rather than the start of it. Compensate.
> +        offset -= pci_at_image.base.data.len();
> +
> +        // The offset is now from the start of the first Fwsec image, however
> +        // the offset points to a location in the second Fwsec image. Since
> +        // the fwsec images are contiguous, subtract the length of the first Fwsec
> +        // image from the offset to get the offset to the start of the second
> +        // Fwsec image.
> +        offset -= first_fwsec_image.base.data.len();
> +
> +        self.falcon_data_offset = Some(offset);
> +
> +        // The PmuLookupTable starts at the offset of the falcon data pointer
> +        self.pmu_lookup_table = Some(PmuLookupTable::try_from(&self.base.data[offset..])?);
> +
> +        match self
> +            .pmu_lookup_table
> +            .as_ref()
> +            .ok_or(EINVAL)?
> +            .find_entry_by_type(FALCON_UCODE_ENTRY_APPID_FWSEC_PROD)
> +        {
> +            Ok(entry) => {
> +                let mut ucode_offset = entry.data as usize;
> +                ucode_offset -= pci_at_image.base.data.len();
> +                ucode_offset -= first_fwsec_image.base.data.len();
> +                self.falcon_ucode_offset = Some(ucode_offset);
> +
> +                /*
> +                 * Uncomment for debug: print the v3_desc header
> +                 * let v3_desc = self.fwsec_header()?;
> +                 * pr_info!("PmuLookupTableEntry v3_desc: {:#?}\n", v3_desc);
> +                 */
> +            }
> +            Err(e) => {
> +                pr_err!("PmuLookupTableEntry not found, error: {:?}\n", e);
> +            }
> +        }
> +        Ok(())
> +    }
> +
> +    /// TODO: These were borrowed from the old code for integrating this module
> +    /// with the outside world. They should be cleaned up and integrated properly.
> +    ///
> +    /// Get the FwSec header (FalconUCodeDescV3)
> +    fn fwsec_header(&self) -> Result<&FalconUCodeDescV3> {
> +        // Get the falcon ucode offset that was found in setup_falcon_data
> +        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)? as usize;
> +
> +        // Make sure the offset is within the data bounds
> +        if falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>() > self.base.data.len() {
> +            pr_err!("fwsec-frts header not contained within BIOS bounds\n");
> +            return Err(ERANGE);
> +        }
> +
> +        // Read the first 4 bytes to get the version
> +        let hdr_bytes: [u8; 4] = self.base.data[falcon_ucode_offset..falcon_ucode_offset + 4]
> +            .try_into()
> +            .map_err(|_| EINVAL)?;
> +        let hdr = u32::from_le_bytes(hdr_bytes);
> +        let ver = (hdr & 0xff00) >> 8;
> +
> +        if ver != 3 {
> +            pr_err!("invalid fwsec firmware version\n");
> +            return Err(EINVAL);
> +        }
> +
> +        // Return a reference to the FalconUCodeDescV3 structure
> +        Ok(unsafe {
> +            &*(self.base.data.as_ptr().add(falcon_ucode_offset) as *const FalconUCodeDescV3)
> +        })
> +    }
> +    /// Get the ucode data as a byte slice
> +    fn fwsec_ucode(&self, v3_desc: &FalconUCodeDescV3) -> Result<&[u8]> {
> +        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)? as usize;
> +
> +        // The ucode data follows the descriptor
> +        let ucode_data_offset = falcon_ucode_offset + v3_desc.size();
> +        let size = (v3_desc.imem_load_size + v3_desc.dmem_load_size) as usize;
> +
> +        // Get the data slice, checking bounds in a single operation
> +        self.base
> +            .data
> +            .get(ucode_data_offset..ucode_data_offset + size)
> +            .ok_or(ERANGE)
> +            .inspect_err(|_| pr_err!("fwsec ucode data not contained within BIOS bounds\n"))
> +    }
> +
> +    /// Get the signatures as a byte slice
> +    fn fwsec_sigs(&self, v3_desc: &FalconUCodeDescV3) -> Result<&[u8]> {
> +        const SIG_SIZE: usize = 96 * 4;
> +
> +        let falcon_ucode_offset = self.falcon_ucode_offset.ok_or(EINVAL)? as usize;
> +
> +        // The signatures data follows the descriptor
> +        let sigs_data_offset = falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
> +        let size = v3_desc.signature_count as usize * SIG_SIZE;
> +
> +        // Make sure the data is within bounds
> +        if sigs_data_offset + size > self.base.data.len() {
> +            pr_err!("fwsec signatures data not contained within BIOS bounds\n");
> +            return Err(ERANGE);
> +        }
> +
> +        Ok(&self.base.data[sigs_data_offset..sigs_data_offset + size])
> +    }
> +}
> 
> -- 
> 2.49.0
> 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 14:06   ` Danilo Krummrich
@ 2025-04-23 14:52     ` Joel Fernandes
  2025-04-23 15:02       ` Danilo Krummrich
  2025-04-24 18:54     ` [PATCH 13/16] " Joel Fernandes
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-23 14:52 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel

Hello, Danilo,
Thanks for all the feedback. Due to the volume of feedback, I will respond
incrementally in multiple emails so we can discuss as we go - hope that's Ok but
let me know if that is annoying.

On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:45PM +0900, Alexandre Courbot wrote:
>> From: Joel Fernandes <joelagnelf@nvidia.com>
>>
>> Add support for navigating and setting up vBIOS ucode data required for
[...]
>> diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..534107b708cab0eb8d0accf7daa5718edf030358
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/vbios.rs
>> @@ -0,0 +1,1103 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +// To be removed when all code is used.
>> +#![allow(dead_code)]
> 
> Please don't; use 'expect', and only where needed. If it would be too much,
> it's probably a good indicator that we want to reduce the size of the patch for
> now.
> 

Sure, I will switch to expect. The addition of a bit of dead code is
intentional, as we want to keep unused bits for future extension and to reduce
ambiguity for readers.

Note that I've already been conservative with not adding too much more code than
we need (otherwise this patch could have been 2X the size), however VBIOS is a
complicated thing and I think we want to keep a little more than we need for
future extension for GPU families and proper documentation.
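
For reference, a minimal sketch of the per-item form, assuming a toolchain where
the `expect` attribute is available (stable since Rust 1.81); is_prod_fwsec is a
hypothetical helper that exists only to give one constant a user:

```rust
// Unused for now: `expect` silences the dead_code lint here, and the compiler
// warns (unfulfilled_lint_expectations) once the constant gains a user, so
// the attribute cannot go stale the way `allow` can.
#[expect(dead_code)]
const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;

// Used below, so it needs no attribute at all.
const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;

/// Hypothetical helper: returns true for the production FWSEC application ID.
fn is_prod_fwsec(app_id: u8) -> bool {
    app_id == FALCON_UCODE_ENTRY_APPID_FWSEC_PROD
}
```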

>> +
>> +//! VBIOS extraction and parsing.
>> +
>> +use crate::driver::Bar0;
>> +use crate::firmware::FalconUCodeDescV3;
>> +use core::convert::TryFrom;
>> +use kernel::devres::Devres;
>> +use kernel::error::Result;
>> +use kernel::prelude::*;
>> +
>> +/// The offset of the VBIOS ROM in the BAR0 space.
>> +const ROM_OFFSET: usize = 0x300000;
>> +/// The maximum length of the VBIOS ROM to scan into.
>> +const BIOS_MAX_SCAN_LEN: usize = 0x100000;
>> +/// The size to read ahead when parsing initial BIOS image headers.
>> +const BIOS_READ_AHEAD_SIZE: usize = 1024;
>> +
>> +// PMU lookup table entry types. Used to locate PMU table entries
>> +// in the Fwsec image, corresponding to falcon ucodes.
>> +#[allow(dead_code)]
>> +const FALCON_UCODE_ENTRY_APPID_FIRMWARE_SEC_LIC: u8 = 0x05;
>> +#[allow(dead_code)]
>> +const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
>> +const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;
>> +
>> +pub(crate) struct Vbios {
>> +    pub fwsec_image: Option<FwSecBiosImage>,
>> +}
>> +
>> +impl Vbios {
>> +    /// Read bytes from the ROM at the current end of the data vector
>> +    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
>> +        let current_len = data.len();
>> +        let start = ROM_OFFSET + current_len;
>> +
>> +        // Ensure length is a multiple of 4 for 32-bit reads
>> +        if len % core::mem::size_of::<u32>() != 0 {
>> +            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
> 
> Please don't use any of the pr_*() print macros within a driver, use the dev_*()
> ones instead.

OK, I'll switch to this. One slight complication is that I have to retrieve
the 'dev' from the Bar0 and pass that along, but that should be doable.

> 
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        // Allocate and zero-initialize the required memory
> 
> That's obvious from the code, if you feel this needs a comment, better explain
> what we need it for, why zero-initialize, etc.

Sure. Actually, the extend_with() call is a performance optimization, as we want
to do only a single allocation and then fill in the allocated data. It has
nothing to do with zero-initializing per se. I will adjust the comment, but:

This code...

>> +        data.extend_with(len, 0, GFP_KERNEL)?;
>> +        with_bar!(?bar0, |bar0_ref| {
>> +            let dst = &mut data[current_len..current_len + len];
>> +            for (idx, chunk) in dst
>> +                .chunks_exact_mut(core::mem::size_of::<u32>())
>> +                .enumerate()
>> +            {
>> +                let addr = start + (idx * core::mem::size_of::<u32>());
>> +                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
>> +                // method out of convenience to convert the 32-bit integer as it
>> +                // is in memory into a byte array without any endianness
>> +                // conversion or byte-swapping.
>> +                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
>> +            }
>> +            Ok(())
>> +        })?;
>> +
>> +        Ok(())
>> +    }
..actually initially was:

+        with_bar!(self.bar0, |bar0| {
+            // Get current length
+            let current_len = self.data.len();
+
+            // Read ROM data bytes push directly to vector
+            for i in 0..bytes as usize {
+                // Read byte from the VBIOS ROM and push it to the data vector
+                let rom_addr = ROM_OFFSET + current_len + i;
+                let byte = bar0.try_readb(rom_addr)?;
+                self.data.push(byte, GFP_KERNEL)?;

Where this bit could result in a lot of allocations.

There was an unsafe way of not having to do this, but we settled on
extend_with().

Thoughts?

Thanks.




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 14:52     ` Joel Fernandes
@ 2025-04-23 15:02       ` Danilo Krummrich
  2025-04-24 19:19         ` Joel Fernandes
  2025-04-24 19:54         ` Joel Fernandes
  0 siblings, 2 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-23 15:02 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote:
> Hello, Danilo,
> Thanks for all the feedback. Due to the volume of feedback, I will respond
> incrementally in multiple emails so we can discuss as we go - hope that's Ok but
> let me know if that is annoying.

That's perfectly fine, whatever works best for you. :)

> On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> 
> >> +impl Vbios {
> >> +    /// Read bytes from the ROM at the current end of the data vector
> >> +    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
> >> +        let current_len = data.len();
> >> +        let start = ROM_OFFSET + current_len;
> >> +
> >> +        // Ensure length is a multiple of 4 for 32-bit reads
> >> +        if len % core::mem::size_of::<u32>() != 0 {
> >> +            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
> > 
> > Please don't use any of the pr_*() print macros within a driver, use the dev_*()
> > ones instead.
> 
> OK, I'll switch to this. One slight complication is that I have to retrieve
> the 'dev' from the Bar0 and pass that along, but that should be doable.

You can also pass the pci::Device reference to Vbios::probe() directly.

> 
> > 
> >> +            return Err(EINVAL);
> >> +        }
> >> +
> >> +        // Allocate and zero-initialize the required memory
> > 
> > That's obvious from the code, if you feel this needs a comment, better explain
> > what we need it for, why zero-initialize, etc.
> 
> Sure. Actually, the extend_with() call is a performance optimization, as we want
> to do only a single allocation and then fill in the allocated data. It has
> nothing to do with zero-initializing per se. I will adjust the comment, but:
> 
> This code...
> 
> >> +        data.extend_with(len, 0, GFP_KERNEL)?;
> >> +        with_bar!(?bar0, |bar0_ref| {
> >> +            let dst = &mut data[current_len..current_len + len];
> >> +            for (idx, chunk) in dst
> >> +                .chunks_exact_mut(core::mem::size_of::<u32>())
> >> +                .enumerate()
> >> +            {
> >> +                let addr = start + (idx * core::mem::size_of::<u32>());
> >> +                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
> >> +                // method out of convenience to convert the 32-bit integer as it
> >> +                // is in memory into a byte array without any endianness
> >> +                // conversion or byte-swapping.
> >> +                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
> >> +            }
> >> +            Ok(())
> >> +        })?;
> >> +
> >> +        Ok(())
> >> +    }
> ..actually initially was:
> 
> +        with_bar!(self.bar0, |bar0| {
> +            // Get current length
> +            let current_len = self.data.len();
> +
> +            // Read ROM data bytes push directly to vector
> +            for i in 0..bytes as usize {
> +                // Read byte from the VBIOS ROM and push it to the data vector
> +                let rom_addr = ROM_OFFSET + current_len + i;
> +                let byte = bar0.try_readb(rom_addr)?;
> +                self.data.push(byte, GFP_KERNEL)?;
> 
> Where this bit could result in a lot of allocations.
> 
> There was an unsafe way of not having to do this, but we settled on
> extend_with().
> 
> Thoughts?

If I understand you correctly, you just want to make sure that subsequent push()
calls don't re-allocate? If so, you can just use reserve() [1] and keep the
subsequent push() calls.

[1] https://rust.docs.kernel.org/kernel/alloc/kvec/struct.Vec.html#method.reserve
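
A minimal sketch of the reserve-then-push pattern, using std's Vec in userspace
as a stand-in (the kernel's reserve() and push() additionally take GFP flags and
return a Result; this read_more signature and the read_byte callback are
assumptions for illustration):

```rust
/// Appends `len` bytes produced by `read_byte` to `data`, growing the
/// buffer at most once up front.
fn read_more(data: &mut Vec<u8>, len: usize, mut read_byte: impl FnMut(usize) -> u8) {
    let current_len = data.len();

    // Guarantees room for at least `len` more elements.
    data.reserve(len);
    let capacity = data.capacity();

    for i in 0..len {
        // Within the reserved capacity, push() does not reallocate.
        data.push(read_byte(current_len + i));
    }

    debug_assert_eq!(capacity, data.capacity());
}
```

This keeps the simple push() loop while avoiding both the repeated growth and
the zero-fill that extend_with() performs.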

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset
  2025-04-22 16:23   ` Joel Fernandes
@ 2025-04-24  7:50     ` Alexandre Courbot
  0 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-24  7:50 UTC (permalink / raw)
  To: Joel Fernandes, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet
  Cc: John Hubbard, Ben Skeggs, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Wed Apr 23, 2025 at 1:23 AM JST, Joel Fernandes wrote:
>
>
> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
>> We will commonly need to compare chipset versions, so derive the
>> ordering traits to make that possible. Also derive Copy and Clone since
>> passing Chipset by value will be more efficient than by reference.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/gpu.rs | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 17c9660da45034762edaa78e372d8821144cdeb7..4de67a2dc16302c00530026156d7264cbc7e5b32 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -13,7 +13,7 @@ macro_rules! define_chipset {
>>      ({ $($variant:ident = $value:expr),* $(,)* }) =>
>>      {
>>          /// Enum representation of the GPU chipset.
>> -        #[derive(fmt::Debug)]
>> +        #[derive(fmt::Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
>
> Since Ord implies PartialOrd, do you need both? Same for Eq.

Ord does not imply PartialOrd, it requires it. It's a bit cumbersome but
the compiler will throw an error if `Ord` is derived without
`PartialOrd`. Same thing applies for `Eq`.
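
To illustrate, here is a small standalone sketch with a hand-written enum in
place of the define_chipset! macro (the three variants and their values are
picked for illustration, and is_ampere_or_later is a hypothetical helper):

```rust
// `Ord` has `PartialOrd` and `Eq` as supertraits, and `Eq` has `PartialEq`,
// so all four must be derived (or implemented) together.
#[derive(Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
enum Chipset {
    TU102 = 0x162,
    GA100 = 0x170,
    GA102 = 0x172,
}

/// Derived ordering on an enum follows the discriminants, so chipset
/// generations can be compared directly.
fn is_ampere_or_later(chipset: Chipset) -> bool {
    chipset >= Chipset::GA100
}
```

Dropping `PartialOrd` from the derive list while keeping `Ord` makes the
compiler reject the enum with an unsatisfied trait bound.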


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 14:06   ` Danilo Krummrich
  2025-04-23 14:52     ` Joel Fernandes
@ 2025-04-24 18:54     ` Joel Fernandes
  2025-04-24 20:08       ` Danilo Krummrich
  2025-04-24 20:22     ` [PATCH 13/16] " Joel Fernandes
  2025-04-26 23:17     ` [13/16] " Joel Fernandes
  3 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-24 18:54 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel



On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
[...]
>> +
>> +    /// Probe for VBIOS extraction
>> +    /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
>> +    pub(crate) fn probe(bar0: &Devres<Bar0>) -> Result<Self> {
> 
> Let's not call it probe(), what about Vbios::parse(), or simply Vbios::new()?
> 

Yes, new() is better. I changed it.

>> +        // VBIOS data vector: As BIOS images are scanned, they are added to this vector
>> +        // for reference or copying into other data structures. It is the entire
>> +        // scanned contents of the VBIOS which progressively extends. It is used
>> +        // so that we do not re-read any contents that are already read as we use
>> +        // the cumulative length read so far, and re-read any gaps as we extend
>> +        // the length
>> +        let mut data = KVec::new();
>> +
>> +        // Loop through all the BiosImage and extract relevant ones and relevant data from them
>> +        let mut cur_offset = 0;
> 
> I suggest to create a new type that contains data and offset and implement
> read_bios_image_at_offset() and friends as methods of this type. I think this
> would turn out much cleaner.

I moved it into struct Vbios {} itself instead of introducing a new type. Is
that OK?

I agree it is cleaner. Please see below link for this particular refactor
(moving data) and let me know if it looks Ok to you: http://bit.ly/4lHfDKZ

Thanks!

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 15:02       ` Danilo Krummrich
@ 2025-04-24 19:19         ` Joel Fernandes
  2025-04-24 20:01           ` Danilo Krummrich
  2025-04-24 19:54         ` Joel Fernandes
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-24 19:19 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote:

[..]

> > >> +        data.extend_with(len, 0, GFP_KERNEL)?;
> > >> +        with_bar!(?bar0, |bar0_ref| {
> > >> +            let dst = &mut data[current_len..current_len + len];
> > >> +            for (idx, chunk) in dst
> > >> +                .chunks_exact_mut(core::mem::size_of::<u32>())
> > >> +                .enumerate()
> > >> +            {
> > >> +                let addr = start + (idx * core::mem::size_of::<u32>());
> > >> +                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
> > >> +                // method out of convenience to convert the 32-bit integer as it
> > >> +                // is in memory into a byte array without any endianness
> > >> +                // conversion or byte-swapping.
> > >> +                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
> > >> +            }
> > >> +            Ok(())
> > >> +        })?;
> > >> +
> > >> +        Ok(())
> > >> +    }
> > ..actually initially was:
> > 
> > +        with_bar!(self.bar0, |bar0| {
> > +            // Get current length
> > +            let current_len = self.data.len();
> > +
> > +            // Read ROM data bytes push directly to vector
> > +            for i in 0..bytes as usize {
> > +                // Read byte from the VBIOS ROM and push it to the data vector
> > +                let rom_addr = ROM_OFFSET + current_len + i;
> > +                let byte = bar0.try_readb(rom_addr)?;
> > +                self.data.push(byte, GFP_KERNEL)?;
> > 
> > Where this bit could result in a lot of allocations.
> > 
> > There was an unsafe way of not having to do this, but we settled on
> > extend_with().
> > 
> > Thoughts?
> 
> If I understand you correctly, you just want to make sure that subsequent push()
> calls don't re-allocate? If so, you can just use reserve() [1] and keep the
> subsequent push() calls.
> 
> [1] https://rust.docs.kernel.org/kernel/alloc/kvec/struct.Vec.html#method.reserve



Ok that does turn out to be cleaner! I replaced it with the following and it works.

Let me know if it looks good now? Here's a preview:

-        data.extend_with(len, 0, GFP_KERNEL)?;
+        data.reserve(len, GFP_KERNEL)?;
+
         with_bar_res!(bar0, |bar0_ref| {
-            let dst = &mut data[current_len..current_len + len];
-            for (idx, chunk) in dst
-                .chunks_exact_mut(core::mem::size_of::<u32>())
-                .enumerate()
-            {
-                let addr = start + (idx * core::mem::size_of::<u32>());
-                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
-                // method out of convenience to convert the 32-bit integer as it
-                // is in memory into a byte array without any endianness
-                // conversion or byte-swapping.
-                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
+            // Read ROM data bytes and push directly to vector
+            for i in 0..len / core::mem::size_of::<u32>() {
+                // Read 32-bit word from the VBIOS ROM
+                let rom_addr = start + i * core::mem::size_of::<u32>();
+                let word = bar0_ref.try_read32(rom_addr)?;
+
+                // Convert the u32 to a 4 byte array and push each byte
+                word.to_ne_bytes().iter().try_for_each(|&b| data.push(b, GFP_KERNEL))?;
             }

Thanks.
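
For reference, the reserve-then-append pattern discussed in this exchange can
be sketched outside the kernel as follows. This is a minimal userspace sketch,
not the driver code: it uses std's Vec in place of KVec (so reserve() takes no
GFP flags and cannot fail), and `read_words`, `ReadError` and the `read32`
closure are hypothetical names standing in for read_more() and
bar0_ref.try_read32(). It assumes `len` counts bytes, as the multiple-of-4
check suggests, so the address advances by one 32-bit word per read.

```rust
/// Hypothetical error type standing in for the kernel's `Error`.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The requested length is not a multiple of 4 bytes.
    Unaligned,
    /// The (hypothetical) register read failed.
    Io,
}

/// Append `len` bytes to `data`, read as 32-bit words starting at `start`.
///
/// `read32` is a hypothetical stand-in for a fallible 32-bit MMIO read.
pub fn read_words<F>(
    read32: F,
    data: &mut Vec<u8>,
    start: usize,
    len: usize,
) -> Result<(), ReadError>
where
    F: Fn(usize) -> Result<u32, ReadError>,
{
    const WORD: usize = core::mem::size_of::<u32>();

    if len % WORD != 0 {
        return Err(ReadError::Unaligned);
    }

    // Reserve once up front so the appends below do not reallocate.
    data.reserve(len);

    // `len` is in bytes, so step the address by one word per read.
    for addr in (start..start + len).step_by(WORD) {
        let word = read32(addr)?;
        // Native-endian: keep the bytes exactly as they sit in memory.
        data.extend_from_slice(&word.to_ne_bytes());
    }
    Ok(())
}
```

Calling read_words() twice with consecutive start offsets extends the same
vector, which mirrors how read_more() grows the cumulative data vector.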




* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 15:02       ` Danilo Krummrich
  2025-04-24 19:19         ` Joel Fernandes
@ 2025-04-24 19:54         ` Joel Fernandes
  2025-04-24 20:17           ` Danilo Krummrich
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-24 19:54 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote:
> On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote:
> > Hello, Danilo,
> > Thanks for all the feedback. Due to the volume of feedback, I will respond
> > incrementally in multiple emails so we can discuss as we go - hope that's Ok but
> > let me know if that is annoying.
> 
> That's perfectly fine, whatever works best for you. :)
> 
> > On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> > 
> > >> +impl Vbios {
> > >> +    /// Read bytes from the ROM at the current end of the data vector
> > >> +    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
> > >> +        let current_len = data.len();
> > >> +        let start = ROM_OFFSET + current_len;
> > >> +
> > >> +        // Ensure length is a multiple of 4 for 32-bit reads
> > >> +        if len % core::mem::size_of::<u32>() != 0 {
> > >> +            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
> > > 
> > > Please don't use any of the pr_*() print macros within a driver, use the dev_*()
> > > ones instead.
> > 
> > Ok I'll switch to this. One slight complication is I've to retrieve the 'dev'
> > from the Bar0 and pass that along, but that should be doable.
> 
> You can also pass the pci::Device reference to VBios::probe() directly.


This turns out to be rather difficult to do across the whole of vbios.rs,
because we'd then have to propagate pdev to various methods which may print
errors (some of which don't make sense to pass pdev to, like try_from()). But
I can do it in probe() (or new() as we call it now). See the preview diff
below, which does this for the prints where it is possible. Does this work
for you?

Preview diff (give or take rustfmt):

---8<-----------------------

diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
index 43cf34a078ae..808e8446ac79 100644
--- a/drivers/gpu/nova-core/firmware/gsp.rs
+++ b/drivers/gpu/nova-core/firmware/gsp.rs
@@ -236,7 +236,7 @@ pub(crate) fn new(
         falcon: &Falcon<Gsp>,
         pdev: &Device,
         bar: &Devres<Bar0>,
-        bios: &Vbios,
+        bios: &Vbios<'_>,
         cmd: FwsecCommand,
     ) -> Result<Self> {
         let v3_desc = bios.fwsec_header()?;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 0069d6ec8751..aa301e2a7111 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -259,7 +259,7 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
 
         let fb_layout = FbLayout::new(spec.chipset, &bar, &fw.bootloader)?;
         pr_info!("{:#x?}\n", fb_layout);
-        let bios = Vbios::new(&bar)?;
+        let bios = Vbios::new(pdev, &bar)?;
 
         // TODO: should we write 0x0 back when we drop this object?
         let sysmem_flush = DmaObject::new(pdev.as_ref(), 0x1000, "sysmem flush page")?;
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index f0c43a5143d0..c5a8333f00b2 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -8,6 +8,7 @@
 use kernel::devres::Devres;
 use kernel::error::Result;
 use kernel::prelude::*;
+use kernel::pci;
 
 /// The offset of the VBIOS ROM in the BAR0 space.
 const ROM_OFFSET: usize = 0x300000;
@@ -24,7 +25,8 @@
 const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
 const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;
 
-pub(crate) struct Vbios {
+pub(crate) struct Vbios<'a> {
+    pdev: &'a pci::Device,
     pub fwsec_image: Option<FwSecBiosImage>,
     // VBIOS data vector: As BIOS images are scanned, they are added to this vector
     // for reference or copying into other data structures. It is the entire
@@ -35,7 +37,7 @@ pub(crate) struct Vbios {
     data: Option<KVec<u8>>,
 }
 
-impl Vbios {
+impl Vbios<'_> {
     /// Read bytes from the ROM at the current end of the data vector
     fn read_more(&mut self, bar0: &Devres<Bar0>, len: usize) -> Result {
         let data = self.data.as_mut().ok_or(EINVAL)?;
@@ -44,7 +46,7 @@ fn read_more(&mut self, bar0: &Devres<Bar0>, len: usize) -> Result {
 
         // Ensure length is a multiple of 4 for 32-bit reads
         if len % core::mem::size_of::<u32>() != 0 {
-            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
+            dev_err!(self.pdev.as_ref(), "VBIOS read length {} is not a multiple of 4\n", len);
             return Err(EINVAL);
         }
 
@@ -73,7 +75,7 @@ fn read_more_at_offset(
         len: usize,
     ) -> Result {
         if offset > BIOS_MAX_SCAN_LEN {
-            pr_err!("Error: exceeded BIOS scan limit.\n");
+            dev_err!(self.pdev.as_ref(), "Error: exceeded BIOS scan limit.\n");
             return Err(EINVAL);
         }
 
@@ -101,13 +103,14 @@ fn read_bios_image_at_offset(
         let data_len = self.data.as_ref().ok_or(EINVAL)?.len();
         if offset + len > data_len {
             self.read_more_at_offset(bar0, offset, len).inspect_err(|e| {
-                pr_err!("Failed to read more at offset {:#x}: {:?}\n", offset, e)
+                dev_err!(self.pdev.as_ref(), "Failed to read more at offset {:#x}: {:?}\n", offset, e)
             })?;
         }
 
         let data = self.data.as_ref().ok_or(EINVAL)?;
         BiosImage::try_from(&data[offset..offset + len]).inspect_err(|e| {
-            pr_err!(
+            dev_err!(
+                self.pdev.as_ref(),
                 "Failed to create BiosImage at offset {:#x}: {:?}\n",
                 offset,
                 e
@@ -117,8 +120,9 @@ fn read_bios_image_at_offset(
 
     /// Probe for VBIOS extraction
     /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
-    pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> {
+    pub(crate) fn new(pdev: &pci::Device, bar0: &Devres<Bar0>) -> Result<Self> {
         let mut vbios = Self {
+            pdev,
             fwsec_image: None,
             data: Some(KVec::new()),
         };
@@ -137,7 +141,8 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> {
                 vbios.read_bios_image_at_offset(bar0, cur_offset, BIOS_READ_AHEAD_SIZE)
                     .and_then(|image| image.image_size_bytes())
                     .inspect_err(|e| {
-                        pr_err!(
+                        dev_err!(
+                            vbios.pdev.as_ref(),
                             "Failed to parse initial BIOS image headers at offset {:#x}: {:?}\n",
                             cur_offset,
                             e
@@ -148,7 +153,8 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> {
             let full_image =
                 vbios.read_bios_image_at_offset(bar0, cur_offset, image_size)
                     .inspect_err(|e| {
-                        pr_err!(
+                        dev_err!(
+                            vbios.pdev.as_ref(),
                             "Failed to parse full BIOS image at offset {:#x}: {:?}\n",
                             cur_offset,
                             e
@@ -158,7 +164,8 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> {
             // Determine the image type
             let image_type = full_image.image_type_str();
 
-            pr_info!(
+            dev_info!(
+                vbios.pdev.as_ref(),
                 "Found BIOS image at offset {:#x}, size: {:#x}, type: {}\n",
                 cur_offset,
                 image_size,
@@ -195,7 +202,7 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> {
 
             // Safety check - don't go beyond BIOS_MAX_SCAN_LEN (1MB)
             if cur_offset > BIOS_MAX_SCAN_LEN {
-                pr_err!("Error: exceeded BIOS scan limit, stopping scan\n");
+                dev_err!(vbios.pdev.as_ref(), "Error: exceeded BIOS scan limit, stopping scan\n");
                 break;
             }
         } // end of loop
@@ -212,9 +219,9 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> {
             {
                 second
                     .setup_falcon_data(pci_at, first)
-                    .inspect_err(|e| pr_err!("Falcon data setup failed: {:?}\n", e))?;
+                    .inspect_err(|e| dev_err!(vbios.pdev.as_ref(), "Falcon data setup failed: {:?}\n", e))?;
             } else {
-                pr_err!("Missing required images for falcon data setup, skipping\n");
+                dev_err!(vbios.pdev.as_ref(), "Missing required images for falcon data setup, skipping\n");
             }
             second // Return the potentially modified second image
         };
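
The lifetime parameter in the preview above (Vbios<'a> holding a
&'a pci::Device) can be illustrated in isolation. What follows is a minimal
sketch under assumptions: `Device` is a mock that records messages in place of
pci::Device and dev_err!(), and `Scanner` and `check_len` are hypothetical
names, not nova-core or kernel API.

```rust
use std::cell::RefCell;

/// Mock device; records messages instead of printing them (hypothetical).
pub struct Device {
    name: &'static str,
    log: RefCell<Vec<String>>,
}

impl Device {
    pub fn new(name: &'static str) -> Self {
        Device { name, log: RefCell::new(Vec::new()) }
    }

    /// Stand-in for dev_err!(): prefix the message with the device name.
    pub fn err(&self, msg: &str) {
        self.log.borrow_mut().push(format!("{}: {}", self.name, msg));
    }

    /// Return a copy of everything logged so far.
    pub fn messages(&self) -> Vec<String> {
        self.log.borrow().clone()
    }
}

/// Holds a borrowed device so every method can log against it.
/// The `'a` lifetime ties the scanner to the device it borrows from.
pub struct Scanner<'a> {
    dev: &'a Device,
}

impl<'a> Scanner<'a> {
    pub fn new(dev: &'a Device) -> Self {
        Scanner { dev }
    }

    /// Mirrors the multiple-of-4 check; logs through the stored device.
    pub fn check_len(&self, len: usize) -> Result<(), ()> {
        if len % 4 != 0 {
            self.dev.err(&format!("read length {} is not a multiple of 4", len));
            return Err(());
        }
        Ok(())
    }
}
```

One consequence of storing a plain reference is that the struct cannot outlive
the device it borrows from, which matters if the struct is meant to be kept
around after construction.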


* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-24 19:19         ` Joel Fernandes
@ 2025-04-24 20:01           ` Danilo Krummrich
  0 siblings, 0 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-24 20:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Thu, Apr 24, 2025 at 03:19:00PM -0400, Joel Fernandes wrote:
> On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote:
> 
> [..]
> 
> > > >> +        data.extend_with(len, 0, GFP_KERNEL)?;
> > > >> +        with_bar!(?bar0, |bar0_ref| {
> > > >> +            let dst = &mut data[current_len..current_len + len];
> > > >> +            for (idx, chunk) in dst
> > > >> +                .chunks_exact_mut(core::mem::size_of::<u32>())
> > > >> +                .enumerate()
> > > >> +            {
> > > >> +                let addr = start + (idx * core::mem::size_of::<u32>());
> > > >> +                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
> > > >> +                // method out of convenience to convert the 32-bit integer as it
> > > >> +                // is in memory into a byte array without any endianness
> > > >> +                // conversion or byte-swapping.
> > > >> +                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
> > > >> +            }
> > > >> +            Ok(())
> > > >> +        })?;
> > > >> +
> > > >> +        Ok(())
> > > >> +    }
> > > ..actually initially was:
> > > 
> > > +        with_bar!(self.bar0, |bar0| {
> > > +            // Get current length
> > > +            let current_len = self.data.len();
> > > +
> > > +            // Read ROM data bytes push directly to vector
> > > +            for i in 0..bytes as usize {
> > > +                // Read byte from the VBIOS ROM and push it to the data vector
> > > +                let rom_addr = ROM_OFFSET + current_len + i;
> > > +                let byte = bar0.try_readb(rom_addr)?;
> > > +                self.data.push(byte, GFP_KERNEL)?;
> > > 
> > > Where this bit could result in a lot of allocation.
> > > 
> > > There was an unsafe way to avoid this, but we settled on
> > > extend_with().
> > > 
> > > Thoughts?
> > 
> > If I understand you correctly, you just want to make sure that subsequent push()
> > calls don't re-allocate? If so, you can just use reserve() [1] and keep the
> > subsequent push() calls.
> > 
> > [1] https://rust.docs.kernel.org/kernel/alloc/kvec/struct.Vec.html#method.reserve
> 
> 
> 
> Ok that does turn out to be cleaner! I replaced it with the following and it works.
> 
> Let me know if it looks good now? Here's a preview:
> 
> -        data.extend_with(len, 0, GFP_KERNEL)?;
> +        data.reserve(len, GFP_KERNEL)?;
> +
>          with_bar_res!(bar0, |bar0_ref| {
> -            let dst = &mut data[current_len..current_len + len];
> -            for (idx, chunk) in dst
> -                .chunks_exact_mut(core::mem::size_of::<u32>())
> -                .enumerate()
> -            {
> -                let addr = start + (idx * core::mem::size_of::<u32>());
> -                // Convert the u32 to a 4 byte array. We use the .to_ne_bytes()
> -                // method out of convenience to convert the 32-bit integer as it
> -                // is in memory into a byte array without any endianness
> -                // conversion or byte-swapping.
> -                chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes());
> +            // Read ROM data bytes and push directly to vector
> +            for i in 0..len / core::mem::size_of::<u32>() {
> +                // Read 32-bit word from the VBIOS ROM
> +                let rom_addr = start + i * core::mem::size_of::<u32>();
> +                let word = bar0_ref.try_read32(rom_addr)?;
> +
> +                // Convert the u32 to a 4 byte array and push each byte
> +                word.to_ne_bytes().iter().try_for_each(|&b| data.push(b, GFP_KERNEL))?;
>              }

Looks good to me, thanks!


* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-24 18:54     ` [PATCH 13/16] " Joel Fernandes
@ 2025-04-24 20:08       ` Danilo Krummrich
  2025-04-25  2:26         ` [13/16] " Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-24 20:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Thu, Apr 24, 2025 at 02:54:42PM -0400, Joel Fernandes wrote:
> 
> 
> On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> [...]
> >> +
> >> +    /// Probe for VBIOS extraction
> >> +    /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
> >> +    pub(crate) fn probe(bar0: &Devres<Bar0>) -> Result<Self> {
> > 
> > Let's not call it probe(), what about VBios::parse(), or simply VBios::new()?
> > 
> 
> Yes, new() is better. I changed it.
> 
> >> +        // VBIOS data vector: As BIOS images are scanned, they are added to this vector
> >> +        // for reference or copying into other data structures. It is the entire
> >> +        // scanned contents of the VBIOS which progressively extends. It is used
> >> +        // so that we do not re-read any contents that are already read as we use
> >> +        // the cumulative length read so far, and re-read any gaps as we extend
> >> +        // the length
> >> +        let mut data = KVec::new();
> >> +
> >> +        // Loop through all the BiosImage and extract relevant ones and relevant data from them
> >> +        let mut cur_offset = 0;
> > 
> > I suggest to create a new type that contains data and offset and implement
> > read_bios_image_at_offset() and friends as methods of this type. I think this
> > would turn out much cleaner.
> I moved it into struct Vbios {} itself instead of introducing a new type. Is
> that Ok?
> 
> I agree it is cleaner. Please see below link for this particular refactor
> (moving data) and let me know if it looks Ok to you: http://bit.ly/4lHfDKZ

I still think a new type would be better, the Option<KVec<u8>> that is only used
for the construction of the actual type instance is a bit weird. It's basically
two types in one, which is also why you need two options -- better separate
them.
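
The separation suggested here can be sketched as a short-lived scanner type
that owns the scratch buffer and the offset, plus a final type that holds only
the parsed result, so neither needs an Option for state that belongs to the
other phase. This is a minimal sketch under assumptions: `VbiosScanner`,
`next_image`, `KIND_FWSEC` and the [type, length, payload] image format are
all hypothetical and far simpler than the real VBIOS layout.

```rust
/// Final, fully constructed object: no construction-only state, no Option.
#[derive(Debug, PartialEq)]
pub struct Vbios {
    pub fwsec_image: Vec<u8>,
}

/// Construction-time state only: the bytes scanned so far and the offset.
struct VbiosScanner<'a> {
    rom: &'a [u8],
    data: Vec<u8>,
    offset: usize,
}

impl<'a> VbiosScanner<'a> {
    fn new(rom: &'a [u8]) -> Self {
        VbiosScanner { rom, data: Vec::new(), offset: 0 }
    }

    /// Read the next image. Hypothetical format: [type, len, payload...].
    fn next_image(&mut self) -> Option<(u8, Vec<u8>)> {
        let header_end = self.offset + 2;
        if header_end > self.rom.len() {
            return None;
        }
        let kind = self.rom[self.offset];
        let end = header_end + self.rom[self.offset + 1] as usize;
        if end > self.rom.len() {
            return None;
        }
        // Extend the cumulative copy, then slice the payload out of it.
        self.data.extend_from_slice(&self.rom[self.offset..end]);
        let payload = self.data[header_end..end].to_vec();
        self.offset = end;
        Some((kind, payload))
    }
}

impl Vbios {
    /// Type tag of the image we are looking for (hypothetical value).
    const KIND_FWSEC: u8 = 2;

    /// The scanner lives only for the duration of this call.
    pub fn new(rom: &[u8]) -> Option<Vbios> {
        let mut scanner = VbiosScanner::new(rom);
        while let Some((kind, payload)) = scanner.next_image() {
            if kind == Self::KIND_FWSEC {
                return Some(Vbios { fwsec_image: payload });
            }
        }
        None
    }
}
```

The scratch vector is dropped together with the scanner when new() returns,
so the finished Vbios carries nothing that was only needed during the scan.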


* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-24 19:54         ` Joel Fernandes
@ 2025-04-24 20:17           ` Danilo Krummrich
  2025-04-25  2:32             ` [13/16] " Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-24 20:17 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Thu, Apr 24, 2025 at 03:54:48PM -0400, Joel Fernandes wrote:
> On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote:
> > On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote:
> > > Hello, Danilo,
> > > Thanks for all the feedback. Due to the volume of feedback, I will respond
> > > incrementally in multiple emails so we can discuss as we go - hope that's Ok but
> > > let me know if that is annoying.
> > 
> > That's perfectly fine, whatever works best for you. :)
> > 
> > > On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> > > 
> > > >> +impl Vbios {
> > > >> +    /// Read bytes from the ROM at the current end of the data vector
> > > >> +    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
> > > >> +        let current_len = data.len();
> > > >> +        let start = ROM_OFFSET + current_len;
> > > >> +
> > > >> +        // Ensure length is a multiple of 4 for 32-bit reads
> > > >> +        if len % core::mem::size_of::<u32>() != 0 {
> > > >> +            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
> > > > 
> > > > Please don't use any of the pr_*() print macros within a driver, use the dev_*()
> > > > ones instead.
> > > 
> > > Ok I'll switch to this. One slight complication is I've to retrieve the 'dev'
> > > from the Bar0 and pass that along, but that should be doable.
> > 
> > You can also pass the pci::Device reference to VBios::probe() directly.
> 
> 
> This turns out to be rather difficult to do across the whole of vbios.rs,
> because we'd then have to propagate pdev to various methods which may print
> errors

Note that you can always create an ARef<pci::Device> instance from a
&pci::Device, which you can store temporarily if needed. Though I don't think
it's needed here.

> (some of which don't make sense to pass pdev to, like try_from()).

Yeah, it's indeed difficult with a TryFrom or From impl. I guess you're
referring to things like

	impl TryFrom<&[u8]> for PcirStruct

and I actually think that's a bit of an abuse of the TryFrom trait. A &[u8]
isn't really something that is "natural" to convert to a PcirStruct.

Instead you should just move this code into a normal constructor, i.e.
PcirStruct::new(). Here you can then also pass a device reference to print
errors.

We should really stick to dev_*() print macros from within driver code.
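
A minimal sketch of the constructor approach, assuming a mock `Device` that
collects messages in place of a real device and dev_err!(). The "PCIR"
signature and the little-endian vendor and device ID fields that follow it
come from the standard PCI Data Structure; the reduced field set, `ParseError`
and the mock are hypothetical simplifications, not the nova-core definitions.

```rust
use std::cell::RefCell;

/// Mock device; collects error messages (hypothetical, not kernel API).
pub struct Device {
    log: RefCell<Vec<String>>,
}

impl Device {
    pub fn new() -> Self {
        Device { log: RefCell::new(Vec::new()) }
    }

    /// Stand-in for dev_err!().
    pub fn err(&self, msg: String) {
        self.log.borrow_mut().push(msg);
    }

    /// Return a copy of everything logged so far.
    pub fn messages(&self) -> Vec<String> {
        self.log.borrow().clone()
    }
}

/// Hypothetical error type standing in for the kernel's `Error`.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    TooShort,
    BadSignature,
}

/// Reduced view of a PCI Data Structure: only two of its fields.
#[derive(Debug, PartialEq)]
pub struct PcirStruct {
    pub vendor_id: u16,
    pub device_id: u16,
}

impl PcirStruct {
    /// A plain constructor can take the device, unlike a TryFrom impl,
    /// whose signature only has room for the value being converted.
    pub fn new(dev: &Device, data: &[u8]) -> Result<Self, ParseError> {
        if data.len() < 8 {
            dev.err(format!("PCIR data too short: {} bytes", data.len()));
            return Err(ParseError::TooShort);
        }
        if &data[..4] != b"PCIR" {
            dev.err(format!("invalid PCIR signature: {:?}", &data[..4]));
            return Err(ParseError::BadSignature);
        }
        Ok(PcirStruct {
            vendor_id: u16::from_le_bytes([data[4], data[5]]),
            device_id: u16::from_le_bytes([data[6], data[7]]),
        })
    }
}
```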


* Re: [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 14:06   ` Danilo Krummrich
  2025-04-23 14:52     ` Joel Fernandes
  2025-04-24 18:54     ` [PATCH 13/16] " Joel Fernandes
@ 2025-04-24 20:22     ` Joel Fernandes
  2025-04-26 23:17     ` [13/16] " Joel Fernandes
  3 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-24 20:22 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel



On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:45PM +0900, Alexandre Courbot wrote:
>> From: Joel Fernandes <joelagnelf@nvidia.com>
>>
>> Add support for navigating and setting up vBIOS ucode data required for
>> GSP to boot. The main data extracted from the vBIOS is the FWSEC-FRTS
>> firmware which runs on the GSP processor. This firmware runs in high
>> secure mode, and sets up the WPR2 (Write protected region) before the
>> Booter runs on the SEC2 processor.
>>
>> Also add log messages to show the BIOS images.
>>
>> [102141.013287] NovaCore: Found BIOS image at offset 0x0, size: 0xfe00, type: PciAt
>> [102141.080692] NovaCore: Found BIOS image at offset 0xfe00, size: 0x14800, type: Efi
>> [102141.098443] NovaCore: Found BIOS image at offset 0x24600, size: 0x5600, type: FwSec
>> [102141.415095] NovaCore: Found BIOS image at offset 0x29c00, size: 0x60800, type: FwSec
>>
>> Tested on my Ampere GA102 and boot is successful.
>>
>> [applied changes by Alex Courbot for fwsec signatures]
>> [applied feedback from Alex Courbot and Timur Tabi]
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/firmware.rs  |    2 -
>>  drivers/gpu/nova-core/gpu.rs       |    5 +
>>  drivers/gpu/nova-core/nova_core.rs |    1 +
>>  drivers/gpu/nova-core/vbios.rs     | 1103 ++++++++++++++++++++++++++++++++++++
>>  4 files changed, 1109 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
>> index 4ef5ba934b9d255635aa9a902e1d3a732d6e5568..58c0513d49e9a0cef36917c8e2b25c414f6fc596 100644
>> --- a/drivers/gpu/nova-core/firmware.rs
>> +++ b/drivers/gpu/nova-core/firmware.rs
>> @@ -44,7 +44,6 @@ pub(crate) fn new(
>>  }
>>  
>>  /// Structure used to describe some firmwares, notable fwsec-frts.
>> -#[allow(dead_code)]
>>  #[repr(C)]
>>  #[derive(Debug, Clone)]
>>  pub(crate) struct FalconUCodeDescV3 {
>> @@ -64,7 +63,6 @@ pub(crate) struct FalconUCodeDescV3 {
>>      _reserved: u16,
>>  }
>>  
>> -#[allow(dead_code)]
>>  impl FalconUCodeDescV3 {
>>      pub(crate) fn size(&self) -> usize {
>>          ((self.hdr & 0xffff0000) >> 16) as usize
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index ec4c648c6e8b4aa7d06c627ed59c0e66a08c679e..2344dfc69fe4246644437d70572680a4450b5bd7 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -11,6 +11,7 @@
>>  use crate::regs;
>>  use crate::timer::Timer;
>>  use crate::util;
>> +use crate::vbios::Vbios;
>>  use core::fmt;
>>  
>>  macro_rules! define_chipset {
>> @@ -157,6 +158,7 @@ pub(crate) struct Gpu {
>>      fw: Firmware,
>>      sysmem_flush: DmaObject,
>>      timer: Timer,
>> +    bios: Vbios,
>>  }
>>  
>>  #[pinned_drop]
>> @@ -237,12 +239,15 @@ pub(crate) fn new(
>>  
>>          let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, true)?;
>>  
>> +        let bios = Vbios::probe(&bar)?;
>> +
>>          Ok(pin_init!(Self {
>>              spec,
>>              bar,
>>              fw,
>>              sysmem_flush,
>>              timer,
>> +            bios,
>>          }))
>>      }
>>  }
>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>> index 4dde8004d24882c60669b5acd6af9d6988c66a9c..2858f4a0dc35eb9d6547d5cbd81de44c8fc47bae 100644
>> --- a/drivers/gpu/nova-core/nova_core.rs
>> +++ b/drivers/gpu/nova-core/nova_core.rs
>> @@ -29,6 +29,7 @@ macro_rules! with_bar {
>>  mod regs;
>>  mod timer;
>>  mod util;
>> +mod vbios;
>>  
>>  kernel::module_pci_driver! {
>>      type: driver::NovaCore,
>> diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..534107b708cab0eb8d0accf7daa5718edf030358
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/vbios.rs
>> @@ -0,0 +1,1103 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +// To be removed when all code is used.
>> +#![allow(dead_code)]
> 
> Please don't; use 'expect', and only where needed. If that would be too much,
> it's probably a good indicator that we want to reduce the size of the patch for
> now.

Done.
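
For readers unfamiliar with the attribute: unlike allow, expect suppresses the
lint for one item and itself produces a warning once the lint stops firing,
so it cannot silently outlive the problem it covers. A minimal sketch with
hypothetical function names, assuming a toolchain where #[expect] is
available:

```rust
/// Not called anywhere yet. The attribute silences dead_code for this one
/// item and will itself warn once the function gains a caller.
#[expect(dead_code)]
fn fwsec_header_size() -> usize {
    0x100 // hypothetical value, for illustration only
}

/// Already in use, so it needs no attribute.
pub fn rom_offset() -> usize {
    0x300000
}
```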

[..]

>> +
>> +        // loop till break
> 
> This comment seems unnecessary, better explain what we loop over and why.

Done.

>> +        loop {
>> +            // Try to parse a BIOS image at the current offset
>> +            // This will now check for all valid ROM signatures (0xAA55, 0xBB77, 0x4E56)
>> +            let image_size =
>> +                Self::read_bios_image_at_offset(bar0, &mut data, cur_offset, BIOS_READ_AHEAD_SIZE)
>> +                    .and_then(|image| image.image_size_bytes())
>> +                    .inspect_err(|e| {
>> +                        pr_err!(
>> +                            "Failed to parse initial BIOS image headers at offset {:#x}: {:?}\n",
>> +                            cur_offset,
>> +                            e
>> +                        );
>> +                    })?;
>> +
>> +            // Create a new BiosImage with the full image data
>> +            let full_image =
>> +                Self::read_bios_image_at_offset(bar0, &mut data, cur_offset, image_size)
>> +                    .inspect_err(|e| {
>> +                        pr_err!(
>> +                            "Failed to parse full BIOS image at offset {:#x}: {:?}\n",
>> +                            cur_offset,
>> +                            e
>> +                        );
>> +                    })?;
>> +
>> +            // Determine the image type
>> +            let image_type = full_image.image_type_str();
>> +
>> +            pr_info!(
> 
> I think this should be a debug print.

Done.

Will continue looking into the feedback on the rest of the items and reply. Thanks!

 - Joel



* Re: [13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-24 20:08       ` Danilo Krummrich
@ 2025-04-25  2:26         ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-25  2:26 UTC (permalink / raw)
  To: Danilo Krummrich, Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

Hello, Danilo,

On April 24, 2025, 8:08 p.m. UTC, Danilo Krummrich wrote:
> On Thu, Apr 24, 2025 at 02:54:42PM -0400, Joel Fernandes wrote:
> > 
> > 
> > On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> > [...]
> > >> +
> > >> +    /// Probe for VBIOS extraction
> > >> +    /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
> > >> +    pub(crate) fn probe(bar0: &Devres<Bar0>) -> Result<Self> {
> > > 
> > > Let's not call it probe(), what about VBios::parse(), or simply VBios::new()?
> > > 
> > 
> > Yes, new() is better. I changed it.
> > 
> > >> +        // VBIOS data vector: As BIOS images are scanned, they are added to this vector
> > >> +        // for reference or copying into other data structures. It is the entire
> > >> +        // scanned contents of the VBIOS which progressively extends. It is used
> > >> +        // so that we do not re-read any contents that are already read as we use
> > >> +        // the cumulative length read so far, and re-read any gaps as we extend
> > >> +        // the length
> > >> +        let mut data = KVec::new();
> > >> +
> > >> +        // Loop through all the BiosImage and extract relevant ones and relevant data from them
> > >> +        let mut cur_offset = 0;
> > > 
> > > I suggest to create a new type that contains data and offset and implement
> > > read_bios_image_at_offset() and friends as methods of this type. I think this
> > > would turn out much cleaner.
> > I moved it into struct Vbios {} itself instead of introducing a new type. Is
> > that Ok?
> > 
> > I agree it is cleaner. Please see below link for this particular refactor
> > (moving data) and let me know if it looks Ok to you: http://bit.ly/4lHfDKZ
> 
> I still think a new type would be better, the Option<KVec<u8>> that is only used
> for the construction of the actual type instance is a bit weird. It's basically
> two types in one, which is also why you need two options -- better separate
> them.

Ok, makes sense. Will make this change and see what it
looks like.

thanks,

- Joel




* Re: [13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-24 20:17           ` Danilo Krummrich
@ 2025-04-25  2:32             ` Joel Fernandes
  2025-04-25 17:10               ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-25  2:32 UTC (permalink / raw)
  To: Danilo Krummrich, Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

Hello, Danilo,

On April 24, 2025, 8:17 p.m. UTC, Danilo Krummrich wrote:
> On Thu, Apr 24, 2025 at 03:54:48PM -0400, Joel Fernandes wrote:
> > On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote:
> > > On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote:
> > > > Hello, Danilo,
> > > > Thanks for all the feedback. Due to the volume of feedback, I will respond
> > > > incrementally in multiple emails so we can discuss as we go - hope that's Ok but
> > > > let me know if that is annoying.
> > > 
> > > That's perfectly fine, whatever works best for you. :)
> > > 
> > > > On 4/23/2025 10:06 AM, Danilo Krummrich wrote:
> > > > 
> > > > >> +impl Vbios {
> > > > >> +    /// Read bytes from the ROM at the current end of the data vector
> > > > >> +    fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result {
> > > > >> +        let current_len = data.len();
> > > > >> +        let start = ROM_OFFSET + current_len;
> > > > >> +
> > > > >> +        // Ensure length is a multiple of 4 for 32-bit reads
> > > > >> +        if len % core::mem::size_of::<u32>() != 0 {
> > > > >> +            pr_err!("VBIOS read length {} is not a multiple of 4\n", len);
> > > > > 
> > > > > Please don't use any of the pr_*() print macros within a driver, use the dev_*()
> > > > > ones instead.
> > > > 
> > > > Ok I'll switch to this. One slight complication is I've to retrieve the 'dev'
> > > > from the Bar0 and pass that along, but that should be doable.
> > > 
> > > You can also pass the pci::Device reference to VBios::probe() directly.
> > 
> > 
> > This turns out to be rather difficult to do in the whole vbios.rs because
> > we'd have to then propagate pdev to various class methods which may print
> > errors
> 
> Note that you can always create an ARef<pci::Device> instance from a
> &pci::Device, which you can store temporarily if needed. Though I don't think
> it's needed here.
> 
> > (some of which don't make sense to pass pdev to, like try_from()).
> 
> Yeah, it's indeed difficult with a TryFrom or From impl. I guess you're
> referring to things like
> 
> 	impl TryFrom<&[u8]> for PcirStruct
> 
> and I actually think that's a bit of an abuse of the TryFrom trait. A &[u8]
> isn't really something that is "natural" to convert to a PcirStruct.
> 
> Instead you should just move this code into a normal constructor, i.e.
> PcirStruct::new(). Here you can then also pass a device reference to print
> errors.

Ok, I had a similar feeling about excessive TryFrom. I will
make this change.
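
For illustration, a plain constructor that takes a device handle could look
roughly like the sketch below. This is standalone Rust rather than kernel code:
`Device` and its `err()` method are hypothetical stand-ins for `&pci::Device`
and `dev_err!()`, `ParseError` stands in for `EINVAL`, and only a few fields of
the PCI Data Structure are parsed.

```rust
/// Hypothetical stand-in for `&pci::Device`; only used to attribute messages.
struct Device {
    name: &'static str,
}

impl Device {
    /// Stand-in for `dev_err!()`: prefixes the message with the device name.
    fn err(&self, msg: &str) {
        eprintln!("{}: {}", self.name, msg);
    }
}

/// Stand-in for the kernel's `EINVAL`.
#[derive(Debug, PartialEq)]
struct ParseError;

/// Subset of the PCI Data Structure ("PCIR") found in each ROM image.
#[derive(Debug)]
struct PcirStruct {
    vendor_id: u16,
    device_id: u16,
    /// Image length, in units of 512 bytes.
    image_len: u16,
    /// Last-image indicator byte.
    last_image: u8,
}

impl PcirStruct {
    /// Number of bytes this sketch needs (everything up to the indicator byte).
    const MIN_LEN: usize = 0x16;

    /// Plain constructor: unlike a `TryFrom<&[u8]>` impl, it can take the
    /// device as an extra argument and report errors against it.
    fn new(dev: &Device, data: &[u8]) -> Result<Self, ParseError> {
        if data.len() < Self::MIN_LEN {
            dev.err("PCIR structure is too short");
            return Err(ParseError);
        }
        if &data[0..4] != b"PCIR" {
            dev.err("invalid PCIR signature");
            return Err(ParseError);
        }

        // All multi-byte fields are little-endian.
        let le16 = |off: usize| u16::from_le_bytes([data[off], data[off + 1]]);

        Ok(Self {
            vendor_id: le16(0x04),
            device_id: le16(0x06),
            image_len: le16(0x10),
            last_image: data[0x15],
        })
    }
}
```

The call site then reads `PcirStruct::new(dev, &rom[offset..])?`, with `dev`
being whatever device reference the caller already holds.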

> 
> We should really stick to dev_*() print macros from within driver code.
>   

Ack.

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-25  2:32             ` [13/16] " Joel Fernandes
@ 2025-04-25 17:10               ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-25 17:10 UTC (permalink / raw)
  To: Joel Fernandes, Danilo Krummrich, Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On April 25, 2025, 2:32 a.m. UTC  Joel Fernandes wrote:
> [...]
> Ok, I had a similar feeling about excessive TryFrom. I will
> make this change.
> 
> > 
> > We should really stick to dev_*() print macros from within driver code.
> >   
> 
> Ack.

Here are the changes: https://bit.ly/4lOHk4s

It looks better for sure :)

Now onto working on the loop { } and the read_..() method suggestions. :)

 - Joel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
  2025-04-23 14:06   ` Danilo Krummrich
                       ` (2 preceding siblings ...)
  2025-04-24 20:22     ` [PATCH 13/16] " Joel Fernandes
@ 2025-04-26 23:17     ` Joel Fernandes
  3 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-26 23:17 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hello, Danilo,

On Wed, 23 Apr 2025 16:06:16 +0200, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:45PM +0900, Alexandre Courbot wrote:

[...]

> > +impl NpdeStruct {
> > +    /// Check if this is the last image in the ROM
> > +    fn is_last(&self) -> bool {
> > +        self.last_image & 0x80 != 0
> 
> What's the magic number for?

The NPDE is the NVIDIA PCI Data Extension Structure which is an extension to the
PCI Data Structure.

As per the publicly available PCI Firmware Specification v3.3, in section 5.1 it
says:

"The last image in a ROM has a special encoding in the header to identify it as
the last image."

Then when it describes the PCI data structure, it says for the Last Image
indicator byte:
"Bit 7 in this field tells whether or not this is the last image in the
ROM. A value of 1 indicates “last image;” a value of 0 indicates that
another image follows. Bits 0-6 are reserved."

I will go ahead and add a LAST_ROM_IMAGE_MASK constant and put a comment where
we check the bit, which will clarify it in the code.
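
A minimal sketch of that change, in standalone Rust. The struct is reduced to
the one field that matters here, and the exact constant name and comment
wording in the next revision may differ.

```rust
/// Bit 7 of the "Last Image Indicator" byte is set for the last image in the
/// ROM (PCI Firmware Specification, PCI Data Structure). Bits 0-6 are reserved.
const LAST_ROM_IMAGE_MASK: u8 = 0x80;

/// Reduced stand-in for the NPDE structure: only the indicator byte is kept.
struct NpdeStruct {
    last_image: u8,
}

impl NpdeStruct {
    /// Check if this is the last image in the ROM.
    fn is_last(&self) -> bool {
        self.last_image & LAST_ROM_IMAGE_MASK != 0
    }
}
```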

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 06/16] gpu: nova-core: define registers layout using helper macro
  2025-04-22 10:29   ` Danilo Krummrich
@ 2025-04-28 14:27     ` Alexandre Courbot
  0 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-28 14:27 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Danilo,

On Tue Apr 22, 2025 at 7:29 PM JST, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:38PM +0900, Alexandre Courbot wrote:
>> Add the register!() macro, which defines a given register's layout and
>> provides bit-field accessors with a way to convert them to a given type.
>> This macro will allow us to make clear definitions of the registers and
>> manipulate their fields safely.
>> 
>> The long-term goal is to eventually move it to the kernel crate so it
>> can be used by other drivers as well, but it was agreed to first land it
>> into nova-core and make it mature there.
>> 
>> To illustrate its usage, use it to define the layout for the Boot0
>> register and use its accessors through the use of the convenience
>> with_bar!() macro, which uses Revocable::try_access() and converts its
>
> s/try_access/try_access_with/

Fixed, thanks.

>
>> returned Option into the proper error as needed.
>
> Using try_access_with() / with_bar! should be a separate patch.

Agreed - done.

>> +register!(Boot0@0x00000000, "Basic revision information about the GPU";
>> +    3:0     minor_rev => as u8, "minor revision of the chip";
>> +    7:4     major_rev => as u8, "major revision of the chip";
>> +    28:20   chipset => try_into Chipset, "chipset model"
>
> Should we preserve the information that this is the combination of two register
> fields?

I've tried to reproduce what the current code did, but indeed according
to OpenRM these are two different fields, `architecture` and
`implementation`.

There's also more: `architecture` is a split field, with its MSB at a
different index from the rest. Right now the MSB is always 0 but the
lower bits are dangerously close to overflowing.

Thankfully, the macro doesn't prevent us from extending its definition with
an extra impl block, so I've done just that to provide the correct
architecture as well as the `chipset` pseudo-field that will be the one
we use in the code anyway.
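
The idea can be sketched in standalone Rust as follows. The bit positions used
here (`architecture_0` at 28:24, `architecture_1` at bit 8, `implementation` at
23:20) are assumptions made for illustration and should be checked against
OpenRM; only the overall 28:20 range of the original `chipset` field comes from
the patch. The hand-written getters stand in for what `register!()` generates.

```rust
/// Stand-in for the type generated by `register!(Boot0@0x00000000, ...)`.
#[derive(Clone, Copy)]
struct Boot0(u32);

// Getters the macro would normally generate (bit positions are assumptions).
impl Boot0 {
    /// Lower bits of the architecture, assumed at 28:24.
    fn architecture_0(self) -> u32 {
        (self.0 >> 24) & 0x1f
    }

    /// MSB of the architecture, assumed at bit 8.
    fn architecture_1(self) -> u32 {
        (self.0 >> 8) & 0x1
    }

    /// Implementation within the architecture, assumed at 23:20.
    fn implementation(self) -> u32 {
        (self.0 >> 20) & 0xf
    }
}

// Extra, hand-written impl block providing the combined pseudo-fields.
impl Boot0 {
    /// Width in bits of `architecture_0`.
    const ARCHITECTURE_0_BITS: u32 = 5;
    /// Width in bits of `implementation`.
    const IMPLEMENTATION_BITS: u32 = 4;

    /// Reassemble the split `architecture` field.
    fn architecture(self) -> u32 {
        self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_BITS)
    }

    /// `architecture` followed by `implementation`: the value the driver
    /// matches against its list of chipsets.
    fn chipset(self) -> u32 {
        (self.architecture() << Self::IMPLEMENTATION_BITS) | self.implementation()
    }
}
```

With these assumed positions, a raw value of `0x1720_0000` yields architecture
`0x17` and chipset `0x172`.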

>
>> +);
>> diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..fa9bd6b932048113de997658b112885666e694c9
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/regs/macros.rs
>> @@ -0,0 +1,297 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Types and macros to define register layout and accessors.
>> +//!
>> +//! A single register typically includes several fields, which are accessed through a combination
>> +//! of bit-shift and mask operations that introduce a class of potential mistakes, notably because
>> +//! not all possible field values are necessarily valid.
>> +//!
>> +//! The macros in this module allow defining, using an intuitive and readable syntax, a dedicated
>> +//! type for each register with its own field accessors that can return an error if a field's value
>> +//! is invalid. They also provide a builder type allowing to construct a register value to be
>> +//! written by combining valid values for its fields.
>> +
>> +/// Helper macro for the `register` macro.
>> +///
>> +/// Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
>> +/// and conversion to regular `u32`).
>> +macro_rules! __reg_def_common {
>> +    ($name:ident $(, $type_comment:expr)?) => {
>> +        $(
>> +        #[doc=$type_comment]
>> +        )?
>> +        #[repr(transparent)]
>> +        #[derive(Clone, Copy, Default)]
>> +        pub(crate) struct $name(u32);
>> +
>> +        // TODO: should we display the raw hex value, then the value of all its fields?
>
> To me it seems useful to have both.

Agreed. However this macro has changed A LOT since this first revision,
and I have used TT bundling to make the rules a bit shorter. This makes
the details of the fields inaccessible from the rule that generates the
Debug implementation...

I'll probably try to rework that later, or when we move the macro into
the kernel crate - meanwhile, I hope we can be excused with just the hex
value.

>
>> +        impl ::core::fmt::Debug for $name {
>> +            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
>> +                f.debug_tuple(stringify!($name))
>> +                    .field(&format_args!("0x{0:x}", &self.0))
>> +                    .finish()
>> +            }
>> +        }
>> +
>> +        impl core::ops::BitOr for $name {
>> +            type Output = Self;
>> +
>> +            fn bitor(self, rhs: Self) -> Self::Output {
>> +                Self(self.0 | rhs.0)
>> +            }
>> +        }
>> +
>> +        impl From<$name> for u32 {
>
> Here and in a few more cases below: This needs the full path; also remember to
> use absolute paths everywhere.

Indeed, thanks for the reminder. I hope I got them all.

>> +            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
>> +            const SHIFT: u32 = MASK.trailing_zeros();
>> +            let field = (self.0 & MASK) >> SHIFT;
>> +
>> +            $( field as $as_type )?
>> +            $(
>> +            // TODO: it would be nice to throw a compile-time error if $hi != $lo as this means we
>> +            // are considering more than one bit but returning a bool...
>
> Would the following work?
>
> 	const _: () = {
> 	   build_assert!($hi == $lo);
> 	   ()
> 	};

It does! I can even provide a useful error message. Added this check as
well as the one making sure that $hi >= $lo.
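
A standalone sketch of such compile-time checks, using `const` assertions from
plain Rust in place of the kernel's `build_assert!()`. The `field_mask!` macro
and its `bool` prefix are hypothetical and only mimic the `hi:lo` syntax of
`register!()`. For a field returned as `bool`, the condition that must hold is
`$hi == $lo`; for every field, `$hi >= $lo`.

```rust
/// Hypothetical helper: validates a `hi:lo` range at compile time and
/// evaluates to the corresponding 32-bit mask.
macro_rules! field_mask {
    // Single-bit field that will be returned as a `bool`.
    (bool $hi:tt : $lo:tt) => {{
        const _: () = assert!($hi == $lo, "boolean field must cover exactly one bit");
        const MASK: u32 = 1u32 << $lo;
        MASK
    }};
    // Regular multi-bit field.
    ($hi:tt : $lo:tt) => {{
        const _: () = assert!($hi >= $lo, "field range must be written as hi:lo");
        // Same mask computation as in the patch; written this way so that
        // `hi == 31` does not overflow.
        const MASK: u32 = ((((1u32 << $hi) - 1) << 1) + 1) - ((1u32 << $lo) - 1);
        MASK
    }};
}
```

Writing `field_mask!(4:7)` or `field_mask!(bool 7:4)` then fails the build with
the corresponding message instead of producing a wrong mask at runtime.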

>> +macro_rules! register {
>> +    // Create a register at a fixed offset of the MMIO space.
>> +    (
>> +        $name:ident@$offset:expr $(, $type_comment:expr)?
>
> Can we use this as doc-comment?

Oops, I forgot to propagate it somehow. This is fixed.

>
>> +        $(; $hi:tt:$lo:tt $field:ident
>> +            $(=> as $as_type:ty)?
>> +            $(=> as_bit $bit_type:ty)?
>> +            $(=> into $type:ty)?
>> +            $(=> try_into $try_type:ty)?
>> +        $(, $field_comment:expr)?)* $(;)?
>> +    ) => {
>> +        __reg_def_common!($name);
>> +
>> +        #[allow(dead_code)]
>> +        impl $name {
>> +            #[inline]
>> +            pub(crate) fn read<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(bar: &T) -> Self {
>
> Not necessarily a PCI bar, could be any I/O type.

Indeed, renamed to `io`.

>
>> +                Self(bar.read32($offset))
>> +            }
>> +
>> +            #[inline]
>> +            pub(crate) fn write<const SIZE: usize, T: Deref<Target=Io<SIZE>>>(self, bar: &T) {
>> +                bar.write32(self.0, $offset)
>> +            }
>> +
>> +            #[inline]
>> +            pub(crate) fn alter<const SIZE: usize, T: Deref<Target=Io<SIZE>>, F: FnOnce(Self) -> Self>(bar: &T, f: F) {
>> +                let reg = f(Self::read(bar));
>> +                reg.write(bar);
>> +            }
>> +        }
>> +
>> +        __reg_def_getters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
>> +
>> +        __reg_def_setters!($name; $( $hi:$lo $field $(=> as $as_type)? $(=> as_bit $bit_type)? $(=> into $type)? $(=> try_into $try_type)? $(, $field_comment)? );*);
>> +    };
>> +
>> +    // Create a register at a relative offset from a base address.
>> +    (
>> +        $name:ident@+$offset:expr $(, $type_comment:expr)?
>> +        $(; $hi:tt:$lo:tt $field:ident
>> +            $(=> as $as_type:ty)?
>> +            $(=> as_bit $bit_type:ty)?
>> +            $(=> into $type:ty)?
>> +            $(=> try_into $try_type:ty)?
>> +        $(, $field_comment:expr)?)* $(;)?
>> +    ) => {
>
> I assume this is for cases where we have multiple instances of the same
> controller, engine, etc. I think it would be good to add a small example for
> this one too.

I'll add one.
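
Until then, a rough standalone sketch of the idea: the register type only knows
its offset within an engine's register block, and the caller supplies the base
of the instance it wants to access, as in `regs::RiscvBcrCtrl::read(b, E::BASE)`
later in the series. `MockIo`, `EngineHwcfg`, its `imem_size` field and all
addresses are hypothetical and chosen for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an I/O region: maps byte offsets to 32-bit values.
struct MockIo(HashMap<usize, u32>);

impl MockIo {
    fn read32(&self, offset: usize) -> u32 {
        self.0.get(&offset).copied().unwrap_or(0)
    }
}

/// One instance of an engine; `BASE` is where its register block starts.
trait Engine {
    const BASE: usize;
}

struct Gsp;
struct Sec2;

// Illustrative base addresses, not taken from the patch.
impl Engine for Gsp {
    const BASE: usize = 0x0011_0000;
}

impl Engine for Sec2 {
    const BASE: usize = 0x0084_0000;
}

/// Stand-in for a register declared with a relative offset (`Name@+offset`).
#[derive(Clone, Copy)]
struct EngineHwcfg(u32);

impl EngineHwcfg {
    /// Offset relative to the base of the engine's register block.
    const OFFSET: usize = 0x0000_0108;

    /// Unlike fixed-offset registers, `read` takes the base as a parameter.
    fn read(io: &MockIo, base: usize) -> Self {
        Self(io.read32(base + Self::OFFSET))
    }

    /// Hypothetical 9-bit field in bits 8:0.
    fn imem_size(self) -> u32 {
        self.0 & 0x1ff
    }
}
```

The same `EngineHwcfg` definition then serves every instance:
`EngineHwcfg::read(io, Gsp::BASE)` and `EngineHwcfg::read(io, Sec2::BASE)`.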

You probably won't recognize this macro in its next revision. I've
finally read the little book of Rust macros and hopefully it is looking
a bit better - the definition of register fields notably should feel
more natural. All in all I think it is definitely better, but that
doesn't necessarily mean it will be easier to review. ^_^;

One note, as agreed on Zulip I will rename all the register names to
capital snake case and disable the `camel_case_types` lint on the `regs`
module, so we use the exact same names as OpenRM. I will also make sure
that the names of the fields match (but will keep the accessors in
non-capital snake case).

In parallel I am also prototyping another design based on ZST constants.
If it works it would allow a few more things like register arrays, a
more natural way to perform I/O, and would remove the naming convention
issues since registers would be accessed by constants which should be
named in capital snake-case anyway. My hope is that this version will be
the one we can use in the kernel crate, but I don't think it will be
ready before a couple of cycles at least (if it works at all), so in the
meantime let's keep refining this one.

Cheers,
Alex.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-22 11:36   ` Danilo Krummrich
@ 2025-04-29 12:48     ` Alexandre Courbot
  2025-04-30 22:45       ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-29 12:48 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue Apr 22, 2025 at 8:36 PM JST, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:40PM +0900, Alexandre Courbot wrote:
>> Upon reset, the GPU executes the GFW_BOOT firmware in order to
>> initialize its base parameters such as clocks. The driver must ensure
>> that this step is completed before using the hardware.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/driver.rs    |  2 +-
>>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>>  drivers/gpu/nova-core/nova_core.rs |  1 +
>>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>>  5 files changed, 58 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/devinit.rs
>> @@ -0,0 +1,40 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Methods for device initialization.
>> +
>> +use kernel::bindings;
>> +use kernel::devres::Devres;
>> +use kernel::prelude::*;
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +
>> +/// Wait for devinit FW completion.
>> +///
>> +/// Upon reset, the GPU runs some firmware code to set up its core parameters. Most of the GPU is
>> +/// considered unusable until this step is completed, so it must be waited on very early during
>> +/// driver initialization.
>> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>> +    let mut timeout = 2000;
>> +
>> +    loop {
>> +        let gfw_booted = with_bar!(
>> +            bar,
>> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
>> +                .read_protection_level0_enabled()
>> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
>> +        )?;
>> +
>> +        if gfw_booted {
>> +            return Ok(());
>> +        }
>> +
>> +        if timeout == 0 {
>> +            return Err(ETIMEDOUT);
>> +        }
>> +        timeout -= 1;
>> +
>> +        // SAFETY: msleep should be safe to call with any parameter.
>> +        unsafe { bindings::msleep(2) };
>
> I assume this goes away with [1]? Can we please add a corresponding TODO? Also,
> do you mind preparing the follow-up patches for cases like this (there's also
> the transmute one), such that we can apply them once the dependencies have landed
> and such that we can verify that they suit our needs?
>
> [1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/

Good idea. Added the TODO item with a link to the patch.
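
For reference, the shape of the loop in question, as a standalone sketch:
`poll_until` is a hypothetical helper, `TimedOut` stands in for `ETIMEDOUT`,
and `std::thread::sleep` stands in for `msleep()`. Like the patch, it checks the
condition once more after the last sleep, so `max_polls` sleeps give
`max_polls + 1` checks.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Stand-in for the kernel's `ETIMEDOUT`.
#[derive(Debug, PartialEq)]
struct TimedOut;

/// Poll `done` until it returns `true`, sleeping `interval` between attempts.
///
/// Returns the number of sleeps that were needed, or `TimedOut` if the
/// condition is still false after `max_polls` sleeps.
fn poll_until<F: FnMut() -> bool>(
    mut done: F,
    max_polls: u32,
    interval: Duration,
) -> Result<u32, TimedOut> {
    for attempt in 0..=max_polls {
        if done() {
            return Ok(attempt);
        }
        // No point in sleeping after the final check.
        if attempt < max_polls {
            sleep(interval);
        }
    }

    Err(TimedOut)
}
```

With the values from the patch this would be
`poll_until(gfw_booted, 2000, Duration::from_millis(2))`, i.e. a worst case of
about 4 seconds.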

>
>> +    }
>> +}
>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
>> --- a/drivers/gpu/nova-core/driver.rs
>> +++ b/drivers/gpu/nova-core/driver.rs
>> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>>      pub(crate) gpu: Gpu,
>>  }
>>  
>> -const BAR0_SIZE: usize = 8;
>> +const BAR0_SIZE: usize = 0x1000000;
>>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>>  
>>  kernel::pci_device_table!(
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -2,6 +2,7 @@
>>  
>>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>>  
>> +use crate::devinit;
>>  use crate::driver::Bar0;
>>  use crate::firmware::Firmware;
>>  use crate::regs;
>> @@ -168,6 +169,10 @@ pub(crate) fn new(
>>              spec.revision
>>          );
>>  
>> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
>> +        devinit::wait_gfw_boot_completion(&bar)
>> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>> +
>>          Ok(pin_init!(Self { spec, bar, fw }))
>>      }
>>  }
>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
>> --- a/drivers/gpu/nova-core/nova_core.rs
>> +++ b/drivers/gpu/nova-core/nova_core.rs
>> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>>      }
>>  }
>>  
>> +mod devinit;
>>  mod driver;
>>  mod firmware;
>>  mod gpu;
>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
>> --- a/drivers/gpu/nova-core/regs.rs
>> +++ b/drivers/gpu/nova-core/regs.rs
>> @@ -13,3 +13,14 @@
>>      7:4     major_rev => as u8, "major revision of the chip";
>>      28:20   chipset => try_into Chipset, "chipset model"
>>  );
>> +
>> +/* GC6 */
>> +
>> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
>> +    0:0     read_protection_level0_enabled => as_bit bool
>> +);
>> +
>> +/* TODO: This is an array of registers. */
>> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
>> +    31:0    value => as u32
>> +);
>
> Please also document new register definitions.

Thankfully Joel's documentation patches take care of this!

Cheers,
Alex.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/16] gpu: nova-core: add basic timer device
  2025-04-22 12:07   ` Danilo Krummrich
@ 2025-04-29 13:13     ` Alexandre Courbot
  0 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-29 13:13 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

On Tue Apr 22, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:42PM +0900, Alexandre Courbot wrote:
>> Add a timer that works with GPU time and provides the ability to wait on
>> a condition with a specific timeout.
>
> What can this timer do for us that an HrTimer can't?

It is local to the GPU, and the source of truth for all GPU-related
operations. Some pushbuffer commands can return timestamps that will
come from this timer and the driver must thus use it as well in
driver-related operations to make sure both share the same time base.

>
>> 
>> The `Duration` Rust type is used to keep track of differences between
>> timestamps; this will be replaced by the equivalent kernel type once it
>> lands.
>
> Fine for me -- can you please add a corresponding TODO and add it to your list
> of follow-up patches?

Sure.

>
>> diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..8987352f4192bc9b4b2fc0fb5f2e8e62ff27be68
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/timer.rs
>> @@ -0,0 +1,133 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Nova Core Timer subdevice
>> +
>> +// To be removed when all code is used.
>> +#![allow(dead_code)]
>
> Please prefer 'expect'.

Ack.

>
>> +
>> +use core::fmt::Display;
>> +use core::ops::{Add, Sub};
>> +use core::time::Duration;
>> +
>> +use kernel::devres::Devres;
>> +use kernel::num::U64Ext;
>> +use kernel::prelude::*;
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +
>> +/// A timestamp with nanosecond granularity obtained from the GPU timer.
>> +///
>> +/// A timestamp can also be subtracted from another in order to obtain a [`Duration`].
>> +#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>> +pub(crate) struct Timestamp(u64);
>> +
>> +impl Display for Timestamp {
>> +    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
>> +        write!(f, "{}", self.0)
>> +    }
>> +}
>> +
>> +impl Add<Duration> for Timestamp {
>> +    type Output = Self;
>> +
>> +    fn add(mut self, rhs: Duration) -> Self::Output {
>> +        let mut nanos = rhs.as_nanos();
>> +        while nanos > u64::MAX as u128 {
>> +            self.0 = self.0.wrapping_add(nanos as u64);
>> +            nanos -= u64::MAX as u128;
>> +        }
>> +
>> +        Timestamp(self.0.wrapping_add(nanos as u64))
>> +    }
>> +}
>> +
>> +impl Sub for Timestamp {
>> +    type Output = Duration;
>> +
>> +    fn sub(self, rhs: Self) -> Self::Output {
>> +        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
>> +    }
>> +}
>> +
>> +pub(crate) struct Timer {}
>> +
>> +impl Timer {
>> +    pub(crate) fn new() -> Self {
>> +        Self {}
>> +    }
>> +
>> +    /// Read the current timer timestamp.
>> +    pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
>> +        loop {
>> +            let hi = regs::PtimerTime1::read(bar);
>> +            let lo = regs::PtimerTime0::read(bar);
>> +
>> +            if hi.hi() == regs::PtimerTime1::read(bar).hi() {
>> +                return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
>> +            }
>
> So, if hi did not change since we've read both hi and lo, we can trust both
> values. Probably worth adding a brief comment.
>
> Additionally, we may want to add that if we get unlucky, it takes around 4s to
> get unlucky again, even though that's rather obvious.

Added a comment. The odds of being unlucky are infinitesimal and the
consequences (an extra pass of this loop) inconsequential, thankfully.
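
The read sequence, as a standalone sketch with the reasoning spelled out in
comments. `TimerRegs` and `ScriptedRegs` are hypothetical test scaffolding; the
loop itself follows the patch. With nanosecond granularity the low word wraps
every 2^32 ns, roughly 4.29 s, which is where the "around 4s" comes from.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Hypothetical abstraction over the two 32-bit halves of the GPU timer.
trait TimerRegs {
    fn read_hi(&self) -> u32;
    fn read_lo(&self) -> u32;
}

/// Read a consistent 64-bit timestamp from two 32-bit registers.
fn read_timestamp<T: TimerRegs>(regs: &T) -> u64 {
    loop {
        let hi = regs.read_hi();
        let lo = regs.read_lo();

        // If `hi` is unchanged after `lo` was read, `lo` did not wrap between
        // the two reads and both halves belong to the same instant. Otherwise
        // `lo` rolled over in between and we simply try again.
        if hi == regs.read_hi() {
            return (u64::from(hi) << 32) | u64::from(lo);
        }
    }
}

/// Test double that replays scripted register values.
struct ScriptedRegs {
    hi: RefCell<VecDeque<u32>>,
    lo: RefCell<VecDeque<u32>>,
}

impl TimerRegs for ScriptedRegs {
    fn read_hi(&self) -> u32 {
        self.hi.borrow_mut().pop_front().expect("hi script exhausted")
    }

    fn read_lo(&self) -> u32 {
        self.lo.borrow_mut().pop_front().expect("lo script exhausted")
    }
}
```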

>
>> +        }
>> +    }
>> +
>> +    #[allow(dead_code)]
>> +    pub(crate) fn time(bar: &Bar0, time: u64) {
>> +        regs::PtimerTime1::default()
>> +            .set_hi(time.upper_32_bits())
>> +            .write(bar);
>> +        regs::PtimerTime0::default()
>> +            .set_lo(time.lower_32_bits())
>> +            .write(bar);
>> +    }
>> +
>> +    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
>> +    ///
>> +    /// When `cond` evaluates to `Some`, its return value is returned.
>> +    ///
>> +    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
>> +    /// `Some`, or if the timer device is stuck for some reason.
>> +    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
>> +        &self,
>> +        bar: &Devres<Bar0>,
>> +        timeout: Duration,
>> +        cond: F,
>> +    ) -> Result<R> {
>> +        // Number of consecutive time reads after which we consider the timer frozen if it hasn't
>> +        // moved forward.
>> +        const MAX_STALLED_READS: usize = 16;
>
> Huh! Can't we trust the timer hardware? Probably one reason more to use HrTimer?

No, to be clear I don't expect this to ever happen in real life, but I
also don't want to leave a loop without an exit condition.

OpenRM and Nouveau are both using it so I believe it can be trusted. :)
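
To make the exit conditions concrete, here is a standalone sketch of a wait loop
with both a timeout and a stalled-timer guard. It is not the code from the
patch: the clock is passed in as a hypothetical `now_ns` closure instead of
being read from the GPU, and `WaitError::TimedOut` stands in for `ETIMEDOUT`.

```rust
use std::time::Duration;

/// Stand-in for the kernel's `ETIMEDOUT`.
#[derive(Debug, PartialEq)]
enum WaitError {
    TimedOut,
}

/// Number of consecutive identical time reads after which the timer is
/// considered frozen.
const MAX_STALLED_READS: usize = 16;

/// Wait until `cond` returns `Some`, `timeout` elapses, or the clock stops
/// moving. `now_ns` returns the current time in nanoseconds.
fn wait_on<R>(
    mut now_ns: impl FnMut() -> u64,
    timeout: Duration,
    mut cond: impl FnMut() -> Option<R>,
) -> Result<R, WaitError> {
    let start = now_ns();
    let mut last = start;
    let mut stalled = 0usize;

    loop {
        if let Some(value) = cond() {
            return Ok(value);
        }

        let now = now_ns();
        if now == last {
            // The clock did not advance: count it, and give up if this keeps
            // happening so the loop always has an exit.
            stalled += 1;
            if stalled >= MAX_STALLED_READS {
                return Err(WaitError::TimedOut);
            }
        } else {
            stalled = 0;
            last = now;
        }

        if u128::from(now.wrapping_sub(start)) >= timeout.as_nanos() {
            return Err(WaitError::TimedOut);
        }
    }
}
```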


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-22 14:44   ` Danilo Krummrich
@ 2025-04-30  6:58     ` Joel Fernandes
  2025-04-30 10:32       ` Danilo Krummrich
  2025-04-30 13:25     ` Alexandre Courbot
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-30  6:58 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel@vger.kernel.org,
	rust-for-linux@vger.kernel.org, nouveau@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org



> On Apr 22, 2025, at 10:45 AM, Danilo Krummrich <dakr@kernel.org> wrote:
> […]
>> +
>> +    fn get_signature_reg_fuse_version(
>> +        &self,
>> +        bar: &Devres<Bar0>,
>> +        engine_id_mask: u16,
>> +        ucode_id: u8,
>> +    ) -> Result<u32>;
>> +
>> +    // Program the BROM registers prior to starting a secure firmware.
>> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
>> +}
>> +
>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>> +///
>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>> +/// requested HAL.
> 
> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
> relevant to FalconHal impls?
> 
> Can't we do something like I do in the following example [1]?
> 
> ```
> use std::marker::PhantomData;
> use std::ops::Deref;
> 
> trait Engine {
>    const BASE: u32;
> }
> 
> trait Hal<E: Engine> {
>    fn access(&self);
> }
> 
> struct Gsp;
> 
> impl Engine for Gsp {
>    const BASE: u32 = 0x1;
> }
> 
> struct Sec2;
> 
> impl Engine for Sec2 {
>    const BASE: u32 = 0x2;
> }
> 
> struct GA100<E: Engine>(PhantomData<E>);
> 
> impl<E: Engine> Hal<E> for GA100<E> {
>    fn access(&self) {
>        println!("Base: {}", E::BASE);
>    }
> }
> 
> impl<E: Engine> GA100<E> {
>    fn new() -> Self {
>        Self(PhantomData)
>    }
> }
> 
> //struct Falcon<E: Engine>(GA100<E>);
> 
> struct Falcon<H: Hal<E>, E: Engine>(H, PhantomData<E>);
> 
> impl<H: Hal<E>, E: Engine> Falcon<H, E> {
>    fn new(hal: H) -> Self {
>        Self(hal, PhantomData)
>    }
> }
> 
> impl<H: Hal<E>, E: Engine> Deref for Falcon<H, E> {
>    type Target = H;
> 
>    fn deref(&self) -> &Self::Target {
>        &self.0
>    }
> }
> 
> fn main() {
>    let gsp = Falcon::new(GA100::<Gsp>::new());
>    let sec2 = Falcon::new(GA100::<Sec2>::new());
> 
>    gsp.access();
>    sec2.access();
> }
> ```
> 
> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2

I am still researching this idea from a Rust point of view, but quick question -
will this even work if the chip type (GAxxx) is determined at runtime? That does
need runtime polymorphism.
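
To make the question concrete, here is a standalone sketch of the combination
the patch uses: the engine is a compile-time type parameter, while the chipset
is only known at probe time and therefore selects a trait object. It uses `Box`
and `Option` from std in place of the kernel's `Arc` and `Result`, and all
names and values are illustrative.

```rust
use std::marker::PhantomData;

trait Engine {
    const BASE: u32;
}

struct Gsp;
struct Sec2;

impl Engine for Gsp {
    const BASE: u32 = 0x1;
}

impl Engine for Sec2 {
    const BASE: u32 = 0x2;
}

/// Chipset-specific operations for a given engine.
trait Hal<E: Engine> {
    fn base(&self) -> u32;
}

struct Ga102<E: Engine>(PhantomData<E>);

impl<E: Engine> Hal<E> for Ga102<E> {
    fn base(&self) -> u32 {
        // The engine is resolved statically, no lookup is needed here.
        E::BASE
    }
}

/// Only known once the hardware has been probed.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Chipset {
    GA102,
    TU102,
}

/// The runtime value picks the implementation, hence the trait object.
fn create_hal<E: Engine + 'static>(chipset: Chipset) -> Option<Box<dyn Hal<E>>> {
    match chipset {
        Chipset::GA102 => Some(Box::new(Ga102::<E>(PhantomData))),
        // No HAL in this sketch for other chipsets.
        _ => None,
    }
}
```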

Thanks,

 - Joel


> 
>> +///
>> +/// TODO: replace the return type with `KBox` once it gains the ability to host trait objects.
>> +pub(crate) fn create_falcon_hal<E: FalconEngine + 'static>(
>> +    chipset: Chipset,
>> +) -> Result<Arc<dyn FalconHal<E>>> {
>> +    let hal = match chipset {
>> +        Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 | Chipset::GA107 => {
>> +            Arc::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as Arc<dyn FalconHal<E>>
>> +        }
>> +        _ => return Err(ENOTSUPP),
>> +    };
>> +
>> +    Ok(hal)
>> +}
>> diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..747b02ca671f7d4a97142665a9ba64807c87391e
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
>> @@ -0,0 +1,111 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +use core::marker::PhantomData;
>> +use core::time::Duration;
>> +
>> +use kernel::devres::Devres;
>> +use kernel::prelude::*;
>> +
>> +use crate::driver::Bar0;
>> +use crate::falcon::{FalconBromParams, FalconEngine, FalconModSelAlgo, RiscvCoreSelect};
>> +use crate::regs;
>> +use crate::timer::Timer;
>> +
>> +use super::FalconHal;
>> +
>> +fn select_core_ga102<E: FalconEngine>(bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
>> +    let bcr_ctrl = with_bar!(bar, |b| regs::RiscvBcrCtrl::read(b, E::BASE))?;
>> +    if bcr_ctrl.core_select() != RiscvCoreSelect::Falcon {
>> +        with_bar!(bar, |b| regs::RiscvBcrCtrl::default()
>> +            .set_core_select(RiscvCoreSelect::Falcon)
>> +            .write(b, E::BASE))?;
>> +
>> +        timer.wait_on(bar, Duration::from_millis(10), || {
>> +            bar.try_access_with(|b| regs::RiscvBcrCtrl::read(b, E::BASE))
>> +                .and_then(|v| if v.valid() { Some(()) } else { None })
>> +        })?;
>> +    }
>> +
>> +    Ok(())
>> +}
>> +
>> +fn get_signature_reg_fuse_version_ga102(
>> +    bar: &Devres<Bar0>,
>> +    engine_id_mask: u16,
>> +    ucode_id: u8,
>> +) -> Result<u32> {
>> +    // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
>> +    // registers, which are an array. Our register definition macros do not allow us to manage them
>> +    // properly, so we need to hardcode their addresses for now.
>> +
>> +    // Each engine has 16 ucode version registers numbered from 1 to 16.
>> +    if ucode_id == 0 || ucode_id > 16 {
>> +        pr_warn!("invalid ucode id {:#x}", ucode_id);
>> +        return Err(EINVAL);
>> +    }
>> +    let reg_fuse = if engine_id_mask & 0x0001 != 0 {
>> +        // NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION
>> +        0x824140
>> +    } else if engine_id_mask & 0x0004 != 0 {
>> +        // NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION
>> +        0x824100
>> +    } else if engine_id_mask & 0x0400 != 0 {
>> +        // NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION
>> +        0x8241c0
>> +    } else {
>> +        pr_warn!("unexpected engine_id_mask {:#x}", engine_id_mask);
>> +        return Err(EINVAL);
>> +    } + ((ucode_id - 1) as usize * core::mem::size_of::<u32>());
>> +
>> +    let reg_fuse_version = with_bar!(bar, |b| { b.read32(reg_fuse) })?;
>> +
>> +    // Equivalent of Find Last Set bit.
>> +    Ok(u32::BITS - reg_fuse_version.leading_zeros())
>> +}
>> +
>> +fn program_brom_ga102<E: FalconEngine>(
>> +    bar: &Devres<Bar0>,
>> +    params: &FalconBromParams,
>> +) -> Result<()> {
>> +    with_bar!(bar, |b| {
>> +        regs::FalconBromParaaddr0::default()
>> +            .set_addr(params.pkc_data_offset)
>> +            .write(b, E::BASE);
>> +        regs::FalconBromEngidmask::default()
>> +            .set_mask(params.engine_id_mask as u32)
>> +            .write(b, E::BASE);
>> +        regs::FalconBromCurrUcodeId::default()
>> +            .set_ucode_id(params.ucode_id as u32)
>> +            .write(b, E::BASE);
>> +        regs::FalconModSel::default()
>> +            .set_algo(FalconModSelAlgo::Rsa3k)
>> +            .write(b, E::BASE)
>> +    })
>> +}
>> +
>> +pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
>> +
>> +impl<E: FalconEngine> Ga102<E> {
>> +    pub(super) fn new() -> Self {
>> +        Self(PhantomData)
>> +    }
>> +}
>> +
>> +impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
>> +    fn select_core(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
>> +        select_core_ga102::<E>(bar, timer)
>> +    }
>> +
>> +    fn get_signature_reg_fuse_version(
>> +        &self,
>> +        bar: &Devres<Bar0>,
>> +        engine_id_mask: u16,
>> +        ucode_id: u8,
>> +    ) -> Result<u32> {
>> +        get_signature_reg_fuse_version_ga102(bar, engine_id_mask, ucode_id)
>> +    }
>> +
>> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()> {
>> +        program_brom_ga102::<E>(bar, params)
>> +    }
>> +}
>> diff --git a/drivers/gpu/nova-core/falcon/sec2.rs b/drivers/gpu/nova-core/falcon/sec2.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..85dda3e8380a3d31d34c92c4236c6f81c63ce772
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/falcon/sec2.rs
>> @@ -0,0 +1,9 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +use crate::falcon::{Falcon, FalconEngine};
>> +
>> +pub(crate) struct Sec2;
>> +impl FalconEngine for Sec2 {
>> +    const BASE: usize = 0x00840000;
>> +}
>> +pub(crate) type Sec2Falcon = Falcon<Sec2>;
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 1b3e43e0412e2a2ea178c7404ea647c9e38d4e04..ec4c648c6e8b4aa7d06c627ed59c0e66a08c679e 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -5,6 +5,8 @@
>> use crate::devinit;
>> use crate::dma::DmaObject;
>> use crate::driver::Bar0;
>> +use crate::falcon::gsp::GspFalcon;
>> +use crate::falcon::sec2::Sec2Falcon;
>> use crate::firmware::Firmware;
>> use crate::regs;
>> use crate::timer::Timer;
>> @@ -221,6 +223,20 @@ pub(crate) fn new(
>> 
>>         let timer = Timer::new();
>> 
>> +        let gsp_falcon = GspFalcon::new(
>> +            pdev,
>> +            spec.chipset,
>> +            &bar,
>> +            if spec.chipset > Chipset::GA100 {
>> +                true
>> +            } else {
>> +                false
>> +            },
>> +        )?;
>> +        gsp_falcon.clear_swgen0_intr(&bar)?;
>> +
>> +        let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, true)?;
>> +
>>         Ok(pin_init!(Self {
>>             spec,
>>             bar,
>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>> index df3468c92c6081b3e2db218d92fbe1c40a0a75c3..4dde8004d24882c60669b5acd6af9d6988c66a9c 100644
>> --- a/drivers/gpu/nova-core/nova_core.rs
>> +++ b/drivers/gpu/nova-core/nova_core.rs
>> @@ -23,6 +23,7 @@ macro_rules! with_bar {
>> mod devinit;
>> mod dma;
>> mod driver;
>> +mod falcon;
>> mod firmware;
>> mod gpu;
>> mod regs;
>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>> index f191cf4eb44c2b950e5cfcc6d04f95c122ce29d3..c76a16dc8e7267a4eb54cb71e1cca6fb9e00188f 100644
>> --- a/drivers/gpu/nova-core/regs.rs
>> +++ b/drivers/gpu/nova-core/regs.rs
>> @@ -6,6 +6,10 @@
>> #[macro_use]
>> mod macros;
>> 
>> +use crate::falcon::{
>> +    FalconCoreRev, FalconCoreRevSubversion, FalconFbifMemType, FalconFbifTarget, FalconModSelAlgo,
>> +    FalconSecurityModel, RiscvCoreSelect,
>> +};
>> use crate::gpu::Chipset;
>> 
>> register!(Boot0@0x00000000, "Basic revision information about the GPU";
>> @@ -44,3 +48,188 @@
>> register!(Pgc6AonSecureScratchGroup05@0x00118234;
>>     31:0    value => as u32
>> );
>> +
>> +/* PFALCON */
>> +
>> +register!(FalconIrqsclr@+0x00000004;
>> +    4:4     halt => as_bit bool;
>> +    6:6     swgen0 => as_bit bool;
>> +);
>> +
>> +register!(FalconIrqstat@+0x00000008;
>> +    4:4     halt => as_bit bool;
>> +    6:6     swgen0 => as_bit bool;
>> +);
>> +
>> +register!(FalconIrqmclr@+0x00000014;
>> +    31:0    val => as u32
>> +);
>> +
>> +register!(FalconIrqmask@+0x00000018;
>> +    31:0    val => as u32
>> +);
>> +
>> +register!(FalconRm@+0x00000084;
>> +    31:0    val => as u32
>> +);
>> +
>> +register!(FalconIrqdest@+0x0000001c;
>> +    31:0    val => as u32
>> +);
>> +
>> +register!(FalconMailbox0@+0x00000040;
>> +    31:0    mailbox0 => as u32
>> +);
>> +register!(FalconMailbox1@+0x00000044;
>> +    31:0    mailbox1 => as u32
>> +);
>> +
>> +register!(FalconHwcfg2@+0x000000f4;
>> +    10:10   riscv => as_bit bool;
>> +    12:12   mem_scrubbing => as_bit bool;
>> +    31:31   reset_ready => as_bit bool;
>> +);
>> +
>> +register!(FalconCpuCtl@+0x00000100;
>> +    1:1     start_cpu => as_bit bool;
>> +    4:4     halted => as_bit bool;
>> +    6:6     alias_en => as_bit bool;
>> +);
>> +
>> +register!(FalconBootVec@+0x00000104;
>> +    31:0    boot_vec => as u32
>> +);
>> +
>> +register!(FalconHwCfg@+0x00000108;
>> +    8:0     imem_size => as u32;
>> +    17:9    dmem_size => as u32;
>> +);
>> +
>> +register!(FalconDmaCtl@+0x0000010c;
>> +    0:0     require_ctx => as_bit bool;
>> +    1:1     dmem_scrubbing  => as_bit bool;
>> +    2:2     imem_scrubbing => as_bit bool;
>> +    6:3     dmaq_num => as_bit u8;
>> +    7:7     secure_stat => as_bit bool;
>> +);
>> +
>> +register!(FalconDmaTrfBase@+0x00000110;
>> +    31:0    base => as u32;
>> +);
>> +
>> +register!(FalconDmaTrfMOffs@+0x00000114;
>> +    23:0    offs => as u32;
>> +);
>> +
>> +register!(FalconDmaTrfCmd@+0x00000118;
>> +    0:0     full => as_bit bool;
>> +    1:1     idle => as_bit bool;
>> +    3:2     sec => as_bit u8;
>> +    4:4     imem => as_bit bool;
>> +    5:5     is_write => as_bit bool;
>> +    10:8    size => as u8;
>> +    14:12   ctxdma => as u8;
>> +    16:16   set_dmtag => as u8;
>> +);
>> +
>> +register!(FalconDmaTrfBOffs@+0x0000011c;
>> +    31:0    offs => as u32;
>> +);
>> +
>> +register!(FalconDmaTrfBase1@+0x00000128;
>> +    8:0     base => as u16;
>> +);
>> +
>> +register!(FalconHwcfg1@+0x0000012c;
>> +    3:0     core_rev => try_into FalconCoreRev, "core revision of the falcon";
>> +    5:4     security_model => try_into FalconSecurityModel, "security model of the falcon";
>> +    7:6     core_rev_subversion => into FalconCoreRevSubversion;
>> +    11:8    imem_ports => as u8;
>> +    15:12   dmem_ports => as u8;
>> +);
>> +
>> +register!(FalconCpuCtlAlias@+0x00000130;
>> +    1:1     start_cpu => as_bit bool;
>> +);
>> +
>> +/* TODO: this is an array of registers */
>> +register!(FalconImemC@+0x00000180;
>> +    7:2     offs => as u8;
>> +    23:8    blk => as u8;
>> +    24:24   aincw => as_bit bool;
>> +    25:25   aincr => as_bit bool;
>> +    28:28   secure => as_bit bool;
>> +    29:29   sec_atomic => as_bit bool;
>> +);
>> +
>> +register!(FalconImemD@+0x00000184;
>> +    31:0    data => as u32;
>> +);
>> +
>> +register!(FalconImemT@+0x00000188;
>> +    15:0    data => as u16;
>> +);
>> +
>> +register!(FalconDmemC@+0x000001c0;
>> +    7:2     offs => as u8;
>> +    23:0    addr => as u32;
>> +    23:8    blk => as u8;
>> +    24:24   aincw => as_bit bool;
>> +    25:25   aincr => as_bit bool;
>> +    26:26   settag => as_bit bool;
>> +    27:27   setlvl => as_bit bool;
>> +    28:28   va => as_bit bool;
>> +    29:29   miss => as_bit bool;
>> +);
>> +
>> +register!(FalconDmemD@+0x000001c4;
>> +    31:0    data => as u32;
>> +);
>> +
>> +register!(FalconModSel@+0x00001180;
>> +    7:0     algo => try_into FalconModSelAlgo;
>> +);
>> +register!(FalconBromCurrUcodeId@+0x00001198;
>> +    31:0    ucode_id => as u32;
>> +);
>> +register!(FalconBromEngidmask@+0x0000119c;
>> +    31:0    mask => as u32;
>> +);
>> +register!(FalconBromParaaddr0@+0x00001210;
>> +    31:0    addr => as u32;
>> +);
>> +
>> +register!(RiscvCpuctl@+0x00000388;
>> +    0:0     startcpu => as_bit bool;
>> +    4:4     halted => as_bit bool;
>> +    5:5     stopped => as_bit bool;
>> +    7:7     active_stat => as_bit bool;
>> +);
>> +
>> +register!(FalconEngine@+0x000003c0;
>> +    0:0     reset => as_bit bool;
>> +);
>> +
>> +register!(RiscvIrqmask@+0x00000528;
>> +    31:0    mask => as u32;
>> +);
>> +
>> +register!(RiscvIrqdest@+0x0000052c;
>> +    31:0    dest => as u32;
>> +);
>> +
>> +/* TODO: this is an array of registers */
>> +register!(FalconFbifTranscfg@+0x00000600;
>> +    1:0     target => try_into FalconFbifTarget;
>> +    2:2     mem_type => as_bit FalconFbifMemType;
>> +);
>> +
>> +register!(FalconFbifCtl@+0x00000624;
>> +    7:7     allow_phys_no_ctx => as_bit bool;
>> +);
>> +
>> +register!(RiscvBcrCtrl@+0x00001668;
>> +    0:0     valid => as_bit bool;
>> +    4:4     core_select => as_bit RiscvCoreSelect;
>> +    8:8     br_fetch => as_bit bool;
>> +);
>> diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
>> index 8987352f4192bc9b4b2fc0fb5f2e8e62ff27be68..c03a5c36d1230dfbf2bd6e02a793264280c6d509 100644
>> --- a/drivers/gpu/nova-core/timer.rs
>> +++ b/drivers/gpu/nova-core/timer.rs
>> @@ -2,9 +2,6 @@
>> 
>> //! Nova Core Timer subdevice
>> 
>> -// To be removed when all code is used.
>> -#![allow(dead_code)]
>> -
>> use core::fmt::Display;
>> use core::ops::{Add, Sub};
>> use core::time::Duration;
>> 
>> --
>> 2.49.0
>> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-30  6:58     ` Joel Fernandes
@ 2025-04-30 10:32       ` Danilo Krummrich
  0 siblings, 0 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-30 10:32 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel@vger.kernel.org,
	rust-for-linux@vger.kernel.org, nouveau@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org

On Wed, Apr 30, 2025 at 06:58:44AM +0000, Joel Fernandes wrote:
> > On Apr 22, 2025, at 10:45 AM, Danilo Krummrich <dakr@kernel.org> wrote:
> 
> > [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
> 
> I am still researching this idea from a rust point of view, but quick question - will this even work if the chip type (GAxxx) is determined at runtime? That does need runtime polymorphism.

I extended the example in [2] to address this with `enum HalImpl<E: Engine>`
and a second architecture that is picked randomly. It needs a match for every
access, but that's probably still better than the dynamic dispatch.
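
For readers who do not want to open the playground, here is a rough sketch of
the enum approach. It is not a copy of the linked example: `Ga100`, `Ga102`,
`pick()` and the `is_ga102` flag are stand-in names, and `access()` returns a
`String` only so that the result is easy to check.

```rust
use std::marker::PhantomData;

trait Engine {
    const BASE: u32;
}

struct Gsp;

impl Engine for Gsp {
    const BASE: u32 = 0x1;
}

trait Hal<E: Engine> {
    fn access(&self) -> String;
}

// Two hypothetical architectures, each still generic over the engine.
struct Ga100<E: Engine>(PhantomData<E>);
struct Ga102<E: Engine>(PhantomData<E>);

impl<E: Engine> Hal<E> for Ga100<E> {
    fn access(&self) -> String {
        format!("GA100 base: {:#x}", E::BASE)
    }
}

impl<E: Engine> Hal<E> for Ga102<E> {
    fn access(&self) -> String {
        format!("GA102 base: {:#x}", E::BASE)
    }
}

/// One variant per architecture; the variant is chosen at runtime.
enum HalImpl<E: Engine> {
    Ga100(Ga100<E>),
    Ga102(Ga102<E>),
}

impl<E: Engine> HalImpl<E> {
    /// `is_ga102` stands in for whatever runtime chipset detection is used.
    fn pick(is_ga102: bool) -> Self {
        if is_ga102 {
            HalImpl::Ga102(Ga102(PhantomData))
        } else {
            HalImpl::Ga100(Ga100(PhantomData))
        }
    }
}

impl<E: Engine> Hal<E> for HalImpl<E> {
    // Every forwarded method needs a match like this one.
    fn access(&self) -> String {
        match self {
            HalImpl::Ga100(h) => h.access(),
            HalImpl::Ga102(h) => h.access(),
        }
    }
}
```

With this, `HalImpl::<Gsp>::pick(detected)` can be stored in the falcon
structure without a heap allocation or a vtable; the cost is the per-method
match shown in the last impl.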

[2] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=99ce0f12542488f78e35356c99a1e23f

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-22 14:44   ` Danilo Krummrich
  2025-04-30  6:58     ` Joel Fernandes
@ 2025-04-30 13:25     ` Alexandre Courbot
  2025-04-30 14:38       ` Joel Fernandes
  1 sibling, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-04-30 13:25 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
	linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Danilo,

On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
> This patch could probably split up a bit, to make it more pleasant to review. :)

Probably yes. I thought since it is mostly new files, splitting up
wouldn't change much. Let me see what I can do.

>
> On Sun, Apr 20, 2025 at 09:19:43PM +0900, Alexandre Courbot wrote:
>> 
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone)]
>> +pub(crate) enum FalconSecurityModel {
>> +    #[default]
>> +    None = 0,
>> +    Light = 2,
>> +    Heavy = 3,
>> +}
>
> Please add an explanation for the different security modules. Where are the
> differences?
>
> I think most of the structures, registers, abbreviations, etc. introduced in
> this patch need some documentation.

I've documented things a bit better for the next revision.

>
> Please see https://docs.kernel.org/gpu/nova/guidelines.html#documentation.
>
>> +
>> +impl TryFrom<u32> for FalconSecurityModel {
>> +    type Error = Error;
>> +
>> +    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
>> +        use FalconSecurityModel::*;
>> +
>> +        let sec_model = match value {
>> +            0 => None,
>> +            2 => Light,
>> +            3 => Heavy,
>> +            _ => return Err(EINVAL),
>> +        };
>> +
>> +        Ok(sec_model)
>> +    }
>> +}
>> +
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>> +pub(crate) enum FalconCoreRevSubversion {
>> +    #[default]
>> +    Subversion0 = 0,
>> +    Subversion1 = 1,
>> +    Subversion2 = 2,
>> +    Subversion3 = 3,
>> +}
>> +
>> +impl From<u32> for FalconCoreRevSubversion {
>> +    fn from(value: u32) -> Self {
>> +        use FalconCoreRevSubversion::*;
>> +
>> +        match value & 0b11 {
>> +            0 => Subversion0,
>> +            1 => Subversion1,
>> +            2 => Subversion2,
>> +            3 => Subversion3,
>> +            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
>> +            4..=u32::MAX => unsafe { unreachable_unchecked() },
>> +        }
>
> FalconCoreRev uses TryFrom to avoid unsafe code, I think FalconCoreRevSubversion
> should do the same thing.

Since the field from which `FalconCoreRevSubversion` is built is only 2
bits, I thought we could avoid using `TryFrom` since we are effectively
covering all possible values (I wish Rust had n-bit integer types :)).
But yeah I have probably overthought that, and that unsafe block is
unsightly. Converted to `TryFrom`.
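
For reference, a rough sketch of what the converted version could look like.
`InvalidValue` is a stand-in used here for the kernel's `Error`/`EINVAL` so
that the snippet builds outside the kernel tree; the actual next revision may
differ.

```rust
use core::convert::TryFrom;

/// Stand-in for the kernel's `Error` (assumption, not the real type).
#[derive(Debug, PartialEq, Eq)]
struct InvalidValue;

#[repr(u8)]
#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum FalconCoreRevSubversion {
    #[default]
    Subversion0 = 0,
    Subversion1 = 1,
    Subversion2 = 2,
    Subversion3 = 3,
}

impl TryFrom<u32> for FalconCoreRevSubversion {
    type Error = InvalidValue;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        use FalconCoreRevSubversion::*;

        // No masking and no `unsafe`: out-of-range values are reported
        // to the caller instead of being declared unreachable.
        match value {
            0 => Ok(Subversion0),
            1 => Ok(Subversion1),
            2 => Ok(Subversion2),
            3 => Ok(Subversion3),
            _ => Err(InvalidValue),
        }
    }
}
```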

>
>> +/// Trait defining the parameters of a given Falcon instance.
>> +pub(crate) trait FalconEngine: Sync {
>> +    /// Base I/O address for the falcon, relative from which its registers are accessed.
>> +    const BASE: usize;
>> +}
>> +
>> +/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
>> +#[derive(Debug)]
>> +pub(crate) struct FalconLoadTarget {
>> +    /// Offset from the start of the source object to copy from.
>> +    pub(crate) src_start: u32,
>> +    /// Offset from the start of the destination memory to copy into.
>> +    pub(crate) dst_start: u32,
>> +    /// Number of bytes to copy.
>> +    pub(crate) len: u32,
>> +}
>> +
>> +#[derive(Debug)]
>> +pub(crate) struct FalconBromParams {
>> +    pub(crate) pkc_data_offset: u32,
>> +    pub(crate) engine_id_mask: u16,
>> +    pub(crate) ucode_id: u8,
>> +}
>> +
>> +pub(crate) trait FalconFirmware {
>> +    type Target: FalconEngine;
>> +
>> +    /// Returns the DMA handle of the object containing the firmware.
>> +    fn dma_handle(&self) -> bindings::dma_addr_t;
>> +
>> +    /// Returns the load parameters for `IMEM`.
>> +    fn imem_load(&self) -> FalconLoadTarget;
>> +
>> +    /// Returns the load parameters for `DMEM`.
>> +    fn dmem_load(&self) -> FalconLoadTarget;
>> +
>> +    /// Returns the parameters to write into the BROM registers.
>> +    fn brom_params(&self) -> FalconBromParams;
>> +
>> +    /// Returns the start address of the firmware.
>> +    fn boot_addr(&self) -> u32;
>> +}
>> +
>> +/// Contains the base parameters common to all Falcon instances.
>> +pub(crate) struct Falcon<E: FalconEngine> {
>> +    pub hal: Arc<dyn FalconHal<E>>,
>
> This should probably be private and instead should be exposed via Deref.

Agreed - actually not all the HAL is supposed to be exposed, so I've
added a proxy method for the only method that needs to be called from
outside this module.

>
> Also, please see my comment at create_falcon_hal() regarding the dynamic
> dispatch.
>
>> +}
>> +
>> +impl<E: FalconEngine + 'static> Falcon<E> {
>> +    pub(crate) fn new(
>> +        pdev: &pci::Device,
>> +        chipset: Chipset,
>> +        bar: &Devres<Bar0>,
>> +        need_riscv: bool,
>> +    ) -> Result<Self> {
>> +        let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, E::BASE))?;
>> +        // Ensure that the revision and security model contain valid values.
>> +        let _rev = hwcfg1.core_rev()?;
>> +        let _sec_model = hwcfg1.security_model()?;
>> +
>> +        if need_riscv {
>> +            let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
>> +            if !hwcfg2.riscv() {
>> +                dev_err!(
>> +                    pdev.as_ref(),
>> +                    "riscv support requested on falcon that does not support it\n"
>> +                );
>> +                return Err(EINVAL);
>> +            }
>> +        }
>> +
>> +        Ok(Self {
>> +            hal: hal::create_falcon_hal(chipset)?,
>
> I'd prefer to move the contents of create_falcon_hal() into this constructor.

I think it is actually beneficial to have this in a dedicated method:
that way the individual HAL constructors do not need to be visible to
the `falcon` module and can be contained in the `hal` sub-module, which
I think helps keep things in their place. Is there a good reason to
prefer doing it here?

Ah, maybe you are thinking that we are returning a Boxed HAL because we
are going through this function? It's actually on purpose - see below.

>> +pub(crate) struct Gsp;
>> +impl FalconEngine for Gsp {
>> +    const BASE: usize = 0x00110000;
>> +}
>> +
>> +pub(crate) type GspFalcon = Falcon<Gsp>;
>
> Please drop this type alias, Falcon<Gsp> seems simple enough and is much more
> obvious IMHO.

Yeah, I wanted to avoid having to import two symbols into the gpu
module, but I've probably been overthinking it again.

>
>> +
>> +impl Falcon<Gsp> {
>> +    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
>> +    /// allow GSP to signal CPU for processing new messages in message queue.
>> +    pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
>> +        with_bar!(bar, |b| regs::FalconIrqsclr::default()
>> +            .set_swgen0(true)
>> +            .write(b, Gsp::BASE))
>> +    }
>> +}
>> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..5ebf4e88f1f25a13cf47859a53507be53e795d34
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/falcon/hal.rs
>> @@ -0,0 +1,54 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +use kernel::devres::Devres;
>> +use kernel::prelude::*;
>> +use kernel::sync::Arc;
>> +
>> +use crate::driver::Bar0;
>> +use crate::falcon::{FalconBromParams, FalconEngine};
>> +use crate::gpu::Chipset;
>> +use crate::timer::Timer;
>> +
>> +mod ga102;
>> +
>> +/// Hardware Abstraction Layer for Falcon cores.
>> +///
>> +/// Implements chipset-specific low-level operations. The trait is generic against [`FalconEngine`]
>> +/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
>> +/// registers.
>> +pub(crate) trait FalconHal<E: FalconEngine>: Sync {
>> +    // Activates the Falcon core if the engine is a riscv/falcon dual engine.
>> +    fn select_core(&self, _bar: &Devres<Bar0>, _timer: &Timer) -> Result<()> {
>> +        Ok(())
>> +    }
>> +
>> +    fn get_signature_reg_fuse_version(
>> +        &self,
>> +        bar: &Devres<Bar0>,
>> +        engine_id_mask: u16,
>> +        ucode_id: u8,
>> +    ) -> Result<u32>;
>> +
>> +    // Program the BROM registers prior to starting a secure firmware.
>> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
>> +}
>> +
>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>> +///
>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>> +/// requested HAL.
>
> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
> relevant to FalconHal impls?
>
> Can't we do something like I do in the following example [1]?
>
> ```
> use std::marker::PhantomData;
> use std::ops::Deref;
>
> trait Engine {
>     const BASE: u32;
> }
>
> trait Hal<E: Engine> {
>     fn access(&self);
> }
>
> struct Gsp;
>
> impl Engine for Gsp {
>     const BASE: u32 = 0x1;
> }
>
> struct Sec2;
>
> impl Engine for Sec2 {
>     const BASE: u32 = 0x2;
> }
>
> struct GA100<E: Engine>(PhantomData<E>);
>
> impl<E: Engine> Hal<E> for GA100<E> {
>     fn access(&self) {
>         println!("Base: {}", E::BASE);
>     }
> }
>
> impl<E: Engine> GA100<E> {
>     fn new() -> Self {
>         Self(PhantomData)
>     }
> }
>
> //struct Falcon<E: Engine>(GA100<E>);
>
> struct Falcon<H: Hal<E>, E: Engine>(H, PhantomData<E>);
>
> impl<H: Hal<E>, E: Engine> Falcon<H, E> {
>     fn new(hal: H) -> Self {
>         Self(hal, PhantomData)
>     }
> }
>
> impl<H: Hal<E>, E: Engine> Deref for Falcon<H, E> {
>     type Target = H;
>
>     fn deref(&self) -> &Self::Target {
>         &self.0
>     }
> }
>
> fn main() {
>     let gsp = Falcon::new(GA100::<Gsp>::new());
>     let sec2 = Falcon::new(GA100::<Sec2>::new());
>
>     gsp.access();
>     sec2.access();
> }
> ```
>
> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2

So as you have noticed, there are two dimensions from which the falcons
can be instantiated:

- The engine, which determines its register BASE,
- The HAL, which is determined by the chipset.

For the engine, I want to keep things static for the main reason that if
BASE was dynamic, we would have to do all our IO using
try_read()/try_write() and check for an out-of-bounds error at each
register access. The cost of monomorphization is limited as there are
only a handful of engines.

But the HAL introduces a second dimension to this, and if we support N
engines then the amount of monomorphized code would then increase by N
for each new HAL we add. Chipsets are released at a good cadence, so
this is the dimension that risks growing the most.

It is also the one that makes use of methods to abstract things (vs.
fixed parameters), so it is a natural candidate for using virtual
methods. I am not a fan of having ever-growing boilerplate match
statements for each method that needs to be abstracted, especially since
this is what virtual methods do without requiring extra code, and for a
runtime penalty that is completely negligible in our context and IMHO
completely balanced by the smaller binary size that results from their
use.
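
To make the split concrete, here is a simplified userspace sketch of the
arrangement described above: the engine stays a compile-time parameter (so
`BASE` is a constant), while the HAL is picked at runtime and stored as a
trait object. It uses `std::sync::Arc` and returns an `Option` instead of the
kernel's `Arc` and `Result`, and `describe()` is a made-up method that exists
only for illustration.

```rust
use std::marker::PhantomData;
use std::sync::Arc;

trait FalconEngine {
    const BASE: usize;
}

struct Gsp;
impl FalconEngine for Gsp {
    const BASE: usize = 0x0011_0000;
}

struct Sec2;
impl FalconEngine for Sec2 {
    const BASE: usize = 0x0084_0000;
}

// Reduced chipset list, for illustration only.
#[derive(Clone, Copy)]
enum Chipset {
    Ga100,
    Ga102,
}

trait FalconHal<E: FalconEngine> {
    // Hypothetical method; the real HAL exposes hardware operations.
    fn describe(&self) -> String;
}

struct Ga102<E: FalconEngine>(PhantomData<E>);

impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
    fn describe(&self) -> String {
        // `E::BASE` is known at compile time for each engine.
        format!("ga102 HAL, engine base {:#x}", E::BASE)
    }
}

/// Runtime lookup on the chipset; the engine dimension remains static.
fn create_falcon_hal<E: FalconEngine + 'static>(
    chipset: Chipset,
) -> Option<Arc<dyn FalconHal<E>>> {
    match chipset {
        Chipset::Ga102 => {
            let hal: Arc<dyn FalconHal<E>> = Arc::new(Ga102::<E>(PhantomData));
            Some(hal)
        }
        // Unsupported in this sketch.
        Chipset::Ga100 => None,
    }
}
```

Adding a chipset here means one new `FalconHal` impl and one new match arm in
`create_falcon_hal()`; callers keep going through the same trait object.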

Cheers,
Alex.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-30 13:25     ` Alexandre Courbot
@ 2025-04-30 14:38       ` Joel Fernandes
  2025-04-30 18:16         ` Danilo Krummrich
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-04-30 14:38 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel



On 4/30/2025 9:25 AM, Alexandre Courbot wrote:
> Hi Danilo,
> 
> On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
>> This patch could probably split up a bit, to make it more pleasant to review. :)
> 
> Probably yes. I thought since it is mostly new files, splitting up
> wouldn't change much. Let me see what I can do.
> 
>>
>> On Sun, Apr 20, 2025 at 09:19:43PM +0900, Alexandre Courbot wrote:
>>>
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone)]
>>> +pub(crate) enum FalconSecurityModel {
>>> +    #[default]
>>> +    None = 0,
>>> +    Light = 2,
>>> +    Heavy = 3,
>>> +}
>>
>> Please add an explanation for the different security modules. Where are the
>> differences?
>>
>> I think most of the structures, registers, abbreviations, etc. introduced in
>> this patch need some documentation.
> 
> I've documented things a bit better for the next revision.
> 
>>
>> Please see https://docs.kernel.org/gpu/nova/guidelines.html#documentation.
>>
>>> +
>>> +impl TryFrom<u32> for FalconSecurityModel {
>>> +    type Error = Error;
>>> +
>>> +    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
>>> +        use FalconSecurityModel::*;
>>> +
>>> +        let sec_model = match value {
>>> +            0 => None,
>>> +            2 => Light,
>>> +            3 => Heavy,
>>> +            _ => return Err(EINVAL),
>>> +        };
>>> +
>>> +        Ok(sec_model)
>>> +    }
>>> +}
>>> +
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>>> +pub(crate) enum FalconCoreRevSubversion {
>>> +    #[default]
>>> +    Subversion0 = 0,
>>> +    Subversion1 = 1,
>>> +    Subversion2 = 2,
>>> +    Subversion3 = 3,
>>> +}
>>> +
>>> +impl From<u32> for FalconCoreRevSubversion {
>>> +    fn from(value: u32) -> Self {
>>> +        use FalconCoreRevSubversion::*;
>>> +
>>> +        match value & 0b11 {
>>> +            0 => Subversion0,
>>> +            1 => Subversion1,
>>> +            2 => Subversion2,
>>> +            3 => Subversion3,
>>> +            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
>>> +            4..=u32::MAX => unsafe { unreachable_unchecked() },
>>> +        }
>>
>> FalconCoreRev uses TryFrom to avoid unsafe code, I think FalconCoreRevSubversion
>> should do the same thing.
> 
> Since the field from which `FalconCoreRevSubversion` is built is only 2
> bits, I thought we could avoid using `TryFrom` since we are effectively
> covering all possible values (I wish Rust has n-bit integer types :)).
> But yeah I have probably overthought that, and that unsafe block is
> unsightly. Converted to `TryFrom`.
> 
>>
>>> +/// Trait defining the parameters of a given Falcon instance.
>>> +pub(crate) trait FalconEngine: Sync {
>>> +    /// Base I/O address for the falcon, relative from which its registers are accessed.
>>> +    const BASE: usize;
>>> +}
>>> +
>>> +/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
>>> +#[derive(Debug)]
>>> +pub(crate) struct FalconLoadTarget {
>>> +    /// Offset from the start of the source object to copy from.
>>> +    pub(crate) src_start: u32,
>>> +    /// Offset from the start of the destination memory to copy into.
>>> +    pub(crate) dst_start: u32,
>>> +    /// Number of bytes to copy.
>>> +    pub(crate) len: u32,
>>> +}
>>> +
>>> +#[derive(Debug)]
>>> +pub(crate) struct FalconBromParams {
>>> +    pub(crate) pkc_data_offset: u32,
>>> +    pub(crate) engine_id_mask: u16,
>>> +    pub(crate) ucode_id: u8,
>>> +}
>>> +
>>> +pub(crate) trait FalconFirmware {
>>> +    type Target: FalconEngine;
>>> +
>>> +    /// Returns the DMA handle of the object containing the firmware.
>>> +    fn dma_handle(&self) -> bindings::dma_addr_t;
>>> +
>>> +    /// Returns the load parameters for `IMEM`.
>>> +    fn imem_load(&self) -> FalconLoadTarget;
>>> +
>>> +    /// Returns the load parameters for `DMEM`.
>>> +    fn dmem_load(&self) -> FalconLoadTarget;
>>> +
>>> +    /// Returns the parameters to write into the BROM registers.
>>> +    fn brom_params(&self) -> FalconBromParams;
>>> +
>>> +    /// Returns the start address of the firmware.
>>> +    fn boot_addr(&self) -> u32;
>>> +}
>>> +
>>> +/// Contains the base parameters common to all Falcon instances.
>>> +pub(crate) struct Falcon<E: FalconEngine> {
>>> +    pub hal: Arc<dyn FalconHal<E>>,
>>
>> This should probably be private and instead should be exposed via Deref.
> 
> Agreed - actually not all the HAL is supposed to be exposed, so I've
> added a proxy method for the only method that needs to be called from
> outside this module.
> 
>>
>> Also, please see my comment at create_falcon_hal() regarding the dynamic
>> dispatch.
>>
>>> +}
>>> +
>>> +impl<E: FalconEngine + 'static> Falcon<E> {
>>> +    pub(crate) fn new(
>>> +        pdev: &pci::Device,
>>> +        chipset: Chipset,
>>> +        bar: &Devres<Bar0>,
>>> +        need_riscv: bool,
>>> +    ) -> Result<Self> {
>>> +        let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, E::BASE))?;
>>> +        // Ensure that the revision and security model contain valid values.
>>> +        let _rev = hwcfg1.core_rev()?;
>>> +        let _sec_model = hwcfg1.security_model()?;
>>> +
>>> +        if need_riscv {
>>> +            let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
>>> +            if !hwcfg2.riscv() {
>>> +                dev_err!(
>>> +                    pdev.as_ref(),
>>> +                    "riscv support requested on falcon that does not support it\n"
>>> +                );
>>> +                return Err(EINVAL);
>>> +            }
>>> +        }
>>> +
>>> +        Ok(Self {
>>> +            hal: hal::create_falcon_hal(chipset)?,
>>
>> I'd prefer to move the contents of create_falcon_hal() into this constructor.
> 
> I think it is actually beneficial to have this in a dedicated method:
> that way the individual HAL constructors do not need to be visible to
> the `falcon` module and can be contained in the `hal` sub-module, which
> I think helps keeping things at their place. Is there a good reason to
> prefer doing it here?
> 
> Ah, maybe you are thinking that we are returning a Boxed HAL because we
> are going through this function? It's actually on purpose - see below.
> 
>>> +pub(crate) struct Gsp;
>>> +impl FalconEngine for Gsp {
>>> +    const BASE: usize = 0x00110000;
>>> +}
>>> +
>>> +pub(crate) type GspFalcon = Falcon<Gsp>;
>>
>> Please drop this type alias, Falcon<Gsp> seems simple enough and is much more
>> obvious IMHO.
> 
> Yeah, I wanted to avoid having to import two symbols into the gpu
> module, but I've probably been overthinking it again.
> 
>>
>>> +
>>> +impl Falcon<Gsp> {
>>> +    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
>>> +    /// allow GSP to signal CPU for processing new messages in message queue.
>>> +    pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
>>> +        with_bar!(bar, |b| regs::FalconIrqsclr::default()
>>> +            .set_swgen0(true)
>>> +            .write(b, Gsp::BASE))
>>> +    }
>>> +}
>>> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..5ebf4e88f1f25a13cf47859a53507be53e795d34
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/falcon/hal.rs
>>> @@ -0,0 +1,54 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +use kernel::devres::Devres;
>>> +use kernel::prelude::*;
>>> +use kernel::sync::Arc;
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::falcon::{FalconBromParams, FalconEngine};
>>> +use crate::gpu::Chipset;
>>> +use crate::timer::Timer;
>>> +
>>> +mod ga102;
>>> +
>>> +/// Hardware Abstraction Layer for Falcon cores.
>>> +///
>>> +/// Implements chipset-specific low-level operations. The trait is generic against [`FalconEngine`]
>>> +/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
>>> +/// registers.
>>> +pub(crate) trait FalconHal<E: FalconEngine>: Sync {
>>> +    // Activates the Falcon core if the engine is a riscv/falcon dual engine.
>>> +    fn select_core(&self, _bar: &Devres<Bar0>, _timer: &Timer) -> Result<()> {
>>> +        Ok(())
>>> +    }
>>> +
>>> +    fn get_signature_reg_fuse_version(
>>> +        &self,
>>> +        bar: &Devres<Bar0>,
>>> +        engine_id_mask: u16,
>>> +        ucode_id: u8,
>>> +    ) -> Result<u32>;
>>> +
>>> +    // Program the BROM registers prior to starting a secure firmware.
>>> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
>>> +}
>>> +
>>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>>> +///
>>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>>> +/// requested HAL.
>>
>> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
>> relevant to FalconHal impls?
>>
>> Can't we do something like I do in the following example [1]?
>>
>> ```
>> use std::marker::PhantomData;
>> use std::ops::Deref;
>>
>> trait Engine {
>>     const BASE: u32;
>> }
>>
>> trait Hal<E: Engine> {
>>     fn access(&self);
>> }
>>
>> struct Gsp;
>>
>> impl Engine for Gsp {
>>     const BASE: u32 = 0x1;
>> }
>>
>> struct Sec2;
>>
>> impl Engine for Sec2 {
>>     const BASE: u32 = 0x2;
>> }
>>
>> struct GA100<E: Engine>(PhantomData<E>);
>>
>> impl<E: Engine> Hal<E> for GA100<E> {
>>     fn access(&self) {
>>         println!("Base: {}", E::BASE);
>>     }
>> }
>>
>> impl<E: Engine> GA100<E> {
>>     fn new() -> Self {
>>         Self(PhantomData)
>>     }
>> }
>>
>> //struct Falcon<E: Engine>(GA100<E>);
>>
>> struct Falcon<H: Hal<E>, E: Engine>(H, PhantomData<E>);
>>
>> impl<H: Hal<E>, E: Engine> Falcon<H, E> {
>>     fn new(hal: H) -> Self {
>>         Self(hal, PhantomData)
>>     }
>> }
>>
>> impl<H: Hal<E>, E: Engine> Deref for Falcon<H, E> {
>>     type Target = H;
>>
>>     fn deref(&self) -> &Self::Target {
>>         &self.0
>>     }
>> }
>>
>> fn main() {
>>     let gsp = Falcon::new(GA100::<Gsp>::new());
>>     let sec2 = Falcon::new(GA100::<Sec2>::new());
>>
>>     gsp.access();
>>     sec2.access();
>> }
>> ```
>>
>> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
> 
> So as you have noticed there are two dimensions from which the falcons
> can be instantiated:
> 
> - The engine, which determines its register BASE,
> - The HAL, which is determined by the chipset.
> 
> For the engine, I want to keep things static for the main reason that if
> BASE was dynamic, we would have to do all our IO using
> try_read()/try_write() and check for an out-of-bounds error at each
> register access. The cost of monomorphization is limited as there are
> only a handful of engines.
> 
> But the HAL introduces a second dimension to this, and if we support N
> engines then the amount of monomorphized code would then increase by N
> for each new HAL we add. Chipsets are released at a good cadence, so
> this is the dimension that risks growing the most.
> 
> It is also the one that makes use of methods to abstract things (vs.
> fixed parameters), so it is a natural candidate for using virtual
> methods. I am not a fan of having ever-growing boilerplate match
> statements for each method that needs to be abstracted, especially since
> this is what virtual methods do without requiring extra code, and for a
> runtime penalty that is completely negligible in our context and IMHO
> completely balanced by the smaller binary size that results from their
> use.
Adding to what Alex said, note that the runtime cost is still there even without
using dyn, because at runtime the match conditionals need to route function
calls to the right place. I am just not seeing the benefits of avoiding dyn for
this use case, only drawbacks. IMHO, we should try not to do the
compiler's job.

Maybe the only benefit is that you don't need an Arc or KBox wrapper?
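
To make the comparison concrete, here is a minimal sketch of the two options under discussion. The `Hal` trait, the `Ga102` and `Tu102` types, and the returned values are hypothetical stand-ins and not part of nova-core. Option A routes each call through a `match`, which has to be repeated for every abstracted method; option B lets the vtable of a trait object do the same routing.

```rust
// Hypothetical stand-ins; none of these types exist in nova-core.
trait Hal {
    fn fuse_version(&self) -> u32;
}

struct Ga102;
struct Tu102;

impl Hal for Ga102 {
    fn fuse_version(&self) -> u32 {
        2
    }
}

impl Hal for Tu102 {
    fn fuse_version(&self) -> u32 {
        1
    }
}

// Option A: an enum plus `match`. Each abstracted method needs its own
// `match`, and each new chipset adds an arm to every one of them.
enum AnyHal {
    Ga102(Ga102),
    Tu102(Tu102),
}

impl AnyHal {
    fn fuse_version(&self) -> u32 {
        match self {
            AnyHal::Ga102(h) => h.fuse_version(),
            AnyHal::Tu102(h) => h.fuse_version(),
        }
    }
}

// Option B: a trait object. The routing happens through the vtable, so no
// per-method `match` has to be written by hand.
fn make_hal(is_ampere: bool) -> Box<dyn Hal> {
    if is_ampere {
        Box::new(Ga102)
    } else {
        Box::new(Tu102)
    }
}

fn main() {
    let by_match = [AnyHal::Ga102(Ga102), AnyHal::Tu102(Tu102)];
    let by_dyn = [make_hal(true), make_hal(false)];

    // Both approaches select the same implementation at runtime.
    for (m, d) in by_match.iter().zip(by_dyn.iter()) {
        assert_eq!(m.fuse_version(), d.fuse_version());
    }
}
```

Either way a runtime decision is made per call; the difference is mostly who writes the routing code and where the object lives.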

 - Joel


* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-30 14:38       ` Joel Fernandes
@ 2025-04-30 18:16         ` Danilo Krummrich
  2025-04-30 23:08           ` Joel Fernandes
  2025-05-01  0:09           ` Alexandre Courbot
  0 siblings, 2 replies; 60+ messages in thread
From: Danilo Krummrich @ 2025-04-30 18:16 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On Wed, Apr 30, 2025 at 10:38:11AM -0400, Joel Fernandes wrote:
> On 4/30/2025 9:25 AM, Alexandre Courbot wrote:
> > On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
> 
> >>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
> >>> +///
> >>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
> >>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
> >>> +/// requested HAL.
> >>
> >> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
> >> relevant to FalconHal impls?
> >>
> >> Can't we do something like I do in the following example [1]?
> >>
> >> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
> > 
> > So as you have noticed there are two dimensions from which the falcons
> > can be instantiated:
> > 
> > - The engine, which determines its register BASE,
> > - The HAL, which is determined by the chipset.
> > 
> > For the engine, I want to keep things static for the main reason that if
> > BASE was dynamic, we would have to do all our IO using
> > try_read()/try_write() and check for an out-of-bounds error at each
> > register access. The cost of monomorphization is limited as there are
> > only a handful of engines.
> > 
> > But the HAL introduces a second dimension to this, and if we support N
> > engines then the amount of monomorphized code would then increase by N
> > for each new HAL we add. Chipsets are released at a good cadence, so
> > this is the dimension that risks growing the most.

I agree, avoiding the dynamic dispatch is probably not worth it in this case
considering the long term. However, I wanted to point out an alternative with
[2].

> > It is also the one that makes use of methods to abstract things (vs.
> > fixed parameters), so it is a natural candidate for using virtual
> > methods. I am not a fan of having ever-growing boilerplate match
> > statements for each method that needs to be abstracted, especially since
> > this is what virtual methods do without requiring extra code, and for a
> > runtime penalty that is completely negligible in our context and IMHO
> > completely balanced by the smaller binary size that results from their
> > use.
>
> Adding to what Alex said, note that the runtime cost is still there even without
> using dyn. Because at runtime, the match conditionals need to route function
> calls to the right place.

Honestly, I don't know how dynamic dispatch scales compared to static dispatch
with conditionals.

Out of curiosity, I briefly looked for a benchmark and found [3], which doesn't
look unreasonable at first glance.

I quickly modified it to have more than 2 actions [4].

2 Actions
---------
Dynamic Dispatch: time:   [2.0679 ns 2.0825 ns 2.0945 ns]
 Static Dispatch: time:   [850.29 ps 851.05 ps 852.36 ps]

20 Actions
----------
Dynamic Dispatch: time:   [21.368 ns 21.827 ns 22.284 ns]
 Static Dispatch: time:   [1.3623 ns 1.3703 ns 1.3793 ns]

100 Actions
-----------
Dynamic Dispatch: time:   [103.72 ns 104.33 ns 105.13 ns]
 Static Dispatch: time:   [4.5905 ns 4.6311 ns 4.6775 ns]

Absolutely take it with a grain of salt: I spent neither a lot of brain power
nor time on this, which usually is not a great combination when benchmarking
things. :)
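
For a rough sense of scale, the figures above can be normalized per action. The sketch below assumes that "N actions" means N dispatched calls per measured iteration, which the results do not spell out, and it uses the midpoint estimate of each line. Under that assumption the dynamic figures work out to roughly 1 ns per action at every size, while the static per-action figure falls from about 0.43 ns to under 0.05 ns.

```rust
/// Midpoint estimates copied from the results above, in nanoseconds:
/// (number of actions, dynamic dispatch, static dispatch).
const RESULTS: [(u32, f64, f64); 3] = [
    (2, 2.0825, 0.85105),
    (20, 21.827, 1.3703),
    (100, 104.33, 4.6311),
];

/// Average time per action in nanoseconds, assuming the reported time
/// covers `actions` dispatched calls.
fn per_action(total_ns: f64, actions: u32) -> f64 {
    total_ns / f64::from(actions)
}

fn main() {
    for &(actions, dynamic_ns, static_ns) in RESULTS.iter() {
        println!(
            "{:>3} actions: dynamic {:.3} ns/action, static {:.3} ns/action, ratio {:.1}x",
            actions,
            per_action(dynamic_ns, actions),
            per_action(static_ns, actions),
            dynamic_ns / static_ns
        );
    }
}
```

A flat per-action cost on the dynamic side and a shrinking one on the static side would be consistent with the compiler optimizing across the statically dispatched calls, but that is only a guess without looking at the generated code.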

However, I think it's probably not too important here. Hence, feel free to go
with dynamic dispatch for this.

> I am just not seeing the benefits of not using dyn for
> this use case and only drawbacks. IMHO, we should try to not be doing the
> compiler's job.
> 
> Maybe the only benefit is you don't need an Arc or KBox wrapper?

That's not a huge concern for me, it's only one single allocation per Engine,
correct?

[2] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=99ce0f12542488f78e35356c99a1e23f
[3] https://github.com/tailcallhq/rust-benchmarks
[4] https://pastebin.com/k0PqtQnq

* Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
  2025-04-29 12:48     ` Alexandre Courbot
@ 2025-04-30 22:45       ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-30 22:45 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel



On 4/29/2025 8:48 AM, Alexandre Courbot wrote:
> On Tue Apr 22, 2025 at 8:36 PM JST, Danilo Krummrich wrote:
>> On Sun, Apr 20, 2025 at 09:19:40PM +0900, Alexandre Courbot wrote:
>>> Upon reset, the GPU executes the GFW_BOOT firmware in order to
>>> initialize its base parameters such as clocks. The driver must ensure
>>> that this step is completed before using the hardware.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>>>  drivers/gpu/nova-core/driver.rs    |  2 +-
>>>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>>>  drivers/gpu/nova-core/nova_core.rs |  1 +
>>>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>>>  5 files changed, 58 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/devinit.rs
>>> @@ -0,0 +1,40 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Methods for device initialization.
>>> +
>>> +use kernel::bindings;
>>> +use kernel::devres::Devres;
>>> +use kernel::prelude::*;
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::regs;
>>> +
>>> +/// Wait for devinit FW completion.
>>> +///
>>> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>>> +/// considered unusable until this step is completed, so it must be waited on very early during
>>> +/// driver initialization.
>>> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>>> +    let mut timeout = 2000;
>>> +
>>> +    loop {
>>> +        let gfw_booted = with_bar!(
>>> +            bar,
>>> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
>>> +                .read_protection_level0_enabled()
>>> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
>>> +        )?;
>>> +
>>> +        if gfw_booted {
>>> +            return Ok(());
>>> +        }
>>> +
>>> +        if timeout == 0 {
>>> +            return Err(ETIMEDOUT);
>>> +        }
>>> +        timeout -= 1;
>>> +
>>> +        // SAFETY: msleep should be safe to call with any parameter.
>>> +        unsafe { bindings::msleep(2) };
>>
>> I assume this goes away with [1]? Can we please add a corresponding TODO? Also,
>> do you mind preparing the follow-up patches for cases like this (there's also
>> the transmute one), such that we can apply them, once the dependencies did land
>> and such that we can verify that they suit our needs?
>>
>> [1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/
> 
> Good idea. Added the TODO item with a link to the patch.
> 
>>
>>> +    }
>>> +}
>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
>>> --- a/drivers/gpu/nova-core/driver.rs
>>> +++ b/drivers/gpu/nova-core/driver.rs
>>> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>>>      pub(crate) gpu: Gpu,
>>>  }
>>>  
>>> -const BAR0_SIZE: usize = 8;
>>> +const BAR0_SIZE: usize = 0x1000000;
>>>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>>>  
>>>  kernel::pci_device_table!(
>>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>>> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
>>> --- a/drivers/gpu/nova-core/gpu.rs
>>> +++ b/drivers/gpu/nova-core/gpu.rs
>>> @@ -2,6 +2,7 @@
>>>  
>>>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>>>  
>>> +use crate::devinit;
>>>  use crate::driver::Bar0;
>>>  use crate::firmware::Firmware;
>>>  use crate::regs;
>>> @@ -168,6 +169,10 @@ pub(crate) fn new(
>>>              spec.revision
>>>          );
>>>  
>>> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
>>> +        devinit::wait_gfw_boot_completion(&bar)
>>> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>>> +
>>>          Ok(pin_init!(Self { spec, bar, fw }))
>>>      }
>>>  }
>>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>>> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
>>> --- a/drivers/gpu/nova-core/nova_core.rs
>>> +++ b/drivers/gpu/nova-core/nova_core.rs
>>> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>>>      }
>>>  }
>>>  
>>> +mod devinit;
>>>  mod driver;
>>>  mod firmware;
>>>  mod gpu;
>>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>>> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
>>> --- a/drivers/gpu/nova-core/regs.rs
>>> +++ b/drivers/gpu/nova-core/regs.rs
>>> @@ -13,3 +13,14 @@
>>>      7:4     major_rev => as u8, "major revision of the chip";
>>>      28:20   chipset => try_into Chipset, "chipset model"
>>>  );
>>> +
>>> +/* GC6 */
>>> +
>>> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
>>> +    0:0     read_protection_level0_enabled => as_bit bool
>>> +);
>>> +
>>> +/* TODO: This is an array of registers. */
>>> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
>>> +    31:0    value => as u32
>>> +);
>>
>> Please also document new register definitions.
> 
> Thankfully Joel's documentation patches take care of this!
> 
Yes, my doc tree (now 8 patches) includes documenting these! Considering the
register renaming stuff it may conflict, but I'll fix that up. :)

thanks,

 - Joel



* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-30 18:16         ` Danilo Krummrich
@ 2025-04-30 23:08           ` Joel Fernandes
  2025-05-01  0:09           ` Alexandre Courbot
  1 sibling, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2025-04-30 23:08 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, John Hubbard, Ben Skeggs, Timur Tabi,
	Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel

On 4/30/2025 2:16 PM, Danilo Krummrich wrote:
[...]
>>> It is also the one that makes use of methods to abstract things (vs.
>>> fixed parameters), so it is a natural candidate for using virtual
>>> methods. I am not a fan of having ever-growing boilerplate match
>>> statements for each method that needs to be abstracted, especially since
>>> this is what virtual methods do without requiring extra code, and for a
>>> runtime penalty that is completely negligible in our context and IMHO
>>> completely balanced by the smaller binary size that results from their
>>> use.
>>
>> Adding to what Alex said, note that the runtime cost is still there even without
>> using dyn. Because at runtime, the match conditionals need to route function
>> calls to the right place.
> 
> Honestly, I don't know how dynamic dispatch scales compared to static dispatch
> with conditionals.
> 
> OOC, I briefly looked for a benchmark and found [3], which doesn't look
> unreasonable at a first glance.
> 
> I modified it real quick to have more than 2 actions. [4]
> 
> 2 Actions
> ---------
> Dynamic Dispatch: time:   [2.0679 ns 2.0825 ns 2.0945 ns]
>  Static Dispatch: time:   [850.29 ps 851.05 ps 852.36 ps]
> 
> 20 Actions
> ----------
> Dynamic Dispatch: time:   [21.368 ns 21.827 ns 22.284 ns]
>  Static Dispatch: time:   [1.3623 ns 1.3703 ns 1.3793 ns]
> 
> 100 Actions
> -----------
> Dynamic Dispatch: time:   [103.72 ns 104.33 ns 105.13 ns]
>  Static Dispatch: time:   [4.5905 ns 4.6311 ns 4.6775 ns]
> 
> Absolutely take it with a grain of salt, I neither spend a lot of brain power
> nor time on this, which usually is not a great combination with benchmarking
> things. :)

Interesting, thanks for running the benchmark. I think this could be because of
function inlining during the static dispatch, so maybe at runtime there is no
overhead after all, even with long match statements. Just speculating, I have
not looked at codegen for this or anything.

But as you noted, the overhead is still not much of an issue (unless, say, the
method in question is in an extremely hot path).

> 
> However, I think it's probably not too important here. Hence, feel free to go
> with dynamic dispatch for this.

Ok thanks, sounds good to me. It does seem the code is a lot more readable IMHO
as well, with dyn.

>> I am just not seeing the benefits of not using dyn for
>> this use case and only drawbacks. IMHO, we should try to not be doing the
>> compiler's job.
>>
>> Maybe the only benefit is you don't need an Arc or KBox wrapper?
> 
> That's not a huge concern for me, it's only one single allocation per Engine,
> correct?

Yes, that's right. I was more referring to the fact that static dispatch as in
your example does not need Arc/Box. However, even with Arc/Box, IMHO the code
using dyn is more readable due to the lack of long match statements on the
access methods.

thanks,

- Joel


* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-04-30 18:16         ` Danilo Krummrich
  2025-04-30 23:08           ` Joel Fernandes
@ 2025-05-01  0:09           ` Alexandre Courbot
  2025-05-01  0:22             ` Joel Fernandes
  1 sibling, 1 reply; 60+ messages in thread
From: Alexandre Courbot @ 2025-05-01  0:09 UTC (permalink / raw)
  To: Danilo Krummrich, Joel Fernandes
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel

On Thu May 1, 2025 at 3:16 AM JST, Danilo Krummrich wrote:
> On Wed, Apr 30, 2025 at 10:38:11AM -0400, Joel Fernandes wrote:
>> On 4/30/2025 9:25 AM, Alexandre Courbot wrote:
>> > On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
>> 
>> >>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>> >>> +///
>> >>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>> >>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>> >>> +/// requested HAL.
>> >>
>> >> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
>> >> relevant to FalconHal impls?
>> >>
>> >> Can't we do something like I do in the following example [1]?
>> >>
>> >> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
>> > 
>> > So as you have noticed there are two dimensions from which the falcons
>> > can be instantiated:
>> > 
>> > - The engine, which determines its register BASE,
>> > - The HAL, which is determined by the chipset.
>> > 
>> > For the engine, I want to keep things static for the main reason that if
>> > BASE was dynamic, we would have to do all our IO using
>> > try_read()/try_write() and check for an out-of-bounds error at each
>> > register access. The cost of monomorphization is limited as there are
>> > only a handful of engines.
>> > 
>> > But the HAL introduces a second dimension to this, and if we support N
>> > engines then the amount of monomorphized code would then increase by N
>> > for each new HAL we add. Chipsets are released at a good cadence, so
>> > this is the dimension that risks growing the most.
>
> I agree, avoiding the dynamic dispatch is probably not worth in this case
> considering the long term. However, I wanted to point out an alternative with
> [2].
>
>> > It is also the one that makes use of methods to abstract things (vs.
>> > fixed parameters), so it is a natural candidate for using virtual
>> > methods. I am not a fan of having ever-growing boilerplate match
>> > statements for each method that needs to be abstracted, especially since
>> > this is what virtual methods do without requiring extra code, and for a
>> > runtime penalty that is completely negligible in our context and IMHO
>> > completely balanced by the smaller binary size that results from their
>> > use.
>>
>> Adding to what Alex said, note that the runtime cost is still there even without
>> using dyn. Because at runtime, the match conditionals need to route function
>> calls to the right place.
>
> Honestly, I don't know how dynamic dispatch scales compared to static dispatch
> with conditionals.
>
> OOC, I briefly looked for a benchmark and found [3], which doesn't look
> unreasonable at a first glance.
>
> I modified it real quick to have more than 2 actions. [4]
>
> 2 Actions
> ---------
> Dynamic Dispatch: time:   [2.0679 ns 2.0825 ns 2.0945 ns]
>  Static Dispatch: time:   [850.29 ps 851.05 ps 852.36 ps]
>
> 20 Actions
> ----------
> Dynamic Dispatch: time:   [21.368 ns 21.827 ns 22.284 ns]
>  Static Dispatch: time:   [1.3623 ns 1.3703 ns 1.3793 ns]
>
> 100 Actions
> -----------
> Dynamic Dispatch: time:   [103.72 ns 104.33 ns 105.13 ns]
>  Static Dispatch: time:   [4.5905 ns 4.6311 ns 4.6775 ns]
>
> Absolutely take it with a grain of salt, I neither spend a lot of brain power
> nor time on this, which usually is not a great combination with benchmarking
> things. :)
>
> However, I think it's probably not too important here. Hence, feel free to go
> with dynamic dispatch for this.

Indeed, it looks like the cost of dispatch will be completely shadowed
by the IO behind it anyway. And there are only a few of these HAL calls
here and there; it's not like they are on a critical path.

>
>> I am just not seeing the benefits of not using dyn for
>> this use case and only drawbacks. IMHO, we should try to not be doing the
>> compiler's job.
>> 
>> Maybe the only benefit is you don't need an Arc or KBox wrapper?
>
> That's not a huge concern for me, it's only one single allocation per Engine,
> correct?

Correct. Note that for other engines we will be able to store the HALs as
static singletons instead of building them on the heap like I am
currently doing. The reason for doing this on falcon is that the
dual-dimension of the instances makes it more complex to build and look
them up.

... or maybe I could just use a macro? Let me try that and see whether
it works.
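
As an illustration of the static-singleton idea for an engine whose HAL depends only on the chipset, here is a minimal sketch. The `Chipset` variants, the `Hal` trait, and the two HAL types are hypothetical stand-ins rather than nova-core definitions. Each HAL lives in static storage and the lookup hands out a `&'static dyn Hal`, so nothing is allocated.

```rust
// Hypothetical stand-ins; not the nova-core definitions.
#[derive(Clone, Copy)]
enum Chipset {
    Tu102,
    Ga102,
    Ga104,
}

trait Hal: Sync {
    fn name(&self) -> &'static str;
}

struct TuringHal;
struct AmpereHal;

impl Hal for TuringHal {
    fn name(&self) -> &'static str {
        "turing"
    }
}

impl Hal for AmpereHal {
    fn name(&self) -> &'static str {
        "ampere"
    }
}

// One immutable instance per HAL, placed in static storage.
static TURING: TuringHal = TuringHal;
static AMPERE: AmpereHal = AmpereHal;

/// Returns the HAL for `chipset` without any heap allocation.
fn hal_for(chipset: Chipset) -> &'static dyn Hal {
    match chipset {
        Chipset::Tu102 => &TURING,
        Chipset::Ga102 | Chipset::Ga104 => &AMPERE,
    }
}

fn main() {
    for chipset in [Chipset::Tu102, Chipset::Ga102, Chipset::Ga104].iter() {
        println!("{}", hal_for(*chipset).name());
    }
}
```

With a HAL trait that is also generic over the engine, as `FalconHal<E>` is, a table like this would need one static per (chipset, engine) pair, which is where the extra complexity mentioned above comes from.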

* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-05-01  0:09           ` Alexandre Courbot
@ 2025-05-01  0:22             ` Joel Fernandes
  2025-05-01 14:07               ` Alexandre Courbot
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2025-05-01  0:22 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel

Hi Alex,

On 4/30/2025 8:09 PM, Alexandre Courbot wrote:
>>> I am just not seeing the benefits of not using dyn for
>>> this use case and only drawbacks. IMHO, we should try to not be doing the
>>> compiler's job.
>>>
>>> Maybe the only benefit is you don't need an Arc or KBox wrapper?
>> That's not a huge concern for me, it's only one single allocation per Engine,
>> correct?
> Correct. Note that for other engines we will be able to store the HALs as
> static singletons instead of building them on the heap like I am
> currently doing. The reason for doing this on falcon is that the
> dual-dimension of the instances makes it more complex to build and look
> them up.
> 
> ... or maybe I could just use a macro? Let me try that and see whether
> it works.

Do you mean a macro for create_falcon_hal which adds an entry to this?

 let hal = match chipset {
    Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106
    | Chipset::GA107 => { .. }


Actually it would be nice if a single macro defined both a chipset and created
the hal together in the above list, that way the definition of a "chipset" is in
a single place. Kind of like what I did in the vbios patch for the various
BiosImage types. But I am not sure how easy it is to do for Falcon.
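
The single-macro idea can be sketched as follows. This is only an illustration under assumed names: `define_chipsets!`, the `Hal` trait, and the two HAL types are hypothetical, and the sketch ignores the engine dimension entirely. One invocation generates both the `Chipset` enum and the HAL factory, so adding a chipset means adding one line.

```rust
// Hypothetical stand-ins; not the nova-core definitions.
trait Hal {
    fn name(&self) -> &'static str;
}

struct TuringHal;
struct AmpereHal;

impl Hal for TuringHal {
    fn name(&self) -> &'static str {
        "turing"
    }
}

impl Hal for AmpereHal {
    fn name(&self) -> &'static str {
        "ampere"
    }
}

// Generates the `Chipset` enum and the HAL factory from one list, so that a
// chipset and its HAL are declared in a single place.
macro_rules! define_chipsets {
    ($($variant:ident => $hal:ident),+ $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq)]
        enum Chipset {
            $($variant),+
        }

        fn create_hal(chipset: Chipset) -> Box<dyn Hal> {
            match chipset {
                $(Chipset::$variant => Box::new($hal) as Box<dyn Hal>,)+
            }
        }
    };
}

define_chipsets! {
    TU102 => TuringHal,
    GA102 => AmpereHal,
    GA104 => AmpereHal,
}

fn main() {
    for chipset in [Chipset::TU102, Chipset::GA102, Chipset::GA104].iter() {
        println!("{:?} -> {}", chipset, create_hal(*chipset).name());
    }
}
```

Since the generated `match` has no wildcard arm, a chipset missing from the list cannot be constructed at all, which keeps the enum and the factory in sync by construction.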

Or perhaps you meant a macro that statically allocates the Engine + HAL
combination, and avoids the need for Arc/KBox and their corresponding
allocations?

thanks,

 - Joel


* Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
  2025-05-01  0:22             ` Joel Fernandes
@ 2025-05-01 14:07               ` Alexandre Courbot
  0 siblings, 0 replies; 60+ messages in thread
From: Alexandre Courbot @ 2025-05-01 14:07 UTC (permalink / raw)
  To: Joel Fernandes, Danilo Krummrich
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, John Hubbard,
	Ben Skeggs, Timur Tabi, Alistair Popple, linux-kernel,
	rust-for-linux, nouveau, dri-devel

On Thu May 1, 2025 at 9:22 AM JST, Joel Fernandes wrote:
> Hi Alex,
>
> On 4/30/2025 8:09 PM, Alexandre Courbot wrote:
>>>> I am just not seeing the benefits of not using dyn for
>>>> this use case and only drawbacks. IMHO, we should try to not be doing the
>>>> compiler's job.
>>>>
>>>> Maybe the only benefit is you don't need an Arc or KBox wrapper?
>>> That's not a huge concern for me, it's only one single allocation per Engine,
>>> correct?
>> Correct. Note that for other engines we will be able to store the HALs as
>> static singletons instead of building them on the heap like I am
>> currently doing. The reason for doing this on falcon is that the
>> dual-dimension of the instances makes it more complex to build and look
>> them up.
>> 
>> ... or maybe I could just use a macro? Let me try that and see whether
>> it works.
>
> Do you mean a macro for create_falcon_hal which adds an entry to this?
>
>  let hal = match chipset {
>     Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106
>     | Chipset::GA107 => { .. }
>
>
> Actually it would be nice if a single macro both defined a chipset and created
> the HAL in the above list, so that the definition of a "chipset" lives in
> a single place. Kind of like what I did in the vbios patch for various BiosImage.
> But not sure how easy it is to do for Falcon.
>
> Or perhaps you meant a macro that statically allocates the Engine + HAL
> combination, and avoids need for Arc/KBox and their corresponding allocations?

I was thinking of a macro to create all the Chipset * Engine static
instances of HALs, and generate the body of a lookup function to return
the right one for a given Chipset at runtime.
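
To make the idea concrete, here is a minimal sketch of such a macro for a
single engine dimension only. All names in it (`FalconHal`, `Tu102Hal`,
`Ga102Hal`, `hal_table!`, `falcon_hal`) and the reduced `Chipset` enum are
hypothetical and exist only for illustration; this is not the nova-core code.

```rust
// Illustrative sketch only: every name below is hypothetical, not nova-core API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Chipset {
    TU102,
    GA102,
    GA104,
}

/// Stand-in for a HAL trait; real HALs would expose chipset-specific operations.
trait FalconHal: Sync {
    fn name(&self) -> &'static str;
}

struct Tu102Hal;
struct Ga102Hal;

impl FalconHal for Tu102Hal {
    fn name(&self) -> &'static str {
        "tu102"
    }
}

impl FalconHal for Ga102Hal {
    fn name(&self) -> &'static str {
        "ga102"
    }
}

/// Generates a lookup function whose match arms each own a static HAL
/// instance, so no heap allocation is needed.
macro_rules! hal_table {
    ($fn_name:ident: $( $($chipset:ident)|+ => $hal:ident ),+ $(,)?) => {
        fn $fn_name(chipset: Chipset) -> &'static dyn FalconHal {
            match chipset {
                $( $(Chipset::$chipset)|+ => {
                    static HAL: $hal = $hal;
                    &HAL
                } )+
            }
        }
    };
}

hal_table!(falcon_hal:
    TU102 => Tu102Hal,
    GA102 | GA104 => Ga102Hal,
);

fn main() {
    // The right static instance is selected at runtime from the chipset.
    println!("{}", falcon_hal(Chipset::GA104).name());
}
```

This works because every input to the macro is a concrete identifier. The
falcon case additionally needs the engine as a second dimension.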

But trying to write it, I realized it wasn't as easy as I thought, since
generic parameters cannot be resolved by a macro - i.e. if you have
<E: Engine> as a generic and pass `E` to the macro, it will see the token
`E` and not whatever type is bound to `E` when the code is monomorphized
(macros are expanded on tokens, before generics are resolved).
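
A small standalone sketch of the limitation, with hypothetical names
(`Engine`, `Gsp`, `Sec2`, `type_tokens!`) that only serve the illustration:
the macro receives the token `E`, whereas going through a trait item gets
resolved per concrete type.

```rust
// Illustrative sketch only: these types and the macro are hypothetical.

trait Engine {
    const NAME: &'static str;
}

struct Gsp;
struct Sec2;

impl Engine for Gsp {
    const NAME: &'static str = "Gsp";
}

impl Engine for Sec2 {
    const NAME: &'static str = "Sec2";
}

/// Turns the tokens of the type it is given into a string at expansion time.
macro_rules! type_tokens {
    ($t:ty) => {
        stringify!($t)
    };
}

/// The macro is expanded before `E` is bound to anything, so it only ever
/// sees the generic parameter's name.
fn via_macro<E: Engine>() -> &'static str {
    type_tokens!(E)
}

/// An associated item is resolved during monomorphization, so it does
/// depend on the concrete type.
fn via_trait<E: Engine>() -> &'static str {
    E::NAME
}

fn main() {
    println!("{}", via_macro::<Gsp>());
    println!("{}", via_trait::<Gsp>());
}
```

Anything that must vary with `E` therefore has to go through trait items
rather than macro arguments.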

A solution to that would involve new traits and a bunch of boilerplate,
which I have decided is not worth the trouble just to save one 8-byte
object on the heap per falcon instance. :) I'll keep things as they
currently are for now.

Thread overview: 60+ messages
2025-04-20 12:19 [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
2025-04-20 12:19 ` [PATCH 01/16] rust: add useful ops for u64 Alexandre Courbot
2025-04-20 12:19 ` [PATCH 02/16] rust: make ETIMEDOUT error available Alexandre Courbot
2025-04-20 12:19 ` [PATCH 03/16] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
2025-04-22 16:23   ` Joel Fernandes
2025-04-24  7:50     ` Alexandre Courbot
2025-04-20 12:19 ` [PATCH 04/16] gpu: nova-core: add missing GA100 definition Alexandre Courbot
2025-04-20 12:19 ` [PATCH 05/16] gpu: nova-core: take bound device in Gpu::new Alexandre Courbot
2025-04-20 12:19 ` [PATCH 06/16] gpu: nova-core: define registers layout using helper macro Alexandre Courbot
2025-04-22 10:29   ` Danilo Krummrich
2025-04-28 14:27     ` Alexandre Courbot
2025-04-20 12:19 ` [PATCH 07/16] gpu: nova-core: move Firmware to firmware module Alexandre Courbot
2025-04-20 12:19 ` [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
2025-04-21 21:45   ` Joel Fernandes
2025-04-22 11:28     ` Danilo Krummrich
2025-04-22 13:06       ` Alexandre Courbot
2025-04-22 13:46         ` Joel Fernandes
2025-04-22 11:36   ` Danilo Krummrich
2025-04-29 12:48     ` Alexandre Courbot
2025-04-30 22:45       ` Joel Fernandes
2025-04-20 12:19 ` [PATCH 09/16] gpu: nova-core: register sysmem flush page Alexandre Courbot
2025-04-22 11:45   ` Danilo Krummrich
2025-04-23 13:03     ` Alexandre Courbot
2025-04-22 18:50   ` Joel Fernandes
2025-04-20 12:19 ` [PATCH 10/16] gpu: nova-core: add basic timer device Alexandre Courbot
2025-04-22 12:07   ` Danilo Krummrich
2025-04-29 13:13     ` Alexandre Courbot
2025-04-20 12:19 ` [PATCH 11/16] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
2025-04-22 14:44   ` Danilo Krummrich
2025-04-30  6:58     ` Joel Fernandes
2025-04-30 10:32       ` Danilo Krummrich
2025-04-30 13:25     ` Alexandre Courbot
2025-04-30 14:38       ` Joel Fernandes
2025-04-30 18:16         ` Danilo Krummrich
2025-04-30 23:08           ` Joel Fernandes
2025-05-01  0:09           ` Alexandre Courbot
2025-05-01  0:22             ` Joel Fernandes
2025-05-01 14:07               ` Alexandre Courbot
2025-04-20 12:19 ` [PATCH 12/16] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
2025-04-22 14:46   ` Danilo Krummrich
2025-04-20 12:19 ` [PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot Alexandre Courbot
2025-04-23 14:06   ` Danilo Krummrich
2025-04-23 14:52     ` Joel Fernandes
2025-04-23 15:02       ` Danilo Krummrich
2025-04-24 19:19         ` Joel Fernandes
2025-04-24 20:01           ` Danilo Krummrich
2025-04-24 19:54         ` Joel Fernandes
2025-04-24 20:17           ` Danilo Krummrich
2025-04-25  2:32             ` [13/16] " Joel Fernandes
2025-04-25 17:10               ` Joel Fernandes
2025-04-24 18:54     ` [PATCH 13/16] " Joel Fernandes
2025-04-24 20:08       ` Danilo Krummrich
2025-04-25  2:26         ` [13/16] " Joel Fernandes
2025-04-24 20:22     ` [PATCH 13/16] " Joel Fernandes
2025-04-26 23:17     ` [13/16] " Joel Fernandes
2025-04-20 12:19 ` [PATCH 14/16] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
2025-04-20 12:19 ` [PATCH 15/16] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
2025-04-20 12:19 ` [PATCH 16/16] gpu: nova-core: load and " Alexandre Courbot
2025-04-22  8:40 ` [PATCH 00/16] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
2025-04-22 14:12   ` Alexandre Courbot
