* [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
@ 2025-06-12 14:01 Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 01/23] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
` (24 more replies)
0 siblings, 25 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Benno Lossin, Lyude Paul, Shirish Baskaran
Hi everyone,
The feedback on v4 has been (hopefully) addressed. I guess the main
remaining unknown is the direction of the `num` module; for this
iteration, following the feedback received, I have eschewed the extension
trait and implemented the alignment functions as methods of the new
`PowerOfTwo` type. This has the benefit of making it impossible to call
them with undesirable (i.e. non-power-of-two) values. The `fls` function
is now provided as a series of const functions for each supported type,
generated by a macro.
It feels like the `num` module could be its own series though, so if
there is still discussion about it, I can also extract it and implement
the functionality we need in nova-core as local helper functions until
it gets merged at its own pace.
As previously, this series only successfully probes Ampere GPUs, but
support for other generations is on the way.
Upon successful probe, the driver will display the range of the WPR2
region constructed by FWSEC-FRTS with debug priority:
[ 95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
[ 95.436002] NovaCore 0000:01:00.0: GPU instance built
This series is based on v6.16-rc1 with no other dependencies.
Some bits of documentation are still missing; these are addressed by
Joel in his own documentation patch series [1]. I'll also double-check
and send follow-up patches if anything is still missing after that.
[1] https://lore.kernel.org/rust-for-linux/20250503040802.1411285-1-joelagnelf@nvidia.com/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v5:
- Rebased on top of 6.16-rc1.
- Improve invariants of CoherentAllocation related to the new `size`
method.
- Use SZ_* consts when redefining BAR0 size.
- Split VBIOS patch into 3 patches (Joel)
- Convert all `Result<()>` into `Result`.
- Use `::cast<T>()` instead of ` as ` to convert pointer types.
- Use `KBox` instead of `Arc` for falcon HALs.
- Do not use `get_` prefix on methods that do not increase reference
count.
- Replace arbitrary immediate values with proper constants.
- Use EIO to indicate firmware errors.
- Use inspect_err to be more verbose on which step of the FWSEC setup
failed.
- Move sysmem flush page into its own type and add its registration to
the FB HAL.
- Turn HAL getters into standalone functions.
- Patch FWSEC command at construction time.
- Force the signing stage (or an explicit non-signing state transition)
on the firmware DMA objects.
- Link to v4: https://lore.kernel.org/r/20250521-nova-frts-v4-0-05dfd4f39479@nvidia.com
Changes in v4:
- Improve documentation of falcon security modes (thanks Joel!)
- Add the definition of the size of CoherentAllocation as one of its
invariants.
- Better document GFW boot progress, registers and use wait_on() helper,
and move it to `gfw` module instead of `devinit`.
- Add missing TODOs for workarounds waiting to be replaced by in-flight
R4L features.
- Register macro: add the offset of the register as a type constant, and
allow register aliases for registers which can be interpreted
differently depending on context.
- Rework the `num` module using only macros (to allow use of overflowing
ops), and add the `PowerOfTwo` type.
- Add a proper HAL to the `fb` module.
- Move HAL builders to impl blocks of Chipset.
- Add proper types and traits for signatures.
- Proactively split FalconFirmware into distinct traits to ease
management of v2 vs v3 FWSEC headers that will be needed for Turing
support.
- Link to v3:
https://lore.kernel.org/r/20250507-nova-frts-v3-0-fcb02749754d@nvidia.com
Changes in v3:
- Rebased on top of latest nova-next.
- Use the new Devres::access() and remove the now unneeded with_bar!()
macro.
- Dropped `rust: devres: allow to borrow a reference to the resource's
Device` as it is not needed anymore.
- Fixed more erroneous uses of `ERANGE` error.
- Optimized alignment computations of the FB layout a bit.
- Link to v2: https://lore.kernel.org/r/20250501-nova-frts-v2-0-b4a137175337@nvidia.com
Changes in v2:
- Rebased on latest nova-next.
- Fixed all clippy warnings.
- Added `count` and `size` methods to `CoherentAllocation`.
- Added method to obtain a reference to the `Device` from a `Devres`
(this is super convenient).
- Split `DmaObject` into its own patch and added `Deref` implementation.
- Squashed field names from [3] into "extract FWSEC from BIOS".
- Fixed erroneous use of `ERANGE` error.
- Reworked `register!()` macro towards a more intuitive syntax, moved
its helper macros into internal rules to avoid polluting the macro
namespace.
- Renamed all registers to capital snake case to better match OpenRM.
- Removed declarations for registers that are not used yet.
- Added more documentation for items not covered by Joel's documentation
patches.
- Removed timer device and replaced it with a helper function using
`Ktime`. This also made [4] unneeded so it is dropped.
- Unregister the sysmem flush page upon device destruction.
- ... probably more that I forgot. >_<
- Link to v1: https://lore.kernel.org/r/20250420-nova-frts-v1-0-ecd1cca23963@nvidia.com
[3] https://lore.kernel.org/all/20250423225405.139613-6-joelagnelf@nvidia.com/
[4] https://lore.kernel.org/lkml/20250420-nova-frts-v1-1-ecd1cca23963@nvidia.com/
---
Alexandre Courbot (20):
rust: dma: expose the count and size of CoherentAllocation
rust: make ETIMEDOUT error available
rust: sizes: add constants up to SZ_2G
rust: add new `num` module with `PowerOfTwo` type
rust: num: add the `fls` operation
gpu: nova-core: use absolute paths in register!() macro
gpu: nova-core: add delimiter for helper rules in register!() macro
gpu: nova-core: expose the offset of each register as a type constant
gpu: nova-core: allow register aliases
gpu: nova-core: increase BAR0 size to 16MB
gpu: nova-core: add helper function to wait on condition
gpu: nova-core: wait for GFW_BOOT completion
gpu: nova-core: add DMA object struct
gpu: nova-core: register sysmem flush page
gpu: nova-core: add falcon register definitions and base code
gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
gpu: nova-core: compute layout of the FRTS region
gpu: nova-core: add types for patching firmware binaries
gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
gpu: nova-core: load and run FWSEC-FRTS
Joel Fernandes (3):
gpu: nova-core: vbios: Add base support for VBIOS construction and iteration
gpu: nova-core: vbios: Add support to look up PMU table in FWSEC
gpu: nova-core: vbios: Add support for FWSEC ucode extraction
drivers/gpu/nova-core/dma.rs | 58 ++
drivers/gpu/nova-core/driver.rs | 4 +-
drivers/gpu/nova-core/falcon.rs | 557 ++++++++++++++
drivers/gpu/nova-core/falcon/gsp.rs | 24 +
drivers/gpu/nova-core/falcon/hal.rs | 54 ++
drivers/gpu/nova-core/falcon/hal/ga102.rs | 117 +++
drivers/gpu/nova-core/falcon/sec2.rs | 10 +
drivers/gpu/nova-core/fb.rs | 136 ++++
drivers/gpu/nova-core/fb/hal.rs | 39 +
drivers/gpu/nova-core/fb/hal/ga100.rs | 57 ++
drivers/gpu/nova-core/fb/hal/ga102.rs | 36 +
drivers/gpu/nova-core/fb/hal/tu102.rs | 58 ++
drivers/gpu/nova-core/firmware.rs | 108 +++
drivers/gpu/nova-core/firmware/fwsec.rs | 395 ++++++++++
drivers/gpu/nova-core/gfw.rs | 39 +
drivers/gpu/nova-core/gpu.rs | 121 ++-
drivers/gpu/nova-core/nova_core.rs | 5 +
drivers/gpu/nova-core/regs.rs | 265 +++++++
drivers/gpu/nova-core/regs/macros.rs | 63 +-
drivers/gpu/nova-core/util.rs | 28 +
drivers/gpu/nova-core/vbios.rs | 1157 +++++++++++++++++++++++++++++
rust/kernel/dma.rs | 32 +-
rust/kernel/error.rs | 1 +
rust/kernel/lib.rs | 1 +
rust/kernel/num.rs | 204 +++++
rust/kernel/sizes.rs | 24 +
26 files changed, 3573 insertions(+), 20 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250417-nova-frts-96ef299abe2c
Best regards,
--
Alexandre Courbot <acourbot@nvidia.com>
* [PATCH v5 01/23] rust: dma: expose the count and size of CoherentAllocation
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 02/23] rust: make ETIMEDOUT error available Alexandre Courbot
` (23 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
These properties are very useful to have (nova-core will use them) and
should be accessible.
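
As a rough illustration of what the two accessors return, here is a standalone sketch in plain Rust (not the kernel code). `Allocation` is a hypothetical stand-in for `CoherentAllocation`, and the overflow check in its constructor is an assumption made so that `size()` can multiply without re-checking, mirroring the "fits into a `usize`" invariant:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical stand-in for an allocation holding `count` elements of `T`.
/// Invariant (assumed): `size_of::<T>() * count` fits into a `usize`.
struct Allocation<T> {
    count: usize,
    _marker: PhantomData<T>,
}

impl<T> Allocation<T> {
    /// Returns `None` if the byte size would overflow `usize`, which is what
    /// establishes the invariant relied upon by `size()`.
    fn new(count: usize) -> Option<Self> {
        size_of::<T>().checked_mul(count)?;
        Some(Self {
            count,
            _marker: PhantomData,
        })
    }

    /// Number of elements `T`, not bytes.
    fn count(&self) -> usize {
        self.count
    }

    /// Size in bytes; cannot overflow thanks to the check in `new()`.
    fn size(&self) -> usize {
        self.count * size_of::<T>()
    }
}
```

For an allocation of 16 `u64` elements, `count()` is 16 while `size()` is 128; callers that need a byte length (for a copy, say) want the latter.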
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
rust/kernel/dma.rs | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index a33261c62e0c2d3c2c9e92a4c058faab594e5355..1a6fc800256500ae04099fbf4f9a1bd1115ce202 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -114,9 +114,11 @@ pub mod attrs {
///
/// # Invariants
///
-/// For the lifetime of an instance of [`CoherentAllocation`], the `cpu_addr` is a valid pointer
-/// to an allocated region of consistent memory and `dma_handle` is the DMA address base of
-/// the region.
+/// - For the lifetime of an instance of [`CoherentAllocation`], the `cpu_addr` is a valid pointer
+/// to an allocated region of consistent memory and `dma_handle` is the DMA address base of the
+/// region.
+/// - The size in bytes of the allocation is equal to `size_of::<T>() * count`.
+/// - `size_of::<T>() * count` fits into a `usize`.
// TODO
//
// DMA allocations potentially carry device resources (e.g.IOMMU mappings), hence for soundness
@@ -179,9 +181,12 @@ pub fn alloc_attrs(
if ret.is_null() {
return Err(ENOMEM);
}
- // INVARIANT: We just successfully allocated a coherent region which is accessible for
- // `count` elements, hence the cpu address is valid. We also hold a refcounted reference
- // to the device.
+ // INVARIANT:
+ // - We just successfully allocated a coherent region which is accessible for
+ // `count` elements, hence the cpu address is valid. We also hold a refcounted reference
+ // to the device.
+ // - The allocated `size` is equal to `size_of::<T>() * count`.
+ // - The allocated `size` fits into a `usize`.
Ok(Self {
dev: dev.into(),
dma_handle,
@@ -201,6 +206,21 @@ pub fn alloc_coherent(
CoherentAllocation::alloc_attrs(dev, count, gfp_flags, Attrs(0))
}
+ /// Returns the number of elements `T` in this allocation.
+ ///
+ /// Note that this is not the size of the allocation in bytes, which is provided by
+ /// [`Self::size`].
+ pub fn count(&self) -> usize {
+ self.count
+ }
+
+ /// Returns the size in bytes of this allocation.
+ pub fn size(&self) -> usize {
+ // INVARIANT: The type invariant of `Self` guarantees that `size_of::<T>() * count` fits
+ // into a `usize`.
+ self.count * core::mem::size_of::<T>()
+ }
+
/// Returns the base address to the allocated region in the CPU's virtual address space.
pub fn start_ptr(&self) -> *const T {
self.cpu_addr
--
2.49.0
* [PATCH v5 02/23] rust: make ETIMEDOUT error available
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 01/23] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 03/23] rust: sizes: add constants up to SZ_2G Alexandre Courbot
` (22 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Benno Lossin
We will use this error in the nova-core driver.
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
rust/kernel/error.rs | 1 +
1 file changed, 1 insertion(+)
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 3dee3139fcd4379b94748c0ba1965f4e1865b633..083c7b068cf4e185100de96e520c54437898ee72 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -65,6 +65,7 @@ macro_rules! declare_err {
declare_err!(EDOM, "Math argument out of domain of func.");
declare_err!(ERANGE, "Math result not representable.");
declare_err!(EOVERFLOW, "Value too large for defined data type.");
+ declare_err!(ETIMEDOUT, "Connection timed out.");
declare_err!(ERESTARTSYS, "Restart the system call.");
declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
declare_err!(ERESTARTNOHAND, "Restart if no handler.");
--
2.49.0
* [PATCH v5 03/23] rust: sizes: add constants up to SZ_2G
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 01/23] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 02/23] rust: make ETIMEDOUT error available Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
` (21 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
nova-core will need to use SZ_1M, so make the remaining constants
available.
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
rust/kernel/sizes.rs | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/rust/kernel/sizes.rs b/rust/kernel/sizes.rs
index 834c343e4170f507821b870e77afd08e2392911f..661e680d9330616478513a19fe2f87f9521516d7 100644
--- a/rust/kernel/sizes.rs
+++ b/rust/kernel/sizes.rs
@@ -24,3 +24,27 @@
pub const SZ_256K: usize = bindings::SZ_256K as usize;
/// 0x00080000
pub const SZ_512K: usize = bindings::SZ_512K as usize;
+/// 0x00100000
+pub const SZ_1M: usize = bindings::SZ_1M as usize;
+/// 0x00200000
+pub const SZ_2M: usize = bindings::SZ_2M as usize;
+/// 0x00400000
+pub const SZ_4M: usize = bindings::SZ_4M as usize;
+/// 0x00800000
+pub const SZ_8M: usize = bindings::SZ_8M as usize;
+/// 0x01000000
+pub const SZ_16M: usize = bindings::SZ_16M as usize;
+/// 0x02000000
+pub const SZ_32M: usize = bindings::SZ_32M as usize;
+/// 0x04000000
+pub const SZ_64M: usize = bindings::SZ_64M as usize;
+/// 0x08000000
+pub const SZ_128M: usize = bindings::SZ_128M as usize;
+/// 0x10000000
+pub const SZ_256M: usize = bindings::SZ_256M as usize;
+/// 0x20000000
+pub const SZ_512M: usize = bindings::SZ_512M as usize;
+/// 0x40000000
+pub const SZ_1G: usize = bindings::SZ_1G as usize;
+/// 0x80000000
+pub const SZ_2G: usize = bindings::SZ_2G as usize;
--
2.49.0
* [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (2 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 03/23] rust: sizes: add constants up to SZ_2G Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 15:07 ` Boqun Feng
` (3 more replies)
2025-06-12 14:01 ` [PATCH v5 05/23] rust: num: add the `fls` operation Alexandre Courbot
` (20 subsequent siblings)
24 siblings, 4 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
Introduce the `num` module, featuring the `PowerOfTwo` unsigned wrapper
that guarantees (at build-time or runtime) that a value is a power of
two.
Such a property is often useful to maintain. In the context of the
kernel, powers of two are often used to align addresses or sizes up and
down, or to create masks. These operations are provided by this type.
It is introduced to be first used by the nova-core driver.
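
The arithmetic behind the alignment helpers can be sketched outside the kernel as plain functions. This is only an illustration under the assumption that `align` is a power of two; the free functions below are hypothetical names, whereas the patch exposes the same operations as methods of `PowerOfTwo` so that this assumption is enforced by the type:

```rust
/// Returns `true` if `v` has exactly one bit set, i.e. is a power of two.
const fn is_power_of_two(v: u32) -> bool {
    v.count_ones() == 1
}

/// Low-bit mask for a power-of-two alignment, e.g. 0x1000 -> 0x0fff.
const fn mask(align: u32) -> u32 {
    align.wrapping_sub(1)
}

/// Rounds `value` down to a multiple of `align` by clearing the low bits.
const fn align_down(align: u32, value: u32) -> u32 {
    value & !mask(align)
}

/// Rounds `value` up to a multiple of `align`; wraps to 0 on overflow.
const fn align_up(align: u32, value: u32) -> u32 {
    align_down(align, value.wrapping_add(mask(align)))
}
```

With a non-power-of-two alignment such as 0x3000, the mask is no longer a contiguous run of low bits, so the result is generally not a multiple of the alignment. This is the mistake the type is meant to rule out.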
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
rust/kernel/lib.rs | 1 +
rust/kernel/num.rs | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 174 insertions(+)
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 6b4774b2b1c37f4da1866e993be6230bc6715841..2955f65da1278dd4cba1e4272ff178b8211a892c 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -89,6 +89,7 @@
pub mod mm;
#[cfg(CONFIG_NET)]
pub mod net;
+pub mod num;
pub mod of;
#[cfg(CONFIG_PM_OPP)]
pub mod opp;
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+use crate::build_assert;
+use core::borrow::Borrow;
+use core::fmt::Debug;
+use core::hash::Hash;
+use core::ops::Deref;
+
+/// An unsigned integer which is guaranteed to be a power of 2.
+#[derive(Debug, Clone, Copy)]
+#[repr(transparent)]
+pub struct PowerOfTwo<T>(T);
+
+macro_rules! power_of_two_impl {
+ ($($t:ty),+) => {
+ $(
+ impl PowerOfTwo<$t> {
+ /// Validates that `v` is a power of two at build-time, and returns it wrapped into
+ /// `PowerOfTwo`.
+ ///
+ /// A build error is triggered if `v` cannot be asserted to be a power of two.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::num::PowerOfTwo;
+ ///
+ /// let v = PowerOfTwo::<u32>::new(256);
+ /// assert_eq!(v.value(), 256);
+ /// ```
+ #[inline(always)]
+ pub const fn new(v: $t) -> Self {
+ build_assert!(v.count_ones() == 1);
+ Self(v)
+ }
+
+ /// Validates that `v` is a power of two at runtime, and returns it wrapped into
+ /// `PowerOfTwo`.
+ ///
+ /// `None` is returned if `v` was not a power of two.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::num::PowerOfTwo;
+ ///
+ /// assert_eq!(PowerOfTwo::<u32>::try_new(16).unwrap().value(), 16);
+ /// assert_eq!(PowerOfTwo::<u32>::try_new(15), None);
+ /// ```
+ #[inline(always)]
+ pub const fn try_new(v: $t) -> Option<Self> {
+ match v.count_ones() {
+ 1 => Some(Self(v)),
+ _ => None,
+ }
+ }
+
+ /// Returns the value of this instance.
+ ///
+ /// It is guaranteed to be a power of two.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::num::PowerOfTwo;
+ ///
+ /// let v = PowerOfTwo::<u32>::new(256);
+ /// assert_eq!(v.value(), 256);
+ /// ```
+ #[inline(always)]
+ pub const fn value(&self) -> $t {
+ self.0
+ }
+
+ /// Returns the mask corresponding to `self.value() - 1`.
+ #[inline(always)]
+ pub const fn mask(&self) -> $t {
+ self.0.wrapping_sub(1)
+ }
+
+ /// Aligns `value` down to `self`.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::num::PowerOfTwo;
+ ///
+ /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_down(0x4fff), 0x4000);
+ /// ```
+ #[inline(always)]
+ pub const fn align_down(self, value: $t) -> $t {
+ value & !self.mask()
+ }
+
+ /// Aligns `value` up to `self`.
+ ///
+ /// Wraps around to `0` if the requested alignment pushes the result above the
+ /// type's limits.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::num::PowerOfTwo;
+ ///
+ /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x4fff), 0x5000);
+ /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x4000), 0x4000);
+ /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x0), 0x0);
+ /// assert_eq!(PowerOfTwo::<u16>::new(0x100).align_up(0xffff), 0x0);
+ /// ```
+ #[inline(always)]
+ pub const fn align_up(self, value: $t) -> $t {
+ self.align_down(value.wrapping_add(self.mask()))
+ }
+ }
+ )+
+ };
+}
+
+power_of_two_impl!(usize, u8, u16, u32, u64, u128);
+
+impl<T> Deref for PowerOfTwo<T> {
+ type Target = T;
+
+ fn deref(&self) -> &Self::Target {
+ &self.0
+ }
+}
+
+impl<T> PartialEq for PowerOfTwo<T>
+where
+ T: PartialEq,
+{
+ fn eq(&self, other: &Self) -> bool {
+ self.0 == other.0
+ }
+}
+
+impl<T> Eq for PowerOfTwo<T> where T: Eq {}
+
+impl<T> PartialOrd for PowerOfTwo<T>
+where
+ T: PartialOrd,
+{
+ fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
+ self.0.partial_cmp(&other.0)
+ }
+}
+
+impl<T> Ord for PowerOfTwo<T>
+where
+ T: Ord,
+{
+ fn cmp(&self, other: &Self) -> core::cmp::Ordering {
+ self.0.cmp(&other.0)
+ }
+}
+
+impl<T> Hash for PowerOfTwo<T>
+where
+ T: Hash,
+{
+ fn hash<H: core::hash::Hasher>(&self, state: &mut H) {
+ self.0.hash(state);
+ }
+}
+
+impl<T> Borrow<T> for PowerOfTwo<T> {
+ fn borrow(&self) -> &T {
+ &self.0
+ }
+}
--
2.49.0
* [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (3 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-14 19:16 ` Benno Lossin
2025-06-15 9:37 ` Miguel Ojeda
2025-06-12 14:01 ` [PATCH v5 06/23] gpu: nova-core: use absolute paths in register!() macro Alexandre Courbot
` (19 subsequent siblings)
24 siblings, 2 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
Add an equivalent to the `fls` (Find Last Set bit) C function to Rust
unsigned types.
It is to be first used by the nova-core driver.
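
For readers unfamiliar with `fls`, the operation reduces to a subtraction from the type's bit width. The following is a standalone sketch for `u32` only, written without the macro; the patch generates one such function per unsigned type:

```rust
/// 1-based index of the most significant set bit of `v`, or 0 if `v` is 0.
const fn fls_u32(v: u32) -> u32 {
    // `leading_zeros()` is 32 for an input of 0, which yields 0 here.
    u32::BITS - v.leading_zeros()
}
```

Put differently, for a non-zero `v` the result is the number of bits needed to represent `v`.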
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
rust/kernel/num.rs | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
index ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a..934afe17719f789c569dbd54534adc2e26fe59f2 100644
--- a/rust/kernel/num.rs
+++ b/rust/kernel/num.rs
@@ -171,3 +171,34 @@ fn borrow(&self) -> &T {
&self.0
}
}
+
+macro_rules! impl_fls {
+ ($($t:ty),+) => {
+ $(
+ ::kernel::macros::paste! {
+ /// Find Last Set Bit: return the 1-based index of the last (i.e. most significant) set
+ /// bit in `v`.
+ ///
+ /// Equivalent to the C `fls` function.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use kernel::num::fls_u32;
+ ///
+ /// assert_eq!(fls_u32(0x0), 0);
+ /// assert_eq!(fls_u32(0x1), 1);
+ /// assert_eq!(fls_u32(0x10), 5);
+ /// assert_eq!(fls_u32(0xffff), 16);
+ /// assert_eq!(fls_u32(0x8000_0000), 32);
+ /// ```
+ #[inline(always)]
+ pub const fn [<fls_ $t>](v: $t) -> u32 {
+ $t::BITS - v.leading_zeros()
+ }
+ }
+ )+
+ };
+}
+
+impl_fls!(usize, u8, u16, u32, u64, u128);
--
2.49.0
* [PATCH v5 06/23] gpu: nova-core: use absolute paths in register!() macro
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (4 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 05/23] rust: num: add the `fls` operation Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 07/23] gpu: nova-core: add delimiter for helper rules " Alexandre Courbot
` (18 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
Fix the paths that were not absolute to prevent a potential local module
from being picked up.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/regs/macros.rs | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
index 7ecc70efb3cd723b673cd72915e72b8a4a009f06..40bf9346cd0699ede05cfddff5d39822c696c164 100644
--- a/drivers/gpu/nova-core/regs/macros.rs
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -114,7 +114,7 @@ fn fmt(&self, f: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
}
}
- impl core::ops::BitOr for $name {
+ impl ::core::ops::BitOr for $name {
type Output = Self;
fn bitor(self, rhs: Self) -> Self::Output {
@@ -161,7 +161,7 @@ impl $name {
(@check_field_bounds $hi:tt:$lo:tt $field:ident as bool) => {
#[allow(clippy::eq_op)]
const _: () = {
- kernel::build_assert!(
+ ::kernel::build_assert!(
$hi == $lo,
concat!("boolean field `", stringify!($field), "` covers more than one bit")
);
@@ -172,7 +172,7 @@ impl $name {
(@check_field_bounds $hi:tt:$lo:tt $field:ident as $type:tt) => {
#[allow(clippy::eq_op)]
const _: () = {
- kernel::build_assert!(
+ ::kernel::build_assert!(
$hi >= $lo,
concat!("field `", stringify!($field), "`'s MSB is smaller than its LSB")
);
@@ -234,7 +234,7 @@ impl $name {
@leaf_accessor $name:ident $hi:tt:$lo:tt $field:ident as $type:ty
{ $process:expr } $to_type:ty => $res_type:ty $(, $comment:literal)?;
) => {
- kernel::macros::paste!(
+ ::kernel::macros::paste!(
const [<$field:upper>]: ::core::ops::RangeInclusive<u8> = $lo..=$hi;
const [<$field:upper _MASK>]: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
const [<$field:upper _SHIFT>]: u32 = Self::[<$field:upper _MASK>].trailing_zeros();
@@ -246,7 +246,7 @@ impl $name {
)?
#[inline]
pub(crate) fn $field(self) -> $res_type {
- kernel::macros::paste!(
+ ::kernel::macros::paste!(
const MASK: u32 = $name::[<$field:upper _MASK>];
const SHIFT: u32 = $name::[<$field:upper _SHIFT>];
);
@@ -255,7 +255,7 @@ pub(crate) fn $field(self) -> $res_type {
$process(field)
}
- kernel::macros::paste!(
+ ::kernel::macros::paste!(
$(
#[doc="Sets the value of this field:"]
#[doc=$comment]
--
2.49.0
* [PATCH v5 07/23] gpu: nova-core: add delimiter for helper rules in register!() macro
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (5 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 06/23] gpu: nova-core: use absolute paths in register!() macro Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 08/23] gpu: nova-core: expose the offset of each register as a type constant Alexandre Courbot
` (17 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
This macro is pretty complex, and most rules are just helpers, so add a
delimiter to indicate where users who are only interested in using it can
stop reading.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/regs/macros.rs | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
index 40bf9346cd0699ede05cfddff5d39822c696c164..d7f09026390b4ccb1c969f2b29caf07fa9204a77 100644
--- a/drivers/gpu/nova-core/regs/macros.rs
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -94,6 +94,8 @@ macro_rules! register {
register!(@io$name @ + $offset);
};
+ // All rules below are helpers.
+
// Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
// and conversion to regular `u32`).
(@common $name:ident $(, $comment:literal)?) => {
--
2.49.0
* [PATCH v5 08/23] gpu: nova-core: expose the offset of each register as a type constant
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (6 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 07/23] gpu: nova-core: add delimiter for helper rules " Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 09/23] gpu: nova-core: allow register aliases Alexandre Courbot
` (16 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
Although we want to access registers using the provided methods, it is
sometimes necessary to use their raw offset, for instance when working
with a register array.
Expose the offset of each register using a type constant to avoid
resorting to hardcoded values.
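
A minimal sketch of the idea, in plain Rust rather than through the `register!()` macro: `ExampleReg`, its offset `0x100` and the `entry_offset` helper are all made up for illustration, and the register array is assumed to consist of consecutive 32-bit entries.

```rust
use std::mem::size_of;

/// Hypothetical register type, shaped like what the macro generates.
#[allow(dead_code)]
#[derive(Clone, Copy, Default)]
struct ExampleReg(u32);

#[allow(dead_code)]
impl ExampleReg {
    /// Offset of the register from the start of the I/O region
    /// (made-up value).
    const OFFSET: usize = 0x100;
}

/// Byte offset of entry `index` in an array of 32-bit registers that starts
/// at `ExampleReg::OFFSET` (assumed layout).
fn entry_offset(index: usize) -> usize {
    ExampleReg::OFFSET + index * size_of::<u32>()
}
```

The point is that the base offset is written once, next to the register definition, and derived addresses refer to it by name instead of repeating the literal.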
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/regs/macros.rs | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
index d7f09026390b4ccb1c969f2b29caf07fa9204a77..7cd013f3c90bbd8ca437d4072cae8f11d7946fcd 100644
--- a/drivers/gpu/nova-core/regs/macros.rs
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -78,7 +78,7 @@ macro_rules! register {
$($fields:tt)*
}
) => {
- register!(@common $name $(, $comment)?);
+ register!(@common $name @ $offset $(, $comment)?);
register!(@field_accessors $name { $($fields)* });
register!(@io $name @ $offset);
};
@@ -89,7 +89,7 @@ macro_rules! register {
$($fields:tt)*
}
) => {
- register!(@common $name $(, $comment)?);
+ register!(@common $name @ $offset $(, $comment)?);
register!(@field_accessors $name { $($fields)* });
register!(@io$name @ + $offset);
};
@@ -98,7 +98,7 @@ macro_rules! register {
// Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
// and conversion to regular `u32`).
- (@common $name:ident $(, $comment:literal)?) => {
+ (@common $name:ident @ $offset:literal $(, $comment:literal)?) => {
$(
#[doc=$comment]
)?
@@ -106,6 +106,11 @@ macro_rules! register {
#[derive(Clone, Copy, Default)]
pub(crate) struct $name(u32);
+ #[allow(dead_code)]
+ impl $name {
+ pub(crate) const OFFSET: usize = $offset;
+ }
+
// TODO: display the raw hex value, then the value of all the fields. This requires
// matching the fields, which will complexify the syntax considerably...
impl ::core::fmt::Debug for $name {
--
2.49.0
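As an illustration of the idea in this patch, here is a minimal sketch that builds outside the kernel tree. It is not the nova-core macro: the `register!` rule below is a heavily simplified stand-in, and the `DEMO_SCRATCH_*` names, their offsets and the `array_entry_offset` helper are hypothetical.

```rust
// Simplified stand-in for the `@common` rule: defines a `u32` wrapper type and
// exposes the register offset as an associated constant.
macro_rules! register {
    ($name:ident @ $offset:literal) => {
        #[allow(non_camel_case_types, dead_code)]
        #[derive(Clone, Copy, Default, Debug, PartialEq)]
        pub struct $name(u32);

        #[allow(dead_code)]
        impl $name {
            pub const OFFSET: usize = $offset;
        }
    };
}

// Hypothetical register names and offsets, for illustration only.
register!(DEMO_SCRATCH_0 @ 0x0000_1400);
register!(DEMO_SCRATCH_1 @ 0x0000_1404);

/// Offset of the `index`-th 32-bit entry of a register array starting at `base`.
pub fn array_entry_offset(base: usize, index: usize) -> usize {
    base + index * std::mem::size_of::<u32>()
}
```

With the constant in place, array arithmetic can start from `DEMO_SCRATCH_0::OFFSET` instead of repeating the hardcoded value.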
* [PATCH v5 09/23] gpu: nova-core: allow register aliases
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (7 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 08/23] gpu: nova-core: expose the offset of each register as a type constant Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 10/23] gpu: nova-core: increase BAR0 size to 16MB Alexandre Courbot
` (15 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
Some registers (notably scratch registers) don't have a definitive
purpose, but need to be interpreted differently depending on context.
Expand the register!() macro to support a syntax indicating that a
register type should be at the same offset as another one, but under a
different name, and with different fields and documentation.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/regs/macros.rs | 40 ++++++++++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/nova-core/regs/macros.rs b/drivers/gpu/nova-core/regs/macros.rs
index 7cd013f3c90bbd8ca437d4072cae8f11d7946fcd..e0e6fef3796f9dd2ce4e0223444a05bcc53075a6 100644
--- a/drivers/gpu/nova-core/regs/macros.rs
+++ b/drivers/gpu/nova-core/regs/macros.rs
@@ -71,6 +71,20 @@
/// pr_info!("CPU CTL: {:#x}", cpuctl);
/// cpuctl.set_start(true).write(&bar, CPU_BASE);
/// ```
+///
+/// It is also possible to create an alias register by using the `=> ALIAS` syntax. This is useful
+/// for cases where a register's interpretation depends on the context:
+///
+/// ```no_run
+/// register!(SCRATCH_0 @ 0x0000100, "Scratch register 0" {
+/// 31:0 value as u32, "Raw value";
+///
+/// register!(SCRATCH_0_BOOT_STATUS => SCRATCH_0, "Boot status of the firmware" {
+/// 0:0 completed as bool, "Whether the firmware has completed booting";
+/// ```
+///
+/// In this example, `SCRATCH_0_BOOT_STATUS` uses the same I/O address as `SCRATCH_0`, while also
+/// providing its own `completed` method.
macro_rules! register {
// Creates a register at a fixed offset of the MMIO space.
(
@@ -83,6 +97,17 @@ macro_rules! register {
register!(@io $name @ $offset);
};
+ // Creates an alias register of fixed offset register `alias` with its own fields.
+ (
+ $name:ident => $alias:ident $(, $comment:literal)? {
+ $($fields:tt)*
+ }
+ ) => {
+ register!(@common $name @ $alias::OFFSET $(, $comment)?);
+ register!(@field_accessors $name { $($fields)* });
+ register!(@io $name @ $alias::OFFSET);
+ };
+
// Creates a register at a relative offset from a base address.
(
$name:ident @ + $offset:literal $(, $comment:literal)? {
@@ -94,11 +119,22 @@ macro_rules! register {
register!(@io$name @ + $offset);
};
+ // Creates an alias register of relative offset register `alias` with its own fields.
+ (
+ $name:ident => + $alias:ident $(, $comment:literal)? {
+ $($fields:tt)*
+ }
+ ) => {
+ register!(@common $name @ $alias::OFFSET $(, $comment)?);
+ register!(@field_accessors $name { $($fields)* });
+ register!(@io $name @ + $alias::OFFSET);
+ };
+
// All rules below are helpers.
// Defines the wrapper `$name` type, as well as its relevant implementations (`Debug`, `BitOr`,
// and conversion to regular `u32`).
- (@common $name:ident @ $offset:literal $(, $comment:literal)?) => {
+ (@common $name:ident @ $offset:expr $(, $comment:literal)?) => {
$(
#[doc=$comment]
)?
@@ -280,7 +316,7 @@ pub(crate) fn [<set_ $field>](mut self, value: $to_type) -> Self {
};
// Creates the IO accessors for a fixed offset register.
- (@io $name:ident @ $offset:literal) => {
+ (@io $name:ident @ $offset:expr) => {
#[allow(dead_code)]
impl $name {
#[inline]
--
2.49.0
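The alias mechanism can be sketched in a standalone form as follows. This is a reduced model under stated assumptions, not the actual nova-core macro: field accessors and I/O methods are left out, the `DEMO_*` registers are hypothetical, and `completed` is written by hand instead of being generated. It shows why the helper rule has to take an `expr` rather than a `literal` once `$alias::OFFSET` is passed to it.

```rust
macro_rules! register {
    // Register at a fixed offset.
    ($name:ident @ $offset:literal) => {
        register!(@common $name @ $offset);
    };
    // Alias: same offset as `$alias`, but a distinct type.
    ($name:ident => $alias:ident) => {
        register!(@common $name @ $alias::OFFSET);
    };
    // Helper rule. It takes an expression so that `$alias::OFFSET` is accepted.
    (@common $name:ident @ $offset:expr) => {
        #[allow(non_camel_case_types, dead_code)]
        #[derive(Clone, Copy, Default)]
        pub struct $name(pub u32);

        #[allow(dead_code)]
        impl $name {
            pub const OFFSET: usize = $offset;
        }
    };
}

// Hypothetical registers, for illustration only.
register!(DEMO_SCRATCH_0 @ 0x0000_0100);
register!(DEMO_SCRATCH_0_BOOT_STATUS => DEMO_SCRATCH_0);

impl DEMO_SCRATCH_0_BOOT_STATUS {
    /// Bit 0: whether the (hypothetical) firmware has completed booting.
    pub fn completed(self) -> bool {
        self.0 & 1 != 0
    }
}
```

Both types share one offset, while only the alias carries the context-specific `completed` accessor.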
* [PATCH v5 10/23] gpu: nova-core: increase BAR0 size to 16MB
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (8 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 09/23] gpu: nova-core: allow register aliases Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 11/23] gpu: nova-core: add helper function to wait on condition Alexandre Courbot
` (14 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
The Turing+ register address space spans that range, so increase the
BAR0 size as future patches will access more registers.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/driver.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 8c86101c26cb5fe5eb9a3d03268338c6b58baef7..ffe25c7a2fdad289549460f7fd87d6e09299a35c 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
-use kernel::{auxiliary, bindings, c_str, device::Core, pci, prelude::*};
+use kernel::{auxiliary, bindings, c_str, device::Core, pci, prelude::*, sizes::SZ_16M};
use crate::gpu::Gpu;
@@ -11,7 +11,7 @@ pub(crate) struct NovaCore {
_reg: auxiliary::Registration,
}
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = SZ_16M;
pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
kernel::pci_device_table!(
--
2.49.0
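To make the size change concrete, the sketch below checks whether a 32-bit register fits inside a BAR of a given size. It assumes that the declared BAR size bounds the offsets that may be accessed. `SZ_16M` mirrors the value of the kernel constant, `reg32_fits` is a hypothetical helper, and the offset used in the usage note is the one that patch 12 of this series gives for `NV_PGC6_AON_SECURE_SCRATCH_GROUP_05`.

```rust
/// 16 MiB, mirroring the value of the kernel's `SZ_16M` constant.
pub const SZ_16M: usize = 16 * 1024 * 1024;

/// Returns `true` if a 32-bit register at `offset` lies entirely within a BAR
/// of `bar_size` bytes.
pub fn reg32_fits(offset: usize, bar_size: usize) -> bool {
    offset
        .checked_add(std::mem::size_of::<u32>())
        .map_or(false, |end| end <= bar_size)
}
```

For instance, `reg32_fits(0x0011_8234, SZ_16M)` holds, whereas the same offset does not fit in the previous 8-byte window.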
* [PATCH v5 11/23] gpu: nova-core: add helper function to wait on condition
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (9 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 10/23] gpu: nova-core: increase BAR0 size to 16MB Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 12/23] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
` (13 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
While programming the hardware, we frequently need to busy-wait until
a condition holds (for instance, a given register bit changing value).
Add a basic `wait_on` helper function to wait on such conditions
expressed as a closure, with a timeout argument.
This is temporary as we will switch to `read_poll_timeout` [1] once it
is available.
[1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/util.rs | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/gpu/nova-core/util.rs b/drivers/gpu/nova-core/util.rs
index 332a64cfc6a9d7d787fbdc228887c0be53a97160..c50bfa5ab7fe385fae26c8909ae5984b96af618a 100644
--- a/drivers/gpu/nova-core/util.rs
+++ b/drivers/gpu/nova-core/util.rs
@@ -1,5 +1,10 @@
// SPDX-License-Identifier: GPL-2.0
+use core::time::Duration;
+
+use kernel::prelude::*;
+use kernel::time::Instant;
+
pub(crate) const fn to_lowercase_bytes<const N: usize>(s: &str) -> [u8; N] {
let src = s.as_bytes();
let mut dst = [0; N];
@@ -19,3 +24,27 @@ pub(crate) const fn const_bytes_to_str(bytes: &[u8]) -> &str {
Err(_) => kernel::build_error!("Bytes are not valid UTF-8."),
}
}
+
+/// Wait until `cond` returns `Some` or `timeout` has elapsed.
+///
+/// When `cond` evaluates to `Some`, its return value is returned.
+///
+/// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+/// `Some`.
+///
+/// TODO: replace with `read_poll_timeout` once it is available.
+/// (https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/)
+#[expect(dead_code)]
+pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(timeout: Duration, cond: F) -> Result<R> {
+ let start_time = Instant::now();
+
+ loop {
+ if let Some(ret) = cond() {
+ return Ok(ret);
+ }
+
+ if start_time.elapsed().as_nanos() > timeout.as_nanos() as i64 {
+ return Err(ETIMEDOUT);
+ }
+ }
+}
--
2.49.0
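For readers who want to experiment with the polling pattern outside the kernel, here is a userspace sketch of the same helper. It is an approximation under assumptions: `std::time::Instant` replaces `kernel::time::Instant`, the hypothetical `TimedOut` type stands in for `ETIMEDOUT`, and a `yield_now()` call is added between polls, which the patch does not do.

```rust
use std::time::{Duration, Instant};

/// Error returned when the timeout elapses; stands in for the kernel's `ETIMEDOUT`.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Polls `cond` until it returns `Some`, or until `timeout` has elapsed.
pub fn wait_on<R, F: Fn() -> Option<R>>(timeout: Duration, cond: F) -> Result<R, TimedOut> {
    let start = Instant::now();

    loop {
        // `cond` is always evaluated at least once, even with a zero timeout.
        if let Some(ret) = cond() {
            return Ok(ret);
        }

        if start.elapsed() > timeout {
            return Err(TimedOut);
        }

        // Be polite to other threads while spinning (userspace-only addition).
        std::thread::yield_now();
    }
}
```

Because the condition is a closure returning `Option`, the caller can hand back the value it was polling for instead of reading it a second time.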
* [PATCH v5 12/23] gpu: nova-core: wait for GFW_BOOT completion
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (10 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 11/23] gpu: nova-core: add helper function to wait on condition Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 13/23] gpu: nova-core: add DMA object struct Alexandre Courbot
` (12 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
Upon reset, the GPU executes the GFW (GPU Firmware) in order to
initialize its base parameters such as clocks. The driver must ensure
that this step is completed before using the hardware.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gfw.rs | 39 ++++++++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/gpu.rs | 5 +++++
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/regs.rs | 25 ++++++++++++++++++++++++
drivers/gpu/nova-core/util.rs | 1 -
5 files changed, 70 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/nova-core/gfw.rs b/drivers/gpu/nova-core/gfw.rs
new file mode 100644
index 0000000000000000000000000000000000000000..911338660f9774d74c71c090517b220b64989bf6
--- /dev/null
+++ b/drivers/gpu/nova-core/gfw.rs
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! GPU Firmware (GFW) support.
+//!
+//! Upon reset, the GPU runs some firmware code from the BIOS to set up its core parameters. Most of
+//! the GPU is considered unusable until this step is completed, so we must wait on it before
+//! performing driver initialization.
+
+use core::time::Duration;
+
+use kernel::bindings;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+use crate::util;
+
+/// Wait until GFW (GPU Firmware) completes, or a 4-second timeout elapses.
+pub(crate) fn wait_gfw_boot_completion(bar: &Bar0) -> Result {
+ util::wait_on(Duration::from_secs(4), || {
+ // Check that FWSEC has lowered its protection level before reading the GFW_BOOT
+ // status.
+ let gfw_booted = regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK::read(bar)
+ .read_protection_level0()
+ && regs::NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_0_GFW_BOOT::read(bar).completed();
+
+ if gfw_booted {
+ Some(())
+ } else {
+ // Avoid busy-looping.
+ // SAFETY: msleep should be safe to call with any parameter.
+ // TODO: replace with [1] once it merges.
+ // [1] https://lore.kernel.org/rust-for-linux/20250423192857.199712-6-fujita.tomonori@gmail.com/
+ unsafe { bindings::msleep(1) };
+
+ None
+ }
+ })
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 60b86f3702842dc2c8b06f092250a5bad3b97bf4..e44ff6fa07147c6dd1515c2c6c0df927a2257c85 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -4,6 +4,7 @@
use crate::driver::Bar0;
use crate::firmware::{Firmware, FIRMWARE_VERSION};
+use crate::gfw;
use crate::regs;
use crate::util;
use core::fmt;
@@ -182,6 +183,10 @@ pub(crate) fn new(
spec.revision
);
+ // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
+ gfw::wait_gfw_boot_completion(bar)
+ .inspect_err(|_| dev_err!(pdev.as_ref(), "GFW boot did not complete"))?;
+
Ok(pin_init!(Self {
spec,
bar: devres_bar,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 618632f0abcc8f5ef6945a04fc084acc4ecbf20b..c3fde3e132ea658888851137ab47fcb7b3637577 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -4,6 +4,7 @@
mod driver;
mod firmware;
+mod gfw;
mod gpu;
mod regs;
mod util;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 5a12732303066f78b8ec5745096cef632ff3bfba..cba442da51181971f209b338249307c11ac481e3 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -37,3 +37,28 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
.and_then(Chipset::try_from)
}
}
+
+/* PGC6 */
+
+register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {
+ 0:0 read_protection_level0 as bool, "Set after FWSEC lowers its protection level";
+});
+
+// TODO: This is an array of registers.
+register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05 @ 0x00118234 {
+ 31:0 value as u32;
+});
+
+register!(
+ NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_0_GFW_BOOT => NV_PGC6_AON_SECURE_SCRATCH_GROUP_05,
+ "Scratch group 05 register 0 used as GFW boot progress indicator" {
+ 7:0 progress as u8, "Progress of GFW boot (0xff means completed)";
+ }
+);
+
+impl NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_0_GFW_BOOT {
+ /// Returns `true` if GFW boot is completed.
+ pub(crate) fn completed(self) -> bool {
+ self.progress() == 0xff
+ }
+}
diff --git a/drivers/gpu/nova-core/util.rs b/drivers/gpu/nova-core/util.rs
index c50bfa5ab7fe385fae26c8909ae5984b96af618a..69f29238b25ed949b00def1b748df3ff7567d83c 100644
--- a/drivers/gpu/nova-core/util.rs
+++ b/drivers/gpu/nova-core/util.rs
@@ -34,7 +34,6 @@ pub(crate) const fn const_bytes_to_str(bytes: &[u8]) -> &str {
///
/// TODO: replace with `read_poll_timeout` once it is available.
/// (https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/)
-#[expect(dead_code)]
pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(timeout: Duration, cond: F) -> Result<R> {
let start_time = Instant::now();
--
2.49.0
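The completion test performed in the closure can be modelled on raw register values, without any MMIO access. The sketch below assumes the bit layout given in the `register!` definitions of this patch (bit 0 of the privilege level mask, bits 7:0 for the boot progress). The `PrivLevelMask` and `GfwBoot` types and the `gfw_booted` function are hypothetical stand-ins for the generated register types.

```rust
/// Raw value of the privilege level mask register.
#[derive(Clone, Copy)]
pub struct PrivLevelMask(pub u32);

impl PrivLevelMask {
    /// Bit 0: set after FWSEC lowers its protection level.
    pub fn read_protection_level0(self) -> bool {
        self.0 & 0x1 != 0
    }
}

/// Raw value of the scratch register used as GFW boot progress indicator.
#[derive(Clone, Copy)]
pub struct GfwBoot(pub u32);

impl GfwBoot {
    /// Bits 7:0: progress of GFW boot.
    pub fn progress(self) -> u8 {
        (self.0 & 0xff) as u8
    }

    /// `0xff` means that GFW boot is completed.
    pub fn completed(self) -> bool {
        self.progress() == 0xff
    }
}

/// The boot status is only trusted once the protection level has been lowered.
pub fn gfw_booted(mask: PrivLevelMask, boot: GfwBoot) -> bool {
    mask.read_protection_level0() && boot.completed()
}
```

The short-circuiting `&&` matches the order used in the patch: the progress value is only consulted after the protection level check passes.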
* [PATCH v5 13/23] gpu: nova-core: add DMA object struct
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (11 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 12/23] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 14/23] gpu: nova-core: register sysmem flush page Alexandre Courbot
` (11 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
Since we will need to allocate lots of distinct memory chunks to be
shared between GPU and CPU, introduce a type dedicated to that. It is a
light wrapper around CoherentAllocation.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/dma.rs | 61 ++++++++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/nova_core.rs | 1 +
2 files changed, 62 insertions(+)
diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
new file mode 100644
index 0000000000000000000000000000000000000000..4b063aaef65ec4e2f476fc5ce9dc25341b6660ca
--- /dev/null
+++ b/drivers/gpu/nova-core/dma.rs
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Simple DMA object wrapper.
+
+// To be removed when all code is used.
+#![expect(dead_code)]
+
+use core::ops::{Deref, DerefMut};
+
+use kernel::device;
+use kernel::dma::CoherentAllocation;
+use kernel::page::PAGE_SIZE;
+use kernel::prelude::*;
+
+pub(crate) struct DmaObject {
+ dma: CoherentAllocation<u8>,
+}
+
+impl DmaObject {
+ pub(crate) fn new(dev: &device::Device<device::Bound>, len: usize) -> Result<Self> {
+ let len = core::alloc::Layout::from_size_align(len, PAGE_SIZE)
+ .map_err(|_| EINVAL)?
+ .pad_to_align()
+ .size();
+ let dma = CoherentAllocation::alloc_coherent(dev, len, GFP_KERNEL | __GFP_ZERO)?;
+
+ Ok(Self { dma })
+ }
+
+ pub(crate) fn from_data(dev: &device::Device<device::Bound>, data: &[u8]) -> Result<Self> {
+ Self::new(dev, data.len()).map(|mut dma_obj| {
+ // TODO: replace with `CoherentAllocation::write()` once available.
+ // SAFETY:
+ // - `dma_obj`'s size is at least `data.len()`.
+ // - We have just created this object and there is no other user at this stage.
+ unsafe {
+ core::ptr::copy_nonoverlapping(
+ data.as_ptr(),
+ dma_obj.dma.start_ptr_mut(),
+ data.len(),
+ );
+ }
+
+ dma_obj
+ })
+ }
+}
+
+impl Deref for DmaObject {
+ type Target = CoherentAllocation<u8>;
+
+ fn deref(&self) -> &Self::Target {
+ &self.dma
+ }
+}
+
+impl DerefMut for DmaObject {
+ fn deref_mut(&mut self) -> &mut Self::Target {
+ &mut self.dma
+ }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index c3fde3e132ea658888851137ab47fcb7b3637577..121fe5c11044a192212d0a64353b7acad58c796a 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -2,6 +2,7 @@
//! Nova Core GPU Driver
+mod dma;
mod driver;
mod firmware;
mod gfw;
--
2.49.0
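`DmaObject::new` rounds the requested length up to a whole number of pages by going through `Layout`. The same computation can be tried in userspace as sketched below. The 4096-byte `PAGE_SIZE` is an assumption (the kernel value depends on the architecture and configuration), and `page_aligned_len` is a hypothetical helper that returns `None` where the patch would return `EINVAL`.

```rust
use std::alloc::Layout;

/// Assumed page size for this sketch; the kernel's `PAGE_SIZE` may differ.
pub const PAGE_SIZE: usize = 4096;

/// Rounds `len` up to the next multiple of `PAGE_SIZE`.
///
/// Returns `None` if the rounded size cannot be represented by a `Layout`.
pub fn page_aligned_len(len: usize) -> Option<usize> {
    Layout::from_size_align(len, PAGE_SIZE)
        .ok()
        .map(|layout| layout.pad_to_align().size())
}
```

Going through `Layout` gives overflow checking without extra code: a length close to `usize::MAX` is rejected instead of wrapping around.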
* [PATCH v5 14/23] gpu: nova-core: register sysmem flush page
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (12 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 13/23] gpu: nova-core: add DMA object struct Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
` (10 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
Reserve a page of system memory so sysmembar can perform a read on it if
a system write occurred since the last flush. Do this early as it can be
required to e.g. reset the GPU falcons.
Chipset capabilities differ in that respect, so this commit also
introduces the FB HAL.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/fb.rs | 66 +++++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/fb/hal.rs | 31 ++++++++++++++++
drivers/gpu/nova-core/fb/hal/ga100.rs | 45 ++++++++++++++++++++++++
drivers/gpu/nova-core/fb/hal/tu102.rs | 42 ++++++++++++++++++++++
drivers/gpu/nova-core/gpu.rs | 25 +++++++++++--
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/regs.rs | 10 ++++++
7 files changed, 218 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
new file mode 100644
index 0000000000000000000000000000000000000000..308cd76edfee5a2e8a4cd979c20da2ce51cb16a5
--- /dev/null
+++ b/drivers/gpu/nova-core/fb.rs
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::prelude::*;
+use kernel::types::ARef;
+use kernel::{dev_warn, device};
+
+use crate::dma::DmaObject;
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+
+mod hal;
+
+/// Type holding the sysmem flush memory page, a page of memory to be written into the
+/// `NV_PFB_NISO_FLUSH_SYSMEM_ADDR*` registers and used to maintain memory coherency.
+///
+/// Users are responsible for manually calling [`Self::unregister`] before dropping this object, or
+/// the page might remain in use even after it has been freed.
+pub(crate) struct SysmemFlush {
+ /// Chipset we are operating on.
+ chipset: Chipset,
+ device: ARef<device::Device>,
+ /// Keep the page alive as long as we need it.
+ page: DmaObject,
+}
+
+impl SysmemFlush {
+ /// Allocate a memory page and register it as the sysmem flush page.
+ pub(crate) fn register(
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ chipset: Chipset,
+ ) -> Result<Self> {
+ let page = DmaObject::new(dev, kernel::bindings::PAGE_SIZE)?;
+
+ hal::fb_hal(chipset).write_sysmem_flush_page(bar, page.dma_handle())?;
+
+ Ok(Self {
+ chipset,
+ device: dev.into(),
+ page,
+ })
+ }
+
+ /// Unregister the managed sysmem flush page.
+ ///
+ /// Users must make sure to call this method before dropping the object.
+ pub(crate) fn unregister(self, bar: &Bar0) {
+ let hal = hal::fb_hal(self.chipset);
+
+ if hal.read_sysmem_flush_page(bar) == self.page.dma_handle() {
+ let _ = hal.write_sysmem_flush_page(bar, 0).inspect_err(|e| {
+ dev_warn!(
+ &self.device,
+ "failed to unregister sysmem flush page: {:?}",
+ e
+ )
+ });
+ } else {
+ // Another page has been registered after us for some reason - warn as this is a bug.
+ dev_warn!(
+ &self.device,
+ "attempt to unregister a sysmem flush page that is not active\n"
+ );
+ }
+ }
+}
diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
new file mode 100644
index 0000000000000000000000000000000000000000..23eab57eec9f524e066d3324eb7f5f2bf78481d2
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal.rs
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+
+mod ga100;
+mod tu102;
+
+pub(crate) trait FbHal {
+ /// Returns the address of the currently-registered sysmem flush page.
+ fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64;
+
+ /// Register `addr` as the address of the sysmem flush page.
+ ///
+ /// This might fail if the address is too large for the receiving register.
+ fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result;
+}
+
+/// Returns the HAL corresponding to `chipset`.
+pub(super) fn fb_hal(chipset: Chipset) -> &'static dyn FbHal {
+ use Chipset::*;
+
+ match chipset {
+ TU102 | TU104 | TU106 | TU117 | TU116 => tu102::TU102_HAL,
+ GA100 | GA102 | GA103 | GA104 | GA106 | GA107 | AD102 | AD103 | AD104 | AD106 | AD107 => {
+ ga100::GA100_HAL
+ }
+ }
+}
diff --git a/drivers/gpu/nova-core/fb/hal/ga100.rs b/drivers/gpu/nova-core/fb/hal/ga100.rs
new file mode 100644
index 0000000000000000000000000000000000000000..7c10436c1c590d9b767c399b69370697fdf8d239
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal/ga100.rs
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+
+struct Ga100;
+
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::fb::hal::FbHal;
+use crate::regs;
+
+use super::tu102::FLUSH_SYSMEM_ADDR_SHIFT;
+
+pub(super) fn read_sysmem_flush_page_ga100(bar: &Bar0) -> u64 {
+ (regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08() as u64) << FLUSH_SYSMEM_ADDR_SHIFT
+ | (regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40() as u64)
+ << FLUSH_SYSMEM_ADDR_SHIFT_HI
+}
+
+pub(super) fn write_sysmem_flush_page_ga100(bar: &Bar0, addr: u64) {
+ regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::default()
+ .set_adr_63_40((addr >> FLUSH_SYSMEM_ADDR_SHIFT_HI) as u32)
+ .write(bar);
+ regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::default()
+ .set_adr_39_08((addr >> FLUSH_SYSMEM_ADDR_SHIFT) as u32)
+ .write(bar);
+}
+
+/// Shift applied to the sysmem address before it is written into
+/// `NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI`.
+const FLUSH_SYSMEM_ADDR_SHIFT_HI: u32 = 40;
+
+impl FbHal for Ga100 {
+ fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
+ read_sysmem_flush_page_ga100(bar)
+ }
+
+ fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
+ write_sysmem_flush_page_ga100(bar, addr);
+
+ Ok(())
+ }
+}
+
+const GA100: Ga100 = Ga100;
+pub(super) const GA100_HAL: &dyn FbHal = &GA100;
diff --git a/drivers/gpu/nova-core/fb/hal/tu102.rs b/drivers/gpu/nova-core/fb/hal/tu102.rs
new file mode 100644
index 0000000000000000000000000000000000000000..048859f9fd9d6cfb630da0a8c3513becf3ab62d6
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal/tu102.rs
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use crate::driver::Bar0;
+use crate::fb::hal::FbHal;
+use crate::regs;
+use kernel::prelude::*;
+
+/// Shift applied to the sysmem address before it is written into `NV_PFB_NISO_FLUSH_SYSMEM_ADDR`,
+/// to be used by HALs.
+pub(super) const FLUSH_SYSMEM_ADDR_SHIFT: u32 = 8;
+
+pub(super) fn read_sysmem_flush_page_gm107(bar: &Bar0) -> u64 {
+ (regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08() as u64) << FLUSH_SYSMEM_ADDR_SHIFT
+}
+
+pub(super) fn write_sysmem_flush_page_gm107(bar: &Bar0, addr: u64) -> Result {
+ // Check that the address doesn't overflow the receiving 32-bit register.
+ if addr >> (u32::BITS + FLUSH_SYSMEM_ADDR_SHIFT) == 0 {
+ regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::default()
+ .set_adr_39_08((addr >> FLUSH_SYSMEM_ADDR_SHIFT) as u32)
+ .write(bar);
+
+ Ok(())
+ } else {
+ Err(EINVAL)
+ }
+}
+
+struct Tu102;
+
+impl FbHal for Tu102 {
+ fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
+ read_sysmem_flush_page_gm107(bar)
+ }
+
+ fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
+ write_sysmem_flush_page_gm107(bar, addr)
+ }
+}
+
+const TU102: Tu102 = Tu102;
+pub(super) const TU102_HAL: &dyn FbHal = &TU102;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index e44ff6fa07147c6dd1515c2c6c0df927a2257c85..768579dfdfc7e9e61c613202030d2c7ee6054e2a 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -3,6 +3,7 @@
use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
use crate::driver::Bar0;
+use crate::fb::SysmemFlush;
use crate::firmware::{Firmware, FIRMWARE_VERSION};
use crate::gfw;
use crate::regs;
@@ -158,12 +159,28 @@ fn new(bar: &Bar0) -> Result<Spec> {
}
/// Structure holding the resources required to operate the GPU.
-#[pin_data]
+#[pin_data(PinnedDrop)]
pub(crate) struct Gpu {
spec: Spec,
/// MMIO mapping of PCI BAR 0
bar: Devres<Bar0>,
fw: Firmware,
+ /// System memory page required for flushing all pending GPU-side memory writes done through
+ /// PCIE into system memory.
+ ///
+ /// We use an `Option` so we can take the object during `drop`. It is not accessed otherwise.
+ sysmem_flush: Option<SysmemFlush>,
+}
+
+#[pinned_drop]
+impl PinnedDrop for Gpu {
+ fn drop(mut self: Pin<&mut Self>) {
+ // Unregister the sysmem flush page before we release it.
+ let _ = self
+ .sysmem_flush
+ .take()
+ .map(|sysmem_flush| self.bar.try_access_with(|b| sysmem_flush.unregister(b)));
+ }
}
impl Gpu {
@@ -187,10 +204,14 @@ pub(crate) fn new(
gfw::wait_gfw_boot_completion(bar)
.inspect_err(|_| dev_err!(pdev.as_ref(), "GFW boot did not complete"))?;
+ // System memory page required for sysmembar to properly flush into system memory.
+ let sysmem_flush = SysmemFlush::register(pdev.as_ref(), bar, spec.chipset)?;
+
Ok(pin_init!(Self {
spec,
bar: devres_bar,
- fw
+ fw,
+ sysmem_flush: Some(sysmem_flush),
}))
}
}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 121fe5c11044a192212d0a64353b7acad58c796a..8ac04b8586e7314528e081464ed73ee615001e9b 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -4,6 +4,7 @@
mod dma;
mod driver;
+mod fb;
mod firmware;
mod gfw;
mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index cba442da51181971f209b338249307c11ac481e3..b599e7ddad57ed8defe0324056571ba46b926cf6 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -38,6 +38,16 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
}
}
+/* PFB */
+
+register!(NV_PFB_NISO_FLUSH_SYSMEM_ADDR @ 0x00100c10 {
+ 31:0 adr_39_08 as u32;
+});
+
+register!(NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI @ 0x00100c40 {
+ 23:0 adr_63_40 as u32;
+});
+
/* PGC6 */
register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {
--
2.49.0
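The address handling of the two HALs comes down to shifts on a 64-bit DMA address, which can be checked in isolation. The sketch below reuses the shift values from this patch (8 for the low register, 40 for the high one). The function names are hypothetical and no register access is performed.

```rust
/// Shift applied before writing into the low register (bits 39:8 of the address).
pub const FLUSH_SYSMEM_ADDR_SHIFT: u32 = 8;
/// Shift applied before writing into the high register (bits 63:40 of the address).
pub const FLUSH_SYSMEM_ADDR_SHIFT_HI: u32 = 40;

/// Splits `addr` into the values of the low and high register fields.
pub fn split_flush_addr(addr: u64) -> (u32, u32) {
    // The `as u32` cast keeps bits 39:8 only; higher bits go to the high field.
    let lo = (addr >> FLUSH_SYSMEM_ADDR_SHIFT) as u32;
    let hi = (addr >> FLUSH_SYSMEM_ADDR_SHIFT_HI) as u32;
    (lo, hi)
}

/// Rebuilds the address from both fields, as the Ampere-style read path does.
pub fn join_flush_addr(lo: u32, hi: u32) -> u64 {
    ((lo as u64) << FLUSH_SYSMEM_ADDR_SHIFT) | ((hi as u64) << FLUSH_SYSMEM_ADDR_SHIFT_HI)
}

/// Turing-style check: with only the low register, the address must fit in 40 bits.
pub fn fits_low_register_only(addr: u64) -> bool {
    addr >> (u32::BITS + FLUSH_SYSMEM_ADDR_SHIFT) == 0
}
```

The low 8 bits of the address are dropped by the split, so the round trip is only exact for addresses aligned to 256 bytes, which a page-aligned allocation satisfies.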
* [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (13 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 14/23] gpu: nova-core: register sysmem flush page Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-17 16:33 ` Danilo Krummrich
2025-06-12 14:01 ` [PATCH v5 16/23] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
` (9 subsequent siblings)
24 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
Booting the GSP on Ampere requires an intricate dance between the GSP
and SEC2 falcons, where the GSP starts by running the FWSEC firmware to
create the WPR2 region, and then SEC2 loads the actual RISC-V firmware
into the GSP.
Add the common Falcon code and HAL for Ampere GPUs, and instantiate the
GSP and SEC2 Falcons that will be required to perform that dance and
boot the GSP.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
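
Reader's note (not part of the commit): the address arithmetic in `dma_wr`
can be hard to follow from the register writes alone. Below is a minimal
standalone sketch of it, assuming that `DMATRFBASE` carries address bits 39:8
and `DMATRFBASE1` (a 9-bit field) carries bits 48:40, as the shifts and field
widths in the patch suggest. `split_dma_base` and `block_offsets` are
hypothetical helper names used only for illustration; the patch relies on the
register field setters rather than an explicit mask.

```rust
/// Size in bytes of one falcon DMA transfer (`DMA_LEN` in `dma_wr`).
const DMA_LEN: u64 = 256;

/// Splits a DMA address into the two values written to `DMATRFBASE` and
/// `DMATRFBASE1`. Hypothetical helper, not part of the patch.
///
/// Returns `None` when `dma_start` is not 256-byte aligned, mirroring the
/// `EINVAL` path of `dma_wr`.
fn split_dma_base(dma_start: u64) -> Option<(u32, u16)> {
    if dma_start % DMA_LEN != 0 {
        return None;
    }

    // Address bits 39:8; the cast drops everything above bit 39.
    let base = (dma_start >> 8) as u32;
    // Address bits 48:40. The explicit mask is an assumption standing in for
    // the 9-bit width of the register field.
    let base1 = ((dma_start >> 40) & 0x1ff) as u16;

    Some((base, base1))
}

/// Yields the offset of each 256-byte block of a transfer of `len` bytes,
/// in the order in which the loop in `dma_wr` issues them.
fn block_offsets(len: u32) -> impl Iterator<Item = u32> {
    (0..len).step_by(DMA_LEN as usize)
}
```

For instance, a 0x300-byte segment is transferred as three blocks at offsets
0x0, 0x100 and 0x200, each added to both the source and destination offsets.
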
drivers/gpu/nova-core/falcon.rs | 560 ++++++++++++++++++++++++++++++
drivers/gpu/nova-core/falcon/gsp.rs | 24 ++
drivers/gpu/nova-core/falcon/hal.rs | 54 +++
drivers/gpu/nova-core/falcon/hal/ga102.rs | 117 +++++++
drivers/gpu/nova-core/falcon/sec2.rs | 10 +
drivers/gpu/nova-core/gpu.rs | 11 +
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/regs.rs | 139 ++++++++
8 files changed, 916 insertions(+)
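
Reader's note (not part of the commit): `signature_reg_fuse_version_ga102`
indexes an array of 32-bit fuse registers by hand and then applies `fls_u32`
to the value read. A minimal standalone sketch of both steps follows. It
assumes `kernel::num::fls_u32` has the usual "find last set" semantics
(1-based index of the highest set bit, 0 for an input of 0).
`fuse_version_reg` is a hypothetical name; the base address is the
`NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION` offset defined in this patch.

```rust
/// Offset of `NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION`, taken from `regs.rs`.
const GSP_UCODE1_VERSION: usize = 0x0082_41c0;

/// Find-last-set: 1-based index of the most significant set bit, or 0 if no
/// bit is set. Assumed to match `kernel::num::fls_u32`.
const fn fls_u32(v: u32) -> u32 {
    u32::BITS - v.leading_zeros()
}

/// Returns the offset of the `UCODE<ucode_id>_VERSION` register in the array
/// starting at `base`. Hypothetical helper, not part of the patch.
///
/// Registers are numbered from 1 to 16, so any other `ucode_id` is rejected.
fn fuse_version_reg(base: usize, ucode_id: u8) -> Option<usize> {
    if ucode_id == 0 || ucode_id > 16 {
        return None;
    }

    // Equivalent to indexing `base[ucode_id - 1]` in an array of `u32`.
    Some(base + (ucode_id as usize - 1) * std::mem::size_of::<u32>())
}
```

Under these assumptions, a fuse register reading 0b0111 yields a fuse version
of 3, and an unprogrammed register (0) yields version 0.
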
diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
new file mode 100644
index 0000000000000000000000000000000000000000..25ed8ee30def3abcc43bcba965eb62f49d532604
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -0,0 +1,560 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Falcon microprocessor base support
+
+// To be removed when all code is used.
+#![expect(dead_code)]
+
+use core::ops::Deref;
+use core::time::Duration;
+use hal::FalconHal;
+use kernel::bindings;
+use kernel::device;
+use kernel::prelude::*;
+use kernel::types::ARef;
+
+use crate::dma::DmaObject;
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+use crate::regs;
+use crate::util;
+
+pub(crate) mod gsp;
+mod hal;
+pub(crate) mod sec2;
+
+/// Revision number of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
+/// register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRev {
+ #[default]
+ Rev1 = 1,
+ Rev2 = 2,
+ Rev3 = 3,
+ Rev4 = 4,
+ Rev5 = 5,
+ Rev6 = 6,
+ Rev7 = 7,
+}
+
+impl TryFrom<u8> for FalconCoreRev {
+ type Error = Error;
+
+ fn try_from(value: u8) -> Result<Self> {
+ use FalconCoreRev::*;
+
+ let rev = match value {
+ 1 => Rev1,
+ 2 => Rev2,
+ 3 => Rev3,
+ 4 => Rev4,
+ 5 => Rev5,
+ 6 => Rev6,
+ 7 => Rev7,
+ _ => return Err(EINVAL),
+ };
+
+ Ok(rev)
+ }
+}
+
+/// Revision subversion number of a falcon core, used in the
+/// [`crate::regs::NV_PFALCON_FALCON_HWCFG1`] register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRevSubversion {
+ #[default]
+ Subversion0 = 0,
+ Subversion1 = 1,
+ Subversion2 = 2,
+ Subversion3 = 3,
+}
+
+impl TryFrom<u8> for FalconCoreRevSubversion {
+ type Error = Error;
+
+ fn try_from(value: u8) -> Result<Self> {
+ use FalconCoreRevSubversion::*;
+
+ let sub_version = match value & 0b11 {
+ 0 => Subversion0,
+ 1 => Subversion1,
+ 2 => Subversion2,
+ 3 => Subversion3,
+ _ => return Err(EINVAL),
+ };
+
+ Ok(sub_version)
+ }
+}
+
+/// Security model of a falcon core, used in the [`crate::regs::NV_PFALCON_FALCON_HWCFG1`]
+/// register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone)]
+pub(crate) enum FalconSecurityModel {
+ /// Non-Secure: runs unsigned code without privileges.
+ #[default]
+ None = 0,
+ /// Low-Secure: runs code with some privileges. Can only be entered from `Heavy` mode, which
+ /// will typically validate the LS code through some signature.
+ Light = 2,
+ /// High-Secure: runs signed code with full privileges. Signature is validated by boot ROM.
+ Heavy = 3,
+}
+
+impl TryFrom<u8> for FalconSecurityModel {
+ type Error = Error;
+
+ fn try_from(value: u8) -> Result<Self> {
+ use FalconSecurityModel::*;
+
+ let sec_model = match value {
+ 0 => None,
+ 2 => Light,
+ 3 => Heavy,
+ _ => return Err(EINVAL),
+ };
+
+ Ok(sec_model)
+ }
+}
+
+/// Signing algorithm for a given firmware, used in the [`crate::regs::NV_PFALCON2_FALCON_MOD_SEL`]
+/// register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq)]
+pub(crate) enum FalconModSelAlgo {
+ /// RSA3K.
+ #[default]
+ Rsa3k = 1,
+}
+
+impl TryFrom<u8> for FalconModSelAlgo {
+ type Error = Error;
+
+ fn try_from(value: u8) -> Result<Self> {
+ match value {
+ 1 => Ok(FalconModSelAlgo::Rsa3k),
+ _ => Err(EINVAL),
+ }
+ }
+}
+
+/// Valid values for the `size` field of the [`crate::regs::NV_PFALCON_FALCON_DMATRFCMD`] register.
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq)]
+pub(crate) enum DmaTrfCmdSize {
+ /// 256 bytes transfer.
+ #[default]
+ Size256B = 0x6,
+}
+
+impl TryFrom<u8> for DmaTrfCmdSize {
+ type Error = Error;
+
+ fn try_from(value: u8) -> Result<Self> {
+ match value {
+ 0x6 => Ok(Self::Size256B),
+ _ => Err(EINVAL),
+ }
+ }
+}
+
+/// Currently active core on a dual falcon/riscv (Peregrine) controller.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) enum PeregrineCoreSelect {
+ /// Falcon core is active.
+ #[default]
+ Falcon = 0,
+ /// RISC-V core is active.
+ Riscv = 1,
+}
+
+impl From<bool> for PeregrineCoreSelect {
+ fn from(value: bool) -> Self {
+ match value {
+ false => PeregrineCoreSelect::Falcon,
+ true => PeregrineCoreSelect::Riscv,
+ }
+ }
+}
+
+/// Different types of memory present in a falcon core.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum FalconMem {
+ /// Instruction Memory.
+ Imem,
+ /// Data Memory.
+ Dmem,
+}
+
+/// Target/source of a DMA transfer to/from falcon memory.
+#[derive(Debug, Clone, Default)]
+pub(crate) enum FalconFbifTarget {
+ /// VRAM.
+ #[default]
+ LocalFb = 0,
+ /// Coherent system memory.
+ CoherentSysmem = 1,
+ /// Non-coherent system memory.
+ NoncoherentSysmem = 2,
+}
+
+impl TryFrom<u8> for FalconFbifTarget {
+ type Error = Error;
+
+ fn try_from(value: u8) -> Result<Self> {
+ let res = match value {
+ 0 => Self::LocalFb,
+ 1 => Self::CoherentSysmem,
+ 2 => Self::NoncoherentSysmem,
+ _ => return Err(EINVAL),
+ };
+
+ Ok(res)
+ }
+}
+
+/// Type of memory addresses to use.
+#[derive(Debug, Clone, Default)]
+pub(crate) enum FalconFbifMemType {
+ /// Virtual memory addresses.
+ #[default]
+ Virtual = 0,
+ /// Physical memory addresses.
+ Physical = 1,
+}
+
+/// Conversion from a single-bit register field.
+impl From<bool> for FalconFbifMemType {
+ fn from(value: bool) -> Self {
+ match value {
+ false => Self::Virtual,
+ true => Self::Physical,
+ }
+ }
+}
+
+/// Trait defining the parameters of a given Falcon instance.
+pub(crate) trait FalconEngine: Sync {
+ /// Base I/O address of the falcon, relative to which its registers are accessed.
+ const BASE: usize;
+}
+
+/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
+#[derive(Debug)]
+pub(crate) struct FalconLoadTarget {
+ /// Offset from the start of the source object to copy from.
+ pub(crate) src_start: u32,
+ /// Offset from the start of the destination memory to copy into.
+ pub(crate) dst_start: u32,
+ /// Number of bytes to copy.
+ pub(crate) len: u32,
+}
+
+/// Parameters for the falcon boot ROM.
+#[derive(Debug)]
+pub(crate) struct FalconBromParams {
+ /// Offset in `DMEM` of the firmware's signature.
+ pub(crate) pkc_data_offset: u32,
+ /// Mask of engines valid for this firmware.
+ pub(crate) engine_id_mask: u16,
+ /// ID of the ucode used to infer a fuse register to validate the signature.
+ pub(crate) ucode_id: u8,
+}
+
+/// Trait for providing load parameters of falcon firmwares.
+pub(crate) trait FalconLoadParams {
+ /// Returns the load parameters for `IMEM`.
+ fn imem_load_params(&self) -> FalconLoadTarget;
+
+ /// Returns the load parameters for `DMEM`.
+ fn dmem_load_params(&self) -> FalconLoadTarget;
+
+ /// Returns the parameters to write into the BROM registers.
+ fn brom_params(&self) -> FalconBromParams;
+
+ /// Returns the start address of the firmware.
+ fn boot_addr(&self) -> u32;
+}
+
+/// Trait for a falcon firmware.
+///
+/// A falcon firmware can be loaded on a given engine, and is presented in the form of a DMA
+/// object.
+pub(crate) trait FalconFirmware: FalconLoadParams + Deref<Target = DmaObject> {
+ /// Engine on which this firmware is to be loaded.
+ type Target: FalconEngine;
+}
+
+/// Contains the base parameters common to all Falcon instances.
+pub(crate) struct Falcon<E: FalconEngine> {
+ hal: KBox<dyn FalconHal<E>>,
+ dev: ARef<device::Device>,
+}
+
+impl<E: FalconEngine + 'static> Falcon<E> {
+ /// Create a new falcon instance.
+ ///
+ /// `need_riscv` is set to `true` if the caller expects the falcon to be a dual falcon/riscv
+ /// controller.
+ pub(crate) fn new(
+ dev: &device::Device,
+ chipset: Chipset,
+ bar: &Bar0,
+ need_riscv: bool,
+ ) -> Result<Self> {
+ let hwcfg1 = regs::NV_PFALCON_FALCON_HWCFG1::read(bar, E::BASE);
+ // Check that the revision and security model contain valid values.
+ let _ = hwcfg1.core_rev()?;
+ let _ = hwcfg1.security_model()?;
+
+ if need_riscv {
+ let hwcfg2 = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, E::BASE);
+ if !hwcfg2.riscv() {
+ dev_err!(
+ dev,
+ "riscv support requested on a controller that does not support it\n"
+ );
+ return Err(EINVAL);
+ }
+ }
+
+ Ok(Self {
+ hal: hal::falcon_hal(chipset)?,
+ dev: dev.into(),
+ })
+ }
+
+ /// Wait for memory scrubbing to complete.
+ fn reset_wait_mem_scrubbing(&self, bar: &Bar0) -> Result {
+ util::wait_on(Duration::from_millis(20), || {
+ let r = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, E::BASE);
+ if r.mem_scrubbing() {
+ Some(())
+ } else {
+ None
+ }
+ })
+ }
+
+ /// Reset the falcon engine.
+ fn reset_eng(&self, bar: &Bar0) -> Result {
+ let _ = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, E::BASE);
+
+ // According to OpenRM's `kflcnPreResetWait_GA102` documentation, HW sometimes does not set
+ // RESET_READY so a non-failing timeout is used.
+ let _ = util::wait_on(Duration::from_micros(150), || {
+ let r = regs::NV_PFALCON_FALCON_HWCFG2::read(bar, E::BASE);
+ if r.reset_ready() {
+ Some(())
+ } else {
+ None
+ }
+ });
+
+ regs::NV_PFALCON_FALCON_ENGINE::alter(bar, E::BASE, |v| v.set_reset(true));
+
+ // TODO: replace with udelay() or equivalent once available.
+ let _: Result = util::wait_on(Duration::from_micros(10), || None);
+
+ regs::NV_PFALCON_FALCON_ENGINE::alter(bar, E::BASE, |v| v.set_reset(false));
+
+ self.reset_wait_mem_scrubbing(bar)?;
+
+ Ok(())
+ }
+
+ /// Reset the controller, select the falcon core, and wait for memory scrubbing to complete.
+ pub(crate) fn reset(&self, bar: &Bar0) -> Result {
+ self.reset_eng(bar)?;
+ self.hal.select_core(self, bar)?;
+ self.reset_wait_mem_scrubbing(bar)?;
+
+ regs::NV_PFALCON_FALCON_RM::default()
+ .set_value(regs::NV_PMC_BOOT_0::read(bar).into())
+ .write(bar, E::BASE);
+
+ Ok(())
+ }
+
+ /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
+ /// `target_mem`.
+ ///
+ /// `sec` is set if the loaded firmware is expected to run in secure mode.
+ fn dma_wr(
+ &self,
+ bar: &Bar0,
+ dma_handle: bindings::dma_addr_t,
+ target_mem: FalconMem,
+ load_offsets: FalconLoadTarget,
+ sec: bool,
+ ) -> Result {
+ const DMA_LEN: u32 = 256;
+
+ // For IMEM, we want to use the start offset as a virtual address tag for each page, since
+ // code addresses in the firmware (and the boot vector) are virtual.
+ //
+ // For DMEM we can fold the start offset into the DMA handle.
+ let (src_start, dma_start) = match target_mem {
+ FalconMem::Imem => (load_offsets.src_start, dma_handle),
+ FalconMem::Dmem => (
+ 0,
+ dma_handle + load_offsets.src_start as bindings::dma_addr_t,
+ ),
+ };
+ if dma_start % DMA_LEN as bindings::dma_addr_t > 0 {
+ dev_err!(
+ self.dev,
+ "DMA transfer start addresses must be a multiple of {}",
+ DMA_LEN
+ );
+ return Err(EINVAL);
+ }
+ if load_offsets.len % DMA_LEN > 0 {
+ dev_err!(
+ self.dev,
+ "DMA transfer length must be a multiple of {}",
+ DMA_LEN
+ );
+ return Err(EINVAL);
+ }
+
+ // Set up the base source DMA address.
+
+ regs::NV_PFALCON_FALCON_DMATRFBASE::default()
+ .set_base((dma_start >> 8) as u32)
+ .write(bar, E::BASE);
+ regs::NV_PFALCON_FALCON_DMATRFBASE1::default()
+ .set_base((dma_start >> 40) as u16)
+ .write(bar, E::BASE);
+
+ let cmd = regs::NV_PFALCON_FALCON_DMATRFCMD::default()
+ .set_size(DmaTrfCmdSize::Size256B)
+ .set_imem(target_mem == FalconMem::Imem)
+ .set_sec(if sec { 1 } else { 0 });
+
+ for pos in (0..load_offsets.len).step_by(DMA_LEN as usize) {
+ // Perform a transfer of size `DMA_LEN`.
+ regs::NV_PFALCON_FALCON_DMATRFMOFFS::default()
+ .set_offs(load_offsets.dst_start + pos)
+ .write(bar, E::BASE);
+ regs::NV_PFALCON_FALCON_DMATRFFBOFFS::default()
+ .set_offs(src_start + pos)
+ .write(bar, E::BASE);
+ cmd.write(bar, E::BASE);
+
+ // Wait for the transfer to complete.
+ util::wait_on(Duration::from_millis(2000), || {
+ let r = regs::NV_PFALCON_FALCON_DMATRFCMD::read(bar, E::BASE);
+ if r.idle() {
+ Some(())
+ } else {
+ None
+ }
+ })?;
+ }
+
+ Ok(())
+ }
+
+ /// Perform a DMA load into `IMEM` and `DMEM` of `fw`, and prepare the falcon to run it.
+ pub(crate) fn dma_load<F: FalconFirmware<Target = E>>(&self, bar: &Bar0, fw: &F) -> Result {
+ let dma_handle = fw.dma_handle();
+
+ regs::NV_PFALCON_FBIF_CTL::alter(bar, E::BASE, |v| v.set_allow_phys_no_ctx(true));
+ regs::NV_PFALCON_FALCON_DMACTL::default().write(bar, E::BASE);
+ regs::NV_PFALCON_FBIF_TRANSCFG::alter(bar, E::BASE, |v| {
+ v.set_target(FalconFbifTarget::CoherentSysmem)
+ .set_mem_type(FalconFbifMemType::Physical)
+ });
+
+ self.dma_wr(
+ bar,
+ dma_handle,
+ FalconMem::Imem,
+ fw.imem_load_params(),
+ true,
+ )?;
+ self.dma_wr(
+ bar,
+ dma_handle,
+ FalconMem::Dmem,
+ fw.dmem_load_params(),
+ true,
+ )?;
+
+ self.hal.program_brom(self, bar, &fw.brom_params())?;
+
+ // Set `BootVec` to start of non-secure code.
+ regs::NV_PFALCON_FALCON_BOOTVEC::default()
+ .set_value(fw.boot_addr())
+ .write(bar, E::BASE);
+
+ Ok(())
+ }
+
+ /// Start running the loaded firmware.
+ ///
+ /// `mbox0` and `mbox1` are optional parameters to write into the `MBOX0` and `MBOX1` registers
+ /// prior to running.
+ ///
+ /// Returns `MBOX0` and `MBOX1` after the firmware has stopped running.
+ pub(crate) fn boot(
+ &self,
+ bar: &Bar0,
+ mbox0: Option<u32>,
+ mbox1: Option<u32>,
+ ) -> Result<(u32, u32)> {
+ if let Some(mbox0) = mbox0 {
+ regs::NV_PFALCON_FALCON_MAILBOX0::default()
+ .set_value(mbox0)
+ .write(bar, E::BASE);
+ }
+
+ if let Some(mbox1) = mbox1 {
+ regs::NV_PFALCON_FALCON_MAILBOX1::default()
+ .set_value(mbox1)
+ .write(bar, E::BASE);
+ }
+
+ match regs::NV_PFALCON_FALCON_CPUCTL::read(bar, E::BASE).alias_en() {
+ true => regs::NV_PFALCON_FALCON_CPUCTL_ALIAS::default()
+ .set_startcpu(true)
+ .write(bar, E::BASE),
+ false => regs::NV_PFALCON_FALCON_CPUCTL::default()
+ .set_startcpu(true)
+ .write(bar, E::BASE),
+ }
+
+ util::wait_on(Duration::from_secs(2), || {
+ let r = regs::NV_PFALCON_FALCON_CPUCTL::read(bar, E::BASE);
+ if r.halted() {
+ Some(())
+ } else {
+ None
+ }
+ })?;
+
+ let (mbox0, mbox1) = (
+ regs::NV_PFALCON_FALCON_MAILBOX0::read(bar, E::BASE).value(),
+ regs::NV_PFALCON_FALCON_MAILBOX1::read(bar, E::BASE).value(),
+ );
+
+ Ok((mbox0, mbox1))
+ }
+
+ /// Returns the fused version of the signature to use in order to run a HS firmware on this
+ /// falcon instance. `engine_id_mask` and `ucode_id` are obtained from the firmware header.
+ pub(crate) fn signature_reg_fuse_version(
+ &self,
+ bar: &Bar0,
+ engine_id_mask: u16,
+ ucode_id: u8,
+ ) -> Result<u32> {
+ self.hal
+ .signature_reg_fuse_version(self, bar, engine_id_mask, ucode_id)
+ }
+}
diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..d622e9a64470932af0b48032be5a1d4b518bf4a7
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/gsp.rs
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use crate::{
+ driver::Bar0,
+ falcon::{Falcon, FalconEngine},
+ regs,
+};
+
+/// Type specifying the `Gsp` falcon engine. Cannot be instantiated.
+pub(crate) struct Gsp(());
+
+impl FalconEngine for Gsp {
+ const BASE: usize = 0x00110000;
+}
+
+impl Falcon<Gsp> {
+ /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
+ /// allow GSP to signal CPU for processing new messages in message queue.
+ pub(crate) fn clear_swgen0_intr(&self, bar: &Bar0) {
+ regs::NV_PFALCON_FALCON_IRQSCLR::default()
+ .set_swgen0(true)
+ .write(bar, Gsp::BASE);
+ }
+}
diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
new file mode 100644
index 0000000000000000000000000000000000000000..fdb4828f0b7be02729a01497bfc2198d8387d16b
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal.rs
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::falcon::{Falcon, FalconBromParams, FalconEngine};
+use crate::gpu::Chipset;
+
+mod ga102;
+
+/// Hardware Abstraction Layer for Falcon cores.
+///
+/// Implements chipset-specific low-level operations. The trait is generic over [`FalconEngine`]
+/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
+/// registers.
+pub(crate) trait FalconHal<E: FalconEngine>: Sync {
+ // Activates the Falcon core if the engine is a riscv/falcon dual engine.
+ fn select_core(&self, _falcon: &Falcon<E>, _bar: &Bar0) -> Result {
+ Ok(())
+ }
+
+ /// Returns the fused version of the signature to use in order to run a HS firmware on this
+ /// falcon instance. `engine_id_mask` and `ucode_id` are obtained from the firmware header.
+ fn signature_reg_fuse_version(
+ &self,
+ falcon: &Falcon<E>,
+ bar: &Bar0,
+ engine_id_mask: u16,
+ ucode_id: u8,
+ ) -> Result<u32>;
+
+ // Program the boot ROM registers prior to starting a secure firmware.
+ fn program_brom(&self, falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result;
+}
+
+/// Returns a boxed falcon HAL adequate for `chipset`.
+///
+/// We use a heap-allocated trait object instead of a statically defined one because the
+/// generic `FalconEngine` argument makes it difficult to define all the combinations
+/// statically.
+pub(super) fn falcon_hal<E: FalconEngine + 'static>(
+ chipset: Chipset,
+) -> Result<KBox<dyn FalconHal<E>>> {
+ use Chipset::*;
+
+ let hal = match chipset {
+ GA102 | GA103 | GA104 | GA106 | GA107 => {
+ KBox::new(ga102::Ga102::<E>::new(), GFP_KERNEL)? as KBox<dyn FalconHal<E>>
+ }
+ _ => return Err(ENOTSUPP),
+ };
+
+ Ok(hal)
+}
diff --git a/drivers/gpu/nova-core/falcon/hal/ga102.rs b/drivers/gpu/nova-core/falcon/hal/ga102.rs
new file mode 100644
index 0000000000000000000000000000000000000000..d4caa19229612a83d9b27e3af76c361be416d49d
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/hal/ga102.rs
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::marker::PhantomData;
+use core::time::Duration;
+
+use kernel::device;
+use kernel::num::fls_u32;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::falcon::{
+ Falcon, FalconBromParams, FalconEngine, FalconModSelAlgo, PeregrineCoreSelect,
+};
+use crate::regs;
+use crate::util;
+
+use super::FalconHal;
+
+fn select_core_ga102<E: FalconEngine>(bar: &Bar0) -> Result {
+ let bcr_ctrl = regs::NV_PRISCV_RISCV_BCR_CTRL::read(bar, E::BASE);
+ if bcr_ctrl.core_select() != PeregrineCoreSelect::Falcon {
+ regs::NV_PRISCV_RISCV_BCR_CTRL::default()
+ .set_core_select(PeregrineCoreSelect::Falcon)
+ .write(bar, E::BASE);
+
+ util::wait_on(Duration::from_millis(10), || {
+ let r = regs::NV_PRISCV_RISCV_BCR_CTRL::read(bar, E::BASE);
+ if r.valid() {
+ Some(())
+ } else {
+ None
+ }
+ })?;
+ }
+
+ Ok(())
+}
+
+fn signature_reg_fuse_version_ga102(
+ dev: &device::Device,
+ bar: &Bar0,
+ engine_id_mask: u16,
+ ucode_id: u8,
+) -> Result<u32> {
+ // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
+ // registers, which are an array. Our register definition macros do not allow us to manage them
+ // properly, so we need to hardcode their addresses for now.
+
+ // Each engine has 16 ucode version registers numbered from 1 to 16.
+ if ucode_id == 0 || ucode_id > 16 {
+ dev_err!(dev, "invalid ucode id {:#x}", ucode_id);
+ return Err(EINVAL);
+ }
+
+ // Base address of the FUSE registers array corresponding to the engine.
+ let reg_fuse_base = if engine_id_mask & 0x0001 != 0 {
+ regs::NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION::OFFSET
+ } else if engine_id_mask & 0x0004 != 0 {
+ regs::NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION::OFFSET
+ } else if engine_id_mask & 0x0400 != 0 {
+ regs::NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION::OFFSET
+ } else {
+ dev_err!(dev, "unexpected engine_id_mask {:#x}", engine_id_mask);
+ return Err(EINVAL);
+ };
+
+ // Read `reg_fuse_base[ucode_id - 1]`.
+ let reg_fuse_version =
+ bar.read32(reg_fuse_base + ((ucode_id - 1) as usize * core::mem::size_of::<u32>()));
+
+ Ok(fls_u32(reg_fuse_version))
+}
+
+fn program_brom_ga102<E: FalconEngine>(bar: &Bar0, params: &FalconBromParams) -> Result {
+ regs::NV_PFALCON2_FALCON_BROM_PARAADDR::default()
+ .set_value(params.pkc_data_offset)
+ .write(bar, E::BASE);
+ regs::NV_PFALCON2_FALCON_BROM_ENGIDMASK::default()
+ .set_value(params.engine_id_mask as u32)
+ .write(bar, E::BASE);
+ regs::NV_PFALCON2_FALCON_BROM_CURR_UCODE_ID::default()
+ .set_ucode_id(params.ucode_id)
+ .write(bar, E::BASE);
+ regs::NV_PFALCON2_FALCON_MOD_SEL::default()
+ .set_algo(FalconModSelAlgo::Rsa3k)
+ .write(bar, E::BASE);
+
+ Ok(())
+}
+
+pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
+
+impl<E: FalconEngine> Ga102<E> {
+ pub(super) fn new() -> Self {
+ Self(PhantomData)
+ }
+}
+
+impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
+ fn select_core(&self, _falcon: &Falcon<E>, bar: &Bar0) -> Result {
+ select_core_ga102::<E>(bar)
+ }
+
+ fn signature_reg_fuse_version(
+ &self,
+ falcon: &Falcon<E>,
+ bar: &Bar0,
+ engine_id_mask: u16,
+ ucode_id: u8,
+ ) -> Result<u32> {
+ signature_reg_fuse_version_ga102(&falcon.dev, bar, engine_id_mask, ucode_id)
+ }
+
+ fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
+ program_brom_ga102::<E>(bar, params)
+ }
+}
diff --git a/drivers/gpu/nova-core/falcon/sec2.rs b/drivers/gpu/nova-core/falcon/sec2.rs
new file mode 100644
index 0000000000000000000000000000000000000000..5147d9e2a7fe859210727504688d84cca4de991b
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon/sec2.rs
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use crate::falcon::FalconEngine;
+
+/// Type specifying the `Sec2` falcon engine. Cannot be instantiated.
+pub(crate) struct Sec2(());
+
+impl FalconEngine for Sec2 {
+ const BASE: usize = 0x00840000;
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 768579dfdfc7e9e61c613202030d2c7ee6054e2a..c9f7f604a5de6ea4eb85f061cae826302c1902c3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -3,6 +3,7 @@
use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
use crate::driver::Bar0;
+use crate::falcon::{gsp::Gsp, sec2::Sec2, Falcon};
use crate::fb::SysmemFlush;
use crate::firmware::{Firmware, FIRMWARE_VERSION};
use crate::gfw;
@@ -207,6 +208,16 @@ pub(crate) fn new(
// System memory page required for sysmembar to properly flush into system memory.
let sysmem_flush = SysmemFlush::register(pdev.as_ref(), bar, spec.chipset)?;
+ let gsp_falcon = Falcon::<Gsp>::new(
+ pdev.as_ref(),
+ spec.chipset,
+ bar,
+ spec.chipset > Chipset::GA100,
+ )?;
+ gsp_falcon.clear_swgen0_intr(bar);
+
+ let _sec2_falcon = Falcon::<Sec2>::new(pdev.as_ref(), spec.chipset, bar, true)?;
+
Ok(pin_init!(Self {
spec,
bar: devres_bar,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 8ac04b8586e7314528e081464ed73ee615001e9b..808997bbe36d2fa1dc8b8940c1f9373d9bdbfb69 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -4,6 +4,7 @@
mod dma;
mod driver;
+mod falcon;
mod fb;
mod firmware;
mod gfw;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index b599e7ddad57ed8defe0324056571ba46b926cf6..b9fbc847c943b54557259ebc0d1cf3cb1bbc7a1b 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -7,6 +7,10 @@
#[macro_use]
mod macros;
+use crate::falcon::{
+ DmaTrfCmdSize, FalconCoreRev, FalconCoreRevSubversion, FalconFbifMemType, FalconFbifTarget,
+ FalconModSelAlgo, FalconSecurityModel, PeregrineCoreSelect,
+};
use crate::gpu::{Architecture, Chipset};
use kernel::prelude::*;
@@ -72,3 +76,138 @@ pub(crate) fn completed(self) -> bool {
self.progress() == 0xff
}
}
+
+/* FUSE */
+
+register!(NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION @ 0x00824100 {
+ 15:0 data as u16;
+});
+
+register!(NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION @ 0x00824140 {
+ 15:0 data as u16;
+});
+
+register!(NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION @ 0x008241c0 {
+ 15:0 data as u16;
+});
+
+/* PFALCON */
+
+register!(NV_PFALCON_FALCON_IRQSCLR @ +0x00000004 {
+ 4:4 halt as bool;
+ 6:6 swgen0 as bool;
+});
+
+register!(NV_PFALCON_FALCON_MAILBOX0 @ +0x00000040 {
+ 31:0 value as u32;
+});
+
+register!(NV_PFALCON_FALCON_MAILBOX1 @ +0x00000044 {
+ 31:0 value as u32;
+});
+
+register!(NV_PFALCON_FALCON_RM @ +0x00000084 {
+ 31:0 value as u32;
+});
+
+register!(NV_PFALCON_FALCON_HWCFG2 @ +0x000000f4 {
+ 10:10 riscv as bool;
+ 12:12 mem_scrubbing as bool;
+ 31:31 reset_ready as bool, "Signal indicating that reset is completed (GA102+)";
+});
+
+register!(NV_PFALCON_FALCON_CPUCTL @ +0x00000100 {
+ 1:1 startcpu as bool;
+ 4:4 halted as bool;
+ 6:6 alias_en as bool;
+});
+
+register!(NV_PFALCON_FALCON_BOOTVEC @ +0x00000104 {
+ 31:0 value as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMACTL @ +0x0000010c {
+ 0:0 require_ctx as bool;
+ 1:1 dmem_scrubbing as bool;
+ 2:2 imem_scrubbing as bool;
+ 6:3 dmaq_num as u8;
+ 7:7 secure_stat as bool;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFBASE @ +0x00000110 {
+ 31:0 base as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFMOFFS @ +0x00000114 {
+ 23:0 offs as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFCMD @ +0x00000118 {
+ 0:0 full as bool;
+ 1:1 idle as bool;
+ 3:2 sec as u8;
+ 4:4 imem as bool;
+ 5:5 is_write as bool;
+ 10:8 size as u8 ?=> DmaTrfCmdSize;
+ 14:12 ctxdma as u8;
+ 16:16 set_dmtag as u8;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFFBOFFS @ +0x0000011c {
+ 31:0 offs as u32;
+});
+
+register!(NV_PFALCON_FALCON_DMATRFBASE1 @ +0x00000128 {
+ 8:0 base as u16;
+});
+
+register!(NV_PFALCON_FALCON_HWCFG1 @ +0x0000012c {
+ 3:0 core_rev as u8 ?=> FalconCoreRev, "Core revision";
+ 5:4 security_model as u8 ?=> FalconSecurityModel, "Security model";
+ 7:6 core_rev_subversion as u8 ?=> FalconCoreRevSubversion, "Core revision subversion";
+});
+
+register!(NV_PFALCON_FALCON_CPUCTL_ALIAS @ +0x00000130 {
+ 1:1 startcpu as bool;
+});
+
+// Actually known as `NV_PSEC_FALCON_ENGINE` and `NV_PGSP_FALCON_ENGINE` depending on the falcon
+// instance.
+register!(NV_PFALCON_FALCON_ENGINE @ +0x000003c0 {
+ 0:0 reset as bool;
+});
+
+// TODO: this is an array of registers.
+register!(NV_PFALCON_FBIF_TRANSCFG @ +0x00000600 {
+ 1:0 target as u8 ?=> FalconFbifTarget;
+ 2:2 mem_type as bool => FalconFbifMemType;
+});
+
+register!(NV_PFALCON_FBIF_CTL @ +0x00000624 {
+ 7:7 allow_phys_no_ctx as bool;
+});
+
+register!(NV_PFALCON2_FALCON_MOD_SEL @ +0x00001180 {
+ 7:0 algo as u8 ?=> FalconModSelAlgo;
+});
+
+register!(NV_PFALCON2_FALCON_BROM_CURR_UCODE_ID @ +0x00001198 {
+ 7:0 ucode_id as u8;
+});
+
+register!(NV_PFALCON2_FALCON_BROM_ENGIDMASK @ +0x0000119c {
+ 31:0 value as u32;
+});
+
+// TODO: this is an array of registers.
+register!(NV_PFALCON2_FALCON_BROM_PARAADDR @ +0x00001210 {
+ 31:0 value as u32;
+});
+
+/* PRISCV */
+
+register!(NV_PRISCV_RISCV_BCR_CTRL @ +0x00001668 {
+ 0:0 valid as bool;
+ 4:4 core_select as bool => PeregrineCoreSelect;
+ 8:8 br_fetch as bool;
+});
--
2.49.0
* [PATCH v5 16/23] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (14 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 17/23] gpu: nova-core: vbios: Add base support for VBIOS construction and iteration Alexandre Courbot
` (8 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
FWSEC-FRTS is the first firmware we need to run on the GSP falcon in
order to initiate the GSP boot process. Introduce the structure that
describes it.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
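
Reader's note (not part of the commit): a minimal standalone sketch of how
`FalconUCodeDescV3::size()` decodes the header word, together with a mirror of
the `#[repr(C)]` layout to show the in-memory size of the descriptor. The
header value used in the example is hypothetical and chosen only to exercise
the mask and shift; the free function `desc_size` stands in for the method.

```rust
/// Mirror of the descriptor layout from this patch, for size checking only.
#[allow(dead_code)]
#[repr(C)]
struct FalconUCodeDescV3 {
    hdr: u32,
    stored_size: u32,
    pkc_data_offset: u32,
    interface_offset: u32,
    imem_phys_base: u32,
    imem_load_size: u32,
    imem_virt_base: u32,
    dmem_phys_base: u32,
    dmem_load_size: u32,
    engine_id_mask: u16,
    ucode_id: u8,
    signature_count: u8,
    signature_versions: u16,
    _reserved: u16,
}

const HDR_SIZE_SHIFT: u32 = 16;
const HDR_SIZE_MASK: u32 = 0xffff_0000;

/// Extracts the size field (bits 31:16) from a descriptor header word, as
/// `FalconUCodeDescV3::size()` does.
fn desc_size(hdr: u32) -> usize {
    ((hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT) as usize
}
```

With a hypothetical header word of 0x002c0301, `desc_size` returns 0x2c (44),
which is also the `size_of` of the nine `u32` fields plus the eight trailing
bytes in the layout above. The low 16 bits of the header do not affect the
result.
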
drivers/gpu/nova-core/firmware.rs | 45 +++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 4b8a38358a4f6da2a4d57f8db50ea9e788c3e4b5..2f4f5c7c7902a386a44bc9cf5eb6d46375fe0e5a 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -41,6 +41,51 @@ pub(crate) fn new(dev: &device::Device, chipset: Chipset, ver: &str) -> Result<F
}
}
+/// Structure used to describe some firmwares, notably FWSEC-FRTS.
+#[repr(C)]
+#[derive(Debug, Clone)]
+pub(crate) struct FalconUCodeDescV3 {
+ /// Header defined by `NV_BIT_FALCON_UCODE_DESC_HEADER_VDESC*` in OpenRM.
+ hdr: u32,
+ /// Stored size of the ucode after the header.
+ stored_size: u32,
+ /// Offset in `DMEM` at which the signature is expected to be found.
+ pub(crate) pkc_data_offset: u32,
+ /// Offset after the code segment at which the app headers are located.
+ pub(crate) interface_offset: u32,
+ /// Base address at which to load the code segment into `IMEM`.
+ pub(crate) imem_phys_base: u32,
+ /// Size in bytes of the code to copy into `IMEM`.
+ pub(crate) imem_load_size: u32,
+ /// Virtual `IMEM` address (i.e. `tag`) at which the code should start.
+ pub(crate) imem_virt_base: u32,
+ /// Base address at which to load the data segment into `DMEM`.
+ pub(crate) dmem_phys_base: u32,
+ /// Size in bytes of the data to copy into `DMEM`.
+ pub(crate) dmem_load_size: u32,
+ /// Mask of the falcon engines on which this firmware can run.
+ pub(crate) engine_id_mask: u16,
+ /// ID of the ucode used to infer a fuse register to validate the signature.
+ pub(crate) ucode_id: u8,
+ /// Number of signatures in this firmware.
+ pub(crate) signature_count: u8,
+ /// Versions of the signatures, used to infer a valid signature to use.
+ pub(crate) signature_versions: u16,
+ _reserved: u16,
+}
+
+// To be removed once that code is used.
+#[expect(dead_code)]
+impl FalconUCodeDescV3 {
+ /// Returns the size in bytes of the header.
+ pub(crate) fn size(&self) -> usize {
+ const HDR_SIZE_SHIFT: u32 = 16;
+ const HDR_SIZE_MASK: u32 = 0xffff0000;
+
+ ((self.hdr & HDR_SIZE_MASK) >> HDR_SIZE_SHIFT) as usize
+ }
+}
+
pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
impl<const N: usize> ModInfoBuilder<N> {
--
2.49.0
* [PATCH v5 17/23] gpu: nova-core: vbios: Add base support for VBIOS construction and iteration
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (15 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 16/23] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 18/23] gpu: nova-core: vbios: Add support to look up PMU table in FWSEC Alexandre Courbot
` (7 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Shirish Baskaran
From: Joel Fernandes <joelagnelf@nvidia.com>
Add support for navigating the VBIOS images required for extracting
ucode data for GSP to boot. Later patches will build on this.
Debug log messages will show the BIOS images:
[102141.013287] NovaCore: Found BIOS image at offset 0x0, size: 0xfe00, type: PciAt
[102141.080692] NovaCore: Found BIOS image at offset 0xfe00, size: 0x14800, type: Efi
[102141.098443] NovaCore: Found BIOS image at offset 0x24600, size: 0x5600, type: FwSec
[102141.415095] NovaCore: Found BIOS image at offset 0x29c00, size: 0x60800, type: FwSec
[applied feedback from Alex Courbot and Timur Tabi]
[applied changes related to code reorg, prints etc from Danilo Krummrich]
[acourbot@nvidia.com: fix clippy warnings, read_more() function]
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shirish Baskaran <sbaskaran@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
drivers/gpu/nova-core/firmware.rs | 4 +-
drivers/gpu/nova-core/gpu.rs | 4 +
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/vbios.rs | 681 +++++++++++++++++++++++++++++++++++++
4 files changed, 688 insertions(+), 2 deletions(-)
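For illustration only, the image-chain walk added by this patch can be sketched in userspace Rust over a plain byte slice. This is a minimal sketch under stated assumptions, not the driver code: `ImageInfo`, `walk_images` and `read_u16_le` are hypothetical names, the ROM is assumed to be in memory already rather than read through BAR0, and the ROM header signature check, the NPDE size override and the NBSI special case are all omitted.

```rust
/// Size of one ROM block; the PCIR image length is expressed in these units.
const BLOCK_SIZE: usize = 512;
/// Bit 7 of the PCIR indicator byte marks the last image.
const LAST_IMAGE_BIT_MASK: u8 = 0x80;

/// Hypothetical summary of one image, for illustration only.
#[derive(Debug, PartialEq)]
struct ImageInfo {
    offset: usize,
    size: usize,
    code_type: u8,
    last: bool,
}

/// Reads a little-endian u16 at `at`, or None if out of bounds.
fn read_u16_le(data: &[u8], at: usize) -> Option<u16> {
    let bytes = data.get(at..at.checked_add(2)?)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

/// Walks the image chain in `rom`; returns None on malformed data.
fn walk_images(rom: &[u8]) -> Option<Vec<ImageInfo>> {
    let mut images = Vec::new();
    let mut offset = 0usize;

    loop {
        let image = rom.get(offset..)?;

        // ROM header: pointer to the PCI Data Structure at offset 0x18.
        let pcir_offset = read_u16_le(image, 0x18)? as usize;
        let pcir = image.get(pcir_offset..pcir_offset + 24)?;
        if &pcir[0..4] != b"PCIR" && &pcir[0..4] != b"NPDS" {
            return None;
        }

        // Image length in 512-byte blocks lives at offset 16 of the PCIR.
        let blocks = read_u16_le(pcir, 16)? as usize;
        if blocks == 0 {
            return None;
        }

        let info = ImageInfo {
            offset,
            size: blocks * BLOCK_SIZE,
            code_type: pcir[20],
            last: pcir[21] & LAST_IMAGE_BIT_MASK != 0,
        };
        let last = info.last;

        // Advance to the next image, aligned up to 512 bytes.
        offset = (offset + info.size + BLOCK_SIZE - 1) & !(BLOCK_SIZE - 1);
        images.push(info);

        if last {
            return Some(images);
        }
    }
}
```

Calling `walk_images(&rom)` on a ROM dump would yield one entry per image, ending at the first image whose last-image bit is set. The driver differs in that it reads the ROM lazily in 32-bit words and prefers the NPDE size and last-image flag when an NPDE is present.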
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 2f4f5c7c7902a386a44bc9cf5eb6d46375fe0e5a..41f43a729ad3bf2c4acb6108f41e0905a6fac0df 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -44,6 +44,7 @@ pub(crate) fn new(dev: &device::Device, chipset: Chipset, ver: &str) -> Result<F
/// Structure used to describe some firmwares, notably FWSEC-FRTS.
#[repr(C)]
#[derive(Debug, Clone)]
+#[allow(dead_code)] // Temporary, will be removed in later patch.
pub(crate) struct FalconUCodeDescV3 {
/// Header defined by `NV_BIT_FALCON_UCODE_DESC_HEADER_VDESC*` in OpenRM.
hdr: u32,
@@ -74,10 +75,9 @@ pub(crate) struct FalconUCodeDescV3 {
_reserved: u16,
}
-// To be removed once that code is used.
-#[expect(dead_code)]
impl FalconUCodeDescV3 {
/// Returns the size in bytes of the header.
+ #[expect(dead_code)] // Temporary, will be removed in later patch.
pub(crate) fn size(&self) -> usize {
const HDR_SIZE_SHIFT: u32 = 16;
const HDR_SIZE_MASK: u32 = 0xffff0000;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index c9f7f604a5de6ea4eb85f061cae826302c1902c3..1c577d3eff8b32bbc45d7d2302c3e2246bef3b44 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -9,6 +9,7 @@
use crate::gfw;
use crate::regs;
use crate::util;
+use crate::vbios::Vbios;
use core::fmt;
macro_rules! define_chipset {
@@ -218,6 +219,9 @@ pub(crate) fn new(
let _sec2_falcon = Falcon::<Sec2>::new(pdev.as_ref(), spec.chipset, bar, true)?;
+ // Will be used in a later patch when fwsec firmware is needed.
+ let _bios = Vbios::new(pdev, bar)?;
+
Ok(pin_init!(Self {
spec,
bar: devres_bar,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 808997bbe36d2fa1dc8b8940c1f9373d9bdbfb69..de14f2e926361a4f954b1a8d0b95b0e985e54eec 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -11,6 +11,7 @@
mod gpu;
mod regs;
mod util;
+mod vbios;
pub(crate) const MODULE_NAME: &kernel::str::CStr = <LocalModule as kernel::ModuleMetadata>::NAME;
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
new file mode 100644
index 0000000000000000000000000000000000000000..aa6f19ddd51752ba453a1600ea002a198e27af5d
--- /dev/null
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -0,0 +1,681 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! VBIOS extraction and parsing.
+
+// To be removed when all code is used.
+#![expect(dead_code)]
+
+use crate::driver::Bar0;
+use core::convert::TryFrom;
+use kernel::error::Result;
+use kernel::num::PowerOfTwo;
+use kernel::pci;
+use kernel::prelude::*;
+
+/// The offset of the VBIOS ROM in the BAR0 space.
+const ROM_OFFSET: usize = 0x300000;
+/// The maximum length of the VBIOS ROM to scan into.
+const BIOS_MAX_SCAN_LEN: usize = 0x100000;
+/// The size to read ahead when parsing initial BIOS image headers.
+const BIOS_READ_AHEAD_SIZE: usize = 1024;
+/// The bit in the last image indicator byte for the PCI Data Structure that
+/// indicates the last image. Bits 0-6 are reserved, bit 7 is the last image bit.
+const LAST_IMAGE_BIT_MASK: u8 = 0x80;
+
+// PMU lookup table entry types. Used to locate PMU table entries
+// in the Fwsec image, corresponding to falcon ucodes.
+#[expect(dead_code)]
+const FALCON_UCODE_ENTRY_APPID_FIRMWARE_SEC_LIC: u8 = 0x05;
+#[expect(dead_code)]
+const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45;
+const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85;
+
+/// Vbios Reader for constructing the VBIOS data
+struct VbiosIterator<'a> {
+ pdev: &'a pci::Device,
+ bar0: &'a Bar0,
+ // VBIOS data vector: As BIOS images are scanned, they are added to this vector
+ // for reference or copying into other data structures. It holds the entire
+ // scanned contents of the VBIOS and grows progressively. Tracking the
+ // cumulative length read so far avoids re-reading contents that were
+ // already read; only the gap up to a new offset is read when extending
+ // the length.
+ data: KVec<u8>,
+ current_offset: usize, // Current offset for iterator
+ last_found: bool, // Whether the last image has been found
+}
+
+impl<'a> VbiosIterator<'a> {
+ fn new(pdev: &'a pci::Device, bar0: &'a Bar0) -> Result<Self> {
+ Ok(Self {
+ pdev,
+ bar0,
+ data: KVec::new(),
+ current_offset: 0,
+ last_found: false,
+ })
+ }
+
+ /// Read bytes from the ROM at the current end of the data vector
+ fn read_more(&mut self, len: usize) -> Result {
+ let current_len = self.data.len();
+ let start = ROM_OFFSET + current_len;
+
+ // Ensure length is a multiple of 4 for 32-bit reads
+ if len % core::mem::size_of::<u32>() != 0 {
+ dev_err!(
+ self.pdev.as_ref(),
+ "VBIOS read length {} is not a multiple of 4\n",
+ len
+ );
+ return Err(EINVAL);
+ }
+
+ self.data.reserve(len, GFP_KERNEL)?;
+ // Read ROM data bytes and push directly to vector
+ for addr in (start..start + len).step_by(core::mem::size_of::<u32>()) {
+ // Read 32-bit word from the VBIOS ROM
+ let word = self.bar0.try_read32(addr)?;
+
+ // Convert the u32 to a 4 byte array and push each byte
+ word.to_ne_bytes()
+ .iter()
+ .try_for_each(|&b| self.data.push(b, GFP_KERNEL))?;
+ }
+
+ Ok(())
+ }
+
+ /// Read bytes at a specific offset, filling any gap
+ fn read_more_at_offset(&mut self, offset: usize, len: usize) -> Result {
+ if offset > BIOS_MAX_SCAN_LEN {
+ dev_err!(self.pdev.as_ref(), "Error: exceeded BIOS scan limit.\n");
+ return Err(EINVAL);
+ }
+
+ // If offset is beyond current data size, fill the gap first
+ let current_len = self.data.len();
+ let gap_bytes = offset.saturating_sub(current_len);
+
+ // Now read the requested bytes at the offset
+ self.read_more(gap_bytes + len)
+ }
+
+ /// Read a BIOS image at a specific offset and create a BiosImage from it.
+ /// self.data is extended as needed and a new BiosImage is returned.
+ /// `context` is a string describing the operation, used for error reporting.
+ fn read_bios_image_at_offset(
+ &mut self,
+ offset: usize,
+ len: usize,
+ context: &str,
+ ) -> Result<BiosImage> {
+ let data_len = self.data.len();
+ if offset + len > data_len {
+ self.read_more_at_offset(offset, len).inspect_err(|e| {
+ dev_err!(
+ self.pdev.as_ref(),
+ "Failed to read more at offset {:#x}: {:?}\n",
+ offset,
+ e
+ )
+ })?;
+ }
+
+ BiosImage::new(self.pdev, &self.data[offset..offset + len]).inspect_err(|err| {
+ dev_err!(
+ self.pdev.as_ref(),
+ "Failed to {} at offset {:#x}: {:?}\n",
+ context,
+ offset,
+ err
+ )
+ })
+ }
+}
+
+impl<'a> Iterator for VbiosIterator<'a> {
+ type Item = Result<BiosImage>;
+
+ /// Iterate over all VBIOS images until the last image is detected or offset
+ /// exceeds scan limit.
+ fn next(&mut self) -> Option<Self::Item> {
+ if self.last_found {
+ return None;
+ }
+
+ if self.current_offset > BIOS_MAX_SCAN_LEN {
+ dev_err!(
+ self.pdev.as_ref(),
+ "Error: exceeded BIOS scan limit, stopping scan\n"
+ );
+ return None;
+ }
+
+ // Parse image headers first to get image size
+ let image_size = match self.read_bios_image_at_offset(
+ self.current_offset,
+ BIOS_READ_AHEAD_SIZE,
+ "parse initial BIOS image headers",
+ ) {
+ Ok(image) => image.image_size_bytes(),
+ Err(e) => return Some(Err(e)),
+ };
+
+ // Now create a new BiosImage with the full image data
+ let full_image = match self.read_bios_image_at_offset(
+ self.current_offset,
+ image_size,
+ "parse full BIOS image",
+ ) {
+ Ok(image) => image,
+ Err(e) => return Some(Err(e)),
+ };
+
+ self.last_found = full_image.is_last();
+
+ // Advance to next image (aligned to 512 bytes)
+ self.current_offset += image_size;
+ self.current_offset = PowerOfTwo::<usize>::new(512).align_up(self.current_offset);
+
+ Some(Ok(full_image))
+ }
+}
+
+pub(crate) struct Vbios {
+ fwsec_image: FwSecBiosImage,
+}
+
+impl Vbios {
+ /// Probe for VBIOS extraction
+ /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore.
+ pub(crate) fn new(pdev: &pci::Device, bar0: &Bar0) -> Result<Vbios> {
+ // Images to extract from iteration
+ let mut pci_at_image: Option<PciAtBiosImage> = None;
+ let mut first_fwsec_image: Option<FwSecBiosImage> = None;
+ let mut second_fwsec_image: Option<FwSecBiosImage> = None;
+
+ // Parse all VBIOS images in the ROM
+ for image_result in VbiosIterator::new(pdev, bar0)? {
+ let full_image = image_result?;
+
+ dev_dbg!(
+ pdev.as_ref(),
+ "Found BIOS image: size: {:#x}, type: {}, last: {}\n",
+ full_image.image_size_bytes(),
+ full_image.image_type_str(),
+ full_image.is_last()
+ );
+
+ // Get references to images we will need after the loop, in order to
+ // setup the falcon data offset.
+ match full_image {
+ BiosImage::PciAt(image) => {
+ pci_at_image = Some(image);
+ }
+ BiosImage::FwSec(image) => {
+ if first_fwsec_image.is_none() {
+ first_fwsec_image = Some(image);
+ } else {
+ second_fwsec_image = Some(image);
+ }
+ }
+ // For now we don't need to handle these
+ BiosImage::Efi(_image) => {}
+ BiosImage::Nbsi(_image) => {}
+ }
+ }
+
+ // Using all the images, setup the falcon data pointer in Fwsec.
+ // These are temporarily unused images and will be used in later patches.
+ if let (Some(second), Some(_first), Some(_pci_at)) =
+ (second_fwsec_image, first_fwsec_image, pci_at_image)
+ {
+ Ok(Vbios {
+ fwsec_image: second,
+ })
+ } else {
+ dev_err!(
+ pdev.as_ref(),
+ "Missing required images for falcon data setup, skipping\n"
+ );
+ Err(EINVAL)
+ }
+ }
+}
+
+/// PCI Data Structure as defined in PCI Firmware Specification
+#[derive(Debug, Clone)]
+#[repr(C)]
+struct PcirStruct {
+ /// PCI Data Structure signature ("PCIR" or "NPDS")
+ signature: [u8; 4],
+ /// PCI Vendor ID (e.g., 0x10DE for NVIDIA)
+ vendor_id: u16,
+ /// PCI Device ID
+ device_id: u16,
+ /// Device List Pointer
+ device_list_ptr: u16,
+ /// PCI Data Structure Length
+ pci_data_struct_len: u16,
+ /// PCI Data Structure Revision
+ pci_data_struct_rev: u8,
+ /// Class code (3 bytes, 0x03 for display controller)
+ class_code: [u8; 3],
+ /// Size of this image in 512-byte blocks
+ image_len: u16,
+ /// Revision Level of the Vendor's ROM
+ vendor_rom_rev: u16,
+ /// ROM image type (0x00 = PC-AT compatible, 0x03 = EFI, 0x70 = NBSI)
+ code_type: u8,
+ /// Last image indicator (0x00 = Not last image, 0x80 = Last image)
+ last_image: u8,
+ /// Maximum Run-time Image Length (units of 512 bytes)
+ max_runtime_image_len: u16,
+}
+
+impl PcirStruct {
+ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+ if data.len() < core::mem::size_of::<PcirStruct>() {
+ dev_err!(pdev.as_ref(), "Not enough data for PcirStruct\n");
+ return Err(EINVAL);
+ }
+
+ let mut signature = [0u8; 4];
+ signature.copy_from_slice(&data[0..4]);
+
+ // Signature should be "PCIR" (0x52494350) or "NPDS" (0x5344504e)
+ if &signature != b"PCIR" && &signature != b"NPDS" {
+ dev_err!(
+ pdev.as_ref(),
+ "Invalid signature for PcirStruct: {:?}\n",
+ signature
+ );
+ return Err(EINVAL);
+ }
+
+ let mut class_code = [0u8; 3];
+ class_code.copy_from_slice(&data[13..16]);
+
+ let image_len = u16::from_le_bytes([data[16], data[17]]);
+ if image_len == 0 {
+ dev_err!(pdev.as_ref(), "Invalid image length: 0\n");
+ return Err(EINVAL);
+ }
+
+ Ok(PcirStruct {
+ signature,
+ vendor_id: u16::from_le_bytes([data[4], data[5]]),
+ device_id: u16::from_le_bytes([data[6], data[7]]),
+ device_list_ptr: u16::from_le_bytes([data[8], data[9]]),
+ pci_data_struct_len: u16::from_le_bytes([data[10], data[11]]),
+ pci_data_struct_rev: data[12],
+ class_code,
+ image_len,
+ vendor_rom_rev: u16::from_le_bytes([data[18], data[19]]),
+ code_type: data[20],
+ last_image: data[21],
+ max_runtime_image_len: u16::from_le_bytes([data[22], data[23]]),
+ })
+ }
+
+ /// Check if this is the last image in the ROM
+ fn is_last(&self) -> bool {
+ self.last_image & LAST_IMAGE_BIT_MASK != 0
+ }
+
+ /// Calculate image size in bytes from 512-byte blocks
+ fn image_size_bytes(&self) -> usize {
+ self.image_len as usize * 512
+ }
+}
+
+/// PCI ROM Expansion Header as defined in PCI Firmware Specification.
+/// This header is at the beginning of every image in the set of
+/// images in the ROM. It contains a pointer to the PCI Data Structure
+/// which describes the image.
+/// For "NBSI" images (NoteBook System Information), the ROM
+/// header deviates from the standard and contains an offset to the
+/// NBSI image however we do not yet parse that in this module and keep
+/// it for future reference.
+#[derive(Debug, Clone, Copy)]
+#[expect(dead_code)]
+struct PciRomHeader {
+ /// 00h: Signature (0xAA55)
+ signature: u16,
+ /// 02h: Reserved bytes for processor architecture unique data (20 bytes)
+ reserved: [u8; 20],
+ /// 16h: NBSI Data Offset (NBSI-specific, offset from header to NBSI image)
+ nbsi_data_offset: Option<u16>,
+ /// 18h: Pointer to PCI Data Structure (offset from start of ROM image)
+ pci_data_struct_offset: u16,
+ /// 1Ah: Size of block (this is NBSI-specific)
+ size_of_block: Option<u32>,
+}
+
+impl PciRomHeader {
+ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+ if data.len() < 26 {
+ // Need at least 26 bytes to read pciDataStrucPtr (sizeOfBlock is optional)
+ return Err(EINVAL);
+ }
+
+ let signature = u16::from_le_bytes([data[0], data[1]]);
+
+ // Check for valid ROM signatures
+ match signature {
+ 0xAA55 | 0xBB77 | 0x4E56 => {}
+ _ => {
+ dev_err!(pdev.as_ref(), "ROM signature unknown {:#x}\n", signature);
+ return Err(EINVAL);
+ }
+ }
+
+ // Read the pointer to the PCI Data Structure at offset 0x18
+ let pci_data_struct_ptr = u16::from_le_bytes([data[24], data[25]]);
+
+ // Try to read optional fields if enough data
+ let mut size_of_block = None;
+ let mut nbsi_data_offset = None;
+
+ if data.len() >= 30 {
+ // Read size_of_block at offset 0x1A
+ size_of_block = Some(
+ (data[29] as u32) << 24
+ | (data[28] as u32) << 16
+ | (data[27] as u32) << 8
+ | (data[26] as u32),
+ );
+ }
+
+ // For NBSI images, try to read the nbsiDataOffset at offset 0x16
+ if data.len() >= 24 {
+ nbsi_data_offset = Some(u16::from_le_bytes([data[22], data[23]]));
+ }
+
+ Ok(PciRomHeader {
+ signature,
+ reserved: [0u8; 20],
+ pci_data_struct_offset: pci_data_struct_ptr,
+ size_of_block,
+ nbsi_data_offset,
+ })
+ }
+}
+
+/// NVIDIA PCI Data Extension Structure. This is similar to the
+/// PCI Data Structure, but is Nvidia-specific and is placed right after
+/// the PCI Data Structure. It contains some fields that are redundant
+/// with the PCI Data Structure, but are needed for traversing the
+/// BIOS images. It is expected to be present in all BIOS images except
+/// for NBSI images.
+#[derive(Debug, Clone)]
+#[repr(C)]
+struct NpdeStruct {
+ /// 00h: Signature ("NPDE")
+ signature: [u8; 4],
+ /// 04h: NVIDIA PCI Data Extension Revision
+ npci_data_ext_rev: u16,
+ /// 06h: NVIDIA PCI Data Extension Length
+ npci_data_ext_len: u16,
+ /// 08h: Sub-image Length (in 512-byte units)
+ subimage_len: u16,
+ /// 0Ah: Last image indicator flag
+ last_image: u8,
+}
+
+impl NpdeStruct {
+ fn new(pdev: &pci::Device, data: &[u8]) -> Option<Self> {
+ if data.len() < core::mem::size_of::<Self>() {
+ dev_dbg!(pdev.as_ref(), "Not enough data for NpdeStruct\n");
+ return None;
+ }
+
+ let mut signature = [0u8; 4];
+ signature.copy_from_slice(&data[0..4]);
+
+ // Signature should be "NPDE" (0x4544504E)
+ if &signature != b"NPDE" {
+ dev_dbg!(
+ pdev.as_ref(),
+ "Invalid signature for NpdeStruct: {:?}\n",
+ signature
+ );
+ return None;
+ }
+
+ let subimage_len = u16::from_le_bytes([data[8], data[9]]);
+ if subimage_len == 0 {
+ dev_dbg!(pdev.as_ref(), "Invalid subimage length: 0\n");
+ return None;
+ }
+
+ Some(NpdeStruct {
+ signature,
+ npci_data_ext_rev: u16::from_le_bytes([data[4], data[5]]),
+ npci_data_ext_len: u16::from_le_bytes([data[6], data[7]]),
+ subimage_len,
+ last_image: data[10],
+ })
+ }
+
+ /// Check if this is the last image in the ROM
+ fn is_last(&self) -> bool {
+ self.last_image & LAST_IMAGE_BIT_MASK != 0
+ }
+
+ /// Calculate image size in bytes from 512-byte blocks
+ fn image_size_bytes(&self) -> usize {
+ self.subimage_len as usize * 512
+ }
+
+ /// Try to find NPDE in the data, the NPDE is right after the PCIR.
+ fn find_in_data(
+ pdev: &pci::Device,
+ data: &[u8],
+ rom_header: &PciRomHeader,
+ pcir: &PcirStruct,
+ ) -> Option<Self> {
+ // Calculate the offset where NPDE might be located
+ // NPDE should be right after the PCIR structure, aligned to 16 bytes
+ let pcir_offset = rom_header.pci_data_struct_offset as usize;
+ let npde_start = (pcir_offset + pcir.pci_data_struct_len as usize + 0x0F) & !0x0F;
+
+ // Check if we have enough data
+ if npde_start + core::mem::size_of::<Self>() > data.len() {
+ dev_dbg!(pdev.as_ref(), "Not enough data for NPDE\n");
+ return None;
+ }
+
+ // Try to create NPDE from the data
+ NpdeStruct::new(pdev, &data[npde_start..])
+ }
+}
+
+// Use a macro to implement BiosImage enum and methods. This avoids having to
+// repeat each enum type when implementing functions like base() in BiosImage.
+macro_rules! bios_image {
+ (
+ $($variant:ident: $class:ident),* $(,)?
+ ) => {
+ // BiosImage enum with variants for each image type
+ enum BiosImage {
+ $($variant($class)),*
+ }
+
+ impl BiosImage {
+ /// Get a reference to the common BIOS image data regardless of type
+ fn base(&self) -> &BiosImageBase {
+ match self {
+ $(Self::$variant(img) => &img.base),*
+ }
+ }
+
+ /// Returns a string representing the type of BIOS image
+ fn image_type_str(&self) -> &'static str {
+ match self {
+ $(Self::$variant(_) => stringify!($variant)),*
+ }
+ }
+ }
+ }
+}
+
+impl BiosImage {
+ /// Check if this is the last image
+ fn is_last(&self) -> bool {
+ let base = self.base();
+
+ // For NBSI images (type == 0x70), return true as they're
+ // considered the last image
+ if matches!(self, Self::Nbsi(_)) {
+ return true;
+ }
+
+ // For other image types, check the NPDE first if available
+ if let Some(ref npde) = base.npde {
+ return npde.is_last();
+ }
+
+ // Otherwise, fall back to checking the PCIR last_image flag
+ base.pcir.is_last()
+ }
+
+ /// Get the image size in bytes
+ fn image_size_bytes(&self) -> usize {
+ let base = self.base();
+
+ // Prefer NPDE image size if available
+ if let Some(ref npde) = base.npde {
+ return npde.image_size_bytes();
+ }
+
+ // Otherwise, fall back to the PCIR image size
+ base.pcir.image_size_bytes()
+ }
+
+ /// Create a BiosImageBase from a byte slice and convert it to a BiosImage
+ /// which triggers the constructor of the specific BiosImage enum variant.
+ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+ let base = BiosImageBase::new(pdev, data)?;
+ let image = base.into_image().inspect_err(|e| {
+ dev_err!(pdev.as_ref(), "Failed to create BiosImage: {:?}\n", e);
+ })?;
+
+ Ok(image)
+ }
+}
+
+bios_image! {
+ PciAt: PciAtBiosImage, // PCI-AT compatible BIOS image
+ Efi: EfiBiosImage, // EFI (Extensible Firmware Interface)
+ Nbsi: NbsiBiosImage, // NBSI (NoteBook System Information)
+ FwSec: FwSecBiosImage, // FWSEC (Firmware Security)
+}
+
+struct PciAtBiosImage {
+ base: BiosImageBase,
+ // PCI-AT-specific fields can be added here in the future.
+}
+
+struct EfiBiosImage {
+ base: BiosImageBase,
+ // EFI-specific fields can be added here in the future.
+}
+
+struct NbsiBiosImage {
+ base: BiosImageBase,
+ // NBSI-specific fields can be added here in the future.
+}
+
+struct FwSecBiosImage {
+ base: BiosImageBase,
+ // FWSEC-specific fields can be added here in the future.
+}
+
+// Convert from BiosImageBase to BiosImage
+impl TryFrom<BiosImageBase> for BiosImage {
+ type Error = Error;
+
+ fn try_from(base: BiosImageBase) -> Result<Self> {
+ match base.pcir.code_type {
+ 0x00 => Ok(BiosImage::PciAt(PciAtBiosImage { base })),
+ 0x03 => Ok(BiosImage::Efi(EfiBiosImage { base })),
+ 0x70 => Ok(BiosImage::Nbsi(NbsiBiosImage { base })),
+ 0xE0 => Ok(BiosImage::FwSec(FwSecBiosImage { base })),
+ _ => Err(EINVAL),
+ }
+ }
+}
+
+/// BIOS Image structure containing various headers and reference
+/// fields common to all BIOS images. Each BiosImage type has a
+/// BiosImageBase type along with other image-specific fields.
+/// Note that Rust favors composition of types over inheritance.
+#[derive(Debug)]
+#[expect(dead_code)]
+struct BiosImageBase {
+ /// PCI ROM Expansion Header
+ rom_header: PciRomHeader,
+ /// PCI Data Structure
+ pcir: PcirStruct,
+ /// NVIDIA PCI Data Extension (optional)
+ npde: Option<NpdeStruct>,
+ /// Image data (includes ROM header and PCIR)
+ data: KVec<u8>,
+}
+
+impl BiosImageBase {
+ fn into_image(self) -> Result<BiosImage> {
+ BiosImage::try_from(self)
+ }
+
+ /// Creates a new BiosImageBase from raw byte data.
+ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+ // Ensure we have enough data for the ROM header
+ if data.len() < 26 {
+ dev_err!(pdev.as_ref(), "Not enough data for ROM header\n");
+ return Err(EINVAL);
+ }
+
+ // Parse the ROM header
+ let rom_header = PciRomHeader::new(pdev, &data[0..26])
+ .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to create PciRomHeader: {:?}\n", e))?;
+
+ // Get the PCI Data Structure using the pointer from the ROM header
+ let pcir_offset = rom_header.pci_data_struct_offset as usize;
+ let pcir_data = data
+ .get(pcir_offset..pcir_offset + core::mem::size_of::<PcirStruct>())
+ .ok_or(EINVAL)
+ .inspect_err(|_| {
+ dev_err!(
+ pdev.as_ref(),
+ "PCIR offset {:#x} out of bounds (data length: {})\n",
+ pcir_offset,
+ data.len()
+ );
+ dev_err!(
+ pdev.as_ref(),
+ "Consider reading more data for construction of BiosImage\n"
+ );
+ })?;
+
+ let pcir = PcirStruct::new(pdev, pcir_data)
+ .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to create PcirStruct: {:?}\n", e))?;
+
+ // Look for the optional NPDE structure (not expected in NBSI images, type 0x70)
+ let npde = NpdeStruct::find_in_data(pdev, data, &rom_header, &pcir);
+
+ // Create a copy of the data
+ let mut data_copy = KVec::new();
+ data_copy.extend_with(data.len(), 0, GFP_KERNEL)?;
+ data_copy.copy_from_slice(data);
+
+ Ok(BiosImageBase {
+ rom_header,
+ pcir,
+ npde,
+ data: data_copy,
+ })
+ }
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 18/23] gpu: nova-core: vbios: Add support to look up PMU table in FWSEC
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (16 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 17/23] gpu: nova-core: vbios: Add base support for VBIOS construction and iteration Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 19/23] gpu: nova-core: vbios: Add support for FWSEC ucode extraction Alexandre Courbot
` (6 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
From: Joel Fernandes <joelagnelf@nvidia.com>
The PMU table in the FWSEC image must be located in order to find the
start of the Falcon ucode in the same or another FWSEC image. Add support
for locating it.
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/vbios.rs | 179 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 177 insertions(+), 2 deletions(-)
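For illustration only, the BIT lookup introduced here can be sketched in userspace Rust over a plain byte slice. This is a minimal sketch under stated assumptions, not the driver code: `Token`, `find_pattern` and `find_bit_token` are hypothetical names, errors are reduced to `None`, and the check that a token entry is at least 6 bytes long is an extra guard added for this sketch.

```rust
/// Token ID of the Falcon data entry (value taken from the patch).
const BIT_TOKEN_ID_FALCON_DATA: u8 = 0x70;

/// Hypothetical decoded BIT token entry, for illustration only.
#[derive(Debug, PartialEq)]
struct Token {
    id: u8,
    data_version: u8,
    data_size: u16,
    data_offset: u16,
}

/// Returns the offset of the first occurrence of `needle` in `haystack`.
fn find_pattern(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, so reject an empty needle up front.
    if needle.is_empty() {
        return None;
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Locates the BIT header in `image`, then scans its tokens for `token_id`.
fn find_bit_token(image: &[u8], token_id: u8) -> Option<Token> {
    // Header ID 0xB8FF (little-endian) followed by the "BIT\0" signature.
    let bit_offset = find_pattern(image, &[0xff, 0xb8, b'B', b'I', b'T', 0x00])?;
    let header = image.get(bit_offset..bit_offset + 12)?;

    let header_size = header[8] as usize;
    let token_size = header[9] as usize;
    let token_entries = header[10] as usize;

    // Extra guard for this sketch: the fields decoded below span 6 bytes.
    if token_size < 6 {
        return None;
    }

    let tokens_start = bit_offset + header_size;
    for i in 0..token_entries {
        let start = tokens_start + i * token_size;
        let entry = image.get(start..start + token_size)?;
        if entry[0] == token_id {
            return Some(Token {
                id: entry[0],
                data_version: entry[1],
                data_size: u16::from_le_bytes([entry[2], entry[3]]),
                data_offset: u16::from_le_bytes([entry[4], entry[5]]),
            });
        }
    }

    None
}
```

Calling `find_bit_token(image, BIT_TOKEN_ID_FALCON_DATA)` on the PciAt image bytes would return the token whose `data_offset` points at the 4-byte Falcon data pointer, which the patch then reads as a little-endian `u32`.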
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index aa6f19ddd51752ba453a1600ea002a198e27af5d..312caf82d14588e21e0fa2bae0f8954d0efe3479 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -330,6 +330,111 @@ fn image_size_bytes(&self) -> usize {
}
}
+/// BIOS Information Table (BIT) Header
+/// This is the head of the BIT table, that is used to locate the Falcon data.
+/// The BIT table (with its header) is in the PciAtBiosImage and the falcon data
+/// it is pointing to is in the FwSecBiosImage.
+#[derive(Debug, Clone, Copy)]
+#[expect(dead_code)]
+struct BitHeader {
+ /// 0h: BIT Header Identifier (BMP=0x7FFF/BIT=0xB8FF)
+ id: u16,
+ /// 2h: BIT Header Signature ("BIT\0")
+ signature: [u8; 4],
+ /// 6h: Binary Coded Decimal Version, ex: 0x0100 is 1.00.
+ bcd_version: u16,
+ /// 8h: Size of BIT Header (in bytes)
+ header_size: u8,
+ /// 9h: Size of BIT Tokens (in bytes)
+ token_size: u8,
+ /// 0Ah: Number of token entries that follow
+ token_entries: u8,
+ /// 0Bh: BIT Header Checksum
+ checksum: u8,
+}
+
+impl BitHeader {
+ fn new(data: &[u8]) -> Result<Self> {
+ if data.len() < 12 {
+ return Err(EINVAL);
+ }
+
+ let mut signature = [0u8; 4];
+ signature.copy_from_slice(&data[2..6]);
+
+ // Check header ID and signature
+ let id = u16::from_le_bytes([data[0], data[1]]);
+ if id != 0xB8FF || &signature != b"BIT\0" {
+ return Err(EINVAL);
+ }
+
+ Ok(BitHeader {
+ id,
+ signature,
+ bcd_version: u16::from_le_bytes([data[6], data[7]]),
+ header_size: data[8],
+ token_size: data[9],
+ token_entries: data[10],
+ checksum: data[11],
+ })
+ }
+}
+
+/// BIT Token Entry: Records in the BIT table that follow the BIT header
+#[derive(Debug, Clone, Copy)]
+#[expect(dead_code)]
+struct BitToken {
+ /// 00h: Token identifier
+ id: u8,
+ /// 01h: Version of the token data
+ data_version: u8,
+ /// 02h: Size of token data in bytes
+ data_size: u16,
+ /// 04h: Offset to the token data
+ data_offset: u16,
+}
+
+// Define the token ID for the Falcon data
+const BIT_TOKEN_ID_FALCON_DATA: u8 = 0x70;
+
+impl BitToken {
+ /// Find a BIT token entry by BIT ID in a PciAtBiosImage
+ fn from_id(image: &PciAtBiosImage, token_id: u8) -> Result<Self> {
+ let header = &image.bit_header;
+
+ // Offset to the first token entry
+ let tokens_start = image.bit_offset + header.header_size as usize;
+
+ for i in 0..header.token_entries as usize {
+ let entry_offset = tokens_start + (i * header.token_size as usize);
+
+ // Make sure we don't go out of bounds
+ if entry_offset + header.token_size as usize > image.base.data.len() {
+ return Err(EINVAL);
+ }
+
+ // Check if this token has the requested ID
+ if image.base.data[entry_offset] == token_id {
+ return Ok(BitToken {
+ id: image.base.data[entry_offset],
+ data_version: image.base.data[entry_offset + 1],
+ data_size: u16::from_le_bytes([
+ image.base.data[entry_offset + 2],
+ image.base.data[entry_offset + 3],
+ ]),
+ data_offset: u16::from_le_bytes([
+ image.base.data[entry_offset + 4],
+ image.base.data[entry_offset + 5],
+ ]),
+ });
+ }
+ }
+
+ // Token not found
+ Err(ENOENT)
+ }
+}
+
/// PCI ROM Expansion Header as defined in PCI Firmware Specification.
/// This header is at the beginning of every image in the set of
/// images in the ROM. It contains a pointer to the PCI Data Structure
@@ -575,7 +680,8 @@ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
struct PciAtBiosImage {
base: BiosImageBase,
- // PCI-AT-specific fields can be added here in the future.
+ bit_header: BitHeader,
+ bit_offset: usize,
}
struct EfiBiosImage {
@@ -599,7 +705,7 @@ impl TryFrom<BiosImageBase> for BiosImage {
fn try_from(base: BiosImageBase) -> Result<Self> {
match base.pcir.code_type {
- 0x00 => Ok(BiosImage::PciAt(PciAtBiosImage { base })),
+ 0x00 => Ok(BiosImage::PciAt(base.try_into()?)),
0x03 => Ok(BiosImage::Efi(EfiBiosImage { base })),
0x70 => Ok(BiosImage::Nbsi(NbsiBiosImage { base })),
0xE0 => Ok(BiosImage::FwSec(FwSecBiosImage { base })),
@@ -679,3 +785,72 @@ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
})
}
}
+
+/// The PciAt BIOS image is typically the first BIOS image type found in the
+/// BIOS image chain. It contains the BIT header and the BIT tokens.
+impl PciAtBiosImage {
+ /// Find a byte pattern in a slice
+ fn find_byte_pattern(haystack: &[u8], needle: &[u8]) -> Result<usize> {
+ haystack
+ .windows(needle.len())
+ .position(|window| window == needle)
+ .ok_or(EINVAL)
+ }
+
+ /// Find the BIT header in the PciAtBiosImage
+ fn find_bit_header(data: &[u8]) -> Result<(BitHeader, usize)> {
+ let bit_pattern = [0xff, 0xb8, b'B', b'I', b'T', 0x00];
+ let bit_offset = Self::find_byte_pattern(data, &bit_pattern)?;
+ let bit_header = BitHeader::new(&data[bit_offset..])?;
+
+ Ok((bit_header, bit_offset))
+ }
+
+ /// Get a BIT token entry from the BIT table in the PciAtBiosImage
+ fn get_bit_token(&self, token_id: u8) -> Result<BitToken> {
+ BitToken::from_id(self, token_id)
+ }
+
+ /// Find the Falcon data pointer structure in the PciAtBiosImage
+ /// This is just a 4 byte structure that contains a pointer to the
+ /// Falcon data in the FWSEC image.
+ fn falcon_data_ptr(&self, pdev: &pci::Device) -> Result<u32> {
+ let token = self.get_bit_token(BIT_TOKEN_ID_FALCON_DATA)?;
+
+ // Make sure we don't go out of bounds
+ if token.data_offset as usize + 4 > self.base.data.len() {
+ return Err(EINVAL);
+ }
+
+ // read the 4 bytes at the offset specified in the token
+ let offset = token.data_offset as usize;
+ let bytes: [u8; 4] = self.base.data[offset..offset + 4].try_into().map_err(|_| {
+ dev_err!(pdev.as_ref(), "Failed to convert data slice to array\n");
+ EINVAL
+ })?;
+
+ let data_ptr = u32::from_le_bytes(bytes);
+
+ if (data_ptr as usize) < self.base.data.len() {
+ dev_err!(pdev.as_ref(), "Falcon data pointer out of bounds\n");
+ return Err(EINVAL);
+ }
+
+ Ok(data_ptr)
+ }
+}
+
+impl TryFrom<BiosImageBase> for PciAtBiosImage {
+ type Error = Error;
+
+ fn try_from(base: BiosImageBase) -> Result<Self> {
+ let data_slice = &base.data;
+ let (bit_header, bit_offset) = PciAtBiosImage::find_bit_header(data_slice)?;
+
+ Ok(PciAtBiosImage {
+ base,
+ bit_header,
+ bit_offset,
+ })
+ }
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 19/23] gpu: nova-core: vbios: Add support for FWSEC ucode extraction
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (17 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 18/23] gpu: nova-core: vbios: Add support to look up PMU table in FWSEC Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 20/23] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
` (5 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Shirish Baskaran
From: Joel Fernandes <joelagnelf@nvidia.com>
Using the support for navigating the VBIOS, add support to extract vBIOS
ucode data required for GSP to boot. The main data extracted from the
vBIOS is the FWSEC-FRTS firmware which runs on the GSP processor. This
firmware runs in high secure mode, and sets up the WPR2 (Write protected
region) before the Booter runs on the SEC2 processor.
Tested on my Ampere GA102 and boot is successful.
[applied changes by Alex Courbot for fwsec signatures]
[acourbot@nvidia.com: remove now-unneeded Devres acquisition]
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shirish Baskaran <sbaskaran@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
drivers/gpu/nova-core/firmware.rs | 2 -
drivers/gpu/nova-core/vbios.rs | 307 ++++++++++++++++++++++++++++++++++++--
2 files changed, 298 insertions(+), 11 deletions(-)
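
The offset rebasing done by `setup_falcon_data()` in the diff below may be easier to follow in isolation. The following is a rough standalone sketch, assuming the pointer read from the BIT token is relative to a layout where the PciAt image is immediately followed by the two FWSEC images. `FwsecLocation` and `rebase_offset` are hypothetical names, and the `checked_sub` calls are this sketch's choice (the patch uses plain subtraction and relies on the earlier check in `falcon_data_ptr()`).

```rust
/// Which FWSEC image a rebased offset falls into (hypothetical type).
#[derive(Debug, PartialEq)]
enum FwsecLocation {
    /// Offset relative to the start of the first FWSEC image.
    First(usize),
    /// Offset relative to the start of the second FWSEC image.
    Second(usize),
}

/// Rebases `ptr` so that it is relative to the FWSEC image it points into.
fn rebase_offset(ptr: usize, pci_at_len: usize, first_fwsec_len: usize) -> Option<FwsecLocation> {
    // Skip the PciAt image; a pointer inside it is rejected.
    let offset = ptr.checked_sub(pci_at_len)?;

    // If the remainder does not reach past the first FWSEC image, the
    // target lives there; otherwise it is in the second one.
    match offset.checked_sub(first_fwsec_len) {
        None => Some(FwsecLocation::First(offset)),
        Some(rest) => Some(FwsecLocation::Second(rest)),
    }
}
```

For example, with a 0x200-byte PciAt image and a 0x80-byte first FWSEC image, a pointer of 0x290 rebases to offset 0x10 in the second FWSEC image.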
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 41f43a729ad3bf2c4acb6108f41e0905a6fac0df..e5583925cb3b4353b521c68175f8cf0c2d6ce830 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -44,7 +44,6 @@ pub(crate) fn new(dev: &device::Device, chipset: Chipset, ver: &str) -> Result<F
/// Structure used to describe some firmwares, notably FWSEC-FRTS.
#[repr(C)]
#[derive(Debug, Clone)]
-#[allow(dead_code)] // Temporary, will be removed in later patch.
pub(crate) struct FalconUCodeDescV3 {
/// Header defined by `NV_BIT_FALCON_UCODE_DESC_HEADER_VDESC*` in OpenRM.
hdr: u32,
@@ -77,7 +76,6 @@ pub(crate) struct FalconUCodeDescV3 {
impl FalconUCodeDescV3 {
/// Returns the size in bytes of the header.
- #[expect(dead_code)] // Temporary, will be removed in later patch.
pub(crate) fn size(&self) -> usize {
const HDR_SIZE_SHIFT: u32 = 16;
const HDR_SIZE_MASK: u32 = 0xffff0000;
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index 312caf82d14588e21e0fa2bae0f8954d0efe3479..032ee510646af21f26f3f46c2d54a0f812c25978 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -6,7 +6,9 @@
#![expect(dead_code)]
use crate::driver::Bar0;
+use crate::firmware::FalconUCodeDescV3;
use core::convert::TryFrom;
+use kernel::device;
use kernel::error::Result;
use kernel::num::PowerOfTwo;
use kernel::pci;
@@ -192,8 +194,8 @@ impl Vbios {
pub(crate) fn new(pdev: &pci::Device, bar0: &Bar0) -> Result<Vbios> {
// Images to extract from iteration
let mut pci_at_image: Option<PciAtBiosImage> = None;
- let mut first_fwsec_image: Option<FwSecBiosImage> = None;
- let mut second_fwsec_image: Option<FwSecBiosImage> = None;
+ let mut first_fwsec_image: Option<FwSecBiosBuilder> = None;
+ let mut second_fwsec_image: Option<FwSecBiosBuilder> = None;
// Parse all VBIOS images in the ROM
for image_result in VbiosIterator::new(pdev, bar0)? {
@@ -227,12 +229,14 @@ pub(crate) fn new(pdev: &pci::Device, bar0: &Bar0) -> Result<Vbios> {
}
// Using all the images, setup the falcon data pointer in Fwsec.
- // These are temporarily unused images and will be used in later patches.
- if let (Some(second), Some(_first), Some(_pci_at)) =
+ if let (Some(mut second), Some(first), Some(pci_at)) =
(second_fwsec_image, first_fwsec_image, pci_at_image)
{
+ second
+ .setup_falcon_data(pdev, &pci_at, &first)
+ .inspect_err(|e| dev_err!(pdev.as_ref(), "Falcon data setup failed: {:?}\n", e))?;
Ok(Vbios {
- fwsec_image: second,
+ fwsec_image: second.build(pdev)?,
})
} else {
dev_err!(
@@ -242,6 +246,10 @@ pub(crate) fn new(pdev: &pci::Device, bar0: &Bar0) -> Result<Vbios> {
Err(EINVAL)
}
}
+
+ pub(crate) fn fwsec_image(&self) -> &FwSecBiosImage {
+ &self.fwsec_image
+ }
}
/// PCI Data Structure as defined in PCI Firmware Specification
@@ -675,7 +683,7 @@ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
PciAt: PciAtBiosImage, // PCI-AT compatible BIOS image
Efi: EfiBiosImage, // EFI (Extensible Firmware Interface)
Nbsi: NbsiBiosImage, // NBSI (Nvidia Bios System Interface)
- FwSec: FwSecBiosImage, // FWSEC (Firmware Security)
+ FwSec: FwSecBiosBuilder, // FWSEC (Firmware Security)
}
struct PciAtBiosImage {
@@ -694,9 +702,24 @@ struct NbsiBiosImage {
// NBSI-specific fields can be added here in the future.
}
-struct FwSecBiosImage {
+struct FwSecBiosBuilder {
base: BiosImageBase,
- // FWSEC-specific fields can be added here in the future.
+ /// These are temporary fields that are used during the construction of
+ /// the FwSecBiosBuilder. Once FwSecBiosBuilder is constructed, the
+ /// falcon_ucode_offset will be copied into a new FwSecBiosImage.
+ ///
+ /// The offset of the Falcon data from the start of Fwsec image
+ falcon_data_offset: Option<usize>,
+ /// The PmuLookupTable starts at the offset of the falcon data pointer
+ pmu_lookup_table: Option<PmuLookupTable>,
+ /// The offset of the Falcon ucode
+ falcon_ucode_offset: Option<usize>,
+}
+
+pub(crate) struct FwSecBiosImage {
+ base: BiosImageBase,
+ /// The offset of the Falcon ucode
+ falcon_ucode_offset: usize,
}
// Convert from BiosImageBase to BiosImage
@@ -708,7 +731,12 @@ fn try_from(base: BiosImageBase) -> Result<Self> {
0x00 => Ok(BiosImage::PciAt(base.try_into()?)),
0x03 => Ok(BiosImage::Efi(EfiBiosImage { base })),
0x70 => Ok(BiosImage::Nbsi(NbsiBiosImage { base })),
- 0xE0 => Ok(BiosImage::FwSec(FwSecBiosImage { base })),
+ 0xE0 => Ok(BiosImage::FwSec(FwSecBiosBuilder {
+ base,
+ falcon_data_offset: None,
+ pmu_lookup_table: None,
+ falcon_ucode_offset: None,
+ })),
_ => Err(EINVAL),
}
}
@@ -854,3 +882,264 @@ fn try_from(base: BiosImageBase) -> Result<Self> {
})
}
}
+
+/// The PmuLookupTableEntry structure is a single entry in the PmuLookupTable.
+/// See the PmuLookupTable description for more information.
+#[expect(dead_code)]
+struct PmuLookupTableEntry {
+ application_id: u8,
+ target_id: u8,
+ data: u32,
+}
+
+impl PmuLookupTableEntry {
+ fn new(data: &[u8]) -> Result<Self> {
+ if data.len() < 6 {
+ return Err(EINVAL);
+ }
+
+ Ok(PmuLookupTableEntry {
+ application_id: data[0],
+ target_id: data[1],
+ data: u32::from_le_bytes(data[2..6].try_into().map_err(|_| EINVAL)?),
+ })
+ }
+}
+
+/// The PmuLookupTable structure is used to find the PmuLookupTableEntry
+/// for a given application ID. The table of entries is pointed to by the falcon
+/// data pointer in the BIT table, and is used to locate the Falcon Ucode.
+#[expect(dead_code)]
+struct PmuLookupTable {
+ version: u8,
+ header_len: u8,
+ entry_len: u8,
+ entry_count: u8,
+ table_data: KVec<u8>,
+}
+
+impl PmuLookupTable {
+ fn new(pdev: &pci::Device, data: &[u8]) -> Result<Self> {
+ if data.len() < 4 {
+ return Err(EINVAL);
+ }
+
+ let header_len = data[1] as usize;
+ let entry_len = data[2] as usize;
+ let entry_count = data[3] as usize;
+
+ let required_bytes = header_len + (entry_count * entry_len);
+
+ if data.len() < required_bytes {
+ dev_err!(
+ pdev.as_ref(),
+ "PmuLookupTable data length less than required\n"
+ );
+ return Err(EINVAL);
+ }
+
+ // Create a copy of only the table data
+ let table_data = {
+ let mut ret = KVec::new();
+ ret.extend_from_slice(&data[header_len..required_bytes], GFP_KERNEL)?;
+ ret
+ };
+
+ // Debug logging of entries (dumps the table data to dmesg)
+ for i in (header_len..required_bytes).step_by(entry_len) {
+ dev_dbg!(
+ pdev.as_ref(),
+ "PMU entry: {:02x?}\n",
+ &data[i..][..entry_len]
+ );
+ }
+
+ Ok(PmuLookupTable {
+ version: data[0],
+ header_len: header_len as u8,
+ entry_len: entry_len as u8,
+ entry_count: entry_count as u8,
+ table_data,
+ })
+ }
+
+ fn lookup_index(&self, idx: u8) -> Result<PmuLookupTableEntry> {
+ if idx >= self.entry_count {
+ return Err(EINVAL);
+ }
+
+ let index = (idx as usize) * self.entry_len as usize;
+ PmuLookupTableEntry::new(&self.table_data[index..])
+ }
+
+ // find entry by type value
+ fn find_entry_by_type(&self, entry_type: u8) -> Result<PmuLookupTableEntry> {
+ for i in 0..self.entry_count {
+ let entry = self.lookup_index(i)?;
+ if entry.application_id == entry_type {
+ return Ok(entry);
+ }
+ }
+
+ Err(EINVAL)
+ }
+}
+
+/// The FwSecBiosBuilder structure contains the PMU table and the Falcon Ucode.
+/// The PMU table contains voltage/frequency tables as well as a pointer to the
+/// Falcon Ucode.
+impl FwSecBiosBuilder {
+ fn setup_falcon_data(
+ &mut self,
+ pdev: &pci::Device,
+ pci_at_image: &PciAtBiosImage,
+ first_fwsec: &FwSecBiosBuilder,
+ ) -> Result {
+ let mut offset = pci_at_image.falcon_data_ptr(pdev)? as usize;
+ let mut pmu_in_first_fwsec = false;
+
+ // The falcon data pointer assumes that the PciAt and FWSEC images
+ // are contiguous in memory. However, testing shows the EFI image sits in
+ // between them. Compensate by calculating the offset from the end of the
+ // PciAt image rather than from its start.
+ offset -= pci_at_image.base.data.len();
+
+ // The offset is now from the start of the first Fwsec image, however
+ // the offset points to a location in the second Fwsec image. Since
+ // the fwsec images are contiguous, subtract the length of the first Fwsec
+ // image from the offset to get the offset to the start of the second
+ // Fwsec image.
+ if offset < first_fwsec.base.data.len() {
+ pmu_in_first_fwsec = true;
+ } else {
+ offset -= first_fwsec.base.data.len();
+ }
+
+ self.falcon_data_offset = Some(offset);
+
+ if pmu_in_first_fwsec {
+ self.pmu_lookup_table =
+ Some(PmuLookupTable::new(pdev, &first_fwsec.base.data[offset..])?);
+ } else {
+ self.pmu_lookup_table = Some(PmuLookupTable::new(pdev, &self.base.data[offset..])?);
+ }
+
+ match self
+ .pmu_lookup_table
+ .as_ref()
+ .ok_or(EINVAL)?
+ .find_entry_by_type(FALCON_UCODE_ENTRY_APPID_FWSEC_PROD)
+ {
+ Ok(entry) => {
+ let mut ucode_offset = entry.data as usize;
+ ucode_offset -= pci_at_image.base.data.len();
+ if ucode_offset < first_fwsec.base.data.len() {
+ dev_err!(pdev.as_ref(), "Falcon Ucode offset not in second Fwsec.\n");
+ return Err(EINVAL);
+ }
+ ucode_offset -= first_fwsec.base.data.len();
+ self.falcon_ucode_offset = Some(ucode_offset);
+ }
+ Err(e) => {
+ dev_err!(
+ pdev.as_ref(),
+ "PmuLookupTableEntry not found, error: {:?}\n",
+ e
+ );
+ return Err(EINVAL);
+ }
+ }
+ Ok(())
+ }
+
+ /// Build the final FwSecBiosImage from this builder
+ fn build(self, pdev: &pci::Device) -> Result<FwSecBiosImage> {
+ let ret = FwSecBiosImage {
+ base: self.base,
+ falcon_ucode_offset: self.falcon_ucode_offset.ok_or(EINVAL)?,
+ };
+
+ if cfg!(debug_assertions) {
+ // Print the desc header for debugging
+ let desc = ret.header(pdev.as_ref())?;
+ dev_dbg!(pdev.as_ref(), "PmuLookupTableEntry desc: {:#?}\n", desc);
+ }
+
+ Ok(ret)
+ }
+}
+
+impl FwSecBiosImage {
+ /// Get the FwSec header (FalconUCodeDescV3)
+ pub(crate) fn header(&self, dev: &device::Device) -> Result<&FalconUCodeDescV3> {
+ // Get the falcon ucode offset that was found in setup_falcon_data
+ let falcon_ucode_offset = self.falcon_ucode_offset;
+
+ // Make sure the offset is within the data bounds
+ if falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>() > self.base.data.len() {
+ dev_err!(dev, "fwsec-frts header not contained within BIOS bounds\n");
+ return Err(ERANGE);
+ }
+
+ // Read the first 4 bytes to get the version
+ let hdr_bytes: [u8; 4] = self.base.data[falcon_ucode_offset..falcon_ucode_offset + 4]
+ .try_into()
+ .map_err(|_| EINVAL)?;
+ let hdr = u32::from_le_bytes(hdr_bytes);
+ let ver = (hdr & 0xff00) >> 8;
+
+ if ver != 3 {
+ dev_err!(dev, "invalid fwsec firmware version: {:?}\n", ver);
+ return Err(EINVAL);
+ }
+
+ // Return a reference to the FalconUCodeDescV3 structure.
+ // SAFETY: we have checked that `falcon_ucode_offset + size_of::<FalconUCodeDescV3>`
+ // is within the bounds of `data`. Also, this data vector is from ROM, and 'data' field
+ // in BiosImageBase is immutable after construction.
+ Ok(unsafe {
+ &*(self
+ .base
+ .data
+ .as_ptr()
+ .add(falcon_ucode_offset)
+ .cast::<FalconUCodeDescV3>())
+ })
+ }
+
+ /// Get the ucode data as a byte slice
+ pub(crate) fn ucode(&self, dev: &device::Device, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+ let falcon_ucode_offset = self.falcon_ucode_offset;
+
+ // The ucode data follows the descriptor
+ let ucode_data_offset = falcon_ucode_offset + desc.size();
+ let size = (desc.imem_load_size + desc.dmem_load_size) as usize;
+
+ // Get the data slice, checking bounds in a single operation
+ self.base
+ .data
+ .get(ucode_data_offset..ucode_data_offset + size)
+ .ok_or(ERANGE)
+ .inspect_err(|_| dev_err!(dev, "fwsec ucode data not contained within BIOS bounds\n"))
+ }
+
+ /// Get the signatures as a byte slice
+ pub(crate) fn sigs(&self, dev: &device::Device, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
+ const SIG_SIZE: usize = 96 * 4;
+
+ // The signatures data follows the descriptor
+ let sigs_data_offset = self.falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
+ let size = desc.signature_count as usize * SIG_SIZE;
+
+ // Make sure the data is within bounds
+ if sigs_data_offset + size > self.base.data.len() {
+ dev_err!(
+ dev,
+ "fwsec signatures data not contained within BIOS bounds\n"
+ );
+ return Err(ERANGE);
+ }
+
+ Ok(&self.base.data[sigs_data_offset..sigs_data_offset + size])
+ }
+}
--
2.49.0
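
As a companion to the `PmuLookupTable` code above, here is a small standalone sketch of the same table walk: a 4-byte header (version, header length, entry length, entry count) followed by entries carrying an application ID, a target ID and a little-endian data word. `Entry` and `find_entry` are hypothetical names, and the sketch assumes each entry must be at least 6 bytes long, since that is how many bytes are decoded per entry.

```rust
/// One table entry: application id, target id, little-endian data word.
#[derive(Debug, PartialEq)]
struct Entry {
    application_id: u8,
    target_id: u8,
    data: u32,
}

/// Finds the first entry whose application id is `app_id`.
///
/// Hypothetical helper; returns `None` on malformed tables as well.
fn find_entry(table: &[u8], app_id: u8) -> Option<Entry> {
    // Header: version, header length, entry length, entry count.
    let header = table.get(..4)?;
    let header_len = header[1] as usize;
    let entry_len = header[2] as usize;
    let entry_count = header[3] as usize;

    // An entry must hold at least the 6 bytes decoded below. This also
    // rules out a zero entry length, which `chunks` would reject.
    if entry_len < 6 {
        return None;
    }

    // Bounds-check the whole entry area once.
    let entries = table.get(header_len..header_len + entry_count * entry_len)?;

    entries
        .chunks(entry_len)
        .map(|e| Entry {
            application_id: e[0],
            target_id: e[1],
            data: u32::from_le_bytes([e[2], e[3], e[4], e[5]]),
        })
        .find(|e| e.application_id == app_id)
}
```

Validating the entry length up front means the per-entry decode cannot index out of bounds, whatever the table header claims.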
* [PATCH v5 20/23] gpu: nova-core: compute layout of the FRTS region
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (18 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 19/23] gpu: nova-core: vbios: Add support for FWSEC ucode extraction Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 21/23] gpu: nova-core: add types for patching firmware binaries Alexandre Courbot
` (4 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
FWSEC-FRTS is run with the desired address of the FRTS region as
parameter, which we need to compute depending on some hardware
parameters.
Do this in a `FbLayout` structure, that will be later extended to
describe more memory regions used to boot the GSP.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/fb.rs | 70 ++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/fb/hal.rs | 12 +++++-
drivers/gpu/nova-core/fb/hal/ga100.rs | 12 ++++++
drivers/gpu/nova-core/fb/hal/ga102.rs | 36 +++++++++++++++++
drivers/gpu/nova-core/fb/hal/tu102.rs | 16 ++++++++
drivers/gpu/nova-core/gpu.rs | 4 ++
drivers/gpu/nova-core/regs.rs | 76 +++++++++++++++++++++++++++++++++++
7 files changed, 224 insertions(+), 2 deletions(-)
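
To make the FRTS placement described above concrete, here is a minimal standalone sketch of the arithmetic `FbLayout::new()` performs in the diff below: the start of the VGA workspace is aligned down to 128 KiB, and a 1 MiB FRTS region is placed right beneath it. `align_down` and `frts_range` are hypothetical free functions (the patch uses `PowerOfTwo::align_down`), and the `checked_sub` is an assumption of this sketch rather than something the patch does.

```rust
use std::ops::Range;

const SZ_128K: u64 = 128 * 1024;
const SZ_1M: u64 = 1024 * 1024;

/// Aligns `value` down to `align`, which must be a power of two.
fn align_down(value: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    value & !(align - 1)
}

/// Returns the FRTS range placed right below the VGA workspace, or `None`
/// if there is not enough room for it.
fn frts_range(vga_workspace_start: u64) -> Option<Range<u64>> {
    let base = align_down(vga_workspace_start, SZ_128K).checked_sub(SZ_1M)?;
    Some(base..base + SZ_1M)
}
```

For a 4 GiB framebuffer whose VGA workspace starts 1 MiB before the end (0xfff00000), this yields an FRTS range of 0xffe00000..0xfff00000.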
diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index 308cd76edfee5a2e8a4cd979c20da2ce51cb16a5..39c7a7c506dd83776eb2b23f0bfb5c57a4d3f84f 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -1,12 +1,17 @@
// SPDX-License-Identifier: GPL-2.0
+use core::ops::Range;
+
+use kernel::num::PowerOfTwo;
use kernel::prelude::*;
+use kernel::sizes::*;
use kernel::types::ARef;
use kernel::{dev_warn, device};
use crate::dma::DmaObject;
use crate::driver::Bar0;
use crate::gpu::Chipset;
+use crate::regs;
mod hal;
@@ -64,3 +69,68 @@ pub(crate) fn unregister(self, bar: &Bar0) {
}
}
}
+
+/// Layout of the GPU framebuffer memory.
+///
+/// Contains ranges of GPU memory reserved for a given purpose during the GSP bootup process.
+#[derive(Debug)]
+#[expect(dead_code)]
+pub(crate) struct FbLayout {
+ pub fb: Range<u64>,
+ pub vga_workspace: Range<u64>,
+ pub frts: Range<u64>,
+}
+
+impl FbLayout {
+ /// Computes the FB layout.
+ pub(crate) fn new(chipset: Chipset, bar: &Bar0) -> Result<Self> {
+ let hal = hal::fb_hal(chipset);
+
+ let fb = {
+ let fb_size = hal.vidmem_size(bar);
+
+ 0..fb_size
+ };
+
+ let vga_workspace = {
+ let vga_base = {
+ const NV_PRAMIN_SIZE: u64 = SZ_1M as u64;
+ let base = fb.end - NV_PRAMIN_SIZE;
+
+ if hal.supports_display(bar) {
+ match regs::NV_PDISP_VGA_WORKSPACE_BASE::read(bar).vga_workspace_addr() {
+ Some(addr) => {
+ if addr < base {
+ const VBIOS_WORKSPACE_SIZE: u64 = SZ_128K as u64;
+
+ // Point workspace address to end of framebuffer.
+ fb.end - VBIOS_WORKSPACE_SIZE
+ } else {
+ addr
+ }
+ }
+ None => base,
+ }
+ } else {
+ base
+ }
+ };
+
+ vga_base..fb.end
+ };
+
+ let frts = {
+ const FRTS_DOWN_ALIGN: PowerOfTwo<u64> = PowerOfTwo::<u64>::new(SZ_128K as u64);
+ const FRTS_SIZE: u64 = SZ_1M as u64;
+ let frts_base = FRTS_DOWN_ALIGN.align_down(vga_workspace.start) - FRTS_SIZE;
+
+ frts_base..frts_base + FRTS_SIZE
+ };
+
+ Ok(Self {
+ fb,
+ vga_workspace,
+ frts,
+ })
+ }
+}
diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
index 23eab57eec9f524e066d3324eb7f5f2bf78481d2..2f914948bb9a9842fd00a4c6381420b74de81c3f 100644
--- a/drivers/gpu/nova-core/fb/hal.rs
+++ b/drivers/gpu/nova-core/fb/hal.rs
@@ -6,6 +6,7 @@
use crate::gpu::Chipset;
mod ga100;
+mod ga102;
mod tu102;
pub(crate) trait FbHal {
@@ -16,6 +17,12 @@ pub(crate) trait FbHal {
///
/// This might fail if the address is too large for the receiving register.
fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result;
+
+ /// Returns `true` if display is supported.
+ fn supports_display(&self, bar: &Bar0) -> bool;
+
+ /// Returns the VRAM size, in bytes.
+ fn vidmem_size(&self, bar: &Bar0) -> u64;
}
/// Returns the HAL corresponding to `chipset`.
@@ -24,8 +31,9 @@ pub(super) fn fb_hal(chipset: Chipset) -> &'static dyn FbHal {
match chipset {
TU102 | TU104 | TU106 | TU117 | TU116 => tu102::TU102_HAL,
- GA100 | GA102 | GA103 | GA104 | GA106 | GA107 | AD102 | AD103 | AD104 | AD106 | AD107 => {
- ga100::GA100_HAL
+ GA100 => ga100::GA100_HAL,
+ GA102 | GA103 | GA104 | GA106 | GA107 | AD102 | AD103 | AD104 | AD106 | AD107 => {
+ ga102::GA102_HAL
}
}
}
diff --git a/drivers/gpu/nova-core/fb/hal/ga100.rs b/drivers/gpu/nova-core/fb/hal/ga100.rs
index 7c10436c1c590d9b767c399b69370697fdf8d239..4827721c9860649601b274c3986470096e1fe9bc 100644
--- a/drivers/gpu/nova-core/fb/hal/ga100.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga100.rs
@@ -25,6 +25,10 @@ pub(super) fn write_sysmem_flush_page_ga100(bar: &Bar0, addr: u64) {
.write(bar);
}
+pub(super) fn display_enabled_ga100(bar: &Bar0) -> bool {
+ !regs::ga100::NV_FUSE_STATUS_OPT_DISPLAY::read(bar).display_disabled()
+}
+
/// Shift applied to the sysmem address before it is written into
/// `NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI`,
const FLUSH_SYSMEM_ADDR_SHIFT_HI: u32 = 40;
@@ -39,6 +43,14 @@ fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
Ok(())
}
+
+ fn supports_display(&self, bar: &Bar0) -> bool {
+ display_enabled_ga100(bar)
+ }
+
+ fn vidmem_size(&self, bar: &Bar0) -> u64 {
+ super::tu102::vidmem_size_gp102(bar)
+ }
}
const GA100: Ga100 = Ga100;
diff --git a/drivers/gpu/nova-core/fb/hal/ga102.rs b/drivers/gpu/nova-core/fb/hal/ga102.rs
new file mode 100644
index 0000000000000000000000000000000000000000..a73b77e3971513d088211a97ad8e50b00a9131f7
--- /dev/null
+++ b/drivers/gpu/nova-core/fb/hal/ga102.rs
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::fb::hal::FbHal;
+use crate::regs;
+
+fn vidmem_size_ga102(bar: &Bar0) -> u64 {
+ regs::NV_USABLE_FB_SIZE_IN_MB::read(bar).usable_fb_size()
+}
+
+struct Ga102;
+
+impl FbHal for Ga102 {
+ fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
+ super::ga100::read_sysmem_flush_page_ga100(bar)
+ }
+
+ fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
+ super::ga100::write_sysmem_flush_page_ga100(bar, addr);
+
+ Ok(())
+ }
+
+ fn supports_display(&self, bar: &Bar0) -> bool {
+ super::ga100::display_enabled_ga100(bar)
+ }
+
+ fn vidmem_size(&self, bar: &Bar0) -> u64 {
+ vidmem_size_ga102(bar)
+ }
+}
+
+const GA102: Ga102 = Ga102;
+pub(super) const GA102_HAL: &dyn FbHal = &GA102;
diff --git a/drivers/gpu/nova-core/fb/hal/tu102.rs b/drivers/gpu/nova-core/fb/hal/tu102.rs
index 048859f9fd9d6cfb630da0a8c3513becf3ab62d6..6f8ae58e9481017f1a81fb8e75fb24782e50a781 100644
--- a/drivers/gpu/nova-core/fb/hal/tu102.rs
+++ b/drivers/gpu/nova-core/fb/hal/tu102.rs
@@ -26,6 +26,14 @@ pub(super) fn write_sysmem_flush_page_gm107(bar: &Bar0, addr: u64) -> Result {
}
}
+pub(super) fn display_enabled_gm107(bar: &Bar0) -> bool {
+ !regs::gm107::NV_FUSE_STATUS_OPT_DISPLAY::read(bar).display_disabled()
+}
+
+pub(super) fn vidmem_size_gp102(bar: &Bar0) -> u64 {
+ regs::NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE::read(bar).usable_fb_size()
+}
+
struct Tu102;
impl FbHal for Tu102 {
@@ -36,6 +44,14 @@ fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64 {
fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result {
write_sysmem_flush_page_gm107(bar, addr)
}
+
+ fn supports_display(&self, bar: &Bar0) -> bool {
+ display_enabled_gm107(bar)
+ }
+
+ fn vidmem_size(&self, bar: &Bar0) -> u64 {
+ vidmem_size_gp102(bar)
+ }
}
const TU102: Tu102 = Tu102;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 1c577d3eff8b32bbc45d7d2302c3e2246bef3b44..413f1ab85b37926cdfd9a9c76167816b21d89adc 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -4,6 +4,7 @@
use crate::driver::Bar0;
use crate::falcon::{gsp::Gsp, sec2::Sec2, Falcon};
+use crate::fb::FbLayout;
use crate::fb::SysmemFlush;
use crate::firmware::{Firmware, FIRMWARE_VERSION};
use crate::gfw;
@@ -219,6 +220,9 @@ pub(crate) fn new(
let _sec2_falcon = Falcon::<Sec2>::new(pdev.as_ref(), spec.chipset, bar, true)?;
+ let fb_layout = FbLayout::new(spec.chipset, bar)?;
+ dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
+
// Will be used in a later patch when fwsec firmware is needed.
let _bios = Vbios::new(pdev, bar)?;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index b9fbc847c943b54557259ebc0d1cf3cb1bbc7a1b..54d4d37d6bf2c31947b965258d2733009c293a18 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -52,6 +52,27 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
23:0 adr_63_40 as u32;
});
+register!(NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE @ 0x00100ce0 {
+ 3:0 lower_scale as u8;
+ 9:4 lower_mag as u8;
+ 30:30 ecc_mode_enabled as bool;
+});
+
+impl NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE {
+ /// Returns the usable framebuffer size, in bytes.
+ pub(crate) fn usable_fb_size(self) -> u64 {
+ let size = ((self.lower_mag() as u64) << (self.lower_scale() as u64))
+ * kernel::sizes::SZ_1M as u64;
+
+ if self.ecc_mode_enabled() {
+ // Remove the amount of memory reserved for ECC (one per 16 units).
+ size / 16 * 15
+ } else {
+ size
+ }
+ }
+}
+
/* PGC6 */
register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {
@@ -77,6 +98,42 @@ pub(crate) fn completed(self) -> bool {
}
}
+register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_42 @ 0x001183a4 {
+ 31:0 value as u32;
+});
+
+register!(
+ NV_USABLE_FB_SIZE_IN_MB => NV_PGC6_AON_SECURE_SCRATCH_GROUP_42,
+ "Scratch group 42 register used as framebuffer size" {
+ 31:0 value as u32, "Usable framebuffer size, in megabytes";
+ }
+);
+
+impl NV_USABLE_FB_SIZE_IN_MB {
+ /// Returns the usable framebuffer size, in bytes.
+ pub(crate) fn usable_fb_size(self) -> u64 {
+ u64::from(self.value()) * kernel::sizes::SZ_1M as u64
+ }
+}
+
+/* PDISP */
+
+register!(NV_PDISP_VGA_WORKSPACE_BASE @ 0x00625f04 {
+ 3:3 status_valid as bool, "Set if the `addr` field is valid";
+ 31:8 addr as u32, "VGA workspace base address divided by 0x10000";
+});
+
+impl NV_PDISP_VGA_WORKSPACE_BASE {
+ /// Returns the base address of the VGA workspace, or `None` if none exists.
+ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
+ if self.status_valid() {
+ Some((self.addr() as u64) << 16)
+ } else {
+ None
+ }
+ }
+}
+
/* FUSE */
register!(NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION @ 0x00824100 {
@@ -211,3 +268,22 @@ pub(crate) fn completed(self) -> bool {
4:4 core_select as bool => PeregrineCoreSelect;
8:8 br_fetch as bool;
});
+
+// The modules below provide registers that are not identical on all supported chips. They should
+// only be used in HAL modules.
+
+pub(crate) mod gm107 {
+ /* FUSE */
+
+ register!(NV_FUSE_STATUS_OPT_DISPLAY @ 0x00021c04 {
+ 0:0 display_disabled as bool;
+ });
+}
+
+pub(crate) mod ga100 {
+ /* FUSE */
+
+ register!(NV_FUSE_STATUS_OPT_DISPLAY @ 0x00820c04 {
+ 0:0 display_disabled as bool;
+ });
+}
--
2.49.0
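
The field decoding added to `regs.rs` above can also be exercised outside the kernel. The sketch below re-implements it on raw `u32` register values, using the bit positions from the `register!` definitions in this patch. `field`, `usable_fb_size` and `vga_workspace_addr` are hypothetical free functions standing in for the accessors that the `register!` macro generates.

```rust
const SZ_1M: u64 = 1 << 20;

/// Extracts bits `hi..=lo` of `reg` (requires `lo <= hi <= 31`).
fn field(reg: u32, hi: u32, lo: u32) -> u32 {
    (reg >> lo) & (u32::MAX >> (31 - (hi - lo)))
}

/// Decodes a raw NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE value into bytes.
fn usable_fb_size(reg: u32) -> u64 {
    let lower_scale = field(reg, 3, 0) as u64;
    let lower_mag = field(reg, 9, 4) as u64;
    let ecc_mode_enabled = field(reg, 30, 30) != 0;

    let size = (lower_mag << lower_scale) * SZ_1M;
    if ecc_mode_enabled {
        // One sixteenth of the memory is reserved for ECC.
        size / 16 * 15
    } else {
        size
    }
}

/// Decodes a raw NV_PDISP_VGA_WORKSPACE_BASE value.
fn vga_workspace_addr(reg: u32) -> Option<u64> {
    if field(reg, 3, 3) != 0 {
        // The address field is stored divided by 0x10000.
        Some((field(reg, 31, 8) as u64) << 16)
    } else {
        None
    }
}
```

As an example, a raw value of 0x3d (magnitude 3, scale 13) decodes to 24 GiB without ECC, and to 15/16 of that when bit 30 is set.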
* [PATCH v5 21/23] gpu: nova-core: add types for patching firmware binaries
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (19 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 20/23] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 22/23] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
` (3 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
Some of the firmwares need to be patched at load-time with a signature.
Add a couple of types and traits that sub-modules can use to implement
this behavior, while ensuring that the correct kind of signature is
applied to the firmware.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/firmware.rs | 64 +++++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index e5583925cb3b4353b521c68175f8cf0c2d6ce830..32553b5142d6623bdaaa9d480fbff11069198606 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -3,11 +3,15 @@
//! Contains structures and functions dedicated to the parsing, building and patching of firmwares
//! to be loaded into a given execution unit.
+use core::marker::PhantomData;
+
use kernel::device;
use kernel::firmware;
use kernel::prelude::*;
use kernel::str::CString;
+use crate::dma::DmaObject;
+use crate::falcon::FalconFirmware;
use crate::gpu;
use crate::gpu::Chipset;
@@ -84,6 +88,66 @@ pub(crate) fn size(&self) -> usize {
}
}
+/// Trait implemented by types defining the signed state of a firmware.
+trait SignedState {}
+
+/// Type indicating that the firmware must be signed before it can be used.
+struct Unsigned;
+impl SignedState for Unsigned {}
+
+/// Type indicating that the firmware is signed and ready to be loaded.
+struct Signed;
+impl SignedState for Signed {}
+
+/// A [`DmaObject`] containing a specific microcode ready to be loaded into a falcon.
+///
+/// This is module-local and meant for sub-modules to use internally.
+///
+/// After construction, a firmware is [`Unsigned`], and must generally be patched with a signature
+/// before it can be loaded (with an exception for development hardware). The
+/// [`Self::patch_signature`] and [`Self::no_patch_signature`] methods are used to transition the
+/// firmware to its [`Signed`] state.
+struct FirmwareDmaObject<F: FalconFirmware, S: SignedState>(DmaObject, PhantomData<(F, S)>);
+
+/// Trait for signatures to be patched directly into a given firmware.
+///
+/// This is module-local and meant for sub-modules to use internally.
+trait FirmwareSignature<F: FalconFirmware>: AsRef<[u8]> {}
+
+#[expect(unused)]
+impl<F: FalconFirmware> FirmwareDmaObject<F, Unsigned> {
+ /// Patches the firmware at offset `sig_base_img` with `signature`.
+ fn patch_signature<S: FirmwareSignature<F>>(
+ mut self,
+ signature: &S,
+ sig_base_img: usize,
+ ) -> Result<FirmwareDmaObject<F, Signed>> {
+ let signature_bytes = signature.as_ref();
+ if sig_base_img + signature_bytes.len() > self.0.size() {
+ return Err(EINVAL);
+ }
+
+ // SAFETY: we are the only user of this object, so there cannot be any race.
+ let dst = unsafe { self.0.start_ptr_mut().add(sig_base_img) };
+
+ // SAFETY: `signature` and `dst` are valid, properly aligned, and do not overlap.
+ unsafe {
+ core::ptr::copy_nonoverlapping(signature_bytes.as_ptr(), dst, signature_bytes.len())
+ };
+
+ Ok(FirmwareDmaObject(self.0, PhantomData))
+ }
+
+ /// Mark the firmware as signed without patching it.
+ ///
+ /// This method is used to explicitly confirm that we do not need to sign the firmware, while
+ /// allowing us to continue as if it was. This is typically only needed for development
+ /// hardware.
+ fn no_patch_signature(self) -> FirmwareDmaObject<F, Signed> {
+ FirmwareDmaObject(self.0, PhantomData)
+ }
+}
+
pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
impl<const N: usize> ModInfoBuilder<N> {
--
2.49.0
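
The type-state idea behind `FirmwareDmaObject` above can be shown with a reduced standalone example. This sketch swaps the DMA object for a plain `Vec<u8>` and drops the `FalconFirmware` parameter; `Blob`, `new` and `bytes` are hypothetical names used only for illustration. The point is that the signature patching consumes the unsigned value, so only a `Blob<Signed>` can ever reach the code that loads it.

```rust
use std::marker::PhantomData;

/// Marker: the blob still needs a signature.
struct Unsigned;
/// Marker: the blob is ready to be loaded.
struct Signed;

/// A firmware blob tagged with its signed state at the type level.
struct Blob<S> {
    data: Vec<u8>,
    _state: PhantomData<S>,
}

impl Blob<Unsigned> {
    fn new(data: Vec<u8>) -> Self {
        Blob { data, _state: PhantomData }
    }

    /// Copies `signature` at `offset`, consuming the unsigned blob.
    /// Returns `None` if the signature does not fit.
    fn patch_signature(mut self, signature: &[u8], offset: usize) -> Option<Blob<Signed>> {
        let end = offset.checked_add(signature.len())?;
        self.data.get_mut(offset..end)?.copy_from_slice(signature);
        Some(Blob { data: self.data, _state: PhantomData })
    }
}

impl Blob<Signed> {
    /// Only a signed blob exposes its bytes for loading.
    fn bytes(&self) -> &[u8] {
        &self.data
    }
}
```

Calling `bytes()` on a `Blob<Unsigned>` fails to compile, which is the property the patch relies on to ensure that a firmware is not loaded before its signed state has been settled.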
* [PATCH v5 22/23] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (20 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 21/23] gpu: nova-core: add types for patching firmware binaries Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 23/23] gpu: nova-core: load and " Alexandre Courbot
` (2 subsequent siblings)
24 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot
The FWSEC firmware needs to be extracted from the VBIOS and patched with
the desired command, as well as the right signature. Do this so we are
ready to load and run this firmware into the GSP falcon and create the
FRTS region.
[joelagnelf@nvidia.com: give better names to FalconAppifHdrV1's fields]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/dma.rs | 3 -
drivers/gpu/nova-core/firmware.rs | 3 +-
drivers/gpu/nova-core/firmware/fwsec.rs | 395 ++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/gpu.rs | 15 +-
drivers/gpu/nova-core/vbios.rs | 30 ++-
5 files changed, 431 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/nova-core/dma.rs b/drivers/gpu/nova-core/dma.rs
index 4b063aaef65ec4e2f476fc5ce9dc25341b6660ca..1f1f8c378d8e2cf51edc772e7afe392e9c9c8831 100644
--- a/drivers/gpu/nova-core/dma.rs
+++ b/drivers/gpu/nova-core/dma.rs
@@ -2,9 +2,6 @@
//! Simple DMA object wrapper.
-// To be removed when all code is used.
-#![expect(dead_code)]
-
use core::ops::{Deref, DerefMut};
use kernel::device;
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 32553b5142d6623bdaaa9d480fbff11069198606..ae449a98dffb51e400db058c7368f0632b62f147 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -15,6 +15,8 @@
use crate::gpu;
use crate::gpu::Chipset;
+pub(crate) mod fwsec;
+
pub(crate) const FIRMWARE_VERSION: &str = "535.113.01";
/// Structure encapsulating the firmware blobs required for the GPU to operate.
@@ -114,7 +116,6 @@ impl SignedState for Signed {}
/// This is module-local and meant for sub-modules to use internally.
trait FirmwareSignature<F: FalconFirmware>: AsRef<[u8]> {}
-#[expect(unused)]
impl<F: FalconFirmware> FirmwareDmaObject<F, Unsigned> {
/// Patches the firmware at offset `sig_base_img` with `signature`.
fn patch_signature<S: FirmwareSignature<F>>(
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
new file mode 100644
index 0000000000000000000000000000000000000000..e02c051a682b790b1627ace42c7aaa214b8903df
--- /dev/null
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -0,0 +1,395 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! FWSEC is a High Secure firmware that is extracted from the BIOS and performs the first step of
+//! the GSP startup by creating the WPR2 memory region and copying critical areas of the VBIOS into
+//! it after authenticating them, ensuring they haven't been tampered with. It runs on the GSP
+//! falcon.
+//!
+//! Before being run, it needs to be patched in two areas:
+//!
+//! - The command to be run, as this firmware can perform several tasks ;
+//! - The ucode signature, so the GSP falcon can run FWSEC in HS mode.
+
+use core::marker::PhantomData;
+use core::ops::Deref;
+
+use kernel::device::{self, Device};
+use kernel::num::PowerOfTwo;
+use kernel::prelude::*;
+use kernel::transmute::FromBytes;
+
+use crate::dma::DmaObject;
+use crate::driver::Bar0;
+use crate::falcon::gsp::Gsp;
+use crate::falcon::{Falcon, FalconBromParams, FalconFirmware, FalconLoadParams, FalconLoadTarget};
+use crate::firmware::{FalconUCodeDescV3, FirmwareDmaObject, FirmwareSignature, Signed, Unsigned};
+use crate::vbios::Vbios;
+
+const NVFW_FALCON_APPIF_ID_DMEMMAPPER: u32 = 0x4;
+
+#[repr(C)]
+#[derive(Debug)]
+struct FalconAppifHdrV1 {
+ version: u8,
+ header_size: u8,
+ entry_size: u8,
+ entry_count: u8,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifHdrV1 {}
+
+#[repr(C, packed)]
+#[derive(Debug)]
+struct FalconAppifV1 {
+ id: u32,
+ dmem_base: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifV1 {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct FalconAppifDmemmapperV3 {
+ signature: u32,
+ version: u16,
+ size: u16,
+ cmd_in_buffer_offset: u32,
+ cmd_in_buffer_size: u32,
+ cmd_out_buffer_offset: u32,
+ cmd_out_buffer_size: u32,
+ nvf_img_data_buffer_offset: u32,
+ nvf_img_data_buffer_size: u32,
+ printf_buffer_hdr: u32,
+ ucode_build_time_stamp: u32,
+ ucode_signature: u32,
+ init_cmd: u32,
+ ucode_feature: u32,
+ ucode_cmd_mask0: u32,
+ ucode_cmd_mask1: u32,
+ multi_tgt_tbl: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FalconAppifDmemmapperV3 {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct ReadVbios {
+ ver: u32,
+ hdr: u32,
+ addr: u64,
+ size: u32,
+ flags: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for ReadVbios {}
+
+#[derive(Debug)]
+#[repr(C, packed)]
+struct FrtsRegion {
+ ver: u32,
+ hdr: u32,
+ addr: u32,
+ size: u32,
+ ftype: u32,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FrtsRegion {}
+
+const NVFW_FRTS_CMD_REGION_TYPE_FB: u32 = 2;
+
+#[repr(C, packed)]
+struct FrtsCmd {
+ read_vbios: ReadVbios,
+ frts_region: FrtsRegion,
+}
+// SAFETY: any byte sequence is valid for this struct.
+unsafe impl FromBytes for FrtsCmd {}
+
+const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS: u32 = 0x15;
+const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_SB: u32 = 0x19;
+
+/// Command for the [`FwsecFirmware`] to execute.
+pub(crate) enum FwsecCommand {
+ /// Asks [`FwsecFirmware`] to carve out the WPR2 area and place a verified copy of the VBIOS
+ /// image into it.
+ Frts { frts_addr: u64, frts_size: u64 },
+ /// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
+ #[expect(dead_code)]
+ Sb,
+}
+
+/// Size of the signatures used in FWSEC.
+const BCRT30_RSA3K_SIG_SIZE: usize = 384;
+
+/// A single signature that can be patched into a FWSEC image.
+#[repr(transparent)]
+pub(crate) struct Bcrt30Rsa3kSignature([u8; BCRT30_RSA3K_SIG_SIZE]);
+
+// SAFETY: A signature is just an array of bytes.
+unsafe impl FromBytes for Bcrt30Rsa3kSignature {}
+
+impl From<[u8; BCRT30_RSA3K_SIG_SIZE]> for Bcrt30Rsa3kSignature {
+ fn from(sig: [u8; BCRT30_RSA3K_SIG_SIZE]) -> Self {
+ Self(sig)
+ }
+}
+
+impl AsRef<[u8]> for Bcrt30Rsa3kSignature {
+ fn as_ref(&self) -> &[u8] {
+ &self.0
+ }
+}
+
+impl FirmwareSignature<FwsecFirmware> for Bcrt30Rsa3kSignature {}
+
+/// Reinterpret the area starting from `offset` in `fw` as an instance of `T` (which must implement
+/// [`FromBytes`]) and return a reference to it.
+///
+/// # Safety
+///
+/// Callers must ensure that the region of memory returned is not written for as long as the
+/// returned reference is alive.
+///
+/// TODO: Remove this and `transmute_mut` once we have a way to transmute objects implementing
+/// FromBytes, e.g.:
+/// https://lore.kernel.org/lkml/20250330234039.29814-1-christiansantoslima21@gmail.com/
+unsafe fn transmute<'a, 'b, T: Sized + FromBytes>(
+ fw: &'a DmaObject,
+ offset: usize,
+) -> Result<&'b T> {
+ if offset + size_of::<T>() > fw.size() {
+ return Err(EINVAL);
+ }
+ if (fw.start_ptr() as usize + offset) % align_of::<T>() != 0 {
+ return Err(EINVAL);
+ }
+
+ // SAFETY: we have checked that the pointer is properly aligned and that its pointed-to memory
+ // is large enough to contain an instance of `T`, which implements `FromBytes`.
+ Ok(unsafe { &*(fw.start_ptr().add(offset).cast::<T>()) })
+}
+
+/// Reinterpret the area starting from `offset` in `fw` as a mutable instance of `T` (which must
+/// implement [`FromBytes`]) and return a reference to it.
+///
+/// # Safety
+///
+/// Callers must ensure that the region of memory returned is not read or written for as long as
+/// the returned reference is alive.
+unsafe fn transmute_mut<'a, 'b, T: Sized + FromBytes>(
+ fw: &'a mut DmaObject,
+ offset: usize,
+) -> Result<&'b mut T> {
+ if offset + size_of::<T>() > fw.size() {
+ return Err(EINVAL);
+ }
+ if (fw.start_ptr_mut() as usize + offset) % align_of::<T>() != 0 {
+ return Err(EINVAL);
+ }
+
+ // SAFETY: we have checked that the pointer is properly aligned and that its pointed-to memory
+ // is large enough to contain an instance of `T`, which implements `FromBytes`.
+ Ok(unsafe { &mut *(fw.start_ptr_mut().add(offset).cast::<T>()) })
+}
+
+/// The FWSEC microcode, extracted from the BIOS and to be run on the GSP falcon.
+///
+/// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
+pub(crate) struct FwsecFirmware {
+ /// Descriptor of the firmware.
+ desc: FalconUCodeDescV3,
+ /// GPU-accessible DMA object containing the firmware.
+ ucode: FirmwareDmaObject<Self, Signed>,
+}
+
+impl FalconLoadParams for FwsecFirmware {
+ fn imem_load_params(&self) -> FalconLoadTarget {
+ FalconLoadTarget {
+ src_start: 0,
+ dst_start: self.desc.imem_phys_base,
+ len: self.desc.imem_load_size,
+ }
+ }
+
+ fn dmem_load_params(&self) -> FalconLoadTarget {
+ FalconLoadTarget {
+ src_start: self.desc.imem_load_size,
+ dst_start: self.desc.dmem_phys_base,
+ len: PowerOfTwo::<u32>::new(256).align_up(self.desc.dmem_load_size),
+ }
+ }
+
+ fn brom_params(&self) -> FalconBromParams {
+ FalconBromParams {
+ pkc_data_offset: self.desc.pkc_data_offset,
+ engine_id_mask: self.desc.engine_id_mask,
+ ucode_id: self.desc.ucode_id,
+ }
+ }
+
+ fn boot_addr(&self) -> u32 {
+ 0
+ }
+}
+
+impl Deref for FwsecFirmware {
+ type Target = DmaObject;
+
+ fn deref(&self) -> &Self::Target {
+ &self.ucode.0
+ }
+}
+
+impl FalconFirmware for FwsecFirmware {
+ type Target = Gsp;
+}
+
+impl FirmwareDmaObject<FwsecFirmware, Unsigned> {
+ fn new_fwsec(dev: &Device<device::Bound>, bios: &Vbios, cmd: FwsecCommand) -> Result<Self> {
+ let desc = bios.fwsec_image().header(dev)?;
+ let ucode = bios.fwsec_image().ucode(dev, desc)?;
+ let mut dma_object = DmaObject::from_data(dev, ucode)?;
+
+ let hdr_offset = (desc.imem_load_size + desc.interface_offset) as usize;
+ // SAFETY: `dma_object` is exclusively owned by this function and has not been shared
+ // with the hardware yet.
+ let hdr: &FalconAppifHdrV1 = unsafe { transmute(&dma_object, hdr_offset) }?;
+
+ if hdr.version != 1 {
+ return Err(EINVAL);
+ }
+
+ // Find the DMEM mapper section in the firmware.
+ for i in 0..hdr.entry_count as usize {
+ let app: &FalconAppifV1 =
+ // SAFETY: `dma_object` is exclusively owned by this function and has not been shared
+ // with the hardware yet.
+ unsafe {
+ transmute(
+ &dma_object,
+ hdr_offset + hdr.header_size as usize + i * hdr.entry_size as usize
+ )
+ }?;
+
+ if app.id != NVFW_FALCON_APPIF_ID_DMEMMAPPER {
+ continue;
+ }
+
+ // SAFETY: `dma_object` is exclusively owned by this function and has not been shared
+ // with the hardware yet.
+ let dmem_mapper: &mut FalconAppifDmemmapperV3 = unsafe {
+ transmute_mut(
+ &mut dma_object,
+ (desc.imem_load_size + app.dmem_base) as usize,
+ )
+ }?;
+
+ // SAFETY: `dma_object` is exclusively owned by this function and has not been shared
+ // with the hardware yet.
+ let frts_cmd: &mut FrtsCmd = unsafe {
+ transmute_mut(
+ &mut dma_object,
+ (desc.imem_load_size + dmem_mapper.cmd_in_buffer_offset) as usize,
+ )
+ }?;
+
+ frts_cmd.read_vbios = ReadVbios {
+ ver: 1,
+ hdr: size_of::<ReadVbios>() as u32,
+ addr: 0,
+ size: 0,
+ flags: 2,
+ };
+
+ dmem_mapper.init_cmd = match cmd {
+ FwsecCommand::Frts {
+ frts_addr,
+ frts_size,
+ } => {
+ frts_cmd.frts_region = FrtsRegion {
+ ver: 1,
+ hdr: size_of::<FrtsRegion>() as u32,
+ addr: (frts_addr >> 12) as u32,
+ size: (frts_size >> 12) as u32,
+ ftype: NVFW_FRTS_CMD_REGION_TYPE_FB,
+ };
+
+ NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS
+ }
+ FwsecCommand::Sb => NVFW_FALCON_APPIF_DMEMMAPPER_CMD_SB,
+ };
+
+ // Return early as we found and patched the DMEMMAPPER region.
+ return Ok(Self(dma_object, PhantomData));
+ }
+
+ Err(ENOTSUPP)
+ }
+}
+
+impl FwsecFirmware {
+ /// Extract the Fwsec firmware from `bios` and patch it to run on `falcon` with the `cmd`
+ /// command.
+ pub(crate) fn new(
+ falcon: &Falcon<Gsp>,
+ dev: &Device<device::Bound>,
+ bar: &Bar0,
+ bios: &Vbios,
+ cmd: FwsecCommand,
+ ) -> Result<Self> {
+ let ucode_dma = FirmwareDmaObject::<Self, _>::new_fwsec(dev, bios, cmd)?;
+
+ // Patch signature if needed.
+ let desc = bios.fwsec_image().header(dev)?;
+ let ucode_signed = if desc.signature_count != 0 {
+ let sig_base_img = (desc.imem_load_size + desc.pkc_data_offset) as usize;
+ let desc_sig_versions = desc.signature_versions as u32;
+ let reg_fuse_version =
+ falcon.signature_reg_fuse_version(bar, desc.engine_id_mask, desc.ucode_id)?;
+ dev_dbg!(
+ dev,
+ "desc_sig_versions: {:#x}, reg_fuse_version: {}\n",
+ desc_sig_versions,
+ reg_fuse_version
+ );
+ let signature_idx = {
+ let reg_fuse_version_bit = 1 << reg_fuse_version;
+
+ // Check if the fuse version is supported by the firmware.
+ if desc_sig_versions & reg_fuse_version_bit == 0 {
+ dev_err!(
+ dev,
+ "no matching signature: {:#x} {:#x}\n",
+ reg_fuse_version_bit,
+ desc_sig_versions,
+ );
+ return Err(EINVAL);
+ }
+
+ // `desc_sig_versions` has one bit set per included signature. Thus, the index of
+ // the signature to patch is the number of bits in `desc_sig_versions` set to `1`
+ // before `reg_fuse_version_bit`.
+
+ // Mask of the bits of `desc_sig_versions` to preserve.
+ let reg_fuse_version_mask = reg_fuse_version_bit.wrapping_sub(1);
+
+ (desc_sig_versions & reg_fuse_version_mask).count_ones() as usize
+ };
+
+ dev_dbg!(dev, "patching signature with index {}\n", signature_idx);
+ let signature = bios
+ .fwsec_image()
+ .sigs(dev, desc)
+ .and_then(|sigs| sigs.get(signature_idx).ok_or(EINVAL))?;
+
+ ucode_dma.patch_signature(signature, sig_base_img)?
+ } else {
+ ucode_dma.no_patch_signature()
+ };
+
+ Ok(FwsecFirmware {
+ desc: desc.clone(),
+ ucode: ucode_signed,
+ })
+ }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 413f1ab85b37926cdfd9a9c76167816b21d89adc..b0bc390b972b5e75538797acd6abffd013a8a159 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -6,6 +6,7 @@
use crate::falcon::{gsp::Gsp, sec2::Sec2, Falcon};
use crate::fb::FbLayout;
use crate::fb::SysmemFlush;
+use crate::firmware::fwsec::{FwsecCommand, FwsecFirmware};
use crate::firmware::{Firmware, FIRMWARE_VERSION};
use crate::gfw;
use crate::regs;
@@ -223,8 +224,18 @@ pub(crate) fn new(
let fb_layout = FbLayout::new(spec.chipset, bar)?;
dev_dbg!(pdev.as_ref(), "{:#x?}\n", fb_layout);
- // Will be used in a later patch when fwsec firmware is needed.
- let _bios = Vbios::new(pdev, bar)?;
+ let bios = Vbios::new(pdev, bar)?;
+
+ let _fwsec_frts = FwsecFirmware::new(
+ &gsp_falcon,
+ pdev.as_ref(),
+ bar,
+ &bios,
+ FwsecCommand::Frts {
+ frts_addr: fb_layout.frts.start,
+ frts_size: fb_layout.frts.end - fb_layout.frts.start,
+ },
+ )?;
Ok(pin_init!(Self {
spec,
diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs
index 032ee510646af21f26f3f46c2d54a0f812c25978..cac55d1534831775c14f3fed1e939ed89c7eba84 100644
--- a/drivers/gpu/nova-core/vbios.rs
+++ b/drivers/gpu/nova-core/vbios.rs
@@ -2,10 +2,8 @@
//! VBIOS extraction and parsing.
-// To be removed when all code is used.
-#![expect(dead_code)]
-
use crate::driver::Bar0;
+use crate::firmware::fwsec::Bcrt30Rsa3kSignature;
use crate::firmware::FalconUCodeDescV3;
use core::convert::TryFrom;
use kernel::device;
@@ -1124,15 +1122,18 @@ pub(crate) fn ucode(&self, dev: &device::Device, desc: &FalconUCodeDescV3) -> Re
}
/// Get the signatures as a byte slice
- pub(crate) fn sigs(&self, dev: &device::Device, desc: &FalconUCodeDescV3) -> Result<&[u8]> {
- const SIG_SIZE: usize = 96 * 4;
-
+ pub(crate) fn sigs(
+ &self,
+ dev: &device::Device,
+ desc: &FalconUCodeDescV3,
+ ) -> Result<&[Bcrt30Rsa3kSignature]> {
// The signatures data follows the descriptor
let sigs_data_offset = self.falcon_ucode_offset + core::mem::size_of::<FalconUCodeDescV3>();
- let size = desc.signature_count as usize * SIG_SIZE;
+ let sigs_size =
+ desc.signature_count as usize * core::mem::size_of::<Bcrt30Rsa3kSignature>();
// Make sure the data is within bounds
- if sigs_data_offset + size > self.base.data.len() {
+ if sigs_data_offset + sigs_size > self.base.data.len() {
dev_err!(
dev,
"fwsec signatures data not contained within BIOS bounds\n"
@@ -1140,6 +1141,17 @@ pub(crate) fn sigs(&self, dev: &device::Device, desc: &FalconUCodeDescV3) -> Res
return Err(ERANGE);
}
- Ok(&self.base.data[sigs_data_offset..sigs_data_offset + size])
+ // SAFETY: we checked that `data + sigs_data_offset + (signature_count *
+ // size_of::<Bcrt30Rsa3kSignature>())` is within the bounds of `data`.
+ Ok(unsafe {
+ core::slice::from_raw_parts(
+ self.base
+ .data
+ .as_ptr()
+ .add(sigs_data_offset)
+ .cast::<Bcrt30Rsa3kSignature>(),
+ desc.signature_count as usize,
+ )
+ })
}
}
--
2.49.0
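For reference, the signature selection done in `FwsecFirmware::new` above can be sketched as a standalone function. This is only an illustration under stated assumptions, not code from the patch: `signature_index` is a hypothetical name, and the use of `checked_shl` to reject out-of-range fuse versions is an addition that the patch does not make.

```rust
/// Returns the index of the signature to patch for `fuse_version`, or `None`
/// if `sig_versions` contains no signature for that version.
///
/// `sig_versions` has one bit set per signature included in the image, so the
/// index is the number of bits set below the bit for `fuse_version`.
/// (Hypothetical helper, written for illustration only.)
fn signature_index(sig_versions: u32, fuse_version: u32) -> Option<usize> {
    // `checked_shl` returns `None` for shifts >= 32 instead of overflowing.
    let fuse_bit = 1u32.checked_shl(fuse_version)?;

    // The firmware must ship a signature for this fuse version.
    if sig_versions & fuse_bit == 0 {
        return None;
    }

    // Mask of all bits below `fuse_bit`; `fuse_bit` is at least 1 here.
    let below = fuse_bit - 1;

    Some((sig_versions & below).count_ones() as usize)
}
```

With signatures present for versions 0, 2 and 3 (`0b1101`), version 2 maps to index 1 because exactly one signature (version 0) is stored before it, while version 1 has no signature at all.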
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 23/23] gpu: nova-core: load and run FWSEC-FRTS
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (21 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 22/23] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
@ 2025-06-12 14:01 ` Alexandre Courbot
2025-06-18 20:23 ` Danilo Krummrich
2025-06-17 20:14 ` [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
2025-06-18 20:14 ` Danilo Krummrich
24 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-12 14:01 UTC (permalink / raw)
To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel,
Alexandre Courbot, Lyude Paul
With all the required pieces in place, load FWSEC-FRTS onto the GSP
falcon, run it, and check that it successfully carved out the WPR2
region out of framebuffer memory.
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/falcon.rs | 3 --
drivers/gpu/nova-core/gpu.rs | 63 ++++++++++++++++++++++++++++++++++++++++-
drivers/gpu/nova-core/regs.rs | 15 ++++++++++
3 files changed, 77 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
index 25ed8ee30def3abcc43bcba965eb62f49d532604..486be64895a0250ae4263de708784a8fdf1d54b5 100644
--- a/drivers/gpu/nova-core/falcon.rs
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -2,9 +2,6 @@
//! Falcon microprocessor base support
-// To be removed when all code is used.
-#![expect(dead_code)]
-
use core::ops::Deref;
use core::time::Duration;
use hal::FalconHal;
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index b0bc390b972b5e75538797acd6abffd013a8a159..7af35ffa1d2f900e0117a55ec41312d16d718f67 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -226,7 +226,7 @@ pub(crate) fn new(
let bios = Vbios::new(pdev, bar)?;
- let _fwsec_frts = FwsecFirmware::new(
+ let fwsec_frts = FwsecFirmware::new(
&gsp_falcon,
pdev.as_ref(),
bar,
@@ -237,6 +237,67 @@ pub(crate) fn new(
},
)?;
+ // Check that the WPR2 region does not already exist - if it does, the GPU needs to be
+ // reset.
+ if regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() != 0 {
+ dev_err!(
+ pdev.as_ref(),
+ "WPR2 region already exists - GPU needs to be reset to proceed\n"
+ );
+ return Err(EBUSY);
+ }
+
+ // Reset falcon, load FWSEC-FRTS, and run it.
+ gsp_falcon
+ .reset(bar)
+ .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to reset GSP falcon: {:?}\n", e))?;
+ gsp_falcon
+ .dma_load(bar, &fwsec_frts)
+ .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to load FWSEC-FRTS: {:?}\n", e))?;
+ let (mbox0, _) = gsp_falcon
+ .boot(bar, Some(0), None)
+ .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to boot FWSEC-FRTS: {:?}\n", e))?;
+ if mbox0 != 0 {
+ dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
+ return Err(EIO);
+ }
+
+ // SCRATCH_0E contains FWSEC-FRTS' error code, if any.
+ let frts_status = regs::NV_PBUS_SW_SCRATCH_0E::read(bar).frts_err_code();
+ if frts_status != 0 {
+ dev_err!(
+ pdev.as_ref(),
+ "FWSEC-FRTS returned with error code {:#x}\n",
+ frts_status
+ );
+ return Err(EIO);
+ }
+
+ // Check the WPR2 has been created as we requested.
+ let (wpr2_lo, wpr2_hi) = (
+ (regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO::read(bar).lo_val() as u64) << 12,
+ (regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() as u64) << 12,
+ );
+ if wpr2_hi == 0 {
+ dev_err!(
+ pdev.as_ref(),
+ "WPR2 region not created after running FWSEC-FRTS\n"
+ );
+
+ return Err(EIO);
+ } else if wpr2_lo != fb_layout.frts.start {
+ dev_err!(
+ pdev.as_ref(),
+ "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
+ wpr2_lo,
+ fb_layout.frts.start,
+ );
+ return Err(EIO);
+ }
+
+ dev_dbg!(pdev.as_ref(), "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
+ dev_dbg!(pdev.as_ref(), "GPU instance built\n");
+
Ok(pin_init!(Self {
spec,
bar: devres_bar,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 54d4d37d6bf2c31947b965258d2733009c293a18..2a2d5610e552780957bcf00e0da1ec4cd3ac85d2 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -42,6 +42,13 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
}
}
+/* PBUS */
+
+// TODO: this is an array of registers.
+register!(NV_PBUS_SW_SCRATCH_0E@0x00001438 {
+ 31:16 frts_err_code as u16;
+});
+
/* PFB */
register!(NV_PFB_NISO_FLUSH_SYSMEM_ADDR @ 0x00100c10 {
@@ -73,6 +80,14 @@ pub(crate) fn usable_fb_size(self) -> u64 {
}
}
+register!(NV_PFB_PRI_MMU_WPR2_ADDR_LO@0x001fa824 {
+ 31:4 lo_val as u32;
+});
+
+register!(NV_PFB_PRI_MMU_WPR2_ADDR_HI@0x001fa828 {
+ 31:4 hi_val as u32;
+});
+
/* PGC6 */
register!(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK @ 0x00118128 {
--
2.49.0
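For reference, the WPR2 check performed after running FWSEC-FRTS can be sketched outside of the driver. This is a sketch under assumptions, not the driver code: it assumes that a field declared as `31:4` is the raw register value shifted right by 4, and `Wpr2Error`, `field_31_4` and `wpr2_range` are hypothetical names.

```rust
/// Failure cases mirroring the two checks in the patch (hypothetical type).
#[derive(Debug, PartialEq)]
enum Wpr2Error {
    /// The high address is still zero: no WPR2 region was created.
    NotCreated,
    /// The region exists but does not start where it was requested.
    UnexpectedStart { found: u64, expected: u64 },
}

/// Extracts bits 31:4 of a raw 32-bit register value.
/// Assumption: this matches what the `register!` field accessor returns.
const fn field_31_4(raw: u32) -> u32 {
    raw >> 4
}

/// Decodes the raw WPR2 LO/HI register values into byte addresses and checks
/// them against the requested start of the FRTS region.
fn wpr2_range(raw_lo: u32, raw_hi: u32, expected_start: u64) -> Result<(u64, u64), Wpr2Error> {
    // The fields are expressed in 4KiB units, hence the shift by 12.
    let lo = (field_31_4(raw_lo) as u64) << 12;
    let hi = (field_31_4(raw_hi) as u64) << 12;

    if hi == 0 {
        return Err(Wpr2Error::NotCreated);
    }
    if lo != expected_start {
        return Err(Wpr2Error::UnexpectedStart { found: lo, expected: expected_start });
    }

    Ok((lo, hi))
}
```

Under these assumptions, raw values of `0x00ff_c000` and `0x00ff_ce00` decode to the `0xffc00000-0xffce0000` range shown in the sample log of the cover letter.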
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
@ 2025-06-12 15:07 ` Boqun Feng
2025-06-12 20:00 ` John Hubbard
2025-06-13 14:16 ` Alexandre Courbot
2025-06-14 17:31 ` Boqun Feng
` (2 subsequent siblings)
3 siblings, 2 replies; 58+ messages in thread
From: Boqun Feng @ 2025-06-12 15:07 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
> Introduce the `num` module, featuring the `PowerOfTwo` unsigned wrapper
> that guarantees (at build-time or runtime) that a value is a power of
> two.
>
> Such a property is often useful to maintain. In the context of the
> kernel, powers of two are often used to align addresses or sizes up and
> down, or to create masks. These operations are provided by this type.
>
> It is introduced to be first used by the nova-core driver.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> rust/kernel/lib.rs | 1 +
> rust/kernel/num.rs | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 174 insertions(+)
>
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 6b4774b2b1c37f4da1866e993be6230bc6715841..2955f65da1278dd4cba1e4272ff178b8211a892c 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -89,6 +89,7 @@
> pub mod mm;
> #[cfg(CONFIG_NET)]
> pub mod net;
> +pub mod num;
> pub mod of;
> #[cfg(CONFIG_PM_OPP)]
> pub mod opp;
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a
> --- /dev/null
> +++ b/rust/kernel/num.rs
> @@ -0,0 +1,173 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Numerical and binary utilities for primitive types.
> +
> +use crate::build_assert;
> +use core::borrow::Borrow;
> +use core::fmt::Debug;
> +use core::hash::Hash;
> +use core::ops::Deref;
> +
> +/// An unsigned integer which is guaranteed to be a power of 2.
> +#[derive(Debug, Clone, Copy)]
> +#[repr(transparent)]
> +pub struct PowerOfTwo<T>(T);
> +
> +macro_rules! power_of_two_impl {
> + ($($t:ty),+) => {
> + $(
> + impl PowerOfTwo<$t> {
> + /// Validates that `v` is a power of two at build-time, and returns it wrapped into
> + /// `PowerOfTwo`.
> + ///
> + /// A build error is triggered if `v` cannot be asserted to be a power of two.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// let v = PowerOfTwo::<u32>::new(256);
> + /// assert_eq!(v.value(), 256);
> + /// ```
> + #[inline(always)]
> + pub const fn new(v: $t) -> Self {
Then this function should be unsafe, because an invalid `v` can create
an invalid PowerOfTwo.
> + build_assert!(v.count_ones() == 1);
> + Self(v)
> + }
> +
> + /// Validates that `v` is a power of two at runtime, and returns it wrapped into
> + /// `PowerOfTwo`.
> + ///
> + /// `None` is returned if `v` was not a power of two.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// assert_eq!(PowerOfTwo::<u32>::try_new(16).unwrap().value(), 16);
> + /// assert_eq!(PowerOfTwo::<u32>::try_new(15), None);
> + /// ```
> + #[inline(always)]
> + pub const fn try_new(v: $t) -> Option<Self> {
> + match v.count_ones() {
> + 1 => Some(Self(v)),
> + _ => None,
> + }
> + }
> +
> + /// Returns the value of this instance.
> + ///
> + /// It is guaranteed to be a power of two.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// let v = PowerOfTwo::<u32>::new(256);
> + /// assert_eq!(v.value(), 256);
> + /// ```
> + #[inline(always)]
> + pub const fn value(&self) -> $t {
> + self.0
> + }
> +
> + /// Returns the mask corresponding to `self.value() - 1`.
> + #[inline(always)]
> + pub const fn mask(&self) -> $t {
> + self.0.wrapping_sub(1)
> + }
> +
> + /// Aligns `value` down to `self`.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_down(0x4fff), 0x4000);
> + /// ```
> + #[inline(always)]
> + pub const fn align_down(self, value: $t) -> $t {
I'm late to party, but could we instead implement:
pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
value & !((1 << shift) - 1)
}
pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
let mask = (1 << shift) - 1;
value.wrapping_add(mask) & !mask
}
? It's much harder to pass an invalid alignment with this.
Regards,
Boqun
> + value & !self.mask()
> + }
> +
> + /// Aligns `value` up to `self`.
> + ///
> + /// Wraps around to `0` if the requested alignment pushes the result above the
> + /// type's limits.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x4fff), 0x5000);
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x4000), 0x4000);
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x0), 0x0);
> + /// assert_eq!(PowerOfTwo::<u16>::new(0x100).align_up(0xffff), 0x0);
> + /// ```
> + #[inline(always)]
> + pub const fn align_up(self, value: $t) -> $t {
> + self.align_down(value.wrapping_add(self.mask()))
> + }
> + }
> + )+
> + };
> +}
> +
> +power_of_two_impl!(usize, u8, u16, u32, u64, u128);
> +
> +impl<T> Deref for PowerOfTwo<T> {
> + type Target = T;
> +
> + fn deref(&self) -> &Self::Target {
> + &self.0
> + }
> +}
> +
> +impl<T> PartialEq for PowerOfTwo<T>
> +where
> + T: PartialEq,
> +{
> + fn eq(&self, other: &Self) -> bool {
> + self.0 == other.0
> + }
> +}
> +
> +impl<T> Eq for PowerOfTwo<T> where T: Eq {}
> +
> +impl<T> PartialOrd for PowerOfTwo<T>
> +where
> + T: PartialOrd,
> +{
> + fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
> + self.0.partial_cmp(&other.0)
> + }
> +}
> +
> +impl<T> Ord for PowerOfTwo<T>
> +where
> + T: Ord,
> +{
> + fn cmp(&self, other: &Self) -> core::cmp::Ordering {
> + self.0.cmp(&other.0)
> + }
> +}
> +
> +impl<T> Hash for PowerOfTwo<T>
> +where
> + T: Hash,
> +{
> + fn hash<H: core::hash::Hasher>(&self, state: &mut H) {
> + self.0.hash(state);
> + }
> +}
> +
> +impl<T> Borrow<T> for PowerOfTwo<T> {
> + fn borrow(&self) -> &T {
> + &self.0
> + }
> +}
>
> --
> 2.49.0
>
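For reference, the two approaches discussed in this message can be compared in a standalone sketch. The code below is illustrative only and exists in neither the patch nor the kernel: it uses `u32` throughout, and `align_down`, `align_up`, `round_down_shift` and `round_up_shift` are hypothetical free functions. The shift-based variants use `checked_shl`, since a plain `1 << shift` overflows when `shift` is 32 or more.

```rust
/// Aligns `value` down to `align`, which the caller must guarantee is a power
/// of two (this is the guarantee that `PowerOfTwo` encodes in the type).
const fn align_down(value: u32, align: u32) -> u32 {
    value & !align.wrapping_sub(1)
}

/// Aligns `value` up to `align` (a power of two), wrapping around to 0 if the
/// result does not fit in `u32`.
const fn align_up(value: u32, align: u32) -> u32 {
    let mask = align.wrapping_sub(1);
    value.wrapping_add(mask) & !mask
}

/// Shift-based variant: every `shift` below 32 yields a valid power-of-two
/// alignment, so only out-of-range shifts need to be rejected.
fn round_down_shift(value: u32, shift: u32) -> Option<u32> {
    let align = 1u32.checked_shl(shift)?;
    Some(align_down(value, align))
}

/// Shift-based counterpart of `align_up`.
fn round_up_shift(value: u32, shift: u32) -> Option<u32> {
    let align = 1u32.checked_shl(shift)?;
    Some(align_up(value, align))
}
```

For a given power of two, both forms compute the same result: `align_up(0x4fff, 0x1000)` and `round_up_shift(0x4fff, 12)` both yield `0x5000`. The difference lies only in where an invalid alignment is ruled out, either by the argument type or by the range of the shift.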
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 15:07 ` Boqun Feng
@ 2025-06-12 20:00 ` John Hubbard
2025-06-12 20:05 ` Boqun Feng
2025-06-13 14:16 ` Alexandre Courbot
1 sibling, 1 reply; 58+ messages in thread
From: John Hubbard @ 2025-06-12 20:00 UTC (permalink / raw)
To: Boqun Feng, Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, Ben Skeggs, Joel Fernandes,
Timur Tabi, Alistair Popple, linux-kernel, rust-for-linux,
nouveau, dri-devel
On 6/12/25 8:07 AM, Boqun Feng wrote:
> On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
...
>> + #[inline(always)]
>> + pub const fn align_down(self, value: $t) -> $t {
>
> I'm late to party, but could we instead implement:
>
> pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
> value & !((1 << shift) - 1)
> }
>
> pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
> let mask = (1 << shift) - 1;
> value.wrapping_add(mask) & !mask
> }
Just a naming concern here.
The function name, and the "shift" argument is extremely odd there.
And that's because it is re-inventing the concept of align_down()
and align_up(), but with a misleading name and a hard to understand
"shift" argument.
If you are "rounding" to a power of two, that's normally called
alignment, at least in kernel code. And if you are rounding to the
nearest...integer, for example, that's rounding.
But "rounding" with a "shift" argument? That's a little too
creative! :)
>
> ? It's much harder to pass an invalid alignment with this.
Hopefully we can address argument validation without blowing up
the usual naming conventions.
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 20:00 ` John Hubbard
@ 2025-06-12 20:05 ` Boqun Feng
2025-06-12 20:08 ` John Hubbard
0 siblings, 1 reply; 58+ messages in thread
From: Boqun Feng @ 2025-06-12 20:05 UTC (permalink / raw)
To: John Hubbard
Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Thu, Jun 12, 2025 at 01:00:12PM -0700, John Hubbard wrote:
> On 6/12/25 8:07 AM, Boqun Feng wrote:
> > On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
> ...
> >> + #[inline(always)]
> >> + pub const fn align_down(self, value: $t) -> $t {
> >
> > I'm late to party, but could we instead implement:
> >
> > pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
> > value & !((1 << shift) - 1)
> > }
> >
> > pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
> > let mask = (1 << shift) - 1;
> > value.wrapping_add(mask) & !mask
> > }
>
> Just a naming concern here.
>
> The function name, and the "shift" argument is extremely odd there.
> And that's because it is re-inventing the concept of align_down()
> and align_up(), but with a misleading name and a hard to understand
> "shift" argument.
>
> If you are "rounding" to a power of two, that's normally called
> alignment, at least in kernel code. And if you are rounding to the
> nearest...integer, for example, that's rounding.
>
> But "rounding" with a "shift" argument? That's a little too
> creative! :)
>
Oh, sorry, I should have mentioned where I got these names, see
round_up() and round_down() in include/linux/math.h. But no objection to
find a better name for "shift".
Regards,
Boqun
> >
> > ? It's much harder to pass an invalid alignment with this.
>
> Hopefully we can address argument validation without blowing up
> the usual naming conventions.
>
>
> thanks,
> --
> John Hubbard
>
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 20:05 ` Boqun Feng
@ 2025-06-12 20:08 ` John Hubbard
2025-06-12 20:12 ` Boqun Feng
0 siblings, 1 reply; 58+ messages in thread
From: John Hubbard @ 2025-06-12 20:08 UTC (permalink / raw)
To: Boqun Feng
Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On 6/12/25 1:05 PM, Boqun Feng wrote:
> On Thu, Jun 12, 2025 at 01:00:12PM -0700, John Hubbard wrote:
>> On 6/12/25 8:07 AM, Boqun Feng wrote:
>>> On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
>> ...
>>>> + #[inline(always)]
>>>> + pub const fn align_down(self, value: $t) -> $t {
>>>
>>> I'm late to party, but could we instead implement:
>>>
>>> pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
>>> value & !((1 << shift) - 1)
>>> }
>>>
>>> pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
>>> let mask = (1 << shift) - 1;
>>> value.wrapping_add(mask) & !mask
>>> }
>>
>> Just a naming concern here.
>>
>> The function name, and the "shift" argument is extremely odd there.
>> And that's because it is re-inventing the concept of align_down()
>> and align_up(), but with a misleading name and a hard to understand
>> "shift" argument.
>>
>> If you are "rounding" to a power of two, that's normally called
>> alignment, at least in kernel code. And if you are rounding to the
>> nearest...integer, for example, that's rounding.
>>
>> But "rounding" with a "shift" argument? That's a little too
>> creative! :)
>>
>
> Oh, sorry, I should have mentioned where I got these names: see
> round_up() and round_down() in include/linux/math.h. But I have no
> objection to finding a better name for "shift".
lol, perfect response! So my complaint is really about the kernel's existing
math.h, rather than your proposal. OK then. :)
thanks,
--
John Hubbard
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 20:08 ` John Hubbard
@ 2025-06-12 20:12 ` Boqun Feng
0 siblings, 0 replies; 58+ messages in thread
From: Boqun Feng @ 2025-06-12 20:12 UTC (permalink / raw)
To: John Hubbard
Cc: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Thu, Jun 12, 2025 at 01:08:25PM -0700, John Hubbard wrote:
> On 6/12/25 1:05 PM, Boqun Feng wrote:
> > On Thu, Jun 12, 2025 at 01:00:12PM -0700, John Hubbard wrote:
> >> On 6/12/25 8:07 AM, Boqun Feng wrote:
> >>> On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
> >> ...
> >>>> + #[inline(always)]
> >>>> + pub const fn align_down(self, value: $t) -> $t {
> >>>
> >>> I'm late to party, but could we instead implement:
> >>>
> >>> pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
> >>> value & !((1 << shift) - 1)
> >>> }
> >>>
> >>> pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
> >>> let mask = (1 << shift) - 1;
> >>> value.wrapping_add(mask) & !mask
> >>> }
> >>
> >> Just a naming concern here.
> >>
> >> The function name, and the "shift" argument is extremely odd there.
> >> And that's because it is re-inventing the concept of align_down()
> >> and align_up(), but with a misleading name and a hard to understand
> >> "shift" argument.
> >>
> >> If you are "rounding" to a power of two, that's normally called
> >> alignment, at least in kernel code. And if you are rounding to the
> >> nearest...integer, for example, that's rounding.
> >>
> >> But "rounding" with a "shift" argument? That's a little too
> >> creative! :)
> >>
> >
> > Oh, sorry, I should have mentioned where I got these names: see
> > round_up() and round_down() in include/linux/math.h. But I have no
> > objection to finding a better name for "shift".
>
> lol, perfect response! So my complaint is really about the kernel's existing
> math.h, rather than your proposal. OK then. :)
>
;-) I realised I misunderstood round_up() and round_down(): I thought
they took the number of bits of the alignment, but it turns out
they take the alignment itself. What I tried to suggest is that
for these align functions, we use the number of bits instead of the
alignment. Not sure about the name now :)
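A minimal sketch of the two conventions being compared, assuming plain
`u32` helpers. The names `align_down_by_value` and `align_down_by_shift`
are hypothetical; they appear neither in the patch nor in math.h. The
first takes the alignment itself, the second takes the number of bits:

```rust
/// Value-based form: `align` is the alignment itself and must be a
/// non-zero power of two (e.g. 0x1000). Hypothetical helper.
pub const fn align_down_by_value(value: u32, align: u32) -> u32 {
    value & !(align - 1)
}

/// Shift-based form: the alignment is `1 << shift` (e.g. shift = 12).
/// `shift` must be smaller than 32. Hypothetical helper.
pub const fn align_down_by_shift(value: u32, shift: u32) -> u32 {
    value & !((1u32 << shift) - 1)
}
```

Under these assumptions, `align_down_by_value(0x4fff, 0x1000)` and
`align_down_by_shift(0x4fff, 12)` both return `0x4000`.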
Regards,
Boqun
> thanks,
> --
> John Hubbard
>
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 15:07 ` Boqun Feng
2025-06-12 20:00 ` John Hubbard
@ 2025-06-13 14:16 ` Alexandre Courbot
2025-06-13 15:25 ` Boqun Feng
2025-06-14 17:08 ` Boqun Feng
1 sibling, 2 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-13 14:16 UTC (permalink / raw)
To: Boqun Feng
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Fri Jun 13, 2025 at 12:07 AM JST, Boqun Feng wrote:
> On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
>> Introduce the `num` module, featuring the `PowerOfTwo` unsigned wrapper
>> that guarantees (at build-time or runtime) that a value is a power of
>> two.
>>
>> Such a property is often useful to maintain. In the context of the
>> kernel, powers of two are often used to align addresses or sizes up and
>> down, or to create masks. These operations are provided by this type.
>>
>> It is introduced to be first used by the nova-core driver.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> rust/kernel/lib.rs | 1 +
>> rust/kernel/num.rs | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 174 insertions(+)
>>
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index 6b4774b2b1c37f4da1866e993be6230bc6715841..2955f65da1278dd4cba1e4272ff178b8211a892c 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -89,6 +89,7 @@
>> pub mod mm;
>> #[cfg(CONFIG_NET)]
>> pub mod net;
>> +pub mod num;
>> pub mod of;
>> #[cfg(CONFIG_PM_OPP)]
>> pub mod opp;
>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a
>> --- /dev/null
>> +++ b/rust/kernel/num.rs
>> @@ -0,0 +1,173 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Numerical and binary utilities for primitive types.
>> +
>> +use crate::build_assert;
>> +use core::borrow::Borrow;
>> +use core::fmt::Debug;
>> +use core::hash::Hash;
>> +use core::ops::Deref;
>> +
>> +/// An unsigned integer which is guaranteed to be a power of 2.
>> +#[derive(Debug, Clone, Copy)]
>> +#[repr(transparent)]
>> +pub struct PowerOfTwo<T>(T);
>> +
>> +macro_rules! power_of_two_impl {
>> + ($($t:ty),+) => {
>> + $(
>> + impl PowerOfTwo<$t> {
>> + /// Validates that `v` is a power of two at build-time, and returns it wrapped into
>> + /// `PowerOfTwo`.
>> + ///
>> + /// A build error is triggered if `v` cannot be asserted to be a power of two.
>> + ///
>> + /// # Examples
>> + ///
>> + /// ```
>> + /// use kernel::num::PowerOfTwo;
>> + ///
>> + /// let v = PowerOfTwo::<u32>::new(256);
>> + /// assert_eq!(v.value(), 256);
>> + /// ```
>> + #[inline(always)]
>> + pub const fn new(v: $t) -> Self {
>
> Then this function should be unsafe, because an invalid `v` can create
> an invalid PowerOfTwo.
Doesn't the `build_assert` below allow us to keep this method safe,
since it will fail at build-time if it cannot be asserted that `v` is a
power of two?
>
>> + build_assert!(v.count_ones() == 1);
>> + Self(v)
>> + }
>> +
>> + /// Validates that `v` is a power of two at runtime, and returns it wrapped into
>> + /// `PowerOfTwo`.
>> + ///
>> + /// `None` is returned if `v` was not a power of two.
>> + ///
>> + /// # Examples
>> + ///
>> + /// ```
>> + /// use kernel::num::PowerOfTwo;
>> + ///
>> + /// assert_eq!(PowerOfTwo::<u32>::try_new(16).unwrap().value(), 16);
>> + /// assert_eq!(PowerOfTwo::<u32>::try_new(15), None);
>> + /// ```
>> + #[inline(always)]
>> + pub const fn try_new(v: $t) -> Option<Self> {
>> + match v.count_ones() {
>> + 1 => Some(Self(v)),
>> + _ => None,
>> + }
>> + }
>> +
>> + /// Returns the value of this instance.
>> + ///
>> + /// It is guaranteed to be a power of two.
>> + ///
>> + /// # Examples
>> + ///
>> + /// ```
>> + /// use kernel::num::PowerOfTwo;
>> + ///
>> + /// let v = PowerOfTwo::<u32>::new(256);
>> + /// assert_eq!(v.value(), 256);
>> + /// ```
>> + #[inline(always)]
>> + pub const fn value(&self) -> $t {
>> + self.0
>> + }
>> +
>> + /// Returns the mask corresponding to `self.value() - 1`.
>> + #[inline(always)]
>> + pub const fn mask(&self) -> $t {
>> + self.0.wrapping_sub(1)
>> + }
>> +
>> + /// Aligns `self` down to `alignment`.
>> + ///
>> + /// # Examples
>> + ///
>> + /// ```
>> + /// use kernel::num::PowerOfTwo;
>> + ///
>> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_down(0x4fff), 0x4000);
>> + /// ```
>> + #[inline(always)]
>> + pub const fn align_down(self, value: $t) -> $t {
>
> I'm late to party, but could we instead implement:
>
> pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
> value & !((1 << shift) - 1)
> }
>
> pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
> let mask = (1 << shift) - 1;
> value.wrapping_add(mask) & !mask
> }
>
> ? It's much harder to pass an invalid alignment with this.
It also forces you to think in terms of shifts instead of values - i.e.
you cannot round to `0x1000` as is commonly done in the kernel, now you
need to do some mental gymnastics to know it is actually a shift of `12`.
Being able to use the actual value to round to is more familiar (and
natural) to me.
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-13 14:16 ` Alexandre Courbot
@ 2025-06-13 15:25 ` Boqun Feng
2025-06-14 17:08 ` Boqun Feng
1 sibling, 0 replies; 58+ messages in thread
From: Boqun Feng @ 2025-06-13 15:25 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Fri, Jun 13, 2025 at 11:16:10PM +0900, Alexandre Courbot wrote:
[...]
> >> +#[repr(transparent)]
> >> +pub struct PowerOfTwo<T>(T);
> >> +
> >> +macro_rules! power_of_two_impl {
> >> + ($($t:ty),+) => {
> >> + $(
> >> + impl PowerOfTwo<$t> {
> >> + /// Validates that `v` is a power of two at build-time, and returns it wrapped into
> >> + /// `PowerOfTwo`.
> >> + ///
> >> + /// A build error is triggered if `v` cannot be asserted to be a power of two.
> >> + ///
> >> + /// # Examples
> >> + ///
> >> + /// ```
> >> + /// use kernel::num::PowerOfTwo;
> >> + ///
> >> + /// let v = PowerOfTwo::<u32>::new(256);
> >> + /// assert_eq!(v.value(), 256);
> >> + /// ```
> >> + #[inline(always)]
> >> + pub const fn new(v: $t) -> Self {
> >
> > Then this function should be unsafe, because an invalid `v` can create
> > an invalid PowerOfTwo.
>
> Doesn't the `build_assert` below allow us to keep this method safe,
> since it will fail at build-time if it cannot be asserted that `v` is a
> power of two?
>
You're right, I misunderstood a bit. If the compiler cannot prove the
assertion in build_assert!(), it will still generate a build error,
even for cases like:
pub fn my_power_of_two(v: i32) -> PowerOfTwo<i32> {
PowerOfTwo::new(v)
}
where `v` is user input and its value is unknown at build time:
build_assert!() will still trigger.
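As a rough illustration outside the kernel, the sketch below swaps
build_assert!() for a plain `assert!` inside a `const fn`. This is a
different technique and only an approximation: the check becomes a build
error only when the call is evaluated in a const context, whereas a call
with a runtime value panics at runtime instead of failing the build. The
standalone `PowerOfTwo` and the constant `CHECKED_AT_BUILD` are
assumptions made for this example:

```rust
/// Standalone stand-in for the patch's type, specialised to `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PowerOfTwo(u32);

impl PowerOfTwo {
    /// Panics if `v` is not a power of two. When the call is evaluated
    /// in a const context, the panic turns into a compile-time error.
    pub const fn new(v: u32) -> Self {
        assert!(v.count_ones() == 1);
        Self(v)
    }

    /// Returns the wrapped value.
    pub const fn value(self) -> u32 {
        self.0
    }
}

// Evaluated at compile time: replacing 256 with 255 breaks the build.
pub const CHECKED_AT_BUILD: PowerOfTwo = PowerOfTwo::new(256);
```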
Regards,
Boqun
> >
> >> + build_assert!(v.count_ones() == 1);
> >> + Self(v)
> >> + }
[...]
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-13 14:16 ` Alexandre Courbot
2025-06-13 15:25 ` Boqun Feng
@ 2025-06-14 17:08 ` Boqun Feng
2025-06-16 5:14 ` Alexandre Courbot
1 sibling, 1 reply; 58+ messages in thread
From: Boqun Feng @ 2025-06-14 17:08 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Fri, Jun 13, 2025 at 11:16:10PM +0900, Alexandre Courbot wrote:
[...]
> >> + /// Aligns `self` down to `alignment`.
> >> + ///
> >> + /// # Examples
> >> + ///
> >> + /// ```
> >> + /// use kernel::num::PowerOfTwo;
> >> + ///
> >> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_down(0x4fff), 0x4000);
> >> + /// ```
> >> + #[inline(always)]
> >> + pub const fn align_down(self, value: $t) -> $t {
> >
> > I'm late to party, but could we instead implement:
> >
> > pub const fn round_down<i32>(value: i32, shift: i32) -> i32 {
> > value & !((1 << shift) - 1)
> > }
> >
> > pub const fn round_up<i32>(value: i32, shift: i32) -> i32 {
> > let mask = (1 << shift) - 1;
> > value.wrapping_add(mask) & !mask
> > }
> >
> > ? It's much harder to pass an invalid alignment with this.
>
> It also forces you to think in terms of shifts instead of values - i.e.
> you cannot round to `0x1000` as is commonly done in the kernel, now you
Well, for const values, you can always define:
const ROUND_SHIFT_0X1000: i32 = 12;
because `0x1000` is just a name ;-)
or we define an Alignment in terms of the shift:
pub struct Alignment {
shift: i8,
}
impl Alignment {
pub const fn new(shift: i8) -> Self {
Self { shift }
}
}
then
const ALIGN_0x1000: Alignment = Alignment::new(12);
and
pub const fn round_down_i32(value: i32, align: Alignment) -> i32 {
...
}
My point was that instead of the value itself, we can always use the
shift to represent a power of two, and that would avoid troubles when we
need to check the internal representation.
That said, after some experiments by myself, I haven't found any
significant difference between shift representations vs value
representations. So there is no strong reason to use a shift representation.
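The shift-based idea above can be sketched as follows. This is only an
illustration under stated assumptions: it uses `u32` values and a `u8`
shift rather than the `i32`/`i8` of the snippets above, it validates the
shift in the constructor, and `round_down_u32`/`round_up_u32` are
hypothetical names:

```rust
/// A power-of-two alignment stored as a shift, so that every
/// representable value is valid by construction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Alignment {
    shift: u8,
}

impl Alignment {
    /// Returns `None` if `1 << shift` would not fit in a `u32`.
    pub const fn new(shift: u8) -> Option<Self> {
        if shift < 32 {
            Some(Self { shift })
        } else {
            None
        }
    }

    /// The alignment as a value, e.g. `0x1000` for a shift of 12.
    pub const fn value(self) -> u32 {
        1u32 << (self.shift as u32)
    }

    /// The low-bit mask, e.g. `0xfff` for a shift of 12.
    pub const fn mask(self) -> u32 {
        self.value() - 1
    }
}

/// Rounds `value` down to a multiple of `align`.
pub const fn round_down_u32(value: u32, align: Alignment) -> u32 {
    value & !align.mask()
}

/// Rounds `value` up to a multiple of `align`, wrapping on overflow.
pub const fn round_up_u32(value: u32, align: Alignment) -> u32 {
    value.wrapping_add(align.mask()) & !align.mask()
}
```

The trade-off matches the discussion: no runtime power-of-two check is
needed on the stored field, but callers have to write `12` where they
would otherwise write `0x1000`.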
Regards,
Boqun
> need to do some mental gymnastics to know it is actually a shift of `12`.
> Being able to use the actual value to round to is more familiar (and
> natural) to me.
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
2025-06-12 15:07 ` Boqun Feng
@ 2025-06-14 17:31 ` Boqun Feng
2025-06-16 5:19 ` Alexandre Courbot
2025-06-14 19:09 ` Benno Lossin
2025-06-15 13:32 ` Miguel Ojeda
3 siblings, 1 reply; 58+ messages in thread
From: Boqun Feng @ 2025-06-14 17:31 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
[...]
> +/// An unsigned integer which is guaranteed to be a power of 2.
> +#[derive(Debug, Clone, Copy)]
> +#[repr(transparent)]
> +pub struct PowerOfTwo<T>(T);
> +
[...]
> +impl<T> Deref for PowerOfTwo<T> {
Why do we need `impl Deref` (and the `impl Borrow` below)? The similar
`NonZero` type in std doesn't impl them either.
> + type Target = T;
> +
> + fn deref(&self) -> &Self::Target {
> + &self.0
> + }
> +}
> +
> +impl<T> PartialEq for PowerOfTwo<T>
Any reason you want to impl these manually instead of deriving? For
`NonZero`, the std wants to impl these traits only for
`ZeroablePrimitive` types, but we don't have a similar trait here.
Explaining the above in the comments is much appreciated.
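For reference, a minimal sketch of the derived alternative, written
against `std` rather than the kernel crate. The generic `PowerOfTwo<T>`
here is a standalone stand-in and `hash_of` is a hypothetical helper.
The derives add the same `T: PartialEq`, `T: Ord`, `T: Hash`, etc. bounds
as the manual impls, and for a single-field tuple struct the derived
`Hash` feeds only the wrapped field to the hasher:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Standalone stand-in for the patch's type, with every comparison and
/// hashing trait derived instead of hand-written.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct PowerOfTwo<T>(T);

/// Hashes any value with the standard library's default hasher.
pub fn hash_of<H: Hash>(v: &H) -> u64 {
    let mut state = DefaultHasher::new();
    v.hash(&mut state);
    state.finish()
}
```

The hash equality between wrapper and inner value is what keeps a
`Borrow<T>` impl consistent with its documented contract, should that
impl be retained.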
Regards,
Boqun
> +where
> + T: PartialEq,
> +{
> + fn eq(&self, other: &Self) -> bool {
> + self.0 == other.0
> + }
> +}
> +
> +impl<T> Eq for PowerOfTwo<T> where T: Eq {}
> +
> +impl<T> PartialOrd for PowerOfTwo<T>
> +where
> + T: PartialOrd,
> +{
> + fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
> + self.0.partial_cmp(&other.0)
> + }
> +}
> +
> +impl<T> Ord for PowerOfTwo<T>
> +where
> + T: Ord,
> +{
> + fn cmp(&self, other: &Self) -> core::cmp::Ordering {
> + self.0.cmp(&other.0)
> + }
> +}
> +
> +impl<T> Hash for PowerOfTwo<T>
> +where
> + T: Hash,
> +{
> + fn hash<H: core::hash::Hasher>(&self, state: &mut H) {
> + self.0.hash(state);
> + }
> +}
> +
> +impl<T> Borrow<T> for PowerOfTwo<T> {
> + fn borrow(&self) -> &T {
> + &self.0
> + }
> +}
>
> --
> 2.49.0
>
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
2025-06-12 15:07 ` Boqun Feng
2025-06-14 17:31 ` Boqun Feng
@ 2025-06-14 19:09 ` Benno Lossin
2025-06-15 13:32 ` Miguel Ojeda
3 siblings, 0 replies; 58+ messages in thread
From: Benno Lossin @ 2025-06-14 19:09 UTC (permalink / raw)
To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel
On Thu Jun 12, 2025 at 4:01 PM CEST, Alexandre Courbot wrote:
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a
> --- /dev/null
> +++ b/rust/kernel/num.rs
> @@ -0,0 +1,173 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Numerical and binary utilities for primitive types.
> +
> +use crate::build_assert;
> +use core::borrow::Borrow;
> +use core::fmt::Debug;
> +use core::hash::Hash;
> +use core::ops::Deref;
> +
> +/// An unsigned integer which is guaranteed to be a power of 2.
> +#[derive(Debug, Clone, Copy)]
> +#[repr(transparent)]
Let's add a `# Safety` section with the invariant that `T` is a power of
2.
Maybe we should even have an `Int` trait for the different integer types
that we constrain `T` to.
> +pub struct PowerOfTwo<T>(T);
> +
> +macro_rules! power_of_two_impl {
> + ($($t:ty),+) => {
> + $(
> + impl PowerOfTwo<$t> {
> + /// Validates that `v` is a power of two at build-time, and returns it wrapped into
> + /// `PowerOfTwo`.
> + ///
> + /// A build error is triggered if `v` cannot be asserted to be a power of two.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// let v = PowerOfTwo::<u32>::new(256);
> + /// assert_eq!(v.value(), 256);
> + /// ```
> + #[inline(always)]
> + pub const fn new(v: $t) -> Self {
> + build_assert!(v.count_ones() == 1);
> + Self(v)
> + }
We also probably want an `unsafe new_unchecked(v: $t) -> Self`. It can
still use a `debug_assert!` to verify the value.
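A minimal sketch of what such a constructor could look like, assuming a
standalone `u32`-only `PowerOfTwo`. The exact signature and safety
wording are assumptions, not something taken from the patch:

```rust
/// Standalone stand-in for the patch's type, specialised to `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PowerOfTwo(u32);

impl PowerOfTwo {
    /// Wraps `v` without checking it in release builds.
    ///
    /// # Safety
    ///
    /// `v` must be a power of two.
    pub const unsafe fn new_unchecked(v: u32) -> Self {
        // Only compiled in when debug assertions are enabled.
        debug_assert!(v.is_power_of_two());
        Self(v)
    }

    /// Returns the wrapped value.
    pub const fn value(self) -> u32 {
        self.0
    }
}
```

A caller would then write `unsafe { PowerOfTwo::new_unchecked(64) }`,
with a `// SAFETY:` comment justifying why the value is known to be a
power of two.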
> +
> + /// Validates that `v` is a power of two at runtime, and returns it wrapped into
> + /// `PowerOfTwo`.
> + ///
> + /// `None` is returned if `v` was not a power of two.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// assert_eq!(PowerOfTwo::<u32>::try_new(16).unwrap().value(), 16);
> + /// assert_eq!(PowerOfTwo::<u32>::try_new(15), None);
> + /// ```
> + #[inline(always)]
> + pub const fn try_new(v: $t) -> Option<Self> {
> + match v.count_ones() {
> + 1 => Some(Self(v)),
> + _ => None,
> + }
> + }
> +
> + /// Returns the value of this instance.
> + ///
> + /// It is guaranteed to be a power of two.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// let v = PowerOfTwo::<u32>::new(256);
> + /// assert_eq!(v.value(), 256);
> + /// ```
> + #[inline(always)]
> + pub const fn value(&self) -> $t {
Since this type is `Copy`, we should use `self` here instead of `&self`.
Why not add
if !self.0.is_power_of_two() {
unsafe { ::core::hint::unreachable_unchecked() }
}
here?
> + self.0
> + }
> +
> + /// Returns the mask corresponding to `self.value() - 1`.
> + #[inline(always)]
> + pub const fn mask(&self) -> $t {
> + self.0.wrapping_sub(1)
And then use `self.value()` here instead?
(we could even use `self.value() - 1`, since the optimizer can remove
the overflow check: https://godbolt.org/z/nvGaozGMW but wrapping_sub is
fine. The optimizations will most likely be more useful in other
arithmetic with `.value()`)
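Put together, the three suggestions above (taking `self` by value, the
unreachability hint, and `mask()` built on `value()`) might look like
the sketch below. It assumes a standalone `u32`-only type whose only
constructor is `try_new`; whether the optimizer actually drops the
overflow check in `mask()` depends on the build and is not demonstrated
here:

```rust
/// Standalone stand-in for the patch's type, specialised to `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PowerOfTwo(u32);

impl PowerOfTwo {
    /// Returns `None` if `v` is not a power of two.
    pub const fn try_new(v: u32) -> Option<Self> {
        if v.is_power_of_two() {
            Some(Self(v))
        } else {
            None
        }
    }

    /// Returns the wrapped value and tells the optimizer that it is a
    /// power of two.
    pub const fn value(self) -> u32 {
        if !self.0.is_power_of_two() {
            // SAFETY: the only constructor, `try_new`, rejects every
            // value that is not a power of two.
            unsafe { core::hint::unreachable_unchecked() }
        }
        self.0
    }

    /// Returns `self.value() - 1`; the value is at least 1, so the
    /// subtraction cannot underflow.
    pub const fn mask(self) -> u32 {
        self.value() - 1
    }
}
```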
> + }
> +
> + /// Aligns `self` down to `alignment`.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_down(0x4fff), 0x4000);
> + /// ```
> + #[inline(always)]
> + pub const fn align_down(self, value: $t) -> $t {
> + value & !self.mask()
> + }
> +
> + /// Aligns `value` up to `self`.
> + ///
> + /// Wraps around to `0` if the requested alignment pushes the result above the
> + /// type's limits.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::PowerOfTwo;
> + ///
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x4fff), 0x5000);
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x4000), 0x4000);
> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_up(0x0), 0x0);
> + /// assert_eq!(PowerOfTwo::<u16>::new(0x100).align_up(0xffff), 0x0);
> + /// ```
> + #[inline(always)]
> + pub const fn align_up(self, value: $t) -> $t {
> + self.align_down(value.wrapping_add(self.mask()))
> + }
> + }
> + )+
> + };
> +}
> +
> +power_of_two_impl!(usize, u8, u16, u32, u64, u128);
> +
> +impl<T> Deref for PowerOfTwo<T> {
> + type Target = T;
> +
> + fn deref(&self) -> &Self::Target {
> + &self.0
> + }
> +}
> +
> +impl<T> PartialEq for PowerOfTwo<T>
> +where
> + T: PartialEq,
> +{
> + fn eq(&self, other: &Self) -> bool {
> + self.0 == other.0
> + }
> +}
> +
> +impl<T> Eq for PowerOfTwo<T> where T: Eq {}
> +
> +impl<T> PartialOrd for PowerOfTwo<T>
> +where
> + T: PartialOrd,
> +{
> + fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
> + self.0.partial_cmp(&other.0)
> + }
> +}
> +
> +impl<T> Ord for PowerOfTwo<T>
> +where
> + T: Ord,
> +{
> + fn cmp(&self, other: &Self) -> core::cmp::Ordering {
> + self.0.cmp(&other.0)
> + }
> +}
> +
> +impl<T> Hash for PowerOfTwo<T>
> +where
> + T: Hash,
> +{
> + fn hash<H: core::hash::Hasher>(&self, state: &mut H) {
> + self.0.hash(state);
> + }
> +}
Can't these traits also be implemented using the derive macros?
---
Cheers,
Benno
> +
> +impl<T> Borrow<T> for PowerOfTwo<T> {
> + fn borrow(&self) -> &T {
> + &self.0
> + }
> +}
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-12 14:01 ` [PATCH v5 05/23] rust: num: add the `fls` operation Alexandre Courbot
@ 2025-06-14 19:16 ` Benno Lossin
2025-06-16 6:41 ` Alexandre Courbot
2025-06-15 9:37 ` Miguel Ojeda
1 sibling, 1 reply; 58+ messages in thread
From: Benno Lossin @ 2025-06-14 19:16 UTC (permalink / raw)
To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel
On Thu Jun 12, 2025 at 4:01 PM CEST, Alexandre Courbot wrote:
> Add an equivalent to the `fls` (Find Last Set bit) C function to Rust
> unsigned types.
Have you tried to upstream this?
> It is to be first used by the nova-core driver.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> rust/kernel/num.rs | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
> index ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a..934afe17719f789c569dbd54534adc2e26fe59f2 100644
> --- a/rust/kernel/num.rs
> +++ b/rust/kernel/num.rs
> @@ -171,3 +171,34 @@ fn borrow(&self) -> &T {
> &self.0
> }
> }
> +
> +macro_rules! impl_fls {
> + ($($t:ty),+) => {
> + $(
> + ::kernel::macros::paste! {
> + /// Find Last Set Bit: return the 1-based index of the last (i.e. most significant) set
> + /// bit in `v`.
> + ///
> + /// Equivalent to the C `fls` function.
> + ///
> + /// # Examples
> + ///
> + /// ```
> + /// use kernel::num::fls_u32;
> + ///
> + /// assert_eq!(fls_u32(0x0), 0);
> + /// assert_eq!(fls_u32(0x1), 1);
> + /// assert_eq!(fls_u32(0x10), 5);
> + /// assert_eq!(fls_u32(0xffff), 16);
> + /// assert_eq!(fls_u32(0x8000_0000), 32);
> + /// ```
> + #[inline(always)]
> + pub const fn [<fls_ $t>](v: $t) -> u32 {
Can we name this `find_last_set_bit_ $t`? When the upstream function
lands, we should also rename this one.
---
Cheers,
Benno
> + $t::BITS - v.leading_zeros()
> + }
> + }
> + )+
> + };
> +}
> +
> +impl_fls!(usize, u8, u16, u32, u64, u128);
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-12 14:01 ` [PATCH v5 05/23] rust: num: add the `fls` operation Alexandre Courbot
2025-06-14 19:16 ` Benno Lossin
@ 2025-06-15 9:37 ` Miguel Ojeda
2025-06-15 10:51 ` Alexandre Courbot
1 sibling, 1 reply; 58+ messages in thread
From: Miguel Ojeda @ 2025-06-15 9:37 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Thu, Jun 12, 2025 at 4:02 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> + /// ```
> + /// use kernel::num::fls_u32;
> + ///
> + /// assert_eq!(fls_u32(0x0), 0);
> + /// assert_eq!(fls_u32(0x1), 1);
> + /// assert_eq!(fls_u32(0x10), 5);
> + /// assert_eq!(fls_u32(0xffff), 16);
> + /// assert_eq!(fls_u32(0x8000_0000), 32);
> + /// ```
For a future patch series: this could provide examples per type
(passing them in the `impl_fls!` call).
I can create a good first issue if this lands and it is not somewhere already.
Cheers,
Miguel
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-15 9:37 ` Miguel Ojeda
@ 2025-06-15 10:51 ` Alexandre Courbot
2025-06-15 10:58 ` Alexandre Courbot
0 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-15 10:51 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 6:37 PM JST, Miguel Ojeda wrote:
> On Thu, Jun 12, 2025 at 4:02 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> + /// ```
>> + /// use kernel::num::fls_u32;
>> + ///
>> + /// assert_eq!(fls_u32(0x0), 0);
>> + /// assert_eq!(fls_u32(0x1), 1);
>> + /// assert_eq!(fls_u32(0x10), 5);
>> + /// assert_eq!(fls_u32(0xffff), 16);
>> + /// assert_eq!(fls_u32(0x8000_0000), 32);
>> + /// ```
>
> For a future patch series: this could provide examples per type
> (passing them in the `impl_fls!` call).
>
> I can create a good first issue if this lands and it is not somewhere already.
I was worried that the examples would be mostly duplicated, although
it is true that seeing how the function behaves at the limits of each
type is valuable. I'll prepare a patch to either squash for v6 or submit
as a follow-up.
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-15 10:51 ` Alexandre Courbot
@ 2025-06-15 10:58 ` Alexandre Courbot
2025-06-15 13:25 ` Miguel Ojeda
0 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-15 10:58 UTC (permalink / raw)
To: Alexandre Courbot, Miguel Ojeda
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 7:51 PM JST, Alexandre Courbot wrote:
> On Sun Jun 15, 2025 at 6:37 PM JST, Miguel Ojeda wrote:
>> On Thu, Jun 12, 2025 at 4:02 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>>
>>> + /// ```
>>> + /// use kernel::num::fls_u32;
>>> + ///
>>> + /// assert_eq!(fls_u32(0x0), 0);
>>> + /// assert_eq!(fls_u32(0x1), 1);
>>> + /// assert_eq!(fls_u32(0x10), 5);
>>> + /// assert_eq!(fls_u32(0xffff), 16);
>>> + /// assert_eq!(fls_u32(0x8000_0000), 32);
>>> + /// ```
>>
>> For a future patch series: this could provide examples per type
>> (passing them in the `impl_fls!` call).
>>
>> I can create a good first issue if this lands and it is not somewhere already.
>
> I was worried that the examples would be mostly duplicated, although
> it is true that seeing how the function behaves at the limits of each
> type is valuable. I'll prepare a patch to either squash for v6 or submit
> as a follow-up.
Also, although this will work nicely for `impl_fls!` which is a single
function, I'm afraid this won't scale well for `power_of_two_impl!`,
which defines 6 functions per type... Any suggestions for this case?
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-15 10:58 ` Alexandre Courbot
@ 2025-06-15 13:25 ` Miguel Ojeda
2025-06-16 6:36 ` Alexandre Courbot
0 siblings, 1 reply; 58+ messages in thread
From: Miguel Ojeda @ 2025-06-15 13:25 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Sun, Jun 15, 2025 at 12:58 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> Also, although this will work nicely for `impl_fls!` which is a single
> function, I'm afraid this won't scale well for `power_of_two_impl!`,
> which defines 6 functions per type... Any suggestions for this case?
We can always generate the same "cases", i.e. share the lines as much
as possible and pass only the values (numbers) that actually differ,
which you then plug into the example lines via concatenation.
The standard library does that for their integer macros, e.g.
https://doc.rust-lang.org/src/core/num/int_macros.rs.html#3639-3644
If that happened to be too onerous for some reason, then we could
ignore it for the time being (i.e. we don't need to delay things just
for that), or we could put them as `#[test]`s to at least have them as
tests.
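A minimal sketch of that approach for `fls`, assuming plain Rust without
the kernel's `paste!` macro, which is why the function name is passed
explicitly. The macro shape and the per-type literals are assumptions
made for this example; the shared doc lines are written once and only
the differing values are spliced in with `concat!`:

```rust
macro_rules! impl_fls {
    ($($name:ident, $t:ty, $max:literal, $max_fls:literal);+ $(;)?) => {
        $(
            /// Returns the 1-based index of the most significant set
            /// bit in `v`, or `0` if no bit is set.
            ///
            /// # Examples
            ///
            /// ```
            #[doc = concat!("assert_eq!(", stringify!($name), "(0x0), 0);")]
            #[doc = concat!("assert_eq!(", stringify!($name), "(", $max, "), ", $max_fls, ");")]
            /// ```
            #[inline(always)]
            pub const fn $name(v: $t) -> u32 {
                <$t>::BITS - v.leading_zeros()
            }
        )+
    };
}

// Each line carries only what differs between types: the function
// name, the type, its maximum value and the expected result for it.
impl_fls! {
    fls_u8, u8, "0xff", "8";
    fls_u16, u16, "0xffff", "16";
    fls_u32, u32, "0xffff_ffff", "32";
    fls_u64, u64, "0xffff_ffff_ffff_ffff", "64";
}
```

The same pattern should extend to a macro defining several functions
per type, at the cost of a longer argument list per invocation.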
Cheers,
Miguel
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
` (2 preceding siblings ...)
2025-06-14 19:09 ` Benno Lossin
@ 2025-06-15 13:32 ` Miguel Ojeda
2025-06-16 5:13 ` Alexandre Courbot
3 siblings, 1 reply; 58+ messages in thread
From: Miguel Ojeda @ 2025-06-15 13:32 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Thu, Jun 12, 2025 at 4:02 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> + /// assert_eq!(PowerOfTwo::<u32>::try_new(16).unwrap().value(), 16);
By the way, we are trying to write examples as close to normal kernel
code as possible, so could you please use `?` here instead of
`unwrap()`?
It is not a big deal when within `assert`s, but there is value in not
showing any `unwrap()`s, and in easily spotting the places where we
actually do `unwrap()`.
Also, please use intra-doc links wherever they may work, e.g. I think
[`PowerOfTwo`] and [`None`] will work.
Thanks!
Cheers,
Miguel
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-15 13:32 ` Miguel Ojeda
@ 2025-06-16 5:13 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-16 5:13 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 10:32 PM JST, Miguel Ojeda wrote:
> On Thu, Jun 12, 2025 at 4:02 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> + /// assert_eq!(PowerOfTwo::<u32>::try_new(16).unwrap().value(), 16);
>
> By the way, we are trying to write examples as close to normal kernel
> code as possible, so could you please use `?` here instead of
> `unwrap()`?
>
> It is not a big deal when within `assert`s, but there is value in not
> showing any `unwrap()`s, and in easily spotting the places where we
> actually do `unwrap()`.
The fact that `try_new` returns an `Option` makes it a bit difficult to do
nicely - one would have to add a verbose `ok_or` to turn it into a
`Result`.
But that doesn't matter as this test can be (better) written as follows:
assert_eq!(PowerOfTwo::<u32>::try_new(16), Some(PowerOfTwo::<u32>::new(16)));
And all is well.
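
To make the trade-off concrete, here is a small standalone sketch of both options. The `PowerOfTwo` below is a simplified, non-generic stand-in for the type in the patch, and `InvalidValue` is a hypothetical error type standing in for whatever error kernel code would return (e.g. `EINVAL`); neither is the actual `kernel::num` code.

```rust
/// Simplified stand-in for the `PowerOfTwo<u32>` type from the patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PowerOfTwo(u32);

impl PowerOfTwo {
    /// Returns `None` if `v` is not a power of two.
    pub const fn try_new(v: u32) -> Option<Self> {
        if v.is_power_of_two() {
            Some(Self(v))
        } else {
            None
        }
    }

    /// Returns the wrapped value.
    pub const fn value(self) -> u32 {
        self.0
    }
}

/// Hypothetical error type, only for this example.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidValue;

/// Option 1: use `?`, which needs `ok_or` to turn the `Option` into a `Result`.
pub fn checked_value(v: u32) -> Result<u32, InvalidValue> {
    let p = PowerOfTwo::try_new(v).ok_or(InvalidValue)?;
    Ok(p.value())
}

/// Option 2: compare `Option`s directly, so neither `unwrap()` nor `?` is needed.
pub fn is_expected(v: u32, expected: Option<PowerOfTwo>) -> bool {
    PowerOfTwo::try_new(v) == expected
}
```

The second form only works if the type implements `PartialEq` (and `Debug` when used inside `assert_eq!`).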
> Also, please use intra-doc links wherever they may work, e.g. I think
> [`PowerOfTwo`] and [`None`] will work.
Added the links where relevant, sorry for the omission!
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-14 17:08 ` Boqun Feng
@ 2025-06-16 5:14 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-16 5:14 UTC (permalink / raw)
To: Boqun Feng
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 2:08 AM JST, Boqun Feng wrote:
> On Fri, Jun 13, 2025 at 11:16:10PM +0900, Alexandre Courbot wrote:
> [...]
>> >> + /// Aligns `self` down to `alignment`.
>> >> + ///
>> >> + /// # Examples
>> >> + ///
>> >> + /// ```
>> >> + /// use kernel::num::PowerOfTwo;
>> >> + ///
>> >> + /// assert_eq!(PowerOfTwo::<u32>::new(0x1000).align_down(0x4fff), 0x4000);
>> >> + /// ```
>> >> + #[inline(always)]
>> >> + pub const fn align_down(self, value: $t) -> $t {
>> >
>> > I'm late to the party, but could we instead implement:
>> >
>> > pub const fn round_down(value: i32, shift: i32) -> i32 {
>> > value & !((1 << shift) - 1)
>> > }
>> >
>> > pub const fn round_up(value: i32, shift: i32) -> i32 {
>> > let mask = (1 << shift) - 1;
>> > value.wrapping_add(mask) & !mask
>> > }
>> >
>> > ? It's much harder to pass an invalid alignment with this.
>>
>> It also forces you to think in terms of shifts instead of values - i.e.
>> you cannot round to `0x1000` as is commonly done in the kernel, now you
>
> Well, for const values, you can always define:
>
> const ROUND_SHIFT_0X1000: i32 = 12;
>
> because `0x1000` is just a name ;-)
>
> or we define an Alignment in terms of the shift:
>
> pub struct Alignment {
> shift: i8,
> }
>
> impl Alignment {
> pub const fn new(shift: i8) -> Self {
> Self { shift }
> }
> }
>
> then
>
> const ALIGN_0x1000: Alignment = Alignment::new(12);
Now you take the risk that due to a typo the name of the constant does
not match the alignment - something you cannot have if you use values
directly (and if one wants to reason in terms of alignment, they can do
`PowerOfTwo::<u32>::new(1 << 12)`, or we can even add an alternative
constructor for that).
>
> and
>
> pub const fn round_down_i32(value: i32, align: Alignment) -> i32 {
> ...
> }
>
> My point was that instead of the value itself, we can always use the
> shift to represent a power of two, and that would avoid troubles when we
> need to check the internal representation.
Storing the shift instead of the value means that we need to recreate
the latter every time we need to access it (e.g. to apply a mask).
>
> That said, after some experiments of my own, I haven't found any
> significant difference between shift and value representations,
> so there is no strong reason for using a shift representation.
I'm open to any representation but AFAICT there is no obvious benefit
(and a slight drawback when requesting the value) in representing these
as a shift.
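
For reference, the two representations being compared can be sketched as follows. This is standalone illustrative code: `ByValue` and `ByShift` are made-up names for this example (not types from the patch), and only `u32` is covered. Note how the shift variant has to rebuild the value before it can compute a mask.

```rust
/// Value representation: stores the power of two itself (e.g. `0x1000`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByValue(u32);

impl ByValue {
    /// Returns `None` if `value` is not a power of two.
    pub const fn try_new(value: u32) -> Option<Self> {
        if value.is_power_of_two() {
            Some(Self(value))
        } else {
            None
        }
    }

    /// The value is stored as-is, so this is a plain field read.
    pub const fn value(self) -> u32 {
        self.0
    }

    /// Aligns `v` down to this power of two.
    pub const fn align_down(self, v: u32) -> u32 {
        v & !(self.0 - 1)
    }

    /// Aligns `v` up to this power of two, or returns `None` on overflow.
    pub const fn align_up(self, v: u32) -> Option<u32> {
        match v.checked_add(self.0 - 1) {
            Some(sum) => Some(sum & !(self.0 - 1)),
            None => None,
        }
    }
}

/// Shift representation: stores the exponent (e.g. `12` for `0x1000`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByShift(u8);

impl ByShift {
    /// Returns `None` if `1 << shift` would not fit in a `u32`.
    pub const fn try_new(shift: u8) -> Option<Self> {
        if (shift as u32) < u32::BITS {
            Some(Self(shift))
        } else {
            None
        }
    }

    /// The value has to be recreated from the shift on every access.
    pub const fn value(self) -> u32 {
        1u32 << (self.0 as u32)
    }

    /// Aligns `v` down; the mask is derived from the recreated value.
    pub const fn align_down(self, v: u32) -> u32 {
        v & !(self.value() - 1)
    }
}
```

Both variants reject invalid input at construction time; the difference is only in what is stored and what has to be recomputed.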
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type
2025-06-14 17:31 ` Boqun Feng
@ 2025-06-16 5:19 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-16 5:19 UTC (permalink / raw)
To: Boqun Feng
Cc: Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 2:31 AM JST, Boqun Feng wrote:
> On Thu, Jun 12, 2025 at 11:01:32PM +0900, Alexandre Courbot wrote:
> [...]
>> +/// An unsigned integer which is guaranteed to be a power of 2.
>> +#[derive(Debug, Clone, Copy)]
>> +#[repr(transparent)]
>> +pub struct PowerOfTwo<T>(T);
>> +
> [...]
>> +impl<T> Deref for PowerOfTwo<T> {
>
> Why do we need `impl Deref` (and the `impl Borrow` below)? A similar
> concept, `NonZero` in std, doesn't impl them either.
I wanted to be exhaustive but you're right, we don't really need these
implementations (especially if `NonZero` doesn't provide them either).
>
>> + type Target = T;
>> +
>> + fn deref(&self) -> &Self::Target {
>> + &self.0
>> + }
>> +}
>> +
>> +impl<T> PartialEq for PowerOfTwo<T>
>
> Any reason you want to impl these manually instead of deriving? For
> `NonZero`, the std wants to impl these traits only for
> `ZeroablePrimitive` types, but we don't have a similar trait here.
Deriving works perfectly well! :) Thanks for pointing this out.
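
A rough standalone sketch of what the type could look like with derives only and an explicit accessor in place of `Deref`. The exact set of derived traits here is an assumption for illustration, not necessarily what the next revision will contain, and only a `u32` constructor is shown.

```rust
/// Generic wrapper relying on derived trait implementations only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct PowerOfTwo<T>(T);

impl PowerOfTwo<u32> {
    /// Returns `None` if `v` is not a power of two.
    pub const fn try_new(v: u32) -> Option<Self> {
        if v.is_power_of_two() {
            Some(Self(v))
        } else {
            None
        }
    }

    /// Explicit accessor, used instead of a `Deref` implementation.
    pub const fn value(self) -> u32 {
        self.0
    }
}
```

The derived comparisons simply compare the wrapped values, which is the behaviour a manual implementation would have provided anyway.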
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-15 13:25 ` Miguel Ojeda
@ 2025-06-16 6:36 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-16 6:36 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Benno Lossin, John Hubbard,
Ben Skeggs, Joel Fernandes, Timur Tabi, Alistair Popple,
linux-kernel, rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 10:25 PM JST, Miguel Ojeda wrote:
> On Sun, Jun 15, 2025 at 12:58 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> Also, although this will work nicely for `impl_fls!` which is a single
>> function, I'm afraid this won't scale well for `power_of_two_impl!`,
>> which defines 6 functions per type... Any suggestions for this case?
>
> We can always generate the same "cases", i.e. sharing as many of the
> lines as possible and just passing the values (numbers) that actually
> differ, which you then plug into the example line via concatenation.
>
> The standard library does that for their integer macros, e.g.
>
> https://doc.rust-lang.org/src/core/num/int_macros.rs.html#3639-3644
>
> If that happened to be too onerous for some reason, then we could
> ignore it for the time being (i.e. we don't need to delay things just
> for that), or we could put them as `#[test]`s to at least have them as
> tests.
Thanks, this appears to work quite nicely (if a bit verbose), and I can
adjust the tests to avoid the need to take extra arguments.
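
As an illustration of the approach discussed above, here is a minimal standalone sketch (plain Rust, not the actual `kernel::num` code) in which each invocation passes only the values that differ, and the macro plugs them into a shared documentation line with `concat!` and `stringify!`. The argument layout of this `impl_fls!` and the compile-time check are assumptions made for the example; the macro in the series builds the function names with `paste!` and carries a regular doctest instead.

```rust
macro_rules! impl_fls {
    ($($name:ident, $t:ty, $input:literal => $expected:literal);+ $(;)?) => {
        $(
            /// Find Last Set bit: returns the 1-based index of the most
            /// significant set bit in `v`, or 0 if no bit is set.
            ///
            #[doc = concat!(
                "For example, `", stringify!($name), "(", stringify!($input),
                ")` returns `", stringify!($expected), "`."
            )]
            #[inline(always)]
            pub const fn $name(v: $t) -> u32 {
                <$t>::BITS - v.leading_zeros()
            }

            // Compile-time check that the documented example actually holds.
            const _: () = assert!($name($input) == $expected);
        )+
    };
}

// Only the name, the type and the example values differ per invocation.
impl_fls! {
    fls_u8, u8, 0x80 => 8;
    fls_u32, u32, 0x8000_0000 => 32;
    fls_u64, u64, 0x10 => 5;
}
```

The shared doc lines are written once, and a wrong example value fails the build rather than going unnoticed.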
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-14 19:16 ` Benno Lossin
@ 2025-06-16 6:41 ` Alexandre Courbot
2025-06-18 19:24 ` Benno Lossin
0 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-16 6:41 UTC (permalink / raw)
To: Benno Lossin, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel
On Sun Jun 15, 2025 at 4:16 AM JST, Benno Lossin wrote:
> On Thu Jun 12, 2025 at 4:01 PM CEST, Alexandre Courbot wrote:
>> Add an equivalent to the `fls` (Find Last Set bit) C function to Rust
>> unsigned types.
>
> Have you tried to upstream this?
I will consider it alongside `prev_multiple_of` that we discussed during v4. :)
>
>> It is to be first used by the nova-core driver.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> rust/kernel/num.rs | 31 +++++++++++++++++++++++++++++++
>> 1 file changed, 31 insertions(+)
>>
>> diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
>> index ee0f67ad1a89e69f5f8d2077eba5541b472e7d8a..934afe17719f789c569dbd54534adc2e26fe59f2 100644
>> --- a/rust/kernel/num.rs
>> +++ b/rust/kernel/num.rs
>> @@ -171,3 +171,34 @@ fn borrow(&self) -> &T {
>> &self.0
>> }
>> }
>> +
>> +macro_rules! impl_fls {
>> + ($($t:ty),+) => {
>> + $(
>> + ::kernel::macros::paste! {
>> + /// Find Last Set Bit: return the 1-based index of the last (i.e. most significant) set
>> + /// bit in `v`.
>> + ///
>> + /// Equivalent to the C `fls` function.
>> + ///
>> + /// # Examples
>> + ///
>> + /// ```
>> + /// use kernel::num::fls_u32;
>> + ///
>> + /// assert_eq!(fls_u32(0x0), 0);
>> + /// assert_eq!(fls_u32(0x1), 1);
>> + /// assert_eq!(fls_u32(0x10), 5);
>> + /// assert_eq!(fls_u32(0xffff), 16);
>> + /// assert_eq!(fls_u32(0x8000_0000), 32);
>> + /// ```
>> + #[inline(always)]
>> + pub const fn [<fls_ $t>](v: $t) -> u32 {
>
> Can we name this `find_last_set_bit_ $t`? When the upstream function
> lands, we should also rename this one.
We can - but as for `align_up`/`next_multiple_of`, I am not sure which
naming scheme (kernel-like or closer to Rust conventions) is favored in
such cases, and so far it seems to come down to personal preference. I
tend to think that staying close to kernel conventions makes it easier to
understand when a function is the equivalent of a C one, but whichever
policy we adopt, it would be nice to codify it somewhere (apologies if it
already is and I missed it).
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code
2025-06-12 14:01 ` [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
@ 2025-06-17 16:33 ` Danilo Krummrich
2025-06-18 5:26 ` Alexandre Courbot
0 siblings, 1 reply; 58+ messages in thread
From: Danilo Krummrich @ 2025-06-17 16:33 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul
On Thu, Jun 12, 2025 at 11:01:43PM +0900, Alexandre Courbot wrote:
> + /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
> + /// `target_mem`.
> + ///
> + /// `sec` is set if the loaded firmware is expected to run in secure mode.
> + fn dma_wr(
> + &self,
> + bar: &Bar0,
> + dma_handle: bindings::dma_addr_t,
I think we should pass &F from dma_load() rather than the raw handle.
<snip>
> +fn select_core_ga102<E: FalconEngine>(bar: &Bar0) -> Result {
> + let bcr_ctrl = regs::NV_PRISCV_RISCV_BCR_CTRL::read(bar, E::BASE);
> + if bcr_ctrl.core_select() != PeregrineCoreSelect::Falcon {
> + regs::NV_PRISCV_RISCV_BCR_CTRL::default()
> + .set_core_select(PeregrineCoreSelect::Falcon)
> + .write(bar, E::BASE);
> +
> + util::wait_on(Duration::from_millis(10), || {
As agreed, can you please add a brief comment to justify the timeout?
> + let r = regs::NV_PRISCV_RISCV_BCR_CTRL::read(bar, E::BASE);
> + if r.valid() {
> + Some(())
> + } else {
> + None
> + }
> + })?;
> + }
> +
> + Ok(())
> +}
> +
> +fn signature_reg_fuse_version_ga102(
> + dev: &device::Device,
> + bar: &Bar0,
> + engine_id_mask: u16,
> + ucode_id: u8,
> +) -> Result<u32> {
> + // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
> + // registers, which are an array. Our register definition macros do not allow us to manage them
> + // properly, so we need to hardcode their addresses for now.
Sounds like a TODO?
> +
> + // Each engine has 16 ucode version registers numbered from 1 to 16.
> + if ucode_id == 0 || ucode_id > 16 {
> + dev_err!(dev, "invalid ucode id {:#x}", ucode_id);
> + return Err(EINVAL);
> + }
> +
> + // Base address of the FUSE registers array corresponding to the engine.
> + let reg_fuse_base = if engine_id_mask & 0x0001 != 0 {
> + regs::NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION::OFFSET
> + } else if engine_id_mask & 0x0004 != 0 {
> + regs::NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION::OFFSET
> + } else if engine_id_mask & 0x0400 != 0 {
> + regs::NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION::OFFSET
> + } else {
> + dev_err!(dev, "unexpected engine_id_mask {:#x}", engine_id_mask);
> + return Err(EINVAL);
> + };
> +
> + // Read `reg_fuse_base[ucode_id - 1]`.
> + let reg_fuse_version =
> + bar.read32(reg_fuse_base + ((ucode_id - 1) as usize * core::mem::size_of::<u32>()));
> +
> + Ok(fls_u32(reg_fuse_version))
> +}
> +
> +fn program_brom_ga102<E: FalconEngine>(bar: &Bar0, params: &FalconBromParams) -> Result {
> + regs::NV_PFALCON2_FALCON_BROM_PARAADDR::default()
> + .set_value(params.pkc_data_offset)
> + .write(bar, E::BASE);
> + regs::NV_PFALCON2_FALCON_BROM_ENGIDMASK::default()
> + .set_value(params.engine_id_mask as u32)
> + .write(bar, E::BASE);
> + regs::NV_PFALCON2_FALCON_BROM_CURR_UCODE_ID::default()
> + .set_ucode_id(params.ucode_id)
> + .write(bar, E::BASE);
> + regs::NV_PFALCON2_FALCON_MOD_SEL::default()
> + .set_algo(FalconModSelAlgo::Rsa3k)
> + .write(bar, E::BASE);
> +
> + Ok(())
> +}
> +
> +pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
> +
> +impl<E: FalconEngine> Ga102<E> {
> + pub(super) fn new() -> Self {
> + Self(PhantomData)
> + }
> +}
> +
> +impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
> + fn select_core(&self, _falcon: &Falcon<E>, bar: &Bar0) -> Result {
> + select_core_ga102::<E>(bar)
> + }
> +
> + fn signature_reg_fuse_version(
> + &self,
> + falcon: &Falcon<E>,
> + bar: &Bar0,
> + engine_id_mask: u16,
> + ucode_id: u8,
> + ) -> Result<u32> {
> + signature_reg_fuse_version_ga102(&falcon.dev, bar, engine_id_mask, ucode_id)
> + }
> +
> + fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
> + program_brom_ga102::<E>(bar, params)
> + }
Why are those two separate functions?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (22 preceding siblings ...)
2025-06-12 14:01 ` [PATCH v5 23/23] gpu: nova-core: load and " Alexandre Courbot
@ 2025-06-17 20:14 ` Danilo Krummrich
2025-06-18 8:25 ` Alexandre Courbot
2025-06-18 20:14 ` Danilo Krummrich
24 siblings, 1 reply; 58+ messages in thread
From: Danilo Krummrich @ 2025-06-17 20:14 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul, Shirish Baskaran
On Thu, Jun 12, 2025 at 11:01:28PM +0900, Alexandre Courbot wrote:
> Hi everyone,
>
> The feedback on v4 has been (hopefully) addressed. I guess the main
> remaining unknown is the direction of the `num` module ; for this
> iteration, following the received feedback I have eschewed the extension
> trait and implemented the alignment functions as methods of the new
> `PowerOfTwo` type. This has the benefit of making it impossible to call
> them with undesirable (i.e. non-power of two) values. The `fls` function
> is now provided as a series of const functions for each supported type,
> generated by a macro.
>
> It feels like the `num` module could be its own series though, so if
> there is still discussion about it, I can also extract it and implement
> the functionality we need in nova-core as local helper functions until
> it gets merged at its own pace.
>
> As previously, this series only successfully probes Ampere GPUs, but
> support for other generations is on the way.
>
> Upon successful probe, the driver will display the range of the WPR2
> region constructed by FWSEC-FRTS with debug priority:
>
> [ 95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
> [ 95.436002] NovaCore 0000:01:00.0: GPU instance built
>
> This series is based on v6.16-rc1 with no other dependencies.
>
> There are bits of documentation still missing, these are addressed by
> Joel in his own documentation patch series [1]. I'll also double-check
> and send follow-up patches if anything is still missing after that.
>
> [1] https://lore.kernel.org/rust-for-linux/20250503040802.1411285-1-joelagnelf@nvidia.com/
I think this series collected quite a few TODOs to follow up on once the
corresponding abstractions are in place, etc. This is fine and expected.
However, I think we should list those things in a central place, e.g. our TODO
list, in order to make it easier to follow up.
Additionally, it might get us more contributors who might be interested in
following up on those things.
@Alex: Can you please add such a list?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code
2025-06-17 16:33 ` Danilo Krummrich
@ 2025-06-18 5:26 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-18 5:26 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul
On Wed Jun 18, 2025 at 1:33 AM JST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 11:01:43PM +0900, Alexandre Courbot wrote:
>> + /// Perform a DMA write according to `load_offsets` from `dma_handle` into the falcon's
>> + /// `target_mem`.
>> + ///
>> + /// `sec` is set if the loaded firmware is expected to run in secure mode.
>> + fn dma_wr(
>> + &self,
>> + bar: &Bar0,
>> + dma_handle: bindings::dma_addr_t,
>
> I think we should pass &F from dma_load() rather than the raw handle.
Agreed, done.
>
> <snip>
>
>> +fn select_core_ga102<E: FalconEngine>(bar: &Bar0) -> Result {
>> + let bcr_ctrl = regs::NV_PRISCV_RISCV_BCR_CTRL::read(bar, E::BASE);
>> + if bcr_ctrl.core_select() != PeregrineCoreSelect::Falcon {
>> + regs::NV_PRISCV_RISCV_BCR_CTRL::default()
>> + .set_core_select(PeregrineCoreSelect::Falcon)
>> + .write(bar, E::BASE);
>> +
>> + util::wait_on(Duration::from_millis(10), || {
>
> As agreed, can you please add a brief comment to justify the timeout?
Oops, for some reason I did not address that part of your comment last
time, sorry about that. I have added `// TIMEOUT:` comments above all calls
to `wait_on`. Note that sometimes the justification for these cannot be
more than "arbitrarily high value indicating something went wrong".
(Similarly, I have added a `dma_handle_with_offset` method to
`CoherentAllocation` as I said I would in v4.)
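
For readers unfamiliar with the helper, the polling pattern behind `wait_on` can be approximated in userspace as below. This is only a sketch under assumptions: it uses `std::time` rather than kernel time APIs, `TimedOut` is a hypothetical error type, and the real `util::wait_on` in nova-core may differ in signature and behaviour.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error returned when the condition is not met in time.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Polls `cond` until it returns `Some(_)` or `timeout` has elapsed.
pub fn wait_on<R>(
    timeout: Duration,
    mut cond: impl FnMut() -> Option<R>,
) -> Result<R, TimedOut> {
    let start = Instant::now();

    loop {
        // Check the condition first so that it is evaluated at least once.
        if let Some(result) = cond() {
            return Ok(result);
        }

        if start.elapsed() > timeout {
            return Err(TimedOut);
        }

        // Busy-wait hint; a kernel implementation would have other options.
        std::hint::spin_loop();
    }
}
```

A call site would then carry the justification next to the chosen value, e.g. a `// TIMEOUT: arbitrarily large value, the core switch is expected to complete well before it.` comment directly above the `wait_on(...)` call.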
>
>> + let r = regs::NV_PRISCV_RISCV_BCR_CTRL::read(bar, E::BASE);
>> + if r.valid() {
>> + Some(())
>> + } else {
>> + None
>> + }
>> + })?;
>> + }
>> +
>> + Ok(())
>> +}
>> +
>> +fn signature_reg_fuse_version_ga102(
>> + dev: &device::Device,
>> + bar: &Bar0,
>> + engine_id_mask: u16,
>> + ucode_id: u8,
>> +) -> Result<u32> {
>> + // The ucode fuse versions are contained in the FUSE_OPT_FPF_<ENGINE>_UCODE<X>_VERSION
>> + // registers, which are an array. Our register definition macros do not allow us to manage them
>> + // properly, so we need to hardcode their addresses for now.
>
> Sounds like a TODO?
Yes, although the next iteration of the register macro (which I will
send after this series) supports register arrays and addresses this.
Marked this as a TODO nonetheless.
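
Until then, the hardcoded indexing amounts to the following address computation, shown here as a standalone sketch. The helper name `ucode_version_reg_offset` is made up for this example and the base address used in the usage note is arbitrary; `fls_u32` is reimplemented locally so that the snippet is self-contained.

```rust
/// Number of ucode version registers per engine, as stated in the patch.
const NUM_UCODE_VERSION_REGS: u8 = 16;

/// Returns the offset of `reg_fuse_base[ucode_id - 1]`, where each entry is a
/// 32-bit register, or `None` if `ucode_id` is outside of `1..=16`.
pub fn ucode_version_reg_offset(reg_fuse_base: usize, ucode_id: u8) -> Option<usize> {
    if ucode_id == 0 || ucode_id > NUM_UCODE_VERSION_REGS {
        return None;
    }

    Some(reg_fuse_base + usize::from(ucode_id - 1) * core::mem::size_of::<u32>())
}

/// Local equivalent of the `fls_u32` helper: 1-based index of the most
/// significant set bit, or 0 if `v` is 0.
pub const fn fls_u32(v: u32) -> u32 {
    u32::BITS - v.leading_zeros()
}
```

With an arbitrary base of `0x1000`, ucode id 1 maps to offset `0x1000` and ucode id 16 to `0x103c`; the fuse version is then the `fls_u32` of the value read at that offset.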
>
>> +
>> + // Each engine has 16 ucode version registers numbered from 1 to 16.
>> + if ucode_id == 0 || ucode_id > 16 {
>> + dev_err!(dev, "invalid ucode id {:#x}", ucode_id);
>> + return Err(EINVAL);
>> + }
>> +
>> + // Base address of the FUSE registers array corresponding to the engine.
>> + let reg_fuse_base = if engine_id_mask & 0x0001 != 0 {
>> + regs::NV_FUSE_OPT_FPF_SEC2_UCODE1_VERSION::OFFSET
>> + } else if engine_id_mask & 0x0004 != 0 {
>> + regs::NV_FUSE_OPT_FPF_NVDEC_UCODE1_VERSION::OFFSET
>> + } else if engine_id_mask & 0x0400 != 0 {
>> + regs::NV_FUSE_OPT_FPF_GSP_UCODE1_VERSION::OFFSET
>> + } else {
>> + dev_err!(dev, "unexpected engine_id_mask {:#x}", engine_id_mask);
>> + return Err(EINVAL);
>> + };
>> +
>> + // Read `reg_fuse_base[ucode_id - 1]`.
>> + let reg_fuse_version =
>> + bar.read32(reg_fuse_base + ((ucode_id - 1) as usize * core::mem::size_of::<u32>()));
>> +
>> + Ok(fls_u32(reg_fuse_version))
>> +}
>> +
>> +fn program_brom_ga102<E: FalconEngine>(bar: &Bar0, params: &FalconBromParams) -> Result {
>> + regs::NV_PFALCON2_FALCON_BROM_PARAADDR::default()
>> + .set_value(params.pkc_data_offset)
>> + .write(bar, E::BASE);
>> + regs::NV_PFALCON2_FALCON_BROM_ENGIDMASK::default()
>> + .set_value(params.engine_id_mask as u32)
>> + .write(bar, E::BASE);
>> + regs::NV_PFALCON2_FALCON_BROM_CURR_UCODE_ID::default()
>> + .set_ucode_id(params.ucode_id)
>> + .write(bar, E::BASE);
>> + regs::NV_PFALCON2_FALCON_MOD_SEL::default()
>> + .set_algo(FalconModSelAlgo::Rsa3k)
>> + .write(bar, E::BASE);
>> +
>> + Ok(())
>> +}
>> +
>> +pub(super) struct Ga102<E: FalconEngine>(PhantomData<E>);
>> +
>> +impl<E: FalconEngine> Ga102<E> {
>> + pub(super) fn new() -> Self {
>> + Self(PhantomData)
>> + }
>> +}
>> +
>> +impl<E: FalconEngine> FalconHal<E> for Ga102<E> {
>> + fn select_core(&self, _falcon: &Falcon<E>, bar: &Bar0) -> Result {
>> + select_core_ga102::<E>(bar)
>> + }
>> +
>> + fn signature_reg_fuse_version(
>> + &self,
>> + falcon: &Falcon<E>,
>> + bar: &Bar0,
>> + engine_id_mask: u16,
>> + ucode_id: u8,
>> + ) -> Result<u32> {
>> + signature_reg_fuse_version_ga102(&falcon.dev, bar, engine_id_mask, ucode_id)
>> + }
>> +
>> + fn program_brom(&self, _falcon: &Falcon<E>, bar: &Bar0, params: &FalconBromParams) -> Result {
>> + program_brom_ga102::<E>(bar, params)
>> + }
>
> Why are those two separate functions?
Do you mean why does `program_brom` call `program_brom_ga102`? This is
so HAL methods can be re-used in other architectures. For instance,
Hopper's HAL will be identical to Ampere's save for `select_core`, so having
everything in separate functions allows the Hopper HAL to just call
`program_brom_ga102`. It's a sane convention to have IMHO, maybe we
should codify it via a HAL paragraph in the guidelines document?
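
To illustrate the convention, here is a stripped-down standalone sketch. Register accesses are replaced by entries pushed into a log, the trait is reduced to two methods, and `Gh100` / `select_core_gh100` are hypothetical names for a future Hopper HAL; none of this is the actual driver code.

```rust
/// Reduced version of the HAL trait; `log` stands in for register accesses.
pub trait FalconHal {
    fn select_core(&self, log: &mut Vec<&'static str>);
    fn program_brom(&self, log: &mut Vec<&'static str>);
}

// Chip-specific behaviour lives in free functions so that other HALs can
// call them directly.
fn select_core_ga102(log: &mut Vec<&'static str>) {
    log.push("select_core_ga102");
}

fn program_brom_ga102(log: &mut Vec<&'static str>) {
    log.push("program_brom_ga102");
}

// Hypothetical Hopper-specific core selection.
fn select_core_gh100(log: &mut Vec<&'static str>) {
    log.push("select_core_gh100");
}

pub struct Ga102;
pub struct Gh100;

impl FalconHal for Ga102 {
    fn select_core(&self, log: &mut Vec<&'static str>) {
        select_core_ga102(log)
    }

    fn program_brom(&self, log: &mut Vec<&'static str>) {
        program_brom_ga102(log)
    }
}

impl FalconHal for Gh100 {
    fn select_core(&self, log: &mut Vec<&'static str>) {
        select_core_gh100(log)
    }

    // Identical to Ampere, so the GA102 function is re-used as-is.
    fn program_brom(&self, log: &mut Vec<&'static str>) {
        program_brom_ga102(log)
    }
}

/// Runs both HAL steps and returns the sequence of operations performed.
pub fn boot(hal: &dyn FalconHal) -> Vec<&'static str> {
    let mut log = Vec::new();
    hal.select_core(&mut log);
    hal.program_brom(&mut log);
    log
}
```

The trait methods stay thin wrappers, and sharing an implementation between generations is a one-line call rather than a copy.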
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
2025-06-17 20:14 ` [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
@ 2025-06-18 8:25 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-18 8:25 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul, Shirish Baskaran
On Wed Jun 18, 2025 at 5:14 AM JST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 11:01:28PM +0900, Alexandre Courbot wrote:
>> Hi everyone,
>>
>> The feedback on v4 has been (hopefully) addressed. I guess the main
>> remaining unknown is the direction of the `num` module ; for this
>> iteration, following the received feedback I have eschewed the extension
>> trait and implemented the alignment functions as methods of the new
>> `PowerOfTwo` type. This has the benefit of making it impossible to call
>> them with undesirable (i.e. non-power of two) values. The `fls` function
>> is now provided as a series of const functions for each supported type,
>> generated by a macro.
>>
>> It feels like the `num` module could be its own series though, so if
>> there is still discussion about it, I can also extract it and implement
>> the functionality we need in nova-core as local helper functions until
>> it gets merged at its own pace.
>>
>> As previously, this series only successfully probes Ampere GPUs, but
>> support for other generations is on the way.
>>
>> Upon successful probe, the driver will display the range of the WPR2
>> region constructed by FWSEC-FRTS with debug priority:
>>
>> [ 95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
>> [ 95.436002] NovaCore 0000:01:00.0: GPU instance built
>>
>> This series is based on v6.16-rc1 with no other dependencies.
>>
>> There are bits of documentation still missing, these are addressed by
>> Joel in his own documentation patch series [1]. I'll also double-check
>> and send follow-up patches if anything is still missing after that.
>>
>> [1] https://lore.kernel.org/rust-for-linux/20250503040802.1411285-1-joelagnelf@nvidia.com/
>
> I think this series collected quite a few TODOs to follow up on once the
> corresponding abstractions are in place, etc. This is fine and expected.
>
> However, I think we should list those things in a central place, e.g. our TODO
> list, in order to make it easier to follow up.
>
> Additionally, it might get us more contributors who might be interested in
> following up on those things.
>
> @Alex: Can you please add such a list?
I went through every TODO in the code and found the following could be
done:
- Update the entry about the registers macro with remaining sub-tasks
before it can "graduate" from Nova (notably register arrays, which are
several TODOs by themselves),
- Mention the missing `FromBytes::from_bytes` that will allow us to
remove some unsafe code,
- Mention the missing features of `CoherentAllocation` (write() and
as_slice()) that require us to use unsafe code,
- I wanted to mention the missing xarray but noticed it has been merged,
so we can just use it and remove the corresponding TODO. :)
... and that's all I noticed, but please let me know if I missed
something.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-16 6:41 ` Alexandre Courbot
@ 2025-06-18 19:24 ` Benno Lossin
2025-06-19 13:26 ` Alexandre Courbot
0 siblings, 1 reply; 58+ messages in thread
From: Benno Lossin @ 2025-06-18 19:24 UTC (permalink / raw)
To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel
On Mon Jun 16, 2025 at 8:41 AM CEST, Alexandre Courbot wrote:
> On Sun Jun 15, 2025 at 4:16 AM JST, Benno Lossin wrote:
>> On Thu Jun 12, 2025 at 4:01 PM CEST, Alexandre Courbot wrote:
>>> + #[inline(always)]
>>> + pub const fn [<fls_ $t>](v: $t) -> u32 {
>>
>> Can we name this `find_last_set_bit_ $t`? When the upstream function
>> lands, we should also rename this one.
>
> We can - but as for `align_up`/`next_multiple_of`, I am not sure which
> naming scheme (kernel-like or closer to Rust conventions) is favored in
> such cases, and so far it seems to come down to personal preference. I
> tend to think that staying close to kernel conventions makes it easier to
> understand when a function is the equivalent of a C one, but whichever
> policy we adopt, it would be nice to codify it somewhere (apologies if it
> already is and I missed it).
I don't think we have it written down anywhere. I don't think that we
should have a global rule for this. Certain things are more in the
purview of the kernel and others are more on the Rust side.
My opinion is that this, since it will hopefully be in `core` at some
point, should go with the Rust naming.
---
Cheers,
Benno
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
` (23 preceding siblings ...)
2025-06-17 20:14 ` [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
@ 2025-06-18 20:14 ` Danilo Krummrich
2025-06-19 7:14 ` Alexandre Courbot
24 siblings, 1 reply; 58+ messages in thread
From: Danilo Krummrich @ 2025-06-18 20:14 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul, Shirish Baskaran
On Thu, Jun 12, 2025 at 11:01:28PM +0900, Alexandre Courbot wrote:
> Hi everyone,
>
> The feedback on v4 has been (hopefully) addressed. I guess the main
> remaining unknown is the direction of the `num` module ; for this
> iteration, following the received feedback I have eschewed the extension
> trait and implemented the alignment functions as methods of the new
> `PowerOfTwo` type. This has the benefit of making it impossible to call
> them with undesirable (i.e. non-power of two) values. The `fls` function
> is now provided as a series of const functions for each supported type,
> generated by a macro.
>
> It feels like the `num` module could be its own series though, so if
> there is still discussion about it, I can also extract it and implement
> the functionality we need in nova-core as local helper functions until
> it gets merged at its own pace.
>
> As previously, this series only successfully probes Ampere GPUs, but
> support for other generations is on the way.
>
> Upon successful probe, the driver will display the range of the WPR2
> region constructed by FWSEC-FRTS with debug priority:
>
> [ 95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
> [ 95.436002] NovaCore 0000:01:00.0: GPU instance built
>
> This series is based on v6.16-rc1 with no other dependencies.
When compiled with rustc 1.78, the build breaks because the imports of
size_of() and align_of() are missing.
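For context: as far as I know, `size_of` and `align_of` were only added to the Rust prelude in 1.80, so older toolchains need them imported explicitly. Below is a minimal userspace sketch of the kind of fix this implies. `DemoHeader` and `demo_header_layout` are hypothetical names for illustration and are not taken from the series.

```rust
// Explicit imports: needed on rustc older than 1.80, redundant but accepted
// on newer toolchains where these functions are part of the prelude.
use core::mem::{align_of, size_of};

/// Hypothetical header layout, for illustration only (not from the series).
#[repr(C)]
pub struct DemoHeader {
    pub magic: u32,
    pub version: u16,
    pub flags: u16,
    pub payload_size: u64,
}

/// Returns `(size, alignment)` of `DemoHeader` in bytes.
pub const fn demo_header_layout() -> (usize, usize) {
    (size_of::<DemoHeader>(), align_of::<DemoHeader>())
}
```

With the `use` line removed, this should still build on recent compilers but fail on 1.78, which matches the breakage described above.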
There are also a few warnings still:
warning: unreachable `pub` field
--> drivers/gpu/nova-core/fb.rs:79:5
|
79 | pub fb: Range<u64>,
| ---^^^^^^^^^^^^^^^
| |
| help: consider restricting its visibility: `pub(crate)`
|
= note: requested on the command line with `-W unreachable-pub`
warning: unreachable `pub` field
--> drivers/gpu/nova-core/fb.rs:80:5
|
80 | pub vga_workspace: Range<u64>,
| ---^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| help: consider restricting its visibility: `pub(crate)`
warning: unreachable `pub` field
--> drivers/gpu/nova-core/fb.rs:81:5
|
81 | pub frts: Range<u64>,
| ---^^^^^^^^^^^^^^^^^
| |
| help: consider restricting its visibility: `pub(crate)`
warning: 3 warnings emitted
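The fix suggested by the compiler is to narrow the field visibility to match the type. A minimal sketch follows, assuming the struct in fb.rs is itself crate-private. The struct name `FbLayout`, the helper `demo_layout` and all range values except the FRTS bounds (which reuse the WPR2 range from the cover letter log) are made up for illustration.

```rust
use core::ops::Range;

/// Hypothetical stand-in for the layout struct in `fb.rs`.
/// The type is crate-private, so `pub` fields would be unreachable from
/// outside the crate; `pub(crate)` states the real visibility.
#[derive(Debug, Clone, PartialEq)]
pub(crate) struct FbLayout {
    pub(crate) fb: Range<u64>,
    pub(crate) vga_workspace: Range<u64>,
    pub(crate) frts: Range<u64>,
}

/// Builds a layout with illustrative values only.
pub(crate) fn demo_layout() -> FbLayout {
    FbLayout {
        // Assumed 4 GiB framebuffer, purely for the example.
        fb: 0..0x1_0000_0000,
        // Made-up VGA workspace range.
        vga_workspace: 0xfff0_0000..0x1_0000_0000,
        // Matches the WPR2 range printed in the cover letter.
        frts: 0xffc0_0000..0xffce_0000,
    }
}
```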
* Re: [PATCH v5 23/23] gpu: nova-core: load and run FWSEC-FRTS
2025-06-12 14:01 ` [PATCH v5 23/23] gpu: nova-core: load and " Alexandre Courbot
@ 2025-06-18 20:23 ` Danilo Krummrich
2025-06-18 20:24 ` Danilo Krummrich
0 siblings, 1 reply; 58+ messages in thread
From: Danilo Krummrich @ 2025-06-18 20:23 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul
On Thu, Jun 12, 2025 at 11:01:51PM +0900, Alexandre Courbot wrote:
> @@ -237,6 +237,67 @@ pub(crate) fn new(
> },
> )?;
>
> + // Check that the WPR2 region does not already exists - if it does, the GPU needs to be
> + // reset.
> + if regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() != 0 {
> + dev_err!(
> + pdev.as_ref(),
> + "WPR2 region already exists - GPU needs to be reset to proceed\n"
> + );
> + return Err(EBUSY);
> + }
> +
> + // Reset falcon, load FWSEC-FRTS, and run it.
> + gsp_falcon
> + .reset(bar)
> + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to reset GSP falcon: {:?}\n", e))?;
> + gsp_falcon
> + .dma_load(bar, &fwsec_frts)
> + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to load FWSEC-FRTS: {:?}\n", e))?;
> + let (mbox0, _) = gsp_falcon
> + .boot(bar, Some(0), None)
> + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to boot FWSEC-FRTS: {:?}\n", e))?;
> + if mbox0 != 0 {
> + dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
> + return Err(EIO);
> + }
> +
> + // SCRATCH_E contains FWSEC-FRTS' error code, if any.
> + let frts_status = regs::NV_PBUS_SW_SCRATCH_0E::read(bar).frts_err_code();
> + if frts_status != 0 {
> + dev_err!(
> + pdev.as_ref(),
> + "FWSEC-FRTS returned with error code {:#x}",
> + frts_status
> + );
> + return Err(EIO);
> + }
> +
> + // Check the WPR2 has been created as we requested.
> + let (wpr2_lo, wpr2_hi) = (
> + (regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO::read(bar).lo_val() as u64) << 12,
> + (regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() as u64) << 12,
> + );
> + if wpr2_hi == 0 {
> + dev_err!(
> + pdev.as_ref(),
> + "WPR2 region not created after running FWSEC-FRTS\n"
> + );
> +
> + return Err(EIO);
> + } else if wpr2_lo != fb_layout.frts.start {
> + dev_err!(
> + pdev.as_ref(),
> + "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
> + wpr2_lo,
> + fb_layout.frts.start,
> + );
> + return Err(EIO);
> + }
> +
> + dev_dbg!(pdev.as_ref(), "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
> + dev_dbg!(pdev.as_ref(), "GPU instance built\n");
> +
This makes Gpu::new() quite messy, can we move this to a separate function
please?
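As a rough idea of what a separate function could look like, here is a userspace sketch of only the WPR2 validation at the end of the hunk. It assumes the raw register fields are `u32` values holding the address shifted right by 12 bits, as the patch's `<< 12` suggests. `check_wpr2` and `Wpr2Error` are hypothetical names, and the driver itself returns `EIO` in both failure cases.

```rust
use core::ops::Range;

/// Hypothetical error type standing in for the kernel's `EIO`.
#[derive(Debug, PartialEq, Eq)]
pub enum Wpr2Error {
    /// The high bound is still zero: FWSEC-FRTS did not create the region.
    NotCreated,
    /// The region exists but does not start where we asked.
    UnexpectedStart { found: u64, expected: u64 },
}

/// Shift applied to the raw register fields in the patch.
const WPR2_ADDR_SHIFT: u32 = 12;

/// Decodes the raw WPR2 low/high register fields and checks them against
/// the expected start of the FRTS region.
pub fn check_wpr2(
    lo_val: u32,
    hi_val: u32,
    expected_start: u64,
) -> Result<Range<u64>, Wpr2Error> {
    let lo = u64::from(lo_val) << WPR2_ADDR_SHIFT;
    let hi = u64::from(hi_val) << WPR2_ADDR_SHIFT;

    if hi == 0 {
        return Err(Wpr2Error::NotCreated);
    }
    if lo != expected_start {
        return Err(Wpr2Error::UnexpectedStart {
            found: lo,
            expected: expected_start,
        });
    }

    Ok(lo..hi)
}
```

With the values from the cover letter log, raw fields of `0xffc00` and `0xffce0` decode to the range `0xffc00000..0xffce0000`.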
* Re: [PATCH v5 23/23] gpu: nova-core: load and run FWSEC-FRTS
2025-06-18 20:23 ` Danilo Krummrich
@ 2025-06-18 20:24 ` Danilo Krummrich
2025-06-19 12:35 ` Alexandre Courbot
0 siblings, 1 reply; 58+ messages in thread
From: Danilo Krummrich @ 2025-06-18 20:24 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul
On Wed, Jun 18, 2025 at 10:23:15PM +0200, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 11:01:51PM +0900, Alexandre Courbot wrote:
> > @@ -237,6 +237,67 @@ pub(crate) fn new(
> > },
> > )?;
> >
> > + // Check that the WPR2 region does not already exists - if it does, the GPU needs to be
> > + // reset.
> > + if regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() != 0 {
> > + dev_err!(
> > + pdev.as_ref(),
> > + "WPR2 region already exists - GPU needs to be reset to proceed\n"
> > + );
> > + return Err(EBUSY);
> > + }
> > +
> > + // Reset falcon, load FWSEC-FRTS, and run it.
> > + gsp_falcon
> > + .reset(bar)
> > + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to reset GSP falcon: {:?}\n", e))?;
> > + gsp_falcon
> > + .dma_load(bar, &fwsec_frts)
> > + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to load FWSEC-FRTS: {:?}\n", e))?;
> > + let (mbox0, _) = gsp_falcon
> > + .boot(bar, Some(0), None)
> > + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to boot FWSEC-FRTS: {:?}\n", e))?;
> > + if mbox0 != 0 {
> > + dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
> > + return Err(EIO);
> > + }
> > +
> > + // SCRATCH_E contains FWSEC-FRTS' error code, if any.
> > + let frts_status = regs::NV_PBUS_SW_SCRATCH_0E::read(bar).frts_err_code();
> > + if frts_status != 0 {
> > + dev_err!(
> > + pdev.as_ref(),
> > + "FWSEC-FRTS returned with error code {:#x}",
> > + frts_status
> > + );
> > + return Err(EIO);
> > + }
> > +
> > + // Check the WPR2 has been created as we requested.
> > + let (wpr2_lo, wpr2_hi) = (
> > + (regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO::read(bar).lo_val() as u64) << 12,
> > + (regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() as u64) << 12,
> > + );
> > + if wpr2_hi == 0 {
> > + dev_err!(
> > + pdev.as_ref(),
> > + "WPR2 region not created after running FWSEC-FRTS\n"
> > + );
> > +
> > + return Err(EIO);
> > + } else if wpr2_lo != fb_layout.frts.start {
> > + dev_err!(
> > + pdev.as_ref(),
> > + "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
> > + wpr2_lo,
> > + fb_layout.frts.start,
> > + );
> > + return Err(EIO);
> > + }
> > +
> > + dev_dbg!(pdev.as_ref(), "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
> > + dev_dbg!(pdev.as_ref(), "GPU instance built\n");
> > +
>
> This makes Gpu::new() quite messy, can we move this to a separate function
> please?
Actually, can't this just be a method of FwsecFirmware?
* Re: [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization
2025-06-18 20:14 ` Danilo Krummrich
@ 2025-06-19 7:14 ` Alexandre Courbot
0 siblings, 0 replies; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-19 7:14 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul, Shirish Baskaran
On Thu Jun 19, 2025 at 5:14 AM JST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 11:01:28PM +0900, Alexandre Courbot wrote:
>> Hi everyone,
>>
>> The feedback on v4 has been (hopefully) addressed. I guess the main
>> remaining unknown is the direction of the `num` module ; for this
>> iteration, following the received feedback I have eschewed the extension
>> trait and implemented the alignment functions as methods of the new
>> `PowerOfTwo` type. This has the benefit of making it impossible to call
>> them with undesirable (i.e. non-power of two) values. The `fls` function
>> is now provided as a series of const functions for each supported type,
>> generated by a macro.
>>
>> It feels like the `num` module could be its own series though, so if
>> there is still discussion about it, I can also extract it and implement
>> the functionality we need in nova-core as local helper functions until
>> it gets merged at its own pace.
>>
>> As previously, this series only successfully probes Ampere GPUs, but
>> support for other generations is on the way.
>>
>> Upon successful probe, the driver will display the range of the WPR2
>> region constructed by FWSEC-FRTS with debug priority:
>>
>> [ 95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
>> [ 95.436002] NovaCore 0000:01:00.0: GPU instance built
>>
>> This series is based on v6.16-rc1 with no other dependencies.
>
> When compiled with rustc 1.78, the build breaks because the imports of
> size_of() and align_of() are missing.
>
> There are also a few warnings still:
>
> warning: unreachable `pub` field
> --> drivers/gpu/nova-core/fb.rs:79:5
> |
> 79 | pub fb: Range<u64>,
> | ---^^^^^^^^^^^^^^^
> | |
> | help: consider restricting its visibility: `pub(crate)`
> |
> = note: requested on the command line with `-W unreachable-pub`
>
> warning: unreachable `pub` field
> --> drivers/gpu/nova-core/fb.rs:80:5
> |
> 80 | pub vga_workspace: Range<u64>,
> | ---^^^^^^^^^^^^^^^^^^^^^^^^^^
> | |
> | help: consider restricting its visibility: `pub(crate)`
>
> warning: unreachable `pub` field
> --> drivers/gpu/nova-core/fb.rs:81:5
> |
> 81 | pub frts: Range<u64>,
> | ---^^^^^^^^^^^^^^^^^
> | |
> | help: consider restricting its visibility: `pub(crate)`
>
> warning: 3 warnings emitted
Sorry about this. These are confirmed fixed in v6.
* Re: [PATCH v5 23/23] gpu: nova-core: load and run FWSEC-FRTS
2025-06-18 20:24 ` Danilo Krummrich
@ 2025-06-19 12:35 ` Alexandre Courbot
2025-06-19 12:43 ` Danilo Krummrich
0 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-19 12:35 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul
On Thu Jun 19, 2025 at 5:24 AM JST, Danilo Krummrich wrote:
> On Wed, Jun 18, 2025 at 10:23:15PM +0200, Danilo Krummrich wrote:
>> On Thu, Jun 12, 2025 at 11:01:51PM +0900, Alexandre Courbot wrote:
>> > @@ -237,6 +237,67 @@ pub(crate) fn new(
>> > },
>> > )?;
>> >
>> > + // Check that the WPR2 region does not already exists - if it does, the GPU needs to be
>> > + // reset.
>> > + if regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() != 0 {
>> > + dev_err!(
>> > + pdev.as_ref(),
>> > + "WPR2 region already exists - GPU needs to be reset to proceed\n"
>> > + );
>> > + return Err(EBUSY);
>> > + }
>> > +
>> > + // Reset falcon, load FWSEC-FRTS, and run it.
>> > + gsp_falcon
>> > + .reset(bar)
>> > + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to reset GSP falcon: {:?}\n", e))?;
>> > + gsp_falcon
>> > + .dma_load(bar, &fwsec_frts)
>> > + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to load FWSEC-FRTS: {:?}\n", e))?;
>> > + let (mbox0, _) = gsp_falcon
>> > + .boot(bar, Some(0), None)
>> > + .inspect_err(|e| dev_err!(pdev.as_ref(), "Failed to boot FWSEC-FRTS: {:?}\n", e))?;
>> > + if mbox0 != 0 {
>> > + dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
>> > + return Err(EIO);
>> > + }
>> > +
>> > + // SCRATCH_E contains FWSEC-FRTS' error code, if any.
>> > + let frts_status = regs::NV_PBUS_SW_SCRATCH_0E::read(bar).frts_err_code();
>> > + if frts_status != 0 {
>> > + dev_err!(
>> > + pdev.as_ref(),
>> > + "FWSEC-FRTS returned with error code {:#x}",
>> > + frts_status
>> > + );
>> > + return Err(EIO);
>> > + }
>> > +
>> > + // Check the WPR2 has been created as we requested.
>> > + let (wpr2_lo, wpr2_hi) = (
>> > + (regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO::read(bar).lo_val() as u64) << 12,
>> > + (regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() as u64) << 12,
>> > + );
>> > + if wpr2_hi == 0 {
>> > + dev_err!(
>> > + pdev.as_ref(),
>> > + "WPR2 region not created after running FWSEC-FRTS\n"
>> > + );
>> > +
>> > + return Err(EIO);
>> > + } else if wpr2_lo != fb_layout.frts.start {
>> > + dev_err!(
>> > + pdev.as_ref(),
>> > + "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
>> > + wpr2_lo,
>> > + fb_layout.frts.start,
>> > + );
>> > + return Err(EIO);
>> > + }
>> > +
>> > + dev_dbg!(pdev.as_ref(), "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
>> > + dev_dbg!(pdev.as_ref(), "GPU instance built\n");
>> > +
>>
>> This makes Gpu::new() quite messy, can we move this to a separate function
>> please?
>
> Actually, can't this just be a method of FwsecFirmware?
Yes and no. :) FWSEC can run two commands, `Frts` and `Sb`, and some of
the code here is specific to `Frts`. The code that is not specific to it
(loading the firmware into the falcon, booting and checking MBOX) can be
moved into a method of `FwsecFirmware`, and it makes sense to do so
actually.
All of this code is going to be moved out of `Gpu::new()` eventually
(i.e. the follow-up patchset), but we are still figuring out where it
will eventually land. We will need some other entity to manage the GSP
boot (GspBooter?), and I am still learning which parts are common to all
GPU families and which ones should be a HAL. So for now I'd rather keep
it here, modulo the part that can be moved into `FwsecFirmware`.
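To make the split concrete, here is a userspace sketch of the command-agnostic part (reset, load, boot, check MBOX0) as a method of `FwsecFirmware`. Everything except the names `FwsecFirmware`, `reset`, `dma_load` and `boot` is an assumption: the `Falcon` trait, `BootError`, `MockFalcon`, the `run` method name and the signatures (the real calls also take a `bar` argument) are hypothetical stand-ins, not the driver's API.

```rust
/// Hypothetical error type; the driver uses kernel error codes instead.
#[derive(Debug, PartialEq, Eq)]
pub enum BootError {
    Reset,
    Load,
    Boot,
    /// Non-zero MBOX0 value reported by the firmware.
    Firmware(u32),
}

/// Hypothetical abstraction over the falcon operations used by the patch.
pub trait Falcon {
    fn reset(&mut self) -> Result<(), BootError>;
    fn dma_load(&mut self, fw: &FwsecFirmware) -> Result<(), BootError>;
    /// Boots the falcon and returns the two mailbox values.
    fn boot(&mut self, mbox0: Option<u32>, mbox1: Option<u32>) -> Result<(u32, u32), BootError>;
}

/// Stand-in for the firmware object; the real type holds much more.
pub struct FwsecFirmware {
    pub image: Vec<u8>,
}

impl FwsecFirmware {
    /// Steps shared by every FWSEC command: reset, load, boot, check MBOX0.
    /// Command-specific checks (such as the WPR2 ones for `Frts`) stay with
    /// the caller.
    pub fn run<F: Falcon>(&self, falcon: &mut F) -> Result<(), BootError> {
        falcon.reset()?;
        falcon.dma_load(self)?;
        let (mbox0, _) = falcon.boot(Some(0), None)?;
        if mbox0 != 0 {
            return Err(BootError::Firmware(mbox0));
        }
        Ok(())
    }
}

/// Mock falcon that records calls and returns a preset MBOX0 value.
pub struct MockFalcon {
    pub mbox0_result: u32,
    pub calls: Vec<&'static str>,
}

impl Falcon for MockFalcon {
    fn reset(&mut self) -> Result<(), BootError> {
        self.calls.push("reset");
        Ok(())
    }

    fn dma_load(&mut self, _fw: &FwsecFirmware) -> Result<(), BootError> {
        self.calls.push("dma_load");
        Ok(())
    }

    fn boot(&mut self, _mbox0: Option<u32>, _mbox1: Option<u32>) -> Result<(u32, u32), BootError> {
        self.calls.push("boot");
        Ok((self.mbox0_result, 0))
    }
}
```

The `Frts`-specific status and WPR2 checks would then remain in the caller for now, which keeps them easy to relocate once the GSP boot code finds its final home.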
* Re: [PATCH v5 23/23] gpu: nova-core: load and run FWSEC-FRTS
2025-06-19 12:35 ` Alexandre Courbot
@ 2025-06-19 12:43 ` Danilo Krummrich
0 siblings, 0 replies; 58+ messages in thread
From: Danilo Krummrich @ 2025-06-19 12:43 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Benno Lossin, John Hubbard, Ben Skeggs,
Joel Fernandes, Timur Tabi, Alistair Popple, linux-kernel,
rust-for-linux, nouveau, dri-devel, Lyude Paul
On 6/19/25 2:35 PM, Alexandre Courbot wrote:
> All of this code is going to be moved out of `Gpu::new()` eventually
> (i.e. the follow-up patchset), but we are still figuring out where it
> will eventually land. We will need some other entity to manage the GSP
> boot (GspBooter?), and I am still learning which parts are common to all
> GPU families and which ones should be a HAL. So for now I'd rather keep
> it here, modulo the part that can be moved into `FwsecFirmware`.
Seems reasonable, let's move it to a separate function in the meantime and add a
very brief TODO please.
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-18 19:24 ` Benno Lossin
@ 2025-06-19 13:26 ` Alexandre Courbot
2025-06-19 13:28 ` Benno Lossin
0 siblings, 1 reply; 58+ messages in thread
From: Alexandre Courbot @ 2025-06-19 13:26 UTC (permalink / raw)
To: Benno Lossin, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel
On Thu Jun 19, 2025 at 4:24 AM JST, Benno Lossin wrote:
> On Mon Jun 16, 2025 at 8:41 AM CEST, Alexandre Courbot wrote:
>> On Sun Jun 15, 2025 at 4:16 AM JST, Benno Lossin wrote:
>>> On Thu Jun 12, 2025 at 4:01 PM CEST, Alexandre Courbot wrote:
>>>> + #[inline(always)]
>>>> + pub const fn [<fls_ $t>](v: $t) -> u32 {
>>>
>>> Can we name this `find_last_set_bit_ $t`? When the upstream function
>>> lands, we should also rename this one.
>>
>> We can - but as for `align_up`/`next_multiple_of`, I am not sure which
>> naming scheme (kernel-like or closer to Rust conventions) is favored in
>> such cases, and so far it seems to come down to personal preference. I
>> tend to think that staying close to kernel conventions makes it easier to
>> understand when a function is the equivalent of a C one, but whichever
>> policy we adopt it would be nice to codify it somewhere (apologies if it
>> is already and I missed it).
>
> I don't think we have it written down anywhere. I don't think that we
> should have a global rule for this. Certain things are more in the
> purview of the kernel and others are more on the Rust side.
>
> My opinion is that this, since it will hopefully be in `core` at some
> point, should go with the Rust naming.
I guess in that case we should go with `last_set_bit`, as `find_` is not
really used as a prefix for this kind of operation (e.g.
`leading_zeros` and friends).
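For illustration, a minimal sketch of how the renamed helpers could look, built on `leading_zeros`. This is not the code from the patch: the macro name `impl_last_set_bit` is hypothetical, and function names are passed explicitly instead of being generated with the `[<... $t>]` paste syntax, so the sketch builds without extra crates.

```rust
/// Generates one `const fn` per `name => type` pair.
macro_rules! impl_last_set_bit {
    ($($name:ident => $t:ty),* $(,)?) => {
        $(
            /// Returns the 1-based position of the most significant set bit
            /// of `v`, or 0 if `v` is zero (same convention as the C `fls`).
            #[inline(always)]
            pub const fn $name(v: $t) -> u32 {
                <$t>::BITS - v.leading_zeros()
            }
        )*
    };
}

impl_last_set_bit! {
    last_set_bit_u8 => u8,
    last_set_bit_u16 => u16,
    last_set_bit_u32 => u32,
    last_set_bit_u64 => u64,
}

// The functions are usable in const context.
pub const BIT_OF_0X100: u32 = last_set_bit_u16(0x100);
```

Under this convention `last_set_bit_u32(1)` is 1 and `last_set_bit_u32(0x8000_0000)` is 32, so a zero input is distinguishable from bit 0 being set.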
* Re: [PATCH v5 05/23] rust: num: add the `fls` operation
2025-06-19 13:26 ` Alexandre Courbot
@ 2025-06-19 13:28 ` Benno Lossin
0 siblings, 0 replies; 58+ messages in thread
From: Benno Lossin @ 2025-06-19 13:28 UTC (permalink / raw)
To: Alexandre Courbot, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
Cc: John Hubbard, Ben Skeggs, Joel Fernandes, Timur Tabi,
Alistair Popple, linux-kernel, rust-for-linux, nouveau, dri-devel
On Thu Jun 19, 2025 at 3:26 PM CEST, Alexandre Courbot wrote:
> On Thu Jun 19, 2025 at 4:24 AM JST, Benno Lossin wrote:
>> On Mon Jun 16, 2025 at 8:41 AM CEST, Alexandre Courbot wrote:
>>> On Sun Jun 15, 2025 at 4:16 AM JST, Benno Lossin wrote:
>>>> On Thu Jun 12, 2025 at 4:01 PM CEST, Alexandre Courbot wrote:
>>>>> + #[inline(always)]
>>>>> + pub const fn [<fls_ $t>](v: $t) -> u32 {
>>>>
>>>> Can we name this `find_last_set_bit_ $t`? When the upstream function
>>>> lands, we should also rename this one.
>>>
>>> We can - but as for `align_up`/`next_multiple_of`, I am not sure which
>>> naming scheme (kernel-like or closer to Rust conventions) is favored in
>>> such cases, and so far it seems to come down to personal preference. I
>>> tend to think that staying close to kernel conventions makes it easier to
>>> understand when a function is the equivalent of a C one, but whichever
>>> policy we adopt it would be nice to codify it somewhere (apologies if it
>>> is already and I missed it).
>>
>> I don't think we have it written down anywhere. I don't think that we
>> should have a global rule for this. Certain things are more in the
>> purview of the kernel and others are more on the Rust side.
>>
>> My opinion is that this, since it will hopefully be in `core` at some
>> point, should go with the Rust naming.
>
> I guess in that case we should go with `last_set_bit`, as `find_` is not
> really used as a prefix for this kind of operation (e.g.
> `leading_zeros` and friends).
Sounds good!
---
Cheers,
Benno
Thread overview: 58+ messages
2025-06-12 14:01 [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 01/23] rust: dma: expose the count and size of CoherentAllocation Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 02/23] rust: make ETIMEDOUT error available Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 03/23] rust: sizes: add constants up to SZ_2G Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 04/23] rust: add new `num` module with `PowerOfTwo` type Alexandre Courbot
2025-06-12 15:07 ` Boqun Feng
2025-06-12 20:00 ` John Hubbard
2025-06-12 20:05 ` Boqun Feng
2025-06-12 20:08 ` John Hubbard
2025-06-12 20:12 ` Boqun Feng
2025-06-13 14:16 ` Alexandre Courbot
2025-06-13 15:25 ` Boqun Feng
2025-06-14 17:08 ` Boqun Feng
2025-06-16 5:14 ` Alexandre Courbot
2025-06-14 17:31 ` Boqun Feng
2025-06-16 5:19 ` Alexandre Courbot
2025-06-14 19:09 ` Benno Lossin
2025-06-15 13:32 ` Miguel Ojeda
2025-06-16 5:13 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 05/23] rust: num: add the `fls` operation Alexandre Courbot
2025-06-14 19:16 ` Benno Lossin
2025-06-16 6:41 ` Alexandre Courbot
2025-06-18 19:24 ` Benno Lossin
2025-06-19 13:26 ` Alexandre Courbot
2025-06-19 13:28 ` Benno Lossin
2025-06-15 9:37 ` Miguel Ojeda
2025-06-15 10:51 ` Alexandre Courbot
2025-06-15 10:58 ` Alexandre Courbot
2025-06-15 13:25 ` Miguel Ojeda
2025-06-16 6:36 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 06/23] gpu: nova-core: use absolute paths in register!() macro Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 07/23] gpu: nova-core: add delimiter for helper rules " Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 08/23] gpu: nova-core: expose the offset of each register as a type constant Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 09/23] gpu: nova-core: allow register aliases Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 10/23] gpu: nova-core: increase BAR0 size to 16MB Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 11/23] gpu: nova-core: add helper function to wait on condition Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 12/23] gpu: nova-core: wait for GFW_BOOT completion Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 13/23] gpu: nova-core: add DMA object struct Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 14/23] gpu: nova-core: register sysmem flush page Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 15/23] gpu: nova-core: add falcon register definitions and base code Alexandre Courbot
2025-06-17 16:33 ` Danilo Krummrich
2025-06-18 5:26 ` Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 16/23] gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 17/23] gpu: nova-core: vbios: Add base support for VBIOS construction and iteration Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 18/23] gpu: nova-core: vbios: Add support to look up PMU table in FWSEC Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 19/23] gpu: nova-core: vbios: Add support for FWSEC ucode extraction Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 20/23] gpu: nova-core: compute layout of the FRTS region Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 21/23] gpu: nova-core: add types for patching firmware binaries Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 22/23] gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS Alexandre Courbot
2025-06-12 14:01 ` [PATCH v5 23/23] gpu: nova-core: load and " Alexandre Courbot
2025-06-18 20:23 ` Danilo Krummrich
2025-06-18 20:24 ` Danilo Krummrich
2025-06-19 12:35 ` Alexandre Courbot
2025-06-19 12:43 ` Danilo Krummrich
2025-06-17 20:14 ` [PATCH v5 00/23] nova-core: run FWSEC-FRTS to perform first stage of GSP initialization Danilo Krummrich
2025-06-18 8:25 ` Alexandre Courbot
2025-06-18 20:14 ` Danilo Krummrich
2025-06-19 7:14 ` Alexandre Courbot