rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices
@ 2025-03-20 13:39 Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 1/7] rust: add useful ops for u64 Alexandre Courbot
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot, Sergio González Collado

Hi everyone,

New RFC for the first steps towards bringing booting the GSP, with more
complete Falcon code. This code has been successfully used to run the
FECS-FRTS firmware from the bios and create the WPR2 region, although
this series alone is not enough to reproduce this - the next revision
will probably include the code required to extract that firmware from
the BIOS.

As for the previous revisions, the goal is to get more eyes to look at
the general direction of the driver and raise concerns if there are any,
the main point of discussion being probably the register!() macro that
is used to define register layouts.

The falcon code is still quite in a work-in-progress state, I am notably
still thinking of the best way to implement the HAL for chip-dependent
operations (right now we branch depending on the chipset, but that
obviously won't scale). So the overall design is still very much
flexible.

Dependencies:

- https://lore.kernel.org/rust-for-linux/20250318-topic-panthor-rs-genmask-v4-1-35004fca6ac5@collabora.com/
- https://lore.kernel.org/rust-for-linux/20250319-try_with-v2-0-822ec63c05fb@nvidia.com/
- https://lore.kernel.org/rust-for-linux/20250224115007.2072043-1-abdiel.janulgue@gmail.com/
- https://lore.kernel.org/rust-for-linux/20250306222336.23482-1-dakr@kernel.org/
- https://lore.kernel.org/rust-for-linux/20250320-registers-v2-1-d277409bcde8@nvidia.com/

TODO:

- Document more registers and fields,
- Add BIOS extractor code to obtain the FWSEC-FRTS firmware,
- Complete FWSEC-FRTS execution to obtain WPR2 region.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v3:
- Fixed typo in Boot0 register definition that made probing fail.
- Moved the register definition macros to their own patch series.
- Used Revocable::try_access_with() when accessing registers.
- Added all the Falcon code required to boot FWSEC-FRTS.
- Link to v2: https://lore.kernel.org/r/20250304-nova_timer-v2-0-8fb13f3f8cff@nvidia.com

Changes in v2:
- Don't hold the Bar guard in methods that can sleep.
- Added a Timestamp type for Timer to safely and easily get durations
  between two measurements.
- Added a macro to make register definitions easier.
- Added a very basic falcon implementation to define more registers and
  exercise the register definition macro.
- Link to v1: https://lore.kernel.org/r/20250217-nova_timer-v1-0-78c5ace2d987@nvidia.com

---
Alexandre Courbot (7):
      rust: add useful ops for u64
      rust: make ETIMEDOUT error available
      gpu: nova-core: derive useful traits for Chipset
      gpu: nova-core: add missing GA100 definition
      gpu: nova-core: use register!() to define register layout
      gpu: nova-core: add basic timer device
      gpu: nova-core: add falcon register definitions and probe code

 drivers/gpu/nova-core/driver.rs    |   4 +-
 drivers/gpu/nova-core/falcon.rs    | 618 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs       |  78 ++++-
 drivers/gpu/nova-core/nova_core.rs |  14 +
 drivers/gpu/nova-core/regs.rs      | 261 ++++++++++++----
 drivers/gpu/nova-core/timer.rs     | 132 ++++++++
 rust/kernel/error.rs               |   1 +
 rust/kernel/lib.rs                 |   1 +
 rust/kernel/num.rs                 |  52 ++++
 9 files changed, 1101 insertions(+), 60 deletions(-)
---
base-commit: 1d53763dc16c9fc9329a4cdc14d691979d47568f
change-id: 20250216-nova_timer-c69430184f54
prerequisite-change-id: 20241023-topic-panthor-rs-genmask-fabc573fef43:v4
prerequisite-patch-id: 182945904fd914573eed9388a559ce8a642310ef
prerequisite-message-id: <20250224115007.2072043-1-abdiel.janulgue@gmail.com>
prerequisite-patch-id: 73f4047ae5d3e4d51cfa285bd8fd0f1c04d47409
prerequisite-patch-id: 5ad45352d9d457a45886eeea90a46cc21516356e
prerequisite-patch-id: 725e7d42309919c759fdd0585a97810b1eb72706
prerequisite-message-id: <20250306222336.23482-1-dakr@kernel.org>
prerequisite-patch-id: de15c0d16727e6af2d79f88f5b67be4c06212552
prerequisite-patch-id: f8bca95d983222da29508cc6e6886e4b0f992588
prerequisite-patch-id: 1ae8f68250fb43808342285a284bcf7b572263fe
prerequisite-patch-id: fa5ce1308e1dbc71374a381537ab3978babe20a0
prerequisite-patch-id: 7225e000f745bb5fd45fc43393d801d1d9adb767

Best regards,
-- 
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 1/7] rust: add useful ops for u64
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 2/7] rust: make ETIMEDOUT error available Alexandre Courbot
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot, Sergio González Collado

It is common to build a u64 from its high and low parts obtained from
two 32-bit registers. Conversely, it is also common to split a u64 into
two u32s to write them into registers. Add an extension trait for u64
that implement these methods in a new `num` module.

It is expected that this trait will be extended with other useful
operations, and similar extension traits implemented for other types.

Reviewed-by: Sergio González Collado <sergio.collado@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/num.rs | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 2fe10df9a45ccd5fa24330f927abdf9dfb874d44..a9499597ed9650f8fae9b2f53fa9abeea05071f4 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -62,6 +62,7 @@
 pub mod miscdevice;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod num;
 pub mod of;
 pub mod page;
 #[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..9b93db6528eef131fb74c1289f1e152cc2a13168
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// Useful operations for `u64`.
+pub trait U64Ext {
+    /// Build a `u64` by combining its `high` and `low` parts.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
+    /// ```
+    fn from_u32s(high: u32, low: u32) -> Self;
+
+    /// Returns the upper 32 bits of `self`.
+    fn upper_32_bits(self) -> u32;
+
+    /// Returns the lower 32 bits of `self`.
+    fn lower_32_bits(self) -> u32;
+}
+
+impl U64Ext for u64 {
+    fn from_u32s(high: u32, low: u32) -> Self {
+        ((high as u64) << u32::BITS) | low as u64
+    }
+
+    fn upper_32_bits(self) -> u32 {
+        (self >> u32::BITS) as u32
+    }
+
+    fn lower_32_bits(self) -> u32 {
+        self as u32
+    }
+}
+
+/// Same as [`U64Ext::upper_32_bits`], but defined outside of the trait so it can be used in a
+/// `const` context.
+pub const fn upper_32_bits(v: u64) -> u32 {
+    (v >> u32::BITS) as u32
+}
+
+/// Same as [`U64Ext::lower_32_bits`], but defined outside of the trait so it can be used in a
+/// `const` context.
+pub const fn lower_32_bits(v: u64) -> u32 {
+    v as u32
+}
+
+/// Same as [`U64Ext::from_u32s`], but defined outside of the trait so it can be used in a `const`
+/// context.
+pub const fn u64_from_u32s(high: u32, low: u32) -> u64 {
+    ((high as u64) << u32::BITS) | low as u64
+}

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 2/7] rust: make ETIMEDOUT error available
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 1/7] rust: add useful ops for u64 Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 3/7] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 1e510181432cceae46219f7ed3597a88b85ebe0a..475d14a4830774aa7717d3b5e70c7ff9de203dc2 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -65,6 +65,7 @@ macro_rules! declare_err {
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
     declare_err!(EOVERFLOW, "Value too large for defined data type.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 3/7] gpu: nova-core: derive useful traits for Chipset
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 1/7] rust: add useful ops for u64 Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 2/7] rust: make ETIMEDOUT error available Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 4/7] gpu: nova-core: add missing GA100 definition Alexandre Courbot
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

We will commonly need to compare chipset versions, so derive the
ordering traits to make that possible. Also derive Copy and Clone since
Chipsets are as cheap as an integer.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 17c9660da45034762edaa78e372d8821144cdeb7..4de67a2dc16302c00530026156d7264cbc7e5b32 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -13,7 +13,7 @@ macro_rules! define_chipset {
     ({ $($variant:ident = $value:expr),* $(,)* }) =>
     {
         /// Enum representation of the GPU chipset.
-        #[derive(fmt::Debug)]
+        #[derive(fmt::Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
         pub(crate) enum Chipset {
             $($variant = $value),*,
         }

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 4/7] gpu: nova-core: add missing GA100 definition
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
                   ` (2 preceding siblings ...)
  2025-03-20 13:39 ` [PATCH RFC v3 3/7] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 5/7] gpu: nova-core: use register!() to define register layout Alexandre Courbot
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

linux-firmware contains a directory for GA100, and it is a defined
chipset in Nouveau.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 4de67a2dc16302c00530026156d7264cbc7e5b32..9fe6aedaa9563799c2624d461d4e37ee9b094909 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -54,6 +54,7 @@ fn try_from(value: u32) -> Result<Self, Self::Error> {
     TU117 = 0x167,
     TU116 = 0x168,
     // Ampere
+    GA100 = 0x170,
     GA102 = 0x172,
     GA103 = 0x173,
     GA104 = 0x174,
@@ -73,7 +74,7 @@ pub(crate) fn arch(&self) -> Architecture {
             Self::TU102 | Self::TU104 | Self::TU106 | Self::TU117 | Self::TU116 => {
                 Architecture::Turing
             }
-            Self::GA102 | Self::GA103 | Self::GA104 | Self::GA106 | Self::GA107 => {
+            Self::GA100 | Self::GA102 | Self::GA103 | Self::GA104 | Self::GA106 | Self::GA107 => {
                 Architecture::Ampere
             }
             Self::AD102 | Self::AD103 | Self::AD104 | Self::AD106 | Self::AD107 => {

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 5/7] gpu: nova-core: use register!() to define register layout
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
                   ` (3 preceding siblings ...)
  2025-03-20 13:39 ` [PATCH RFC v3 4/7] gpu: nova-core: add missing GA100 definition Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device Alexandre Courbot
  2025-03-20 13:39 ` [PATCH RFC v3 7/7] gpu: nova-core: add falcon register definitions and probe code Alexandre Courbot
  6 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Use the register!() macro to define the layout for the Boot0 register
and use its accessors through the use of the convenience with_bar!()
macro, which uses Revocable::try_access() and converts its returned
Option into the proper error as needed.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs       |  5 ++--
 drivers/gpu/nova-core/nova_core.rs | 12 ++++++++
 drivers/gpu/nova-core/regs.rs      | 60 ++++++--------------------------------
 3 files changed, 23 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 9fe6aedaa9563799c2624d461d4e37ee9b094909..d96901e5c8eace1e7c57c77da7def209e8149cd3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -135,11 +135,10 @@ pub(crate) struct Spec {
 
 impl Spec {
     fn new(bar: &Devres<Bar0>) -> Result<Spec> {
-        let bar = bar.try_access().ok_or(ENXIO)?;
-        let boot0 = regs::Boot0::read(&bar);
+        let boot0 = with_bar!(bar, |b| regs::Boot0::read(b))?;
 
         Ok(Self {
-            chipset: boot0.chipset().try_into()?,
+            chipset: boot0.chipset()?,
             revision: Revision::from_boot0(boot0),
         })
     }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index a91cd924054b49966937a8db6aab9cd0614f10de..94f4778c16f6a4d046c2f799129ed0cc68df6fd4 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -2,6 +2,18 @@
 
 //! Nova Core GPU Driver
 
+#[macro_use]
+mod macros {
+    /// Convenience macro to run a closure while holding [`crate::driver::Bar0`].
+    ///
+    /// If the bar cannot be acquired, then `ENXIO` is returned.
+    macro_rules! with_bar {
+        ($bar:expr, $closure:expr) => {
+            $bar.try_access_with($closure).ok_or(ENXIO)
+        };
+    }
+}
+
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 50aefb150b0b1c9b73f07fca3b7a070885785485..7bfd2b575fe2184565d495012e55cd0829b0b1ad 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -1,55 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use crate::driver::Bar0;
+use core::ops::Deref;
+use kernel::io::Io;
+use kernel::register;
 
-// TODO
-//
-// Create register definitions via generic macros. See task "Generic register
-// abstraction" in Documentation/gpu/nova/core/todo.rst.
+use crate::gpu::Chipset;
 
-const BOOT0_OFFSET: usize = 0x00000000;
-
-// 3:0 - chipset minor revision
-const BOOT0_MINOR_REV_SHIFT: u8 = 0;
-const BOOT0_MINOR_REV_MASK: u32 = 0x0000000f;
-
-// 7:4 - chipset major revision
-const BOOT0_MAJOR_REV_SHIFT: u8 = 4;
-const BOOT0_MAJOR_REV_MASK: u32 = 0x000000f0;
-
-// 23:20 - chipset implementation Identifier (depends on architecture)
-const BOOT0_IMPL_SHIFT: u8 = 20;
-const BOOT0_IMPL_MASK: u32 = 0x00f00000;
-
-// 28:24 - chipset architecture identifier
-const BOOT0_ARCH_MASK: u32 = 0x1f000000;
-
-// 28:20 - chipset identifier (virtual register field combining BOOT0_IMPL and
-//         BOOT0_ARCH)
-const BOOT0_CHIPSET_SHIFT: u8 = BOOT0_IMPL_SHIFT;
-const BOOT0_CHIPSET_MASK: u32 = BOOT0_IMPL_MASK | BOOT0_ARCH_MASK;
-
-#[derive(Copy, Clone)]
-pub(crate) struct Boot0(u32);
-
-impl Boot0 {
-    #[inline]
-    pub(crate) fn read(bar: &Bar0) -> Self {
-        Self(bar.readl(BOOT0_OFFSET))
-    }
-
-    #[inline]
-    pub(crate) fn chipset(&self) -> u32 {
-        (self.0 & BOOT0_CHIPSET_MASK) >> BOOT0_CHIPSET_SHIFT
-    }
-
-    #[inline]
-    pub(crate) fn minor_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MINOR_REV_MASK) >> BOOT0_MINOR_REV_SHIFT) as u8
-    }
-
-    #[inline]
-    pub(crate) fn major_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
-    }
-}
+register!(Boot0@0x00000000, "Basic revision information about the GPU";
+    3:0     minor_rev => as u8, "minor revision of the chip";
+    7:4     major_rev => as u8, "major revision of the chip";
+    28:20   chipset => try_into Chipset, "chipset model"
+);

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
                   ` (4 preceding siblings ...)
  2025-03-20 13:39 ` [PATCH RFC v3 5/7] gpu: nova-core: use register!() to define register layout Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  2025-03-20 15:54   ` Daniel Brooks
  2025-03-20 18:17   ` Boqun Feng
  2025-03-20 13:39 ` [PATCH RFC v3 7/7] gpu: nova-core: add falcon register definitions and probe code Alexandre Courbot
  6 siblings, 2 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

Add a basic timer device and exercise it during device probing. This
first draft is probably very questionable.

One point in particular which should IMHO receive attention: the generic
wait_on() method aims at providing similar functionality to Nouveau's
nvkm_[num]sec() macros. Since this method will be heavily used with
different conditions to test, I'd like to avoid monomorphizing it
entirely with each instance ; that's something that is achieved in
nvkm_xsec() using functions that the macros invoke.

I have tried achieving the same result in Rust using closures (kept
as-is in the current code), but they seem to be monomorphized as well.
Calling extra functions could work better, but looks also less elegant
to me, so I am really open to suggestions here.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs    |   4 +-
 drivers/gpu/nova-core/gpu.rs       |  55 +++++++++++++++-
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  11 ++++
 drivers/gpu/nova-core/timer.rs     | 132 +++++++++++++++++++++++++++++++++++++
 5 files changed, 201 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x9500;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
@@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
             GFP_KERNEL,
         )?;
 
+        let _ = this.gpu.test_timer();
+
         Ok(this)
     }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index d96901e5c8eace1e7c57c77da7def209e8149cd3..f010d3152530b1cec032ca620e59de51a2fc1a13 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -6,8 +6,10 @@
 
 use crate::driver::Bar0;
 use crate::regs;
+use crate::timer::Timer;
 use crate::util;
 use core::fmt;
+use core::time::Duration;
 
 macro_rules! define_chipset {
     ({ $($variant:ident = $value:expr),* $(,)* }) =>
@@ -179,6 +181,7 @@ pub(crate) struct Gpu {
     /// MMIO mapping of PCI BAR 0
     bar: Devres<Bar0>,
     fw: Firmware,
+    timer: Timer,
 }
 
 impl Gpu {
@@ -194,6 +197,56 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
             spec.revision
         );
 
-        Ok(pin_init!(Self { spec, bar, fw }))
+        let timer = Timer::new();
+
+        Ok(pin_init!(Self {
+            spec,
+            bar,
+            fw,
+            timer,
+        }))
+    }
+
+    pub(crate) fn test_timer(&self) -> Result<()> {
+        pr_info!("testing timer subdev\n");
+        with_bar!(self.bar, |b| {
+            pr_info!("current timestamp: {}\n", self.timer.read(b))
+        })?;
+
+        if !matches!(
+            self.timer
+                .wait_on(&self.bar, Duration::from_millis(10), || Some(())),
+            Ok(())
+        ) {
+            pr_crit!("timer test failure\n");
+        }
+
+        let t1 = with_bar!(self.bar, |b| {
+            pr_info!("timestamp after immediate exit: {}\n", self.timer.read(b));
+            self.timer.read(b)
+        })?;
+
+        if self
+            .timer
+            .wait_on(&self.bar, Duration::from_millis(10), || Option::<()>::None)
+            != Err(ETIMEDOUT)
+        {
+            pr_crit!("timer test 2 failure\n");
+        }
+
+        let t2 = with_bar!(self.bar, |b| self.timer.read(b))?;
+        if t2 - t1 < Duration::from_millis(10) {
+            pr_crit!("timer test 3 failure\n");
+        }
+
+        with_bar!(self.bar, |b| {
+            pr_info!(
+                "timestamp after timeout: {} ({:?})\n",
+                self.timer.read(b),
+                t2 - t1
+            );
+        })?;
+
+        Ok(())
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 94f4778c16f6a4d046c2f799129ed0cc68df6fd4..f54dcfc66490cb6b10090ef944ac14feca9f6972 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -18,6 +18,7 @@ macro_rules! with_bar {
 mod firmware;
 mod gpu;
 mod regs;
+mod timer;
 mod util;
 
 kernel::module_pci_driver! {
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 7bfd2b575fe2184565d495012e55cd0829b0b1ad..0d06e09b1ba62d55688c633500f37d3fe1aeb30e 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -11,3 +11,14 @@
     7:4     major_rev => as u8, "major revision of the chip";
     28:20   chipset => try_into Chipset, "chipset model"
 );
+
+/* PTIMER */
+
+register!(PtimerTime0@0x00009400;
+    31:0    lo => as u32, "low 32-bits of the timer"
+);
+
+register!(PtimerTime1@0x00009410;
+    31:0    hi => as u32, "high 32 bits of the timer"
+);
+
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
new file mode 100644
index 0000000000000000000000000000000000000000..1361e4ad10d923ce114972889cdfcfa6e58a691b
--- /dev/null
+++ b/drivers/gpu/nova-core/timer.rs
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Nova Core Timer subdevice
+
+use core::fmt::Display;
+use core::ops::{Add, Sub};
+use core::time::Duration;
+
+use kernel::devres::Devres;
+use kernel::num::U64Ext;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// A timestamp with nanosecond granularity obtained from the GPU timer.
+///
+/// A timestamp can also be substracted to another in order to obtain a [`Duration`].
+///
+/// TODO: add Kunit tests!
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) struct Timestamp(u64);
+
+impl Display for Timestamp {
+    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+        write!(f, "{}", self.0)
+    }
+}
+
+impl Add<Duration> for Timestamp {
+    type Output = Self;
+
+    fn add(mut self, rhs: Duration) -> Self::Output {
+        let mut nanos = rhs.as_nanos();
+        while nanos > u64::MAX as u128 {
+            self.0 = self.0.wrapping_add(nanos as u64);
+            nanos -= u64::MAX as u128;
+        }
+
+        Timestamp(self.0.wrapping_add(nanos as u64))
+    }
+}
+
+impl Sub for Timestamp {
+    type Output = Duration;
+
+    fn sub(self, rhs: Self) -> Self::Output {
+        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
+    }
+}
+
+pub(crate) struct Timer {}
+
+impl Timer {
+    pub(crate) fn new() -> Self {
+        Self {}
+    }
+
+    /// Read the current timer timestamp.
+    pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
+        loop {
+            let hi = regs::PtimerTime1::read(bar);
+            let lo = regs::PtimerTime0::read(bar);
+
+            if hi.hi() == regs::PtimerTime1::read(bar).hi() {
+                return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
+            }
+        }
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn time(bar: &Bar0, time: u64) {
+        regs::PtimerTime1::default()
+            .set_hi(time.upper_32_bits())
+            .write(bar);
+        regs::PtimerTime0::default()
+            .set_lo(time.lower_32_bits())
+            .write(bar);
+    }
+
+    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
+    ///
+    /// When `cond` evaluates to `Some`, its return value is returned.
+    ///
+    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+    /// `Some`, or if the timer device is stuck for some reason.
+    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
+        &self,
+        bar: &Devres<Bar0>,
+        timeout: Duration,
+        cond: F,
+    ) -> Result<R> {
+        // Number of consecutive time reads after which we consider the timer frozen if it hasn't
+        // moved forward.
+        const MAX_STALLED_READS: usize = 16;
+
+        let (mut cur_time, mut prev_time, deadline) = {
+            let cur_time = with_bar!(bar, |b| self.read(b))?;
+            let deadline = cur_time + timeout;
+
+            (cur_time, cur_time, deadline)
+        };
+        let mut num_reads = 0;
+
+        loop {
+            if let Some(ret) = cond() {
+                return Ok(ret);
+            }
+
+            (|| {
+                cur_time = with_bar!(bar, |b| self.read(b))?;
+
+                /* Check if the timer is frozen for some reason. */
+                if cur_time == prev_time {
+                    if num_reads >= MAX_STALLED_READS {
+                        return Err(ETIMEDOUT);
+                    }
+                    num_reads += 1;
+                } else {
+                    if cur_time >= deadline {
+                        return Err(ETIMEDOUT);
+                    }
+
+                    num_reads = 0;
+                    prev_time = cur_time;
+                }
+
+                Ok(())
+            })()?;
+        }
+    }
+}

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC v3 7/7] gpu: nova-core: add falcon register definitions and probe code
  2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
                   ` (5 preceding siblings ...)
  2025-03-20 13:39 ` [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device Alexandre Courbot
@ 2025-03-20 13:39 ` Alexandre Courbot
  6 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-20 13:39 UTC (permalink / raw)
  To: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter
  Cc: linux-kernel, rust-for-linux, nouveau, dri-devel,
	Alexandre Courbot

This is still very preliminary work, and is mostly designed to show how
register fields can be turned into safe types that force us to handle
invalid values.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs    |   2 +-
 drivers/gpu/nova-core/falcon.rs    | 618 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs       |  13 +
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      | 188 ++++++++++-
 5 files changed, 820 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 0cd23aa306e4082405f480afc0530a41131485e7..dee5fd22eecb2ce1f4ea765338b0c1b68853b2d3 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 0x9500;
+const BAR0_SIZE: usize = 0x1000000;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
new file mode 100644
index 0000000000000000000000000000000000000000..0dd4b45abbe0a62238efe24d899c55d5db348586
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -0,0 +1,618 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Falcon microprocessor base support
+
+// TODO: remove this once this module is actively used.
+#![allow(dead_code)]
+
+use core::hint::unreachable_unchecked;
+use core::marker::PhantomData;
+use core::time::Duration;
+use kernel::bindings;
+use kernel::devres::Devres;
+use kernel::{pci, prelude::*};
+
+use crate::driver::Bar0;
+use crate::gpu::Chipset;
+use crate::regs;
+use crate::timer::Timer;
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRev {
+    #[default]
+    Rev1 = 1,
+    Rev2 = 2,
+    Rev3 = 3,
+    Rev4 = 4,
+    Rev5 = 5,
+    Rev6 = 6,
+    Rev7 = 7,
+}
+
+impl TryFrom<u32> for FalconCoreRev {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        use FalconCoreRev::*;
+
+        let rev = match value {
+            1 => Rev1,
+            2 => Rev2,
+            3 => Rev3,
+            4 => Rev4,
+            5 => Rev5,
+            6 => Rev6,
+            7 => Rev7,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(rev)
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone)]
+pub(crate) enum FalconSecurityModel {
+    #[default]
+    None = 0,
+    Light = 2,
+    Heavy = 3,
+}
+
+impl TryFrom<u32> for FalconSecurityModel {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        use FalconSecurityModel::*;
+
+        let sec_model = match value {
+            0 => None,
+            2 => Light,
+            3 => Heavy,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(sec_model)
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRevSubversion {
+    #[default]
+    Subversion0 = 0,
+    Subversion1 = 1,
+    Subversion2 = 2,
+    Subversion3 = 3,
+}
+
+impl From<u32> for FalconCoreRevSubversion {
+    fn from(value: u32) -> Self {
+        use FalconCoreRevSubversion::*;
+
+        match value & 0b11 {
+            0 => Subversion0,
+            1 => Subversion1,
+            2 => Subversion2,
+            3 => Subversion3,
+            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
+            4..=u32::MAX => unsafe { unreachable_unchecked() },
+        }
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Default, Copy, Clone, PartialEq, Eq)]
+pub(crate) enum FalconModSelAlgo {
+    #[default]
+    Rsa3k = 1,
+}
+
+impl TryFrom<u32> for FalconModSelAlgo {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        match value {
+            1 => Ok(FalconModSelAlgo::Rsa3k),
+            _ => Err(EINVAL),
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum RiscvCoreSelect {
+    Falcon = 0,
+    Riscv = 1,
+}
+
+impl From<bool> for RiscvCoreSelect {
+    fn from(value: bool) -> Self {
+        match value {
+            false => RiscvCoreSelect::Falcon,
+            true => RiscvCoreSelect::Riscv,
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum FalconMem {
+    Imem,
+    Dmem,
+}
+
+#[repr(C)]
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct FalconUCodeDescV3 {
+    pub(crate) hdr: u32,
+    pub(crate) stored_size: u32,
+    pub(crate) pkc_data_offset: u32,
+    pub(crate) interface_offset: u32,
+    pub(crate) imem_phys_base: u32,
+    pub(crate) imem_load_size: u32,
+    pub(crate) imem_virt_base: u32,
+    pub(crate) dmem_phys_base: u32,
+    pub(crate) dmem_load_size: u32,
+    pub(crate) engine_id_mask: u16,
+    pub(crate) ucode_id: u8,
+    pub(crate) signature_count: u8,
+    pub(crate) signature_versions: u16,
+    _reserved: u16,
+}
+
+impl FalconUCodeDescV3 {
+    pub(crate) fn ver(&self) -> u8 {
+        ((self.hdr & 0xff00) >> 8) as u8
+    }
+
+    pub(crate) fn size(&self) -> usize {
+        ((self.hdr & 0xffff0000) >> 16) as usize
+    }
+}
+
+/// Trait defining the parameters of a given Falcon instance.
+pub(crate) trait FalconInstance {
+    /// Base I/O address for the falcon, relative from which its registers are accessed.
+    const BASE: usize;
+}
+
+pub(crate) struct Gsp;
+impl FalconInstance for Gsp {
+    const BASE: usize = 0x00110000;
+}
+pub(crate) type GspFalcon = Falcon<Gsp>;
+
+pub(crate) struct Sec2;
+impl FalconInstance for Sec2 {
+    const BASE: usize = 0x00840000;
+}
+pub(crate) type Sec2Falcon = Falcon<Sec2>;
+
+/// Contains the base parameters common to all Falcon instances.
+#[derive(Debug)]
+pub(crate) struct Falcon<I: FalconInstance> {
+    /// Chipset this falcon belongs to.
+    chipset: Chipset,
+    /// Whether this falcon is part of a dual falcon/riscv engine.
+    has_riscv: bool,
+    _instance: PhantomData<I>,
+}
+
+impl<I: FalconInstance> Falcon<I> {
+    pub(crate) fn new(
+        pdev: &pci::Device,
+        chipset: Chipset,
+        bar: &Devres<Bar0>,
+        need_riscv: bool,
+    ) -> Result<Self> {
+        let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, I::BASE))?;
+        let rev = hwcfg1.core_rev()?;
+        let subver = hwcfg1.core_rev_subversion();
+        let sec_model = hwcfg1.security_model()?;
+
+        if need_riscv {
+            let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, I::BASE))?;
+            if !hwcfg2.riscv() {
+                dev_err!(
+                    pdev.as_ref(),
+                    "riscv support requested on falcon that does not support it\n"
+                );
+                return Err(EINVAL);
+            }
+        }
+
+        dev_info!(
+            pdev.as_ref(),
+            "new falcon: {:?} {:?} {:?}",
+            rev,
+            subver,
+            sec_model
+        );
+
+        Ok(Self {
+            chipset,
+            has_riscv: need_riscv,
+            _instance: PhantomData,
+        })
+    }
+
+    fn select_falcon_core(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        if self.has_riscv {
+            let bcr_ctrl = with_bar!(bar, |b| regs::RiscvBcrCtrl::read(b, I::BASE))?;
+            if bcr_ctrl.core_select() != RiscvCoreSelect::Falcon {
+                with_bar!(bar, |b| regs::RiscvBcrCtrl::default()
+                    .set_core_select(RiscvCoreSelect::Falcon)
+                    .write(b, I::BASE))?;
+
+                timer.wait_on(bar, Duration::from_millis(10), || {
+                    bar.try_access_with(|b| regs::RiscvBcrCtrl::read(b, I::BASE))
+                        .and_then(|v| if v.valid() { Some(()) } else { None })
+                })?;
+            }
+        }
+
+        Ok(())
+    }
+
+    fn reset_wait_mem_scrubbing(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        /* TODO: is this needed? */
+        with_bar!(bar, |b| regs::FalconMailbox0::alter(b, I::BASE, |v| v))?;
+
+        timer.wait_on(bar, Duration::from_millis(20), || {
+            bar.try_access_with(|b| regs::FalconHwcfg2::read(b, I::BASE))
+                .and_then(|r| if r.mem_scrubbing() { Some(()) } else { None })
+        })
+    }
+
+    fn reset_prep(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        let _ = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, I::BASE))?;
+
+        // Expected to timeout apparently?
+        // TODO: check why with OpenRM.
+        let _ = timer.wait_on(bar, Duration::from_micros(150), || {
+            bar.try_access_with(|b| regs::FalconHwcfg2::read(b, I::BASE))
+                .and_then(|r| if r.unk_31() { Some(()) } else { None })
+        });
+
+        Ok(())
+    }
+
+    fn reset_eng(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        self.reset_prep(bar, timer)?;
+
+        with_bar!(bar, |b| regs::RiscvUnk3c0::alter(b, I::BASE, |v| v
+            .set_unk0(true)))?;
+
+        let _: Result<()> = timer.wait_on(bar, Duration::from_micros(10), || None);
+
+        with_bar!(bar, |b| regs::RiscvUnk3c0::alter(b, I::BASE, |v| v
+            .set_unk0(false)))?;
+
+        self.reset_wait_mem_scrubbing(bar, timer)?;
+
+        Ok(())
+    }
+
+    fn disable(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        self.select_falcon_core(bar, timer)?;
+
+        with_bar!(bar, |b| {
+            regs::FalconUnk0048::alter(b, I::BASE, |r| r.set_val0(0));
+
+            regs::FalconIrqmclr::default()
+                .set_val(u32::MAX)
+                .write(b, I::BASE)
+        })?;
+
+        self.reset_eng(bar, timer)
+    }
+
+    fn enable(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        self.reset_eng(bar, timer)?;
+        self.select_falcon_core(bar, timer)?;
+        self.reset_wait_mem_scrubbing(bar, timer)?;
+
+        with_bar!(bar, |b| {
+            // We write Boot0 into FalconRm, for some reason...
+            regs::FalconRm::default()
+                .set_val(regs::Boot0::read(b).into())
+                .write(b, I::BASE)
+        })
+    }
+
+    pub(crate) fn reset(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        self.disable(bar, timer)?;
+        self.enable(bar, timer)
+    }
+
+    fn dma_init(
+        &self,
+        bar: &Devres<Bar0>,
+        dma_handle: bindings::dma_addr_t,
+        mem: FalconMem,
+        xfer_len: u32,
+        sec: bool,
+    ) -> Result<regs::FalconDmaTrfCmd> {
+        with_bar!(bar, |b| {
+            regs::FalconDmaTrfBase::default()
+                .set_base((dma_handle >> 8) as u32)
+                .write(b, I::BASE);
+            regs::FalconDmaTrfBase1::default()
+                .set_base((dma_handle >> 40) as u16)
+                .write(b, I::BASE)
+        })?;
+
+        Ok(regs::FalconDmaTrfCmd::default()
+            .set_size((xfer_len.ilog2() - 2) as u8)
+            .set_imem(mem == FalconMem::Imem)
+            .set_sec(if sec { 1 } else { 0 }))
+    }
+
+    fn dma_xfer(
+        &self,
+        bar: &Devres<Bar0>,
+        mem_base: u32,
+        dma_base: u32,
+        cmd: regs::FalconDmaTrfCmd,
+    ) -> Result<()> {
+        with_bar!(bar, |b| {
+            regs::FalconDmaTrfMOffs::default()
+                .set_offs(mem_base)
+                .write(b, I::BASE);
+            regs::FalconDmaTrfBOffs::default()
+                .set_offs(dma_base)
+                .write(b, I::BASE);
+
+            cmd.write(b, I::BASE)
+        })
+    }
+
+    fn dma_done(&self, bar: &Devres<Bar0>, timer: &Timer) -> Result<()> {
+        timer.wait_on(bar, Duration::from_millis(2000), || {
+            bar.try_access_with(|b| regs::FalconDmaTrfCmd::read(b, I::BASE))
+                .and_then(|v| if v.idle() { Some(()) } else { None })
+        })
+    }
+
+    fn dma_wr(
+        &self,
+        bar: &Devres<Bar0>,
+        timer: &Timer,
+        dma_handle: bindings::dma_addr_t,
+        dma_base: u32,
+        mem: FalconMem,
+        mem_base: u32,
+        len: u32,
+        sec: bool,
+    ) -> Result<()> {
+        const DMA_LEN: u32 = 256;
+
+        let (dma_start, dma_addr) = match mem {
+            FalconMem::Imem => (0, dma_handle),
+            FalconMem::Dmem => (dma_base, dma_handle + dma_base as bindings::dma_addr_t),
+        };
+
+        pr_info!(
+            "dma write {:?}: dma_handle {:x} dma_start {:x} dma_addr {:x} len {:x}\n",
+            mem,
+            dma_handle,
+            dma_start,
+            dma_addr,
+            len
+        );
+
+        let cmd = self.dma_init(bar, dma_addr, mem, DMA_LEN, sec)?;
+
+        let mut dst = mem_base;
+        let mut src = dma_base;
+        let mut remain = len;
+
+        while remain >= DMA_LEN {
+            self.dma_xfer(bar, dst, src - dma_start, cmd)?;
+            self.dma_done(bar, timer)?;
+
+            src += DMA_LEN;
+            dst += DMA_LEN;
+            remain -= DMA_LEN;
+        }
+
+        pr_info!("dma write remain: {} bytes\n", remain);
+
+        Ok(())
+    }
+
+    pub(crate) fn dma_load(
+        &self,
+        bar: &Devres<Bar0>,
+        timer: &Timer,
+        dma_handle: bindings::dma_addr_t,
+        imem_params: (u32, u32, u32),
+        dmem_params: (u32, u32, u32),
+    ) -> Result<()> {
+        pr_info!("dma_load: {:?} {:?}\n", imem_params, dmem_params);
+
+        with_bar!(bar, |b| {
+            regs::FalconUnk624::alter(b, I::BASE, |v| v.set_unk7(true));
+            regs::FalconDmaCtl::default().write(b, I::BASE);
+            regs::FalconUnk600::alter(b, I::BASE, |v| v.set_unk16(false).set_unk2((1 << 2) | 1));
+        })?;
+
+        self.dma_wr(
+            bar,
+            timer,
+            dma_handle,
+            imem_params.0,
+            FalconMem::Imem,
+            imem_params.1,
+            imem_params.2,
+            true,
+        )?;
+        self.dma_wr(
+            bar,
+            timer,
+            dma_handle,
+            dmem_params.0,
+            FalconMem::Dmem,
+            dmem_params.1,
+            dmem_params.2,
+            true,
+        )?;
+
+        Ok(())
+    }
+
+    pub(crate) fn boot(
+        &self,
+        bar: &Devres<Bar0>,
+        timer: &Timer,
+        v3_desc: &FalconUCodeDescV3,
+        mbox0: Option<u32>,
+        mbox1: Option<u32>,
+    ) -> Result<(u32, u32)> {
+        pr_info!("boot 0\n");
+
+        // Program BROM registers for PKC signature validation.
+        if self.chipset >= Chipset::GA102 {
+            let pkc_data_offset = v3_desc.pkc_data_offset;
+            let engine_id_mask = v3_desc.engine_id_mask;
+            let ucode_id = v3_desc.ucode_id;
+
+            pr_info!(
+                "dmem_sign: {:#x}, engine_id: {:#x}, ucode_id: {:#x}",
+                pkc_data_offset,
+                engine_id_mask,
+                ucode_id
+            );
+
+            with_bar!(bar, |b| {
+                regs::FalconBromParaaddr0::default()
+                    .set_addr(pkc_data_offset)
+                    .write(b, I::BASE);
+                regs::FalconBromEngidmask::default()
+                    .set_mask(engine_id_mask as u32)
+                    .write(b, I::BASE);
+                regs::FalconBromCurrUcodeId::default()
+                    .set_ucode_id(ucode_id as u32)
+                    .write(b, I::BASE);
+                regs::FalconModSel::default()
+                    .set_algo(FalconModSelAlgo::Rsa3k)
+                    .write(b, I::BASE);
+            })?;
+        }
+
+        pr_info!("boot 1\n");
+
+        with_bar!(bar, |b| {
+            if let Some(mbox0) = mbox0 {
+                regs::FalconMailbox0::default()
+                    .set_mailbox0(mbox0)
+                    .write(b, I::BASE);
+            }
+
+            if let Some(mbox1) = mbox1 {
+                regs::FalconMailbox1::default()
+                    .set_mailbox1(mbox1)
+                    .write(b, I::BASE);
+            }
+
+            // Set `BootVec` to start of non-secure code.
+            // TODO: use boot vector variable - apparently this is 0 on v3 hdr?
+            regs::FalconBootVec::default()
+                .set_boot_vec(0)
+                .write(b, I::BASE);
+
+            regs::FalconCpuCtl::default()
+                .set_start_cpu(true)
+                .write(b, I::BASE);
+        })?;
+
+        pr_info!("booted!\n");
+        timer.wait_on(bar, Duration::from_secs(2), || {
+            bar.try_access()
+                .map(|b| regs::FalconCpuCtl::read(&*b, I::BASE))
+                .and_then(|v| if v.halted() { Some(()) } else { None })
+        })?;
+
+        let (mbox0, mbox1) = with_bar!(bar, |b| {
+            let mbox0 = regs::FalconMailbox0::read(b, I::BASE).mailbox0();
+            let mbox1 = regs::FalconMailbox1::read(b, I::BASE).mailbox1();
+
+            (mbox0, mbox1)
+        })?;
+
+        pr_info!("successfully returned {} {}\n", mbox0, mbox1);
+
+        Ok((mbox0, mbox1))
+    }
+}
+
+#[repr(C)]
+#[derive(Debug)]
+struct FalconAppifHdrV1 {
+    ver: u8,
+    hdr: u8,
+    len: u8,
+    cnt: u8,
+}
+
+#[repr(C)]
+#[derive(Debug)]
+struct FalconAppifV1 {
+    id: u32,
+    dmem_base: u32,
+}
+
+const NVFW_FALCON_APPIF_ID_DMEMMAPPER: u32 = 0x4;
+
+#[repr(C)]
+#[derive(Debug)]
+struct FalconAppifDmemmapperV3 {
+    signature: u32,
+    version: u16,
+    size: u16,
+    cmd_in_buffer_offset: u32,
+    cmd_in_buffer_size: u32,
+    cmd_out_buffer_offset: u32,
+    cmd_out_buffer_size: u32,
+    nvf_img_data_buffer_offset: u32,
+    nvf_img_data_buffer_size: u32,
+    printf_buffer_hdr: u32,
+    ucode_build_time_stamp: u32,
+    ucode_signature: u32,
+    init_cmd: u32,
+    ucode_feature: u32,
+    ucode_cmd_mask0: u32,
+    ucode_cmd_mask1: u32,
+    multi_tgt_tbl: u32,
+}
+
+pub(crate) const NVFW_FALCON_APPIF_DMEMMAPPER_CMD_FRTS: u32 = 0x15;
+
+#[derive(Debug)]
+#[repr(C)]
+struct ReadVbios {
+    ver: u32,
+    hdr: u32,
+    addr: u64,
+    size: u32,
+    flags: u32,
+}
+
+#[derive(Debug)]
+#[repr(C)]
+struct FrtsRegion {
+    ver: u32,
+    hdr: u32,
+    addr: u32,
+    size: u32,
+    ftype: u32,
+}
+
+const NVFW_FRTS_CMD_REGION_TYPE_FB: u32 = 2;
+
+#[derive(Debug)]
+#[repr(C)]
+struct FrtsCmd {
+    read_vbios: ReadVbios,
+    frts_region: FrtsRegion,
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index f010d3152530b1cec032ca620e59de51a2fc1a13..ec745dd8175bd3164ed1b865293a526b09c59ab3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -5,6 +5,7 @@
 };
 
 use crate::driver::Bar0;
+use crate::falcon::{GspFalcon, Sec2Falcon};
 use crate::regs;
 use crate::timer::Timer;
 use crate::util;
@@ -198,6 +199,18 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
         );
 
         let timer = Timer::new();
+        let _gsp_falcon = GspFalcon::new(
+            pdev,
+            spec.chipset,
+            &bar,
+            if spec.chipset > Chipset::GA100 {
+                true
+            } else {
+                false
+            },
+        )?;
+
+        let _sec2_falcon = Sec2Falcon::new(pdev, spec.chipset, &bar, false)?;
 
         Ok(pin_init!(Self {
             spec,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index f54dcfc66490cb6b10090ef944ac14feca9f6972..35c030485532633a5dd59a8a4a1f6d385cb46c98 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -15,6 +15,7 @@ macro_rules! with_bar {
 }
 
 mod driver;
+mod falcon;
 mod firmware;
 mod gpu;
 mod regs;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 0d06e09b1ba62d55688c633500f37d3fe1aeb30e..2952fa7f84c274f122bc12e5506b0b2ac0fbb82d 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -2,8 +2,11 @@
 
 use core::ops::Deref;
 use kernel::io::Io;
-use kernel::register;
+use kernel::{register, register_rel};
 
+use crate::falcon::{
+    FalconCoreRev, FalconCoreRevSubversion, FalconModSelAlgo, FalconSecurityModel, RiscvCoreSelect,
+};
 use crate::gpu::Chipset;
 
 register!(Boot0@0x00000000, "Basic revision information about the GPU";
@@ -22,3 +25,186 @@
     31:0    hi => as u32, "high 32 bits of the timer"
 );
 
+/* PFALCON */
+
+register_rel!(FalconIrqsclr@0x00000004;
+    4:4     halt => as_bit bool;
+    6:6     swgen0 => as_bit bool;
+);
+
+register_rel!(FalconIrqstat@0x00000008;
+    4:4     halt => as_bit bool;
+    6:6     swgen0 => as_bit bool;
+);
+
+register_rel!(FalconIrqmclr@0x00000014;
+    31:0    val => as u32
+);
+
+register_rel!(FalconIrqmask@0x00000018;
+    31:0    val => as u32
+);
+
+register_rel!(FalconRm@0x00000084;
+    31:0    val => as u32
+);
+
+register_rel!(FalconIrqdest@0x0000001c;
+    31:0    val => as u32
+);
+
+register_rel!(FalconMailbox0@0x00000040;
+    31:0    mailbox0 => as u32
+);
+register_rel!(FalconMailbox1@0x00000044;
+    31:0    mailbox1 => as u32
+);
+
+register_rel!(FalconUnk0048@0x00000048;
+    1:0     val0 => as u32
+);
+
+register_rel!(FalconHwcfg2@0x000000f4;
+    10:10   riscv => as_bit bool;
+    12:12   mem_scrubbing => as_bit bool;
+    31:31   unk_31 => as_bit bool;
+);
+
+register_rel!(FalconCpuCtl@0x00000100;
+    1:1     start_cpu => as_bit bool;
+    4:4     halted => as_bit bool;
+    6:6     alias_en => as_bit bool;
+);
+register_rel!(FalconBootVec@0x00000104;
+    31:0    boot_vec => as u32
+);
+
+register_rel!(FalconHwCfg@0x00000108;
+    8:0     imem_size => as u32;
+    17:9    dmem_size => as u32;
+);
+
+register_rel!(FalconDmaCtl@0x0000010c;
+    0:0     require_ctx => as_bit bool;
+    1:1     dmem_scrubbing  => as_bit bool;
+    2:2     imem_scrubbing => as_bit bool;
+    6:3     dmaq_num => as_bit u8;
+    7:7     secure_stat => as_bit bool;
+);
+
+register_rel!(FalconDmaTrfBase@0x00000110;
+    31:0    base => as u32;
+);
+
+register_rel!(FalconDmaTrfMOffs@0x00000114;
+    23:0    offs => as u32;
+);
+
+register_rel!(FalconDmaTrfCmd@0x00000118;
+    0:0     full => as_bit bool;
+    1:1     idle => as_bit bool;
+    3:2     sec => as_bit u8;
+    4:4     imem => as_bit bool;
+    5:5     is_write => as_bit bool;
+    10:8    size => as u8;
+    14:12   ctxdma => as u8;
+    16:16   set_dmtag => as u8;
+);
+
+register_rel!(FalconDmaTrfBOffs@0x0000011c;
+    31:0    offs => as u32;
+);
+
+register_rel!(FalconDmaTrfBase1@0x00000128;
+    8:0     base => as u16;
+);
+
+register_rel!(FalconHwcfg1@0x0000012c;
+    3:0     core_rev => try_into FalconCoreRev, "core revision of the falcon";
+    5:4     security_model => try_into FalconSecurityModel, "security model of the falcon";
+    7:6     core_rev_subversion => into FalconCoreRevSubversion;
+    11:8    imem_ports => as u8;
+    15:12   dmem_ports => as u8;
+);
+
+// TODO: This should be able to take an index, like +0x180[16; 8]? Then the constructor or read
+// method take the port we want to address as argument.
+register_rel!(FalconImemC@0x00000180;
+    7:2     offs => as u8;
+    23:8    blk => as u8;
+    24:24   aincw => as_bit bool;
+    25:25   aincr => as_bit bool;
+    28:28   secure => as_bit bool;
+    29:29   sec_atomic => as_bit bool;
+);
+
+register_rel!(FalconImemD@0x00000184;
+    31:0    data => as u32;
+);
+
+register_rel!(FalconImemT@0x00000188;
+    15:0    data => as u16;
+);
+
+register_rel!(FalconDmemC@0x000001c0;
+    23:0    addr => as u32;
+    7:2     offs => as u8;
+    23:8    blk => as u8;
+    24:24   aincw => as_bit bool;
+    25:25   aincr => as_bit bool;
+    26:26   settag => as_bit bool;
+    27:27   setlvl => as_bit bool;
+    28:28   va => as_bit bool;
+    29:29   miss => as_bit bool;
+);
+
+register_rel!(FalconDmemD@0x000001c4;
+    31:0    data => as u32;
+);
+
+register_rel!(FalconModSel@0x00001180;
+    7:0     algo => try_into FalconModSelAlgo;
+);
+register_rel!(FalconBromCurrUcodeId@0x00001198;
+    31:0    ucode_id => as u32;
+);
+register_rel!(FalconBromEngidmask@0x0000119c;
+    31:0    mask => as u32;
+);
+register_rel!(FalconBromParaaddr0@0x00001210;
+    31:0    addr => as u32;
+);
+
+register_rel!(RiscvCpuCtl@0x00000388;
+    0:0     startcpu => as_bit bool;
+    4:4     halted => as_bit bool;
+    5:5     stopped => as_bit bool;
+    7:7     active_stat => as_bit bool;
+);
+
+register_rel!(RiscvUnk3c0@0x000003c0;
+    0:0     unk0 => as_bit bool;
+);
+
+register_rel!(RiscvIrqmask@0x00000528;
+    31:0    mask => as u32;
+);
+
+register_rel!(RiscvIrqdest@0x0000052c;
+    31:0    dest => as u32;
+);
+
+register_rel!(FalconUnk600@0x00000600;
+    16:16   unk16 => as_bit bool;
+    2:0     unk2 => as u8;
+);
+
+register_rel!(FalconUnk624@0x00000624;
+    7:7     unk7 => as_bit bool;
+);
+
+register_rel!(RiscvBcrCtrl@0x00001668;
+    0:0     valid => as_bit bool;
+    4:4     core_select => as_bit RiscvCoreSelect;
+    8:8     br_fetch => as_bit bool;
+);

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-20 13:39 ` [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device Alexandre Courbot
@ 2025-03-20 15:54   ` Daniel Brooks
  2025-03-21  3:09     ` Alexandre Courbot
  2025-03-20 18:17   ` Boqun Feng
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Brooks @ 2025-03-20 15:54 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter, linux-kernel, rust-for-linux,
	nouveau, dri-devel

Alexandre Courbot <acourbot@nvidia.com> writes:

> +impl Add<Duration> for Timestamp {
> +    type Output = Self;
> +
> +    fn add(mut self, rhs: Duration) -> Self::Output {
> +        let mut nanos = rhs.as_nanos();
> +        while nanos > u64::MAX as u128 {
> +            self.0 = self.0.wrapping_add(nanos as u64);
> +            nanos -= u64::MAX as u128;
> +        }
> +
> +        Timestamp(self.0.wrapping_add(nanos as u64))
> +    }
> +}

Perhaps I missed something, but couldn’t you simplify this method like
this:

    fn add(mut self, rhs: Duration) -> Self::Output {
        let stamp = self.0 as u128;
        Timestamp(stamp.wrapping_add(rhs.as_nanos()) as u64)
    }

db48x

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-20 13:39 ` [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device Alexandre Courbot
  2025-03-20 15:54   ` Daniel Brooks
@ 2025-03-20 18:17   ` Boqun Feng
  2025-03-21  5:41     ` Alexandre Courbot
  1 sibling, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2025-03-20 18:17 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Simona Vetter, linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Alexandre,

On Thu, Mar 20, 2025 at 10:39:14PM +0900, Alexandre Courbot wrote:
> Add a basic timer device and exercise it during device probing. This
> first draft is probably very questionable.
> 
> One point in particular which should IMHO receive attention: the generic
> wait_on() method aims at providing similar functionality to Nouveau's
> nvkm_[num]sec() macros. Since this method will be heavily used with
> different conditions to test, I'd like to avoid monomorphizing it
> entirely with each instance ; that's something that is achieved in
> nvkm_xsec() using functions that the macros invoke.
> 
> I have tried achieving the same result in Rust using closures (kept
> as-is in the current code), but they seem to be monomorphized as well.
> Calling extra functions could work better, but looks also less elegant
> to me, so I am really open to suggestions here.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/driver.rs    |   4 +-
>  drivers/gpu/nova-core/gpu.rs       |  55 +++++++++++++++-
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |  11 ++++
>  drivers/gpu/nova-core/timer.rs     | 132 +++++++++++++++++++++++++++++++++++++
>  5 files changed, 201 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>      pub(crate) gpu: Gpu,
>  }
>  
> -const BAR0_SIZE: usize = 8;
> +const BAR0_SIZE: usize = 0x9500;
>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>  
>  kernel::pci_device_table!(
> @@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
>              GFP_KERNEL,
>          )?;
>  
> +        let _ = this.gpu.test_timer();
> +
>          Ok(this)
>      }
>  }
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index d96901e5c8eace1e7c57c77da7def209e8149cd3..f010d3152530b1cec032ca620e59de51a2fc1a13 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -6,8 +6,10 @@
>  
>  use crate::driver::Bar0;
>  use crate::regs;
> +use crate::timer::Timer;
>  use crate::util;
>  use core::fmt;
> +use core::time::Duration;
>  

Since there is already a Delta type proposed to represent the timestamp
difference in kernel:

	https://lore.kernel.org/rust-for-linux/20250220070611.214262-4-fujita.tomonori@gmail.com/

, could you please make your work based on that and avoid using
core::time::Duration. Thanks!

>  macro_rules! define_chipset {
>      ({ $($variant:ident = $value:expr),* $(,)* }) =>
> @@ -179,6 +181,7 @@ pub(crate) struct Gpu {
>      /// MMIO mapping of PCI BAR 0
>      bar: Devres<Bar0>,
>      fw: Firmware,
> +    timer: Timer,
>  }
>  
[...]
> +
> +/// A timestamp with nanosecond granularity obtained from the GPU timer.
> +///
> +/// A timestamp can also be substracted to another in order to obtain a [`Duration`].
> +///
> +/// TODO: add Kunit tests!
> +#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
> +pub(crate) struct Timestamp(u64);
> +

Also an Instant type has been proposed and reviewed for a while:

	https://lore.kernel.org/rust-for-linux/20250220070611.214262-5-fujita.tomonori@gmail.com/

we should use that type instead of re-inventing the wheel here. Of
course, it's currently not quite working because Instant is only for
CLOCK_MONOTONIC. But there was a proposal to make `Instant` generic over
clock:

	https://lore.kernel.org/rust-for-linux/20230714-rust-time-v2-1-f5aed84218c4@asahilina.net/

if you follow that design, you can implement a `Instant<NovaGpu>`, where

    ipml Now for NovaGpu {
        fn now() -> Instant<Self> {
	    // your Timer::read() implementation.
	}
    }

Thanks!

Regards,
Boqun

> +impl Display for Timestamp {
> +    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
> +        write!(f, "{}", self.0)
> +    }
> +}
> +
> +impl Add<Duration> for Timestamp {
> +    type Output = Self;
> +
> +    fn add(mut self, rhs: Duration) -> Self::Output {
> +        let mut nanos = rhs.as_nanos();
> +        while nanos > u64::MAX as u128 {
> +            self.0 = self.0.wrapping_add(nanos as u64);
> +            nanos -= u64::MAX as u128;
> +        }
> +
> +        Timestamp(self.0.wrapping_add(nanos as u64))
> +    }
> +}
> +
> +impl Sub for Timestamp {
> +    type Output = Duration;
> +
> +    fn sub(self, rhs: Self) -> Self::Output {
> +        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
> +    }
> +}
> +
> +pub(crate) struct Timer {}
> +
> +impl Timer {
> +    pub(crate) fn new() -> Self {
> +        Self {}
> +    }
> +
> +    /// Read the current timer timestamp.
> +    pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
> +        loop {
> +            let hi = regs::PtimerTime1::read(bar);
> +            let lo = regs::PtimerTime0::read(bar);
> +
> +            if hi.hi() == regs::PtimerTime1::read(bar).hi() {
> +                return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
> +            }
> +        }
> +    }
> +
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-20 15:54   ` Daniel Brooks
@ 2025-03-21  3:09     ` Alexandre Courbot
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-21  3:09 UTC (permalink / raw)
  To: Daniel Brooks
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter, linux-kernel, rust-for-linux,
	nouveau, dri-devel

On Fri Mar 21, 2025 at 12:54 AM JST, Daniel Brooks wrote:
> Alexandre Courbot <acourbot@nvidia.com> writes:
>
>> +impl Add<Duration> for Timestamp {
>> +    type Output = Self;
>> +
>> +    fn add(mut self, rhs: Duration) -> Self::Output {
>> +        let mut nanos = rhs.as_nanos();
>> +        while nanos > u64::MAX as u128 {
>> +            self.0 = self.0.wrapping_add(nanos as u64);
>> +            nanos -= u64::MAX as u128;
>> +        }
>> +
>> +        Timestamp(self.0.wrapping_add(nanos as u64))
>> +    }
>> +}
>
> Perhaps I missed something, but couldn’t you simplify this method like
> this:
>
>     fn add(mut self, rhs: Duration) -> Self::Output {
>         let stamp = self.0 as u128;
>         Timestamp(stamp.wrapping_add(rhs.as_nanos()) as u64)
>     }

You are absolutely right, this loop will just wrap self.0 to the same value
after the initial pass.

... or at least it would if there weren't two bugs in it. >_<

- It adds the lower part of nanos while substracting u64::MAX from rhs,
  which is completely wrong;
- Substracting u64::MAX is also wrong, it should be u64::MAX as u128 +
  1.

So thanks a lot for pointing this out!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-20 18:17   ` Boqun Feng
@ 2025-03-21  5:41     ` Alexandre Courbot
  2025-03-21 16:20       ` Daniel Brooks
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-21  5:41 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Danilo Krummrich, David Airlie, John Hubbard, Ben Skeggs,
	Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Simona Vetter, linux-kernel, rust-for-linux, nouveau, dri-devel

Hi Boqun,

On Fri Mar 21, 2025 at 3:17 AM JST, Boqun Feng wrote:
> Hi Alexandre,
>
> On Thu, Mar 20, 2025 at 10:39:14PM +0900, Alexandre Courbot wrote:
>> Add a basic timer device and exercise it during device probing. This
>> first draft is probably very questionable.
>> 
>> One point in particular which should IMHO receive attention: the generic
>> wait_on() method aims at providing similar functionality to Nouveau's
>> nvkm_[num]sec() macros. Since this method will be heavily used with
>> different conditions to test, I'd like to avoid monomorphizing it
>> entirely with each instance ; that's something that is achieved in
>> nvkm_xsec() using functions that the macros invoke.
>> 
>> I have tried achieving the same result in Rust using closures (kept
>> as-is in the current code), but they seem to be monomorphized as well.
>> Calling extra functions could work better, but looks also less elegant
>> to me, so I am really open to suggestions here.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/driver.rs    |   4 +-
>>  drivers/gpu/nova-core/gpu.rs       |  55 +++++++++++++++-
>>  drivers/gpu/nova-core/nova_core.rs |   1 +
>>  drivers/gpu/nova-core/regs.rs      |  11 ++++
>>  drivers/gpu/nova-core/timer.rs     | 132 +++++++++++++++++++++++++++++++++++++
>>  5 files changed, 201 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>> index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
>> --- a/drivers/gpu/nova-core/driver.rs
>> +++ b/drivers/gpu/nova-core/driver.rs
>> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>>      pub(crate) gpu: Gpu,
>>  }
>>  
>> -const BAR0_SIZE: usize = 8;
>> +const BAR0_SIZE: usize = 0x9500;
>>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>>  
>>  kernel::pci_device_table!(
>> @@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
>>              GFP_KERNEL,
>>          )?;
>>  
>> +        let _ = this.gpu.test_timer();
>> +
>>          Ok(this)
>>      }
>>  }
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index d96901e5c8eace1e7c57c77da7def209e8149cd3..f010d3152530b1cec032ca620e59de51a2fc1a13 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -6,8 +6,10 @@
>>  
>>  use crate::driver::Bar0;
>>  use crate::regs;
>> +use crate::timer::Timer;
>>  use crate::util;
>>  use core::fmt;
>> +use core::time::Duration;
>>  
>
> Since there is already a Delta type proposed to represent the timestamp
> difference in kernel:
>
> 	https://lore.kernel.org/rust-for-linux/20250220070611.214262-4-fujita.tomonori@gmail.com/
>
> , could you please make your work based on that and avoid using
> core::time::Duration. Thanks!
>
>>  macro_rules! define_chipset {
>>      ({ $($variant:ident = $value:expr),* $(,)* }) =>
>> @@ -179,6 +181,7 @@ pub(crate) struct Gpu {
>>      /// MMIO mapping of PCI BAR 0
>>      bar: Devres<Bar0>,
>>      fw: Firmware,
>> +    timer: Timer,
>>  }
>>  
> [...]
>> +
>> +/// A timestamp with nanosecond granularity obtained from the GPU timer.
>> +///
>> +/// A timestamp can also be substracted to another in order to obtain a [`Duration`].
>> +///
>> +/// TODO: add Kunit tests!
>> +#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>> +pub(crate) struct Timestamp(u64);
>> +
>
> Also an Instant type has been proposed and reviewed for a while:
>
> 	https://lore.kernel.org/rust-for-linux/20250220070611.214262-5-fujita.tomonori@gmail.com/
>
> we should use that type instead of re-inventing the wheel here. Of
> course, it's currently not quite working because Instant is only for
> CLOCK_MONOTONIC. But there was a proposal to make `Instant` generic over
> clock:
>
> 	https://lore.kernel.org/rust-for-linux/20230714-rust-time-v2-1-f5aed84218c4@asahilina.net/
>
> if you follow that design, you can implement a `Instant<NovaGpu>`, where
>
>     ipml Now for NovaGpu {
>         fn now() -> Instant<Self> {
> 	    // your Timer::read() implementation.
> 	}
>     }

Ah, thanks for pointing this out. I'll keep track of these patches,
hopefully they get merged soon!


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-21  5:41     ` Alexandre Courbot
@ 2025-03-21 16:20       ` Daniel Brooks
  2025-03-24  1:03         ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Brooks @ 2025-03-21 16:20 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Boqun Feng, Danilo Krummrich, David Airlie, John Hubbard,
	Ben Skeggs, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter, linux-kernel, rust-for-linux,
	nouveau, dri-devel

"Alexandre Courbot" <acourbot@nvidia.com> writes:

> Hi Boqun,
>
> On Fri Mar 21, 2025 at 3:17 AM JST, Boqun Feng wrote:
>> Also an Instant type has been proposed and reviewed for a while:
>>
>> 	https://lore.kernel.org/rust-for-linux/20250220070611.214262-5-fujita.tomonori@gmail.com/
>>
>> we should use that type instead of re-inventing the wheel here. Of
>> course, it's currently not quite working because Instant is only for
>> CLOCK_MONOTONIC. But there was a proposal to make `Instant` generic over
>> clock:
>>
>> 	https://lore.kernel.org/rust-for-linux/20230714-rust-time-v2-1-f5aed84218c4@asahilina.net/
>>
>> if you follow that design, you can implement a `Instant<NovaGpu>`, where
>>
>>     ipml Now for NovaGpu {
>>         fn now() -> Instant<Self> {
>> 	    // your Timer::read() implementation.
>> 	}
>>     }
>
> Ah, thanks for pointing this out. I'll keep track of these patches,
> hopefully they get merged soon!

Would that actually work though? Instant is a ktime_t, which is a signed
i64 rather than a u64. When I read your patch I assumed that you had to
add your own Timestamp because the value had to be whatever was read
from the GPU, independent of the clock value kept by the system or other
hardware.

db48x

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device
  2025-03-21 16:20       ` Daniel Brooks
@ 2025-03-24  1:03         ` Alexandre Courbot
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2025-03-24  1:03 UTC (permalink / raw)
  To: Daniel Brooks
  Cc: Boqun Feng, Danilo Krummrich, David Airlie, John Hubbard,
	Ben Skeggs, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Simona Vetter, linux-kernel, rust-for-linux,
	nouveau, dri-devel

On Sat Mar 22, 2025 at 1:20 AM JST, Daniel Brooks wrote:
> "Alexandre Courbot" <acourbot@nvidia.com> writes:
>
>> Hi Boqun,
>>
>> On Fri Mar 21, 2025 at 3:17 AM JST, Boqun Feng wrote:
>>> Also an Instant type has been proposed and reviewed for a while:
>>>
>>> 	https://lore.kernel.org/rust-for-linux/20250220070611.214262-5-fujita.tomonori@gmail.com/
>>>
>>> we should use that type instead of re-inventing the wheel here. Of
>>> course, it's currently not quite working because Instant is only for
>>> CLOCK_MONOTONIC. But there was a proposal to make `Instant` generic over
>>> clock:
>>>
>>> 	https://lore.kernel.org/rust-for-linux/20230714-rust-time-v2-1-f5aed84218c4@asahilina.net/
>>>
>>> if you follow that design, you can implement a `Instant<NovaGpu>`, where
>>>
>>>     ipml Now for NovaGpu {
>>>         fn now() -> Instant<Self> {
>>> 	    // your Timer::read() implementation.
>>> 	}
>>>     }
>>
>> Ah, thanks for pointing this out. I'll keep track of these patches,
>> hopefully they get merged soon!
>
> Would that actually work though? Instant is a ktime_t, which is a signed
> i64 rather than a u64. When I read your patch I assumed that you had to
> add your own Timestamp because the value had to be whatever was read
> from the GPU, independent of the clock value kept by the system or other
> hardware.

That's correct, and honestly there would be little benefit in using an
already existing type outside of not re-writing very similar code as GPU
time stamps are only used locally anyway. I'll keep an eye on these
patches though in case we can harmonize things a bit.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-03-24  1:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-20 13:39 [RFC PATCH v3 0/7] gpu: nova-core: register definitions and basic timer and falcon devices Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 1/7] rust: add useful ops for u64 Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 2/7] rust: make ETIMEDOUT error available Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 3/7] gpu: nova-core: derive useful traits for Chipset Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 4/7] gpu: nova-core: add missing GA100 definition Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 5/7] gpu: nova-core: use register!() to define register layout Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 6/7] gpu: nova-core: add basic timer device Alexandre Courbot
2025-03-20 15:54   ` Daniel Brooks
2025-03-21  3:09     ` Alexandre Courbot
2025-03-20 18:17   ` Boqun Feng
2025-03-21  5:41     ` Alexandre Courbot
2025-03-21 16:20       ` Daniel Brooks
2025-03-24  1:03         ` Alexandre Courbot
2025-03-20 13:39 ` [PATCH RFC v3 7/7] gpu: nova-core: add falcon register definitions and probe code Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).