[PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes


* [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes
@ 2026-06-15 14:40 Eliot Courtney
  2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
                   ` (12 more replies)
  0 siblings, 13 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Several fixes and clean-ups for blackwell related functionality. In
particular:

- FSP communication hardening
- Fix for Coherent allocation lifetime issues during FMC boot
- Convert some raw DMA handle stores into &Coherent stores to help
  prevent future issues with the allocation not staying alive long
  enough
- Move wait for FSP boot earlier - AFAICT, it's not valid to access
  certain registers before this is done.
- Make FbLayout code more obvious and correct
  Currently, the frts vidmem offset is calculated based on the non-wpr
  heap size and pmu reservation size, but AFAICT this is not right. The
  fb layout actually looks like:
  | non-wpr heap | WPR2 .. FRTS | PMU reserved | ... | VGA workspace |
  It's just by coincidence + generous alignment that the values happened
  to match with something more like pmu reserved size + vga workspace.

  Originally, I thought it would make sense to use the offset of
  FbLayout::frts to compute frts vidmem offset, but actually the offsets
  in FbLayout AFAICT don't make sense on post-FSP.

  `FbLayout` is used for both pre and post FSP architectures. FbLayout
  contains ranges for each region of framebuffer, but on post FSP
  architectures, only the size is actually used by GSP. The offsets are
  not decided by the driver. So, for post FSP architectures FbLayout
  contains essentially guesses for the offsets. Instead, make separate
  types so that we only store the information that's actually needed.
  This includes the actual reserved size after the pmu reservation so we
  can properly compute the frts offset.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
Eliot Courtney (13):
      gpu: nova-core: fsp: limit FSP receive message allocation size
      gpu: nova-core: fsp: catch bogus queue pointer issues
      gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
      gpu: nova-core: falcon: gsp: move PRIV target mask constants
      gpu: nova-core: gsp: keep FMC boot params DMA region alive during error
      gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot
      gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations
      gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough
      gpu: nova-core: wait for FSP boot earlier
      gpu: nova-core: split FbLayout into FSP and non-FSP versions
      gpu: nova-core: correct FRTS vidmem offset calculation
      gpu: nova-core: rename heap size field
      gpu: nova-core: return non-WPR heap size as u64 from HALs

 drivers/gpu/nova-core/falcon/fsp.rs    |  70 ++++++++++++++-----
 drivers/gpu/nova-core/falcon/gsp.rs    |  11 +--
 drivers/gpu/nova-core/fb.rs            |  83 +++++++++++++++++++----
 drivers/gpu/nova-core/fb/hal.rs        |   5 +-
 drivers/gpu/nova-core/fb/hal/ga100.rs  |   6 +-
 drivers/gpu/nova-core/fb/hal/ga102.rs  |   6 +-
 drivers/gpu/nova-core/fb/hal/gb100.rs  |   9 ++-
 drivers/gpu/nova-core/fb/hal/gb202.rs  |   9 ++-
 drivers/gpu/nova-core/fb/hal/gh100.rs  |   8 ++-
 drivers/gpu/nova-core/fb/hal/tu102.rs  |  14 +++-
 drivers/gpu/nova-core/fsp.rs           | 119 ++++++++++++++++++---------------
 drivers/gpu/nova-core/fsp/hal.rs       |   4 +-
 drivers/gpu/nova-core/fsp/hal/gb100.rs |   4 +-
 drivers/gpu/nova-core/fsp/hal/gb202.rs |   4 +-
 drivers/gpu/nova-core/fsp/hal/gh100.rs |  10 +--
 drivers/gpu/nova-core/gpu.rs           |   7 +-
 drivers/gpu/nova-core/gpu/hal.rs       |   6 +-
 drivers/gpu/nova-core/gpu/hal/gh100.rs |  10 ++-
 drivers/gpu/nova-core/gpu/hal/tu102.rs |   3 +-
 drivers/gpu/nova-core/gsp.rs           |   8 +--
 drivers/gpu/nova-core/gsp/boot.rs      |  10 +--
 drivers/gpu/nova-core/gsp/fw.rs        |  95 ++++++++++++++++++++------
 drivers/gpu/nova-core/gsp/hal.rs       |   4 +-
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  39 +++++------
 drivers/gpu/nova-core/gsp/hal/tu102.rs |  26 ++++---
 drivers/gpu/nova-core/gsp/sequencer.rs |  18 +++--
 26 files changed, 396 insertions(+), 192 deletions(-)
---
base-commit: 5f7410aa26524101d34b627fbe16670b1514962c
change-id: 20260608-blackwell-fixes-30c9358c90a0

Best regards,
--  
Eliot Courtney <ecourtney@nvidia.com>



* [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 17:11   ` Gary Guo
  2026-06-16  7:33   ` Alistair Popple
  2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, the FSP receive message code will try to allocate whatever
was sent without checking it at all. But the actual size allowed is
limited to 1024 anyway, so discard any messages over that size as bogus.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index 52cdb84ef0e8..e7419a6e71e2 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -35,6 +35,9 @@
 /// FSP message timeout in milliseconds.
 const FSP_MSG_TIMEOUT_MS: i64 = 2000;
 
+/// Size of the FSP EMEM channel 0 that we can use.
+const FSP_EMEM_CHANNEL_0_SIZE: usize = 1024;
+
 /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
 pub(crate) struct Fsp(());
 
@@ -149,23 +152,32 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
     /// Returns `ETIMEDOUT` if no message was available until timeout, or a regular error code if a
     /// memory allocation error occurred.
     pub(crate) fn recv_msg(&mut self, bar: Bar0<'_>) -> Result<KVec<u8>> {
-        let msg_size = read_poll_timeout(
-            || Ok(self.poll_msgq(bar)),
-            |&size| size > 0,
-            Delta::from_millis(10),
-            Delta::from_millis(FSP_MSG_TIMEOUT_MS),
-        )
-        .map(num::u32_as_usize)?;
+        let result = (|| {
+            let msg_size = read_poll_timeout(
+                || Ok(self.poll_msgq(bar)),
+                |&size| size > 0,
+                Delta::from_millis(10),
+                Delta::from_millis(FSP_MSG_TIMEOUT_MS),
+            )
+            .map(num::u32_as_usize)?;
 
-        let mut buffer = KVec::<u8>::new();
-        buffer.resize(msg_size, 0, GFP_KERNEL)?;
+            // Don't blindly allocate more than the maximum we expect from FSP.
+            if msg_size > FSP_EMEM_CHANNEL_0_SIZE {
+                return Err(EIO);
+            }
 
-        self.read_emem(bar, &mut buffer)?;
+            let mut buffer = KVec::<u8>::new();
+            buffer.resize(msg_size, 0, GFP_KERNEL)?;
 
-        // Reset message queue pointers after reading.
+            self.read_emem(bar, &mut buffer)?;
+
+            Ok(buffer)
+        })();
+
+        // Reset the message queue pointers regardless of outcome.
         bar.write(Array::at(0), regs::NV_PFSP_MSGQ_TAIL::zeroed().with_val(0));
         bar.write(Array::at(0), regs::NV_PFSP_MSGQ_HEAD::zeroed().with_val(0));
 
-        Ok(buffer)
+        result
     }
 }

-- 
2.54.0
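
The control flow in this patch (run the fallible body in a closure, always reset the queue pointers, then return the closure's result) can be sketched outside the kernel as below. This is only an illustration under stated assumptions: `Queue`, `RecvError`, `MAX_MSG_SIZE` and `recv` are hypothetical stand-ins and not nova-core names, and a plain `Vec` replaces `KVec`.

```rust
/// Hypothetical stand-in for the 1024-byte channel limit named in the patch.
const MAX_MSG_SIZE: usize = 1024;

/// Hypothetical stand-in for the message queue head/tail registers.
#[derive(Debug)]
struct Queue {
    head: u32,
    tail: u32,
}

/// Hypothetical error type; the patch itself returns `EIO`.
#[derive(Debug, PartialEq)]
enum RecvError {
    TooLarge(usize),
}

/// Receives a message of the advertised size and resets the queue pointers
/// on every path, including the error path.
fn recv(queue: &mut Queue, advertised: usize) -> Result<Vec<u8>, RecvError> {
    // The fallible part lives in a closure so that `return` and `?` leave
    // only the closure, not the surrounding function.
    let result = (|| {
        // Reject sizes above the maximum before allocating anything.
        if advertised > MAX_MSG_SIZE {
            return Err(RecvError::TooLarge(advertised));
        }
        Ok(vec![0u8; advertised])
    })();

    // Cleanup runs regardless of the outcome above.
    queue.head = 0;
    queue.tail = 0;

    result
}
```

With this shape, calling `recv` with an advertised size of 2048 fails without allocating, and the pointers are still back at zero afterwards.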



* Re: [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size
  2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
@ 2026-06-15 17:11   ` Gary Guo
  2026-06-16  7:33   ` Alistair Popple
  1 sibling, 0 replies; 24+ messages in thread
From: Gary Guo @ 2026-06-15 17:11 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> Currently, the FSP receive message code will try to allocate whatever
> was sent without checking it at all. But the actual size allowed is
> limited to 1024 anyway, so discard any messages over that size as bogus.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/gpu/nova-core/falcon/fsp.rs | 36 ++++++++++++++++++++++++------------
>  1 file changed, 24 insertions(+), 12 deletions(-)



* Re: [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size
  2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
  2026-06-15 17:11   ` Gary Guo
@ 2026-06-16  7:33   ` Alistair Popple
  1 sibling, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2026-06-16  7:33 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, John Hubbard, Timur Tabi,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux

On 2026-06-16 at 00:40 +1000, Eliot Courtney <ecourtney@nvidia.com> wrote...
> Currently, the FSP receive message code will try to allocate whatever
> was sent without checking it at all. But the actual size allowed is
> limited to 1024 anyway, so discard any messages over that size as bogus.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

I've read through this and it seems reasonable to me, so:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> ---
>  drivers/gpu/nova-core/falcon/fsp.rs | 36 ++++++++++++++++++++++++------------
>  1 file changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index 52cdb84ef0e8..e7419a6e71e2 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> @@ -35,6 +35,9 @@
>  /// FSP message timeout in milliseconds.
>  const FSP_MSG_TIMEOUT_MS: i64 = 2000;
>  
> +/// Size of the FSP EMEM channel 0 that we can use.
> +const FSP_EMEM_CHANNEL_0_SIZE: usize = 1024;
> +
>  /// Type specifying the `Fsp` falcon engine. Cannot be instantiated.
>  pub(crate) struct Fsp(());
>  
> @@ -149,23 +152,32 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
>      /// Returns `ETIMEDOUT` if no message was available until timeout, or a regular error code if a
>      /// memory allocation error occurred.
>      pub(crate) fn recv_msg(&mut self, bar: Bar0<'_>) -> Result<KVec<u8>> {
> -        let msg_size = read_poll_timeout(
> -            || Ok(self.poll_msgq(bar)),
> -            |&size| size > 0,
> -            Delta::from_millis(10),
> -            Delta::from_millis(FSP_MSG_TIMEOUT_MS),
> -        )
> -        .map(num::u32_as_usize)?;
> +        let result = (|| {
> +            let msg_size = read_poll_timeout(
> +                || Ok(self.poll_msgq(bar)),
> +                |&size| size > 0,
> +                Delta::from_millis(10),
> +                Delta::from_millis(FSP_MSG_TIMEOUT_MS),
> +            )
> +            .map(num::u32_as_usize)?;
>  
> -        let mut buffer = KVec::<u8>::new();
> -        buffer.resize(msg_size, 0, GFP_KERNEL)?;
> +            // Don't blindly allocate more than the maximum we expect from FSP.
> +            if msg_size > FSP_EMEM_CHANNEL_0_SIZE {
> +                return Err(EIO);
> +            }
>  
> -        self.read_emem(bar, &mut buffer)?;
> +            let mut buffer = KVec::<u8>::new();
> +            buffer.resize(msg_size, 0, GFP_KERNEL)?;
>  
> -        // Reset message queue pointers after reading.
> +            self.read_emem(bar, &mut buffer)?;
> +
> +            Ok(buffer)
> +        })();
> +
> +        // Reset the message queue pointers regardless of outcome.
>          bar.write(Array::at(0), regs::NV_PFSP_MSGQ_TAIL::zeroed().with_val(0));
>          bar.write(Array::at(0), regs::NV_PFSP_MSGQ_HEAD::zeroed().with_val(0));
>  
> -        Ok(buffer)
> +        result
>      }
>  }
> 
> -- 
> 2.54.0
> 


* [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
  2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 17:15   ` Gary Guo
  2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, `poll_msgq` will report a message of size 4 if the queue
pointers are broken. It's easy to catch this if it occurs, so have
`poll_msgq` return an error in this case.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index e7419a6e71e2..21eaa8e261ce 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -107,19 +107,22 @@ fn read_emem(&mut self, bar: Bar0<'_>, data: &mut [u8]) -> Result {
     /// Poll FSP for incoming data.
     ///
     /// Returns the size of available data in bytes, or 0 if no data is available.
+    /// Returns an error if the queue pointers are bogus (`tail < head`).
     ///
     /// The FSP message queue is not circular. Pointers are reset to 0 after each
     /// message exchange, so `tail >= head` is always true when data is present.
-    fn poll_msgq(&self, bar: Bar0<'_>) -> u32 {
+    fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
         let head = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
         let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
 
         if head == tail {
-            return 0;
+            Ok(0)
+        } else {
+            // TAIL points at the last DWORD written, so the size is `tail - head + 4`.
+            tail.checked_sub(head)
+                .and_then(|delta| delta.checked_add(4))
+                .ok_or(EIO)
         }
-
-        // TAIL points at last DWORD written, so add 4 to get total size.
-        tail.saturating_sub(head).saturating_add(4)
     }
 
     /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
@@ -154,7 +157,7 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
     pub(crate) fn recv_msg(&mut self, bar: Bar0<'_>) -> Result<KVec<u8>> {
         let result = (|| {
             let msg_size = read_poll_timeout(
-                || Ok(self.poll_msgq(bar)),
+                || self.poll_msgq(bar),
                 |&size| size > 0,
                 Delta::from_millis(10),
                 Delta::from_millis(FSP_MSG_TIMEOUT_MS),

-- 
2.54.0
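
The difference between the old saturating arithmetic and the new checked arithmetic can be seen in isolation with plain `u32` values. The sketch below is a userspace illustration only: the function names and the string error are hypothetical, and the patch itself returns `EIO`.

```rust
/// Old behaviour: saturating arithmetic hides `tail < head`.
/// The subtraction clamps to 0, so the reported size becomes 4.
fn size_saturating(head: u32, tail: u32) -> u32 {
    if head == tail {
        return 0;
    }
    tail.saturating_sub(head).saturating_add(4)
}

/// New behaviour: checked arithmetic turns `tail < head` (and an overflow
/// of the `+ 4`) into an error instead of a plausible-looking size.
fn size_checked(head: u32, tail: u32) -> Result<u32, &'static str> {
    if head == tail {
        Ok(0)
    } else {
        // TAIL points at the last DWORD written, so the size is
        // `tail - head + 4`.
        tail.checked_sub(head)
            .and_then(|delta| delta.checked_add(4))
            .ok_or("bogus queue pointers")
    }
}
```

For `head = 8, tail = 4` the saturating version reports a 4-byte message, while the checked version reports an error; for sane pointers such as `head = 0, tail = 12` both report 16.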



* Re: [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues
  2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
@ 2026-06-15 17:15   ` Gary Guo
  2026-06-16  7:57     ` Alistair Popple
  0 siblings, 1 reply; 24+ messages in thread
From: Gary Guo @ 2026-06-15 17:15 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> Currently, `poll_msgq` will report a message of size 4 if the queue
> pointers are broken. It's easy to catch this if it occurs, so have
> `poll_msgq` return an error in this case.
>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/fsp.rs | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index e7419a6e71e2..21eaa8e261ce 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> @@ -107,19 +107,22 @@ fn read_emem(&mut self, bar: Bar0<'_>, data: &mut [u8]) -> Result {
>      /// Poll FSP for incoming data.
>      ///
>      /// Returns the size of available data in bytes, or 0 if no data is available.
> +    /// Returns an error if the queue pointers are bogus (`tail < head`).
>      ///
>      /// The FSP message queue is not circular. Pointers are reset to 0 after each
>      /// message exchange, so `tail >= head` is always true when data is present.
> -    fn poll_msgq(&self, bar: Bar0<'_>) -> u32 {
> +    fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
>          let head = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
>          let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
>  
>          if head == tail {
> -            return 0;
> +            Ok(0)
> +        } else {
> +            // TAIL points at the last DWORD written, so the size is `tail - head + 4`.
> +            tail.checked_sub(head)
> +                .and_then(|delta| delta.checked_add(4))
> +                .ok_or(EIO)

Whenever we fail with this, we should print a message (actually, the same thing
probably should be done for patch 1 as well).

A plain EIO is going to be very difficult to troubleshoot if this is ever hit.

Best,
Gary

>          }
> -
> -        // TAIL points at last DWORD written, so add 4 to get total size.
> -        tail.saturating_sub(head).saturating_add(4)
>      }
>  
>      /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
> @@ -154,7 +157,7 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
>      pub(crate) fn recv_msg(&mut self, bar: Bar0<'_>) -> Result<KVec<u8>> {
>          let result = (|| {
>              let msg_size = read_poll_timeout(
> -                || Ok(self.poll_msgq(bar)),
> +                || self.poll_msgq(bar),
>                  |&size| size > 0,
>                  Delta::from_millis(10),
>                  Delta::from_millis(FSP_MSG_TIMEOUT_MS),




* Re: [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues
  2026-06-15 17:15   ` Gary Guo
@ 2026-06-16  7:57     ` Alistair Popple
  2026-06-16 10:57       ` Gary Guo
  0 siblings, 1 reply; 24+ messages in thread
From: Alistair Popple @ 2026-06-16  7:57 UTC (permalink / raw)
  To: Gary Guo
  Cc: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, John Hubbard,
	Timur Tabi, nova-gpu, dri-devel, linux-kernel, rust-for-linux

On 2026-06-16 at 03:15 +1000, Gary Guo <gary@garyguo.net> wrote...
> On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> > Currently, `poll_msgq` will report a message of size 4 if the queue
> > pointers are broken. It's easy to catch this if it occurs, so have
> > `poll_msgq` return an error in this case.
> >
> > Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> > ---
> >  drivers/gpu/nova-core/falcon/fsp.rs | 15 +++++++++------
> >  1 file changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> > index e7419a6e71e2..21eaa8e261ce 100644
> > --- a/drivers/gpu/nova-core/falcon/fsp.rs
> > +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> > @@ -107,19 +107,22 @@ fn read_emem(&mut self, bar: Bar0<'_>, data: &mut [u8]) -> Result {
> >      /// Poll FSP for incoming data.
> >      ///
> >      /// Returns the size of available data in bytes, or 0 if no data is available.
> > +    /// Returns an error if the queue pointers are bogus (`tail < head`).
> >      ///
> >      /// The FSP message queue is not circular. Pointers are reset to 0 after each
> >      /// message exchange, so `tail >= head` is always true when data is present.
> > -    fn poll_msgq(&self, bar: Bar0<'_>) -> u32 {
> > +    fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
> >          let head = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
> >          let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
> >  
> >          if head == tail {
> > -            return 0;
> > +            Ok(0)
> > +        } else {
> > +            // TAIL points at the last DWORD written, so the size is `tail - head + 4`.
> > +            tail.checked_sub(head)
> > +                .and_then(|delta| delta.checked_add(4))
> > +                .ok_or(EIO)
> 
> Whenever we fail with this, we should print a message (actually, the same thing
> probably should be done for patch 1 as well).
> 
> A plain EIO is going to be very difficult to troubleshoot if this is ever hit.

I don't disagree with the sentiment - this is a problem throughout the kernel
and I have spent way too long tracing where exactly error codes have come from
both in C and Rust.

But it seems odd to worry about these particular instances - they _should_
never happen, or at least be extremely rare and very unlikely to be hit by an
end user.
More to the point though there are many other places in nova-core (and I'm
sure other drivers) where this pattern of just returning a fairly generic
error code exists. So it feels like it would be nicer to deal with this at some
other layer, eg. some kind of debug option to tag error codes with location
or something.

So I'm not opposed to the comment, but maybe it would be better addressed as a
separate question/patch series to figure out how to do this error reporting in a
more generic or consistent way across all of Nova at least?

 - Alistair

> Best,
> Gary
> 
> >          }
> > -
> > -        // TAIL points at last DWORD written, so add 4 to get total size.
> > -        tail.saturating_sub(head).saturating_add(4)
> >      }
> >  
> >      /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
> > @@ -154,7 +157,7 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
> >      pub(crate) fn recv_msg(&mut self, bar: Bar0<'_>) -> Result<KVec<u8>> {
> >          let result = (|| {
> >              let msg_size = read_poll_timeout(
> > -                || Ok(self.poll_msgq(bar)),
> > +                || self.poll_msgq(bar),
> >                  |&size| size > 0,
> >                  Delta::from_millis(10),
> >                  Delta::from_millis(FSP_MSG_TIMEOUT_MS),
> 
> 


* Re: [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues
  2026-06-16  7:57     ` Alistair Popple
@ 2026-06-16 10:57       ` Gary Guo
  0 siblings, 0 replies; 24+ messages in thread
From: Gary Guo @ 2026-06-16 10:57 UTC (permalink / raw)
  To: Alistair Popple, Gary Guo
  Cc: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, John Hubbard,
	Timur Tabi, nova-gpu, dri-devel, linux-kernel, rust-for-linux

On Tue Jun 16, 2026 at 8:57 AM BST, Alistair Popple wrote:
> On 2026-06-16 at 03:15 +1000, Gary Guo <gary@garyguo.net> wrote...
>> On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
>> > Currently, `poll_msgq` will report a message of size 4 if the queue
>> > pointers are broken. It's easy to catch this if it occurs, so have
>> > `poll_msgq` return an error in this case.
>> >
>> > Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>> > ---
>> >  drivers/gpu/nova-core/falcon/fsp.rs | 15 +++++++++------
>> >  1 file changed, 9 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> > index e7419a6e71e2..21eaa8e261ce 100644
>> > --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> > +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>> > @@ -107,19 +107,22 @@ fn read_emem(&mut self, bar: Bar0<'_>, data: &mut [u8]) -> Result {
>> >      /// Poll FSP for incoming data.
>> >      ///
>> >      /// Returns the size of available data in bytes, or 0 if no data is available.
>> > +    /// Returns an error if the queue pointers are bogus (`tail < head`).
>> >      ///
>> >      /// The FSP message queue is not circular. Pointers are reset to 0 after each
>> >      /// message exchange, so `tail >= head` is always true when data is present.
>> > -    fn poll_msgq(&self, bar: Bar0<'_>) -> u32 {
>> > +    fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
>> >          let head = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
>> >          let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
>> >  
>> >          if head == tail {
>> > -            return 0;
>> > +            Ok(0)
>> > +        } else {
>> > +            // TAIL points at the last DWORD written, so the size is `tail - head + 4`.
>> > +            tail.checked_sub(head)
>> > +                .and_then(|delta| delta.checked_add(4))
>> > +                .ok_or(EIO)
>> 
>> Whenever we fail with this, we should print a message (actually, the same thing
>> probably should be done for patch 1 as well).
>> 
>> A plain EIO is going to be very difficult to troubleshoot if this is ever hit.
>
> I don't disagree with the sentiment - this is a problem throughout the kernel
> and I have spent way too long tracing where exactly error codes have come from
> both in C and Rust.
>
> But it seems odd to worry about these particular instances - they _should_
> never happen, or at least be extremely rare and very unlikely to be hit by an
> end user.

I think we should either consider it possible and add prints, or consider it a
bug (hardware or driver) and add a `WARN_ON`, or consider it impossible and not
add a failure path at all.

> More to the point though there are many other places in nova-core (and I'm
> sure other drivers) where this pattern of just returning a fairly generic
> error code exists. So it feels like it would be nicer to deal with this at some
> other layer, eg. some kind of debug option to tag error codes with location
> or something.

This should happen whenever the information is lost. Creating a generic error
code would be such an occasion. Handling it at upper layers is okay if the
information is kept for longer. For example:

    enum NovaError {
        ...
    }

    impl From<NovaError> for Error {
        fn from(err: NovaError) -> Self {
            // Print here
            EIO
        }
    }

This would be fine to me because the information is emitted to the user when it
is lost by the concrete error -> error code conversion.

Best,
Gary

>
> So I'm not opposed to the comment, but maybe it would be better addressed as a
> separate question/patch series to figure out how to do this error reporting in a
> more generic or consistent way across all of Nova at least?
>
>  - Alistair
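
The `NovaError` idea above can be fleshed out as a small userspace sketch. Everything here is an assumption made for illustration: the variants, the `Errno` type, the `poll` and `recv_size` helpers, and the use of `eprintln!` in place of a kernel print are all hypothetical and not part of nova-core or the kernel crate.

```rust
use std::fmt;

/// Hypothetical driver-specific error that keeps the detail a bare error
/// code would lose.
#[derive(Debug)]
enum NovaError {
    MsgTooLarge { size: u32, max: u32 },
    BogusQueuePointers { head: u32, tail: u32 },
}

impl fmt::Display for NovaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NovaError::MsgTooLarge { size, max } => {
                write!(f, "message size {size} exceeds maximum {max}")
            }
            NovaError::BogusQueuePointers { head, tail } => {
                write!(f, "bogus queue pointers: head={head:#x} tail={tail:#x}")
            }
        }
    }
}

/// Stand-in for a generic error code type.
#[derive(Debug, PartialEq)]
struct Errno(i32);

const EIO: Errno = Errno(5);

impl From<NovaError> for Errno {
    fn from(err: NovaError) -> Self {
        // Report the detail at the point where it is about to be dropped.
        eprintln!("nova: {err}");
        EIO
    }
}

/// Lower layer: returns the detailed error.
fn poll(head: u32, tail: u32, max: u32) -> Result<u32, NovaError> {
    if head == tail {
        return Ok(0);
    }
    let size = tail
        .checked_sub(head)
        .and_then(|delta| delta.checked_add(4))
        .ok_or(NovaError::BogusQueuePointers { head, tail })?;
    if size > max {
        return Err(NovaError::MsgTooLarge { size, max });
    }
    Ok(size)
}

/// Upper layer: `?` converts through `From`, which logs once and yields EIO.
fn recv_size(head: u32, tail: u32, max: u32) -> Result<u32, Errno> {
    Ok(poll(head, tail, max)?)
}
```

Callers of `recv_size` still see only a generic code, but the message naming the concrete failure is emitted exactly once, at the conversion.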


* [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
  2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
  2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 17:16   ` Gary Guo
  2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, `send_msg` assumes that the channel to FSP is free to write
into. But, it might not be. Both the kernel driver and GSP communicate
with FSP. The way they should attempt to keep exclusive access to this
channel to FSP is by making sure they don't try to start writing if
there's pending data until the full round trip has finished.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/falcon/fsp.rs | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
index 21eaa8e261ce..cdb476894e1a 100644
--- a/drivers/gpu/nova-core/falcon/fsp.rs
+++ b/drivers/gpu/nova-core/falcon/fsp.rs
@@ -125,6 +125,26 @@ fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
         }
     }
 
+    /// Both the kernel driver and GSP talk to FSP. Try to ensure exclusive access to the FSP is
+    /// enforced by making sure there is not a pending message already sent to FSP, and that there
+    /// is no pending message from FSP to be read.
+    fn wait_until_ready(&mut self, bar: Bar0<'_>) -> Result {
+        read_poll_timeout(
+            || {
+                let qhead = bar.read(regs::NV_PFSP_QUEUE_HEAD::at(0)).address();
+                let qtail = bar.read(regs::NV_PFSP_QUEUE_TAIL::at(0)).address();
+                let mhead = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
+                let mtail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
+
+                Ok(qhead == qtail && mhead == mtail)
+            },
+            |&ready| ready,
+            Delta::from_millis(10),
+            Delta::from_millis(FSP_MSG_TIMEOUT_MS),
+        )?;
+        Ok(())
+    }
+
     /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
     ///
     /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
@@ -133,6 +153,9 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
             return Err(EINVAL);
         }
 
+        // Try to make sure we have exclusive access to the FSP at this point.
+        self.wait_until_ready(bar)?;
+
         self.write_emem(bar, packet)?;
 
         // Update queue pointers. TAIL points at the last DWORD written.

-- 
2.54.0
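
The readiness check in this patch amounts to polling until both queues are empty or a timeout expires. A userspace sketch of that idea follows. It is not the kernel's `read_poll_timeout`: the `Pointers` struct, `wait_until_idle` and the string error are hypothetical, and `std::thread::sleep` stands in for the kernel's delay. Note that the check and any later write are separate steps, so a scheme like this presumably narrows the window for a conflicting writer rather than closing it.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical snapshot of the four queue pointer registers.
#[derive(Clone, Copy)]
struct Pointers {
    qhead: u32,
    qtail: u32,
    mhead: u32,
    mtail: u32,
}

impl Pointers {
    /// The channel counts as idle when neither direction has pending data.
    fn idle(&self) -> bool {
        self.qhead == self.qtail && self.mhead == self.mtail
    }
}

/// Calls `read` every `interval` until it reports an idle channel, or fails
/// once `timeout` has elapsed.
fn wait_until_idle(
    mut read: impl FnMut() -> Pointers,
    interval: Duration,
    timeout: Duration,
) -> Result<(), &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        if read().idle() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err("timed out waiting for idle channel");
        }
        sleep(interval);
    }
}
```

A caller would pass a closure that reads the registers; in a test the closure can simply return canned values that become idle after a few reads.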



* Re: [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
  2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
@ 2026-06-15 17:16   ` Gary Guo
  0 siblings, 0 replies; 24+ messages in thread
From: Gary Guo @ 2026-06-15 17:16 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> Currently, `send_msg` assumes that the channel to FSP is free to write
> into. But, it might not be. Both the kernel driver and GSP communicate
> with FSP. The way they should attempt to keep exclusive access to this
> channel to FSP is by making sure they don't try to start writing if
> there's pending data until the full round trip has finished.
>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/fsp.rs | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index 21eaa8e261ce..cdb476894e1a 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> @@ -125,6 +125,26 @@ fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
>          }
>      }
>  
> +    /// Both the kernel driver and GSP talk to FSP. Try to ensure exclusive access to the FSP is
> +    /// enforced by making sure there is not a pending message already sent to FSP, and that there
> +    /// is no pending message from FSP to be read.
> +    fn wait_until_ready(&mut self, bar: Bar0<'_>) -> Result {
> +        read_poll_timeout(
> +            || {
> +                let qhead = bar.read(regs::NV_PFSP_QUEUE_HEAD::at(0)).address();
> +                let qtail = bar.read(regs::NV_PFSP_QUEUE_TAIL::at(0)).address();
> +                let mhead = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
> +                let mtail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();

How does this prevent a race between the kernel and GSP when initiating FSP
communication?

Best,
Gary

> +
> +                Ok(qhead == qtail && mhead == mtail)
> +            },
> +            |&ready| ready,
> +            Delta::from_millis(10),
> +            Delta::from_millis(FSP_MSG_TIMEOUT_MS),
> +        )?;
> +        Ok(())
> +    }
> +
>      /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>      ///
>      /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
> @@ -133,6 +153,9 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
>              return Err(EINVAL);
>          }
>  
> +        // Try to make sure we have exclusive access to the FSP at this point.
> +        self.wait_until_ready(bar)?;
> +
>          self.write_emem(bar, packet)?;
>  
>          // Update queue pointers. TAIL points at the last DWORD written.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (2 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 17:17   ` Gary Guo
  2026-06-16  8:02   ` Alistair Popple
  2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Small cleanup to move these constants which are only used once closer to
their use location.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/falcon/gsp.rs | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
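
For readers following along, the check itself can be sketched outside the
driver. This is a minimal standalone version that takes the raw register value
as a plain `u32`; the free function `target_mask_released` is a hypothetical
name used only for illustration, and the register read is assumed to have
happened elsewhere.

```rust
/// Upper 24 bits returned by GSP register reads while the PRIV target mask
/// still blocks CPU access (values taken from the patch).
const LOCKED_PATTERN: u32 = 0xbadf_4100;
/// Masks off the low byte, which varies between reads.
const LOCKED_MASK: u32 = 0xffff_ff00;

/// Hypothetical standalone form of the check: the register counts as readable
/// when the value is non-zero and does not match the locked pattern.
fn target_mask_released(hwcfg2: u32) -> bool {
    hwcfg2 != 0 && (hwcfg2 & LOCKED_MASK) != LOCKED_PATTERN
}
```

Any low byte combined with the `0xbadf_41` prefix is treated as locked, and an
all-zero read is treated as not yet released.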

diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
index d1f6f7fcffff..f788b87bd951 100644
--- a/drivers/gpu/nova-core/falcon/gsp.rs
+++ b/drivers/gpu/nova-core/falcon/gsp.rs
@@ -24,10 +24,6 @@
     regs,
 };
 
-/// Pattern returned by GSP register reads while the PRIV target mask still blocks CPU access.
-const GSP_TARGET_MASK_LOCKED_PATTERN: u32 = 0xbadf_4100;
-const GSP_TARGET_MASK_LOCKED_MASK: u32 = 0xffff_ff00;
-
 /// Type specifying the `Gsp` falcon engine. Cannot be instantiated.
 pub(crate) struct Gsp(());
 
@@ -70,10 +66,15 @@ pub(crate) fn riscv_branch_privilege_lockdown(&self, bar: Bar0<'_>) -> bool {
 
     /// Returns whether GSP registers can be read by the CPU.
     pub(crate) fn priv_target_mask_released(&self, bar: Bar0<'_>) -> bool {
+        /// Pattern returned by GSP register reads while the PRIV target mask still blocks CPU
+        /// access. The low byte varies; the upper 24 bits are fixed.
+        const LOCKED_PATTERN: u32 = 0xbadf_4100;
+        const LOCKED_MASK: u32 = 0xffff_ff00;
+
         let hwcfg2 = bar
             .read(regs::NV_PFALCON_FALCON_HWCFG2::of::<Gsp>())
             .into_raw();
 
-        hwcfg2 != 0 && (hwcfg2 & GSP_TARGET_MASK_LOCKED_MASK) != GSP_TARGET_MASK_LOCKED_PATTERN
+        hwcfg2 != 0 && (hwcfg2 & LOCKED_MASK) != LOCKED_PATTERN
     }
 }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants
  2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
@ 2026-06-15 17:17   ` Gary Guo
  2026-06-16  8:02   ` Alistair Popple
  1 sibling, 0 replies; 24+ messages in thread
From: Gary Guo @ 2026-06-15 17:17 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> Small cleanup to move these constants which are only used once closer to
> their use location.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/gpu/nova-core/falcon/gsp.rs | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants
  2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
  2026-06-15 17:17   ` Gary Guo
@ 2026-06-16  8:02   ` Alistair Popple
  1 sibling, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2026-06-16  8:02 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, John Hubbard, Timur Tabi,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux

On 2026-06-16 at 00:40 +1000, Eliot Courtney <ecourtney@nvidia.com> wrote...
> Small cleanup to move these constants which are only used once closer to
> their use location.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/gsp.rs | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
> index d1f6f7fcffff..f788b87bd951 100644
> --- a/drivers/gpu/nova-core/falcon/gsp.rs
> +++ b/drivers/gpu/nova-core/falcon/gsp.rs
> @@ -24,10 +24,6 @@
>      regs,
>  };
>  
> -/// Pattern returned by GSP register reads while the PRIV target mask still blocks CPU access.
> -const GSP_TARGET_MASK_LOCKED_PATTERN: u32 = 0xbadf_4100;
> -const GSP_TARGET_MASK_LOCKED_MASK: u32 = 0xffff_ff00;
> -
>  /// Type specifying the `Gsp` falcon engine. Cannot be instantiated.
>  pub(crate) struct Gsp(());
>  
> @@ -70,10 +66,15 @@ pub(crate) fn riscv_branch_privilege_lockdown(&self, bar: Bar0<'_>) -> bool {
>  
>      /// Returns whether GSP registers can be read by the CPU.
>      pub(crate) fn priv_target_mask_released(&self, bar: Bar0<'_>) -> bool {
> +        /// Pattern returned by GSP register reads while the PRIV target mask still blocks CPU
> +        /// access. The low byte varies; the upper 24 bits are fixed.
> +        const LOCKED_PATTERN: u32 = 0xbadf_4100;
> +        const LOCKED_MASK: u32 = 0xffff_ff00;

Confirmed this error code doesn't appear to be used more generically across our
driver stack so agree it makes sense to localise it here.

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> +
>          let hwcfg2 = bar
>              .read(regs::NV_PFALCON_FALCON_HWCFG2::of::<Gsp>())
>              .into_raw();
>  
> -        hwcfg2 != 0 && (hwcfg2 & GSP_TARGET_MASK_LOCKED_MASK) != GSP_TARGET_MASK_LOCKED_PATTERN
> +        hwcfg2 != 0 && (hwcfg2 & LOCKED_MASK) != LOCKED_PATTERN
>      }
>  }
> 
> -- 
> 2.54.0
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (3 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 17:23   ` Gary Guo
  2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, if, for example `boot_fmc` fails, `FmcBootArgs` will be
dropped before the boot unload guard. But until everything is unloaded,
GSP may access this memory, so make sure it doesn't get deallocated.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
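
The reordering relies on Rust dropping local variables in reverse declaration
order. A minimal sketch of that rule, using a hypothetical `Tracked` type and
`drop_order` helper (neither is part of nova-core) whose values are only named
after the locals in `boot`:

```rust
use std::cell::RefCell;

/// Hypothetical helper: records its name in a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Declares "args" before "unload_guard" and returns the order in which the
/// two values are dropped when the scope ends.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _args = Tracked { name: "args", log: &log };
        let _guard = Tracked { name: "unload_guard", log: &log };
        // Leaving this scope (including via an early `?` return) drops
        // `_guard` first and `_args` second.
    }
    log.into_inner()
}
```

With `args` declared first, the guard runs its unload path while the boot
parameter allocation is still alive.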

diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 98f5ce197d13..b08761af89d3 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -162,16 +162,6 @@ fn boot<'a>(
     ) -> Result<BootUnloadGuard<'a>> {
         let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
-        let unload_bundle = crate::gsp::UnloadBundle(
-            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
-        );
-
-        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
-        let unload_guard =
-            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
-
-        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
-
         let args = FmcBootArgs::new(
             dev,
             chipset,
@@ -180,6 +170,16 @@ fn boot<'a>(
             false,
         )?;
 
+        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
+
+        let unload_bundle = crate::gsp::UnloadBundle(
+            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
+        );
+
+        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
+        let unload_guard =
+            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
+
         fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
         wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error
  2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
@ 2026-06-15 17:23   ` Gary Guo
  0 siblings, 0 replies; 24+ messages in thread
From: Gary Guo @ 2026-06-15 17:23 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> Currently, if, for example `boot_fmc` fails, `FmcBootArgs` will be
> dropped before the boot unload guard. But until everything is unloaded,
> GSP may access this memory, so make sure it doesn't get deallocated.

Hmm, this looks very weird. `boot_fmc` only takes `&args`, but it actually needs
it for much longer?

This hints to me that the signature of the `boot_fmc` function is
wrong..

What is the exact lifetime requirement for GSP?

>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/hal/gh100.rs | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index 98f5ce197d13..b08761af89d3 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> @@ -162,16 +162,6 @@ fn boot<'a>(
>      ) -> Result<BootUnloadGuard<'a>> {
>          let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
>  
> -        let unload_bundle = crate::gsp::UnloadBundle(
> -            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
> -        );
> -
> -        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
> -        let unload_guard =
> -            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
> -
> -        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> -
>          let args = FmcBootArgs::new(
>              dev,
>              chipset,
> @@ -180,6 +170,16 @@ fn boot<'a>(
>              false,
>          )?;
>  
> +        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> +
> +        let unload_bundle = crate::gsp::UnloadBundle(
> +            KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>
> +        );
> +
> +        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
> +        let unload_guard =
> +            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
> +

This unusual "usage beyond reference lifetime" needs to be at least explicitly
mentioned here.

Best,
Gary

>          fsp.boot_fmc(dev, bar, fb_layout, &args)?;
>  
>          wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (4 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 17:24   ` Gary Guo
  2026-06-15 14:40 ` [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations Eliot Courtney
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

`FspFirmware` is constructed and immediately passed into `Fsp`. It makes
sense for `Fsp` to ask to load its firmware, so move it there.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/fsp.rs           | 11 +++++++----
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  8 +-------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index d949c03dd304..4b97d1fb505e 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -31,9 +31,12 @@
         Falcon, //
     },
     fb::FbLayout,
-    firmware::fsp::{
-        FmcSignatures,
-        FspFirmware, //
+    firmware::{
+        fsp::{
+            FmcSignatures,
+            FspFirmware, //
+        },
+        FIRMWARE_VERSION, //
     },
     gpu::Chipset,
     gsp::GspFmcBootParams,
@@ -236,13 +239,13 @@ pub(crate) fn wait_secure_boot(
         dev: &device::Device<device::Bound>,
         bar: Bar0<'_>,
         chipset: Chipset,
-        fsp_fw: FspFirmware,
     ) -> Result<Fsp> {
         /// FSP secure boot completion timeout in milliseconds.
         const FSP_SECURE_BOOT_TIMEOUT_MS: i64 = 5000;
 
         let hal = hal::fsp_hal(chipset).ok_or(ENOTSUPP)?;
         let falcon = Falcon::<FspEngine>::new(dev, chipset)?;
+        let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
         read_poll_timeout(
             || Ok(hal.fsp_boot_status(bar)),
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index b08761af89d3..3e499563c9bc 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -18,10 +18,6 @@
         Falcon, //
     },
     fb::FbLayout,
-    firmware::{
-        fsp::FspFirmware,
-        FIRMWARE_VERSION, //
-    },
     fsp::{
         FmcBootArgs,
         Fsp, //
@@ -160,8 +156,6 @@ fn boot<'a>(
         gsp_falcon: &'a Falcon<GspEngine>,
         sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
-        let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
-
         let args = FmcBootArgs::new(
             dev,
             chipset,
@@ -170,7 +164,7 @@ fn boot<'a>(
             false,
         )?;
 
-        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
+        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
 
         let unload_bundle = crate::gsp::UnloadBundle(
             KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot
  2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
@ 2026-06-15 17:24   ` Gary Guo
  0 siblings, 0 replies; 24+ messages in thread
From: Gary Guo @ 2026-06-15 17:24 UTC (permalink / raw)
  To: Eliot Courtney, Danilo Krummrich, Alexandre Courbot, Alice Ryhl,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> `FspFirmware` is constructed and immediately passed into `Fsp`. It makes
> sense for `Fsp` to ask to load its firmware, so move it there.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/gpu/nova-core/fsp.rs           | 11 +++++++----
>  drivers/gpu/nova-core/gsp/hal/gh100.rs |  8 +-------
>  2 files changed, 8 insertions(+), 11 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (5 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 14:40 ` [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough Eliot Courtney
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, `FmcBootArgs` takes DMA handles directly, rather than
references to the `Coherent` for them. This is error prone, so instead
store lifetime'd references to the `Coherent` allocation.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/fsp.rs           | 32 ++++++++++++++++++++------------
 drivers/gpu/nova-core/gsp.rs           |  8 +++-----
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 19 +++++++------------
 3 files changed, 30 insertions(+), 29 deletions(-)
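
As a rough illustration of why a borrowed allocation is less error prone than
a raw handle, here is a minimal sketch. `Buffer` is a hypothetical stand-in for
a DMA-coherent allocation and `Args` for `FmcBootArgs`; neither reflects the
real kernel types.

```rust
/// Hypothetical stand-in for a DMA-coherent allocation.
struct Buffer {
    handle: u64,
}

impl Buffer {
    fn dma_handle(&self) -> u64 {
        self.handle
    }
}

/// Holds a borrow rather than a copied handle, so the compiler rejects any
/// code that drops the buffer while `Args` is still in use.
struct Args<'a> {
    buf: &'a Buffer,
}

impl<'a> Args<'a> {
    fn new(buf: &'a Buffer) -> Self {
        Self { buf }
    }

    fn handle(&self) -> u64 {
        self.buf.dma_handle()
    }
}

fn demo() -> u64 {
    let buf = Buffer { handle: 0x1000 };
    let args = Args::new(&buf);
    // drop(buf); // would not compile: `buf` is still borrowed by `args`
    args.handle()
}
```

Note that this only ties the allocation to the lifetime of the Rust value; it
says nothing about how long the device itself keeps using the address.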

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 4b97d1fb505e..3f3211eae4d0 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -39,7 +39,11 @@
         FIRMWARE_VERSION, //
     },
     gpu::Chipset,
-    gsp::GspFmcBootParams,
+    gsp::{
+        GspFmcBootParams,
+        GspFwWprMeta,
+        LibosMemoryRegionInitArgument, //
+    },
     mctp::{
         MctpHeader,
         NvdmHeader,
@@ -133,7 +137,7 @@ impl FspCotMessage {
     fn new<'a>(
         fb_layout: &FbLayout,
         fsp_fw: &'a FspFirmware,
-        args: &'a FmcBootArgs,
+        args: &'a FmcBootArgs<'_>,
     ) -> Result<impl Init<Self> + 'a> {
         // frts_vidmem_offset is measured from the end of FB, so FRTS sits at
         // (end of FB) - frts_vidmem_offset.
@@ -187,35 +191,39 @@ impl MessageToFsp for FspCotMessage {
 }
 
 /// Bundled arguments for FMC boot via FSP Chain of Trust.
-pub(crate) struct FmcBootArgs {
+pub(crate) struct FmcBootArgs<'a> {
     chipset: Chipset,
     fmc_boot_params: Coherent<GspFmcBootParams>,
     resume: bool,
+    // Additional dependencies required to be kept alive for FMC boot.
+    _wpr_meta: &'a Coherent<GspFwWprMeta>,
+    _libos: &'a Coherent<[LibosMemoryRegionInitArgument]>,
 }
 
-impl FmcBootArgs {
+impl<'a> FmcBootArgs<'a> {
     /// Builds FMC boot arguments, allocating the DMA-coherent boot parameter
     /// structure that FSP will read.
     pub(crate) fn new(
         dev: &device::Device<device::Bound>,
         chipset: Chipset,
-        wpr_meta_addr: u64,
-        libos_addr: u64,
+        wpr_meta: &'a Coherent<GspFwWprMeta>,
+        libos: &'a Coherent<[LibosMemoryRegionInitArgument]>,
         resume: bool,
     ) -> Result<Self> {
-        let init = GspFmcBootParams::new(wpr_meta_addr, libos_addr);
+        let init = GspFmcBootParams::new(wpr_meta.dma_handle(), libos.dma_handle());
 
         Ok(Self {
             chipset,
             fmc_boot_params: Coherent::<GspFmcBootParams>::init(dev, GFP_KERNEL, init)?,
             resume,
+            _wpr_meta: wpr_meta,
+            _libos: libos,
         })
     }
 
-    /// DMA address of the FMC boot parameters, needed after boot for lockdown
-    /// release polling.
-    pub(crate) fn boot_params_dma_handle(&self) -> u64 {
-        self.fmc_boot_params.dma_handle()
+    /// Returns the FMC boot parameters allocation.
+    pub(crate) fn boot_params(&self) -> &Coherent<GspFmcBootParams> {
+        &self.fmc_boot_params
     }
 }
 
@@ -332,7 +340,7 @@ pub(crate) fn boot_fmc(
         dev: &device::Device<device::Bound>,
         bar: Bar0<'_>,
         fb_layout: &FbLayout,
-        args: &FmcBootArgs,
+        args: &FmcBootArgs<'_>,
     ) -> Result {
         dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
 
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 385b4c09582b..b93c1fe8461e 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -28,16 +28,14 @@
 pub(crate) use fw::{
     GspFmcBootParams,
     GspFwWprMeta,
+    LibosMemoryRegionInitArgument,
     LibosParams, //
 };
 
 use crate::{
     gsp::cmdq::Cmdq,
-    gsp::fw::{
-        GspArgumentsPadded,
-        LibosMemoryRegionInitArgument, //
-    },
-    num,
+    gsp::fw::GspArgumentsPadded,
+    num, //
 };
 
 pub(crate) const GSP_PAGE_SHIFT: usize = 12;
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 3e499563c9bc..31498ae7abd2 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -30,6 +30,7 @@
             UnloadBundle, //
         },
         Gsp,
+        GspFmcBootParams,
         GspFwWprMeta, //
     },
 };
@@ -62,13 +63,13 @@ fn lockdown_released_or_error(
         &self,
         gsp_falcon: &Falcon<GspEngine>,
         bar: Bar0<'_>,
-        fmc_boot_params_addr: u64,
+        fmc_boot_params: &Coherent<GspFmcBootParams>,
     ) -> bool {
         // GSP-FMC normally clears the boot parameters address from the mailboxes early during
         // boot. If the address is still there, keep polling rather than treating it as an error.
         // Any other non-zero mailbox0 value is a GSP-FMC error code.
         if self.mbox0 != 0 {
-            return self.combined_addr() != fmc_boot_params_addr;
+            return self.combined_addr() != fmc_boot_params.dma_handle();
         }
 
         !gsp_falcon.riscv_branch_privilege_lockdown(bar)
@@ -80,7 +81,7 @@ fn wait_for_gsp_lockdown_release(
     dev: &device::Device<device::Bound>,
     bar: Bar0<'_>,
     gsp_falcon: &Falcon<GspEngine>,
-    fmc_boot_params_addr: u64,
+    fmc_boot_params: &Coherent<GspFmcBootParams>,
 ) -> Result {
     dev_dbg!(dev, "Waiting for GSP lockdown release\n");
 
@@ -95,7 +96,7 @@ fn wait_for_gsp_lockdown_release(
         },
         |mbox| match mbox {
             None => false,
-            Some(mbox) => mbox.lockdown_released_or_error(gsp_falcon, bar, fmc_boot_params_addr),
+            Some(mbox) => mbox.lockdown_released_or_error(gsp_falcon, bar, fmc_boot_params),
         },
         Delta::from_millis(10),
         Delta::from_secs(30),
@@ -156,13 +157,7 @@ fn boot<'a>(
         gsp_falcon: &'a Falcon<GspEngine>,
         sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
-        let args = FmcBootArgs::new(
-            dev,
-            chipset,
-            wpr_meta.dma_handle(),
-            gsp.libos.dma_handle(),
-            false,
-        )?;
+        let args = FmcBootArgs::new(dev, chipset, wpr_meta, &gsp.libos, false)?;
 
         let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
 
@@ -176,7 +171,7 @@ fn boot<'a>(
 
         fsp.boot_fmc(dev, bar, fb_layout, &args)?;
 
-        wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params_dma_handle())?;
+        wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params())?;
 
         Ok(unload_guard)
     }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (6 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 14:40 ` [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier Eliot Courtney
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, `GspSequencer` stores a raw DMA handle. Instead, store a
reference to `Coherent` to statically ensure that the allocation lives
long enough.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/hal/tu102.rs |  2 +-
 drivers/gpu/nova-core/gsp/sequencer.rs | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 8 deletions(-)
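
For reference, the mailbox write in the sequencer splits the 64-bit DMA handle
into two 32-bit halves. A minimal sketch of that split, with a hypothetical
helper name `split_dma_handle` that does not exist in the driver:

```rust
/// Hypothetical helper: splits a 64-bit DMA handle into the (low, high)
/// 32-bit halves that would be written to mailbox 0 and mailbox 1.
fn split_dma_handle(handle: u64) -> (u32, u32) {
    // `as u32` truncates, keeping the low 32 bits.
    let low = handle as u32;
    // Shift the upper half down before truncating.
    let high = (handle >> 32) as u32;
    (low, high)
}
```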

diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index eb7166148cc9..1e08c482fd39 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -344,7 +344,7 @@ fn post_boot(
         // Create and run the GSP sequencer.
         let seq_params = GspSequencerParams {
             bootloader_app_version: gsp_fw.bootloader.app_version,
-            libos_dma_handle: gsp.libos.dma_handle(),
+            libos: &gsp.libos,
             gsp_falcon,
             sec2_falcon,
             dev,
diff --git a/drivers/gpu/nova-core/gsp/sequencer.rs b/drivers/gpu/nova-core/gsp/sequencer.rs
index e0850d21adca..ddb6abc45a37 100644
--- a/drivers/gpu/nova-core/gsp/sequencer.rs
+++ b/drivers/gpu/nova-core/gsp/sequencer.rs
@@ -6,6 +6,7 @@
 
 use kernel::{
     device,
+    dma::Coherent,
     io::{
         poll::read_poll_timeout,
         Io, //
@@ -31,6 +32,7 @@
             MessageFromGsp, //
         },
         fw,
+        LibosMemoryRegionInitArgument, //
     },
     num::FromSafeCast,
     sbuffer::SBufferIter,
@@ -136,8 +138,8 @@ pub(crate) struct GspSequencer<'a> {
     sec2_falcon: &'a Falcon<Sec2>,
     /// GSP falcon for core operations.
     gsp_falcon: &'a Falcon<Gsp>,
-    /// LibOS DMA handle address.
-    libos_dma_handle: u64,
+    /// LibOS memory region init arguments.
+    libos: &'a Coherent<[LibosMemoryRegionInitArgument]>,
     /// Bootloader application version.
     bootloader_app_version: u32,
     /// Device for logging.
@@ -233,11 +235,13 @@ fn run(&self, seq: &GspSequencer<'_>) -> Result {
                 // Reset the GSP to prepare it for resuming.
                 seq.gsp_falcon.reset(seq.bar)?;
 
+                let libos_dma_handle = seq.libos.dma_handle();
+
                 // Write the libOS DMA handle to GSP mailboxes.
                 seq.gsp_falcon.write_mailboxes(
                     seq.bar,
-                    Some(seq.libos_dma_handle as u32),
-                    Some((seq.libos_dma_handle >> 32) as u32),
+                    Some(libos_dma_handle as u32),
+                    Some((libos_dma_handle >> 32) as u32),
                 );
 
                 // Start the SEC2 falcon which will trigger GSP-RM to resume on the GSP.
@@ -342,8 +346,8 @@ fn iter(&self) -> GspSeqIter<'_> {
 pub(crate) struct GspSequencerParams<'a> {
     /// Bootloader application version.
     pub(crate) bootloader_app_version: u32,
-    /// LibOS DMA handle address.
-    pub(crate) libos_dma_handle: u64,
+    /// LibOS memory region init arguments.
+    pub(crate) libos: &'a Coherent<[LibosMemoryRegionInitArgument]>,
     /// GSP falcon for core operations.
     pub(crate) gsp_falcon: &'a Falcon<Gsp>,
     /// SEC2 falcon for core operations.
@@ -369,7 +373,7 @@ pub(crate) fn run(cmdq: &Cmdq, params: GspSequencerParams<'a>) -> Result {
             bar: params.bar,
             sec2_falcon: params.sec2_falcon,
             gsp_falcon: params.gsp_falcon,
-            libos_dma_handle: params.libos_dma_handle,
+            libos: params.libos,
             bootloader_app_version: params.bootloader_app_version,
             dev: params.dev,
         };

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (7 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 14:40 ` [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions Eliot Courtney
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

For GPU architectures that use FSP CoT boot, ensure that FSP itself has
booted before trying to use it. In particular, registers such as
`NV_USABLE_FB_SIZE_IN_MB`, which is read for `FbHal::vidmem_size`, should
only be accessed after FSP has booted. Currently, we only wait for FSP
boot when creating the `Fsp` interface during GSP boot, which is too
late. So, move this wait to a new preboot phase that runs during GPU
construction.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
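Note: the wait moved by this patch is a poll-with-timeout on a boolean
"done" predicate. A minimal standalone sketch of that pattern follows,
assuming plain `std` rather than the kernel's `read_poll_timeout`.
`poll_until` and `TimedOut` are hypothetical names used only for
illustration; they are not part of nova-core.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type for this sketch; the driver returns a kernel error instead.
#[derive(Debug, PartialEq)]
pub struct TimedOut;

/// Polls `done` every `interval` until it returns true or `timeout` elapses.
/// Illustrative stand-in for the kernel helper, not its implementation.
pub fn poll_until(
    mut done: impl FnMut() -> bool,
    interval: Duration,
    timeout: Duration,
) -> Result<(), TimedOut> {
    let deadline = Instant::now() + timeout;
    loop {
        // Check the predicate first so an already-booted FSP returns immediately.
        if done() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(TimedOut);
        }
        sleep(interval);
    }
}
```

With the values used in the patch, a caller would pass a 10 ms interval
and a 5000 ms timeout, with `done` reading the boot-complete status.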
 drivers/gpu/nova-core/fsp.rs           | 40 +++++++++++++++-------------------
 drivers/gpu/nova-core/fsp/hal.rs       |  4 ++--
 drivers/gpu/nova-core/fsp/hal/gb100.rs |  4 ++--
 drivers/gpu/nova-core/fsp/hal/gb202.rs |  4 ++--
 drivers/gpu/nova-core/fsp/hal/gh100.rs | 10 ++++-----
 drivers/gpu/nova-core/gpu.rs           |  7 +++---
 drivers/gpu/nova-core/gpu/hal.rs       |  6 +++--
 drivers/gpu/nova-core/gpu/hal/gh100.rs | 10 ++++++---
 drivers/gpu/nova-core/gpu/hal/tu102.rs |  3 ++-
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  2 +-
 10 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 3f3211eae4d0..bf0baa5ac4ae 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -49,8 +49,7 @@
         NvdmHeader,
         NvdmType, //
     },
-    num,
-    regs, //
+    num, //
 };
 
 mod hal;
@@ -229,41 +228,36 @@ pub(crate) fn boot_params(&self) -> &Coherent<GspFmcBootParams> {
 
 /// FSP interface for Hopper/Blackwell GPUs.
 ///
-/// An `Fsp` is produced by [`Fsp::wait_secure_boot`], which only returns once FSP secure boot
-/// has completed. It owns the FSP falcon and the FMC firmware, which are used for the subsequent
-/// Chain of Trust boot.
+/// It owns the FSP falcon and the FMC firmware, which are used for the subsequent Chain of Trust
+/// boot.
 pub(crate) struct Fsp {
     falcon: Falcon<FspEngine>,
     fsp_fw: FspFirmware,
 }
 
 impl Fsp {
-    /// Waits for FSP secure boot completion, then returns the [`Fsp`] interface.
-    ///
-    /// Polls the thermal scratch register until FSP signals boot completion or the timeout
-    /// elapses. Returning an [`Fsp`] only on success guarantees, at the API level, that the
-    /// interface is not used before secure boot has completed.
-    pub(crate) fn wait_secure_boot(
-        dev: &device::Device<device::Bound>,
-        bar: Bar0<'_>,
-        chipset: Chipset,
-    ) -> Result<Fsp> {
+    /// Waits for FSP secure boot completion. This must be called before trying to create the `Fsp`
+    /// interface or read any registers dependent on FSP boot completion.
+    pub(crate) fn wait_for_secure_boot(bar: Bar0<'_>, chipset: Chipset) -> Result {
         /// FSP secure boot completion timeout in milliseconds.
         const FSP_SECURE_BOOT_TIMEOUT_MS: i64 = 5000;
 
         let hal = hal::fsp_hal(chipset).ok_or(ENOTSUPP)?;
-        let falcon = Falcon::<FspEngine>::new(dev, chipset)?;
-        let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
         read_poll_timeout(
-            || Ok(hal.fsp_boot_status(bar)),
-            |&status| status == regs::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS,
+            || Ok(hal.fsp_boot_done(bar)),
+            |&done| done,
             Delta::from_millis(10),
             Delta::from_millis(FSP_SECURE_BOOT_TIMEOUT_MS),
-        )
-        .inspect_err(|e| {
-            dev_err!(dev, "FSP secure boot completion error: {:?}\n", e);
-        })?;
+        )?;
+
+        Ok(())
+    }
+
+    /// Creates an FSP interface.
+    pub(crate) fn new(dev: &device::Device<device::Bound>, chipset: Chipset) -> Result<Self> {
+        let falcon = Falcon::<FspEngine>::new(dev, chipset)?;
+        let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
         Ok(Fsp { falcon, fsp_fw })
     }
diff --git a/drivers/gpu/nova-core/fsp/hal.rs b/drivers/gpu/nova-core/fsp/hal.rs
index b6f2624bb13d..7c5a7e61835c 100644
--- a/drivers/gpu/nova-core/fsp/hal.rs
+++ b/drivers/gpu/nova-core/fsp/hal.rs
@@ -14,8 +14,8 @@
 mod gh100;
 
 pub(super) trait FspHal {
-    /// Returns the secure boot status from the architecture-specific `NV_THERM_I2CS_SCRATCH` register.
-    fn fsp_boot_status(&self, bar: Bar0<'_>) -> u32;
+    /// Returns whether FSP secure boot is done.
+    fn fsp_boot_done(&self, bar: Bar0<'_>) -> bool;
 
     /// Returns the FSP Chain of Trust protocol version this chipset advertises.
     fn cot_version(&self) -> u16;
diff --git a/drivers/gpu/nova-core/fsp/hal/gb100.rs b/drivers/gpu/nova-core/fsp/hal/gb100.rs
index 42f5ecfc6400..a95b2dde2a04 100644
--- a/drivers/gpu/nova-core/fsp/hal/gb100.rs
+++ b/drivers/gpu/nova-core/fsp/hal/gb100.rs
@@ -9,9 +9,9 @@
 struct Gb100;
 
 impl FspHal for Gb100 {
-    fn fsp_boot_status(&self, bar: Bar0<'_>) -> u32 {
+    fn fsp_boot_done(&self, bar: Bar0<'_>) -> bool {
         // GB10x shares Hopper's FSP secure boot status register.
-        super::gh100::fsp_boot_status_gh100(bar)
+        super::gh100::fsp_boot_done_gh100(bar)
     }
 
     fn cot_version(&self) -> u16 {
diff --git a/drivers/gpu/nova-core/fsp/hal/gb202.rs b/drivers/gpu/nova-core/fsp/hal/gb202.rs
index 1091b169a645..a3010717c57d 100644
--- a/drivers/gpu/nova-core/fsp/hal/gb202.rs
+++ b/drivers/gpu/nova-core/fsp/hal/gb202.rs
@@ -12,10 +12,10 @@
 struct Gb202;
 
 impl FspHal for Gb202 {
-    fn fsp_boot_status(&self, bar: Bar0<'_>) -> u32 {
+    fn fsp_boot_done(&self, bar: Bar0<'_>) -> bool {
         bar.read(regs::gb202::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE)
             .fsp_boot_complete()
-            .into()
+            == regs::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS
     }
 
     fn cot_version(&self) -> u16 {
diff --git a/drivers/gpu/nova-core/fsp/hal/gh100.rs b/drivers/gpu/nova-core/fsp/hal/gh100.rs
index 291acaf2845a..a440b68205e2 100644
--- a/drivers/gpu/nova-core/fsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/fsp/hal/gh100.rs
@@ -11,16 +11,16 @@
 
 struct Gh100;
 
-/// Reads the FSP secure boot status from the Hopper/GB10x thermal scratch register.
-pub(super) fn fsp_boot_status_gh100(bar: Bar0<'_>) -> u32 {
+/// Returns whether FSP secure boot is done on Hopper/GB10x.
+pub(super) fn fsp_boot_done_gh100(bar: Bar0<'_>) -> bool {
     bar.read(regs::gh100::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE)
         .fsp_boot_complete()
-        .into()
+        == regs::NV_THERM_I2CS_SCRATCH_FSP_BOOT_COMPLETE_STATUS_SUCCESS
 }
 
 impl FspHal for Gh100 {
-    fn fsp_boot_status(&self, bar: Bar0<'_>) -> u32 {
-        fsp_boot_status_gh100(bar)
+    fn fsp_boot_done(&self, bar: Bar0<'_>) -> bool {
+        fsp_boot_done_gh100(bar)
     }
 
     fn cot_version(&self) -> u16 {
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index b3c91731db45..ca37892c3b38 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -295,7 +295,8 @@ pub(crate) fn new(
                 dev_info!(pdev,"NVIDIA ({})\n", spec);
             })?,
 
-            // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
+            // We must wait for some architecture-specific setup to complete before doing any
+            // significant setup on the GPU.
             _: {
                 let hal = hal::gpu_hal(spec.chipset);
                 let dma_mask = hal.dma_mask();
@@ -304,8 +305,8 @@ pub(crate) fn new(
                 // still constructing it, so no concurrent DMA allocations can exist.
                 unsafe { pdev.dma_set_mask_and_coherent(dma_mask)? };
 
-                hal.wait_gfw_boot_completion(bar)
-                    .inspect_err(|_| dev_err!(pdev, "GFW boot did not complete\n"))?;
+                hal.wait_preboot_completion(bar, spec.chipset)
+                    .inspect_err(|_| dev_err!(pdev, "preboot firmware did not complete\n"))?;
             },
 
             sysmem_flush: SysmemFlush::register(pdev.as_ref(), bar, spec.chipset)?,
diff --git a/drivers/gpu/nova-core/gpu/hal.rs b/drivers/gpu/nova-core/gpu/hal.rs
index 3f25882d0e56..232f073ccc06 100644
--- a/drivers/gpu/nova-core/gpu/hal.rs
+++ b/drivers/gpu/nova-core/gpu/hal.rs
@@ -19,8 +19,10 @@
 mod tu102;
 
 pub(crate) trait GpuHal {
-    /// Waits for GFW_BOOT completion if required by this hardware family.
-    fn wait_gfw_boot_completion(&self, bar: Bar0<'_>) -> Result;
+    /// Waits for architecture-specific operations to complete before we can try to boot the GSP.
+    /// For example, may wait on GFW_BOOT completion or FSP secure boot completion, depending on the
+    /// architecture.
+    fn wait_preboot_completion(&self, bar: Bar0<'_>, chipset: Chipset) -> Result;
 
     /// Returns the DMA mask for the current architecture.
     fn dma_mask(&self) -> DmaMask;
diff --git a/drivers/gpu/nova-core/gpu/hal/gh100.rs b/drivers/gpu/nova-core/gpu/hal/gh100.rs
index e3f8ba0fab33..3aa18feec1f7 100644
--- a/drivers/gpu/nova-core/gpu/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gpu/hal/gh100.rs
@@ -7,15 +7,19 @@
     prelude::*, //
 };
 
-use crate::driver::Bar0;
+use crate::{
+    driver::Bar0,
+    fsp::Fsp,
+    gpu::Chipset, //
+};
 
 use super::GpuHal;
 
 struct Gh100;
 
 impl GpuHal for Gh100 {
-    fn wait_gfw_boot_completion(&self, _bar: Bar0<'_>) -> Result {
-        Ok(())
+    fn wait_preboot_completion(&self, bar: Bar0<'_>, chipset: Chipset) -> Result {
+        Fsp::wait_for_secure_boot(bar, chipset)
     }
 
     fn dma_mask(&self) -> DmaMask {
diff --git a/drivers/gpu/nova-core/gpu/hal/tu102.rs b/drivers/gpu/nova-core/gpu/hal/tu102.rs
index b0732e53edea..34b63a7c0ada 100644
--- a/drivers/gpu/nova-core/gpu/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gpu/hal/tu102.rs
@@ -32,6 +32,7 @@
 
 use crate::{
     driver::Bar0,
+    gpu::Chipset,
     regs, //
 };
 
@@ -55,7 +56,7 @@ impl GpuHal for Tu102 {
     /// This function waits for a signal indicating that core initialization is complete. Before
     /// this signal is received, little can be done with the GPU. This signal is set by the FWSEC
     /// running on the GSP in Heavy-secured mode.
-    fn wait_gfw_boot_completion(&self, bar: Bar0<'_>) -> Result {
+    fn wait_preboot_completion(&self, bar: Bar0<'_>, _chipset: Chipset) -> Result {
         // Before accessing the completion status in `NV_PGC6_AON_SECURE_SCRATCH_GROUP_05`, we must
         // first check `NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK`. This is because
         // `NV_PGC6_AON_SECURE_SCRATCH_GROUP_05` becomes accessible only after the secure firmware
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 31498ae7abd2..35554d92fda9 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -159,7 +159,7 @@ fn boot<'a>(
     ) -> Result<BootUnloadGuard<'a>> {
         let args = FmcBootArgs::new(dev, chipset, wpr_meta, &gsp.libos, false)?;
 
-        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset)?;
+        let mut fsp = Fsp::new(dev, chipset)?;
 
         let unload_bundle = crate::gsp::UnloadBundle(
             KBox::new(FspUnloadBundle, GFP_KERNEL)? as KBox<dyn UnloadBundle>

-- 
2.54.0



* [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (8 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 14:40 ` [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation Eliot Courtney
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

`FbLayout` is currently used for both pre-FSP and post-FSP architectures.
It contains ranges for each region of the framebuffer, but on post-FSP
architectures, only the sizes are actually used by GSP. The offsets are
not decided by the driver. So, for post-FSP architectures `FbLayout`
essentially contains guesses for the offsets. Instead, make separate
types so that we only store the information that's actually needed,
rather than keeping around offsets that may not be correct.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
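Note: the split relies on an enum with one variant per boot flavour,
which each HAL narrows with `let ... else`. The following is a reduced
standalone sketch of that shape, not the driver code: the structs carry
only a couple of fields, and `Invalid`, `fsp_frts_size` and
`fwsec_frts_range` are hypothetical names standing in for `EINVAL` and
the two `boot` paths.

```rust
/// Simplified stand-in: concrete ranges, as used on non-FSP architectures.
#[derive(Debug)]
pub struct FbRanges {
    pub frts_start: u64,
    pub frts_len: u64,
}

/// Simplified stand-in: sizes only, as used on FSP architectures.
#[derive(Debug)]
pub struct FbSizes {
    pub frts_size: u64,
}

#[derive(Debug)]
pub enum GspFbInfo {
    Ranges(FbRanges),
    Sizes(FbSizes),
}

/// Hypothetical error standing in for the kernel's `EINVAL`.
#[derive(Debug, PartialEq)]
pub struct Invalid;

/// FSP-style path: only accepts the size-only variant.
pub fn fsp_frts_size(info: &GspFbInfo) -> Result<u64, Invalid> {
    let GspFbInfo::Sizes(sizes) = info else {
        return Err(Invalid);
    };
    Ok(sizes.frts_size)
}

/// Non-FSP path: only accepts the variant carrying concrete ranges.
/// Returns the (start, end) of the FRTS region.
pub fn fwsec_frts_range(info: &GspFbInfo) -> Result<(u64, u64), Invalid> {
    let GspFbInfo::Ranges(ranges) = info else {
        return Err(Invalid);
    };
    Ok((ranges.frts_start, ranges.frts_start + ranges.frts_len))
}
```

The point of the design is that an FSP path cannot read an offset by
accident, because the `Sizes` variant has no offsets to read.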
 drivers/gpu/nova-core/fb.rs            | 69 +++++++++++++++++++++---
 drivers/gpu/nova-core/fsp.rs           | 15 +++---
 drivers/gpu/nova-core/gsp/boot.rs      | 10 ++--
 drivers/gpu/nova-core/gsp/fw.rs        | 95 ++++++++++++++++++++++++++--------
 drivers/gpu/nova-core/gsp/hal.rs       |  4 +-
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 10 ++--
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 24 +++++----
 7 files changed, 170 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index 725e428154cf..facecb8b411f 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -144,11 +144,29 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
     }
 }
 
-/// Layout of the GPU framebuffer memory.
-///
-/// Contains ranges of GPU memory reserved for a given purpose during the GSP boot process.
+/// Framebuffer information required for GSP boot.
 #[derive(Debug)]
-pub(crate) struct FbLayout {
+pub(crate) enum GspFbInfo {
+    /// Concrete framebuffer ranges for host computed framebuffer layout.
+    Ranges(FbRanges),
+    /// Sizes of framebuffer ranges for GSP-FMC computed ranges.
+    Sizes(FbSizes),
+}
+
+impl GspFbInfo {
+    /// Computes the framebuffer region information required for boot.
+    pub(crate) fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
+        if chipset.uses_fsp() {
+            FbSizes::new(chipset, bar).map(Self::Sizes)
+        } else {
+            FbRanges::new(chipset, bar, gsp_fw).map(Self::Ranges)
+        }
+    }
+}
+
+/// Framebuffer ranges needed for GSP boot process.
+#[derive(Debug)]
+pub(crate) struct FbRanges {
     /// Range of the framebuffer. Starts at `0`.
     pub(crate) fb: FbRange,
     /// VGA workspace, small area of reserved memory at the end of the framebuffer.
@@ -163,15 +181,17 @@ pub(crate) struct FbLayout {
     pub(crate) wpr2_heap: FbRange,
     /// WPR2 region range, starting with an instance of `GspFwWprMeta`.
     pub(crate) wpr2: FbRange,
+    /// Non-WPR heap, located just below WPR2.
     pub(crate) heap: FbRange,
+    /// Number of VF partitions.
     pub(crate) vf_partition_count: u8,
     /// PMU reserved memory size, in bytes.
     pub(crate) pmu_reserved_size: u32,
 }
 
-impl FbLayout {
-    /// Computes the FB layout for `chipset` required to run the `gsp_fw` GSP firmware.
-    pub(crate) fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
+impl FbRanges {
+    /// Computes concrete framebuffer ranges required on non-FSP booting architectures.
+    fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
         let hal = hal::fb_hal(chipset);
 
         let fb = {
@@ -270,3 +290,38 @@ pub(crate) fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Resu
         })
     }
 }
+
+/// Framebuffer region sizes needed for GSP-FMC boot.
+#[derive(Debug)]
+pub(crate) struct FbSizes {
+    /// VGA workspace size, in bytes.
+    pub(crate) vga_workspace_size: u64,
+    /// FRTS size, in bytes.
+    pub(crate) frts_size: u64,
+    /// WPR2 heap size, in bytes.
+    pub(crate) wpr2_heap_size: u64,
+    /// Non-WPR heap size, in bytes.
+    pub(crate) heap_size: u64,
+    /// PMU reserved memory size, in bytes.
+    pub(crate) pmu_reserved_size: u32,
+    /// Number of VF partitions.
+    pub(crate) vf_partition_count: u8,
+}
+
+impl FbSizes {
+    /// Computes the framebuffer region sizes for GSP-FMC boot.
+    fn new(chipset: Chipset, bar: Bar0<'_>) -> Result<Self> {
+        let hal = hal::fb_hal(chipset);
+        let fb_size = hal.vidmem_size(bar);
+
+        Ok(Self {
+            vga_workspace_size: u64::SZ_128K,
+            frts_size: hal.frts_size(),
+            wpr2_heap_size: gsp::LibosParams::from_chipset(chipset)
+                .wpr_heap_size(chipset, fb_size)?,
+            heap_size: u64::from(hal.non_wpr_heap_size()),
+            pmu_reserved_size: hal.pmu_reserved_size(),
+            vf_partition_count: 0,
+        })
+    }
+}
diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index bf0baa5ac4ae..6778e5546cee 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -30,7 +30,7 @@
         fsp::Fsp as FspEngine,
         Falcon, //
     },
-    fb::FbLayout,
+    fb::FbSizes,
     firmware::{
         fsp::{
             FmcSignatures,
@@ -134,14 +134,14 @@ struct FspCotMessage {
 impl FspCotMessage {
     /// Returns an in-place initializer for [`FspCotMessage`].
     fn new<'a>(
-        fb_layout: &FbLayout,
+        fb_info: &FbSizes,
         fsp_fw: &'a FspFirmware,
         args: &'a FmcBootArgs<'_>,
     ) -> Result<impl Init<Self> + 'a> {
         // frts_vidmem_offset is measured from the end of FB, so FRTS sits at
         // (end of FB) - frts_vidmem_offset.
         let frts_vidmem_offset = if !args.resume {
-            let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
+            let frts_reserved_size = fb_info.heap_size + u64::from(fb_info.pmu_reserved_size);
 
             frts_reserved_size
                 .align_up(Alignment::new::<SZ_2M>())
@@ -151,7 +151,7 @@ fn new<'a>(
         };
 
         let frts_size: u32 = if !args.resume {
-            fb_layout.frts.len().try_into()?
+            fb_info.frts_size.try_into()?
         } else {
             0
         };
@@ -333,15 +333,12 @@ pub(crate) fn boot_fmc(
         &mut self,
         dev: &device::Device<device::Bound>,
         bar: Bar0<'_>,
-        fb_layout: &FbLayout,
+        fb_info: &FbSizes,
         args: &FmcBootArgs<'_>,
     ) -> Result {
         dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
 
-        let msg = KBox::init(
-            FspCotMessage::new(fb_layout, &self.fsp_fw, args)?,
-            GFP_KERNEL,
-        )?;
+        let msg = KBox::init(FspCotMessage::new(fb_info, &self.fsp_fw, args)?, GFP_KERNEL)?;
 
         let _response_buf = self.send_sync_fsp(dev, bar, &*msg)?;
 
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 8afb62d689cb..670a94545b61 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -19,7 +19,7 @@
         sec2::Sec2,
         Falcon, //
     },
-    fb::FbLayout,
+    fb::GspFbInfo,
     firmware::{
         gsp::GspFirmware,
         FIRMWARE_VERSION, //
@@ -114,10 +114,10 @@ pub(crate) fn boot(
 
         let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
 
-        let fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
-        dev_dbg!(dev, "{:#x?}\n", fb_layout);
+        let fb_info = GspFbInfo::new(chipset, bar, &gsp_fw)?;
+        dev_dbg!(dev, "{:#x?}\n", fb_info);
 
-        let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
+        let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_info))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
         let unload_guard = hal.boot(
@@ -125,7 +125,7 @@ pub(crate) fn boot(
             dev,
             bar,
             chipset,
-            &fb_layout,
+            &fb_info,
             &wpr_meta,
             gsp_falcon,
             sec2_falcon,
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 4db0cfa4dc4d..042b0122e98d 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -28,7 +28,7 @@
 };
 
 use crate::{
-    fb::FbLayout,
+    fb::GspFbInfo,
     firmware::gsp::GspFirmware,
     gpu::{
         Architecture,
@@ -214,11 +214,65 @@ unsafe impl FromBytes for GspFwWprMeta {}
 
 impl GspFwWprMeta {
     /// Returns an initializer for a `GspFwWprMeta` suitable for booting `gsp_firmware` using the
-    /// `fb_layout` layout.
+    /// framebuffer information.
     pub(crate) fn new<'a>(
         gsp_firmware: &'a GspFirmware,
-        fb_layout: &'a FbLayout,
+        fb_info: &'a GspFbInfo,
     ) -> impl Init<Self> + 'a {
+        #[derive(Default)]
+        struct WprMetaFields {
+            gsp_fw_rsvd_start: u64,
+            non_wpr_heap_offset: u64,
+            non_wpr_heap_size: u64,
+            gsp_fw_wpr_start: u64,
+            gsp_fw_heap_offset: u64,
+            gsp_fw_heap_size: u64,
+            gsp_fw_offset: u64,
+            boot_bin_offset: u64,
+            frts_offset: u64,
+            frts_size: u64,
+            gsp_fw_wpr_end: u64,
+            gsp_fw_heap_vf_partition_count: u8,
+            fb_size: u64,
+            vga_workspace_offset: u64,
+            vga_workspace_size: u64,
+            pmu_reserved_size: u32,
+        }
+
+        let fields = match fb_info {
+            GspFbInfo::Ranges(ranges) => WprMetaFields {
+                gsp_fw_rsvd_start: ranges.heap.start,
+                non_wpr_heap_offset: ranges.heap.start,
+                non_wpr_heap_size: ranges.heap.len(),
+                gsp_fw_wpr_start: ranges.wpr2.start,
+                gsp_fw_heap_offset: ranges.wpr2_heap.start,
+                gsp_fw_heap_size: ranges.wpr2_heap.len(),
+                gsp_fw_offset: ranges.elf.start,
+                boot_bin_offset: ranges.boot.start,
+                frts_offset: ranges.frts.start,
+                frts_size: ranges.frts.len(),
+                gsp_fw_wpr_end: ranges
+                    .vga_workspace
+                    .start
+                    .align_down(Alignment::new::<SZ_128K>()),
+                gsp_fw_heap_vf_partition_count: ranges.vf_partition_count,
+                fb_size: ranges.fb.len(),
+                vga_workspace_offset: ranges.vga_workspace.start,
+                vga_workspace_size: ranges.vga_workspace.len(),
+                pmu_reserved_size: ranges.pmu_reserved_size,
+            },
+            GspFbInfo::Sizes(sizes) => WprMetaFields {
+                non_wpr_heap_size: sizes.heap_size,
+                gsp_fw_heap_size: sizes.wpr2_heap_size,
+                frts_size: sizes.frts_size,
+                gsp_fw_heap_vf_partition_count: sizes.vf_partition_count,
+                vga_workspace_size: sizes.vga_workspace_size,
+                pmu_reserved_size: sizes.pmu_reserved_size,
+                // When only sizes are supplied, offsets and several other parameters are not used.
+                ..Default::default()
+            },
+        };
+
         #[allow(non_snake_case)]
         let init_inner = init!(bindings::GspFwWprMeta {
             // CAST: we want to store the bits of `GSP_FW_WPR_META_MAGIC` unmodified.
@@ -237,25 +291,22 @@ pub(crate) fn new<'a>(
                     sizeOfSignature: u64::from_safe_cast(gsp_firmware.signatures.size()),
                 },
             },
-            gspFwRsvdStart: fb_layout.heap.start,
-            nonWprHeapOffset: fb_layout.heap.start,
-            nonWprHeapSize: fb_layout.heap.end - fb_layout.heap.start,
-            gspFwWprStart: fb_layout.wpr2.start,
-            gspFwHeapOffset: fb_layout.wpr2_heap.start,
-            gspFwHeapSize: fb_layout.wpr2_heap.end - fb_layout.wpr2_heap.start,
-            gspFwOffset: fb_layout.elf.start,
-            bootBinOffset: fb_layout.boot.start,
-            frtsOffset: fb_layout.frts.start,
-            frtsSize: fb_layout.frts.end - fb_layout.frts.start,
-            gspFwWprEnd: fb_layout
-                .vga_workspace
-                .start
-                .align_down(Alignment::new::<SZ_128K>()),
-            gspFwHeapVfPartitionCount: fb_layout.vf_partition_count,
-            fbSize: fb_layout.fb.end - fb_layout.fb.start,
-            vgaWorkspaceOffset: fb_layout.vga_workspace.start,
-            vgaWorkspaceSize: fb_layout.vga_workspace.end - fb_layout.vga_workspace.start,
-            pmuReservedSize: fb_layout.pmu_reserved_size,
+            gspFwRsvdStart: fields.gsp_fw_rsvd_start,
+            nonWprHeapOffset: fields.non_wpr_heap_offset,
+            nonWprHeapSize: fields.non_wpr_heap_size,
+            gspFwWprStart: fields.gsp_fw_wpr_start,
+            gspFwHeapOffset: fields.gsp_fw_heap_offset,
+            gspFwHeapSize: fields.gsp_fw_heap_size,
+            gspFwOffset: fields.gsp_fw_offset,
+            bootBinOffset: fields.boot_bin_offset,
+            frtsOffset: fields.frts_offset,
+            frtsSize: fields.frts_size,
+            gspFwWprEnd: fields.gsp_fw_wpr_end,
+            gspFwHeapVfPartitionCount: fields.gsp_fw_heap_vf_partition_count,
+            fbSize: fields.fb_size,
+            vgaWorkspaceOffset: fields.vga_workspace_offset,
+            vgaWorkspaceSize: fields.vga_workspace_size,
+            pmuReservedSize: fields.pmu_reserved_size,
             ..Zeroable::init_zeroed()
         });
 
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 04f004856c60..aa41da9902b5 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -18,7 +18,7 @@
         sec2::Sec2,
         Falcon, //
     },
-    fb::FbLayout,
+    fb::GspFbInfo,
     firmware::gsp::GspFirmware,
     gpu::{
         Architecture,
@@ -60,7 +60,7 @@ fn boot<'a>(
         dev: &'a device::Device<device::Bound>,
         bar: Bar0<'a>,
         chipset: Chipset,
-        fb_layout: &FbLayout,
+        fb_info: &GspFbInfo,
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &'a Falcon<GspEngine>,
         sec2_falcon: &'a Falcon<Sec2>,
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 35554d92fda9..ddf3f67e6338 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -17,7 +17,7 @@
         sec2::Sec2,
         Falcon, //
     },
-    fb::FbLayout,
+    fb::GspFbInfo,
     fsp::{
         FmcBootArgs,
         Fsp, //
@@ -152,11 +152,15 @@ fn boot<'a>(
         dev: &'a device::Device<device::Bound>,
         bar: Bar0<'a>,
         chipset: Chipset,
-        fb_layout: &FbLayout,
+        fb_info: &GspFbInfo,
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &'a Falcon<GspEngine>,
         sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
+        let GspFbInfo::Sizes(fb_sizes) = fb_info else {
+            return Err(EINVAL);
+        };
+
         let args = FmcBootArgs::new(dev, chipset, wpr_meta, &gsp.libos, false)?;
 
         let mut fsp = Fsp::new(dev, chipset)?;
@@ -169,7 +173,7 @@ fn boot<'a>(
         let unload_guard =
             BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
 
-        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
+        fsp.boot_fmc(dev, bar, fb_sizes, &args)?;
 
         wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, args.boot_params())?;
 
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 1e08c482fd39..e809deb72055 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -16,7 +16,10 @@
         sec2::Sec2,
         Falcon, //
     },
-    fb::FbLayout,
+    fb::{
+        FbRanges,
+        GspFbInfo, //
+    },
     firmware::{
         booter::{
             BooterFirmware,
@@ -185,7 +188,7 @@ fn run_fwsec_frts(
     falcon: &Falcon<GspEngine>,
     bar: Bar0<'_>,
     bios: &Vbios,
-    fb_layout: &FbLayout,
+    fb_ranges: &FbRanges,
 ) -> Result {
     // Check that the WPR2 region does not already exist - if it does, we cannot run
     // FWSEC-FRTS until the GPU is reset.
@@ -204,8 +207,8 @@ fn run_fwsec_frts(
         bar,
         bios,
         FwsecCommand::Frts {
-            frts_addr: fb_layout.frts.start,
-            frts_size: fb_layout.frts.len(),
+            frts_addr: fb_ranges.frts.start,
+            frts_size: fb_ranges.frts.len(),
         },
     )?;
 
@@ -244,12 +247,12 @@ fn run_fwsec_frts(
 
             Err(EIO)
         }
-        (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
+        (wpr2_lo, _) if wpr2_lo != fb_ranges.frts.start => {
             dev_err!(
                 dev,
                 "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
                 wpr2_lo,
-                fb_layout.frts.start,
+                fb_ranges.frts.start,
             );
 
             Err(EIO)
@@ -272,11 +275,14 @@ fn boot<'a>(
         dev: &'a device::Device<device::Bound>,
         bar: Bar0<'a>,
         chipset: Chipset,
-        fb_layout: &FbLayout,
+        fb_info: &GspFbInfo,
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &'a Falcon<GspEngine>,
         sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
+        let GspFbInfo::Ranges(fb_ranges) = fb_info else {
+            return Err(EINVAL);
+        };
         let bios = Vbios::new(dev, bar)?;
 
         // Try and prepare the unload bundle.
@@ -301,8 +307,8 @@ fn boot<'a>(
             BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
 
         // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
-        if !fb_layout.frts.is_empty() {
-            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+        if !fb_ranges.frts.is_empty() {
+            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_ranges)?;
         }
 
         gsp_falcon.reset(bar)?;

-- 
2.54.0



* [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (9 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 14:40 ` [PATCH 12/13] gpu: nova-core: rename heap size field Eliot Courtney
  2026-06-15 14:40 ` [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs Eliot Courtney
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

Currently, the FRTS vidmem offset is calculated from the non-WPR heap
size and the PMU reserved size, but this is not right. The layout
actually looks like this:

| non-wpr heap | WPR2 .. FRTS | PMU reserved | ... | VGA workspace |

The values only happened to match through coincidence and generous
alignment. Instead, define a per-architecture reserved size at the end of
the framebuffer and use this plus the PMU reserved size to calculate the
FRTS vidmem offset.

Fixes: d317e4585fa3 ("gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
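Note: since the FRTS vidmem offset is measured back from the end of the
framebuffer, it has to cover everything that sits after FRTS in the
layout above. The following is a rough standalone sketch of that
arithmetic, under the assumption that the 2 MiB round-up used by the
existing code is kept. `align_up` and `frts_vidmem_offset` here are
illustrative helpers, not the driver's functions, and any sizes passed
in are made up.

```rust
const SZ_2M: u64 = 2 * 1024 * 1024;

/// Rounds `value` up to the next multiple of `align`, which must be a power of two.
/// Returns `None` if the rounding would overflow.
pub fn align_up(value: u64, align: u64) -> Option<u64> {
    debug_assert!(align.is_power_of_two());
    value.checked_add(align - 1).map(|v| v & !(align - 1))
}

/// Offset of FRTS measured back from the end of the framebuffer.
/// Assumption: the result is rounded up to 2 MiB, as in the earlier code.
pub fn frts_vidmem_offset(fb_end_reserved_size: u32, pmu_reserved_size: u32) -> Option<u64> {
    // Widen before adding so two large u32 sizes cannot overflow.
    let reserved = u64::from(fb_end_reserved_size) + u64::from(pmu_reserved_size);
    align_up(reserved, SZ_2M)
}
```

For instance, a hypothetical 128 KiB end reservation plus a 1 MiB PMU
reservation rounds up to a 2 MiB offset from the end of the framebuffer.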
 drivers/gpu/nova-core/fb.rs           |  4 ++++
 drivers/gpu/nova-core/fb/hal.rs       |  3 +++
 drivers/gpu/nova-core/fb/hal/ga100.rs |  4 ++++
 drivers/gpu/nova-core/fb/hal/ga102.rs |  4 ++++
 drivers/gpu/nova-core/fb/hal/gb100.rs |  5 +++++
 drivers/gpu/nova-core/fb/hal/gb202.rs |  5 +++++
 drivers/gpu/nova-core/fb/hal/gh100.rs |  4 ++++
 drivers/gpu/nova-core/fb/hal/tu102.rs |  8 ++++++++
 drivers/gpu/nova-core/fsp.rs          | 25 ++++++++++++++++++-------
 9 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index facecb8b411f..2089fa1d7a22 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -304,6 +304,9 @@ pub(crate) struct FbSizes {
     pub(crate) heap_size: u64,
     /// PMU reserved memory size, in bytes.
     pub(crate) pmu_reserved_size: u32,
+    /// Size reserved at the end of the framebuffer. This is architecture dependent and used to
+    /// compute the FRTS offset for the FSP CoT message.
+    pub(crate) fb_end_reserved_size: u32,
     /// Number of VF partitions.
     pub(crate) vf_partition_count: u8,
 }
@@ -321,6 +324,7 @@ fn new(chipset: Chipset, bar: Bar0<'_>) -> Result<Self> {
                 .wpr_heap_size(chipset, fb_size)?,
             heap_size: u64::from(hal.non_wpr_heap_size()),
             pmu_reserved_size: hal.pmu_reserved_size(),
+            fb_end_reserved_size: hal.fb_end_reserved_size(),
             vf_partition_count: 0,
         })
     }
diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
index 714f0b51cd8f..aa50534550eb 100644
--- a/drivers/gpu/nova-core/fb/hal.rs
+++ b/drivers/gpu/nova-core/fb/hal.rs
@@ -41,6 +41,9 @@ pub(crate) trait FbHal {
 
     /// Returns the FRTS size, in bytes.
     fn frts_size(&self) -> u64;
+
+    /// Returns the size reserved at the end of the framebuffer, in bytes.
+    fn fb_end_reserved_size(&self) -> u32;
 }
 
 /// Returns the HAL corresponding to `chipset`.
diff --git a/drivers/gpu/nova-core/fb/hal/ga100.rs b/drivers/gpu/nova-core/fb/hal/ga100.rs
index 3cc1caf361c7..ce544cbafa2d 100644
--- a/drivers/gpu/nova-core/fb/hal/ga100.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga100.rs
@@ -81,6 +81,10 @@ fn non_wpr_heap_size(&self) -> u32 {
     fn frts_size(&self) -> u64 {
         0
     }
+
+    fn fb_end_reserved_size(&self) -> u32 {
+        super::tu102::fb_end_reserved_size_tu102()
+    }
 }
 
 const GA100: Ga100 = Ga100;
diff --git a/drivers/gpu/nova-core/fb/hal/ga102.rs b/drivers/gpu/nova-core/fb/hal/ga102.rs
index 44a2cf8a00f1..82b4c6034c4a 100644
--- a/drivers/gpu/nova-core/fb/hal/ga102.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga102.rs
@@ -48,6 +48,10 @@ fn non_wpr_heap_size(&self) -> u32 {
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
+
+    fn fb_end_reserved_size(&self) -> u32 {
+        super::tu102::fb_end_reserved_size_tu102()
+    }
 }
 
 const GA102: Ga102 = Ga102;
diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
index 6e0eba101ca1..a53932eaf483 100644
--- a/drivers/gpu/nova-core/fb/hal/gb100.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
@@ -78,6 +78,7 @@ fn write_sysmem_flush_page_gb100(bar: Bar0<'_>, addr: Bounded<u64, 52>) {
     );
 }
 
+// This PMU reservation size is r570-specific.
 pub(super) const fn pmu_reserved_size_gb100() -> u32 {
     usize_into_u32::<{ const_align_up(SZ_8M + SZ_16M + SZ_4K, Alignment::new::<SZ_128K>()).unwrap() }>(
     )
@@ -116,6 +117,10 @@ fn non_wpr_heap_size(&self) -> u32 {
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
+
+    fn fb_end_reserved_size(&self) -> u32 {
+        u32::SZ_2M + u32::SZ_128K
+    }
 }
 
 const GB100: Gb100 = Gb100;
diff --git a/drivers/gpu/nova-core/fb/hal/gb202.rs b/drivers/gpu/nova-core/fb/hal/gb202.rs
index 038d1278c634..1233df4303f5 100644
--- a/drivers/gpu/nova-core/fb/hal/gb202.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb202.rs
@@ -83,12 +83,17 @@ fn pmu_reserved_size(&self) -> u32 {
 
     fn non_wpr_heap_size(&self) -> u32 {
         // Non-WPR heap for GB20x (see Open RM: kgspGetNonWprHeapSize, GB202+).
+        // This size is r570-specific.
         u32::SZ_2M + u32::SZ_128K
     }
 
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
+
+    fn fb_end_reserved_size(&self) -> u32 {
+        u32::SZ_2M + u32::SZ_128K
+    }
 }
 
 const GB202: Gb202 = Gb202;
diff --git a/drivers/gpu/nova-core/fb/hal/gh100.rs b/drivers/gpu/nova-core/fb/hal/gh100.rs
index 5450c7254dad..892c75ef26c6 100644
--- a/drivers/gpu/nova-core/fb/hal/gh100.rs
+++ b/drivers/gpu/nova-core/fb/hal/gh100.rs
@@ -44,6 +44,10 @@ fn non_wpr_heap_size(&self) -> u32 {
     fn frts_size(&self) -> u64 {
         super::tu102::frts_size_tu102()
     }
+
+    fn fb_end_reserved_size(&self) -> u32 {
+        super::tu102::fb_end_reserved_size_tu102()
+    }
 }
 
 const GH100: Gh100 = Gh100;
diff --git a/drivers/gpu/nova-core/fb/hal/tu102.rs b/drivers/gpu/nova-core/fb/hal/tu102.rs
index f629e8e9d5d5..8bafbeec9807 100644
--- a/drivers/gpu/nova-core/fb/hal/tu102.rs
+++ b/drivers/gpu/nova-core/fb/hal/tu102.rs
@@ -52,6 +52,10 @@ pub(super) const fn frts_size_tu102() -> u64 {
     u64::SZ_1M
 }
 
+pub(super) const fn fb_end_reserved_size_tu102() -> u32 {
+    u32::SZ_2M
+}
+
 struct Tu102;
 
 impl FbHal for Tu102 {
@@ -82,6 +86,10 @@ fn non_wpr_heap_size(&self) -> u32 {
     fn frts_size(&self) -> u64 {
         frts_size_tu102()
     }
+
+    fn fb_end_reserved_size(&self) -> u32 {
+        fb_end_reserved_size_tu102()
+    }
 }
 
 const TU102: Tu102 = Tu102;
diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 6778e5546cee..fed9d1e4eeda 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -132,20 +132,31 @@ struct FspCotMessage {
 }
 
 impl FspCotMessage {
+    /// Computes the FRTS vidmem offset for the Chain-of-Trust message. It is measured from the end
+    /// of the framebuffer.
+    fn frts_vidmem_offset(fb_info: &FbSizes) -> Result<u64> {
+        let mut offset = u64::from(fb_info.fb_end_reserved_size);
+
+        if fb_info.pmu_reserved_size != 0 {
+            offset = offset
+                .checked_add(u64::from(fb_info.pmu_reserved_size))
+                .ok_or(EINVAL)?
+                // The 2 MiB alignment is r570-specific.
+                .align_up(Alignment::new::<SZ_2M>())
+                .ok_or(EINVAL)?;
+        }
+
+        Ok(offset)
+    }
+
     /// Returns an in-place initializer for [`FspCotMessage`].
     fn new<'a>(
         fb_info: &FbSizes,
         fsp_fw: &'a FspFirmware,
         args: &'a FmcBootArgs<'_>,
     ) -> Result<impl Init<Self> + 'a> {
-        // frts_vidmem_offset is measured from the end of FB, so FRTS sits at
-        // (end of FB) - frts_vidmem_offset.
         let frts_vidmem_offset = if !args.resume {
-            let frts_reserved_size = fb_info.heap_size + u64::from(fb_info.pmu_reserved_size);
-
-            frts_reserved_size
-                .align_up(Alignment::new::<SZ_2M>())
-                .ok_or(EINVAL)?
+            Self::frts_vidmem_offset(fb_info)?
         } else {
             0
         };

-- 
2.54.0



* [PATCH 12/13] gpu: nova-core: rename heap size field
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (10 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  2026-06-15 14:40 ` [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs Eliot Courtney
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

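FbRanges::heap and FbSizes::heap_size describe the non-WPR heap, which
is called non_wpr_heap_size everywhere else. Rename them to
non_wpr_heap and non_wpr_heap_size to make it more obvious which heap
they refer to.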

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/fb.rs     | 14 +++++++-------
 drivers/gpu/nova-core/gsp/fw.rs |  8 ++++----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index 2089fa1d7a22..ca831a36aedc 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -182,7 +182,7 @@ pub(crate) struct FbRanges {
     /// WPR2 region range, starting with an instance of `GspFwWprMeta`.
     pub(crate) wpr2: FbRange,
     /// Non-WPR heap, located just below WPR2.
-    pub(crate) heap: FbRange,
+    pub(crate) non_wpr_heap: FbRange,
     /// Number of VF partitions.
     pub(crate) vf_partition_count: u8,
     /// PMU reserved memory size, in bytes.
@@ -271,9 +271,9 @@ fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
             FbRange(wpr2_addr..frts.end)
         };
 
-        let heap = {
-            let heap_size = u64::from(hal.non_wpr_heap_size());
-            FbRange(wpr2.start - heap_size..wpr2.start)
+        let non_wpr_heap = {
+            let non_wpr_heap_size = u64::from(hal.non_wpr_heap_size());
+            FbRange(wpr2.start - non_wpr_heap_size..wpr2.start)
         };
 
         Ok(Self {
@@ -284,7 +284,7 @@ fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
             elf,
             wpr2_heap,
             wpr2,
-            heap,
+            non_wpr_heap,
             vf_partition_count: 0,
             pmu_reserved_size: hal.pmu_reserved_size(),
         })
@@ -301,7 +301,7 @@ pub(crate) struct FbSizes {
     /// WPR2 heap size, in bytes.
     pub(crate) wpr2_heap_size: u64,
     /// Non-WPR heap size, in bytes.
-    pub(crate) heap_size: u64,
+    pub(crate) non_wpr_heap_size: u64,
     /// PMU reserved memory size, in bytes.
     pub(crate) pmu_reserved_size: u32,
     /// Size reserved at the end of the framebuffer. This is architecture dependent and used to
@@ -322,7 +322,7 @@ fn new(chipset: Chipset, bar: Bar0<'_>) -> Result<Self> {
             frts_size: hal.frts_size(),
             wpr2_heap_size: gsp::LibosParams::from_chipset(chipset)
                 .wpr_heap_size(chipset, fb_size)?,
-            heap_size: u64::from(hal.non_wpr_heap_size()),
+            non_wpr_heap_size: u64::from(hal.non_wpr_heap_size()),
             pmu_reserved_size: hal.pmu_reserved_size(),
             fb_end_reserved_size: hal.fb_end_reserved_size(),
             vf_partition_count: 0,
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 042b0122e98d..0b94202a7e2a 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -241,9 +241,9 @@ struct WprMetaFields {
 
         let fields = match fb_info {
             GspFbInfo::Ranges(ranges) => WprMetaFields {
-                gsp_fw_rsvd_start: ranges.heap.start,
-                non_wpr_heap_offset: ranges.heap.start,
-                non_wpr_heap_size: ranges.heap.len(),
+                gsp_fw_rsvd_start: ranges.non_wpr_heap.start,
+                non_wpr_heap_offset: ranges.non_wpr_heap.start,
+                non_wpr_heap_size: ranges.non_wpr_heap.len(),
                 gsp_fw_wpr_start: ranges.wpr2.start,
                 gsp_fw_heap_offset: ranges.wpr2_heap.start,
                 gsp_fw_heap_size: ranges.wpr2_heap.len(),
@@ -262,7 +262,7 @@ struct WprMetaFields {
                 pmu_reserved_size: ranges.pmu_reserved_size,
             },
             GspFbInfo::Sizes(sizes) => WprMetaFields {
-                non_wpr_heap_size: sizes.heap_size,
+                non_wpr_heap_size: sizes.non_wpr_heap_size,
                 gsp_fw_heap_size: sizes.wpr2_heap_size,
                 frts_size: sizes.frts_size,
                 gsp_fw_heap_vf_partition_count: sizes.vf_partition_count,

-- 
2.54.0



* [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs
  2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
                   ` (11 preceding siblings ...)
  2026-06-15 14:40 ` [PATCH 12/13] gpu: nova-core: rename heap size field Eliot Courtney
@ 2026-06-15 14:40 ` Eliot Courtney
  12 siblings, 0 replies; 24+ messages in thread
From: Eliot Courtney @ 2026-06-15 14:40 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot, Alice Ryhl, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo
  Cc: John Hubbard, Alistair Popple, Timur Tabi, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux, Eliot Courtney

The non-WPR heap size returned by the HALs is always immediately
widened to u64 by its callers, so just return it as a u64 from the
beginning.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
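As a rough illustration of why the change should be behavior-neutral,
the sketch below compares a u32-returning and a u64-returning variant
of the accessor. `HalU32`, `HalU64` and `heap_range` are hypothetical
stand-ins rather than the driver's types, and the WPR2 start address is
an arbitrary assumption; only the GB202 size is taken from the diff.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the trait before this patch.
trait HalU32 {
    fn non_wpr_heap_size(&self) -> u32;
}

/// Hypothetical stand-in for the trait after this patch.
trait HalU64 {
    fn non_wpr_heap_size(&self) -> u64;
}

struct Gb202;

impl HalU32 for Gb202 {
    fn non_wpr_heap_size(&self) -> u32 {
        (2 << 20) + (128 << 10) // 2 MiB + 128 KiB, as in gb202.rs
    }
}

impl HalU64 for Gb202 {
    fn non_wpr_heap_size(&self) -> u64 {
        (2 << 20) + (128 << 10)
    }
}

/// Non-WPR heap range located just below `wpr2_start`; `None` if the
/// heap would not fit below it.
fn heap_range(wpr2_start: u64, size: u64) -> Option<Range<u64>> {
    Some(wpr2_start.checked_sub(size)?..wpr2_start)
}

fn main() {
    let hal = Gb202;
    let wpr2_start = 0x1_0000_0000u64; // assumed address, illustration only

    // Before: the caller widens the u32 losslessly.
    let before = heap_range(wpr2_start, u64::from(HalU32::non_wpr_heap_size(&hal)));
    // After: the HAL hands back a u64 directly.
    let after = heap_range(wpr2_start, HalU64::non_wpr_heap_size(&hal));

    assert_eq!(before, after);
    println!("{after:#x?}");
}
```

Since `u64::from(u32)` is lossless, returning u64 directly only removes
the conversion at the call sites.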
 drivers/gpu/nova-core/fb.rs           | 4 ++--
 drivers/gpu/nova-core/fb/hal.rs       | 2 +-
 drivers/gpu/nova-core/fb/hal/ga100.rs | 2 +-
 drivers/gpu/nova-core/fb/hal/ga102.rs | 2 +-
 drivers/gpu/nova-core/fb/hal/gb100.rs | 4 ++--
 drivers/gpu/nova-core/fb/hal/gb202.rs | 4 ++--
 drivers/gpu/nova-core/fb/hal/gh100.rs | 4 ++--
 drivers/gpu/nova-core/fb/hal/tu102.rs | 6 +++---
 8 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index ca831a36aedc..0829ef25527b 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -272,7 +272,7 @@ fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
         };
 
         let non_wpr_heap = {
-            let non_wpr_heap_size = u64::from(hal.non_wpr_heap_size());
+            let non_wpr_heap_size = hal.non_wpr_heap_size();
             FbRange(wpr2.start - non_wpr_heap_size..wpr2.start)
         };
 
@@ -322,7 +322,7 @@ fn new(chipset: Chipset, bar: Bar0<'_>) -> Result<Self> {
             frts_size: hal.frts_size(),
             wpr2_heap_size: gsp::LibosParams::from_chipset(chipset)
                 .wpr_heap_size(chipset, fb_size)?,
-            non_wpr_heap_size: u64::from(hal.non_wpr_heap_size()),
+            non_wpr_heap_size: hal.non_wpr_heap_size(),
             pmu_reserved_size: hal.pmu_reserved_size(),
             fb_end_reserved_size: hal.fb_end_reserved_size(),
             vf_partition_count: 0,
diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
index aa50534550eb..ff05292a3a19 100644
--- a/drivers/gpu/nova-core/fb/hal.rs
+++ b/drivers/gpu/nova-core/fb/hal.rs
@@ -37,7 +37,7 @@ pub(crate) trait FbHal {
     fn pmu_reserved_size(&self) -> u32;
 
     /// Returns the non-WPR heap size for this chipset, in bytes.
-    fn non_wpr_heap_size(&self) -> u32;
+    fn non_wpr_heap_size(&self) -> u64;
 
     /// Returns the FRTS size, in bytes.
     fn frts_size(&self) -> u64;
diff --git a/drivers/gpu/nova-core/fb/hal/ga100.rs b/drivers/gpu/nova-core/fb/hal/ga100.rs
index ce544cbafa2d..16ef0e0d2c05 100644
--- a/drivers/gpu/nova-core/fb/hal/ga100.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga100.rs
@@ -72,7 +72,7 @@ fn pmu_reserved_size(&self) -> u32 {
         super::tu102::pmu_reserved_size_tu102()
     }
 
-    fn non_wpr_heap_size(&self) -> u32 {
+    fn non_wpr_heap_size(&self) -> u64 {
         super::tu102::non_wpr_heap_size_tu102()
     }
 
diff --git a/drivers/gpu/nova-core/fb/hal/ga102.rs b/drivers/gpu/nova-core/fb/hal/ga102.rs
index 82b4c6034c4a..8653d0d404d8 100644
--- a/drivers/gpu/nova-core/fb/hal/ga102.rs
+++ b/drivers/gpu/nova-core/fb/hal/ga102.rs
@@ -41,7 +41,7 @@ fn pmu_reserved_size(&self) -> u32 {
         super::tu102::pmu_reserved_size_tu102()
     }
 
-    fn non_wpr_heap_size(&self) -> u32 {
+    fn non_wpr_heap_size(&self) -> u64 {
         super::tu102::non_wpr_heap_size_tu102()
     }
 
diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
index a53932eaf483..93fe708895c5 100644
--- a/drivers/gpu/nova-core/fb/hal/gb100.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
@@ -109,9 +109,9 @@ fn pmu_reserved_size(&self) -> u32 {
         pmu_reserved_size_gb100()
     }
 
-    fn non_wpr_heap_size(&self) -> u32 {
+    fn non_wpr_heap_size(&self) -> u64 {
         // Non-WPR heap for GB10x (see Open RM: kgspGetNonWprHeapSize, GB100/GB102).
-        u32::SZ_2M
+        u64::SZ_2M
     }
 
     fn frts_size(&self) -> u64 {
diff --git a/drivers/gpu/nova-core/fb/hal/gb202.rs b/drivers/gpu/nova-core/fb/hal/gb202.rs
index 1233df4303f5..e6b259fd72a4 100644
--- a/drivers/gpu/nova-core/fb/hal/gb202.rs
+++ b/drivers/gpu/nova-core/fb/hal/gb202.rs
@@ -81,10 +81,10 @@ fn pmu_reserved_size(&self) -> u32 {
         super::gb100::pmu_reserved_size_gb100()
     }
 
-    fn non_wpr_heap_size(&self) -> u32 {
+    fn non_wpr_heap_size(&self) -> u64 {
         // Non-WPR heap for GB20x (see Open RM: kgspGetNonWprHeapSize, GB202+).
         // This size is r570-specific.
-        u32::SZ_2M + u32::SZ_128K
+        u64::SZ_2M + u64::SZ_128K
     }
 
     fn frts_size(&self) -> u64 {
diff --git a/drivers/gpu/nova-core/fb/hal/gh100.rs b/drivers/gpu/nova-core/fb/hal/gh100.rs
index 892c75ef26c6..bb56ed15bab7 100644
--- a/drivers/gpu/nova-core/fb/hal/gh100.rs
+++ b/drivers/gpu/nova-core/fb/hal/gh100.rs
@@ -36,9 +36,9 @@ fn pmu_reserved_size(&self) -> u32 {
         super::tu102::pmu_reserved_size_tu102()
     }
 
-    fn non_wpr_heap_size(&self) -> u32 {
+    fn non_wpr_heap_size(&self) -> u64 {
         // Non-WPR heap for Hopper (see Open RM: kgspCalculateFbLayout_GH100).
-        u32::SZ_2M
+        u64::SZ_2M
     }
 
     fn frts_size(&self) -> u64 {
diff --git a/drivers/gpu/nova-core/fb/hal/tu102.rs b/drivers/gpu/nova-core/fb/hal/tu102.rs
index 8bafbeec9807..d98974827373 100644
--- a/drivers/gpu/nova-core/fb/hal/tu102.rs
+++ b/drivers/gpu/nova-core/fb/hal/tu102.rs
@@ -44,8 +44,8 @@ pub(super) const fn pmu_reserved_size_tu102() -> u32 {
     0
 }
 
-pub(super) const fn non_wpr_heap_size_tu102() -> u32 {
-    u32::SZ_1M
+pub(super) const fn non_wpr_heap_size_tu102() -> u64 {
+    u64::SZ_1M
 }
 
 pub(super) const fn frts_size_tu102() -> u64 {
@@ -79,7 +79,7 @@ fn pmu_reserved_size(&self) -> u32 {
         pmu_reserved_size_tu102()
     }
 
-    fn non_wpr_heap_size(&self) -> u32 {
+    fn non_wpr_heap_size(&self) -> u64 {
         non_wpr_heap_size_tu102()
     }
 

-- 
2.54.0



Thread overview: 24+ messages
2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
2026-06-15 17:11   ` Gary Guo
2026-06-16  7:33   ` Alistair Popple
2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
2026-06-15 17:15   ` Gary Guo
2026-06-16  7:57     ` Alistair Popple
2026-06-16 10:57       ` Gary Guo
2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
2026-06-15 17:16   ` Gary Guo
2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
2026-06-15 17:17   ` Gary Guo
2026-06-16  8:02   ` Alistair Popple
2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
2026-06-15 17:23   ` Gary Guo
2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
2026-06-15 17:24   ` Gary Guo
2026-06-15 14:40 ` [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations Eliot Courtney
2026-06-15 14:40 ` [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough Eliot Courtney
2026-06-15 14:40 ` [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier Eliot Courtney
2026-06-15 14:40 ` [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions Eliot Courtney
2026-06-15 14:40 ` [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation Eliot Courtney
2026-06-15 14:40 ` [PATCH 12/13] gpu: nova-core: rename heap size field Eliot Courtney
2026-06-15 14:40 ` [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs Eliot Courtney
