[PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on

* [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on
@ 2026-06-04 11:43 Zhi Wang
  2026-06-04 11:43 ` [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper Zhi Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

Booting GSP with vGPU enabled is part of the first milestone (M1) for
upstream vGPU support, together with the Rust fwctl abstraction [1] and the
nova-core fwctl driver [2]. It lets us validate the basic GSP boot flow
with vGPU enabled and upload vGPU types even before the remaining nova-core
dependencies are ready. The nova-vgpu WIP patches for all milestones can
be found at [3].

v3:

- Consolidate GSP boot parameters into GspBootContext struct, passed
  through the HAL trait methods instead of individual arguments.
- Add vGPU preludes patch (VgpuManager, Architecture::supports_vgpu,
  GspBootContext fields) as preparation for the vGPU boot changes.
- Rebase on top of the latest drm-rust-next.

Dropped patches:

Dependencies still in progress:

- "populate GSP_VF_INFO when vGPU is enabled" - depends on the
  ExtSriovCapability abstraction [4], which is still under review, and on
  Gary's ongoing I/O projection work [5].

Ada specific:

- "introduce vgpu_support module param" - select vGPU support based on
  module params.

- "load the scrubber ucode when vGPU support is enabled"

v2:

- Adopt early-return style (Dirk).
- Add an #ifndef CONFIG_PCI_IOV helper to fix compilation when
  CONFIG_PCI_IOV is disabled (Alex).
- Change return type from Result<i32> to Result<u16> to match the
  PCI spec field width, avoiding try_from at call sites.
- GspVfInfo changed to tuple struct (Alex).
- Use unconditional constructor with Option wrapping instead of
  bool parameter (Alex).
- Use full initialization expression instead of mutating a zeroed
  value.
- Use .chain() pattern in GspSetSystemInfo::init() for optional
  vGPU info (Alex).
- Eliminate all magic numbers: add vf_bar_is_64bit() and
  read_vf_bar64_addr() to ExtSriovCapability using PCI bindings
  constants (PCI_BASE_ADDRESS_MEM_TYPE_MASK, etc.).
- Use KVec<RegistryEntry> for dynamic registry entry construction
  instead of hardcoded array (Timur, Joel, Alexandre).
- Replace magic numbers 32/48 with named binding constants
  MAX_PARTITIONS_WITH_GFID_32VM / MAX_PARTITIONS_WITH_GFID from
  OpenRM (Alex).
- Use read_poll_timeout() instead of single read for scrubber
  completion check (Joel).
- Use dev instead of pdev.as_ref() in dev_dbg! (Dirk).
- Change scrubber trigger condition from vgpu_requested to
  fb_layout.wpr2_heap.len() > SZ_256M, checking the actual heap size
  instead of the vGPU flag (Alex).

[1] https://lore.kernel.org/rust-for-linux/20260217204909.211793-1-zhiw@nvidia.com/
[2] https://lore.kernel.org/rust-for-linux/20260305190936.398590-1-zhiw@nvidia.com/
[3] https://github.com/zhiwang-nvidia/nova-core/tree/zhi/nova-vgpu-wip
[4] https://lore.kernel.org/rust-for-linux/20260409185254.3869808-1-zhiw@nvidia.com/
[5] https://lore.kernel.org/rust-for-linux/20260421-io_projection-v2-0-4c251c692ef4@garyguo.net/

Zhi Wang (9):
  rust: pci: expose sriov_get_totalvfs() helper
  gpu: nova-core: factor out common FSP message header
  gpu: nova-core: return FSP response buffer to caller
  gpu: nova-core: read vGPU mode from FSP via PRC protocol
  gpu: nova-core: add FSP and PRC protocol documentation
  gpu: nova-core: consolidate GSP boot parameters into GspBootContext
  gpu: nova-core: add vGPU preludes
  gpu: nova-core: set RMSetSriovMode when NVIDIA vGPU is enabled
  gpu: nova-core: reserve a larger GSP WPR2 heap when vGPU is enabled

 Documentation/gpu/nova/core/fsp.rst    | 142 +++++++++++++++++
 Documentation/gpu/nova/index.rst       |   1 +
 drivers/gpu/nova-core/fb.rs            |  17 +-
 drivers/gpu/nova-core/fsp.rs           | 207 ++++++++++++++++++++++---
 drivers/gpu/nova-core/gpu.rs           |  38 ++++-
 drivers/gpu/nova-core/gsp.rs           |  27 ++++
 drivers/gpu/nova-core/gsp/boot.rs      |  60 ++++---
 drivers/gpu/nova-core/gsp/commands.rs  |  93 +++++++----
 drivers/gpu/nova-core/gsp/fw.rs        |  16 ++
 drivers/gpu/nova-core/gsp/hal.rs       |  23 +--
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  22 ++-
 drivers/gpu/nova-core/gsp/hal/tu102.rs |  31 ++--
 drivers/gpu/nova-core/mctp.rs          |   3 +
 drivers/gpu/nova-core/nova_core.rs     |   1 +
 drivers/gpu/nova-core/vgpu.rs          |  47 ++++++
 rust/kernel/pci.rs                     |  12 ++
 16 files changed, 608 insertions(+), 132 deletions(-)
 create mode 100644 Documentation/gpu/nova/core/fsp.rst
 create mode 100644 drivers/gpu/nova-core/vgpu.rs

-- 
2.51.0


* [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-05 14:08   ` Alexandre Courbot
  2026-06-04 11:43 ` [PATCH 2/9] gpu: nova-core: factor out common FSP message header Zhi Wang
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang, Bjorn Helgaas, linux-pci

Add a wrapper for the `pci_sriov_get_totalvfs()` helper, allowing drivers
to query the total number of SR-IOV virtual functions (VFs) a PCI device
supports.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 rust/kernel/pci.rs | 12 ++++++++++++
 1 file changed, 12 insertions(+)
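
The zero-to-error mapping in this wrapper can be modelled outside the kernel.
The sketch below is only an illustration, not the code in this patch: the
`Errno` enum and `total_vfs_from_raw()` are hypothetical stand-ins, and the
narrowing to `u16` follows the 16-bit TotalVFs field width mentioned in the
cover letter changelog rather than the `Result<i32>` signature in the diff.

```rust
use std::convert::TryFrom;

/// Stand-in for kernel error codes; only what this sketch needs (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Errno {
    /// The C helper reported zero VFs.
    Enodev,
    /// The raw value does not fit the 16-bit TotalVFs field.
    Erange,
}

/// Hypothetical helper: map the raw C `int` result to a checked VF count.
fn total_vfs_from_raw(raw: i32) -> Result<u16, Errno> {
    // Early return for the "no VFs" case, mirroring the wrapper's style.
    if raw == 0 {
        return Err(Errno::Enodev);
    }

    // Negative or oversized values are rejected instead of being truncated.
    u16::try_from(raw).map_err(|_| Errno::Erange)
}
```

With this shape a caller could use the count directly as a `u16` without a
conversion at each call site.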

diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index 5071cae6543f..d04e5f6841f2 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -450,6 +450,18 @@ pub fn pci_class(&self) -> Class {
         // SAFETY: `self.as_raw` is a valid pointer to a `struct pci_dev`.
         Class::from_raw(unsafe { (*self.as_raw()).class })
     }
+
+    /// Returns the total number of VFs, or `Err(ENODEV)` if none are available.
+    pub fn sriov_get_totalvfs(&self) -> Result<i32> {
+        // SAFETY: `self.as_raw()` is a valid pointer to a `struct pci_dev`.
+        let vfs = unsafe { bindings::pci_sriov_get_totalvfs(self.as_raw()) };
+
+        if vfs == 0 {
+            return Err(ENODEV);
+        }
+
+        Ok(vfs)
+    }
 }
 
 impl<'a> Device<device::Core<'a>> {
-- 
2.51.0


* [PATCH 2/9] gpu: nova-core: factor out common FSP message header
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
  2026-06-04 11:43 ` [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-05 13:21   ` Alexandre Courbot
  2026-06-04 11:43 ` [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller Zhi Wang
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

Extract common MCTP + NVDM headers into FspMessageHeader, rename
FspMessage to FspCotMessage, and update FspResponse to use the shared
header. This prepares for adding new FSP message types.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/fsp.rs | 56 +++++++++++++++++++++++++-----------
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 8fc243c66e35..78b90bfbfba4 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -57,12 +57,35 @@ struct NvdmPayloadCommandResponse {
     error_code: u32,
 }
 
-/// Complete FSP response structure with MCTP and NVDM headers.
+/// Common MCTP and NVDM headers shared by all FSP messages.
 #[repr(C, packed)]
 #[derive(Clone, Copy)]
-struct FspResponse {
+struct FspMessageHeader {
     mctp_header: MctpHeader,
     nvdm_header: NvdmHeader,
+}
+
+// SAFETY: FspMessageHeader is a packed C struct with only integral fields.
+unsafe impl AsBytes for FspMessageHeader {}
+
+// SAFETY: FspMessageHeader is a packed C struct with only integral fields.
+unsafe impl FromBytes for FspMessageHeader {}
+
+impl FspMessageHeader {
+    /// Construct a standard FSP message header for the given NVDM type.
+    fn new(nvdm_type: NvdmType) -> Self {
+        Self {
+            mctp_header: MctpHeader::single_packet(),
+            nvdm_header: NvdmHeader::new(nvdm_type),
+        }
+    }
+}
+
+/// Complete FSP response structure with MCTP and NVDM headers.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct FspResponse {
+    header: FspMessageHeader,
     response: NvdmPayloadCommandResponse,
 }
 
@@ -94,17 +117,16 @@ struct NvdmPayloadCot {
     gsp_boot_args_sysmem_offset: u64,
 }
 
-/// Complete FSP message structure with MCTP and NVDM headers.
+/// Complete FSP COT (Chain of Trust) message structure.
 #[repr(C)]
 #[derive(Clone, Copy)]
-struct FspMessage {
-    mctp_header: MctpHeader,
-    nvdm_header: NvdmHeader,
+struct FspCotMessage {
+    header: FspMessageHeader,
     cot: NvdmPayloadCot,
 }
 
-impl FspMessage {
-    /// Returns an in-place initializer for [`FspMessage`].
+impl FspCotMessage {
+    /// Returns an in-place initializer for [`FspCotMessage`].
     fn new<'a>(
         fb_layout: &FbLayout,
         fsp_fw: &'a FspFirmware,
@@ -131,8 +153,7 @@ fn new<'a>(
         let size = num::usize_into_u16::<{ core::mem::size_of::<NvdmPayloadCot>() }>();
 
         Ok(init!(Self {
-            mctp_header: MctpHeader::single_packet(),
-            nvdm_header: NvdmHeader::new(NvdmType::Cot),
+            header: FspMessageHeader::new(NvdmType::Cot),
             // The payload is packed, so we cannot use `init!`. Initialize it member-by-member using
             // `chain`.
             cot <- pin_init::init_zeroed(),
@@ -153,11 +174,11 @@ fn new<'a>(
     }
 }
 
-// SAFETY: `FspMessage` is `#[repr(C)]` with no padding, so all of its
+// SAFETY: `FspCotMessage` is `#[repr(C)]` with no padding, so all of its
 // bytes are initialized.
-unsafe impl AsBytes for FspMessage {}
+unsafe impl AsBytes for FspCotMessage {}
 
-impl MessageToFsp for FspMessage {
+impl MessageToFsp for FspCotMessage {
     const NVDM_TYPE: NvdmType = NvdmType::Cot;
 }
 
@@ -251,8 +272,8 @@ fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: Bar0<'_>, msg: &M) ->
             EIO
         })?;
 
-        let mctp_header = response.mctp_header;
-        let nvdm_header = response.nvdm_header;
+        let mctp_header = response.header.mctp_header;
+        let nvdm_header = response.header.nvdm_header;
         let command_nvdm_type = response.response.command_nvdm_type;
         let error_code = response.response.error_code;
 
@@ -310,7 +331,10 @@ pub(crate) fn boot_fmc(
     ) -> Result {
         dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
 
-        let msg = KBox::init(FspMessage::new(fb_layout, &self.fsp_fw, args)?, GFP_KERNEL)?;
+        let msg = KBox::init(
+            FspCotMessage::new(fb_layout, &self.fsp_fw, args)?,
+            GFP_KERNEL,
+        )?;
 
         self.send_sync_fsp(dev, bar, &*msg)?;
 
-- 
2.51.0


* [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
  2026-06-04 11:43 ` [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper Zhi Wang
  2026-06-04 11:43 ` [PATCH 2/9] gpu: nova-core: factor out common FSP message header Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-05 13:25   ` Alexandre Courbot
  2026-06-04 11:43 ` [PATCH 4/9] gpu: nova-core: read vGPU mode from FSP via PRC protocol Zhi Wang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

Change send_sync_fsp() to return the raw response buffer after
validating the common MCTP/NVDM headers and error code. This allows
callers to perform protocol-specific parsing on the response payload,
which is needed for the upcoming PRC protocol support.

For the existing COT caller, the response buffer is unused.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/fsp.rs | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
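
The shape of this change, validating the common part once and handing the
whole buffer back, can be sketched in standalone Rust. This is a rough model
under stated assumptions, not the body of send_sync_fsp(): `Vec<u8>` stands in
for `KVec<u8>`, `FspError`, `word()` and `check_response()` are hypothetical
names, the header bit positions are taken from the fsp.rst text in patch 5,
and little-endian word order is assumed.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum FspError {
    TooShort,
    BadHeader,
}

const MCTP_SOM: u32 = 1 << 31; // Start of Message
const MCTP_EOM: u32 = 1 << 30; // End of Message
const MSG_TYPE_VENDOR_PCI: u32 = 0x7e;
const VENDOR_ID_NVIDIA: u32 = 0x10de;
const NVDM_TYPE_FSP_RESPONSE: u32 = 0x15;

/// Reads the `idx`-th 32-bit word (little-endian is an assumption here).
fn word(buf: &[u8], idx: usize) -> Option<u32> {
    let b = buf.get(idx * 4..idx * 4 + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Validates the two common header words, then returns the buffer unchanged
/// so the caller can do protocol-specific parsing on the payload.
fn check_response(buf: Vec<u8>) -> Result<Vec<u8>, FspError> {
    let mctp = word(&buf, 0).ok_or(FspError::TooShort)?;
    let nvdm = word(&buf, 1).ok_or(FspError::TooShort)?;

    let single_packet = (mctp & (MCTP_SOM | MCTP_EOM)) == (MCTP_SOM | MCTP_EOM);
    let header_ok = (nvdm & 0x7f) == MSG_TYPE_VENDOR_PCI
        && ((nvdm >> 8) & 0xffff) == VENDOR_ID_NVIDIA
        && (nvdm >> 24) == NVDM_TYPE_FSP_RESPONSE;

    if !(single_packet && header_ok) {
        return Err(FspError::BadHeader);
    }

    Ok(buf)
}
```

A caller that does not need the payload, like the COT path, can simply drop
the returned buffer.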

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 78b90bfbfba4..5fd2e9e277b1 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -257,7 +257,8 @@ pub(crate) fn wait_secure_boot(
     }
 
     /// Sends a message to FSP and waits for the response.
-    fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: Bar0<'_>, msg: &M) -> Result
+    /// Returns the full response buffer on success.
+    fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: Bar0<'_>, msg: &M) -> Result<KVec<u8>>
     where
         M: MessageToFsp,
     {
@@ -315,7 +316,7 @@ fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: Bar0<'_>, msg: &M) ->
             return Err(EIO);
         }
 
-        Ok(())
+        Ok(response_buf)
     }
 
     /// Boots GSP FMC via FSP Chain of Trust.
@@ -336,7 +337,7 @@ pub(crate) fn boot_fmc(
             GFP_KERNEL,
         )?;
 
-        self.send_sync_fsp(dev, bar, &*msg)?;
+        let _response_buf = self.send_sync_fsp(dev, bar, &*msg)?;
 
         dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
         Ok(())
-- 
2.51.0


* [PATCH 4/9] gpu: nova-core: read vGPU mode from FSP via PRC protocol
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
                   ` (2 preceding siblings ...)
  2026-06-04 11:43 ` [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-04 11:43 ` [PATCH 5/9] gpu: nova-core: add FSP and PRC protocol documentation Zhi Wang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

Add support for querying the vGPU mode configuration from FSP using
the PRC (Product Reconfiguration Control) protocol. PRC is an API
system exposed through FSP's Management Partition that allows querying
device configuration "knobs" without firmware updates.

Add a VgpuMode enum that validates the raw PRC response value,
returning an error for unexpected values.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/fsp.rs  | 145 ++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/mctp.rs |   3 +
 2 files changed, 148 insertions(+)
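
For reviewers who want to poke at the value handling outside the kernel, the
request bytes and the response decoding can be modelled as below. This is a
minimal sketch, not the driver code: `DecodeError`, `vgpu_mode_read_request()`
and `decode_vgpu_mode()` are hypothetical names, and the input slice is assumed
to be only the 4-byte PRC response payload (value_low, value_high, two reserved
bytes) with the common headers already stripped.

```rust
use std::convert::TryFrom;

/// vGPU operating mode, mirroring the enum added by this patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VgpuMode {
    Disabled = 0,
    Enabled = 1,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    TooShort,
    UnexpectedValue(u16),
}

impl TryFrom<u16> for VgpuMode {
    type Error = DecodeError;

    fn try_from(value: u16) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(VgpuMode::Disabled),
            1 => Ok(VgpuMode::Enabled),
            other => Err(DecodeError::UnexpectedValue(other)),
        }
    }
}

/// PRC request payload: read sub-command, "active" flag, vGPU mode knob, reserved.
fn vgpu_mode_read_request() -> [u8; 4] {
    const SUBCMD_READ: u8 = 0x0c;
    const FLAG_ACTIVE: u8 = 1 << 1;
    const OBJECT_VGPU_MODE: u8 = 0x29;
    [SUBCMD_READ, FLAG_ACTIVE, OBJECT_VGPU_MODE, 0]
}

/// Decodes the 16-bit knob value from the PRC response payload.
fn decode_vgpu_mode(payload: &[u8]) -> Result<VgpuMode, DecodeError> {
    if payload.len() < 4 {
        return Err(DecodeError::TooShort);
    }

    // The value is split into a low and a high byte.
    let raw = u16::from(payload[0]) | (u16::from(payload[1]) << 8);
    VgpuMode::try_from(raw)
}
```

Carrying the raw value in the error keeps enough context for a diagnostic
message, similar to the dev_err!() in read_vgpu_mode().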

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index 5fd2e9e277b1..ce11efeba37e 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -48,6 +48,44 @@
 
 mod hal;
 
+/// PRC (Product Reconfiguration Control) protocol constants.
+///
+/// PRC is an API system exposed through FSP's Management Partition that allows
+/// querying and modifying device configuration "knobs" without firmware updates.
+/// Each knob is identified by a unique object ID and controls a specific device
+/// behavior (e.g., vGPU mode, ECC, confidential computing).
+mod prc {
+    /// Sub-command to read a PRC knob value.
+    pub(super) const SUBCMD_READ: u8 = 0x0c;
+
+    /// PRC object ID for vGPU mode configuration (knob ID 41).
+    pub(super) const OBJECT_VGPU_MODE: u8 = 0x29;
+
+    /// Request the active knob value (currently effective this boot).
+    pub(super) const FLAG_ACTIVE: u8 = 1 << 1;
+}
+
+/// vGPU operating mode as reported by FSP via the PRC protocol.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum VgpuMode {
+    /// vGPU support is disabled on this GPU.
+    Disabled = 0,
+    /// vGPU support is enabled on this GPU.
+    Enabled = 1,
+}
+
+impl TryFrom<u16> for VgpuMode {
+    type Error = kernel::error::Error;
+
+    fn try_from(value: u16) -> Result<Self> {
+        match value {
+            0 => Ok(VgpuMode::Disabled),
+            1 => Ok(VgpuMode::Enabled),
+            _ => Err(EINVAL),
+        }
+    }
+}
+
 /// FSP command response payload (`NVDM_PAYLOAD_COMMAND_RESPONSE`).
 #[repr(C, packed)]
 #[derive(Clone, Copy)]
@@ -57,6 +95,39 @@ struct NvdmPayloadCommandResponse {
     error_code: u32,
 }
 
+// SAFETY: NvdmPayloadCommandResponse is a packed C struct with only integral fields.
+unsafe impl FromBytes for NvdmPayloadCommandResponse {}
+
+/// PRC message payload.
+///
+/// Sent to FSP to query or modify a device configuration knob.
+/// The response includes the common FSP response header followed by
+/// a [`NvdmPayloadPrcResponse`] with the knob's current state value.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct NvdmPayloadPrc {
+    sub_message_id: u8,
+    flags: u8,
+    object_id: u8,
+    reserved: u8,
+}
+
+// SAFETY: NvdmPayloadPrc is a packed C struct with only integral fields.
+unsafe impl AsBytes for NvdmPayloadPrc {}
+
+/// PRC response payload containing the knob state value.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct NvdmPayloadPrcResponse {
+    value_low: u8,
+    value_high: u8,
+    reserved1: u8,
+    reserved2: u8,
+}
+
+// SAFETY: NvdmPayloadPrcResponse is a packed C struct with only integral fields.
+unsafe impl FromBytes for NvdmPayloadPrcResponse {}
+
 /// Common MCTP and NVDM headers shared by all FSP messages.
 #[repr(C, packed)]
 #[derive(Clone, Copy)]
@@ -92,6 +163,18 @@ struct FspResponse {
 // SAFETY: FspResponse is a packed C struct with only integral fields.
 unsafe impl FromBytes for FspResponse {}
 
+/// Complete FSP PRC response including the knob state payload.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct FspPrcResponse {
+    header: FspMessageHeader,
+    response: NvdmPayloadCommandResponse,
+    prc_data: NvdmPayloadPrcResponse,
+}
+
+// SAFETY: FspPrcResponse is a packed C struct with only integral fields.
+unsafe impl FromBytes for FspPrcResponse {}
+
 /// Trait implemented by types representing a message to send to FSP.
 ///
 /// This provides [`Fsp::send_sync_fsp`] with the information it needs to send
@@ -178,10 +261,25 @@ fn new<'a>(
 // bytes are initialized.
 unsafe impl AsBytes for FspCotMessage {}
 
+/// Complete FSP PRC message.
+#[repr(C, packed)]
+#[derive(Clone, Copy)]
+struct FspPrcMessage {
+    header: FspMessageHeader,
+    prc: NvdmPayloadPrc,
+}
+
+// SAFETY: FspPrcMessage is a packed C struct with only integral fields.
+unsafe impl AsBytes for FspPrcMessage {}
+
 impl MessageToFsp for FspCotMessage {
     const NVDM_TYPE: NvdmType = NvdmType::Cot;
 }
 
+impl MessageToFsp for FspPrcMessage {
+    const NVDM_TYPE: NvdmType = NvdmType::Prc;
+}
+
 /// Bundled arguments for FMC boot via FSP Chain of Trust.
 pub(crate) struct FmcBootArgs {
     chipset: Chipset,
@@ -226,6 +324,53 @@ pub(crate) struct Fsp {
 }
 
 impl Fsp {
+    /// Read vGPU mode from FSP using the PRC protocol.
+    ///
+    /// Queries FSP's Management Partition for the active vGPU mode knob value.
+    /// Returns [`VgpuMode::Enabled`] if vGPU support is active on this GPU,
+    /// [`VgpuMode::Disabled`] otherwise.
+    #[expect(dead_code)]
+    pub(crate) fn read_vgpu_mode(
+        &mut self,
+        dev: &device::Device<device::Bound>,
+        bar: Bar0<'_>,
+    ) -> Result<VgpuMode> {
+        let msg = KBox::new(
+            FspPrcMessage {
+                header: FspMessageHeader::new(NvdmType::Prc),
+                prc: NvdmPayloadPrc {
+                    sub_message_id: prc::SUBCMD_READ,
+                    flags: prc::FLAG_ACTIVE,
+                    object_id: prc::OBJECT_VGPU_MODE,
+                    reserved: 0,
+                },
+            },
+            GFP_KERNEL,
+        )?;
+
+        let response_buf = self.send_sync_fsp(dev, bar, &*msg)?;
+
+        let prc_resp_size = core::mem::size_of::<FspPrcResponse>();
+        if response_buf.len() < prc_resp_size {
+            dev_err!(
+                dev,
+                "PRC response too small: {} bytes (expected {})\n",
+                response_buf.len(),
+                prc_resp_size
+            );
+            return Err(EIO);
+        }
+
+        let (prc_response, _) = FspPrcResponse::from_bytes_prefix(&response_buf[..]).ok_or(EIO)?;
+
+        let raw_value = u16::from(prc_response.prc_data.value_low)
+            | (u16::from(prc_response.prc_data.value_high) << 8);
+
+        VgpuMode::try_from(raw_value).inspect_err(|_| {
+            dev_err!(dev, "unexpected vGPU mode value: {:#x}\n", raw_value);
+        })
+    }
+
     /// Waits for FSP secure boot completion, then returns the [`Fsp`] interface.
     ///
     /// Polls the thermal scratch register until FSP signals boot completion or the timeout
diff --git a/drivers/gpu/nova-core/mctp.rs b/drivers/gpu/nova-core/mctp.rs
index 482786e07bc7..b203c632bf20 100644
--- a/drivers/gpu/nova-core/mctp.rs
+++ b/drivers/gpu/nova-core/mctp.rs
@@ -13,6 +13,8 @@
 #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
 #[repr(u8)]
 pub(crate) enum NvdmType {
+    /// PRC (Product Reconfiguration Control) message.
+    Prc = 0x13,
     #[default]
     /// Chain of Trust boot message.
     Cot = 0x14,
@@ -25,6 +27,7 @@ impl TryFrom<u8> for NvdmType {
 
     fn try_from(value: u8) -> Result<Self, Self::Error> {
         match value {
+            x if x == u8::from(Self::Prc) => Ok(Self::Prc),
             x if x == u8::from(Self::Cot) => Ok(Self::Cot),
             x if x == u8::from(Self::FspResponse) => Ok(Self::FspResponse),
             _ => Err(value),
-- 
2.51.0


* [PATCH 5/9] gpu: nova-core: add FSP and PRC protocol documentation
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
                   ` (3 preceding siblings ...)
  2026-06-04 11:43 ` [PATCH 4/9] gpu: nova-core: read vGPU mode from FSP via PRC protocol Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-04 11:43 ` [PATCH 6/9] gpu: nova-core: consolidate GSP boot parameters into GspBootContext Zhi Wang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

Add documentation for the Foundation Security Processor (FSP) interface
covering the simplified Hopper/Blackwell boot flow, the Chain of Trust
(COT) message protocol, the MCTP/NVDM message format, and the Product
Reconfiguration Control (PRC) protocol used to query device configuration
knobs such as vGPU mode.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 Documentation/gpu/nova/core/fsp.rst | 142 ++++++++++++++++++++++++++++
 Documentation/gpu/nova/index.rst    |   1 +
 2 files changed, 143 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/fsp.rst
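
The header bit layout described in the "FSP message format" section of
fsp.rst can be checked with a few lines of standalone Rust. This is only a
sketch of the documented layout, not the MctpHeader/NvdmHeader code in
nova-core; the function names are hypothetical, and a sequence number and
source endpoint ID of zero for a single-packet message are assumptions.

```rust
const MCTP_SOM: u32 = 1 << 31; // Bit 31: Start of Message
const MCTP_EOM: u32 = 1 << 30; // Bit 30: End of Message

const MSG_TYPE_VENDOR_PCI: u32 = 0x7e; // Bits 6:0 of the NVDM header
const VENDOR_ID_NVIDIA: u32 = 0x10de; // Bits 23:8 of the NVDM header

/// Builds an MCTP header word for a single-packet message (SOM and EOM set).
fn mctp_header(seq: u32, source_eid: u32) -> u32 {
    MCTP_SOM | MCTP_EOM | ((seq & 0x3) << 28) | ((source_eid & 0xff) << 16)
}

/// Builds an NVDM header word for the given NVDM type (bits 31:24).
fn nvdm_header(nvdm_type: u8) -> u32 {
    (u32::from(nvdm_type) << 24) | (VENDOR_ID_NVIDIA << 8) | MSG_TYPE_VENDOR_PCI
}

/// Extracts the NVDM type from a header word.
fn nvdm_type(header: u32) -> u8 {
    (header >> 24) as u8
}
```

Under these assumptions a COT message would start with the words 0xc0000000
and 0x1410de7e, and a PRC message with 0xc0000000 and 0x1310de7e.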

diff --git a/Documentation/gpu/nova/core/fsp.rst b/Documentation/gpu/nova/core/fsp.rst
new file mode 100644
index 000000000000..52d618d22bb8
--- /dev/null
+++ b/Documentation/gpu/nova/core/fsp.rst
@@ -0,0 +1,142 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================
+FSP (Foundation Security Processor) and Secure Boot
+===================================================
+This document describes the role of the FSP in the GPU boot sequence on
+Hopper and Blackwell GPUs, and how it differs from the earlier Ampere boot
+flow. It also provides a brief overview of the PRC (Product Reconfiguration
+Control) protocol used to query device configuration through FSP. As with
+other documents in this directory, the information is subject to change and
+is intended to help developers understand the corresponding kernel code.
+
+What is FSP?
+============
+The Foundation Security Processor (FSP) is the GPU's Internal Root of Trust
+(IROT). It is a dedicated security processor that boots from immutable ROM
+(Boot ROM) inside the GPU and is responsible for establishing the Chain of
+Trust before any other firmware is allowed to run.
+
+FSP runs independently of the host CPU and starts executing as soon as the
+GPU is powered on. By the time the nova-core driver is loaded, FSP has
+already completed its own secure boot and is ready to accept commands from
+the driver.
+
+Simplified boot flow (Hopper/Blackwell)
+=======================================
+Starting with Hopper, the boot flow is significantly simplified compared to
+earlier GPU generations like Ampere.
+
+On an **Ampere** GPU, the boot verification chain involves multiple Falcon
+engines and multiple ucode stages (see falcon.rst for details)::
+
+     Hardware BROM (SEC2)
+          -> HS Booter (SEC2)
+               -> LS GSP-RM (GSP)
+
+The driver must extract ucode from VBIOS, manage SEC2 and GSP, and
+orchestrate the Booter to load GSP-RM. This involves FWSEC-FRTS, devinit,
+and the Booter stages.
+
+On **Hopper/Blackwell** GPUs, FSP replaces this multi-stage process with a
+single message-driven interface::
+
+     FSP (hardware root of trust, boots from ROM)
+          -> FMC (Falcon Microcontroller, verified by FSP)
+               -> GSP-RM (verified and loaded by FMC)
+
+The driver only needs to:
+
+1. Wait for FSP to complete its own secure boot (polling a scratch register).
+2. Send a Chain of Trust (COT) message to FSP with the FMC firmware location,
+   cryptographic signatures, and GSP boot parameters.
+3. Wait for FSP's response: FSP authenticates and boots FMC, which loads GSP-RM.
+
+There is no SEC2 involvement, no Booter ucode, and no FWSEC-FRTS stage. The
+entire secure boot is driven by a single FSP message exchange.
+
+Chain of Trust (COT) protocol
+=============================
+The Chain of Trust establishes a cryptographically enforced boot sequence,
+ensuring the GPU reaches a known, trusted state.
+
+The driver communicates with FSP using a message queue (Falcon MSGQ
+interface). Each message consists of an MCTP (Management Component Transport
+Protocol) transport header and an NVDM (NVIDIA Vendor Defined Message) header,
+followed by a protocol-specific payload.
+
+For Chain of Trust, the payload includes:
+
+- The system memory address of the FMC firmware image.
+- Cryptographic material: a SHA-384 hash, RSA-3K public key, and RSA-3K
+  signature extracted from the FMC ELF firmware.
+- FRTS (Firmware Runtime Services) region information (vidmem offset and size).
+- The system memory address of the GSP boot arguments structure.
+
+FSP verifies the signature against the provided public key and hash, and if
+verification succeeds, boots the FMC. The FMC then authenticates and launches
+GSP-RM.
+
+The message flow is::
+
+     nova-core                          FSP
+        |                                |
+        |  1. Poll scratch register      |
+        |  (wait for FSP boot complete)  |
+        |                                |
+        |  2. COT message  ------------> |
+        |     (FMC addr, signatures,     |
+        |      boot params)              |
+        |                                |
+        |                                |--- Verify FMC signature
+        |                                |--- Boot FMC
+        |                                |--- FMC loads GSP-RM
+        |                                |
+        |  3. COT response <------------ |
+        |     (success/error)            |
+        |                                |
+
+FSP message format
+==================
+All FSP messages share a common header format consisting of two 32-bit words:
+
+**MCTP header** (Management Component Transport Protocol):
+
+- Bit 31: SOM (Start of Message)
+- Bit 30: EOM (End of Message)
+- Bits 29:28: Packet sequence number
+- Bits 23:16: Source Endpoint ID
+
+**NVDM header** (NVIDIA Vendor Defined Message):
+
+- Bits 6:0: MCTP message type (0x7e = vendor-defined PCI)
+- Bits 23:8: PCI vendor ID (0x10de = NVIDIA)
+- Bits 31:24: NVDM type (0x14 = COT, 0x13 = PRC, 0x15 = FSP response)
+
+PRC (Product Reconfiguration Control) protocol
+===============================================
+PRC is an API system exposed through FSP's Management Partition that allows
+querying and modifying device configuration without firmware updates.
+
+Configuration parameters are called "knobs". Each knob has a unique object
+ID and controls a specific device behavior. Examples include vGPU mode, ECC
+enable, confidential computing mode, and NVLINK configuration.
+
+Each knob has two values:
+
+- **Active**: the currently effective value for this boot cycle.
+- **Persistent**: the value stored in InfoROM, applied on subsequent boots.
+
+The nova-core driver uses PRC to read the vGPU mode knob (object ID 0x29)
+during early boot, before firmware loading, to determine whether the GPU
+should operate in vGPU mode.
+
+The PRC message format follows the same MCTP/NVDM header structure as COT,
+with NVDM type 0x13. The payload contains:
+
+- A sub-command (e.g., 0x0c for read).
+- Flags indicating which value to read (bit 0 = persistent, bit 1 = active).
+- The knob object ID.
+
+The response includes the common FSP response header (with error status)
+followed by the knob's 16-bit state value.
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index e39cb3163581..1783513cbd05 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -30,5 +30,6 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/todo
    core/vbios
    core/devinit
+   core/fsp
    core/fwsec
    core/falcon
-- 
2.51.0


* [PATCH 6/9] gpu: nova-core: consolidate GSP boot parameters into GspBootContext
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
                   ` (4 preceding siblings ...)
  2026-06-04 11:43 ` [PATCH 5/9] gpu: nova-core: add FSP and PRC protocol documentation Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-04 11:43 ` [PATCH 7/9] gpu: nova-core: add vGPU preludes Zhi Wang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

The GspHal trait methods boot() and post_boot() accept a long list of
individual parameters (dev, bar, chipset, gsp_falcon, sec2_falcon) that
are threaded through the entire GSP boot call chain. This makes the
signatures unwieldy and difficult to extend as new boot-time context
(e.g. vGPU state) is introduced.

Introduce a GspBootContext struct that bundles the common boot
parameters into a single object, and refactor the GspHal trait to accept
&GspBootContext instead of individual arguments. The struct also exposes
a dev() helper with proper lifetime annotation so that HAL
implementations can extract the device reference without reborrowing
constraints.

Update both TU102 and GH100 HAL implementations to extract their
required parameters from the context struct, and simplify the call sites
in Gsp::boot() accordingly.
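
The lifetime point behind dev() can be illustrated outside the kernel
tree. The following is only a minimal sketch in plain Rust: Device,
PciDevice, BootContext and device_of() are hypothetical stand-ins, not
the nova-core types. It shows that a helper returning `&'a` (rather than
a reference tied to `&self`) lets the result outlive the context value
itself.

```rust
/// Hypothetical stand-in for a generic device object.
struct Device {
    name: &'static str,
}

/// Hypothetical stand-in for a PCI device that embeds a generic device.
struct PciDevice {
    dev: Device,
}

/// Bundles boot parameters behind a single lifetime `'a`.
#[allow(dead_code)]
struct BootContext<'a> {
    pdev: &'a PciDevice,
    chipset: u32,
}

impl<'a> BootContext<'a> {
    /// Returns the device with the full `'a` lifetime, not the (possibly
    /// shorter) lifetime of the `&self` borrow.
    fn dev(&self) -> &'a Device {
        let pdev: &'a PciDevice = self.pdev;
        &pdev.dev
    }
}

/// The context is a local that is dropped on return, yet the reference
/// obtained through `dev()` stays valid because it is tied to `'a`.
fn device_of<'a>(pdev: &'a PciDevice) -> &'a Device {
    let ctx = BootContext { pdev, chipset: 0 };
    ctx.dev()
}
```

Had dev() been declared as `fn dev(&self) -> &Device`, device_of() would
be rejected, since the returned reference would borrow from the local
ctx.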

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs           | 14 ++++++-
 drivers/gpu/nova-core/gsp.rs           | 22 +++++++++++
 drivers/gpu/nova-core/gsp/boot.rs      | 55 ++++++++++++--------------
 drivers/gpu/nova-core/gsp/hal.rs       | 23 +++--------
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 14 ++++---
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 31 ++++++---------
 6 files changed, 85 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index b3c91731db45..69569e218d9b 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -23,7 +23,8 @@
     fb::SysmemFlush,
     gsp::{
         self,
-        Gsp, //
+        Gsp,
+        GspBootContext, //
     },
     regs,
 };
@@ -323,7 +324,16 @@ pub(crate) fn new(
             // This member must be initialized last, so the `UnloadBundle` can never be dropped from
             // outside of the constructed `Gpu`, ensuring that the unload sequence is properly run
             // in case of failure.
-            unload_bundle: gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?,
+            unload_bundle: {
+                let ctx = GspBootContext {
+                    pdev,
+                    bar,
+                    chipset: spec.chipset,
+                    gsp_falcon,
+                    sec2_falcon,
+                };
+                gsp.boot(&ctx)?
+            },
             bar,
         })
     }
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 69175ca3315c..f0901bf86258 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -31,6 +31,13 @@
 };
 
 use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspFalcon,
+        sec2::Sec2 as Sec2Falcon,
+        Falcon, //
+    },
+    gpu::Chipset,
     gsp::cmdq::Cmdq,
     gsp::fw::{
         GspArgumentsPadded,
@@ -42,6 +49,21 @@
 pub(crate) const GSP_PAGE_SHIFT: usize = 12;
 pub(crate) const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;
 
+/// Common context for the GSP boot process.
+pub(crate) struct GspBootContext<'a> {
+    pub(crate) pdev: &'a pci::Device<device::Bound>,
+    pub(crate) bar: Bar0<'a>,
+    pub(crate) chipset: Chipset,
+    pub(crate) gsp_falcon: &'a Falcon<GspFalcon>,
+    pub(crate) sec2_falcon: &'a Falcon<Sec2Falcon>,
+}
+
+impl<'a> GspBootContext<'a> {
+    pub(crate) fn dev(&self) -> &'a device::Device<device::Bound> {
+        self.pdev.as_ref()
+    }
+}
+
 /// Number of GSP pages to use in a RM log buffer.
 const RM_LOG_BUFFER_NUM_PAGES: usize = 0x10;
 const LOG_BUFFER_SIZE: usize = RM_LOG_BUFFER_NUM_PAGES * GSP_PAGE_SIZE;
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 8afb62d689cb..6e170401d616 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -6,7 +6,6 @@
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
-    pci,
     prelude::*,
     time::Delta,
     types::ScopeGuard, //
@@ -24,7 +23,6 @@
         gsp::GspFirmware,
         FIRMWARE_VERSION, //
     },
-    gpu::Chipset,
     gsp::{
         cmdq::Cmdq,
         commands,
@@ -103,61 +101,58 @@ impl super::Gsp {
     /// [`Self::unload`]) returned.
     pub(crate) fn boot(
         self: Pin<&mut Self>,
-        pdev: &pci::Device<device::Bound>,
-        bar: Bar0<'_>,
-        chipset: Chipset,
-        gsp_falcon: &Falcon<Gsp>,
-        sec2_falcon: &Falcon<Sec2>,
+        ctx: &super::GspBootContext<'_>,
     ) -> Result<Option<super::UnloadBundle>> {
-        let dev = pdev.as_ref();
-        let hal = super::hal::gsp_hal(chipset);
+        let dev = ctx.dev();
+        let hal = super::hal::gsp_hal(ctx.chipset);
 
-        let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
+        let gsp_fw = KBox::pin_init(
+            GspFirmware::new(dev, ctx.chipset, FIRMWARE_VERSION),
+            GFP_KERNEL,
+        )?;
 
-        let fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
+        let fb_layout = FbLayout::new(ctx.chipset, ctx.bar, &gsp_fw)?;
         dev_dbg!(dev, "{:#x?}\n", fb_layout);
 
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
-        let unload_guard = hal.boot(
-            &self,
-            dev,
-            bar,
-            chipset,
-            &fb_layout,
-            &wpr_meta,
-            gsp_falcon,
-            sec2_falcon,
-        )?;
+        let unload_guard = hal.boot(&self, ctx, &fb_layout, &wpr_meta)?;
 
-        gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
+        ctx.gsp_falcon
+            .write_os_version(ctx.bar, gsp_fw.bootloader.app_version);
 
         // Poll for RISC-V to become active before continuing.
         read_poll_timeout(
-            || Ok(gsp_falcon.is_riscv_active(bar)),
+            || Ok(ctx.gsp_falcon.is_riscv_active(ctx.bar)),
             |val: &bool| *val,
             Delta::from_millis(10),
             Delta::from_secs(5),
         )?;
 
-        dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
+        dev_dbg!(
+            ctx.dev(),
+            "RISC-V active? {}\n",
+            ctx.gsp_falcon.is_riscv_active(ctx.bar),
+        );
 
         self.cmdq
-            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev, chipset))?;
+            .send_command_no_wait(ctx.bar, commands::SetSystemInfo::new(ctx.pdev, ctx.chipset))?;
         self.cmdq
-            .send_command_no_wait(bar, commands::SetRegistry::new())?;
+            .send_command_no_wait(ctx.bar, commands::SetRegistry::new())?;
 
-        hal.post_boot(&self, dev, bar, &gsp_fw, gsp_falcon, sec2_falcon)?;
+        hal.post_boot(&self, ctx, &gsp_fw)?;
 
         // Wait until GSP is fully initialized.
         commands::wait_gsp_init_done(&self.cmdq)?;
 
         // Obtain and display basic GPU information.
-        let info = self.cmdq.send_command(bar, commands::GetGspStaticInfo)?;
+        let info = self
+            .cmdq
+            .send_command(ctx.bar, commands::GetGspStaticInfo)?;
         match info.gpu_name() {
-            Ok(name) => dev_info!(pdev, "GPU name: {}\n", name),
-            Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
+            Ok(name) => dev_info!(ctx.pdev, "GPU name: {}\n", name),
+            Err(e) => dev_warn!(ctx.pdev, "GPU name unavailable: {:?}\n", e),
         }
 
         Ok(unload_guard.dismiss())
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 04f004856c60..51a277fe97bb 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -4,11 +4,10 @@
 mod gh100;
 mod tu102;
 
-use kernel::prelude::*;
-
 use kernel::{
     device,
-    dma::Coherent, //
+    dma::Coherent,
+    prelude::*, //
 };
 
 use crate::{
@@ -27,6 +26,7 @@
     gsp::{
         boot::BootUnloadGuard,
         Gsp,
+        GspBootContext,
         GspFwWprMeta, //
     },
 };
@@ -53,32 +53,19 @@ pub(super) trait GspHal: Send {
     ///
     /// Upon success, returns a guard that runs the GSP unload sequence if GSP boot does not
     /// complete.
-    #[allow(clippy::too_many_arguments)]
     fn boot<'a>(
         &self,
         gsp: &'a Gsp,
-        dev: &'a device::Device<device::Bound>,
-        bar: Bar0<'a>,
-        chipset: Chipset,
+        ctx: &GspBootContext<'a>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-        gsp_falcon: &'a Falcon<GspEngine>,
-        sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>>;
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
     /// This method is called by the GSP boot code after the GSP is confirmed to be running, and
     /// after the initialization commands have been pushed onto its queue.
-    fn post_boot(
-        &self,
-        _gsp: &Gsp,
-        _dev: &device::Device<device::Bound>,
-        _bar: Bar0<'_>,
-        _gsp_fw: &GspFirmware,
-        _gsp_falcon: &Falcon<GspEngine>,
-        _sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    fn post_boot(&self, _gsp: &Gsp, _ctx: &GspBootContext<'_>, _gsp_fw: &GspFirmware) -> Result {
         Ok(())
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 98f5ce197d13..c9fdc8cacedc 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -26,7 +26,6 @@
         FmcBootArgs,
         Fsp, //
     },
-    gpu::Chipset,
     gsp::{
         boot::BootUnloadGuard,
         hal::{
@@ -34,6 +33,7 @@
             UnloadBundle, //
         },
         Gsp,
+        GspBootContext,
         GspFwWprMeta, //
     },
 };
@@ -152,14 +152,16 @@ impl GspHal for Gh100 {
     fn boot<'a>(
         &self,
         gsp: &'a Gsp,
-        dev: &'a device::Device<device::Bound>,
-        bar: Bar0<'a>,
-        chipset: Chipset,
+        ctx: &GspBootContext<'a>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-        gsp_falcon: &'a Falcon<GspEngine>,
-        sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
+        let dev = ctx.dev();
+        let bar = ctx.bar;
+        let chipset = ctx.chipset;
+        let gsp_falcon = ctx.gsp_falcon;
+        let sec2_falcon = ctx.sec2_falcon;
+
         let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
 
         let unload_bundle = crate::gsp::UnloadBundle(
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index 2f6301af7113..5b4325f16930 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -42,6 +42,7 @@
             GspSequencerParams, //
         },
         Gsp,
+        GspBootContext,
         GspFwWprMeta, //
     },
     regs,
@@ -258,14 +259,16 @@ impl GspHal for Tu102 {
     fn boot<'a>(
         &self,
         gsp: &'a Gsp,
-        dev: &'a device::Device<device::Bound>,
-        bar: Bar0<'a>,
-        chipset: Chipset,
+        ctx: &GspBootContext<'a>,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-        gsp_falcon: &'a Falcon<GspEngine>,
-        sec2_falcon: &'a Falcon<Sec2>,
     ) -> Result<BootUnloadGuard<'a>> {
+        let dev = ctx.dev();
+        let bar = ctx.bar;
+        let chipset = ctx.chipset;
+        let gsp_falcon = ctx.gsp_falcon;
+        let sec2_falcon = ctx.sec2_falcon;
+
         let bios = Vbios::new(dev, bar)?;
 
         // Try and prepare the unload bundle.
@@ -321,23 +324,15 @@ fn boot<'a>(
         Ok(unload_guard)
     }
 
-    fn post_boot(
-        &self,
-        gsp: &Gsp,
-        dev: &device::Device<device::Bound>,
-        bar: Bar0<'_>,
-        gsp_fw: &GspFirmware,
-        gsp_falcon: &Falcon<GspEngine>,
-        sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    fn post_boot(&self, gsp: &Gsp, ctx: &GspBootContext<'_>, gsp_fw: &GspFirmware) -> Result {
         // Create and run the GSP sequencer.
         let seq_params = GspSequencerParams {
             bootloader_app_version: gsp_fw.bootloader.app_version,
             libos_dma_handle: gsp.libos.dma_handle(),
-            gsp_falcon,
-            sec2_falcon,
-            dev,
-            bar,
+            gsp_falcon: ctx.gsp_falcon,
+            sec2_falcon: ctx.sec2_falcon,
+            dev: ctx.dev(),
+            bar: ctx.bar,
         };
         GspSequencer::run(&gsp.cmdq, seq_params)?;
 
-- 
2.51.0



* [PATCH 7/9] gpu: nova-core: add vGPU preludes
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
                   ` (5 preceding siblings ...)
  2026-06-04 11:43 ` [PATCH 6/9] gpu: nova-core: consolidate GSP boot parameters into GspBootContext Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-04 11:43 ` [PATCH 8/9] gpu: nova-core: set RMSetSriovMode when NVIDIA vGPU is enabled Zhi Wang
  2026-06-04 11:43 ` [PATCH] gpu: nova-core: reserve a larger GSP WPR2 heap when " Zhi Wang
  8 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

The driver needs to detect vGPU capability before firmware loading so
that the GSP boot flow can be adjusted accordingly.

vGPU capability is determined by two sources: the PCI SR-IOV totalvfs
count (whether the device advertises VFs) and the FSP PRC (Product
Reconfiguration Control) knob (whether vGPU mode is actively enabled
for this boot cycle). Both must agree for vGPU to proceed.

Introduce VgpuManager to encapsulate vGPU state detection and tracking.
On creation it queries totalvfs via sriov_get_totalvfs(); during GSP
boot the PRC knob is read from FSP to refine the initial estimate.
Extend GspBootContext with vgpu_requested and total_vfs fields to carry
this state across the boot sequence. The FSP PRC read is performed
inside the GH100 HAL boot method, where the FSP falcon is already
available.
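
The two-stage decision can be sketched in plain Rust as follows. This is
an illustration only: BootContext, initial_request() and
apply_prc_knob() are hypothetical names, and std's Cell stands in for
core::cell::Cell. The first stage uses the architecture check and
totalvfs; the second stage can only clear the flag, never set it.

```rust
use std::cell::Cell;

/// Mirrors the two states of the vGPU mode knob.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VgpuMode {
    Disabled,
    Enabled,
}

/// Stage 1: initial estimate from the architecture and the SR-IOV
/// totalvfs count.
fn initial_request(supports_vgpu: bool, total_vfs: u16) -> bool {
    supports_vgpu && total_vfs > 0
}

/// Hypothetical, reduced boot context. `Cell` allows the flag to be
/// updated through a shared reference.
#[allow(dead_code)]
struct BootContext {
    vgpu_requested: Cell<bool>,
    total_vfs: u16,
}

/// Stage 2: refine the estimate with the active PRC knob value. The
/// result is true only if both sources agree.
fn apply_prc_knob(ctx: &BootContext, mode: VgpuMode) {
    ctx.vgpu_requested
        .set(ctx.vgpu_requested.get() && mode == VgpuMode::Enabled);
}
```

Because the refinement is a logical AND, a device that advertises VFs
but has the knob disabled ends up with vgpu_requested == false, and a
device without VFs is never promoted by the knob alone.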

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/fsp.rs           |  1 -
 drivers/gpu/nova-core/gpu.rs           | 24 +++++++++++--
 drivers/gpu/nova-core/gsp.rs           |  5 +++
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  8 ++++-
 drivers/gpu/nova-core/nova_core.rs     |  1 +
 drivers/gpu/nova-core/vgpu.rs          | 47 ++++++++++++++++++++++++++
 6 files changed, 82 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/nova-core/vgpu.rs

diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
index ce11efeba37e..c775e12c5451 100644
--- a/drivers/gpu/nova-core/fsp.rs
+++ b/drivers/gpu/nova-core/fsp.rs
@@ -329,7 +329,6 @@ impl Fsp {
     /// Queries FSP's Management Partition for the active vGPU mode knob value.
     /// Returns [`VgpuMode::Enabled`] if vGPU support is active on this GPU,
     /// [`VgpuMode::Disabled`] otherwise.
-    #[expect(dead_code)]
     pub(crate) fn read_vgpu_mode(
         &mut self,
         dev: &device::Device<device::Bound>,
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 69569e218d9b..6d8e60dd5292 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -9,7 +9,11 @@
     io::Io,
     num::Bounded,
     pci,
-    prelude::*, //
+    prelude::*,
+    sync::{
+        new_mutex,
+        Mutex, //
+    },
 };
 
 use crate::{
@@ -27,6 +31,7 @@
         GspBootContext, //
     },
     regs,
+    vgpu::VgpuManager,
 };
 
 mod hal;
@@ -180,6 +185,13 @@ pub(crate) enum Architecture with TryFrom<Bounded<u32, 6>> {
     }
 }
 
+impl Architecture {
+    /// Returns true for architectures that support vGPU (RTX PRO 6000 Blackwell Server Edition).
+    pub(crate) const fn supports_vgpu(&self) -> bool {
+        matches!(self, Self::BlackwellGB20x)
+    }
+}
+
 #[derive(Clone, Copy)]
 pub(crate) struct Revision {
     major: Bounded<u8, 4>,
@@ -278,9 +290,12 @@ pub(crate) struct Gpu<'gpu> {
     gsp_falcon: Falcon<GspFalcon>,
     /// SEC2 falcon instance, used for GSP boot up and cleanup.
     sec2_falcon: Falcon<Sec2Falcon>,
-    /// GSP runtime data. Temporarily an empty placeholder.
+    /// GSP runtime data.
     #[pin]
     gsp: Gsp,
+    /// vGPU state (SR-IOV / FSP PRC), behind Mutex for FFI concurrency.
+    #[pin]
+    vgpu: Mutex<VgpuManager>,
     /// GSP unload firmware bundle, if any.
     unload_bundle: Option<gsp::UnloadBundle>,
 }
@@ -319,18 +334,23 @@ pub(crate) fn new(
 
             sec2_falcon: Falcon::new(pdev.as_ref(), spec.chipset)?,
 
+            vgpu <- new_mutex!(VgpuManager::new(pdev, spec.chipset.arch())?, "vgpu_manager"),
+
             gsp <- Gsp::new(pdev),
 
             // This member must be initialized last, so the `UnloadBundle` can never be dropped from
             // outside of the constructed `Gpu`, ensuring that the unload sequence is properly run
             // in case of failure.
             unload_bundle: {
+                let mgr = vgpu.lock();
                 let ctx = GspBootContext {
                     pdev,
                     bar,
                     chipset: spec.chipset,
                     gsp_falcon,
                     sec2_falcon,
+                    vgpu_requested: core::cell::Cell::new(mgr.vgpu_requested),
+                    total_vfs: mgr.total_vfs,
                 };
                 gsp.boot(&ctx)?
             },
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index f0901bf86258..94cd4a784b79 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -3,6 +3,8 @@
 mod boot;
 mod hal;
 
+use core::cell::Cell;
+
 use kernel::{
     debugfs,
     device,
@@ -56,6 +58,9 @@ pub(crate) struct GspBootContext<'a> {
     pub(crate) chipset: Chipset,
     pub(crate) gsp_falcon: &'a Falcon<GspFalcon>,
     pub(crate) sec2_falcon: &'a Falcon<Sec2Falcon>,
+    pub(crate) vgpu_requested: Cell<bool>,
+    #[expect(dead_code)]
+    pub(crate) total_vfs: u16,
 }
 
 impl<'a> GspBootContext<'a> {
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index c9fdc8cacedc..e133a92fb67f 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -24,7 +24,8 @@
     },
     fsp::{
         FmcBootArgs,
-        Fsp, //
+        Fsp,
+        VgpuMode, //
     },
     gsp::{
         boot::BootUnloadGuard,
@@ -174,6 +175,11 @@ fn boot<'a>(
 
         let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
 
+        let vgpu_mode = fsp.read_vgpu_mode(dev, bar)?;
+        dev_dbg!(dev, "vGPU mode: {:?}\n", vgpu_mode);
+        ctx.vgpu_requested
+            .set(ctx.vgpu_requested.get() && vgpu_mode == VgpuMode::Enabled);
+
         let args = FmcBootArgs::new(
             dev,
             chipset,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 9f0199f7b38c..6f55a9242027 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -26,6 +26,7 @@
 mod regs;
 mod sbuffer;
 mod vbios;
+mod vgpu;
 
 pub(crate) const MODULE_NAME: &core::ffi::CStr = <LocalModule as kernel::ModuleMetadata>::NAME;
 
diff --git a/drivers/gpu/nova-core/vgpu.rs b/drivers/gpu/nova-core/vgpu.rs
new file mode 100644
index 000000000000..c8f1b74037c7
--- /dev/null
+++ b/drivers/gpu/nova-core/vgpu.rs
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::{
+    device,
+    pci,
+    prelude::*, //
+};
+
+use crate::gpu::Architecture;
+
+/// vGPU manager.
+///
+/// On creation, performs platform detection to determine whether vGPU is
+/// requested (PRC knob + totalvfs for Blackwell). The `vgpu_requested`
+/// flag may be further refined during boot (e.g. FSP PRC knob read).
+pub(crate) struct VgpuManager {
+    pub(crate) vgpu_requested: bool,
+    pub(crate) vgpu_enabled: bool,
+    pub(crate) total_vfs: u16,
+}
+
+impl VgpuManager {
+    pub(crate) fn new(
+        pdev: &pci::Device<device::Bound>,
+        arch: Architecture,
+    ) -> Result<VgpuManager> {
+        let total_vfs: u16 = if arch.supports_vgpu() {
+            pdev.sriov_get_totalvfs()
+                .ok()
+                .and_then(|n| n.try_into().ok())
+                .unwrap_or(0)
+        } else {
+            0
+        };
+
+        Ok(VgpuManager {
+            vgpu_requested: total_vfs > 0,
+            vgpu_enabled: false,
+            total_vfs,
+        })
+    }
+
+    #[expect(dead_code)]
+    pub(crate) fn set_vgpu_enabled(&mut self, enabled: bool) {
+        self.vgpu_enabled = enabled;
+    }
+}
-- 
2.51.0



* [PATCH 8/9] gpu: nova-core: set RMSetSriovMode when NVIDIA vGPU is enabled
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
                   ` (6 preceding siblings ...)
  2026-06-04 11:43 ` [PATCH 7/9] gpu: nova-core: add vGPU preludes Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  2026-06-04 11:43 ` [PATCH] gpu: nova-core: reserve a larger GSP WPR2 heap when " Zhi Wang
  8 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

The registry object "RMSetSriovMode" must be set when vGPU is enabled.

Convert SetRegistry to use KVec<RegistryEntry> for dynamic construction,
allowing entries to be added conditionally at runtime.

If vGPU is enabled, set "RMSetSriovMode" to 1 when nova-core loads the
GSP firmware and initializes the GSP registry objects.
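
For illustration, the dynamic construction and the resulting payload
size can be sketched in plain Rust. This is not the driver code: Vec
replaces KVec, build_entries() is a hypothetical helper, and PackedEntry
is a stand-in whose layout is assumed (the real PackedRegistryEntry
layout is not shown here). The size formula follows
variable_payload_len() in this patch: one packed entry per key plus each
key string with its NUL terminator.

```rust
use std::mem::size_of;

/// A registry key/value pair, as in the patch.
#[allow(dead_code)]
struct RegistryEntry {
    key: &'static str,
    value: u32,
}

/// Hypothetical stand-in for the packed firmware entry; the field set
/// and size are assumptions made for this sketch only.
#[allow(dead_code)]
#[repr(C)]
struct PackedEntry {
    name_offset: u32,
    kind: u32,
    data: u32,
    length: u32,
}

/// Builds the base entries and appends the vGPU entry conditionally.
fn build_entries(vgpu_requested: bool) -> Vec<RegistryEntry> {
    let mut entries = vec![
        RegistryEntry { key: "RMSecBusResetEnable", value: 1 },
        RegistryEntry { key: "RMForcePcieConfigSave", value: 1 },
        RegistryEntry { key: "RMDevidCheckIgnore", value: 1 },
    ];
    if vgpu_requested {
        entries.push(RegistryEntry { key: "RMSetSriovMode", value: 1 });
    }
    entries
}

/// Packed entries followed by the NUL-terminated key strings.
fn variable_payload_len(entries: &[RegistryEntry]) -> usize {
    let key_bytes: usize = entries.iter().map(|e| e.key.len() + 1).sum();
    entries.len() * size_of::<PackedEntry>() + key_bytes
}
```

With vGPU requested, the payload grows by exactly one packed entry plus
the length of "RMSetSriovMode" and its terminator, which is why the
entry count can no longer be a compile-time constant.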

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs     |  2 +-
 drivers/gpu/nova-core/gsp/commands.rs | 93 ++++++++++++++++++---------
 drivers/gpu/nova-core/gsp/fw.rs       |  4 ++
 3 files changed, 67 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 6e170401d616..2981d02d15ad 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -139,7 +139,7 @@ pub(crate) fn boot(
         self.cmdq
             .send_command_no_wait(ctx.bar, commands::SetSystemInfo::new(ctx.pdev, ctx.chipset))?;
         self.cmdq
-            .send_command_no_wait(ctx.bar, commands::SetRegistry::new())?;
+            .send_command_no_wait(ctx.bar, commands::SetRegistry::new(ctx.vgpu_requested.get())?)?;
 
         hal.post_boot(&self, ctx, &gsp_fw)?;
 
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index f84de9f4f045..6171c60f6de0 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -29,6 +29,10 @@
         },
         fw::{
             self,
+            commands::{
+                PackedRegistryEntry,
+                PackedRegistryTable, //
+            },
             MsgFunction, //
         },
     },
@@ -65,38 +69,62 @@ struct RegistryEntry {
 }
 
 /// The `SetRegistry` command.
+///
+/// Registry entries are built dynamically at runtime based on the current
+/// configuration (e.g. whether vGPU is enabled).
 pub(crate) struct SetRegistry {
-    entries: [RegistryEntry; Self::NUM_ENTRIES],
+    entries: KVec<RegistryEntry>,
 }
 
 impl SetRegistry {
-    // For now we hard-code the registry entries. Future work will allow others to
-    // be added as module parameters.
-    const NUM_ENTRIES: usize = 3;
-
-    /// Creates a new `SetRegistry` command, using a set of hardcoded entries.
-    pub(crate) fn new() -> Self {
-        Self {
-            entries: [
-                // RMSecBusResetEnable - enables PCI secondary bus reset
-                RegistryEntry {
-                    key: "RMSecBusResetEnable",
-                    value: 1,
-                },
-                // RMForcePcieConfigSave - forces GSP-RM to preserve PCI configuration registers on
-                // any PCI reset.
-                RegistryEntry {
-                    key: "RMForcePcieConfigSave",
-                    value: 1,
-                },
-                // RMDevidCheckIgnore - allows GSP-RM to boot even if the PCI dev ID is not found
-                // in the internal product name database.
+    /// Creates a new `SetRegistry` command.
+    ///
+    /// The base set of registry entries is always included. Additional entries
+    /// are appended dynamically based on runtime conditions (e.g. vGPU).
+    pub(crate) fn new(vgpu_requested: bool) -> Result<Self> {
+        let mut entries = KVec::new();
+
+        // RMSecBusResetEnable - enables PCI secondary bus reset
+        entries.push(
+            RegistryEntry {
+                key: "RMSecBusResetEnable",
+                value: 1,
+            },
+            GFP_KERNEL,
+        )?;
+
+        // RMForcePcieConfigSave - forces GSP-RM to preserve PCI configuration registers on
+        // any PCI reset.
+        entries.push(
+            RegistryEntry {
+                key: "RMForcePcieConfigSave",
+                value: 1,
+            },
+            GFP_KERNEL,
+        )?;
+
+        // RMDevidCheckIgnore - allows GSP-RM to boot even if the PCI dev ID is not found
+        // in the internal product name database.
+        entries.push(
+            RegistryEntry {
+                key: "RMDevidCheckIgnore",
+                value: 1,
+            },
+            GFP_KERNEL,
+        )?;
+
+        // RMSetSriovMode - required when vGPU is enabled.
+        if vgpu_requested {
+            entries.push(
                 RegistryEntry {
-                    key: "RMDevidCheckIgnore",
+                    key: "RMSetSriovMode",
                     value: 1,
                 },
-            ],
+                GFP_KERNEL,
+            )?;
         }
+
+        Ok(Self { entries })
     }
 }
 
@@ -107,28 +135,31 @@ impl CommandToGsp for SetRegistry {
     type InitError = Infallible;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
-        Self::Command::init(Self::NUM_ENTRIES as u32, self.variable_payload_len() as u32)
+        PackedRegistryTable::init(
+            self.entries.len() as u32,
+            self.variable_payload_len() as u32,
+        )
     }
 
     fn variable_payload_len(&self) -> usize {
         let mut key_size = 0;
-        for i in 0..Self::NUM_ENTRIES {
-            key_size += self.entries[i].key.len() + 1; // +1 for NULL terminator
+        for entry in self.entries.iter() {
+            key_size += entry.key.len() + 1; // +1 for NULL terminator
         }
-        Self::NUM_ENTRIES * size_of::<fw::commands::PackedRegistryEntry>() + key_size
+        self.entries.len() * size_of::<fw::commands::PackedRegistryEntry>() + key_size
     }
 
     fn init_variable_payload(
         &self,
         dst: &mut SBufferIter<core::array::IntoIter<&mut [u8], 2>>,
     ) -> Result {
-        let string_data_start_offset = size_of::<Self::Command>()
-            + Self::NUM_ENTRIES * size_of::<fw::commands::PackedRegistryEntry>();
+        let string_data_start_offset = size_of::<PackedRegistryTable>()
+            + self.entries.len() * size_of::<PackedRegistryEntry>();
 
         // Array for string data.
         let mut string_data = KVec::new();
 
-        for entry in self.entries.iter().take(Self::NUM_ENTRIES) {
+        for entry in self.entries.iter() {
             dst.write_all(
                 fw::commands::PackedRegistryEntry::new(
                     (string_data_start_offset + string_data.len()) as u32,
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 4db0cfa4dc4d..14424a2c2d83 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -299,6 +299,7 @@ pub(crate) enum MsgFunction {
     OsErrorLog = bindings::NV_VGPU_MSG_EVENT_OS_ERROR_LOG,
     PostEvent = bindings::NV_VGPU_MSG_EVENT_POST_EVENT,
     RcTriggered = bindings::NV_VGPU_MSG_EVENT_RC_TRIGGERED,
+    GpuacctPerfmonUtilSamples = bindings::NV_VGPU_MSG_EVENT_GPUACCT_PERFMON_UTIL_SAMPLES,
     UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
 }
 
@@ -348,6 +349,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
             bindings::NV_VGPU_MSG_EVENT_OS_ERROR_LOG => Ok(MsgFunction::OsErrorLog),
             bindings::NV_VGPU_MSG_EVENT_POST_EVENT => Ok(MsgFunction::PostEvent),
             bindings::NV_VGPU_MSG_EVENT_RC_TRIGGERED => Ok(MsgFunction::RcTriggered),
+            bindings::NV_VGPU_MSG_EVENT_GPUACCT_PERFMON_UTIL_SAMPLES => {
+                Ok(MsgFunction::GpuacctPerfmonUtilSamples)
+            }
             bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT => Ok(MsgFunction::UcodeLibOsPrint),
             _ => Err(EINVAL),
         }
-- 
2.51.0



* [PATCH] gpu: nova-core: reserve a larger GSP WPR2 heap when vGPU is enabled
  2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
                   ` (7 preceding siblings ...)
  2026-06-04 11:43 ` [PATCH 8/9] gpu: nova-core: set RMSetSriovMode when NVIDIA vGPU is enabled Zhi Wang
@ 2026-06-04 11:43 ` Zhi Wang
  8 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2026-06-04 11:43 UTC (permalink / raw)
  To: dakr, airlied, simona
  Cc: ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, jhubbard, acourbot, ecourtney,
	joelagnelf, apopple, cjia, smitra, kjaju, alkumar, ankita,
	aniketa, kwankhede, targupta, nova-gpu, linux-kernel, zhiwang,
	Zhi Wang

GSP-RM allocates independent RM sub-heaps for each VF partition inside
the WPR2 region. The default bare-metal heap sizing is far too small for
vGPU instances, causing GSP-RM to hit out-of-memory failures during VF
initialization.

The host driver must reserve the correct heap size before GSP boots,
because the WPR2 region is locked down by the hardware after boot and
cannot be resized at runtime. The firmware determines the per-VF
carve-out from the gspFwHeapVfPartitionCount field in the WPR2 metadata
header.

Select a pre-calibrated static heap size based on total_vfs (174 MB for
1 VM, 581 MB for 2-32 VFs, 1370 MB for 48 VFs) and set
vf_partition_count accordingly. Extend FbLayout::new() and
GspBootContext to propagate total_vfs through the boot path.
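
The selection can be sketched in plain Rust as below. This is a
hypothetical illustration, not the vgpu_fw_heap_size() added by this
patch: vgpu_heap_size_mb() and vf_partition_count() are names invented
for the sketch, the counts are treated as total_vfs values, and only the
three cases listed above are encoded. Counts not covered by that list
yield None here, since their sizing is not stated.

```rust
/// Heap size in MB for the VF counts listed in the commit message.
/// Returns `None` for counts whose sizing is not stated there
/// (an assumption of this sketch).
fn vgpu_heap_size_mb(total_vfs: u32) -> Option<u64> {
    match total_vfs {
        1 => Some(174),
        2..=32 => Some(581),
        48 => Some(1370),
        _ => None,
    }
}

/// Narrows the VF count to the 8-bit partition count with a checked
/// conversion, so an out-of-range value is reported instead of being
/// silently truncated.
fn vf_partition_count(total_vfs: u16) -> Option<u8> {
    u8::try_from(total_vfs).ok()
}
```

A checked narrowing is shown as one option for filling
vf_partition_count; whether a plain cast is sufficient depends on the
maximum totalvfs the supported devices can report.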

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/nova-core/fb.rs       | 17 +++++++++++++----
 drivers/gpu/nova-core/gsp.rs      |  2 +-
 drivers/gpu/nova-core/gsp/boot.rs | 14 +++++++++++---
 drivers/gpu/nova-core/gsp/fw.rs   | 12 ++++++++++++
 4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index 725e428154cf..fb4e6aa9fda4 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -171,7 +171,13 @@ pub(crate) struct FbLayout {
 
 impl FbLayout {
     /// Computes the FB layout for `chipset` required to run the `gsp_fw` GSP firmware.
-    pub(crate) fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Result<Self> {
+    pub(crate) fn new(
+        chipset: Chipset,
+        bar: Bar0<'_>,
+        gsp_fw: &GspFirmware,
+        vgpu_requested: bool,
+        total_vfs: u16,
+    ) -> Result<Self> {
         let hal = hal::fb_hal(chipset);
 
         let fb = {
@@ -236,8 +242,11 @@ pub(crate) fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Resu
 
         let wpr2_heap = {
             const WPR2_HEAP_DOWN_ALIGN: Alignment = Alignment::new::<SZ_1M>();
-            let wpr2_heap_size =
-                gsp::LibosParams::from_chipset(chipset).wpr_heap_size(chipset, fb.end)?;
+            let wpr2_heap_size = if vgpu_requested {
+                gsp::vgpu_fw_heap_size(u32::from(total_vfs))
+            } else {
+                gsp::LibosParams::from_chipset(chipset).wpr_heap_size(chipset, fb.end)?
+            };
             let wpr2_heap_addr = (elf.start - wpr2_heap_size).align_down(WPR2_HEAP_DOWN_ALIGN);
 
             FbRange(wpr2_heap_addr..(elf.start).align_down(WPR2_HEAP_DOWN_ALIGN))
@@ -265,7 +274,7 @@ pub(crate) fn new(chipset: Chipset, bar: Bar0<'_>, gsp_fw: &GspFirmware) -> Resu
             wpr2_heap,
             wpr2,
             heap,
-            vf_partition_count: 0,
+            vf_partition_count: if vgpu_requested { total_vfs as u8 } else { 0 },
             pmu_reserved_size: hal.pmu_reserved_size(),
         })
     }
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 94cd4a784b79..921b92c9eb92 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -27,6 +27,7 @@
 mod sequencer;
 
 pub(crate) use fw::{
+    vgpu_fw_heap_size,
     GspFmcBootParams,
     GspFwWprMeta,
     LibosParams, //
@@ -59,7 +60,6 @@ pub(crate) struct GspBootContext<'a> {
     pub(crate) gsp_falcon: &'a Falcon<GspFalcon>,
     pub(crate) sec2_falcon: &'a Falcon<Sec2Falcon>,
     pub(crate) vgpu_requested: Cell<bool>,
-    #[expect(dead_code)]
     pub(crate) total_vfs: u16,
 }
 
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 2981d02d15ad..7c1f3f962fbe 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -111,7 +111,13 @@ pub(crate) fn boot(
             GFP_KERNEL,
         )?;
 
-        let fb_layout = FbLayout::new(ctx.chipset, ctx.bar, &gsp_fw)?;
+        let fb_layout = FbLayout::new(
+            ctx.chipset,
+            ctx.bar,
+            &gsp_fw,
+            ctx.vgpu_requested.get(),
+            ctx.total_vfs,
+        )?;
         dev_dbg!(dev, "{:#x?}\n", fb_layout);
 
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
@@ -138,8 +144,10 @@ pub(crate) fn boot(
 
         self.cmdq
             .send_command_no_wait(ctx.bar, commands::SetSystemInfo::new(ctx.pdev, ctx.chipset))?;
-        self.cmdq
-            .send_command_no_wait(ctx.bar, commands::SetRegistry::new(ctx.vgpu_requested.get())?)?;
+        self.cmdq.send_command_no_wait(
+            ctx.bar,
+            commands::SetRegistry::new(ctx.vgpu_requested.get())?,
+        )?;
 
         hal.post_boot(&self, ctx, &gsp_fw)?;
 
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 14424a2c2d83..2f3cbc5d5114 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -101,6 +101,18 @@ pub(in crate::gsp) fn advance_cpu_write_ptr(qs: &Coherent<GspMem>, count: u32) {
 pub(crate) const GSP_MSG_QUEUE_ELEMENT_SIZE_MAX: usize =
     num::u32_as_usize(bindings::GSP_MSG_QUEUE_ELEMENT_SIZE_MAX);
 
+const GSP_FW_HEAP_SIZE_VGPU_1VM: u64 = 174 * u64::SZ_1M;
+const GSP_FW_HEAP_SIZE_VGPU_DEFAULT: u64 = 581 * u64::SZ_1M;
+const GSP_FW_HEAP_SIZE_VGPU_48VMS: u64 = 1370 * u64::SZ_1M;
+
+pub(crate) fn vgpu_fw_heap_size(total_vfs: u32) -> u64 {
+    match total_vfs {
+        1 => GSP_FW_HEAP_SIZE_VGPU_1VM,
+        2..=32 => GSP_FW_HEAP_SIZE_VGPU_DEFAULT,
+        _ => GSP_FW_HEAP_SIZE_VGPU_48VMS,
+    }
+}
+
 /// Empty type to group methods related to heap parameters for running the GSP firmware.
 enum GspFwHeapParams {}
 
-- 
2.51.0


* Re: [PATCH 2/9] gpu: nova-core: factor out common FSP message header
  2026-06-04 11:43 ` [PATCH 2/9] gpu: nova-core: factor out common FSP message header Zhi Wang
@ 2026-06-05 13:21   ` Alexandre Courbot
  0 siblings, 0 replies; 15+ messages in thread
From: Alexandre Courbot @ 2026-06-05 13:21 UTC (permalink / raw)
  To: Zhi Wang
  Cc: dakr, airlied, simona, ojeda, alex.gaynor, boqun.feng, gary,
	bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, jhubbard,
	ecourtney, joelagnelf, apopple, cjia, smitra, kjaju, alkumar,
	ankita, aniketa, kwankhede, targupta, nova-gpu, linux-kernel,
	zhiwang

On Thu Jun 4, 2026 at 8:43 PM JST, Zhi Wang wrote:
> Extract common MCTP + NVDM headers into FspMessageHeader, rename
> FspMessage to FspCotMessage, and update FspResponse to use the shared
> header. This prepares for adding new FSP message types.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>

This patch is a good follow-up to the FSP series irrespective of vGPU
support, so I'd like to merge it rapidly. Leaving it a couple of days
for reviews but I think it makes sense to take it early and avoid
rebasing churn.

* Re: [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller
  2026-06-04 11:43 ` [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller Zhi Wang
@ 2026-06-05 13:25   ` Alexandre Courbot
  2026-06-05 16:04     ` Zhi Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Alexandre Courbot @ 2026-06-05 13:25 UTC (permalink / raw)
  To: Zhi Wang
  Cc: dakr, airlied, simona, ojeda, alex.gaynor, boqun.feng, gary,
	bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, jhubbard,
	ecourtney, joelagnelf, apopple, cjia, smitra, kjaju, alkumar,
	ankita, aniketa, kwankhede, targupta, nova-gpu, linux-kernel,
	zhiwang

On Thu Jun 4, 2026 at 8:43 PM JST, Zhi Wang wrote:
> Change send_sync_fsp() to return the raw response buffer after
> validating the common MCTP/NVDM headers and error code. This allows
> callers to perform protocol-specific parsing on the response payload,
> which is needed for the upcoming PRC protocol support.
>
> For the existing COT caller, the response buffer is unused.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>

Same as the previous patch, I intend to merge this quickly as it is a
local follow-up to the FSP support.

* Re: [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper
  2026-06-04 11:43 ` [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper Zhi Wang
@ 2026-06-05 14:08   ` Alexandre Courbot
  0 siblings, 0 replies; 15+ messages in thread
From: Alexandre Courbot @ 2026-06-05 14:08 UTC (permalink / raw)
  To: Zhi Wang
  Cc: dakr, airlied, simona, ojeda, alex.gaynor, boqun.feng, gary,
	bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, jhubbard,
	ecourtney, joelagnelf, apopple, cjia, smitra, kjaju, alkumar,
	ankita, aniketa, kwankhede, targupta, nova-gpu, linux-kernel,
	zhiwang, Bjorn Helgaas, linux-pci

On Thu Jun 4, 2026 at 8:43 PM JST, Zhi Wang wrote:
> Add a wrapper for the `pci_sriov_get_totalvfs()` helper, allowing drivers
> to query the number of total SR-IOV virtual functions a PCI device
> supports.
>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  rust/kernel/pci.rs | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
> index 5071cae6543f..d04e5f6841f2 100644
> --- a/rust/kernel/pci.rs
> +++ b/rust/kernel/pci.rs
> @@ -450,6 +450,18 @@ pub fn pci_class(&self) -> Class {
>          // SAFETY: `self.as_raw` is a valid pointer to a `struct pci_dev`.
>          Class::from_raw(unsafe { (*self.as_raw()).class })
>      }
> +
> +    /// Returns total number of VFs, or `Err(ENODEV)` if none available.
> +    pub fn sriov_get_totalvfs(&self) -> Result<i32> {

I mentioned it in a previous review [1], but I think there is an
opportunity to improve the C API (and by extension the Rust one) by
making it return a `u16`, which is the type the number of total VFs is
ultimately stored in anyway.

IIRC there is also no need to even update any caller, just changing the
prototype of `pci_sriov_get_totalvfs` would be enough as callers
ultimately compare the return value against u16s. So if anything it
would make the C API more sound.

[1] https://lore.kernel.org/all/DETDILPA1GFY.27WND0TEC5352@nvidia.com/

> +        // SAFETY: `self.as_raw()` is a valid pointer to a `struct pci_dev`.
> +        let vfs = unsafe { bindings::pci_sriov_get_totalvfs(self.as_raw()) };
> +
> +        if vfs == 0 {
> +            return Err(ENODEV);
> +        }

Having 0 VFs does not necessarily look like an error - it's quite a
valid answer to the question "how many VFs do we have?" which this
method tries to answer. In patch 7 you even do `unwrap_or(0)` on the
result of this method. So unless there is a good reason to treat this as
an error, maybe we can just return a `u16` here.
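
A minimal sketch of that shape, outside the kernel tree: `FakePciDev`,
its `total_vfs` field and `vgpu_possible()` are hypothetical stand-ins
(for `struct pci_dev`, the value the C helper reads, and a caller), used
only so the snippet compiles on its own.

```rust
/// Hypothetical stand-in for a PCI device; not the kernel's `pci::Device`.
struct FakePciDev {
    /// Total VFs advertised by the SR-IOV capability, 0 if there are none.
    total_vfs: u16,
}

impl FakePciDev {
    /// Returns the total number of VFs. Zero is a valid answer, not an error.
    fn sriov_get_totalvfs(&self) -> u16 {
        self.total_vfs
    }
}

/// Hypothetical caller: with a plain `u16` there is no `Result` to unwrap,
/// so nothing like `unwrap_or(0)` is needed.
fn vgpu_possible(dev: &FakePciDev) -> bool {
    dev.sriov_get_totalvfs() > 0
}
```

The caller decides what zero means for it, instead of the accessor
turning "no VFs" into `ENODEV`.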

* Re: [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller
  2026-06-05 13:25   ` Alexandre Courbot
@ 2026-06-05 16:04     ` Zhi Wang
  2026-06-09  6:07       ` Alexandre Courbot
  0 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2026-06-05 16:04 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: dakr, airlied, simona, ojeda, alex.gaynor, boqun.feng, gary,
	bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, jhubbard,
	ecourtney, joelagnelf, apopple, cjia, smitra, kjaju, alkumar,
	ankita, aniketa, kwankhede, targupta, nova-gpu, linux-kernel,
	zhiwang

On Fri, 05 Jun 2026 22:25:27 +0900
"Alexandre Courbot" <acourbot@nvidia.com> wrote:

> On Thu Jun 4, 2026 at 8:43 PM JST, Zhi Wang wrote:
> > Change send_sync_fsp() to return the raw response buffer after
> > validating the common MCTP/NVDM headers and error code. This allows
> > callers to perform protocol-specific parsing on the response
> > payload, which is needed for the upcoming PRC protocol support.
> >
> > For the existing COT caller, the response buffer is unused.
> >
> > Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> 
> Same as the previous patch, I intend to merge this quickly as it is a
> local follow-up to the FSP support.

Thanks. I will drop patches 2 and 3 in the next re-spin if you take them.

* Re: [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller
  2026-06-05 16:04     ` Zhi Wang
@ 2026-06-09  6:07       ` Alexandre Courbot
  0 siblings, 0 replies; 15+ messages in thread
From: Alexandre Courbot @ 2026-06-09  6:07 UTC (permalink / raw)
  To: Zhi Wang
  Cc: dakr, airlied, simona, ojeda, alex.gaynor, boqun.feng, gary,
	bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, jhubbard,
	ecourtney, joelagnelf, apopple, cjia, smitra, kjaju, alkumar,
	ankita, aniketa, kwankhede, targupta, nova-gpu, linux-kernel,
	zhiwang

On Sat Jun 6, 2026 at 1:04 AM JST, Zhi Wang wrote:
> On Fri, 05 Jun 2026 22:25:27 +0900
> "Alexandre Courbot" <acourbot@nvidia.com> wrote:
>
>> On Thu Jun 4, 2026 at 8:43 PM JST, Zhi Wang wrote:
>> > Change send_sync_fsp() to return the raw response buffer after
>> > validating the common MCTP/NVDM headers and error code. This allows
>> > callers to perform protocol-specific parsing on the response
>> > payload, which is needed for the upcoming PRC protocol support.
>> >
>> > For the existing COT caller, the response buffer is unused.
>> >
>> > Signed-off-by: Zhi Wang <zhiw@nvidia.com>
>> 
>> Same as the previous patch, I intend to merge this quickly as it is a
>> local follow-up to the FSP support.
>
> Thanks. I will drop patches 2 and 3 in the next re-spin if you take them.

Patches 2 and 3 have been pushed to `drm-rust-next`!

Thread overview: 15+ messages
2026-06-04 11:43 [PATCH 0/9] gpu: nova-core: boot GSP with vGPU enabled on Zhi Wang
2026-06-04 11:43 ` [PATCH 1/9] rust: pci: expose sriov_get_totalvfs() helper Zhi Wang
2026-06-05 14:08   ` Alexandre Courbot
2026-06-04 11:43 ` [PATCH 2/9] gpu: nova-core: factor out common FSP message header Zhi Wang
2026-06-05 13:21   ` Alexandre Courbot
2026-06-04 11:43 ` [PATCH 3/9] gpu: nova-core: return FSP response buffer to caller Zhi Wang
2026-06-05 13:25   ` Alexandre Courbot
2026-06-05 16:04     ` Zhi Wang
2026-06-09  6:07       ` Alexandre Courbot
2026-06-04 11:43 ` [PATCH 4/9] gpu: nova-core: read vGPU mode from FSP via PRC protocol Zhi Wang
2026-06-04 11:43 ` [PATCH 5/9] gpu: nova-core: add FSP and PRC protocol documentation Zhi Wang
2026-06-04 11:43 ` [PATCH 6/9] gpu: nova-core: consolidate GSP boot parameters into GspBootContext Zhi Wang
2026-06-04 11:43 ` [PATCH 7/9] gpu: nova-core: add vGPU preludes Zhi Wang
2026-06-04 11:43 ` [PATCH 8/9] gpu: nova-core: set RMSetSriovMode when NVIDIA vGPU is enabled Zhi Wang
2026-06-04 11:43 ` [PATCH] gpu: nova-core: reserve a larger GSP WPR2 heap when " Zhi Wang
