* [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding
@ 2026-04-21 6:16 Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 6:16 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux, Alexandre Courbot
Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:
- Probing requires setting up the WPR2 region, which cannot be done if
there is already one in place. Hence the current requirement to reset
the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
which the kernel might recycle.
On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.
This series does the necessary to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is trivial
and will be added alongside the Blackwell series [1] if this can be
merged first.
The first patch adds a `warn_on_err` utility macro to the kernel crate
as it is useful to warn on failures in the driver unbind path, but I can
remove it if it is not deemed useful.
This series applies cleanly on `master` as of today.
[1] https://lore.kernel.org/all/20260411024953.473149-1-jhubbard@nvidia.com/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com
---
Alexandre Courbot (5):
rust: add warn_on_err macro
gpu: nova-core: use warn_on_err macro
gpu: nova-core: do not import firmware commands into GSP command module
gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
drivers/gpu/nova-core/firmware/booter.rs | 1 -
drivers/gpu/nova-core/firmware/fwsec.rs | 1 -
drivers/gpu/nova-core/gpu.rs | 21 ++++--
drivers/gpu/nova-core/gsp/boot.rs | 83 +++++++++++++++++++++++
drivers/gpu/nova-core/gsp/commands.rs | 60 ++++++++++++----
drivers/gpu/nova-core/gsp/fw.rs | 4 ++
drivers/gpu/nova-core/gsp/fw/commands.rs | 23 +++++++
drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++
drivers/gpu/nova-core/regs.rs | 5 ++
rust/kernel/bug.rs | 10 +++
10 files changed, 197 insertions(+), 19 deletions(-)
---
base-commit: b4e07588e743c989499ca24d49e752c074924a9a
change-id: 20251216-nova-unload-4029b3b76950
Best regards,
--
Alexandre Courbot <acourbot@nvidia.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/5] rust: add warn_on_err macro
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
@ 2026-04-21 6:16 ` Alexandre Courbot
2026-04-21 7:07 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 2/5] gpu: nova-core: use " Alexandre Courbot
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 6:16 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux, Alexandre Courbot
While we already have the `warn_on` macro, a common usage pattern in
Rust is to check whether a `Result` is an error. Add a helper macro that
allows this.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
rust/kernel/bug.rs | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/rust/kernel/bug.rs b/rust/kernel/bug.rs
index ed943960f851..2fefc0aeef81 100644
--- a/rust/kernel/bug.rs
+++ b/rust/kernel/bug.rs
@@ -130,3 +130,13 @@ macro_rules! warn_on {
cond
}};
}
+
+/// Report a warning if `res` is an error and return it unmodified.
+#[macro_export]
+macro_rules! warn_on_err {
+ ($res:expr) => {{
+ let res = $res;
+ let _ = $crate::warn_on!(res.is_err());
+ res
+ }};
+}
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 2/5] gpu: nova-core: use warn_on_err macro
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
@ 2026-04-21 6:16 ` Alexandre Courbot
2026-04-21 7:09 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 6:16 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux, Alexandre Courbot
Use the warn_on_err macro in the unbind sequence, and refactor it to
early-exit on the failure path as we will need to use the bar for other
purposes.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gpu.rs | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 0f6fe9a1b955..1701c2600538 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -279,10 +279,10 @@ pub(crate) fn new<'a>(
///
/// Note: This method must only be called from `Driver::unbind`.
pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
- kernel::warn_on!(self
- .bar
- .access(dev)
- .inspect(|bar| self.sysmem_flush.unregister(bar))
- .is_err());
+ let Ok(bar) = kernel::warn_on_err!(self.bar.access(dev)) else {
+ return;
+ };
+
+ self.sysmem_flush.unregister(bar);
}
}
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 2/5] gpu: nova-core: use " Alexandre Courbot
@ 2026-04-21 6:16 ` Alexandre Courbot
2026-04-21 8:58 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
4 siblings, 1 reply; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 6:16 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux, Alexandre Courbot
Importing all the firmware commands like we did is a bit confusing, as
the layer of a command type (fw or GSP) cannot be inferred from looking
at its name alone. Furthermore it makes it impossible to create commands
that have the same name as their firmware command.
Thus, stop importing all commands and refer to them from the `fw` module
instead.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gsp/commands.rs | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index c89c7b57a751..c80df421702c 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -27,7 +27,7 @@
NoReply, //
},
fw::{
- commands::*,
+ self,
MsgFunction, //
},
},
@@ -48,12 +48,12 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>) -> Self {
impl<'a> CommandToGsp for SetSystemInfo<'a> {
const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
- type Command = GspSetSystemInfo;
+ type Command = fw::commands::GspSetSystemInfo;
type Reply = NoReply;
type InitError = Error;
fn init(&self) -> impl Init<Self::Command, Self::InitError> {
- GspSetSystemInfo::init(self.pdev)
+ Self::Command::init(self.pdev)
}
}
@@ -100,12 +100,12 @@ pub(crate) fn new() -> Self {
impl CommandToGsp for SetRegistry {
const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
- type Command = PackedRegistryTable;
+ type Command = fw::commands::PackedRegistryTable;
type Reply = NoReply;
type InitError = Infallible;
fn init(&self) -> impl Init<Self::Command, Self::InitError> {
- PackedRegistryTable::init(Self::NUM_ENTRIES as u32, self.variable_payload_len() as u32)
+ Self::Command::init(Self::NUM_ENTRIES as u32, self.variable_payload_len() as u32)
}
fn variable_payload_len(&self) -> usize {
@@ -113,22 +113,22 @@ fn variable_payload_len(&self) -> usize {
for i in 0..Self::NUM_ENTRIES {
key_size += self.entries[i].key.len() + 1; // +1 for NULL terminator
}
- Self::NUM_ENTRIES * size_of::<PackedRegistryEntry>() + key_size
+ Self::NUM_ENTRIES * size_of::<fw::commands::PackedRegistryEntry>() + key_size
}
fn init_variable_payload(
&self,
dst: &mut SBufferIter<core::array::IntoIter<&mut [u8], 2>>,
) -> Result {
- let string_data_start_offset =
- size_of::<PackedRegistryTable>() + Self::NUM_ENTRIES * size_of::<PackedRegistryEntry>();
+ let string_data_start_offset = size_of::<Self::Command>()
+ + Self::NUM_ENTRIES * size_of::<fw::commands::PackedRegistryEntry>();
// Array for string data.
let mut string_data = KVec::new();
for entry in self.entries.iter().take(Self::NUM_ENTRIES) {
dst.write_all(
- PackedRegistryEntry::new(
+ fw::commands::PackedRegistryEntry::new(
(string_data_start_offset + string_data.len()) as u32,
entry.value,
)
@@ -180,12 +180,12 @@ pub(crate) fn wait_gsp_init_done(cmdq: &Cmdq) -> Result {
impl CommandToGsp for GetGspStaticInfo {
const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
- type Command = GspStaticConfigInfo;
+ type Command = fw::commands::GspStaticConfigInfo;
type Reply = GetGspStaticInfoReply;
type InitError = Infallible;
fn init(&self) -> impl Init<Self::Command, Self::InitError> {
- GspStaticConfigInfo::init_zeroed()
+ Self::Command::init_zeroed()
}
}
@@ -196,7 +196,7 @@ pub(crate) struct GetGspStaticInfoReply {
impl MessageFromGsp for GetGspStaticInfoReply {
const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
- type Message = GspStaticConfigInfo;
+ type Message = fw::commands::GspStaticConfigInfo;
type InitError = Infallible;
fn read(
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
` (2 preceding siblings ...)
2026-04-21 6:16 ` [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
@ 2026-04-21 6:16 ` Alexandre Courbot
2026-04-21 9:42 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
4 siblings, 1 reply; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 6:16 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux, Alexandre Courbot
Currently, the GSP is left running after the driver is unbound. This is
not great for several reasons, notably that it can still access shared
memory areas that the kernel will now reclaim (especially problematic on
setups without an IOMMU).
Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
unbinding. This stops the GSP and lets us proceed with the rest of the
unbind sequence in the next patch.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gpu.rs | 5 +++
drivers/gpu/nova-core/gsp/boot.rs | 40 +++++++++++++++++++++++
drivers/gpu/nova-core/gsp/commands.rs | 36 ++++++++++++++++++++
drivers/gpu/nova-core/gsp/fw.rs | 4 +++
drivers/gpu/nova-core/gsp/fw/commands.rs | 23 +++++++++++++
drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
6 files changed, 116 insertions(+)
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 1701c2600538..8f2ae9e8a519 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -277,12 +277,17 @@ pub(crate) fn new<'a>(
/// Called when the corresponding [`Device`](device::Device) is unbound.
///
+ /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
+ /// memory page.
+ ///
/// Note: This method must only be called from `Driver::unbind`.
pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
let Ok(bar) = kernel::warn_on_err!(self.bar.access(dev)) else {
return;
};
+ let _ = kernel::warn_on_err!(self.gsp.unload(dev, bar, &self.gsp_falcon));
+
self.sysmem_flush.unregister(bar);
}
}
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 18f356c9178e..3f4e99b2497b 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -33,6 +33,7 @@
},
gpu::Chipset,
gsp::{
+ cmdq::Cmdq,
commands,
sequencer::{
GspSequencer,
@@ -237,4 +238,43 @@ pub(crate) fn boot(
Ok(())
}
+
+ /// Shut down the GSP and wait until it is offline.
+ fn shutdown_gsp(
+ cmdq: &Cmdq,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<Gsp>,
+ suspend: bool,
+ ) -> Result<()> {
+ // Send command to shutdown GSP and wait for response.
+ cmdq.send_command(bar, commands::UnloadingGuestDriver::new(suspend))?;
+
+ // Wait until GSP signals it is suspended.
+ const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
+ read_poll_timeout(
+ || Ok(gsp_falcon.read_mailbox0(bar)),
+ |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
+ Delta::from_millis(10),
+ Delta::from_secs(5),
+ )
+ .map(|_| ())
+ }
+
+ /// Attempts to unload the GSP firmware.
+ ///
+ /// This stops all activity on the GSP.
+ pub(crate) fn unload(
+ &self,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<Gsp>,
+ ) -> Result {
+ // Shut down the GSP.
+
+ Self::shutdown_gsp(&self.cmdq, bar, gsp_falcon, false)
+ .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
+ dev_dbg!(dev, "GSP shut down\n");
+
+ Ok(())
+ }
}
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index c80df421702c..fb94460c451e 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -237,3 +237,39 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
cmdq.send_command(bar, GetGspStaticInfo)
}
+
+pub(crate) struct UnloadingGuestDriver {
+ suspend: bool,
+}
+
+impl UnloadingGuestDriver {
+ pub(crate) fn new(suspend: bool) -> Self {
+ Self { suspend }
+ }
+}
+
+impl CommandToGsp for UnloadingGuestDriver {
+ const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+ type Command = fw::commands::UnloadingGuestDriver;
+ type Reply = UnloadingGuestDriverReply;
+ type InitError = Infallible;
+
+ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
+ fw::commands::UnloadingGuestDriver::new(self.suspend)
+ }
+}
+
+pub(crate) struct UnloadingGuestDriverReply;
+
+impl MessageFromGsp for UnloadingGuestDriverReply {
+ const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+ type InitError = Infallible;
+ type Message = ();
+
+ fn read(
+ _msg: &Self::Message,
+ _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
+ ) -> Result<Self, Self::InitError> {
+ Ok(UnloadingGuestDriverReply)
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 0c8a74f0e8ac..59b4c4883185 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -278,6 +278,7 @@ pub(crate) enum MsgFunction {
Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
+ UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
// Event codes
GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
@@ -322,6 +323,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
Ok(MsgFunction::SetGuestSystemInfo)
}
bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
+ bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
+ Ok(MsgFunction::UnloadingGuestDriver)
+ }
// Event codes
bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index db46276430be..71c8690c9322 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -129,3 +129,26 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
// SAFETY: This struct only contains integer types for which all bit patterns
// are valid.
unsafe impl FromBytes for GspStaticConfigInfo {}
+
+/// Payload of the `UnloadingGuestDriver` command and message.
+#[repr(transparent)]
+#[derive(Clone, Copy, Debug, Zeroable)]
+pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
+
+impl UnloadingGuestDriver {
+ pub(crate) fn new(suspend: bool) -> Self {
+ Self(bindings::rpc_unloading_guest_driver_v1F_07 {
+ bInPMTransition: u8::from(suspend),
+ bGc6Entering: 0,
+ newLevel: if suspend { 3 } else { 0 },
+ ..Zeroable::zeroed()
+ })
+ }
+}
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for UnloadingGuestDriver {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for UnloadingGuestDriver {}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 334e8be5fde8..5d8e4c0ad904 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -880,6 +880,14 @@ fn default() -> Self {
}
}
#[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct rpc_unloading_guest_driver_v1F_07 {
+ pub bInPMTransition: u8_,
+ pub bGc6Entering: u8_,
+ pub __bindgen_padding_0: [u8; 2usize],
+ pub newLevel: u32_,
+}
+#[repr(C)]
#[derive(Debug, Default, MaybeZeroable)]
pub struct rpc_run_cpu_sequencer_v17_00 {
pub bufferSizeDWord: u32_,
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
` (3 preceding siblings ...)
2026-04-21 6:16 ` [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-04-21 6:16 ` Alexandre Courbot
4 siblings, 0 replies; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 6:16 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux, Alexandre Courbot
When probing the driver, the FWSEC-FRTS firmware creates a WPR2 secure
memory region to store the GSP firmware, and the Booter Loader loads and
starts that firmware into the GSP, making it run in RISC-V mode.
These operations need to be reverted upon unloading, particularly the
WPR2 secure region creation, as its presence prevents the driver from
subsequently probing.
Thus, load and run the Booter Unloader and FWSEC-SB firmwares at unbind
time to put the GPU into a state where it can be probed again.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/firmware/booter.rs | 1 -
drivers/gpu/nova-core/firmware/fwsec.rs | 1 -
drivers/gpu/nova-core/gpu.rs | 8 +++++-
drivers/gpu/nova-core/gsp/boot.rs | 43 ++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/regs.rs | 5 ++++
5 files changed, 55 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index de2a4536b532..771b018ba580 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -280,7 +280,6 @@ fn new_booter(data: &[u8]) -> Result<Self> {
#[derive(Copy, Clone, Debug, PartialEq)]
pub(crate) enum BooterKind {
Loader,
- #[expect(unused)]
Unloader,
}
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 8810cb49db67..4108f28cd338 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -144,7 +144,6 @@ pub(crate) enum FwsecCommand {
/// image into it.
Frts { frts_addr: u64, frts_size: u64 },
/// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
- #[expect(dead_code)]
Sb,
}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 8f2ae9e8a519..37d0e4587ed3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -286,7 +286,13 @@ pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
return;
};
- let _ = kernel::warn_on_err!(self.gsp.unload(dev, bar, &self.gsp_falcon));
+ let _ = kernel::warn_on_err!(self.gsp.unload(
+ dev,
+ bar,
+ self.spec.chipset,
+ &self.gsp_falcon,
+ &self.sec2_falcon,
+ ));
self.sysmem_flush.unregister(bar);
}
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 3f4e99b2497b..e00cfebe5d11 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -267,7 +267,9 @@ pub(crate) fn unload(
&self,
dev: &device::Device<device::Bound>,
bar: &Bar0,
+ chipset: Chipset,
gsp_falcon: &Falcon<Gsp>,
+ sec2_falcon: &Falcon<Sec2>,
) -> Result {
// Shut down the GSP.
@@ -275,6 +277,47 @@ pub(crate) fn unload(
.inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
dev_dbg!(dev, "GSP shut down\n");
+ // Run FWSEC-SB to reset the GSP falcon to its pre-libos state.
+
+ let bios = Vbios::new(dev, bar)?;
+ let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, &bios, FwsecCommand::Sb)?;
+
+ if chipset.needs_fwsec_bootloader() {
+ let fwsec_sb_bl = FwsecFirmwareWithBl::new(fwsec_sb, dev, chipset)?;
+ // Load and run the bootloader, which will load FWSEC-SB and run it.
+ fwsec_sb_bl.run(dev, gsp_falcon, bar)?;
+ } else {
+ // Load and run FWSEC-SB directly.
+ fwsec_sb.run(dev, gsp_falcon, bar)?;
+ }
+ dev_dbg!(dev, "FWSEC SB completed\n");
+
+ // Remove WPR2 region if set.
+
+ let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+ if wpr2_hi.is_wpr2_set() {
+ let booter_unloader = BooterFirmware::new(
+ dev,
+ BooterKind::Unloader,
+ chipset,
+ FIRMWARE_VERSION,
+ sec2_falcon,
+ bar,
+ )?;
+
+ sec2_falcon.reset(bar)?;
+ sec2_falcon.load(dev, bar, &booter_unloader)?;
+ let _ = sec2_falcon.boot(bar, Some(0xff), Some(0xff))?;
+
+ let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+ if wpr2_hi.is_wpr2_set() {
+ dev_err!(dev, "WPR2 region still set after Booter Unloader ran\n");
+ return Err(EBUSY);
+ }
+ }
+
+ dev_info!(dev, "successfully unloaded\n");
+
Ok(())
}
}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 2f171a4ff9ba..21ff5d15f648 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -176,6 +176,11 @@ impl NV_PFB_PRI_MMU_WPR2_ADDR_HI {
pub(crate) fn higher_bound(self) -> u64 {
u64::from(self.hi_val()) << 12
}
+
+ /// Returns whether the WPR2 region is currently set.
+ pub(crate) fn is_wpr2_set(self) -> bool {
+ self.hi_val() != 0
+ }
}
// PGSP
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/5] rust: add warn_on_err macro
2026-04-21 6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
@ 2026-04-21 7:07 ` Eliot Courtney
0 siblings, 0 replies; 11+ messages in thread
From: Eliot Courtney @ 2026-04-21 7:07 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Bjorn Helgaas, Krzysztof Wilczyński,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux
On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
> While we already have the `warn_on` macro, a common usage pattern in
> Rust is to check whether a `Result` is an error. Add a helper macro that
> allows this.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> rust/kernel/bug.rs | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/rust/kernel/bug.rs b/rust/kernel/bug.rs
> index ed943960f851..2fefc0aeef81 100644
> --- a/rust/kernel/bug.rs
> +++ b/rust/kernel/bug.rs
> @@ -130,3 +130,13 @@ macro_rules! warn_on {
> cond
> }};
> }
> +
> +/// Report a warning if `res` is an error and return it unmodified.
nit: "it" -> "the [`Result`] `res`" since it somewhat reads like it's
returning the error
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
> +#[macro_export]
> +macro_rules! warn_on_err {
> + ($res:expr) => {{
> + let res = $res;
> + let _ = $crate::warn_on!(res.is_err());
> + res
> + }};
> +}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/5] gpu: nova-core: use warn_on_err macro
2026-04-21 6:16 ` [PATCH v2 2/5] gpu: nova-core: use " Alexandre Courbot
@ 2026-04-21 7:09 ` Eliot Courtney
0 siblings, 0 replies; 11+ messages in thread
From: Eliot Courtney @ 2026-04-21 7:09 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Bjorn Helgaas, Krzysztof Wilczyński,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux
On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
> Use the warn_on_err macro in the unbind sequence, and refactor it to
> early-exit on the failure path as we will need to use the bar for other
> purposes.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module
2026-04-21 6:16 ` [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
@ 2026-04-21 8:58 ` Eliot Courtney
0 siblings, 0 replies; 11+ messages in thread
From: Eliot Courtney @ 2026-04-21 8:58 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Bjorn Helgaas, Krzysztof Wilczyński,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux
On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
> Importing all the firmware commands like we did is a bit confusing, as
> the layer of a command type (fw or GSP) cannot be inferred from looking
> at its name alone. Furthermore it makes it impossible to create commands
> that have the same name as their firmware command.
>
> Thus, stop importing all commands and refer to them from the `fw` module
> instead.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
2026-04-21 6:16 ` [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-04-21 9:42 ` Eliot Courtney
2026-04-21 14:27 ` Alexandre Courbot
0 siblings, 1 reply; 11+ messages in thread
From: Eliot Courtney @ 2026-04-21 9:42 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Bjorn Helgaas, Krzysztof Wilczyński,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Trevor Gross, Boqun Feng
Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
Eliot Courtney, nouveau, dri-devel, linux-kernel, linux-pci,
rust-for-linux
On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
> Currently, the GSP is left running after the driver is unbound. This is
> not great for several reasons, notably that it can still access shared
> memory areas that the kernel will now reclaim (especially problematic on
> setups without an IOMMU).
>
> Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
> unbinding. This stops the GSP and lets us proceed with the rest of the
> unbind sequence in the next patch.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> drivers/gpu/nova-core/gpu.rs | 5 +++
> drivers/gpu/nova-core/gsp/boot.rs | 40 +++++++++++++++++++++++
> drivers/gpu/nova-core/gsp/commands.rs | 36 ++++++++++++++++++++
> drivers/gpu/nova-core/gsp/fw.rs | 4 +++
> drivers/gpu/nova-core/gsp/fw/commands.rs | 23 +++++++++++++
> drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
> 6 files changed, 116 insertions(+)
>
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 1701c2600538..8f2ae9e8a519 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -277,12 +277,17 @@ pub(crate) fn new<'a>(
>
> /// Called when the corresponding [`Device`](device::Device) is unbound.
> ///
> + /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
> + /// memory page.
> + ///
> /// Note: This method must only be called from `Driver::unbind`.
> pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
> let Ok(bar) = kernel::warn_on_err!(self.bar.access(dev)) else {
> return;
> };
>
> + let _ = kernel::warn_on_err!(self.gsp.unload(dev, bar, &self.gsp_falcon));
> +
If I remember correctly, at least on blackwell, doing the full unloading
procedure here actually resets the sysmem flush register, so you get a
spurious warning. In my local branch I actually swapped the order of
this and unregister to get rid of it (not sure if this is correct though).
My sysmem flush patch that skips printing the warning if the value is 0
would also fix this, if we care. Have you noticed this happening too?
> self.sysmem_flush.unregister(bar);
> }
> }
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index 18f356c9178e..3f4e99b2497b 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -33,6 +33,7 @@
> },
> gpu::Chipset,
> gsp::{
> + cmdq::Cmdq,
> commands,
> sequencer::{
> GspSequencer,
> @@ -237,4 +238,43 @@ pub(crate) fn boot(
>
> Ok(())
> }
> +
> + /// Shut down the GSP and wait until it is offline.
> + fn shutdown_gsp(
> + cmdq: &Cmdq,
> + bar: &Bar0,
> + gsp_falcon: &Falcon<Gsp>,
> + suspend: bool,
> + ) -> Result<()> {
> + // Send command to shutdown GSP and wait for response.
> + cmdq.send_command(bar, commands::UnloadingGuestDriver::new(suspend))?;
> +
> + // Wait until GSP signals it is suspended.
> + const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
If this can change based on firmware, should it be taken in via
bindings? I also noticed in openrm 595, this is waited on by checking the
bit rather than by strict equality (see _kgspIsProcessorSuspended). So
it may be more defensive to check the bit rather than strict equality
(even though that is correct for 570 according to openrm code).
> + read_poll_timeout(
> + || Ok(gsp_falcon.read_mailbox0(bar)),
> + |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
> + Delta::from_millis(10),
> + Delta::from_secs(5),
> + )
> + .map(|_| ())
> + }
> +
> + /// Attempts to unload the GSP firmware.
> + ///
> + /// This stops all activity on the GSP.
> + pub(crate) fn unload(
> + &self,
> + dev: &device::Device<device::Bound>,
> + bar: &Bar0,
> + gsp_falcon: &Falcon<Gsp>,
> + ) -> Result {
> + // Shut down the GSP.
> +
> + Self::shutdown_gsp(&self.cmdq, bar, gsp_falcon, false)
> + .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
It looks like "suspend" is only ever false here? Will this be used
later? If we want to keep this, it may be nice to use a 2 discriminant
enum so we don't have misc boolean parameters hanging around.
nit: dev_err! should have \n?
> + dev_dbg!(dev, "GSP shut down\n");
> +
> + Ok(())
> + }
> }
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index c80df421702c..fb94460c451e 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -237,3 +237,39 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
> pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
> cmdq.send_command(bar, GetGspStaticInfo)
> }
> +
> +pub(crate) struct UnloadingGuestDriver {
> + suspend: bool,
> +}
This feels like it only makes sense to call from within the gsp module,
so I wonder if it can be pub(super) (prolly a few others in this file
could be too, ofc not relevant for this series).
nit: Should this have doc comment?
> +
> +impl UnloadingGuestDriver {
> + pub(crate) fn new(suspend: bool) -> Self {
> + Self { suspend }
> + }
> +}
> +
> +impl CommandToGsp for UnloadingGuestDriver {
> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
> + type Command = fw::commands::UnloadingGuestDriver;
> + type Reply = UnloadingGuestDriverReply;
> + type InitError = Infallible;
> +
> + fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> + fw::commands::UnloadingGuestDriver::new(self.suspend)
> + }
> +}
> +
> +pub(crate) struct UnloadingGuestDriverReply;
> +
> +impl MessageFromGsp for UnloadingGuestDriverReply {
> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
> + type InitError = Infallible;
> + type Message = ();
> +
> + fn read(
> + _msg: &Self::Message,
> + _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
> + ) -> Result<Self, Self::InitError> {
> + Ok(UnloadingGuestDriverReply)
> + }
> +}
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c8a74f0e8ac..59b4c4883185 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -278,6 +278,7 @@ pub(crate) enum MsgFunction {
> Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
> SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
> SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>
> // Event codes
> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
> @@ -322,6 +323,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
> Ok(MsgFunction::SetGuestSystemInfo)
> }
> bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
> + Ok(MsgFunction::UnloadingGuestDriver)
> + }
>
> // Event codes
> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
> index db46276430be..71c8690c9322 100644
> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
> @@ -129,3 +129,26 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
> // SAFETY: This struct only contains integer types for which all bit patterns
> // are valid.
> unsafe impl FromBytes for GspStaticConfigInfo {}
> +
> +/// Payload of the `UnloadingGuestDriver` command and message.
> +#[repr(transparent)]
> +#[derive(Clone, Copy, Debug, Zeroable)]
> +pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
> +
> +impl UnloadingGuestDriver {
> + pub(crate) fn new(suspend: bool) -> Self {
> + Self(bindings::rpc_unloading_guest_driver_v1F_07 {
> + bInPMTransition: u8::from(suspend),
> + bGc6Entering: 0,
> + newLevel: if suspend { 3 } else { 0 },
Why '3'? Is there a binding that it makes sense to use for this?
> + ..Zeroable::zeroed()
> + })
> + }
> +}
> +
> +// SAFETY: Padding is explicit and will not contain uninitialized data.
> +unsafe impl AsBytes for UnloadingGuestDriver {}
> +
> +// SAFETY: This struct only contains integer types for which all bit patterns
> +// are valid.
> +unsafe impl FromBytes for UnloadingGuestDriver {}
> diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> index 334e8be5fde8..5d8e4c0ad904 100644
> --- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> @@ -880,6 +880,14 @@ fn default() -> Self {
> }
> }
> #[repr(C)]
> +#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
> +pub struct rpc_unloading_guest_driver_v1F_07 {
> + pub bInPMTransition: u8_,
> + pub bGc6Entering: u8_,
> + pub __bindgen_padding_0: [u8; 2usize],
> + pub newLevel: u32_,
> +}
> +#[repr(C)]
> #[derive(Debug, Default, MaybeZeroable)]
> pub struct rpc_run_cpu_sequencer_v17_00 {
> pub bufferSizeDWord: u32_,
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
2026-04-21 9:42 ` Eliot Courtney
@ 2026-04-21 14:27 ` Alexandre Courbot
0 siblings, 0 replies; 11+ messages in thread
From: Alexandre Courbot @ 2026-04-21 14:27 UTC (permalink / raw)
To: Eliot Courtney
Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross, Boqun Feng, John Hubbard, Alistair Popple,
Joel Fernandes, Timur Tabi, nouveau, dri-devel, linux-kernel,
linux-pci, rust-for-linux
On Tue Apr 21, 2026 at 6:42 PM JST, Eliot Courtney wrote:
> On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
>> Currently, the GSP is left running after the driver is unbound. This is
>> not great for several reasons, notably that it can still access shared
>> memory areas that the kernel will now reclaim (especially problematic on
>> setups without an IOMMU).
>>
>> Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
>> unbinding. This stops the GSP and lets us proceed with the rest of the
>> unbind sequence in the next patch.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> drivers/gpu/nova-core/gpu.rs | 5 +++
>> drivers/gpu/nova-core/gsp/boot.rs | 40 +++++++++++++++++++++++
>> drivers/gpu/nova-core/gsp/commands.rs | 36 ++++++++++++++++++++
>> drivers/gpu/nova-core/gsp/fw.rs | 4 +++
>> drivers/gpu/nova-core/gsp/fw/commands.rs | 23 +++++++++++++
>> drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
>> 6 files changed, 116 insertions(+)
>>
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 1701c2600538..8f2ae9e8a519 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -277,12 +277,17 @@ pub(crate) fn new<'a>(
>>
>> /// Called when the corresponding [`Device`](device::Device) is unbound.
>> ///
>> + /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
>> + /// memory page.
>> + ///
>> /// Note: This method must only be called from `Driver::unbind`.
>> pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
>> let Ok(bar) = kernel::warn_on_err!(self.bar.access(dev)) else {
>> return;
>> };
>>
>> + let _ = kernel::warn_on_err!(self.gsp.unload(dev, bar, &self.gsp_falcon));
>> +
>
> If I remember correctly, at least on blackwell, doing the full unloading
> procedure here actually resets the sysmem flush register, so you get a
> spurious warning. In my local branch I actually swapped the order of
> this and unregister to get rid of it (not sure if this is correct though).
> My sysmem flush patch that skips printing the warning if the value is 0
> would also fix this, if we care. Have you noticed this happening too?
I haven't - this patch works fine and without any warning on my
Blackwell card. But that deserves further investigation, so let's
revisit once we add the Blackwell series and your own patch, as this
series only supports Turing/Ampere for now.
>
>> self.sysmem_flush.unregister(bar);
>> }
>> }
>> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
>> index 18f356c9178e..3f4e99b2497b 100644
>> --- a/drivers/gpu/nova-core/gsp/boot.rs
>> +++ b/drivers/gpu/nova-core/gsp/boot.rs
>> @@ -33,6 +33,7 @@
>> },
>> gpu::Chipset,
>> gsp::{
>> + cmdq::Cmdq,
>> commands,
>> sequencer::{
>> GspSequencer,
>> @@ -237,4 +238,43 @@ pub(crate) fn boot(
>>
>> Ok(())
>> }
>> +
>> + /// Shut down the GSP and wait until it is offline.
>> + fn shutdown_gsp(
>> + cmdq: &Cmdq,
>> + bar: &Bar0,
>> + gsp_falcon: &Falcon<Gsp>,
>> + suspend: bool,
>> + ) -> Result<()> {
>> + // Send command to shutdown GSP and wait for response.
>> + cmdq.send_command(bar, commands::UnloadingGuestDriver::new(suspend))?;
>> +
>> + // Wait until GSP signals it is suspended.
>> + const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
>
> If this can change based on firmware, should it be taken in via
> bindings? I also noticed in openrm 595, this is waited on by checking the
> bit rather than by strict equality (see _kgspIsProcessorSuspended). So
> it may be more defensive to check the bit rather than strict equality
> (even though that is correct for 570 according to openrm code).
Indeed, in 570.144 the code is actually
return (mailbox == 0x80000000);
... but I checked against `main` and it has been changed to what you
said, so testing the bit is probably better indeed.
The value is also a file-local constant, so not something we can get
through bindings unfortunately. :/ But I suspect we can rely on it being
stable.
>
>> + read_poll_timeout(
>> + || Ok(gsp_falcon.read_mailbox0(bar)),
>> + |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
>> + Delta::from_millis(10),
>> + Delta::from_secs(5),
>> + )
>> + .map(|_| ())
>> + }
>> +
>> + /// Attempts to unload the GSP firmware.
>> + ///
>> + /// This stops all activity on the GSP.
>> + pub(crate) fn unload(
>> + &self,
>> + dev: &device::Device<device::Bound>,
>> + bar: &Bar0,
>> + gsp_falcon: &Falcon<Gsp>,
>> + ) -> Result {
>> + // Shut down the GSP.
>> +
>> + Self::shutdown_gsp(&self.cmdq, bar, gsp_falcon, false)
>> + .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
>
> It looks like "suspend" is only ever false here? Will this be used
> later? If we want to keep this, it may be nice to use a 2 discriminant
> enum so we don't have misc boolean parameters hanging around.
It is in prevision of suspend/resume support yes. Agreed about the enum.
>
> nit: dev_err! should have \n?
It should!
>
>> + dev_dbg!(dev, "GSP shut down\n");
>> +
>> + Ok(())
>> + }
>> }
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index c80df421702c..fb94460c451e 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -237,3 +237,39 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
>> pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
>> cmdq.send_command(bar, GetGspStaticInfo)
>> }
>> +
>> +pub(crate) struct UnloadingGuestDriver {
>> + suspend: bool,
>> +}
>
> This feels like it only makes sense to call from within the gsp module,
> so I wonder if it can be pub(super) (prolly a few others in this file
> could be too, ofc not relevant for this series).
I'll review that, we do want to limit visibility as much as possible.
>
> nit: Should this have doc comment?
Yep, I'll add that.
>
>> +
>> +impl UnloadingGuestDriver {
>> + pub(crate) fn new(suspend: bool) -> Self {
>> + Self { suspend }
>> + }
>> +}
>> +
>> +impl CommandToGsp for UnloadingGuestDriver {
>> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
>> + type Command = fw::commands::UnloadingGuestDriver;
>> + type Reply = UnloadingGuestDriverReply;
>> + type InitError = Infallible;
>> +
>> + fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>> + fw::commands::UnloadingGuestDriver::new(self.suspend)
>> + }
>> +}
>> +
>> +pub(crate) struct UnloadingGuestDriverReply;
>> +
>> +impl MessageFromGsp for UnloadingGuestDriverReply {
>> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
>> + type InitError = Infallible;
>> + type Message = ();
>> +
>> + fn read(
>> + _msg: &Self::Message,
>> + _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
>> + ) -> Result<Self, Self::InitError> {
>> + Ok(UnloadingGuestDriverReply)
>> + }
>> +}
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 0c8a74f0e8ac..59b4c4883185 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -278,6 +278,7 @@ pub(crate) enum MsgFunction {
>> Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
>> SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
>> SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
>> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>>
>> // Event codes
>> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
>> @@ -322,6 +323,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>> Ok(MsgFunction::SetGuestSystemInfo)
>> }
>> bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
>> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
>> + Ok(MsgFunction::UnloadingGuestDriver)
>> + }
>>
>> // Event codes
>> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
>> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
>> index db46276430be..71c8690c9322 100644
>> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
>> @@ -129,3 +129,26 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>> // SAFETY: This struct only contains integer types for which all bit patterns
>> // are valid.
>> unsafe impl FromBytes for GspStaticConfigInfo {}
>> +
>> +/// Payload of the `UnloadingGuestDriver` command and message.
>> +#[repr(transparent)]
>> +#[derive(Clone, Copy, Debug, Zeroable)]
>> +pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
>> +
>> +impl UnloadingGuestDriver {
>> + pub(crate) fn new(suspend: bool) -> Self {
>> + Self(bindings::rpc_unloading_guest_driver_v1F_07 {
>> + bInPMTransition: u8::from(suspend),
>> + bGc6Entering: 0,
>> + newLevel: if suspend { 3 } else { 0 },
>
> Why '3'? Is there a binding that it makes sense to use for this?
It's for suspend level 3 (suspend to RAM) if `suspend` is true, or
normal destructive unloading otherwise. OpenRM has a set of possible
values (`NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_*`) that directly
translate to the corresponding number, but at least they limit the
possible values to the valid set. I'll add an enum and the corresponding
bindings.
Thanks for the review!
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-04-21 14:27 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
2026-04-21 7:07 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 2/5] gpu: nova-core: use " Alexandre Courbot
2026-04-21 7:09 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
2026-04-21 8:58 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-04-21 9:42 ` Eliot Courtney
2026-04-21 14:27 ` Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox