From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "Eliot Courtney" <ecourtney@nvidia.com>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Miguel Ojeda" <ojeda@kernel.org>, "Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <lossin@kernel.org>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Trevor Gross" <tmgross@umich.edu>,
"Boqun Feng" <boqun@kernel.org>,
"John Hubbard" <jhubbard@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Timur Tabi" <ttabi@nvidia.com>,
nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
rust-for-linux@vger.kernel.org
Subject: Re: [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
Date: Tue, 21 Apr 2026 23:27:33 +0900 [thread overview]
Message-ID: <DHYWJ32M02N4.2954OSJXFGD1G@nvidia.com> (raw)
In-Reply-To: <DHYQGLYYXEN6.11Q3FQGZMWCJ6@nvidia.com>
On Tue Apr 21, 2026 at 6:42 PM JST, Eliot Courtney wrote:
> On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
>> Currently, the GSP is left running after the driver is unbound. This is
>> not great for several reasons, notably that it can still access shared
>> memory areas that the kernel will now reclaim (especially problematic on
>> setups without an IOMMU).
>>
>> Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
>> unbinding. This stops the GSP and lets us proceed with the rest of the
>> unbind sequence in the next patch.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> drivers/gpu/nova-core/gpu.rs | 5 +++
>> drivers/gpu/nova-core/gsp/boot.rs | 40 +++++++++++++++++++++++
>> drivers/gpu/nova-core/gsp/commands.rs | 36 ++++++++++++++++++++
>> drivers/gpu/nova-core/gsp/fw.rs | 4 +++
>> drivers/gpu/nova-core/gsp/fw/commands.rs | 23 +++++++++++++
>> drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
>> 6 files changed, 116 insertions(+)
>>
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 1701c2600538..8f2ae9e8a519 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -277,12 +277,17 @@ pub(crate) fn new<'a>(
>>
>> /// Called when the corresponding [`Device`](device::Device) is unbound.
>> ///
>> + /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
>> + /// memory page.
>> + ///
>> /// Note: This method must only be called from `Driver::unbind`.
>> pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
>> let Ok(bar) = kernel::warn_on_err!(self.bar.access(dev)) else {
>> return;
>> };
>>
>> + let _ = kernel::warn_on_err!(self.gsp.unload(dev, bar, &self.gsp_falcon));
>> +
>
> If I remember correctly, at least on blackwell, doing the full unloading
> procedure here actually resets the sysmem flush register, so you get a
> spurious warning. In my local branch I actually swapped the order of
> this and unregister to get rid of it (not sure if this is correct though).
> My sysmem flush patch that skips printing the warning if the value is 0
> would also fix this, if we care. Have you noticed this happening too?
I haven't - this patch works fine and without any warning on my
Blackwell card. But that deserves further investigation, so let's
revisit once we add the Blackwell series and your own patch, as this
series only supports Turing/Ampere for now.
>
>> self.sysmem_flush.unregister(bar);
>> }
>> }
>> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
>> index 18f356c9178e..3f4e99b2497b 100644
>> --- a/drivers/gpu/nova-core/gsp/boot.rs
>> +++ b/drivers/gpu/nova-core/gsp/boot.rs
>> @@ -33,6 +33,7 @@
>> },
>> gpu::Chipset,
>> gsp::{
>> + cmdq::Cmdq,
>> commands,
>> sequencer::{
>> GspSequencer,
>> @@ -237,4 +238,43 @@ pub(crate) fn boot(
>>
>> Ok(())
>> }
>> +
>> + /// Shut down the GSP and wait until it is offline.
>> + fn shutdown_gsp(
>> + cmdq: &Cmdq,
>> + bar: &Bar0,
>> + gsp_falcon: &Falcon<Gsp>,
>> + suspend: bool,
>> + ) -> Result<()> {
>> + // Send command to shutdown GSP and wait for response.
>> + cmdq.send_command(bar, commands::UnloadingGuestDriver::new(suspend))?;
>> +
>> + // Wait until GSP signals it is suspended.
>> + const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
>
> If this can change based on firmware, should it be taken in via
> bindings? I also noticed in openrm 595, this is waited on by checking the
> bit rather than by strict equality (see _kgspIsProcessorSuspended). So
> it may be more defensive to check the bit rather than strict equality
> (even though that is correct for 570 according to openrm code).
Indeed, in 570.144 the code is actually
return (mailbox == 0x80000000);
... but I checked against `main` and it has been changed to what you
said, so testing the bit is probably better indeed.
The value is also a file-local constant, so not something we can get
through bindings unfortunately. :/ But I suspect we can rely on it being
stable.
>
>> + read_poll_timeout(
>> + || Ok(gsp_falcon.read_mailbox0(bar)),
>> + |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
>> + Delta::from_millis(10),
>> + Delta::from_secs(5),
>> + )
>> + .map(|_| ())
>> + }
>> +
>> + /// Attempts to unload the GSP firmware.
>> + ///
>> + /// This stops all activity on the GSP.
>> + pub(crate) fn unload(
>> + &self,
>> + dev: &device::Device<device::Bound>,
>> + bar: &Bar0,
>> + gsp_falcon: &Falcon<Gsp>,
>> + ) -> Result {
>> + // Shut down the GSP.
>> +
>> + Self::shutdown_gsp(&self.cmdq, bar, gsp_falcon, false)
>> + .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
>
> It looks like "suspend" is only ever false here? Will this be used
> later? If we want to keep this, it may be nice to use a 2 discriminant
> enum so we don't have misc boolean parameters hanging around.
It is in prevision of suspend/resume support yes. Agreed about the enum.
>
> nit: dev_err! should have \n?
It should!
>
>> + dev_dbg!(dev, "GSP shut down\n");
>> +
>> + Ok(())
>> + }
>> }
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index c80df421702c..fb94460c451e 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -237,3 +237,39 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
>> pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
>> cmdq.send_command(bar, GetGspStaticInfo)
>> }
>> +
>> +pub(crate) struct UnloadingGuestDriver {
>> + suspend: bool,
>> +}
>
> This feels like it only makes sense to call from within the gsp module,
> so I wonder if it can be pub(super) (prolly a few others in this file
> could be too, ofc not relevant for this series).
I'll review that, we do want to limit visibility as much as possible.
>
> nit: Should this have doc comment?
Yep, I'll add that.
>
>> +
>> +impl UnloadingGuestDriver {
>> + pub(crate) fn new(suspend: bool) -> Self {
>> + Self { suspend }
>> + }
>> +}
>> +
>> +impl CommandToGsp for UnloadingGuestDriver {
>> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
>> + type Command = fw::commands::UnloadingGuestDriver;
>> + type Reply = UnloadingGuestDriverReply;
>> + type InitError = Infallible;
>> +
>> + fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>> + fw::commands::UnloadingGuestDriver::new(self.suspend)
>> + }
>> +}
>> +
>> +pub(crate) struct UnloadingGuestDriverReply;
>> +
>> +impl MessageFromGsp for UnloadingGuestDriverReply {
>> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
>> + type InitError = Infallible;
>> + type Message = ();
>> +
>> + fn read(
>> + _msg: &Self::Message,
>> + _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
>> + ) -> Result<Self, Self::InitError> {
>> + Ok(UnloadingGuestDriverReply)
>> + }
>> +}
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 0c8a74f0e8ac..59b4c4883185 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -278,6 +278,7 @@ pub(crate) enum MsgFunction {
>> Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
>> SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
>> SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
>> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>>
>> // Event codes
>> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
>> @@ -322,6 +323,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>> Ok(MsgFunction::SetGuestSystemInfo)
>> }
>> bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
>> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
>> + Ok(MsgFunction::UnloadingGuestDriver)
>> + }
>>
>> // Event codes
>> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
>> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
>> index db46276430be..71c8690c9322 100644
>> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
>> @@ -129,3 +129,26 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>> // SAFETY: This struct only contains integer types for which all bit patterns
>> // are valid.
>> unsafe impl FromBytes for GspStaticConfigInfo {}
>> +
>> +/// Payload of the `UnloadingGuestDriver` command and message.
>> +#[repr(transparent)]
>> +#[derive(Clone, Copy, Debug, Zeroable)]
>> +pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
>> +
>> +impl UnloadingGuestDriver {
>> + pub(crate) fn new(suspend: bool) -> Self {
>> + Self(bindings::rpc_unloading_guest_driver_v1F_07 {
>> + bInPMTransition: u8::from(suspend),
>> + bGc6Entering: 0,
>> + newLevel: if suspend { 3 } else { 0 },
>
> Why '3'? Is there a binding that it makes sense to use for this?
It's for suspend level 3 (suspend to RAM) if `suspend` is true, or
normal destructive unloading otherwise. OpenRM has a set of possible
values (`NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_*`) that directly
translate to the corresponding number, but at least they limit the
possible values to the valid set. I'll add an enum and the corresponding
bindings.
Thanks for the review!
next prev parent reply other threads:[~2026-04-21 14:27 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-04-21 6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
2026-04-21 7:07 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 2/5] gpu: nova-core: use " Alexandre Courbot
2026-04-21 7:09 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
2026-04-21 8:58 ` Eliot Courtney
2026-04-21 6:16 ` [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-04-21 9:42 ` Eliot Courtney
2026-04-21 14:27 ` Alexandre Courbot [this message]
2026-04-21 6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DHYWJ32M02N4.2954OSJXFGD1G@nvidia.com \
--to=acourbot@nvidia.com \
--cc=a.hindborg@kernel.org \
--cc=airlied@gmail.com \
--cc=aliceryhl@google.com \
--cc=apopple@nvidia.com \
--cc=bhelgaas@google.com \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun@kernel.org \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=ecourtney@nvidia.com \
--cc=gary@garyguo.net \
--cc=jhubbard@nvidia.com \
--cc=joelagnelf@nvidia.com \
--cc=kwilczynski@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lossin@kernel.org \
--cc=nouveau@lists.freedesktop.org \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=simona@ffwll.ch \
--cc=tmgross@umich.edu \
--cc=ttabi@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox