public inbox for dri-devel@lists.freedesktop.org
 help / color / mirror / Atom feed
From: "Eliot Courtney" <ecourtney@nvidia.com>
To: "Alexandre Courbot" <acourbot@nvidia.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Miguel Ojeda" <ojeda@kernel.org>, "Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Boqun Feng" <boqun@kernel.org>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Timur Tabi" <ttabi@nvidia.com>,
	"Eliot Courtney" <ecourtney@nvidia.com>,
	<nouveau@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <rust-for-linux@vger.kernel.org>
Subject: Re: [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
Date: Tue, 21 Apr 2026 18:42:12 +0900	[thread overview]
Message-ID: <DHYQGLYYXEN6.11Q3FQGZMWCJ6@nvidia.com> (raw)
In-Reply-To: <20260421-nova-unload-v2-4-2fe54963af8b@nvidia.com>

On Tue Apr 21, 2026 at 3:16 PM JST, Alexandre Courbot wrote:
> Currently, the GSP is left running after the driver is unbound. This is
> not great for several reasons, notably that it can still access shared
> memory areas that the kernel will now reclaim (especially problematic on
> setups without an IOMMU).
>
> Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
> unbinding. This stops the GSP and lets us proceed with the rest of the
> unbind sequence in the next patch.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gpu.rs                      |  5 +++
>  drivers/gpu/nova-core/gsp/boot.rs                 | 40 +++++++++++++++++++++++
>  drivers/gpu/nova-core/gsp/commands.rs             | 36 ++++++++++++++++++++
>  drivers/gpu/nova-core/gsp/fw.rs                   |  4 +++
>  drivers/gpu/nova-core/gsp/fw/commands.rs          | 23 +++++++++++++
>  drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  8 +++++
>  6 files changed, 116 insertions(+)
>
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 1701c2600538..8f2ae9e8a519 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -277,12 +277,17 @@ pub(crate) fn new<'a>(
>  
>      /// Called when the corresponding [`Device`](device::Device) is unbound.
>      ///
> +    /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
> +    /// memory page.
> +    ///
>      /// Note: This method must only be called from `Driver::unbind`.
>      pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
>          let Ok(bar) = kernel::warn_on_err!(self.bar.access(dev)) else {
>              return;
>          };
>  
> +        let _ = kernel::warn_on_err!(self.gsp.unload(dev, bar, &self.gsp_falcon));
> +

If I remember correctly, at least on blackwell, doing the full unloading
procedure here actually resets the sysmem flush register, so you get a
spurious warning. In my local branch I actually swapped the order of
this and unregister to get rid of it (not sure if this is correct though).
My sysmem flush patch that skips printing the warning if the value is 0
would also fix this, if we care. Have you noticed this happening too?

>          self.sysmem_flush.unregister(bar);
>      }
>  }
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index 18f356c9178e..3f4e99b2497b 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -33,6 +33,7 @@
>      },
>      gpu::Chipset,
>      gsp::{
> +        cmdq::Cmdq,
>          commands,
>          sequencer::{
>              GspSequencer,
> @@ -237,4 +238,43 @@ pub(crate) fn boot(
>  
>          Ok(())
>      }
> +
> +    /// Shut down the GSP and wait until it is offline.
> +    fn shutdown_gsp(
> +        cmdq: &Cmdq,
> +        bar: &Bar0,
> +        gsp_falcon: &Falcon<Gsp>,
> +        suspend: bool,
> +    ) -> Result<()> {
> +        // Send command to shutdown GSP and wait for response.
> +        cmdq.send_command(bar, commands::UnloadingGuestDriver::new(suspend))?;
> +
> +        // Wait until GSP signals it is suspended.
> +        const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;

If this can change based on firmware, should it be taken in via
bindings? I also noticed in openrm 595, this is waited on by checking the
bit rather than by strict equality (see _kgspIsProcessorSuspended). So
it may be more defensive to check the bit rather than strict equality
(even though that is correct for 570 according to openrm code).

> +        read_poll_timeout(
> +            || Ok(gsp_falcon.read_mailbox0(bar)),
> +            |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
> +            Delta::from_millis(10),
> +            Delta::from_secs(5),
> +        )
> +        .map(|_| ())
> +    }
> +
> +    /// Attempts to unload the GSP firmware.
> +    ///
> +    /// This stops all activity on the GSP.
> +    pub(crate) fn unload(
> +        &self,
> +        dev: &device::Device<device::Bound>,
> +        bar: &Bar0,
> +        gsp_falcon: &Falcon<Gsp>,
> +    ) -> Result {
> +        // Shut down the GSP.
> +
> +        Self::shutdown_gsp(&self.cmdq, bar, gsp_falcon, false)
> +            .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;

It looks like "suspend" is only ever false here? Will this be used
later? If we want to keep this, it may be nice to use a 2 discriminant
enum so we don't have misc boolean parameters hanging around.

nit: dev_err! should have \n?

> +        dev_dbg!(dev, "GSP shut down\n");
> +
> +        Ok(())
> +    }
>  }
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index c80df421702c..fb94460c451e 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -237,3 +237,39 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
>  pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
>      cmdq.send_command(bar, GetGspStaticInfo)
>  }
> +
> +pub(crate) struct UnloadingGuestDriver {
> +    suspend: bool,
> +}

This feels like it only makes sense to call from within the gsp module,
so I wonder if it can be pub(super) (prolly a few others in this file
could be too, ofc not relevant for this series).

nit: Should this have doc comment?

> +
> +impl UnloadingGuestDriver {
> +    pub(crate) fn new(suspend: bool) -> Self {
> +        Self { suspend }
> +    }
> +}
> +
> +impl CommandToGsp for UnloadingGuestDriver {
> +    const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
> +    type Command = fw::commands::UnloadingGuestDriver;
> +    type Reply = UnloadingGuestDriverReply;
> +    type InitError = Infallible;
> +
> +    fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> +        fw::commands::UnloadingGuestDriver::new(self.suspend)
> +    }
> +}
> +
> +pub(crate) struct UnloadingGuestDriverReply;
> +
> +impl MessageFromGsp for UnloadingGuestDriverReply {
> +    const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
> +    type InitError = Infallible;
> +    type Message = ();
> +
> +    fn read(
> +        _msg: &Self::Message,
> +        _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
> +    ) -> Result<Self, Self::InitError> {
> +        Ok(UnloadingGuestDriverReply)
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c8a74f0e8ac..59b4c4883185 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -278,6 +278,7 @@ pub(crate) enum MsgFunction {
>      Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
>      SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
>      SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
> +    UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>  
>      // Event codes
>      GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
> @@ -322,6 +323,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>                  Ok(MsgFunction::SetGuestSystemInfo)
>              }
>              bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
> +            bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
> +                Ok(MsgFunction::UnloadingGuestDriver)
> +            }
>  
>              // Event codes
>              bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
> index db46276430be..71c8690c9322 100644
> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
> @@ -129,3 +129,26 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>  // SAFETY: This struct only contains integer types for which all bit patterns
>  // are valid.
>  unsafe impl FromBytes for GspStaticConfigInfo {}
> +
> +/// Payload of the `UnloadingGuestDriver` command and message.
> +#[repr(transparent)]
> +#[derive(Clone, Copy, Debug, Zeroable)]
> +pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
> +
> +impl UnloadingGuestDriver {
> +    pub(crate) fn new(suspend: bool) -> Self {
> +        Self(bindings::rpc_unloading_guest_driver_v1F_07 {
> +            bInPMTransition: u8::from(suspend),
> +            bGc6Entering: 0,
> +            newLevel: if suspend { 3 } else { 0 },

Why '3'? Is there a binding that it makes sense to use for this?

> +            ..Zeroable::zeroed()
> +        })
> +    }
> +}
> +
> +// SAFETY: Padding is explicit and will not contain uninitialized data.
> +unsafe impl AsBytes for UnloadingGuestDriver {}
> +
> +// SAFETY: This struct only contains integer types for which all bit patterns
> +// are valid.
> +unsafe impl FromBytes for UnloadingGuestDriver {}
> diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> index 334e8be5fde8..5d8e4c0ad904 100644
> --- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> @@ -880,6 +880,14 @@ fn default() -> Self {
>      }
>  }
>  #[repr(C)]
> +#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
> +pub struct rpc_unloading_guest_driver_v1F_07 {
> +    pub bInPMTransition: u8_,
> +    pub bGc6Entering: u8_,
> +    pub __bindgen_padding_0: [u8; 2usize],
> +    pub newLevel: u32_,
> +}
> +#[repr(C)]
>  #[derive(Debug, Default, MaybeZeroable)]
>  pub struct rpc_run_cpu_sequencer_v17_00 {
>      pub bufferSizeDWord: u32_,


  reply	other threads:[~2026-04-21  9:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-04-21  6:16 ` [PATCH v2 1/5] rust: add warn_on_err macro Alexandre Courbot
2026-04-21  7:07   ` Eliot Courtney
2026-04-21  6:16 ` [PATCH v2 2/5] gpu: nova-core: use " Alexandre Courbot
2026-04-21  7:09   ` Eliot Courtney
2026-04-21  6:16 ` [PATCH v2 3/5] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
2026-04-21  8:58   ` Eliot Courtney
2026-04-21  6:16 ` [PATCH v2 4/5] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-04-21  9:42   ` Eliot Courtney [this message]
2026-04-21 14:27     ` Alexandre Courbot
2026-04-21  6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
2026-04-22  6:01   ` Eliot Courtney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DHYQGLYYXEN6.11Q3FQGZMWCJ6@nvidia.com \
    --to=ecourtney@nvidia.com \
    --cc=a.hindborg@kernel.org \
    --cc=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apopple@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun@kernel.org \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gary@garyguo.net \
    --cc=jhubbard@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=kwilczynski@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lossin@kernel.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=simona@ffwll.ch \
    --cc=tmgross@umich.edu \
    --cc=ttabi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox