Re: [PATCH] gpu: nova-core: fb: make sure to unregister SysmemFlush on boot failure

public inbox for rust-for-linux@vger.kernel.org
 help / color / mirror / Atom feed

From: "Gary Guo" <gary@garyguo.net>
To: "Eliot Courtney" <ecourtney@nvidia.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Timur Tabi" <ttabi@nvidia.com>, <rust-for-linux@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] gpu: nova-core: fb: make sure to unregister SysmemFlush on boot failure
Date: Fri, 10 Apr 2026 16:57:57 +0100	[thread overview]
Message-ID: <DHPLKB40FM08.1L3H2S589WUW@garyguo.net> (raw)
In-Reply-To: <20260409-fix-systemflush-v1-1-a1d6c968f17c@nvidia.com>

On Thu Apr 9, 2026 at 1:15 PM BST, Eliot Courtney wrote:
> Current `Gpu::new` will not unregister SysmemFlush if something fails
> after it is created, since it needs manual unregistering. Add a `Drop`
> implementation which will clean it up in that case. Maintain the manual
> unregister path because it can stay infallible, unlike the Drop path
> which depends on revocable access. In the case that `Gpu::new` fails the
> access is guaranteed to succeed, however.
>
> Fixes: 6554ad65b589 ("gpu: nova-core: register sysmem flush page")
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/fb.rs  | 29 ++++++++++++++++++++---------
>  drivers/gpu/nova-core/gpu.rs |  7 ++++++-
>  2 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
> index bdd5eed760e1..edfbdc9a2512 100644
> --- a/drivers/gpu/nova-core/fb.rs
> +++ b/drivers/gpu/nova-core/fb.rs
> @@ -7,6 +7,7 @@
>  
>  use kernel::{
>      device,
> +    devres::Devres,
>      dma::CoherentHandle,
>      fmt,
>      io::Io,
> @@ -16,7 +17,10 @@
>          Alignment, //
>      },
>      sizes::*,
> -    sync::aref::ARef, //
> +    sync::{
> +        aref::ARef,
> +        Arc, //
> +    },
>  };
>  
>  use crate::{
> @@ -46,12 +50,14 @@
>  /// Because of this, the sysmem flush memory page must be registered as early as possible during
>  /// driver initialization, and before any falcon is reset.
>  ///
> -/// Users are responsible for manually calling [`Self::unregister`] before dropping this object,
> -/// otherwise the GPU might still use it even after it has been freed.
> +/// Users should call [`Self::unregister`] before unloading to ensure unregistering is infallible.
> +/// [`Drop`] performs a best-effort fallback using revocable BAR access.
>  pub(crate) struct SysmemFlush {
>      /// Chipset we are operating on.
>      chipset: Chipset,
>      device: ARef<device::Device>,
> +    /// MMIO mapping of PCI BAR 0.
> +    bar: Arc<Devres<Bar0>>,
>      /// Keep the page alive as long as we need it.
>      page: CoherentHandle,
>  }
> @@ -60,6 +66,7 @@ impl SysmemFlush {
>      /// Allocate a memory page and register it as the sysmem flush page.
>      pub(crate) fn register(
>          dev: &device::Device<device::Bound>,
> +        devres_bar: Arc<Devres<Bar0>>,
>          bar: &Bar0,
>          chipset: Chipset,
>      ) -> Result<Self> {
> @@ -70,18 +77,17 @@ pub(crate) fn register(
>          Ok(Self {
>              chipset,
>              device: dev.into(),
> +            bar: devres_bar,
>              page,
>          })
>      }
>  
>      /// Unregister the managed sysmem flush page.
> -    ///
> -    /// In order to gracefully tear down the GPU, users must make sure to call this method before
> -    /// dropping the object.
>      pub(crate) fn unregister(&self, bar: &Bar0) {
>          let hal = hal::fb_hal(self.chipset);
> +        let registered_dma_handle = hal.read_sysmem_flush_page(bar);
>  
> -        if hal.read_sysmem_flush_page(bar) == self.page.dma_handle() {
> +        if registered_dma_handle == self.page.dma_handle() {
>              let _ = hal.write_sysmem_flush_page(bar, 0).inspect_err(|e| {
>                  dev_warn!(
>                      &self.device,
> @@ -89,8 +95,7 @@ pub(crate) fn unregister(&self, bar: &Bar0) {
>                      e
>                  )
>              });
> -        } else {
> -            // Another page has been registered after us for some reason - warn as this is a bug.
> +        } else if registered_dma_handle != 0 {
>              dev_warn!(
>                  &self.device,
>                  "attempt to unregister a sysmem flush page that is not active\n"
> @@ -99,6 +104,12 @@ pub(crate) fn unregister(&self, bar: &Bar0) {
>      }
>  }
>  
> +impl Drop for SysmemFlush {
> +    fn drop(&mut self) {
> +        let _ = self.bar.try_access_with(|bar| self.unregister(bar));

I feel that this is the wrong solution to the problem.

The thing we want is to *ensure* that `SysmemFlush` Drop is called with device
still being bound.

It's not yet fully clear to me how we'd want to guarantee that, but one API that
might make sense is to create a DevRes API that allows you to reference an
existing `DevRes` and have driver-core making sure that the tear down happens in
reverse order. So inside the `Drop` the `bar` can still be unconditionally
access.

Best,
Gary

> +    }
> +}
> +
>  pub(crate) struct FbRange(Range<u64>);
>  
>  impl FbRange {
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 0f6fe9a1b955..5bad5a055b3b 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -257,7 +257,12 @@ pub(crate) fn new<'a>(
>                      .inspect_err(|_| dev_err!(pdev, "GFW boot did not complete\n"))?;
>              },
>  
> -            sysmem_flush: SysmemFlush::register(pdev.as_ref(), bar, spec.chipset)?,
> +            sysmem_flush: SysmemFlush::register(
> +                pdev.as_ref(),
> +                devres_bar.clone(),
> +                bar,
> +                spec.chipset,
> +            )?,
>  
>              gsp_falcon: Falcon::new(
>                  pdev.as_ref(),
>
> ---
> base-commit: a7a080bb4236ebe577b6776d940d1717912ff6dd
> change-id: 20260409-fix-systemflush-de66dc90378a
>
> Best regards,
> --  
> Eliot Courtney <ecourtney@nvidia.com>

     prev parent reply	other threads:[~2026-04-10 15:58 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-09 12:15 [PATCH] gpu: nova-core: fb: make sure to unregister SysmemFlush on boot failure Eliot Courtney
2026-04-09 22:56 ` John Hubbard
2026-04-10 15:57 ` Gary Guo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DHPLKB40FM08.1L3H2S589WUW@garyguo.net \
    --to=gary@garyguo.net \
    --cc=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apopple@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ecourtney@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=simona@ffwll.ch \
    --cc=ttabi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox