Re: [RFC 7/7] gpu: nova-core: load the scrubber ucode when vGPU support is enabled

Linux PCI subsystem development
 help / color / mirror / Atom feed

From: Joel Fernandes <joelagnelf@nvidia.com>
To: Zhi Wang <zhiw@nvidia.com>,
	rust-for-linux@vger.kernel.org, linux-pci@vger.kernel.org,
	nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: airlied@gmail.com, dakr@kernel.org, aliceryhl@google.com,
	bhelgaas@google.com, kwilczynski@kernel.org, ojeda@kernel.org,
	alex.gaynor@gmail.com, boqun.feng@gmail.com, gary@garyguo.net,
	bjorn3_gh@protonmail.com, lossin@kernel.org,
	a.hindborg@kernel.org, tmgross@umich.edu,
	markus.probst@posteo.de, helgaas@kernel.org, cjia@nvidia.com,
	alex@shazbot.org, smitra@nvidia.com, ankita@nvidia.com,
	aniketa@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com,
	acourbot@nvidia.com, jhubbard@nvidia.com, zhiwang@kernel.org
Subject: Re: [RFC 7/7] gpu: nova-core: load the scrubber ucode when vGPU support is enabled
Date: Sat, 6 Dec 2025 21:26:12 -0500	[thread overview]
Message-ID: <47c05bcf-7591-4148-8783-0c107b0c3c9d@nvidia.com> (raw)
In-Reply-To: <20251206124208.305963-8-zhiw@nvidia.com>

Hi Zhi,

On 12/6/2025 7:42 AM, Zhi Wang wrote:
> To support the maximum vGPUs on the device that support vGPU, a larger
> WPR2 heap size is required. By setting the WPR2 heap size larger than 256MB
> the scrubber ucode image is required to scrub the FB memory before any
> other ucode image is executed.
> 
> If not, the GSP firmware hangs when booting.
> 
> When vGPU support is enabled, execute the scrubber ucode image to scrub the
> FB memory before executing any other ucode images.
> 
[..]
>      pub(crate) const fn create(
> diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
> index f107f753214a..f622c9b960de 100644
> --- a/drivers/gpu/nova-core/firmware/booter.rs
> +++ b/drivers/gpu/nova-core/firmware/booter.rs
> @@ -269,6 +269,7 @@ fn new_booter(dev: &device::Device<device::Bound>, data: &[u8]) -> Result<Self>
>  
>  #[derive(Copy, Clone, Debug, PartialEq)]
>  pub(crate) enum BooterKind {
> +    Scrubber,
>      Loader,
>      #[expect(unused)]
>      Unloader,
> @@ -286,6 +287,7 @@ pub(crate) fn new(
>          bar: &Bar0,
>      ) -> Result<Self> {
>          let fw_name = match kind {
> +            BooterKind::Scrubber => "scrubber",
>              BooterKind::Loader => "booter_load",
>              BooterKind::Unloader => "booter_unload",
>          };
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index ec006c26f19f..8ef79433f017 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -151,6 +151,33 @@ pub(crate) fn boot(
>  
>          Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;

Could you elaborate on how the timeout below works? See comment below.

>  
> +        if vgpu_support {
> +            let scrubber = BooterFirmware::new(
> +                dev,
> +                BooterKind::Scrubber,
> +                chipset,
> +                FIRMWARE_VERSION,
> +                sec2_falcon,
> +                bar,
> +            )?;
> +
> +            sec2_falcon.reset(bar)?;
> +            sec2_falcon.dma_load(bar, &scrubber)?;
> +
> +            let (mbox0, mbox1) = sec2_falcon.boot(bar, None, None)?;

boot() already returns -ETIMEDOUT via wait_till_halted()->read_poll_timeout().

The wait there is 2 seconds. I assume the scrubber would have completed by then.

> +
> +            dev_dbg!(
> +                pdev.as_ref(),
> +                "SEC2 MBOX0: {:#x}, MBOX1{:#x}\n",
> +                mbox0,
> +                mbox1
> +            );
> +
> +            if !regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed() {
> +                return Err(ETIMEDOUT);

So under which situation do you get to this point (!scrubber_completed) ?
Basically I am not sure if ETIMEDOUT is the right error to return here, because
boot() already returns ETIMEDOUT by waiting for the halt.

If you still want return ETIMEDOUT here, then it sounds like you're waiting for
scrubbing beyond the waiting already done by boot(). If so, then shouldn't you
need to use read_poll_timeout() here?

perhaps something like:

 read_poll_timeout(
     || Ok(regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed()),
     |val: &bool| *val,
     Delta::from_millis(10),
     Delta::from_secs(5),
 )?;

Thanks.

next prev parent reply	other threads:[~2025-12-07  2:26 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-06 12:42 [RFC 0/7] gpu: nova-core: Enable booting GSP with vGPU enabled Zhi Wang
2025-12-06 12:42 ` [RFC 1/7] rust: pci: expose sriov_get_totalvfs() helper Zhi Wang
2025-12-07  7:12   ` Dirk Behme
2025-12-09  1:09     ` Miguel Ojeda
2025-12-09 14:22     ` Zhi Wang
2025-12-09  3:42   ` Alexandre Courbot
2025-12-10 11:31   ` Alexandre Courbot
2025-12-06 12:42 ` [RFC 2/7] [!UPSTREAM] rust: pci: support configuration space access Zhi Wang
2025-12-10 22:51   ` Ewan CHORYNSKI
2025-12-06 12:42 ` [RFC 3/7] gpu: nova-core: introduce vgpu_support module param Zhi Wang
2025-12-06 12:42 ` [RFC 4/7] gpu: nova-core: populate GSP_VF_INFO when vGPU is enabled Zhi Wang
2025-12-07  2:32   ` Joel Fernandes
2025-12-09 13:41     ` Zhi Wang
2025-12-11  8:36       ` Joel Fernandes
2025-12-12  0:16         ` John Hubbard
2025-12-12  0:29           ` Joel Fernandes
2025-12-10 14:27   ` Alexandre Courbot
2025-12-06 12:42 ` [RFC 5/7] gpu: nova-core: set RMSetSriovMode when NVIDIA " Zhi Wang
2025-12-07 15:55   ` Timur Tabi
2025-12-07 16:57     ` Joel Fernandes
2025-12-09 14:28       ` Zhi Wang
2025-12-15  4:28     ` Alexandre Courbot
2025-12-06 12:42 ` [RFC 6/7] gpu: nova-core: reserve a larger GSP WPR2 heap when " Zhi Wang
2025-12-15  4:35   ` Alexandre Courbot
2025-12-06 12:42 ` [RFC 7/7] gpu: nova-core: load the scrubber ucode when vGPU support " Zhi Wang
2025-12-07  2:26   ` Joel Fernandes [this message]
2025-12-09 14:05     ` Zhi Wang
2025-12-11  1:24       ` Joel Fernandes
2025-12-07  6:42   ` Dirk Behme
2025-12-15  4:45   ` Alexandre Courbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47c05bcf-7591-4148-8783-0c107b0c3c9d@nvidia.com \
    --to=joelagnelf@nvidia.com \
    --cc=a.hindborg@kernel.org \
    --cc=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=alex.gaynor@gmail.com \
    --cc=alex@shazbot.org \
    --cc=aliceryhl@google.com \
    --cc=aniketa@nvidia.com \
    --cc=ankita@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=cjia@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=gary@garyguo.net \
    --cc=helgaas@kernel.org \
    --cc=jhubbard@nvidia.com \
    --cc=kwankhede@nvidia.com \
    --cc=kwilczynski@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lossin@kernel.org \
    --cc=markus.probst@posteo.de \
    --cc=nouveau@lists.freedesktop.org \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=smitra@nvidia.com \
    --cc=targupta@nvidia.com \
    --cc=tmgross@umich.edu \
    --cc=zhiw@nvidia.com \
    --cc=zhiwang@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox