NVIDIA GPU driver infrastructure
From: sashiko-bot@kernel.org
To: "Danilo Krummrich" <dakr@kernel.org>,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Eliot Courtney" <ecourtney@nvidia.com>,
	"Zhi Wang" <zhiw@nvidia.com>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Gary Guo" <gary@garyguo.net>, "David Airlie" <airlied@gmail.com>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alistair Popple" <apopple@nvidia.com>,
	nova-gpu@lists.linux.dev,
	"Shashank Sharma" <shashanks@nvidia.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"Timur Tabi" <ttabi@nvidia.com>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>
Subject: Re: [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
Date: Tue, 02 Jun 2026 03:40:42 +0000
Message-ID: <20260602034042.7C26E1F00893@smtp.kernel.org>
In-Reply-To: <20260602032111.224790-20-jhubbard@nvidia.com>

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] DMA Use-After-Free of FMC boot parameters during initialization
- [High] Missing little-endian conversions in hardware payloads break big-endian hosts
- [Medium] Kernel stack overflow risk due to large struct passed by value
--

commit 42c5f0e39c2bb0bc87a3f8df25554d04f00a3e69
Author: John Hubbard <jhubbard@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
    
Build and send the Chain of Trust message to FSP, bundling the
DMA-coherent boot parameters that FSP reads at boot time.

> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index 67cf83aba83c6..352ef7683cf2a 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs

[ ... ]

> @@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
>          Ok(Fsp { falcon, fsp_fw })
>      }
>  
> +    /// Boots GSP FMC via FSP Chain of Trust.
> +    ///
> +    /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
> +    /// to FSP, and waits for the response.
> +    pub(crate) fn boot_fmc(
> +        &mut self,
> +        dev: &device::Device<device::Bound>,
> +        bar: &Bar0,
> +        fb_layout: &FbLayout,
> +        args: &FmcBootArgs,
> +    ) -> Result {
> +        dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
> +
> +        let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
> +        let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
> +
> +        // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
> +        let frts_offset = if !args.resume {
> +            let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
> +
> +            frts_reserved_size
> +                .align_up(Alignment::new::<SZ_2M>())
> +                .ok_or(EINVAL)?
> +        } else {
> +            0
> +        };
> +        let frts_size: u32 = if !args.resume {
> +            fb_layout.frts.len().try_into()?
> +        } else {
> +            0
> +        };
> +
> +        let msg = KBox::new(
> +            FspMessage {
> +                mctp_header: MctpHeader::single_packet(),
> +                nvdm_header: NvdmHeader::new(NvdmType::Cot),
> +                cot: NvdmPayloadCot {
[Severity: Medium]
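Is it intentional to construct this ~868-byte FspMessage structure on the
stack before boxing it? KBox::new() takes its argument by value, so the
struct literal may be materialized on the stack first.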

This also dereferences and copies the 816-byte FmcSignatures structure
(sigs: *self.fsp_fw.fmc_sigs). Doesn't this bypass the optimization in
FspFirmware::extract_fmc_signatures() which avoided building FmcSignatures on
the stack?

Could this contribute to kernel stack overflows? Might it be safer to
initialize it directly on the heap using KBox::init() or KBox::new_uninit()?
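
To illustrate the in-place alternative, here is a minimal userspace sketch
in plain std Rust. It is not the kernel API: in the driver the equivalent
would go through KBox::init() or KBox::new_uninit(), as suggested above.
The Signatures and Message types and the boxed_message() helper are
hypothetical stand-ins, sized only to resemble the structures in the patch.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Stand-in for the 816-byte signature blob (hypothetical layout).
#[repr(C)]
pub struct Signatures {
    pub bytes: [u8; 816],
}

/// Stand-in for the full message: a small header followed by the signatures.
#[repr(C)]
pub struct Message {
    pub version: u16,
    pub size: u16,
    pub sigs: Signatures,
}

/// Allocates a zeroed `Message` on the heap and fills it in place.
/// The function never names a by-value `Message` or `Signatures` temporary.
pub fn boxed_message(version: u16, sigs: &Signatures) -> Box<Message> {
    let layout = Layout::new::<Message>();
    // SAFETY: `Message` has a non-zero size, and the all-zero bit pattern is
    // a valid value for every field (plain integers and a byte array). The
    // pointer comes from the global allocator with the layout of `Message`,
    // which is what `Box::from_raw` requires.
    let mut msg = unsafe {
        let ptr = alloc_zeroed(layout) as *mut Message;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    };
    msg.version = version;
    msg.size = std::mem::size_of::<Message>() as u16;
    // Copy through a reference: the bytes go from the source to the heap slot.
    msg.sigs.bytes.copy_from_slice(&sigs.bytes);
    msg
}
```

The sketch relies on every field being valid when zeroed. A structure with
references or enums would need field-by-field initialization instead.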

> +                    version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
> +                    size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
> +                        .map_err(|_| EINVAL)?,
> +                    gsp_fmc_sysmem_offset: fmc_addr,
> +                    frts_sysmem_offset: 0,
> +                    frts_sysmem_size: 0,
> +                    frts_vidmem_offset: frts_offset,
> +                    frts_vidmem_size: frts_size,
[Severity: High]
Will these native integer types be correctly interpreted by the little-endian
RISC-V FSP/GSP processors if the host architecture is big-endian?

On a big-endian host each multi-byte field is stored most-significant byte
first, so won't the little-endian hardware read byte-reversed values,
resulting in incorrect sizes, versions, and DMA addresses?

Should explicit little-endian types or byte-swapping (like .to_le()) be used
here to avoid breaking driver initialization on big-endian hosts?
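
As a rough illustration of the byte-swapping option, the sketch below
encodes a reduced header with to_le_bytes(), so the resulting bytes are the
same on any host. CotHeader and to_wire() are hypothetical names, and the
16-byte layout is an assumption for the example, not the real
NvdmPayloadCot layout.

```rust
/// Hypothetical, reduced version of a hardware-visible header.
pub struct CotHeader {
    pub version: u16,
    pub size: u16,
    pub frts_vidmem_size: u32,
    pub gsp_fmc_sysmem_offset: u64,
}

impl CotHeader {
    /// Encodes the header into the byte layout a little-endian device
    /// expects, independent of the host's native byte order.
    pub fn to_wire(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[0..2].copy_from_slice(&self.version.to_le_bytes());
        out[2..4].copy_from_slice(&self.size.to_le_bytes());
        out[4..8].copy_from_slice(&self.frts_vidmem_size.to_le_bytes());
        out[8..16].copy_from_slice(&self.gsp_fmc_sysmem_offset.to_le_bytes());
        out
    }
}
```

On a little-endian host these conversions compile down to plain stores, so
the explicit form costs nothing there.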

> +                    sigs: *self.fsp_fw.fmc_sigs,
> +                    gsp_boot_args_sysmem_offset: fmc_boot_params_addr,
> +                },
> +            },
> +            GFP_KERNEL,
> +        )?;
> +
> +        self.send_sync_fsp(dev, bar, &*msg)?;
> +
> +        dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
> +        Ok(())
> +    }

[ ... ]

> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c54e8bf4bb3a..558b37863f00a 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -934,3 +934,67 @@ fn new(cmdq: &Cmdq) -> impl Init<Self> + '_ {
>          })
>      }
>  }
> +
> +#[repr(u32)]
> +pub(crate) enum GspDmaTarget {
> +    #[expect(dead_code)]
> +    LocalFb = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB,
> +    CoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM,
> +    NoncoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM,
> +}
> +
> +type GspAcrBootGspRmParams = bindings::GSP_ACR_BOOT_GSP_RM_PARAMS;
> +
> +impl GspAcrBootGspRmParams {
> +    fn new(target: GspDmaTarget, wpr_meta_addr: u64) -> impl Init<Self> {
> +        #[allow(non_snake_case)]
> +        let params = init!(Self {
> +            target: target as u32,
> +            gspRmDescSize: num::usize_into_u32::<{ size_of::<GspFwWprMeta>() }>(),
> +            gspRmDescOffset: wpr_meta_addr,
> +            bIsGspRmBoot: 1,
[Severity: High]
Similarly to NvdmPayloadCot above, will these native integer types be
correctly interpreted by little-endian hardware on big-endian hosts?

Does this struct require explicit little-endian types or byte-swapping to
avoid accessing invalid physical memory on big-endian architectures?
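
For the explicit-type option, one possible shape is a pair of newtypes that
always hold their value in little-endian order, so a #[repr(C)] struct built
from them has a host-independent memory image. Le32, Le64, BootParams and
boot_params() are hypothetical names for this sketch. They are not existing
kernel or bindgen types, and the fields only loosely mirror
GSP_ACR_BOOT_GSP_RM_PARAMS.

```rust
/// A `u32` kept in little-endian byte order regardless of host endianness.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Le32(u32);

impl Le32 {
    pub fn new(native: u32) -> Self {
        Le32(native.to_le())
    }
    pub fn get(self) -> u32 {
        u32::from_le(self.0)
    }
}

/// A `u64` kept in little-endian byte order regardless of host endianness.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Le64(u64);

impl Le64 {
    pub fn new(native: u64) -> Self {
        Le64(native.to_le())
    }
    pub fn get(self) -> u64 {
        u64::from_le(self.0)
    }
    /// In-memory bytes, as a device reading the struct would see them.
    pub fn raw_bytes(self) -> [u8; 8] {
        self.0.to_ne_bytes()
    }
}

/// Hypothetical, reduced boot-parameter block (illustrative names only).
#[repr(C)]
pub struct BootParams {
    pub target: Le32,
    pub desc_size: Le32,
    pub desc_offset: Le64,
}

/// Builds the block from native-endian inputs; conversion happens once here.
pub fn boot_params(target: u32, desc_size: u32, desc_offset: u64) -> BootParams {
    BootParams {
        target: Le32::new(target),
        desc_size: Le32::new(desc_size),
        desc_offset: Le64::new(desc_offset),
    }
}
```

With typed fields, forgetting a conversion becomes a compile error rather
than a silent byte-order bug.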

> +            wprCarveoutOffset: 0,
> +            wprCarveoutSize: 0,
> +            __bindgen_padding_0: Default::default(),
> +        });
> +
> +        params
> +    }
> +}

[ ... ]

> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index b25970dd4561d..f41f3fea15ff2 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs

[ ... ]

> @@ -39,17 +42,27 @@ impl GspHal for Gh100 {
>      /// the GSP boot internally - no manual GSP reset/boot is needed.
>      fn boot<'a>(
>          &self,
> -        _gsp: &'a Gsp,
> +        gsp: &'a Gsp,
>          dev: &'a device::Device<device::Bound>,
>          bar: &'a Bar0,
>          chipset: Chipset,
> -        _fb_layout: &FbLayout,
> -        _wpr_meta: &Coherent<GspFwWprMeta>,
> +        fb_layout: &FbLayout,
> +        wpr_meta: &Coherent<GspFwWprMeta>,
>          _gsp_falcon: &'a Falcon<GspEngine>,
>          _sec2_falcon: &'a Falcon<Sec2>,
>      ) -> Result<BootUnloadGuard<'a>> {
>          let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
> -        let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> +        let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> +
> +        let args = FmcBootArgs::new(
> +            dev,
> +            chipset,
> +            wpr_meta.dma_handle(),
> +            gsp.libos.dma_handle(),
> +            false,
> +        )?;
> +
> +        fsp.boot_fmc(dev, bar, fb_layout, &args)?;
>  
>          Err(ENOTSUPP)
[Severity: High]
Does returning an error here immediately drop the args variable, thereby
freeing its DMA-coherent memory before the FSP finishes reading it?

Since the FSP is actively booting asynchronously, wouldn't this cause a
hardware use-after-free of the boot parameters, leading to IOMMU faults or
undefined behavior?

Note: I see this is fixed in a later commit 'gpu: nova-core: gsp: enable FSP
boot path' which changes the return to Ok(unload_guard) and properly handles
the required hardware delays.
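
The lifetime issue can be reproduced in a small userspace sketch. DmaBuffer
and BootGuard are hypothetical stand-ins (no real DMA is involved) and the
two functions are illustrative only. The first mirrors the early error
return flagged above. The second shows the general idea of moving the buffer
into a guard that the caller keeps alive, which is an assumption about the
shape of the later fix rather than a copy of it.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for a DMA-coherent allocation; records when it is freed.
pub struct DmaBuffer {
    freed: Rc<Cell<bool>>,
}

impl Drop for DmaBuffer {
    fn drop(&mut self) {
        self.freed.set(true);
    }
}

/// Hypothetical guard that owns everything the device may still be reading.
pub struct BootGuard {
    _args: DmaBuffer,
}

/// The buffer is a local, so returning the error frees it immediately,
/// even though the device was already given its address.
pub fn boot_dropping_args(freed: Rc<Cell<bool>>) -> Result<(), &'static str> {
    let _args = DmaBuffer { freed };
    // ... the buffer's address would be handed to the device here ...
    Err("not supported") // `_args` is dropped as the function returns
}

/// The buffer moves into the returned guard, so it stays allocated until
/// the caller drops the guard.
pub fn boot_keeping_args(freed: Rc<Cell<bool>>) -> Result<BootGuard, &'static str> {
    let args = DmaBuffer { freed };
    // ... the buffer's address would be handed to the device here ...
    Ok(BootGuard { _args: args })
}
```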

>      }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@nvidia.com?part=19
