All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Danilo Krummrich" <dakr@kernel.org>
To: "Timur Tabi" <ttabi@nvidia.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	nouveau@lists.freedesktop.org, rust-for-linux@vger.kernel.org
Subject: Re: [PATCH v6 11/11] gpu: nova-core: add PIO support for loading firmware images
Date: Fri, 16 Jan 2026 22:05:19 +0100	[thread overview]
Message-ID: <DFQBHVTTHZY8.13ASLCJ3FJP81@kernel.org> (raw)
In-Reply-To: <20260114192950.1143002-12-ttabi@nvidia.com>

On Wed Jan 14, 2026 at 8:29 PM CET, Timur Tabi wrote:
> +    /// Write a byte array to Falcon memory using programmed I/O (PIO).
> +    ///
> +    /// Writes `img` to the specified `target_mem` (IMEM or DMEM) starting at `mem_base`.
> +    /// For IMEM writes, tags are set for each 256-byte block starting from `tag`.
> +    ///
> +    /// Returns `EINVAL` if `img.len()` is not a multiple of 4.
> +    fn pio_wr_bytes(
> +        &self,
> +        bar: &Bar0,
> +        img: &[u8],
> +        mem_base: u16,
> +        target_mem: FalconMem,
> +        port: u8,

This looks like we should use the Bounded type instead.

> +        tag: u16,
> +    ) -> Result {
> +        // Rejecting misaligned images here allows us to avoid checking
> +        // inside the loops.
> +        if img.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let port = usize::from(port);
> +
> +        match target_mem {
> +            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
> +                regs::NV_PFALCON_FALCON_IMEMC::default()
> +                    .set_secure(target_mem == FalconMem::ImemSecure)
> +                    .set_aincw(true)
> +                    .set_offs(mem_base)
> +                    .write(bar, &E::ID, port);
> +
> +                let mut tag = tag;
> +                for block in img.chunks(256) {

	for (n, block) in img.chunks(256).iter().enumerate() {
	    regs::NV_PFALCON_FALCON_IMEMT::default()
	        .set_tag(tag + n)
	        .write(bar, &E::ID, port);
	}

This way you don't need the mutable shadow of tag. Besides that, how do we know
this doesn't overflow? Don't we need a checked_add()?

> +                    regs::NV_PFALCON_FALCON_IMEMT::default()
> +                        .set_tag(tag)
> +                        .write(bar, &E::ID, port);
> +                    for word in block.chunks_exact(4) {
> +                        let w = [word[0], word[1], word[2], word[3]];
> +                        regs::NV_PFALCON_FALCON_IMEMD::default()
> +                            .set_data(u32::from_le_bytes(w))
> +                            .write(bar, &E::ID, port);
> +                    }
> +                    tag += 1;
> +                }
> +            }
> +            FalconMem::Dmem => {
> +                regs::NV_PFALCON_FALCON_DMEMC::default()
> +                    .set_aincw(true)
> +                    .set_offs(mem_base)
> +                    .write(bar, &E::ID, port);
> +
> +                for block in img.chunks(256) {
> +                    for word in block.chunks_exact(4) {
> +                        let w = [word[0], word[1], word[2], word[3]];
> +                        regs::NV_PFALCON_FALCON_DMEMD::default()
> +                            .set_data(u32::from_le_bytes(w))
> +                            .write(bar, &E::ID, port);
> +                    }
> +                }
> +            }
> +        }
> +
> +        Ok(())
> +    }
> +
> +    fn pio_wr<F: FalconFirmware<Target = E>>(
> +        &self,
> +        bar: &Bar0,
> +        fw: &F,
> +        target_mem: FalconMem,
> +        load_offsets: &FalconLoadTarget,
> +        port: u8,
> +        tag: u16,
> +    ) -> Result {
> +        let start = usize::from_safe_cast(load_offsets.src_start);
> +        let len = usize::from_safe_cast(load_offsets.len);
> +        let mem_base = u16::try_from(load_offsets.dst_start)?;
> +
> +        // SAFETY: we are the only user of the firmware image at this stage
> +        let data = unsafe { fw.as_slice(start, len).map_err(|_| EINVAL)? };

Why do we need the firmware image to be backed by a DMA object at this point
when you load the firmware image through PIO anyways?

> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
> index 49501aa6ff7f..3a882b7d8aa8 100644
> --- a/drivers/gpu/nova-core/falcon/hal.rs
> +++ b/drivers/gpu/nova-core/falcon/hal.rs
> @@ -15,6 +15,11 @@
>  mod ga102;
>  mod tu102;
>  
> +pub(crate) enum LoadMethod {
> +    Pio,
> +    Dma,
> +}

Seems obvious, but still, please add some documentation explaining that this is
the load method for falcon firmware images, etc.

> +pub(crate) struct GenericBootloader {
> +    pub desc: BootloaderDesc,
> +    pub ucode: Vec<u8, kernel::alloc::allocator::Kmalloc>,

	pub ucode: KVec<u8>,

Also, we may want to use KVVec<u8> or just VVec<u8> instead. What's the image
size?

> +}
> +
>  /// The FWSEC microcode, extracted from the BIOS and to be run on the GSP falcon.
>  ///
>  /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
> @@ -221,6 +286,8 @@ pub(crate) struct FwsecFirmware {
>      desc: FalconUCodeDesc,
>      /// GPU-accessible DMA object containing the firmware.
>      ucode: FirmwareDmaObject<Self, Signed>,
> +    /// Generic bootloader
> +    gen_bootloader: Option<GenericBootloader>,

I'm not convinced this is a good idea. We probably want a HAL here and have
different FwsecFirmware types:

One with a DMA object and one with a system memory object when the architecture
uses PIO. In the latter case the object can have a GenericBootloader field, i.e.
this also gets us rid of the Option and all the subsequent 'if chipset <
Chipset::GA102' checks and 'match gbl_fw' matches below.

>  }
>  
>  impl FalconLoadParams for FwsecFirmware {
> @@ -275,7 +342,19 @@ fn brom_params(&self) -> FalconBromParams {
>      }
>  
>      fn boot_addr(&self) -> u32 {
> -        0
> +        match &self.desc {
> +            FalconUCodeDesc::V2(_v2) => {
> +                // On V2 platforms, the boot address is extracted from the
> +                // generic bootloader, because the gbl is what actually copies
> +                // FWSEC into memory, so that is what needs to be booted.
> +                if let Some(ref gbl) = self.gen_bootloader {
> +                    gbl.desc.start_tag << 8
> +                } else {
> +                    0
> +                }
> +            }
> +            FalconUCodeDesc::V3(_v3) => 0,
> +        }
>      }
>  }
>  
> @@ -376,6 +455,7 @@ impl FwsecFirmware {
>      /// command.
>      pub(crate) fn new(
>          dev: &Device<device::Bound>,
> +        chipset: Chipset,
>          falcon: &Falcon<Gsp>,
>          bar: &Bar0,
>          bios: &Vbios,
> @@ -432,9 +512,54 @@ pub(crate) fn new(
>              ucode_dma.no_patch_signature()
>          };
>  
> +        // The Generic Bootloader exists only on Turing and GA100.  To avoid a bogus
> +        // console error message on other platforms, only try to load it if it's
> +        // supposed to be there.
> +        let gbl_fw = if chipset < Chipset::GA102 {
> +            Some(super::request_firmware(
> +                dev,
> +                chipset,
> +                "gen_bootloader",
> +                FIRMWARE_VERSION,
> +            )?)
> +        } else {
> +            None
> +        };
> +
> +        let gbl = match gbl_fw {
> +            Some(fw) => {
> +                let hdr = fw
> +                    .data()
> +                    .get(0..size_of::<BinHdr>())
> +                    .and_then(BinHdr::from_bytes_copy)
> +                    .ok_or(EINVAL)?;
> +
> +                let desc_offset = usize::from_safe_cast(hdr.header_offset);
> +                let desc = fw
> +                    .data()
> +                    .get(desc_offset..desc_offset + size_of::<BootloaderDesc>())
> +                    .and_then(BootloaderDesc::from_bytes_copy)
> +                    .ok_or(EINVAL)?;
> +
> +                let ucode_start = usize::from_safe_cast(hdr.data_offset);
> +                let ucode_size = usize::from_safe_cast(hdr.data_size);
> +                let ucode_data = fw
> +                    .data()
> +                    .get(ucode_start..ucode_start + ucode_size)
> +                    .ok_or(EINVAL)?;
> +
> +                let mut ucode = KVec::new();
> +                ucode.extend_from_slice(ucode_data, GFP_KERNEL)?;
> +
> +                Some(GenericBootloader { desc, ucode })
> +            }
> +            None => None,
> +        };
> +
>          Ok(FwsecFirmware {
>              desc,
>              ucode: ucode_signed,
> +            gen_bootloader: gbl,
>          })
>      }

WARNING: multiple messages have this Message-ID (diff)
From: "Danilo Krummrich" <dakr@kernel.org>
To: "Timur Tabi" <ttabi@nvidia.com>
Cc: "Alexandre Courbot" <acourbot@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	<nouveau@lists.freedesktop.org>, <rust-for-linux@vger.kernel.org>
Subject: Re: [PATCH v6 11/11] gpu: nova-core: add PIO support for loading firmware images
Date: Fri, 16 Jan 2026 22:05:19 +0100	[thread overview]
Message-ID: <DFQBHVTTHZY8.13ASLCJ3FJP81@kernel.org> (raw)
In-Reply-To: <20260114192950.1143002-12-ttabi@nvidia.com>

On Wed Jan 14, 2026 at 8:29 PM CET, Timur Tabi wrote:
> +    /// Write a byte array to Falcon memory using programmed I/O (PIO).
> +    ///
> +    /// Writes `img` to the specified `target_mem` (IMEM or DMEM) starting at `mem_base`.
> +    /// For IMEM writes, tags are set for each 256-byte block starting from `tag`.
> +    ///
> +    /// Returns `EINVAL` if `img.len()` is not a multiple of 4.
> +    fn pio_wr_bytes(
> +        &self,
> +        bar: &Bar0,
> +        img: &[u8],
> +        mem_base: u16,
> +        target_mem: FalconMem,
> +        port: u8,

This looks like we should use the Bounded type instead.

> +        tag: u16,
> +    ) -> Result {
> +        // Rejecting misaligned images here allows us to avoid checking
> +        // inside the loops.
> +        if img.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let port = usize::from(port);
> +
> +        match target_mem {
> +            FalconMem::ImemSecure | FalconMem::ImemNonSecure => {
> +                regs::NV_PFALCON_FALCON_IMEMC::default()
> +                    .set_secure(target_mem == FalconMem::ImemSecure)
> +                    .set_aincw(true)
> +                    .set_offs(mem_base)
> +                    .write(bar, &E::ID, port);
> +
> +                let mut tag = tag;
> +                for block in img.chunks(256) {

	for (n, block) in img.chunks(256).iter().enumerate() {
	    regs::NV_PFALCON_FALCON_IMEMT::default()
	        .set_tag(tag + n)
	        .write(bar, &E::ID, port);
	}

This way you don't need the mutable shadow of tag. Besides that, how do we know
this doesn't overflow? Don't we need a checked_add()?

> +                    regs::NV_PFALCON_FALCON_IMEMT::default()
> +                        .set_tag(tag)
> +                        .write(bar, &E::ID, port);
> +                    for word in block.chunks_exact(4) {
> +                        let w = [word[0], word[1], word[2], word[3]];
> +                        regs::NV_PFALCON_FALCON_IMEMD::default()
> +                            .set_data(u32::from_le_bytes(w))
> +                            .write(bar, &E::ID, port);
> +                    }
> +                    tag += 1;
> +                }
> +            }
> +            FalconMem::Dmem => {
> +                regs::NV_PFALCON_FALCON_DMEMC::default()
> +                    .set_aincw(true)
> +                    .set_offs(mem_base)
> +                    .write(bar, &E::ID, port);
> +
> +                for block in img.chunks(256) {
> +                    for word in block.chunks_exact(4) {
> +                        let w = [word[0], word[1], word[2], word[3]];
> +                        regs::NV_PFALCON_FALCON_DMEMD::default()
> +                            .set_data(u32::from_le_bytes(w))
> +                            .write(bar, &E::ID, port);
> +                    }
> +                }
> +            }
> +        }
> +
> +        Ok(())
> +    }
> +
> +    fn pio_wr<F: FalconFirmware<Target = E>>(
> +        &self,
> +        bar: &Bar0,
> +        fw: &F,
> +        target_mem: FalconMem,
> +        load_offsets: &FalconLoadTarget,
> +        port: u8,
> +        tag: u16,
> +    ) -> Result {
> +        let start = usize::from_safe_cast(load_offsets.src_start);
> +        let len = usize::from_safe_cast(load_offsets.len);
> +        let mem_base = u16::try_from(load_offsets.dst_start)?;
> +
> +        // SAFETY: we are the only user of the firmware image at this stage
> +        let data = unsafe { fw.as_slice(start, len).map_err(|_| EINVAL)? };

Why do we need the firmware image to be backed by a DMA object at this point
when you load the firmware image through PIO anyways?

> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
> index 49501aa6ff7f..3a882b7d8aa8 100644
> --- a/drivers/gpu/nova-core/falcon/hal.rs
> +++ b/drivers/gpu/nova-core/falcon/hal.rs
> @@ -15,6 +15,11 @@
>  mod ga102;
>  mod tu102;
>  
> +pub(crate) enum LoadMethod {
> +    Pio,
> +    Dma,
> +}

Seems obvious, but still, please add some documentation explaining that this is
the load method for falcon firmware images, etc.

> +pub(crate) struct GenericBootloader {
> +    pub desc: BootloaderDesc,
> +    pub ucode: Vec<u8, kernel::alloc::allocator::Kmalloc>,

	pub ucode: KVec<u8>,

Also, we may want to use KVVec<u8> or just VVec<u8> instead. What's the image
size?

> +}
> +
>  /// The FWSEC microcode, extracted from the BIOS and to be run on the GSP falcon.
>  ///
>  /// It is responsible for e.g. carving out the WPR2 region as the first step of the GSP bootflow.
> @@ -221,6 +286,8 @@ pub(crate) struct FwsecFirmware {
>      desc: FalconUCodeDesc,
>      /// GPU-accessible DMA object containing the firmware.
>      ucode: FirmwareDmaObject<Self, Signed>,
> +    /// Generic bootloader
> +    gen_bootloader: Option<GenericBootloader>,

I'm not convinced this is a good idea. We probably want a HAL here and have
different FwsecFirmware types:

One with a DMA object and one with a system memory object when the architecture
uses PIO. In the latter case the object can have a GenericBootloader field, i.e.
this also gets us rid of the Option and all the subsequent 'if chipset <
Chipset::GA102' checks and 'match gbl_fw' matches below.

>  }
>  
>  impl FalconLoadParams for FwsecFirmware {
> @@ -275,7 +342,19 @@ fn brom_params(&self) -> FalconBromParams {
>      }
>  
>      fn boot_addr(&self) -> u32 {
> -        0
> +        match &self.desc {
> +            FalconUCodeDesc::V2(_v2) => {
> +                // On V2 platforms, the boot address is extracted from the
> +                // generic bootloader, because the gbl is what actually copies
> +                // FWSEC into memory, so that is what needs to be booted.
> +                if let Some(ref gbl) = self.gen_bootloader {
> +                    gbl.desc.start_tag << 8
> +                } else {
> +                    0
> +                }
> +            }
> +            FalconUCodeDesc::V3(_v3) => 0,
> +        }
>      }
>  }
>  
> @@ -376,6 +455,7 @@ impl FwsecFirmware {
>      /// command.
>      pub(crate) fn new(
>          dev: &Device<device::Bound>,
> +        chipset: Chipset,
>          falcon: &Falcon<Gsp>,
>          bar: &Bar0,
>          bios: &Vbios,
> @@ -432,9 +512,54 @@ pub(crate) fn new(
>              ucode_dma.no_patch_signature()
>          };
>  
> +        // The Generic Bootloader exists only on Turing and GA100.  To avoid a bogus
> +        // console error message on other platforms, only try to load it if it's
> +        // supposed to be there.
> +        let gbl_fw = if chipset < Chipset::GA102 {
> +            Some(super::request_firmware(
> +                dev,
> +                chipset,
> +                "gen_bootloader",
> +                FIRMWARE_VERSION,
> +            )?)
> +        } else {
> +            None
> +        };
> +
> +        let gbl = match gbl_fw {
> +            Some(fw) => {
> +                let hdr = fw
> +                    .data()
> +                    .get(0..size_of::<BinHdr>())
> +                    .and_then(BinHdr::from_bytes_copy)
> +                    .ok_or(EINVAL)?;
> +
> +                let desc_offset = usize::from_safe_cast(hdr.header_offset);
> +                let desc = fw
> +                    .data()
> +                    .get(desc_offset..desc_offset + size_of::<BootloaderDesc>())
> +                    .and_then(BootloaderDesc::from_bytes_copy)
> +                    .ok_or(EINVAL)?;
> +
> +                let ucode_start = usize::from_safe_cast(hdr.data_offset);
> +                let ucode_size = usize::from_safe_cast(hdr.data_size);
> +                let ucode_data = fw
> +                    .data()
> +                    .get(ucode_start..ucode_start + ucode_size)
> +                    .ok_or(EINVAL)?;
> +
> +                let mut ucode = KVec::new();
> +                ucode.extend_from_slice(ucode_data, GFP_KERNEL)?;
> +
> +                Some(GenericBootloader { desc, ucode })
> +            }
> +            None => None,
> +        };
> +
>          Ok(FwsecFirmware {
>              desc,
>              ucode: ucode_signed,
> +            gen_bootloader: gbl,
>          })
>      }

  parent reply	other threads:[~2026-01-16 21:05 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-14 19:29 [PATCH v6 00/11] gpu: nova-core: add Turing support Timur Tabi
2026-01-14 19:29 ` [PATCH v6 01/11] gpu: nova-core: rename Imem to ImemSecure Timur Tabi
2026-01-14 19:29 ` [PATCH v6 02/11] gpu: nova-core: add ImemNonSecure section infrastructure Timur Tabi
2026-01-22 12:52   ` Gary Guo
2026-01-22 19:00     ` Timur Tabi
2026-01-14 19:29 ` [PATCH v6 03/11] gpu: nova-core: support header parsing on Turing/GA100 Timur Tabi
2026-01-14 19:29 ` [PATCH v6 04/11] gpu: nova-core: add support for Turing/GA100 fwsignature Timur Tabi
2026-01-14 19:29 ` [PATCH v6 05/11] gpu: nova-core: add NV_PFALCON_FALCON_DMATRFCMD::with_falcon_mem() Timur Tabi
2026-01-14 19:29 ` [PATCH v6 06/11] gpu: nova-core: move some functions into the HAL Timur Tabi
2026-01-16 19:55   ` Danilo Krummrich
2026-01-16 19:55     ` Danilo Krummrich
2026-01-16 20:08     ` John Hubbard
2026-01-16 20:11       ` Danilo Krummrich
2026-01-16 20:11         ` Danilo Krummrich
2026-01-16 20:15         ` John Hubbard
2026-01-16 20:15           ` John Hubbard
2026-01-16 20:21           ` Danilo Krummrich
2026-01-16 20:21             ` Danilo Krummrich
2026-01-16 20:27             ` John Hubbard
2026-01-16 20:27               ` John Hubbard
2026-01-14 19:29 ` [PATCH v6 07/11] gpu: nova-core: Add basic Turing HAL Timur Tabi
2026-01-14 19:29 ` [PATCH v6 08/11] gpu: nova-core: add Falcon HAL method supports_dma() Timur Tabi
2026-01-16  1:53   ` Alexandre Courbot
2026-01-16  1:53     ` Alexandre Courbot
2026-01-16  3:00     ` Timur Tabi
2026-01-16  3:00       ` Timur Tabi
2026-01-14 19:29 ` [PATCH v6 09/11] gpu: nova-core: add FalconUCodeDescV2 support Timur Tabi
2026-01-16  3:11   ` Alexandre Courbot
2026-01-16  3:11     ` Alexandre Courbot
2026-01-16  3:23   ` Alexandre Courbot
2026-01-16  3:23     ` Alexandre Courbot
2026-01-16 20:09     ` Danilo Krummrich
2026-01-16 20:09       ` Danilo Krummrich
2026-01-14 19:29 ` [PATCH v6 10/11] gpu: nova-core: align LibosMemoryRegionInitArgument size to page size Timur Tabi
2026-01-14 19:29 ` [PATCH v6 11/11] gpu: nova-core: add PIO support for loading firmware images Timur Tabi
2026-01-16 15:22   ` Alexandre Courbot
2026-01-16 15:22     ` Alexandre Courbot
2026-01-16 21:05   ` Danilo Krummrich [this message]
2026-01-16 21:05     ` Danilo Krummrich
2026-01-17  1:55     ` Alexandre Courbot
2026-01-17  1:55       ` Alexandre Courbot
2026-01-21  0:25     ` Timur Tabi
2026-01-21  0:25       ` Timur Tabi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DFQBHVTTHZY8.13ASLCJ3FJP81@kernel.org \
    --to=dakr@kernel.org \
    --cc=acourbot@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=ttabi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.