From: Jakub Kicinski <kuba@kernel.org>
To: nikhil.rao@amd.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	brett.creeley@amd.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	eric.joyner@amd.com, Vamsi.Atluri@amd.com
Subject: Re: [PATCH net-next v2 3/6] pds_core: add PLDM firmware update support via devlink flash
Date: Wed, 20 May 2026 16:44:32 -0700
Message-ID: <20260520234432.567056-1-kuba@kernel.org>
In-Reply-To: <20260516-upstream_v2_clean-v2-3-7e0d66bf4020@amd.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
pds_core: add PLDM firmware update support via devlink flash

This adds PLDM firmware update support to pds_core via the upstream
pldmfw API, allowing flashing of entire packages or individual
components through devlink dev flash.

> diff --git a/drivers/net/ethernet/amd/pds_core/fw.c b/drivers/net/ethernet/amd/pds_core/fw.c
> index fa626719e68d..f091a753bce9 100644
> --- a/drivers/net/ethernet/amd/pds_core/fw.c
> +++ b/drivers/net/ethernet/amd/pds_core/fw.c

[ ... ]

> +int pdsc_get_component_info(struct pdsc *pdsc)
> +{
> +	union pds_core_dev_cmd cmd = {
> +		.get_component_info.opcode = PDS_CORE_CMD_GET_COMPONENT_INFO,
> +		.get_component_info.ver = 1,
> +	};
> +	struct pds_core_component_list_info *list_info;
> +	union pds_core_dev_comp comp = {};
> +	dma_addr_t dma_addr;
> +	u8 num_components;
> +	int err, i;
> +
> +	list_info = kzalloc(PDS_PAGE_SIZE, GFP_KERNEL);
> +	if (!list_info)
> +		return -ENOMEM;
> +
> +	dma_addr = dma_map_single(pdsc->dev, list_info, PDS_PAGE_SIZE,
> +				  DMA_FROM_DEVICE);
> +	if (dma_mapping_error(pdsc->dev, dma_addr)) {
> +		dev_err(pdsc->dev,
> +			"Failed to dma_map component_list_info length %d\n",
> +			PDS_PAGE_SIZE);
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	cmd.get_component_info.data_len = cpu_to_le16(PDS_PAGE_SIZE);
> +	cmd.get_component_info.data_pa = cpu_to_le64(dma_addr);
> +
> +	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout * 2);
> +	dma_unmap_single(pdsc->dev, dma_addr, PDS_PAGE_SIZE, DMA_FROM_DEVICE);
> +	if (err)
> +		goto out;
> +
> +	if (comp.get_component_info.ver == 0) {
> +		/* Don't support backward compatibility as version 0 has
> +		 * alignment issues, so give a hint to users to update
> +		 * their firmware
> +		 */
> +		dev_warn(pdsc->dev,
> +			 "Incompatible get_component_info version %u reported by firmware\n",
> +			 comp.get_component_info.ver);
> +		err = 0;
> +		goto out;
> +	}

[Medium]
The comment says version 0 is not supported because of alignment
issues, but the code sets err = 0 and jumps to out, so the function
reports success with pdsc->fw_components.num_components left at zero.

The caller pdsc_pldm_firmware_update() treats err == 0 as success and
calls pldmfw_flash_image() with no cached components, which leads to
a few user-visible side effects:

  - When a user passes a specific component, pdsc_get_slot_id() iterates
    an empty fw_components array and returns PDS_CORE_FW_SLOT_MAX, which
    bubbles up as -ENOENT from pdsc_send_component_table() rather than a
    "firmware too old" indication.

  - For a full-image flash, pdsc_set_component_name() finds no matches,
    so devlink_flash_update_status_notify() ends up reporting empty
    component strings throughout the flash.

  - pdsc_pldm_firmware_update() resets num_components = 0 at the end of
    every flash, so the dev_warn and the same downstream paths recur on
    every subsequent attempt.

Should this path return an error such as -EOPNOTSUPP so the documented
intent matches what callers see?
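
A minimal userspace sketch of the suggested behaviour, not driver code:
check_component_info_ver() is a hypothetical helper that models only the
version check, and returning -EOPNOTSUPP is the suggestion above rather
than anything the patch currently does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the version check in
 * pdsc_get_component_info(). Version 0 is rejected outright, so a
 * caller that treats 0 as success never proceeds with an empty
 * component cache.
 */
static int check_component_info_ver(unsigned int fw_ver)
{
	if (fw_ver == 0)
		return -EOPNOTSUPP;	/* firmware too old, needs an update */

	return 0;
}

/* Small self-check covering both outcomes. */
static void check_component_info_ver_selftest(void)
{
	assert(check_component_info_ver(0) == -EOPNOTSUPP);
	assert(check_component_info_ver(1) == 0);
}
```

With an error returned here, the caller could bail out before
pldmfw_flash_image() and report a single clear failure instead of the
downstream -ENOENT or empty component strings.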

[ ... ]

> +static int pdsc_devcmd_send_component(struct pdsc *pdsc,
> +				      struct pds_core_flash_component *info,
> +				      u16 info_sz, dma_addr_t addr, u32 length,
> +				      u32 offset, u16 slot_id,
> +				      union pds_core_dev_comp *comp)
> +{
> +	union pds_core_dev_cmd cmd = {
> +		.send_component.opcode = PDS_CORE_CMD_SEND_COMPONENT,
> +		.send_component.ver = 1,
> +		.send_component.operation = PDS_CORE_SEND_COMPONENT_START,
> +		.send_component.data_pa = cpu_to_le64(addr),
> +		.send_component.data_len = cpu_to_le32(length),
> +		.send_component.offset = cpu_to_le32(offset),
> +		.send_component.slot_id = slot_id,
> +	};
> +	unsigned long timeout = 300 * HZ;
> +	unsigned long start_time;
> +	unsigned long end_time;
> +	int err;
> +
> +	start_time = jiffies;
> +	end_time = start_time + timeout;
> +	do {
> +		/* prevent noisy/benign devcmd failures */
> +		err = pdsc_devcmd_with_data_nomsg(pdsc, &cmd, info, info_sz,
> +						  comp, 60);
> +		if (err != -EAGAIN)
> +			break;
> +
> +		/* if required, subsequent commands check status of
> +		 * PDS_CORE_CMD_SEND_COMPONENT command, which returns
> +		 * EAGAIN while the command is still running,
> +		 * else we get the final command status.
> +		 */
> +		cmd.send_component.operation = PDS_CORE_SEND_COMPONENT_STATUS;
> +		msleep(20);
> +	} while (time_before(jiffies, end_time));
> +
> +	if (err == -EAGAIN || err == -ETIMEDOUT)
> +		dev_err(pdsc->dev, "PDS_CORE_CMD_SEND_COMPONENT timed out\n");
> +
> +	return err;
> +}

[ ... ]

> +static int pdsc_flash_component(struct pldmfw *context,
> +				struct pldmfw_component *component)
> +{

[ ... ]

> +		err = pdsc_devcmd_send_component(pdsc, component_info, info_sz,
> +						 dma_addr, copy_sz, offset,
> +						 slot_id, &comp);
> +		dma_unmap_single(dev, dma_addr, copy_sz, DMA_TO_DEVICE);
> +		kfree(component_data);

[High]
On the -EAGAIN or -ETIMEDOUT exit from pdsc_devcmd_send_component(),
can the device still be DMA-reading from this buffer when the unmap
and kfree happen here?

The polling loop in pdsc_devcmd_send_component() runs for up to 300
seconds. The only exits taken while the device is still busy are
-EAGAIN (the status query explicitly reports that START has not
completed) and -ETIMEDOUT. No abort or cancel opcode is issued, and the
wire format documented for struct pds_core_send_component_cmd hands
data_pa to the device on the START call.
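
The loop shape can be modelled in userspace as follows. This is only a
sketch under stated assumptions: struct fake_dev and fake_devcmd() are
hypothetical stand-ins for the device and the devcmd call, and a poll
budget replaces the jiffies deadline and msleep() so the model stays
deterministic.

```c
#include <assert.h>
#include <errno.h>

enum op { OP_START, OP_STATUS };

/* Hypothetical device model: busy for the next busy_polls commands. */
struct fake_dev {
	int busy_polls;
	int calls;
	enum op last_op;
};

/* Returns -EAGAIN while the modelled device is busy, then 0. */
static int fake_devcmd(struct fake_dev *dev, enum op op)
{
	dev->calls++;
	dev->last_op = op;
	if (dev->busy_polls > 0) {
		dev->busy_polls--;
		return -EAGAIN;
	}
	return 0;
}

/* Same structure as the quoted loop: issue START once, then poll with
 * STATUS until the result is no longer -EAGAIN or the budget runs out.
 */
static int send_component(struct fake_dev *dev, int max_polls)
{
	enum op op = OP_START;
	int err;

	assert(max_polls > 0);
	do {
		err = fake_devcmd(dev, op);
		if (err != -EAGAIN)
			break;
		op = OP_STATUS;
	} while (--max_polls > 0);

	return err;	/* may still be -EAGAIN: device busy at exit */
}
```

When the budget runs out first, the function returns -EAGAIN while the
modelled device is still busy, which is the state the caller then
unmaps and frees in.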

Looking at __pdsc_devcmd_locked() in pds_core/dev.c:

    if ((err == -ENXIO || err == -ETIMEDOUT) && pdsc->wq)
            queue_work(pdsc->wq, &pdsc->health_work);

-EAGAIN does not even trigger health_work, so the device is not
treated as hung in that case.

Once control returns to pdsc_flash_component():

    err = pdsc_devcmd_send_component(...);
    dma_unmap_single(dev, dma_addr, copy_sz, DMA_TO_DEVICE);
    kfree(component_data);

If the firmware finishes the flash a moment later and reads from
data_pa, on IOMMU systems the unmap can race the DMA and produce
IOMMU faults; on non-IOMMU systems the device can DMA-read from
freed slab memory belonging to a later allocation.

Should this path leak the buffer, reset the device, or send an abort
opcode to the firmware before unmapping and freeing on the busy
timeout exits?
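
For the "leak the buffer" option, a minimal userspace sketch, assuming
that -EAGAIN and -ETIMEDOUT are the only exits on which the device may
still own the buffer. release_component_buf() and
device_may_still_dma() are hypothetical names, and free() stands in for
the dma_unmap_single() plus kfree() pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when the device may still own the buffer: the poll gave up
 * (-EAGAIN) or the devcmd itself timed out (-ETIMEDOUT).
 */
static bool device_may_still_dma(int err)
{
	return err == -EAGAIN || err == -ETIMEDOUT;
}

/* Hypothetical cleanup helper. Returns true if the buffer was freed,
 * false if it was deliberately left alone because the device might
 * still read from it. A real driver would also skip the unmap in that
 * case, or reset/abort first and then release normally.
 */
static bool release_component_buf(void *buf, int err)
{
	assert(buf != NULL);

	if (device_may_still_dma(err))
		return false;	/* leak rather than risk DMA from freed memory */

	free(buf);
	return true;
}
```

Leaking is the bluntest of the three options; a reset or an abort
opcode would let the buffer be reclaimed, but either one depends on
firmware support that this sketch does not model.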

[ ... ]

