From: sashiko-bot@kernel.org
To: "Daniel Machon" <daniel.machon@microchip.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH net-next v4 09/13] net: lan966x: add PCIe FDMA support
Date: Sat, 09 May 2026 08:27:19 +0000	[thread overview]
Message-ID: <20260509082720.503A6C2BCB2@smtp.kernel.org> (raw)
In-Reply-To: <20260508-lan966x-pci-fdma-v4-9-14e0c89d8d63@microchip.com>

Thank you for your contribution! Sashiko AI review found 10 potential issues to consider:
- [Critical] Time-of-Check to Time-of-Use (TOCTOU) vulnerability on DMA descriptor status leading to out-of-bounds kernel memory read.
- [High] Use-After-Free in the transmit path due to unconditional SKB consumption when PTP is enabled.
- [High] TX Ring out-of-order traversal breaks hardware FIFO ordering and causes permanent TX queue stalls.
- [High] Use-After-Free during teardown due to freeing DMA memory before unregistering network devices.
- [High] NAPI RX queue stall on SKB allocation failure or malformed frames due to premature loop exit.
- [High] Missing memory barrier in RX descriptor recycling leads to silent data corruption.
- [High] Teardown race condition: hardware RX channel is disabled while NAPI is still active.
- [High] Unconditional re-enabling of all FDMA interrupts causes interrupt storms.
- [Medium] Missing feature implementation for NETIF_F_RXFCS in the PCIe RX path.
- [Medium] Parsing IFH fields before verifying frame bounds allows network-controllable WARN_ON triggers.
--

commit e3118c2c5891c088b15f392a451ae362b3da62d0
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: add PCIe FDMA support

This patch adds PCIe FDMA support for the lan966x driver, mapping contiguous
DMA buffers through the endpoint's ATU and using memcpy-based frame
transfers instead of per-page DMA mappings, improving throughput
significantly.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
> new file mode 100644
> index 0000000000000..2e88d211073d6
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
[ ... ]
> +static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
> +{
> +	struct fdma_db *db;
> +
> +	for (int i = 0; i < fdma->n_dcbs; i++) {
> +		db = fdma_db_get(fdma, i, 0);

Does scanning from index 0 on every call break the hardware FIFO ordering?
If the CPU reuses index 0 while the hardware is processing a later index,
will the hardware miss the new descriptor until it wraps around the ring?

Also, if the hardware encounters a DONE descriptor and halts, and the CPU
places new data at index 0 without advancing the hardware's pointer, could
the TX path permanently stall? Should this use a cyclic running index instead?

> +
> +		if (!fdma_db_is_done(db))
> +			continue;
[ ... ]
> +static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
> +{
> +	struct lan966x *lan966x = rx->lan966x;
> +	struct fdma *fdma = &rx->fdma;
> +	struct lan966x_port *port;
> +	struct fdma_db *db;
> +	void *virt_addr;
> +	u32 blockl;
> +
> +	/* virt_addr points to the IFH. */
> +	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
> +						      fdma->dcb_index,
> +						      fdma->db_index);
> +
> +	lan966x_ifh_get_src_port(virt_addr, src_port);
> +
> +	if (WARN_ON(*src_port >= lan966x->num_phys_ports))
> +		return FDMA_ERROR;

Is it safe to parse the IFH fields from memory and validate them with a WARN_ON
before verifying the frame bounds? Could a runt frame or corrupt DMA descriptor
from the PCIe endpoint trigger this WARN_ON unnecessarily, causing a
denial-of-service via kernel log spam? Should the length check happen first?

> +
> +	port = lan966x->ports[*src_port];
> +	if (!port)
> +		return FDMA_ERROR;
> +
> +	db = fdma_db_next_get(fdma);
> +
> +	/* BLOCKL is a 16-bit HW-populated field; reject obviously-bad
> +	 * values before they feed memcpy/XDP sizes.
> +	 */
> +	blockl = FDMA_DCB_STATUS_BLOCKL(db->status);

Can a malicious or compromised PCIe endpoint modify db->status between this
size validation and the memcpy in lan966x_fdma_pci_rx_get_frame()?

Since the descriptor resides in shared memory, could a secondary modification
bypass the size validation and allow an out-of-bounds read of adjacent kernel
memory into the network stack?

> +	if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
> +		return FDMA_ERROR;
> +
> +	return FDMA_PASS;
> +}
> +
> +static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
> +						     u64 src_port)
> +{
> +	struct lan966x *lan966x = rx->lan966x;
> +	struct fdma *fdma = &rx->fdma;
> +	struct sk_buff *skb;
> +	struct fdma_db *db;
> +	u32 data_len;
> +
> +	/* Get the received frame and create an SKB for it. */
> +	db = fdma_db_next_get(fdma);
> +	data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
> +
> +	skb = napi_alloc_skb(&lan966x->napi, data_len);
> +	if (unlikely(!skb))
> +		return NULL;
> +
> +	memcpy(skb->data,
> +	       fdma_dataptr_virt_addr_contiguous(fdma,
> +						 fdma->dcb_index,
> +						 fdma->db_index),
> +						 data_len);
> +
> +	skb_put(skb, data_len);
> +
> +	skb->dev = lan966x->ports[src_port]->dev;
> +	skb_pull(skb, IFH_LEN_BYTES);
> +
> +	skb_trim(skb, skb->len - ETH_FCS_LEN);

Does this unconditionally trim the FCS, breaking the NETIF_F_RXFCS feature
for PCIe-based devices? The non-PCI path seems to conditionally trim it
based on skb->dev->features.

> +
> +	skb->protocol = eth_type_trans(skb, skb->dev);
[ ... ]
> +static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
> +				 struct net_device *dev)
> +{
[ ... ]
> +	/* Safe to free: the PCIe DTBO does not enable the PTP interrupt,
> +	 * so lan966x->ptp stays 0 and lan966x_port_xmit() never enqueues
> +	 * this skb on port->tx_skbs for a TX timestamp.
> +	 */
> +	dev_consume_skb_any(skb);

Can this cause a use-after-free if the packet requires a TX timestamp?
When PTP is enabled, lan966x_ptp_txtstamp_request() queues the SKB for later
processing. Does unconditionally consuming the SKB here free the memory before
the PTP worker accesses it? Is it possible for a standard device tree to
enable PTP on this interface, despite the comment?

> +
> +	return NETDEV_TX_OK;
> +}
> +
> +static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
> +{
[ ... ]
> +	/* Get all received skbs. */
> +	while (counter < weight) {
> +		if (!fdma_has_frames(fdma))
> +			break;
> +		/* Order DONE read before DCB/frame reads below. */
> +		dma_rmb();
> +		counter++;
> +		switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
> +		case FDMA_PASS:
> +			break;
> +		case FDMA_ERROR:
> +			fdma_dcb_advance(fdma);
> +			goto allocate_new;
> +		}
> +		skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
> +		fdma_dcb_advance(fdma);
> +		if (!skb)
> +			goto allocate_new;

If napi_alloc_skb() fails due to memory pressure, does branching to
allocate_new prematurely break out of the processing loop? Since the hardware
interrupt was already cleared, and napi_complete_done() is called below,
will this cause the RX queue to stall permanently for the remaining
unprocessed frames? Should the code drop the packet and continue instead?

> +
> +		napi_gro_receive(&lan966x->napi, skb);
> +	}
> +allocate_new:
> +	while (dcb_reload != fdma->dcb_index) {
> +		old_dcb = dcb_reload;
> +		dcb_reload++;
> +		dcb_reload &= fdma->n_dcbs - 1;
> +
> +		fdma_dcb_add(fdma,
> +			     old_dcb,
> +			     FDMA_DCB_INFO_DATAL(fdma->db_size),
> +			     FDMA_DCB_STATUS_INTR);

Is a memory barrier needed before recycling the descriptor?

The CPU reads the packet payload via memcpy() in lan966x_fdma_pci_rx_get_frame(),
and then recycles the descriptor by writing to db->status via fdma_dcb_add().
Could weakly ordered architectures reorder the descriptor status write to occur
before the payload reads have completed, allowing the hardware to overwrite
the buffer prematurely?

> +
> +		lan966x_fdma_rx_reload(rx);
> +	}
> +
> +	if (counter < weight && napi_complete_done(napi, counter))
> +		lan_wr(0xff, lan966x, FDMA_INTR_DB_ENA);

Does writing 0xff forcefully enable all 8 FDMA channels simultaneously?
Could this override the channel-specific bitmask configured during
initialization and lead to interrupt storms from uninitialized or
disabled channels?

> +
> +	return counter;
> +}
[ ... ]
> +static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)
> +{
> +	if (!lan966x->fdma)
> +		return;
> +
> +	lan966x_fdma_rx_disable(&lan966x->rx);
> +	lan966x_fdma_tx_disable(&lan966x->tx);
> +
> +	napi_synchronize(&lan966x->napi);
> +	napi_disable(&lan966x->napi);

Can a race condition occur because the hardware RX channel is disabled while
NAPI is still active?

If an interrupt triggers on another CPU, could the poll function process
frames and call lan966x_fdma_rx_reload(), restarting the hardware channel
and conflicting with the disable sequence? Should NAPI be disabled before
shutting down the hardware?

> +
> +	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
> +	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->tx.fdma);

During module removal, lan966x_fdma_pci_deinit() unmaps the PCIe ATU and frees
the DMA buffers before the network devices are unregistered.

Can the networking stack still concurrently invoke ndo_start_xmit during this
window? If so, would lan966x_fdma_pci_xmit() attempt to write to unmapped
and freed DMA memory, resulting in a use-after-free and potential PCI bus
aborts?

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260508-lan966x-pci-fdma-v4-0-14e0c89d8d63@microchip.com?part=9

Thread overview: 22+ messages
2026-05-08  7:35 [PATCH net-next v4 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 02/13] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 03/13] net: microchip: fdma: add PCIe ATU support Daniel Machon
2026-05-09  8:27   ` sashiko-bot
2026-05-08  7:35 ` [PATCH net-next v4 04/13] net: lan966x: add FDMA LLP register write helper Daniel Machon
2026-05-09  8:27   ` sashiko-bot
2026-05-08  7:35 ` [PATCH net-next v4 05/13] net: lan966x: export FDMA helpers for reuse Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 06/13] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 07/13] net: lan966x: clear FDMA interrupt stickies after switch reset Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot Daniel Machon
2026-05-09  8:27   ` sashiko-bot
2026-05-08  7:35 ` [PATCH net-next v4 09/13] net: lan966x: add PCIe FDMA support Daniel Machon
2026-05-09  8:27   ` sashiko-bot [this message]
2026-05-08  7:35 ` [PATCH net-next v4 10/13] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
2026-05-09  8:27   ` sashiko-bot
2026-05-08  7:35 ` [PATCH net-next v4 11/13] net: lan966x: add PCIe FDMA XDP support Daniel Machon
2026-05-09  8:27   ` sashiko-bot
2026-05-08  7:35 ` [PATCH net-next v4 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
2026-05-08  7:35 ` [PATCH net-next v4 13/13] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
2026-05-12  2:20 ` [PATCH net-next v4 00/13] net: lan966x: add support for PCIe FDMA Jakub Kicinski
2026-05-12  9:56   ` Daniel Machon
