From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH net-next v4 09/13] net: lan966x: add PCIe FDMA support
Reply-To: sashiko@lists.linux.dev
To: "Daniel Machon"
Cc: bpf@vger.kernel.org
In-Reply-To: <20260508-lan966x-pci-fdma-v4-9-14e0c89d8d63@microchip.com>
References: <20260508-lan966x-pci-fdma-v4-9-14e0c89d8d63@microchip.com>
Date: Sat, 09 May 2026 08:27:19 +0000
Message-Id: <20260509082720.503A6C2BCB2@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

Thank you for your contribution! Sashiko AI review found 10 potential issues to consider:

- [Critical] Time-of-Check to Time-of-Use (TOCTOU) vulnerability on DMA descriptor status leading to an out-of-bounds kernel memory read.
- [High] Use-after-free in the transmit path due to unconditional SKB consumption when PTP is enabled.
- [High] TX ring out-of-order traversal breaks hardware FIFO ordering and causes permanent TX queue stalls.
- [High] Use-after-free during teardown due to freeing DMA memory before unregistering network devices.
- [High] NAPI RX queue stall on SKB allocation failure or malformed frames due to premature loop exit.
- [High] Missing memory barrier in RX descriptor recycling leads to silent data corruption.
- [High] Teardown race condition: hardware RX channel is disabled while NAPI is still active.
- [High] Unconditional re-enabling of all FDMA interrupts causes interrupt storms.
- [Medium] Missing feature implementation for NETIF_F_RXFCS in the PCIe RX path.
- [Medium] Parsing IFH fields before verifying frame bounds allows network-controllable WARN_ON triggers.
--
commit e3118c2c5891c088b15f392a451ae362b3da62d0
Author: Daniel Machon

    net: lan966x: add PCIe FDMA support

    This patch adds PCIe FDMA support for the lan966x driver, mapping
    contiguous DMA buffers through the endpoint's ATU and using
    memcpy-based frame transfers instead of per-page DMA mappings,
    improving throughput significantly.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
> new file mode 100644
> index 0000000000000..2e88d211073d6
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c

[ ... ]

> +static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
> +{
> +	struct fdma_db *db;
> +
> +	for (int i = 0; i < fdma->n_dcbs; i++) {
> +		db = fdma_db_get(fdma, i, 0);

Does scanning from index 0 on every call break the hardware FIFO ordering?
If the CPU reuses index 0 while the hardware is processing a later index,
will the hardware miss the new descriptor until it wraps around the ring?
Also, if the hardware encounters a DONE descriptor and halts, and the CPU
places new data at index 0 without advancing the hardware's pointer, could
the TX path permanently stall? Should this use a cyclic running index
instead?

> +
> +		if (!fdma_db_is_done(db))
> +			continue;

[ ... ]

> +static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
> +{
> +	struct lan966x *lan966x = rx->lan966x;
> +	struct fdma *fdma = &rx->fdma;
> +	struct lan966x_port *port;
> +	struct fdma_db *db;
> +	void *virt_addr;
> +	u32 blockl;
> +
> +	/* virt_addr points to the IFH. */
> +	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
> +						      fdma->dcb_index,
> +						      fdma->db_index);
> +
> +	lan966x_ifh_get_src_port(virt_addr, src_port);
> +
> +	if (WARN_ON(*src_port >= lan966x->num_phys_ports))
> +		return FDMA_ERROR;

Is it safe to parse the IFH fields from memory and validate them with a
WARN_ON before verifying the frame bounds? Could a runt frame or corrupt
DMA descriptor from the PCIe endpoint trigger this WARN_ON unnecessarily,
causing a denial-of-service via kernel log spam? Should the length check
happen first?

> +
> +	port = lan966x->ports[*src_port];
> +	if (!port)
> +		return FDMA_ERROR;
> +
> +	db = fdma_db_next_get(fdma);
> +
> +	/* BLOCKL is a 16-bit HW-populated field; reject obviously-bad
> +	 * values before they feed memcpy/XDP sizes.
> +	 */
> +	blockl = FDMA_DCB_STATUS_BLOCKL(db->status);

Can a malicious or compromised PCIe endpoint modify db->status between this
size validation and the memcpy in lan966x_fdma_pci_rx_get_frame()? Since
the descriptor resides in shared memory, could a secondary modification
bypass the size validation and allow an out-of-bounds read of adjacent
kernel memory into the network stack?

> +	if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
> +		return FDMA_ERROR;
> +
> +	return FDMA_PASS;
> +}
> +
> +static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
> +						     u64 src_port)
> +{
> +	struct lan966x *lan966x = rx->lan966x;
> +	struct fdma *fdma = &rx->fdma;
> +	struct sk_buff *skb;
> +	struct fdma_db *db;
> +	u32 data_len;
> +
> +	/* Get the received frame and create an SKB for it. */
> +	db = fdma_db_next_get(fdma);
> +	data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
> +
> +	skb = napi_alloc_skb(&lan966x->napi, data_len);
> +	if (unlikely(!skb))
> +		return NULL;
> +
> +	memcpy(skb->data,
> +	       fdma_dataptr_virt_addr_contiguous(fdma,
> +						 fdma->dcb_index,
> +						 fdma->db_index),
> +	       data_len);
> +
> +	skb_put(skb, data_len);
> +
> +	skb->dev = lan966x->ports[src_port]->dev;
> +	skb_pull(skb, IFH_LEN_BYTES);
> +
> +	skb_trim(skb, skb->len - ETH_FCS_LEN);

Does this unconditionally trim the FCS, breaking the NETIF_F_RXFCS feature
for PCIe-based devices? The non-PCI path seems to conditionally trim it
based on skb->dev->features.

> +
> +	skb->protocol = eth_type_trans(skb, skb->dev);

[ ... ]

> +static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
> +				 struct net_device *dev)
> +{

[ ... ]

> +	/* Safe to free: the PCIe DTBO does not enable the PTP interrupt,
> +	 * so lan966x->ptp stays 0 and lan966x_port_xmit() never enqueues
> +	 * this skb on port->tx_skbs for a TX timestamp.
> +	 */
> +	dev_consume_skb_any(skb);

Can this cause a use-after-free if the packet requires a TX timestamp? When
PTP is enabled, lan966x_ptp_txtstamp_request() queues the SKB for later
processing. Does unconditionally consuming the SKB here free the memory
before the PTP worker accesses it? Is it possible for a standard device
tree to enable PTP on this interface, despite the comment?

> +
> +	return NETDEV_TX_OK;
> +}
> +
> +static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
> +{

[ ... ]

> +	/* Get all received skbs. */
> +	while (counter < weight) {
> +		if (!fdma_has_frames(fdma))
> +			break;
> +		/* Order DONE read before DCB/frame reads below. */
> +		dma_rmb();
> +		counter++;
> +		switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
> +		case FDMA_PASS:
> +			break;
> +		case FDMA_ERROR:
> +			fdma_dcb_advance(fdma);
> +			goto allocate_new;
> +		}
> +		skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
> +		fdma_dcb_advance(fdma);
> +		if (!skb)
> +			goto allocate_new;

If napi_alloc_skb() fails due to memory pressure, does branching to
allocate_new prematurely break out of the processing loop? Since the
hardware interrupt was already cleared, and napi_complete_done() is called
below, will this cause the RX queue to stall permanently for the remaining
unprocessed frames? Should the code drop the packet and continue instead?
> +
> +		napi_gro_receive(&lan966x->napi, skb);
> +	}
> +allocate_new:
> +	while (dcb_reload != fdma->dcb_index) {
> +		old_dcb = dcb_reload;
> +		dcb_reload++;
> +		dcb_reload &= fdma->n_dcbs - 1;
> +
> +		fdma_dcb_add(fdma,
> +			     old_dcb,
> +			     FDMA_DCB_INFO_DATAL(fdma->db_size),
> +			     FDMA_DCB_STATUS_INTR);

Is a memory barrier needed before recycling the descriptor? The CPU reads
the packet payload via memcpy() in lan966x_fdma_pci_rx_get_frame(), and
then recycles the descriptor by writing to db->status via fdma_dcb_add().
Could weakly ordered architectures reorder the descriptor status write to
occur before the payload reads have completed, allowing the hardware to
overwrite the buffer prematurely?

> +
> +		lan966x_fdma_rx_reload(rx);
> +	}
> +
> +	if (counter < weight && napi_complete_done(napi, counter))
> +		lan_wr(0xff, lan966x, FDMA_INTR_DB_ENA);

Does writing 0xff forcefully enable all 8 FDMA channels simultaneously?
Could this override the channel-specific bitmask configured during
initialization and lead to interrupt storms from uninitialized or disabled
channels?

> +
> +	return counter;
> +}

[ ... ]

> +static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)
> +{
> +	if (!lan966x->fdma)
> +		return;
> +
> +	lan966x_fdma_rx_disable(&lan966x->rx);
> +	lan966x_fdma_tx_disable(&lan966x->tx);
> +
> +	napi_synchronize(&lan966x->napi);
> +	napi_disable(&lan966x->napi);

Can a race condition occur because the hardware RX channel is disabled
while NAPI is still active? If an interrupt triggers on another CPU, could
the poll function process frames and call lan966x_fdma_rx_reload(),
restarting the hardware channel and conflicting with the disable sequence?
Should NAPI be disabled before shutting down the hardware?
> +
> +	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
> +	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->tx.fdma);

During module removal, lan966x_fdma_pci_deinit() unmaps the PCIe ATU and
frees the DMA buffers before the network devices are unregistered. Can the
networking stack still concurrently invoke ndo_start_xmit during this
window? If so, would lan966x_fdma_pci_xmit() attempt to write to unmapped
and freed DMA memory, resulting in a use-after-free and potential PCI bus
aborts?

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260508-lan966x-pci-fdma-v4-0-14e0c89d8d63@microchip.com?part=9