* [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA
@ 2026-05-20  8:12 Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry Daniel Machon
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

When lan966x operates as a PCIe endpoint, the driver currently uses
register-based I/O for frame injection and extraction. This approach is
functional but slow, topping out at around 33 Mbps on an Intel x86 host
with a lan966x PCIe card.

This series adds FDMA (Frame DMA) support for the PCIe path. When
operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
directly access host memory, so DMA buffers are allocated as contiguous
coherent memory and mapped through the PCIe Address Translation Unit
(ATU). The ATU provides outbound windows that translate internal FDMA
addresses to PCIe bus addresses, allowing the FDMA engine to read and
write host memory. Because the ATU requires contiguous address regions,
page_pool and normal per-page DMA mappings cannot be used. Instead,
frames are transferred using memcpy between the ATU-mapped buffers and
the network stack. With this, throughput increases from ~33 Mbps to
~620 Mbps for default MTU.

Patch 1 adds the shared drivers/net/ethernet/microchip/fdma/ directory
to the Sparx5 SoC MAINTAINERS entry.

Patches 2-3 prepare the shared FDMA library: patch 2 renames the
contiguous dataptr helpers for clarity, and patch 3 adds PCIe ATU
region management and coherent DMA allocation with ATU mapping.

Patches 4-6 refactor the lan966x FDMA code to support both platform
and PCIe paths: extracting the LLP register write into a helper,
exporting shared functions, and introducing an ops dispatch table
selected at probe time.
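The probe-time ops dispatch can be pictured with a small user-space sketch. All names here (struct sketch_dev, struct fdma_backend_ops, select_ops()) are hypothetical stand-ins rather than the driver's real symbols. The sketch also borrows the parent-chain walk mentioned in the v2 changelog: it checks every ancestor instead of a fixed parent->parent depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device: only what the walk needs. */
struct sketch_dev {
	const struct sketch_dev *parent;
	bool is_pci; /* stands in for dev_is_pci() on this node */
};

/* Hypothetical backend ops table; the real lan966x table is not shown. */
struct fdma_backend_ops {
	const char *name;
	int (*start)(void);
};

static int platform_start(void) { return 0; }
static int pci_start(void) { return 1; }

static const struct fdma_backend_ops platform_ops = { "platform", platform_start };
static const struct fdma_backend_ops pci_ops = { "pci", pci_start };

/* Walk every ancestor rather than checking one fixed depth. */
static bool has_pci_ancestor(const struct sketch_dev *dev)
{
	for (const struct sketch_dev *d = dev->parent; d; d = d->parent)
		if (d->is_pci)
			return true;
	return false;
}

/* Chosen once at probe time; the data path then calls through the table. */
static const struct fdma_backend_ops *select_ops(const struct sketch_dev *dev)
{
	assert(dev);
	return has_pci_ancestor(dev) ? &pci_ops : &platform_ops;
}
```

Because the selection happens once, the hot path pays only an indirect call and never has to re-check which bus the switch sits on.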

Patches 7-8 harden the existing FDMA path for the PCIe endpoint
lifecycle: patch 7 clears latched FDMA error/interrupt stickies after
the switch reset so they don't assert as soon as interrupts are
enabled, and patch 8 adds a shutdown() callback that quiesces the
FDMA engine on host warm reboot (on the PCIe card the FDMA survives
host reset and would otherwise keep the shared INTx asserted into
the next probe).

Patch 9 adds the core PCIe FDMA implementation with RX/TX using
contiguous ATU-mapped buffers. Patches 10 and 11 extend it with MTU
change and XDP support respectively. XDP_PASS, XDP_TX, XDP_DROP and
XDP_ABORTED are supported; XDP_REDIRECT is deliberately not, because
the PCIe data path does not use page_pool.

Patches 12-13 update the lan966x PCI device tree overlay to extend the
cpu register mapping to cover the ATU register space and add the FDMA
interrupt.

To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Horatiu Vultur <horatiu.vultur@microchip.com>
To: Steen Hegelund <steen.hegelund@microchip.com>
To: UNGLinuxDriver@microchip.com
To: Alexei Starovoitov <ast@kernel.org>
To: Daniel Borkmann <daniel@iogearbox.net>
To: Jesper Dangaard Brouer <hawk@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
To: Herve Codina <herve.codina@bootlin.com>
To: Arnd Bergmann <arnd@arndb.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Mohsin Bashir <mohsin.bashr@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
Changes in v5:

This version fixes a single AI review issue, flagged by Paolo. The other
AI-reported issues for v4 have been classified as either pre-existing or
material for follow-up changes.

- Fix premature napi_complete_done() in lan966x_fdma_pci_napi_poll() on
  FDMA_ERROR and napi_alloc_skb() failure. Bailing out left DONE=1 DCBs
  in the ring with no IRQ to drain them. Drop the frame and continue
  the poll loop instead. Bump rx_dropped on memory-pressure drop.
  (Paolo)
- Link to v4: https://lore.kernel.org/r/20260508-lan966x-pci-fdma-v4-0-14e0c89d8d63@microchip.com
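The v5 fix can be illustrated with a user-space model of the RX poll loop. It is a sketch, not the driver code: the ring, the rx_outcome enum and rx_poll() are hypothetical. A completed descriptor whose frame cannot be delivered is dropped and recycled, so the loop keeps draining DONE=1 descriptors instead of returning early. Following the changelog, only the memory-pressure case bumps rx_dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

/* Hypothetical per-DCB outcome the poll loop sees once DONE=1. */
enum rx_outcome { RX_OK, RX_FDMA_ERROR, RX_NO_SKB };

struct rx_dcb {
	bool done;               /* models the DONE bit set by the FDMA */
	enum rx_outcome outcome; /* what handling this frame would yield */
};

struct rx_ring {
	struct rx_dcb dcb[RING_SIZE];
	int head;
	unsigned long rx_packets;
	unsigned long rx_dropped;
};

/* Consume up to @budget completed DCBs. A bad frame is dropped and its DCB
 * recycled, so no DONE=1 descriptor is left behind without an IRQ to
 * announce it again.
 */
static int rx_poll(struct rx_ring *ring, int budget)
{
	int work = 0;

	assert(budget >= 0);
	while (work < budget && ring->dcb[ring->head].done) {
		struct rx_dcb *dcb = &ring->dcb[ring->head];

		switch (dcb->outcome) {
		case RX_OK:
			ring->rx_packets++;
			break;
		case RX_NO_SKB:
			ring->rx_dropped++; /* memory-pressure drop */
			break;
		case RX_FDMA_ERROR:
			break;              /* drop the frame, keep polling */
		}
		dcb->done = false;          /* hand the DCB back to hardware */
		ring->head = (ring->head + 1) % RING_SIZE;
		work++;
	}
	/* The caller completes NAPI only when work < budget. */
	return work;
}
```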

Changes in v4:
- Consolidate rx size checks into lan966x_fdma_pci_rx_size_fits().
  Subtract XDP_PACKET_HEADROOM on the max size check, and add ETH_HLEN
  on the min size check. This fixes potential OOB reads/writes.
- At the xdp_prepare_buff() call, update the comment to clarify that data
  is already offset by XDP_PACKET_HEADROOM.
- Link to v3:
  https://lore.kernel.org/r/20260504-lan966x-pci-fdma-v3-0-a56f5740d870@microchip.com
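One reading of the consolidated RX size check in v4 is sketched below. The function body of lan966x_fdma_pci_rx_size_fits() is not part of this thread excerpt, so rx_size_fits() here is hypothetical. XDP_PACKET_HEADROOM (256), ETH_HLEN (14) and ETH_FCS_LEN (4) use their kernel values. IFH_LEN_BYTES = 28 is an assumption for the lan966x injection/extraction header. The lower bound guarantees that an Ethernet header remains after IFH and FCS are stripped. The upper bound leaves room for the XDP headroom in front of the frame.

```c
#include <assert.h>
#include <stdbool.h>

#define XDP_PACKET_HEADROOM 256 /* kernel value */
#define ETH_HLEN 14             /* kernel value */
#define ETH_FCS_LEN 4           /* kernel value */
#define IFH_LEN_BYTES 28        /* assumption: lan966x IFH size */

/* Hypothetical RX length check on the hardware-reported block length. */
static bool rx_size_fits(unsigned int blockl, unsigned int db_size)
{
	assert(db_size > XDP_PACKET_HEADROOM);

	/* Too short: nothing resembling an Ethernet frame is left after
	 * stripping the IFH and the FCS.
	 */
	if (blockl < IFH_LEN_BYTES + ETH_FCS_LEN + ETH_HLEN)
		return false;

	/* Too long: the frame would overrun the buffer once the XDP
	 * headroom in front of it is accounted for.
	 */
	return blockl <= db_size - XDP_PACKET_HEADROOM;
}
```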

Changes in v3:

Version 3 fixes a number of issues reported by Sashiko, mostly
hardening.

- Fix double use of XDP_PACKET_HEADROOM.
- Fix ERR_PTR persistence in fdma->atu_region and add missing
  NULL/ERR_PTR guard in fdma_pci_atu_region_unmap().
- Reject size <= 0 in fdma_pci_atu_region_map() and return
  -ENOSPC (was -ENOMEM) when no region is free.
- Introduce lan966x_fdma_pci_tx_size_fits() that accounts for
  XDP_PACKET_HEADROOM; use it from both xmit paths to keep
  bpf_xdp_adjust_tail from writing past the TX slot.
- Validate BLOCKL in rx_check_frame() (reject < IFH+FCS or
  > db_size) before it feeds memcpy/XDP sizes.
- READ_ONCE(port->xdp_prog) inside lan966x_xdp_pci_run() to close
  a TOCTOU on XDP detach that could deref NULL in
  bpf_prog_run_xdp().
- Strip IFH and FCS pre-XDP in rx_check_frame(). After BPF runs
  the driver cannot tell whether the tail was modified; drop the
  unconditional skb_pull/skb_trim in rx_get_frame().
- Account tx_bytes/tx_packets on XDP_TX success and tx_dropped on
  XDP_TX size reject.
- Add dma_wmb()/dma_rmb() around DCB status writes and reads in
  xmit, xmit_xdpf, and napi_poll.
- Collected Tested-by: Hervé Codina.
- Link to v2: https://lore.kernel.org/r/20260428-lan966x-pci-fdma-v2-0-d3ec66e06202@microchip.com

Changes in v2:

Version 2 primarily addresses issues with module unload/load, where
traffic would stop working (Hervé), and XDP head/tail adjust that would be
discarded (Mohsin).

Apart from that, I ran through issues reported by Sashiko, and fixed a
number of other issues.

- New patch 1: add drivers/net/ethernet/microchip/fdma/ to the Sparx5
  SoC MAINTAINERS entry.
- New patch 7: clear latched FDMA error/interrupt stickies after the
  switch reset so they don't fire as soon as interrupts are enabled.
- New patch 8: shutdown() callback, quiescing FDMA on host warm reboot.
- Replaced the depth-2 dev_is_pci(parent->parent) backend selector
  with a parent-chain walk.
- XDP: use xdp.data/xdp.data_end for the post-XDP frame length so that
  bpf_xdp_adjust_head/tail are respected (Mohsin Bashir)
- MTU change: drain in-flight xmits with netif_tx_disable() on every
  port before reallocating rings, waking them again on completion.
- MTU change: cap the PCIe DCB ring at 256 entries so a full-ring
  coherent DMA allocation fits in a single MAX_PAGE_ORDER block at
  jumbo MTU.
- PCIe ATU: disable the region before clearing its translation on
  unmap.
- PCIe FDMA: hold tx_lock in napi_poll around the free-DCB check used
  to wake stopped netdev queues.
- PCIe FDMA: return -ENOSPC (not -1) when the DCB ring is exhausted.
- Link to v1: https://lore.kernel.org/r/20260320-lan966x-pci-fdma-v1-0-ef54cb9b0c4b@microchip.com
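The 256-entry cap can be checked with rough arithmetic. This sketch makes several assumptions: 4 KiB pages with MAX_PAGE_ORDER 10, so the largest contiguous block is 4 MiB; one data buffer per DCB; a hypothetical 256-byte descriptor; and a hypothetical 9728-byte jumbo buffer. FDMA_PCI_DB_SIZE() reuses the ALIGN-to-128 formula from fdma_pci.h in patch 3. Under these assumptions, 256 entries fit in one block and 512 do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))
#define FDMA_PCI_DB_ALIGN 128
#define FDMA_PCI_DB_SIZE(mtu) ALIGN_UP(mtu, FDMA_PCI_DB_ALIGN)

/* Assumptions: 4 KiB pages and MAX_PAGE_ORDER 10 (4 MiB max block). */
#define PAGE_SIZE_ASSUMED 4096UL
#define MAX_PAGE_ORDER_ASSUMED 10
#define MAX_BLOCK_BYTES (PAGE_SIZE_ASSUMED << MAX_PAGE_ORDER_ASSUMED)

/* Hypothetical footprint: n_dcbs descriptors plus one data buffer each,
 * all in a single coherent allocation.
 */
static size_t ring_bytes(size_t n_dcbs, size_t dcb_size, size_t frame_len)
{
	assert(n_dcbs > 0);
	return n_dcbs * (dcb_size + FDMA_PCI_DB_SIZE(frame_len));
}

static bool ring_fits_one_block(size_t n_dcbs, size_t dcb_size,
				size_t frame_len)
{
	return ring_bytes(n_dcbs, dcb_size, frame_len) <= MAX_BLOCK_BYTES;
}
```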

---
Daniel Machon (13):
      MAINTAINERS: add FDMA library to Sparx5 SoC entry
      net: microchip: fdma: rename contiguous dataptr helpers
      net: microchip: fdma: add PCIe ATU support
      net: lan966x: add FDMA LLP register write helper
      net: lan966x: export FDMA helpers for reuse
      net: lan966x: add FDMA ops dispatch for PCIe support
      net: lan966x: clear FDMA interrupt stickies after switch reset
      net: lan966x: add shutdown callback to stop FDMA on reboot
      net: lan966x: add PCIe FDMA support
      net: lan966x: add PCIe FDMA MTU change support
      net: lan966x: add PCIe FDMA XDP support
      misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
      misc: lan966x-pci: dts: add fdma interrupt to overlay

 MAINTAINERS                                        |   1 +
 drivers/misc/lan966x_pci.dtso                      |   5 +-
 drivers/net/ethernet/microchip/fdma/Makefile       |   4 +
 drivers/net/ethernet/microchip/fdma/fdma_api.c     |  33 +
 drivers/net/ethernet/microchip/fdma/fdma_api.h     |  25 +-
 drivers/net/ethernet/microchip/fdma/fdma_pci.c     | 182 ++++++
 drivers/net/ethernet/microchip/fdma/fdma_pci.h     |  42 ++
 drivers/net/ethernet/microchip/lan966x/Makefile    |   4 +
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  |  51 +-
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 667 +++++++++++++++++++++
 .../net/ethernet/microchip/lan966x/lan966x_main.c  |  74 ++-
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  45 ++
 .../net/ethernet/microchip/lan966x/lan966x_regs.h  |  25 +
 .../net/ethernet/microchip/lan966x/lan966x_xdp.c   |  10 +
 14 files changed, 1128 insertions(+), 40 deletions(-)
---
base-commit: bf53bf33206137c2337bd8aacf0ef4c348b97a36
change-id: 20260313-lan966x-pci-fdma-94ed485d23fa

Best regards,
-- 
Daniel Machon <daniel.machon@microchip.com>


* [PATCH net-next v5 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 02/13] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

The FDMA library under drivers/net/ethernet/microchip/fdma/ is shared by
the lan966x, sparx5 and lan969x drivers, but is not covered by an entry
in the MAINTAINERS file. A subsequent patch will add new files to the
FDMA library, so let's make sure it's covered.

Add drivers/net/ethernet/microchip/fdma/ to the Sparx5 SoC entry, since
I am already listed there.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5db1a2923dd2..2d4eeb855145 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3115,6 +3115,7 @@ M:	UNGLinuxDriver@microchip.com
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:	Supported
 F:	arch/arm64/boot/dts/microchip/sparx*
+F:	drivers/net/ethernet/microchip/fdma/
 F:	drivers/net/ethernet/microchip/vcap/
 F:	drivers/pinctrl/pinctrl-microchip-sgpio.c
 N:	sparx5

-- 
2.34.1


* [PATCH net-next v5 02/13] net: microchip: fdma: rename contiguous dataptr helpers
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support Daniel Machon
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

When the FDMA library was introduced [1], two helpers were added to get
the DMA and virtual address of a DCB's data buffer in contiguous memory.
These helpers have had no callers until this series, and I found the
naming I initially used confusing and inconsistent.

Rename fdma_dataptr_get_contiguous() and
fdma_dataptr_virt_get_contiguous() to fdma_dataptr_dma_addr_contiguous()
and fdma_dataptr_virt_addr_contiguous(). This makes the pair symmetric
and clarifies what type of address each returns.
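The contiguous layout behind the renamed helpers can be modelled in user space. In this layout all DCBs come first and the data buffers follow, ordered by (dcb_idx, db_idx). The DMA and virtual variants share this arithmetic and differ only in the base address. struct layout and dataptr_contiguous() are hypothetical. dcb_size stands in for sizeof(struct fdma_dcb). The real helpers add one more term that is cut off in this patch's diff hunk, and the sketch leaves that term out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one contiguous FDMA allocation. */
struct layout {
	uint64_t base;         /* DMA or virtual address of the allocation */
	unsigned int dcb_size; /* stands in for sizeof(struct fdma_dcb) */
	unsigned int n_dcbs;
	unsigned int n_dbs;
	unsigned int db_size;
};

/* Start of data buffer @db_idx of DCB @dcb_idx: skip the DCB array, then
 * index into the data buffers (trailing term of the real helpers omitted).
 */
static uint64_t dataptr_contiguous(const struct layout *l,
				   unsigned int dcb_idx, unsigned int db_idx)
{
	assert(dcb_idx < l->n_dcbs && db_idx < l->n_dbs);
	return l->base + (uint64_t)l->dcb_size * l->n_dcbs +
	       ((uint64_t)dcb_idx * l->n_dbs + db_idx) * l->db_size;
}
```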

[1]: commit 30e48a75df9c ("net: microchip: add FDMA library")

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/fdma/fdma_api.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.h b/drivers/net/ethernet/microchip/fdma/fdma_api.h
index d91affe8bd98..94f1a6596097 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.h
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.h
@@ -197,8 +197,9 @@ static inline int fdma_nextptr_cb(struct fdma *fdma, int dcb_idx, u64 *nextptr)
  * if the dataptr addresses and DCB's are in contiguous memory and the driver
  * supports XDP.
  */
-static inline u64 fdma_dataptr_get_contiguous(struct fdma *fdma, int dcb_idx,
-					      int db_idx)
+static inline u64 fdma_dataptr_dma_addr_contiguous(struct fdma *fdma,
+						   int dcb_idx,
+						   int db_idx)
 {
 	return fdma->dma + (sizeof(struct fdma_dcb) * fdma->n_dcbs) +
 	       (dcb_idx * fdma->n_dbs + db_idx) * fdma->db_size +
@@ -209,8 +210,8 @@ static inline u64 fdma_dataptr_get_contiguous(struct fdma *fdma, int dcb_idx,
  * applicable if the dataptr addresses and DCB's are in contiguous memory and
  * the driver supports XDP.
  */
-static inline void *fdma_dataptr_virt_get_contiguous(struct fdma *fdma,
-						     int dcb_idx, int db_idx)
+static inline void *fdma_dataptr_virt_addr_contiguous(struct fdma *fdma,
+						      int dcb_idx, int db_idx)
 {
 	return (u8 *)fdma->dcbs + (sizeof(struct fdma_dcb) * fdma->n_dcbs) +
 	       (dcb_idx * fdma->n_dbs + db_idx) * fdma->db_size +

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 02/13] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper Daniel Machon
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

When lan966x or lan969x operates as a PCIe endpoint, the internal FDMA
engine cannot directly access host memory. Instead, DMA addresses must
be translated through the PCIe Address Translation Unit (ATU). The ATU
provides outbound windows that map internal addresses to PCIe bus
addresses.

The ATU outbound address space (0x10000000-0x1fffffff) is divided into
six equally-sized regions (~42MB each). When FDMA buffers are allocated,
a free ATU region is claimed and programmed with the DMA target address.
The FDMA engine then uses the region's base address in its descriptors,
and the ATU translates these to the actual DMA addresses on the PCIe bus.
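The region split and the translation can be reproduced in user space with the same arithmetic that fdma_pci_atu_region_size() and fdma_pci_atu_translate_addr() use in the diff below. The function names in this sketch are illustrative. Rounding the even split down to 64 KiB gives 0x2aa0000 bytes (about 42.6 MiB) per region, and region 5 then spans 0x1d520000-0x1ffbffff.

```c
#include <assert.h>
#include <stdint.h>

#define ATU_OB_START 0x10000000u
#define ATU_OB_END 0x1fffffffu
#define ATU_REGION_MAX 6
#define ATU_REGION_ALIGN 0x10000u /* 64 KiB */

/* Even split of the OB space, rounded down to the region alignment. */
static uint32_t atu_region_size(void)
{
	uint32_t raw = (ATU_OB_END - ATU_OB_START) / ATU_REGION_MAX;

	return raw - (raw % ATU_REGION_ALIGN);
}

static uint64_t atu_region_base(int idx)
{
	assert(idx >= 0 && idx < ATU_REGION_MAX);
	return ATU_OB_START + (uint64_t)idx * atu_region_size();
}

/* The FDMA uses OB addresses; the ATU forwards base + off to target + off. */
static uint64_t atu_translate(uint64_t base, uint64_t target,
			      uint64_t host_addr)
{
	assert(host_addr >= target);
	return base + (host_addr - target);
}
```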

Add the required functions and helpers that combine the DMA allocation
with the ATU region mapping, effectively adding support for PCIe FDMA.
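The allocate-then-map pairing can be sketched as follows. The allocation is undone if no ATU region is free, so the caller sees either both steps done or neither. In this sketch, malloc() stands in for the coherent DMA allocation, and struct atu, atu_region_claim() and alloc_and_map() are hypothetical. The region index is written only on success, which mirrors the v3 fix that stopped an ERR_PTR from persisting in fdma->atu_region.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define ATU_REGION_MAX 6

struct region { bool in_use; };
struct atu { struct region regions[ATU_REGION_MAX]; };

/* Claim the first free region, or fail with -ENOSPC when all are taken. */
static int atu_region_claim(struct atu *atu)
{
	for (int i = 0; i < ATU_REGION_MAX; i++) {
		if (!atu->regions[i].in_use) {
			atu->regions[i].in_use = true;
			return i;
		}
	}
	return -ENOSPC;
}

/* Allocate first, then map; roll back the allocation if mapping fails. */
static int alloc_and_map(struct atu *atu, size_t size, void **buf,
			 int *region)
{
	int idx;

	assert(atu && buf && region);
	*buf = malloc(size);
	if (!*buf)
		return -ENOMEM;

	idx = atu_region_claim(atu);
	if (idx < 0) {
		free(*buf);
		*buf = NULL;
		return idx;
	}

	*region = idx; /* only published on success */
	return 0;
}
```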

This implementation will also be used by lan969x once PCIe FDMA support
is added for that platform.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/fdma/Makefile   |   4 +
 drivers/net/ethernet/microchip/fdma/fdma_api.c |  33 +++++
 drivers/net/ethernet/microchip/fdma/fdma_api.h |  16 +++
 drivers/net/ethernet/microchip/fdma/fdma_pci.c | 182 +++++++++++++++++++++++++
 drivers/net/ethernet/microchip/fdma/fdma_pci.h |  42 ++++++
 5 files changed, 277 insertions(+)

diff --git a/drivers/net/ethernet/microchip/fdma/Makefile b/drivers/net/ethernet/microchip/fdma/Makefile
index cc9a736be357..eed4df6f7158 100644
--- a/drivers/net/ethernet/microchip/fdma/Makefile
+++ b/drivers/net/ethernet/microchip/fdma/Makefile
@@ -5,3 +5,7 @@
 
 obj-$(CONFIG_FDMA) += fdma.o
 fdma-y += fdma_api.o
+
+ifdef CONFIG_MCHP_LAN966X_PCI
+fdma-y += fdma_pci.o
+endif
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.c b/drivers/net/ethernet/microchip/fdma/fdma_api.c
index e78c3590da9e..e0c2b137afef 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.c
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.c
@@ -127,6 +127,39 @@ void fdma_free_phys(struct fdma *fdma)
 }
 EXPORT_SYMBOL_GPL(fdma_free_phys);
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+/* Allocate coherent DMA memory and map it in the ATU. */
+int fdma_alloc_coherent_and_map(struct device *dev, struct fdma *fdma,
+				struct fdma_pci_atu *atu)
+{
+	struct fdma_pci_atu_region *region;
+	int err;
+
+	err = fdma_alloc_coherent(dev, fdma);
+	if (err)
+		return err;
+
+	region = fdma_pci_atu_region_map(atu, fdma->dma, fdma->size);
+	if (IS_ERR(region)) {
+		fdma_free_coherent(dev, fdma);
+		return PTR_ERR(region);
+	}
+
+	fdma->atu_region = region;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fdma_alloc_coherent_and_map);
+
+/* Free coherent DMA memory and unmap the memory in the ATU. */
+void fdma_free_coherent_and_unmap(struct device *dev, struct fdma *fdma)
+{
+	fdma_pci_atu_region_unmap(fdma->atu_region);
+	fdma_free_coherent(dev, fdma);
+}
+EXPORT_SYMBOL_GPL(fdma_free_coherent_and_unmap);
+#endif
+
 /* Get the size of the FDMA memory */
 u32 fdma_get_size(struct fdma *fdma)
 {
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.h b/drivers/net/ethernet/microchip/fdma/fdma_api.h
index 94f1a6596097..0e0f8af7463f 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.h
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.h
@@ -7,6 +7,10 @@
 #include <linux/etherdevice.h>
 #include <linux/types.h>
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+#include "fdma_pci.h"
+#endif
+
 /* This provides a common set of functions and data structures for interacting
  * with the Frame DMA engine on multiple Microchip switchcores.
  *
@@ -109,6 +113,11 @@ struct fdma {
 	u32 channel_id;
 
 	struct fdma_ops ops;
+
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	/* PCI ATU region for this FDMA instance. */
+	struct fdma_pci_atu_region *atu_region;
+#endif
 };
 
 /* Advance the DCB index and wrap if required. */
@@ -234,9 +243,16 @@ int __fdma_dcb_add(struct fdma *fdma, int dcb_idx, u64 info, u64 status,
 
 int fdma_alloc_coherent(struct device *dev, struct fdma *fdma);
 int fdma_alloc_phys(struct fdma *fdma);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+int fdma_alloc_coherent_and_map(struct device *dev, struct fdma *fdma,
+				struct fdma_pci_atu *atu);
+#endif
 
 void fdma_free_coherent(struct device *dev, struct fdma *fdma);
 void fdma_free_phys(struct fdma *fdma);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+void fdma_free_coherent_and_unmap(struct device *dev, struct fdma *fdma);
+#endif
 
 u32 fdma_get_size(struct fdma *fdma);
 u32 fdma_get_size_contiguous(struct fdma *fdma);
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.c b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
new file mode 100644
index 000000000000..1bd41eaa58a4
--- /dev/null
+++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "fdma_pci.h"
+
+/* When the switch operates as a PCIe endpoint, the FDMA engine needs to
+ * DMA to/from host memory. The FDMA writes to addresses within the endpoint's
+ * internal Outbound (OB) address space, and the PCIe ATU translates these to
+ * DMA addresses on the PCIe bus, targeting host memory.
+ *
+ * The ATU supports up to six outbound regions. This implementation divides
+ * the OB address space into six equally sized chunks.
+ *
+ * +-------------+------------+------------+-----+------------+
+ * | Index       | Region 0   | Region 1   | ... | Region 5   |
+ * +-------------+------------+------------+-----+------------+
+ * | Base addr   | 0x10000000 | 0x12aa0000 | ... | 0x1d520000 |
+ * | Limit addr  | 0x12a9ffff | 0x1553ffff | ... | 0x1ffbffff |
+ * | Target addr | host dma   | host dma   | ... | host dma   |
+ * +-------------+------------+------------+-----+------------+
+ *
+ * Base addr is the start address of the region within the OB address space.
+ * Limit addr is the end address of the region within the OB address space.
+ * Target addr is the host DMA address that the base addr translates to.
+ */
+
+#define FDMA_PCI_ATU_REGION_ALIGN    BIT(16) /* 64KB */
+#define FDMA_PCI_ATU_OB_START        0x10000000
+#define FDMA_PCI_ATU_OB_END          0x1fffffff
+
+#define FDMA_PCI_ATU_ADDR            0x300000
+#define FDMA_PCI_ATU_IDX_SIZE        0x200
+#define FDMA_PCI_ATU_ENA_REG         0x4
+#define FDMA_PCI_ATU_ENA_BIT         BIT(31)
+#define FDMA_PCI_ATU_LWR_BASE_ADDR   0x8
+#define FDMA_PCI_ATU_UPP_BASE_ADDR   0xc
+#define FDMA_PCI_ATU_LIMIT_ADDR      0x10
+#define FDMA_PCI_ATU_LWR_TARGET_ADDR 0x14
+#define FDMA_PCI_ATU_UPP_TARGET_ADDR 0x18
+
+static u32 fdma_pci_atu_region_size(void)
+{
+	return round_down((FDMA_PCI_ATU_OB_END - FDMA_PCI_ATU_OB_START) /
+			  FDMA_PCI_ATU_REGION_MAX, FDMA_PCI_ATU_REGION_ALIGN);
+}
+
+static void __iomem *fdma_pci_atu_addr_get(void __iomem *addr, int offset,
+					   int idx)
+{
+	return addr + FDMA_PCI_ATU_ADDR + FDMA_PCI_ATU_IDX_SIZE * idx + offset;
+}
+
+static void fdma_pci_atu_region_enable(struct fdma_pci_atu_region *region)
+{
+	writel(FDMA_PCI_ATU_ENA_BIT,
+	       fdma_pci_atu_addr_get(region->atu->addr, FDMA_PCI_ATU_ENA_REG,
+				     region->idx));
+}
+
+static void fdma_pci_atu_region_disable(struct fdma_pci_atu_region *region)
+{
+	writel(0, fdma_pci_atu_addr_get(region->atu->addr, FDMA_PCI_ATU_ENA_REG,
+					region->idx));
+}
+
+/* Configure the address translation in the ATU. */
+static void
+fdma_pci_atu_configure_translation(struct fdma_pci_atu_region *region)
+{
+	struct fdma_pci_atu *atu = region->atu;
+	int idx = region->idx;
+
+	writel(lower_32_bits(region->base_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_LWR_BASE_ADDR, idx));
+
+	writel(upper_32_bits(region->base_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_UPP_BASE_ADDR, idx));
+
+	/* Upper limit register only needed with REGION_SIZE > 4GB. */
+	writel(region->limit_addr,
+	       fdma_pci_atu_addr_get(atu->addr, FDMA_PCI_ATU_LIMIT_ADDR, idx));
+
+	writel(lower_32_bits(region->target_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_LWR_TARGET_ADDR, idx));
+
+	writel(upper_32_bits(region->target_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_UPP_TARGET_ADDR, idx));
+}
+
+/* Find an unused ATU region. */
+static struct fdma_pci_atu_region *
+fdma_pci_atu_region_get_free(struct fdma_pci_atu *atu)
+{
+	struct fdma_pci_atu_region *regions = atu->regions;
+
+	for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
+		if (regions[i].in_use)
+			continue;
+
+		return &regions[i];
+	}
+
+	return ERR_PTR(-ENOSPC);
+}
+
+/* Unmap an ATU region, clearing its translation and disabling it. */
+void fdma_pci_atu_region_unmap(struct fdma_pci_atu_region *region)
+{
+	if (IS_ERR_OR_NULL(region))
+		return;
+
+	region->target_addr = 0;
+	region->in_use = false;
+
+	fdma_pci_atu_region_disable(region);
+	fdma_pci_atu_configure_translation(region);
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_region_unmap);
+
+/* Map a host DMA address into a free outbound region. */
+struct fdma_pci_atu_region *
+fdma_pci_atu_region_map(struct fdma_pci_atu *atu, u64 target_addr, int size)
+{
+	struct fdma_pci_atu_region *region;
+
+	if (!atu)
+		return ERR_PTR(-EINVAL);
+
+	if (size <= 0)
+		return ERR_PTR(-EINVAL);
+
+	if (size > fdma_pci_atu_region_size())
+		return ERR_PTR(-E2BIG);
+
+	region = fdma_pci_atu_region_get_free(atu);
+	if (IS_ERR(region))
+		return region;
+
+	region->target_addr = target_addr;
+	region->in_use = true;
+
+	/* Enable first, according to datasheet section 3.24.7.4.1 */
+	fdma_pci_atu_region_enable(region);
+	fdma_pci_atu_configure_translation(region);
+
+	return region;
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_region_map);
+
+/* Translate a host DMA address to the corresponding OB address. */
+u64 fdma_pci_atu_translate_addr(struct fdma_pci_atu_region *region, u64 addr)
+{
+	return region->base_addr + (addr - region->target_addr);
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_translate_addr);
+
+/* Initialize ATU, dividing the OB space into equally sized regions. */
+void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr)
+{
+	struct fdma_pci_atu_region *regions = atu->regions;
+	u32 region_size = fdma_pci_atu_region_size();
+
+	atu->addr = addr;
+
+	for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
+		regions[i].base_addr =
+			FDMA_PCI_ATU_OB_START + (i * region_size);
+		regions[i].limit_addr =
+			regions[i].base_addr + region_size - 1;
+		regions[i].idx = i;
+		regions[i].atu = atu;
+	}
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_init);
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.h b/drivers/net/ethernet/microchip/fdma/fdma_pci.h
new file mode 100644
index 000000000000..eccfe5dc25e7
--- /dev/null
+++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _FDMA_PCI_H_
+#define _FDMA_PCI_H_
+
+#include <linux/types.h>
+
+#define FDMA_PCI_ATU_REGION_MAX 6
+#define FDMA_PCI_DB_ALIGN 128
+#define FDMA_PCI_DB_SIZE(mtu) ALIGN(mtu, FDMA_PCI_DB_ALIGN)
+
+struct fdma_pci_atu;
+
+struct fdma_pci_atu_region {
+	struct fdma_pci_atu *atu;
+	u64 base_addr; /* Base addr of the OB window */
+	u64 limit_addr; /* Limit addr of the OB window */
+	u64 target_addr; /* Host DMA address this region maps to */
+	int idx;
+	bool in_use;
+};
+
+struct fdma_pci_atu {
+	void __iomem *addr;
+	struct fdma_pci_atu_region regions[FDMA_PCI_ATU_REGION_MAX];
+};
+
+/* Initialize ATU, dividing OB space into regions. */
+void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr);
+
+/* Unmap an ATU region, clearing its translation and disabling it. */
+void fdma_pci_atu_region_unmap(struct fdma_pci_atu_region *region);
+
+/* Map a host DMA address into a free ATU region. */
+struct fdma_pci_atu_region *fdma_pci_atu_region_map(struct fdma_pci_atu *atu,
+						    u64 target_addr,
+						    int size);
+
+/* Translate a host DMA address to the OB address space. */
+u64 fdma_pci_atu_translate_addr(struct fdma_pci_atu_region *region, u64 addr);
+
+#endif

-- 
2.34.1


* [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (2 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse Daniel Machon
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

The FDMA Link List Pointer (LLP) register points to the first DCB in the
chain and must be written before the channel is activated. This tells
the FDMA engine where to begin DMA transfers.

Move the LLP register writes from the channel start/activate functions
into the allocation functions and introduce a shared
lan966x_fdma_llp_configure() helper. This is needed because the upcoming
PCIe FDMA path writes ATU-translated addresses to the LLP registers
instead of DMA addresses. Keeping the writes in the shared
start/activate path would overwrite these translated addresses.
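The helper splits a 64-bit first-DCB address into the two 32-bit LLP registers, as this user-space model shows. A mock register file replaces lan_wr(), and N_CHANNELS and llp_configure() are hypothetical. On the platform path the value is the DMA address. On the upcoming PCIe path it is an ATU-translated OB address. Whatever is written at allocation time is where the engine starts, which is why start/activate must not rewrite it.

```c
#include <assert.h>
#include <stdint.h>

#define N_CHANNELS 8 /* hypothetical channel count for the model */

/* Mock register file standing in for FDMA_DCB_LLP / FDMA_DCB_LLP1. */
struct mock_regs {
	uint32_t llp[N_CHANNELS];  /* lower 32 bits of the first DCB address */
	uint32_t llp1[N_CHANNELS]; /* upper 32 bits */
};

/* Same split as lan966x_fdma_llp_configure(): low word, then high word. */
static void llp_configure(struct mock_regs *regs, uint64_t addr, uint8_t ch)
{
	assert(ch < N_CHANNELS);
	regs->llp[ch] = (uint32_t)addr;
	regs->llp1[ch] = (uint32_t)(addr >> 32);
}
```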

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  | 29 ++++++++++------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index f8ce735a7fc0..6c5761e886d4 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -109,6 +109,13 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
 	return PTR_ERR_OR_ZERO(rx->page_pool);
 }
 
+static void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+				       u8 channel_id)
+{
+	lan_wr(lower_32_bits(addr), lan966x, FDMA_DCB_LLP(channel_id));
+	lan_wr(upper_32_bits(addr), lan966x, FDMA_DCB_LLP1(channel_id));
+}
+
 static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
@@ -127,6 +134,9 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
 	fdma_dcbs_init(fdma, FDMA_DCB_INFO_DATAL(fdma->db_size),
 		       FDMA_DCB_STATUS_INTR);
 
+	lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
+				   fdma->channel_id);
+
 	return 0;
 }
 
@@ -136,14 +146,6 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 	struct fdma *fdma = &rx->fdma;
 	u32 mask;
 
-	/* When activating a channel, first is required to write the first DCB
-	 * address and then to activate it
-	 */
-	lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP(fdma->channel_id));
-	lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP1(fdma->channel_id));
-
 	lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
 	       FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
 	       FDMA_CH_CFG_CH_INJ_PORT_SET(0) |
@@ -214,6 +216,9 @@ static int lan966x_fdma_tx_alloc(struct lan966x_tx *tx)
 
 	fdma_dcbs_init(fdma, 0, 0);
 
+	lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
+				   fdma->channel_id);
+
 	return 0;
 
 out:
@@ -235,14 +240,6 @@ static void lan966x_fdma_tx_activate(struct lan966x_tx *tx)
 	struct fdma *fdma = &tx->fdma;
 	u32 mask;
 
-	/* When activating a channel, first is required to write the first DCB
-	 * address and then to activate it
-	 */
-	lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP(fdma->channel_id));
-	lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP1(fdma->channel_id));
-
 	lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
 	       FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
 	       FDMA_CH_CFG_CH_INJ_PORT_SET(0) |

-- 
2.34.1

* [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (3 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

Make the shared FDMA helpers non-static and declare them in
lan966x_main.h so that the PCIe FDMA implementation can reuse them.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  | 22 +++++++++++-----------
 .../net/ethernet/microchip/lan966x/lan966x_main.h  | 11 +++++++++++
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 6c5761e886d4..25e673bdf084 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -109,8 +109,8 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
 	return PTR_ERR_OR_ZERO(rx->page_pool);
 }
 
-static void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
-				       u8 channel_id)
+void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+				u8 channel_id)
 {
 	lan_wr(lower_32_bits(addr), lan966x, FDMA_DCB_LLP(channel_id));
 	lan_wr(upper_32_bits(addr), lan966x, FDMA_DCB_LLP1(channel_id));
@@ -140,7 +140,7 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
 	return 0;
 }
 
-static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
+void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
 	struct fdma *fdma = &rx->fdma;
@@ -171,7 +171,7 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 		lan966x, FDMA_CH_ACTIVATE);
 }
 
-static void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
+void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
 	struct fdma *fdma = &rx->fdma;
@@ -191,7 +191,7 @@ static void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
 		lan966x, FDMA_CH_DB_DISCARD);
 }
 
-static void lan966x_fdma_rx_reload(struct lan966x_rx *rx)
+void lan966x_fdma_rx_reload(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
 
@@ -265,7 +265,7 @@ static void lan966x_fdma_tx_activate(struct lan966x_tx *tx)
 		lan966x, FDMA_CH_ACTIVATE);
 }
 
-static void lan966x_fdma_tx_disable(struct lan966x_tx *tx)
+void lan966x_fdma_tx_disable(struct lan966x_tx *tx)
 {
 	struct lan966x *lan966x = tx->lan966x;
 	struct fdma *fdma = &tx->fdma;
@@ -297,7 +297,7 @@ static void lan966x_fdma_tx_reload(struct lan966x_tx *tx)
 		lan966x, FDMA_CH_RELOAD);
 }
 
-static void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
+void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
 {
 	struct lan966x_port *port;
 	int i;
@@ -471,7 +471,7 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx,
 	return NULL;
 }
 
-static int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
+int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
 {
 	struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
 	struct lan966x_rx *rx = &lan966x->rx;
@@ -584,7 +584,7 @@ static int lan966x_fdma_get_next_dcb(struct lan966x_tx *tx)
 	return -1;
 }
 
-static void lan966x_fdma_tx_start(struct lan966x_tx *tx)
+void lan966x_fdma_tx_start(struct lan966x_tx *tx)
 {
 	struct lan966x *lan966x = tx->lan966x;
 
@@ -802,7 +802,7 @@ static int lan966x_fdma_get_max_mtu(struct lan966x *lan966x)
 	return max_mtu;
 }
 
-static int lan966x_qsys_sw_status(struct lan966x *lan966x)
+int lan966x_qsys_sw_status(struct lan966x *lan966x)
 {
 	return lan_rd(lan966x, QSYS_SW_STATUS(CPU_PORT));
 }
@@ -861,7 +861,7 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
 	return err;
 }
 
-static int lan966x_fdma_get_max_frame(struct lan966x *lan966x)
+int lan966x_fdma_get_max_frame(struct lan966x *lan966x)
 {
 	return lan966x_fdma_get_max_mtu(lan966x) +
 	       IFH_LEN_BYTES +
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index eea286c29474..83c361abb789 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -561,6 +561,17 @@ int lan966x_fdma_init(struct lan966x *lan966x);
 void lan966x_fdma_deinit(struct lan966x *lan966x);
 irqreturn_t lan966x_fdma_irq_handler(int irq, void *args);
 int lan966x_fdma_reload_page_pool(struct lan966x *lan966x);
+int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight);
+void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+				u8 channel_id);
+void lan966x_fdma_rx_start(struct lan966x_rx *rx);
+void lan966x_fdma_rx_disable(struct lan966x_rx *rx);
+void lan966x_fdma_rx_reload(struct lan966x_rx *rx);
+void lan966x_fdma_tx_start(struct lan966x_tx *tx);
+void lan966x_fdma_tx_disable(struct lan966x_tx *tx);
+void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x);
+int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
+int lan966x_qsys_sw_status(struct lan966x *lan966x);
 
 int lan966x_lag_port_join(struct lan966x_port *port,
 			  struct net_device *brport_dev,

-- 
2.34.1

* [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (4 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 07/13] net: lan966x: clear FDMA interrupt stickies after switch reset Daniel Machon
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

Introduce lan966x_fdma_ops so that the platform and PCIe FDMA
implementations can coexist. Route fdma_init, fdma_deinit, fdma_xmit,
fdma_poll and fdma_resize through the ops table, and add
lan966x_get_fdma_ops() as the probe-time hook that selects the
implementation based on runtime PCI bus detection. In this patch the
hook always returns the platform ops.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  |  2 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.c  | 25 +++++++++++++++++-----
 .../net/ethernet/microchip/lan966x/lan966x_main.h  | 13 +++++++++++
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 25e673bdf084..9bb40383aa56 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -925,7 +925,7 @@ void lan966x_fdma_netdev_init(struct lan966x *lan966x, struct net_device *dev)
 		return;
 
 	lan966x->fdma_ndev = dev;
-	netif_napi_add(dev, &lan966x->napi, lan966x_fdma_napi_poll);
+	netif_napi_add(dev, &lan966x->napi, lan966x->ops->fdma_poll);
 	napi_enable(&lan966x->napi);
 }
 
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 1179a6e127c5..ff3c6c76f16c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -26,6 +26,14 @@
 
 #define IO_RANGES 2
 
+static const struct lan966x_fdma_ops lan966x_fdma_ops = {
+	.fdma_init = &lan966x_fdma_init,
+	.fdma_deinit = &lan966x_fdma_deinit,
+	.fdma_xmit = &lan966x_fdma_xmit,
+	.fdma_poll = &lan966x_fdma_napi_poll,
+	.fdma_resize = &lan966x_fdma_change_mtu,
+};
+
 static const struct of_device_id lan966x_match[] = {
 	{ .compatible = "microchip,lan966x-switch" },
 	{ }
@@ -391,7 +399,7 @@ static netdev_tx_t lan966x_port_xmit(struct sk_buff *skb,
 
 	spin_lock(&lan966x->tx_lock);
 	if (port->lan966x->fdma)
-		err = lan966x_fdma_xmit(skb, ifh, dev);
+		err = lan966x->ops->fdma_xmit(skb, ifh, dev);
 	else
 		err = lan966x_port_ifh_xmit(skb, ifh, dev);
 	spin_unlock(&lan966x->tx_lock);
@@ -413,7 +421,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
 	if (!lan966x->fdma)
 		return 0;
 
-	err = lan966x_fdma_change_mtu(lan966x);
+	err = lan966x->ops->fdma_resize(lan966x);
 	if (err) {
 		lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(LAN966X_HW_MTU(old_mtu)),
 		       lan966x, DEV_MAC_MAXLEN_CFG(port->chip_port));
@@ -1081,6 +1089,11 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
 	return 0;
 }
 
+static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
+{
+	return &lan966x_fdma_ops;
+}
+
 static int lan966x_probe(struct platform_device *pdev)
 {
 	struct fwnode_handle *ports, *portnp;
@@ -1095,6 +1108,8 @@ static int lan966x_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, lan966x);
 	lan966x->dev = &pdev->dev;
 
+	lan966x->ops = lan966x_get_fdma_ops(&pdev->dev);
+
 	if (!device_get_mac_address(&pdev->dev, mac_addr)) {
 		ether_addr_copy(lan966x->base_mac, mac_addr);
 	} else {
@@ -1234,7 +1249,7 @@ static int lan966x_probe(struct platform_device *pdev)
 	if (err)
 		goto cleanup_fdb;
 
-	err = lan966x_fdma_init(lan966x);
+	err = lan966x->ops->fdma_init(lan966x);
 	if (err)
 		goto cleanup_ptp;
 
@@ -1247,7 +1262,7 @@ static int lan966x_probe(struct platform_device *pdev)
 	return 0;
 
 cleanup_fdma:
-	lan966x_fdma_deinit(lan966x);
+	lan966x->ops->fdma_deinit(lan966x);
 
 cleanup_ptp:
 	lan966x_ptp_deinit(lan966x);
@@ -1275,7 +1290,7 @@ static void lan966x_remove(struct platform_device *pdev)
 
 	lan966x_taprio_deinit(lan966x);
 	lan966x_vcap_deinit(lan966x);
-	lan966x_fdma_deinit(lan966x);
+	lan966x->ops->fdma_deinit(lan966x);
 	lan966x_cleanup_ports(lan966x);
 
 	cancel_delayed_work_sync(&lan966x->stats_work);
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index 83c361abb789..5f4dbeda17cd 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -193,6 +193,17 @@ enum vcap_is1_port_sel_rt {
 	VCAP_IS1_PS_RT_FOLLOW_OTHER = 7,
 };
 
+struct lan966x;
+
+struct lan966x_fdma_ops {
+	int (*fdma_init)(struct lan966x *lan966x);
+	void (*fdma_deinit)(struct lan966x *lan966x);
+	int (*fdma_xmit)(struct sk_buff *skb, __be32 *ifh,
+			 struct net_device *dev);
+	int (*fdma_poll)(struct napi_struct *napi, int weight);
+	int (*fdma_resize)(struct lan966x *lan966x);
+};
+
 struct lan966x_port;
 
 struct lan966x_rx {
@@ -270,6 +281,8 @@ struct lan966x_skb_cb {
 struct lan966x {
 	struct device *dev;
 
+	const struct lan966x_fdma_ops *ops;
+
 	u8 num_phys_ports;
 	struct lan966x_port **ports;
 

-- 
2.34.1

* [PATCH net-next v5 07/13] net: lan966x: clear FDMA interrupt stickies after switch reset
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (5 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-20  8:12 ` [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot Daniel Machon
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

When in PCI mode, the GCB soft reset issued by the reset controller
can latch spurious bits in the FDMA error stickies. The latched bits
sit in FDMA_INTR_ERR until the FDMA IRQ is requested later in probe,
at which point the handler fires immediately and WARNs.

Clear FDMA_ERRORS, FDMA_INTR_ERR and FDMA_INTR_DB right after the
switch reset so the FDMA comes out clean and the IRQ handler does not
see ghost errors on probe.

The clear runs on both the PCI and platform paths. On the platform
path it is a no-op, as there are no spurious stickies to clear, but
keeping it unconditional avoids a PCI-specific code path here.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index ff3c6c76f16c..f6d64323d06c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1066,6 +1066,15 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
 
 	reset_control_reset(switch_reset);
 
+	/* When in PCI mode, the GCB soft reset issued by the reset
+	 * controller can latch spurious bits in the FDMA error stickies.
+	 * Clear them before request_irq hooks up the FDMA IRQ line,
+	 * otherwise the handler fires immediately on probe.
+	 */
+	lan_wr(lan_rd(lan966x, FDMA_ERRORS),   lan966x, FDMA_ERRORS);
+	lan_wr(lan_rd(lan966x, FDMA_INTR_ERR), lan966x, FDMA_INTR_ERR);
+	lan_wr(lan_rd(lan966x, FDMA_INTR_DB),  lan966x, FDMA_INTR_DB);
+
 	/* Don't reinitialize the switch core, if it is already initialized. In
 	 * case it is initialized twice, some pointers inside the queue system
 	 * in HW will get corrupted and then after a while the queue system gets

-- 
2.34.1

* [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (6 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 07/13] net: lan966x: clear FDMA interrupt stickies after switch reset Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-23  1:45   ` Jakub Kicinski
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support Daniel Machon
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

When lan966x is used as a PCIe endpoint, the FDMA engine runs on the
card and survives a host reboot. Without a shutdown callback, channels
stay active and interrupt sources stay armed across the reset, causing
the shared PCIe INTx to assert before the driver has re-probed.

Add a shutdown callback, shared by the platform and PCI paths, that
disables the RX and TX channels, quiesces NAPI, and then masks the FDMA
interrupts (FDMA_INTR_ENA and FDMA_INTR_DB_ENA).

FDMA_INTR_ENA persists on the card across a warm reboot, so
lan966x_fdma_rx_start() now also writes the full enable mask (both
ports and all eight channels) to re-arm interrupts after a previous
shutdown(). Since rx_start() runs after both the RX and TX rings are
allocated, this single re-arm site works for both the platform and
PCIe backends.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c |  4 ++++
 drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 18 ++++++++++++++++++
 drivers/net/ethernet/microchip/lan966x/lan966x_regs.h | 15 +++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 9bb40383aa56..493aef5ba8d1 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -146,6 +146,10 @@ void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 	struct fdma *fdma = &rx->fdma;
 	u32 mask;
 
+	lan_wr(FDMA_INTR_ENA_INTR_PORT_ENA_SET(GENMASK(1, 0)) |
+	       FDMA_INTR_ENA_INTR_CH_ENA_SET(GENMASK(7, 0)),
+	       lan966x, FDMA_INTR_ENA);
+
 	lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
 	       FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
 	       FDMA_CH_CFG_CH_INJ_PORT_SET(0) |
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index f6d64323d06c..cc3c7b6c65ae 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1313,9 +1313,27 @@ static void lan966x_remove(struct platform_device *pdev)
 	debugfs_remove_recursive(lan966x->debugfs_root);
 }
 
+static void lan966x_shutdown(struct platform_device *pdev)
+{
+	struct lan966x *lan966x = platform_get_drvdata(pdev);
+
+	if (!lan966x->fdma)
+		return;
+
+	lan966x_fdma_rx_disable(&lan966x->rx);
+	lan966x_fdma_tx_disable(&lan966x->tx);
+
+	napi_synchronize(&lan966x->napi);
+	napi_disable(&lan966x->napi);
+
+	lan_wr(0, lan966x, FDMA_INTR_ENA);
+	lan_wr(0, lan966x, FDMA_INTR_DB_ENA);
+}
+
 static struct platform_driver lan966x_driver = {
 	.probe = lan966x_probe,
 	.remove = lan966x_remove,
+	.shutdown = lan966x_shutdown,
 	.driver = {
 		.name = "lan966x-switch",
 		.of_match_table = lan966x_match,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
index 4b553927d2e0..aba0d36ae6b5 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
@@ -1039,6 +1039,21 @@ enum lan966x_target {
 /*      FDMA:FDMA:FDMA_INTR_ERR */
 #define FDMA_INTR_ERR             __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 400, 0, 1, 4)
 
+/*      FDMA:FDMA:FDMA_INTR_ENA */
+#define FDMA_INTR_ENA             __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 404, 0, 1, 4)
+
+#define FDMA_INTR_ENA_INTR_PORT_ENA              GENMASK(9, 8)
+#define FDMA_INTR_ENA_INTR_PORT_ENA_SET(x)\
+	FIELD_PREP(FDMA_INTR_ENA_INTR_PORT_ENA, x)
+#define FDMA_INTR_ENA_INTR_PORT_ENA_GET(x)\
+	FIELD_GET(FDMA_INTR_ENA_INTR_PORT_ENA, x)
+
+#define FDMA_INTR_ENA_INTR_CH_ENA                GENMASK(7, 0)
+#define FDMA_INTR_ENA_INTR_CH_ENA_SET(x)\
+	FIELD_PREP(FDMA_INTR_ENA_INTR_CH_ENA, x)
+#define FDMA_INTR_ENA_INTR_CH_ENA_GET(x)\
+	FIELD_GET(FDMA_INTR_ENA_INTR_CH_ENA, x)
+
 /*      FDMA:FDMA:FDMA_ERRORS */
 #define FDMA_ERRORS               __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 412, 0, 1, 4)
 

-- 
2.34.1

* [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (7 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-23  1:56   ` Jakub Kicinski
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

Add PCIe FDMA support for lan966x. The PCIe FDMA path uses contiguous
DMA buffers mapped through the endpoint's ATU, with memcpy-based frame
transfer instead of per-page DMA mappings.

With PCIe FDMA, throughput increases from ~33 Mbps (register-based I/O)
to ~620 Mbps on an Intel x86 host with a lan966x PCIe card.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/lan966x/Makefile    |   4 +
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 394 +++++++++++++++++++++
 .../net/ethernet/microchip/lan966x/lan966x_main.c  |  11 +
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  11 +
 .../net/ethernet/microchip/lan966x/lan966x_regs.h  |  10 +
 5 files changed, 430 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
index 4cdbe263502c..ac0beceb2a0d 100644
--- a/drivers/net/ethernet/microchip/lan966x/Makefile
+++ b/drivers/net/ethernet/microchip/lan966x/Makefile
@@ -18,6 +18,10 @@ lan966x-switch-objs  := lan966x_main.o lan966x_phylink.o lan966x_port.o \
 lan966x-switch-$(CONFIG_LAN966X_DCB) += lan966x_dcb.o
 lan966x-switch-$(CONFIG_DEBUG_FS) += lan966x_vcap_debugfs.o
 
+ifdef CONFIG_MCHP_LAN966X_PCI
+lan966x-switch-y += lan966x_fdma_pci.o
+endif
+
 # Provide include files
 ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/vcap
 ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/fdma
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
new file mode 100644
index 000000000000..c27d9e76e735
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include "fdma_api.h"
+#include "lan966x_main.h"
+
+static int lan966x_fdma_pci_dataptr_cb(struct fdma *fdma, int dcb, int db,
+				       u64 *dataptr)
+{
+	u64 addr;
+
+	addr = fdma_dataptr_dma_addr_contiguous(fdma, dcb, db);
+
+	*dataptr = fdma_pci_atu_translate_addr(fdma->atu_region, addr);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_nextptr_cb(struct fdma *fdma, int dcb, u64 *nextptr)
+{
+	u64 addr;
+
+	fdma_nextptr_cb(fdma, dcb, &addr);
+
+	*nextptr = fdma_pci_atu_translate_addr(fdma->atu_region, addr);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_rx_alloc(struct lan966x_rx *rx)
+{
+	struct lan966x *lan966x = rx->lan966x;
+	struct fdma *fdma = &rx->fdma;
+	int err;
+
+	err = fdma_alloc_coherent_and_map(lan966x->dev, fdma, &lan966x->atu);
+	if (err)
+		return err;
+
+	fdma_dcbs_init(fdma,
+		       FDMA_DCB_INFO_DATAL(fdma->db_size),
+		       FDMA_DCB_STATUS_INTR);
+
+	lan966x_fdma_llp_configure(lan966x,
+				   fdma->atu_region->base_addr,
+				   fdma->channel_id);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_tx_alloc(struct lan966x_tx *tx)
+{
+	struct lan966x *lan966x = tx->lan966x;
+	struct fdma *fdma = &tx->fdma;
+	int err;
+
+	err = fdma_alloc_coherent_and_map(lan966x->dev, fdma, &lan966x->atu);
+	if (err)
+		return err;
+
+	fdma_dcbs_init(fdma,
+		       FDMA_DCB_INFO_DATAL(fdma->db_size),
+		       FDMA_DCB_STATUS_DONE);
+
+	lan966x_fdma_llp_configure(lan966x,
+				   fdma->atu_region->base_addr,
+				   fdma->channel_id);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
+{
+	struct fdma_db *db;
+
+	for (int i = 0; i < fdma->n_dcbs; i++) {
+		db = fdma_db_get(fdma, i, 0);
+
+		if (!fdma_db_is_done(db))
+			continue;
+		if (fdma_is_last(fdma, &fdma->dcbs[i]))
+			continue;
+
+		return i;
+	}
+
+	return -ENOSPC;
+}
+
+/* TX slot layout (sizes in bytes):
+ *
+ *  +---------------------+-----+---------+-----+
+ *  | XDP_PACKET_HEADROOM | IFH | payload | FCS |
+ *  |         256         |  28 |   len   |   4 |
+ *  +---------------------+-----+---------+-----+
+ *  |<---------------- db_size ----------------->|
+ *
+ * Return true if the frame plus required overhead fits.
+ */
+static bool lan966x_fdma_pci_tx_size_fits(struct fdma *fdma, u32 len)
+{
+	return XDP_PACKET_HEADROOM + IFH_LEN_BYTES + len + ETH_FCS_LEN <=
+	       fdma->db_size;
+}
+
+/* Return true if blockl is a valid RX frame size. */
+static bool lan966x_fdma_pci_rx_size_fits(struct fdma *fdma, u32 blockl)
+{
+	return blockl >= IFH_LEN_BYTES + ETH_HLEN + ETH_FCS_LEN &&
+	       blockl <= fdma->db_size - XDP_PACKET_HEADROOM;
+}
+
+static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
+{
+	struct lan966x *lan966x = rx->lan966x;
+	struct fdma *fdma = &rx->fdma;
+	struct lan966x_port *port;
+	struct fdma_db *db;
+	void *virt_addr;
+	u32 blockl;
+
+	/* virt_addr points to the IFH. */
+	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
+						      fdma->dcb_index,
+						      fdma->db_index);
+
+	lan966x_ifh_get_src_port(virt_addr, src_port);
+
+	if (WARN_ON(*src_port >= lan966x->num_phys_ports))
+		return FDMA_ERROR;
+
+	port = lan966x->ports[*src_port];
+	if (!port)
+		return FDMA_ERROR;
+
+	db = fdma_db_next_get(fdma);
+
+	/* BLOCKL is a 16-bit HW-populated field; reject obviously-bad
+	 * values before they feed memcpy/XDP sizes.
+	 */
+	blockl = FDMA_DCB_STATUS_BLOCKL(db->status);
+	if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
+		return FDMA_ERROR;
+
+	return FDMA_PASS;
+}
+
+static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
+						     u64 src_port)
+{
+	struct lan966x *lan966x = rx->lan966x;
+	struct fdma *fdma = &rx->fdma;
+	struct sk_buff *skb;
+	struct fdma_db *db;
+	u32 data_len;
+
+	/* Get the received frame and create an SKB for it. */
+	db = fdma_db_next_get(fdma);
+	data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
+
+	skb = napi_alloc_skb(&lan966x->napi, data_len);
+	if (unlikely(!skb))
+		return NULL;
+
+	memcpy(skb->data,
+	       fdma_dataptr_virt_addr_contiguous(fdma,
+						 fdma->dcb_index,
+						 fdma->db_index),
+	       data_len);
+
+	skb_put(skb, data_len);
+
+	skb->dev = lan966x->ports[src_port]->dev;
+	skb_pull(skb, IFH_LEN_BYTES);
+
+	skb_trim(skb, skb->len - ETH_FCS_LEN);
+
+	skb->protocol = eth_type_trans(skb, skb->dev);
+
+	if (lan966x->bridge_mask & BIT(src_port)) {
+		skb->offload_fwd_mark = 1;
+
+		skb_reset_network_header(skb);
+		if (!lan966x_hw_offload(lan966x, src_port, skb))
+			skb->offload_fwd_mark = 0;
+	}
+
+	skb->dev->stats.rx_bytes += skb->len;
+	skb->dev->stats.rx_packets++;
+
+	return skb;
+}
+
+static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
+				 struct net_device *dev)
+{
+	struct lan966x_port *port = netdev_priv(dev);
+	struct lan966x *lan966x = port->lan966x;
+	struct lan966x_tx *tx = &lan966x->tx;
+	struct fdma *fdma = &tx->fdma;
+	int next_to_use;
+	void *virt_addr;
+
+	next_to_use = lan966x_fdma_pci_get_next_dcb(fdma);
+
+	if (next_to_use < 0) {
+		netif_stop_queue(dev);
+		return NETDEV_TX_BUSY;
+	}
+
+	if (skb_put_padto(skb, ETH_ZLEN)) {
+		dev->stats.tx_dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	if (!lan966x_fdma_pci_tx_size_fits(fdma, skb->len)) {
+		dev_kfree_skb_any(skb);
+		dev->stats.tx_dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	skb_tx_timestamp(skb);
+
+	/* virt_addr points to the IFH. */
+	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma, next_to_use, 0);
+	memcpy(virt_addr, ifh, IFH_LEN_BYTES);
+	memcpy(virt_addr + IFH_LEN_BYTES, skb->data, skb->len);
+
+	/* Order frame write before DCB status write below. */
+	dma_wmb();
+
+	fdma_dcb_add(fdma,
+		     next_to_use,
+		     0,
+		     FDMA_DCB_STATUS_INTR |
+		     FDMA_DCB_STATUS_SOF |
+		     FDMA_DCB_STATUS_EOF |
+		     FDMA_DCB_STATUS_BLOCKO(0) |
+		     FDMA_DCB_STATUS_BLOCKL(IFH_LEN_BYTES + skb->len + ETH_FCS_LEN));
+
+	/* Start the transmission. */
+	lan966x_fdma_tx_start(tx);
+
+	dev->stats.tx_bytes += skb->len;
+	dev->stats.tx_packets++;
+
+	/* Safe to free: the PCIe DTBO does not enable the PTP interrupt,
+	 * so lan966x->ptp stays 0 and lan966x_port_xmit() never enqueues
+	 * this skb on port->tx_skbs for a TX timestamp.
+	 */
+	dev_consume_skb_any(skb);
+
+	return NETDEV_TX_OK;
+}
+
+static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
+{
+	struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
+	struct lan966x_rx *rx = &lan966x->rx;
+	struct fdma *fdma = &rx->fdma;
+	int dcb_reload, old_dcb;
+	struct sk_buff *skb;
+	int counter = 0;
+	u64 src_port;
+
+	/* Wake any stopped TX queues if a TX DCB is available. */
+	spin_lock(&lan966x->tx_lock);
+	if (lan966x_fdma_pci_get_next_dcb(&lan966x->tx.fdma) >= 0)
+		lan966x_fdma_wakeup_netdev(lan966x);
+	spin_unlock(&lan966x->tx_lock);
+
+	dcb_reload = fdma->dcb_index;
+
+	/* Get all received skbs. */
+	while (counter < weight) {
+		if (!fdma_has_frames(fdma))
+			break;
+		/* Order DONE read before DCB/frame reads below. */
+		dma_rmb();
+		counter++;
+		switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
+		case FDMA_PASS:
+			break;
+		case FDMA_ERROR:
+			/* No rx_dropped increment: src_port may be invalid,
+			 * so there is no reliable netdev to charge it to.
+			 */
+			fdma_dcb_advance(fdma);
+			continue;
+		}
+		skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
+		fdma_dcb_advance(fdma);
+		if (!skb) {
+			lan966x->ports[src_port]->dev->stats.rx_dropped++;
+			continue;
+		}
+
+		napi_gro_receive(&lan966x->napi, skb);
+	}
+	while (dcb_reload != fdma->dcb_index) {
+		old_dcb = dcb_reload;
+		dcb_reload++;
+		dcb_reload &= fdma->n_dcbs - 1;
+
+		fdma_dcb_add(fdma,
+			     old_dcb,
+			     FDMA_DCB_INFO_DATAL(fdma->db_size),
+			     FDMA_DCB_STATUS_INTR);
+
+		lan966x_fdma_rx_reload(rx);
+	}
+
+	if (counter < weight && napi_complete_done(napi, counter))
+		lan_wr(0xff, lan966x, FDMA_INTR_DB_ENA);
+
+	return counter;
+}
+
+static int lan966x_fdma_pci_init(struct lan966x *lan966x)
+{
+	struct fdma *rx_fdma = &lan966x->rx.fdma;
+	struct fdma *tx_fdma = &lan966x->tx.fdma;
+	int err;
+
+	if (!lan966x->fdma)
+		return 0;
+
+	lan_wr(FDMA_CTRL_NRESET_SET(0), lan966x, FDMA_CTRL);
+	lan_wr(FDMA_CTRL_NRESET_SET(1), lan966x, FDMA_CTRL);
+
+	fdma_pci_atu_init(&lan966x->atu, lan966x->regs[TARGET_PCIE_DBI]);
+
+	lan966x->rx.lan966x = lan966x;
+	lan966x->rx.max_mtu = lan966x_fdma_get_max_frame(lan966x);
+	rx_fdma->channel_id = FDMA_XTR_CHANNEL;
+	rx_fdma->n_dcbs = FDMA_DCB_MAX;
+	rx_fdma->n_dbs = FDMA_RX_DCB_MAX_DBS;
+	rx_fdma->priv = lan966x;
+	rx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	rx_fdma->size = fdma_get_size_contiguous(rx_fdma);
+	rx_fdma->ops.nextptr_cb = &lan966x_fdma_pci_nextptr_cb;
+	rx_fdma->ops.dataptr_cb = &lan966x_fdma_pci_dataptr_cb;
+
+	lan966x->tx.lan966x = lan966x;
+	tx_fdma->channel_id = FDMA_INJ_CHANNEL;
+	tx_fdma->n_dcbs = FDMA_DCB_MAX;
+	tx_fdma->n_dbs = FDMA_TX_DCB_MAX_DBS;
+	tx_fdma->priv = lan966x;
+	tx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	tx_fdma->size = fdma_get_size_contiguous(tx_fdma);
+	tx_fdma->ops.nextptr_cb = &lan966x_fdma_pci_nextptr_cb;
+	tx_fdma->ops.dataptr_cb = &lan966x_fdma_pci_dataptr_cb;
+
+	err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
+	if (err)
+		return err;
+
+	err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);
+	if (err) {
+		fdma_free_coherent_and_unmap(lan966x->dev, rx_fdma);
+		return err;
+	}
+
+	lan966x_fdma_rx_start(&lan966x->rx);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_resize(struct lan966x *lan966x)
+{
+	return -EOPNOTSUPP;
+}
+
+static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)
+{
+	if (!lan966x->fdma)
+		return;
+
+	lan966x_fdma_rx_disable(&lan966x->rx);
+	lan966x_fdma_tx_disable(&lan966x->tx);
+
+	napi_synchronize(&lan966x->napi);
+	napi_disable(&lan966x->napi);
+
+	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
+	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->tx.fdma);
+}
+
+const struct lan966x_fdma_ops lan966x_fdma_pci_ops = {
+	.fdma_init = &lan966x_fdma_pci_init,
+	.fdma_deinit = &lan966x_fdma_pci_deinit,
+	.fdma_xmit = &lan966x_fdma_pci_xmit,
+	.fdma_poll = &lan966x_fdma_pci_napi_poll,
+	.fdma_resize = &lan966x_fdma_pci_resize,
+};
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index cc3c7b6c65ae..7036b1d937d5 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -7,6 +7,7 @@
 #include <linux/ip.h>
 #include <linux/of.h>
 #include <linux/of_net.h>
+#include <linux/pci.h>
 #include <linux/phy/phy.h>
 #include <linux/platform_device.h>
 #include <linux/reset.h>
@@ -49,6 +50,9 @@ struct lan966x_main_io_resource {
 static const struct lan966x_main_io_resource lan966x_main_iomap[] =  {
 	{ TARGET_CPU,                   0xc0000, 0 }, /* 0xe00c0000 */
 	{ TARGET_FDMA,                  0xc0400, 0 }, /* 0xe00c0400 */
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	{ TARGET_PCIE_DBI,             0x400000, 0 }, /* 0xe0400000 */
+#endif
 	{ TARGET_ORG,                         0, 1 }, /* 0xe2000000 */
 	{ TARGET_GCB,                    0x4000, 1 }, /* 0xe2004000 */
 	{ TARGET_QS,                     0x8000, 1 }, /* 0xe2008000 */
@@ -1100,6 +1104,13 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
 
 static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
 {
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	for (struct device *p = dev->parent; p; p = p->parent) {
+		if (dev_is_pci(p))
+			return &lan966x_fdma_pci_ops;
+	}
+#endif
+
 	return &lan966x_fdma_ops;
 }
 
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index 5f4dbeda17cd..e7fdd4447fb6 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -17,6 +17,9 @@
 #include <net/xdp.h>
 
 #include <fdma_api.h>
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+#include <fdma_pci.h>
+#endif
 #include <vcap_api.h>
 #include <vcap_api_client.h>
 
@@ -288,6 +291,10 @@ struct lan966x {
 
 	void __iomem *regs[NUM_TARGETS];
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	struct fdma_pci_atu atu;
+#endif
+
 	int shared_queue_sz;
 
 	u8 base_mac[ETH_ALEN];
@@ -586,6 +593,10 @@ void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x);
 int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
 int lan966x_qsys_sw_status(struct lan966x *lan966x);
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
+#endif
+
 int lan966x_lag_port_join(struct lan966x_port *port,
 			  struct net_device *brport_dev,
 			  struct net_device *bond,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
index aba0d36ae6b5..4778ea217673 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
@@ -20,6 +20,7 @@ enum lan966x_target {
 	TARGET_FDMA = 21,
 	TARGET_GCB = 27,
 	TARGET_ORG = 36,
+	TARGET_PCIE_DBI = 40,
 	TARGET_PTP = 41,
 	TARGET_QS = 42,
 	TARGET_QSYS = 46,
@@ -1009,6 +1010,15 @@ enum lan966x_target {
 #define FDMA_CH_CFG_CH_MEM_GET(x)\
 	FIELD_GET(FDMA_CH_CFG_CH_MEM, x)
 
+/*      FDMA:FDMA:FDMA_CTRL */
+#define FDMA_CTRL                 __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 424, 0, 1, 4)
+
+#define FDMA_CTRL_NRESET                         BIT(0)
+#define FDMA_CTRL_NRESET_SET(x)\
+	FIELD_PREP(FDMA_CTRL_NRESET, x)
+#define FDMA_CTRL_NRESET_GET(x)\
+	FIELD_GET(FDMA_CTRL_NRESET, x)
+
 /*      FDMA:FDMA:FDMA_PORT_CTRL */
 #define FDMA_PORT_CTRL(r)         __REG(TARGET_FDMA, 0, 1, 8, 0, 1, 428, 376, r, 2, 4)
 

-- 
2.34.1



* [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (8 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-30  0:44   ` sashiko-bot
  2026-05-20  8:12 ` [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support Daniel Machon
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

Add MTU change support for the PCIe FDMA path. When the MTU changes,
the contiguous ATU-mapped RX and TX buffers are reallocated with the
new size. If either allocation fails, the existing buffers are kept,
reset, and reused with the old MTU.
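The fallback can be sketched in plain userspace C. This is a minimal illustration, not driver code: `struct ring` and `ring_resize()` are hypothetical stand-ins for `struct fdma` and `lan966x_fdma_pci_reload()`, and `calloc()`/`free()` stand in for the coherent DMA allocation and ATU mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct fdma: one contiguous buffer. */
struct ring {
	void *buf;
	size_t size;
};

/*
 * Resize @r to @new_size. On success the old buffer is freed and true is
 * returned. On failure the snapshot is restored, the old buffer is zeroed
 * (mirroring the "reset before restarting" step), and false is returned.
 */
static bool ring_resize(struct ring *r, size_t new_size)
{
	struct ring old = *r;		/* like tx_fdma_old / rx_fdma_old */

	assert(new_size > 0);		/* calloc(1, 0) may legally return NULL */

	r->size = new_size;
	r->buf = calloc(1, r->size);
	if (!r->buf) {
		*r = old;		/* restore the previous descriptor */
		if (r->buf)
			memset(r->buf, 0, r->size);
		return false;
	}

	free(old.buf);			/* old memory goes only once the new exists */
	return true;
}
```

The snapshot is what makes the failure path safe: the old buffer is released only after its replacement exists, so a failed allocation never leaves the ring without memory.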

Cap the PCIe DCB ring at 256 entries (FDMA_PCI_DCB_MAX) so that the
entire contiguous allocation fits in a single MAX_PAGE_ORDER block at
jumbo MTU; with 512 DCBs it would not.
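The sizing argument can be checked with a little arithmetic. The sketch assumes 4 KiB pages, MAX_PAGE_ORDER = 10 (a 4 MiB largest block), one data buffer per DCB, and a hypothetical 10 KiB buffer at jumbo MTU; the real per-buffer size comes from FDMA_PCI_DB_SIZE() and the DCB descriptors add a little on top, so the numbers are illustrative only. Under those assumptions, 512 × 10 KiB = 5 MiB needs an order-11 block, while 256 × 10 KiB = 2.5 MiB fits in order 10. The ring-index wrap in the NAPI reload loop (`dcb_reload &= fdma->n_dcbs - 1`) additionally requires a power of two, which 256 satisfies. All `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SK_PAGE_SIZE		4096UL	/* 4 KiB pages */
#define SK_MAX_PAGE_ORDER	10	/* common default, e.g. on x86-64 */

/* Smallest order with (SK_PAGE_SIZE << order) >= size, like get_order(). */
static unsigned int sk_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SK_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Does a ring of @n_dcbs buffers of @db_size bytes fit in one block? */
static bool sk_ring_fits(size_t n_dcbs, size_t db_size)
{
	return sk_get_order(n_dcbs * db_size) <= SK_MAX_PAGE_ORDER;
}

/* Index wrapping with "& (n - 1)" only works when n is a power of two. */
static bool sk_is_pow2(size_t n)
{
	return n && !(n & (n - 1));
}
```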

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 157 ++++++++++++++++++++-
 1 file changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
index c27d9e76e735..3ea6d22ee573 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -3,6 +3,11 @@
 #include "fdma_api.h"
 #include "lan966x_main.h"
 
+/* Ring must fit in one MAX_PAGE_ORDER DMA block; 512 DCBs overflows
+ * at jumbo MTU.
+ */
+#define FDMA_PCI_DCB_MAX	256
+
 static int lan966x_fdma_pci_dataptr_cb(struct fdma *fdma, int dcb, int db,
 				       u64 *dataptr)
 {
@@ -332,7 +337,7 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
 	lan966x->rx.lan966x = lan966x;
 	lan966x->rx.max_mtu = lan966x_fdma_get_max_frame(lan966x);
 	rx_fdma->channel_id = FDMA_XTR_CHANNEL;
-	rx_fdma->n_dcbs = FDMA_DCB_MAX;
+	rx_fdma->n_dcbs = FDMA_PCI_DCB_MAX;
 	rx_fdma->n_dbs = FDMA_RX_DCB_MAX_DBS;
 	rx_fdma->priv = lan966x;
 	rx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
@@ -342,7 +347,7 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
 
 	lan966x->tx.lan966x = lan966x;
 	tx_fdma->channel_id = FDMA_INJ_CHANNEL;
-	tx_fdma->n_dcbs = FDMA_DCB_MAX;
+	tx_fdma->n_dcbs = FDMA_PCI_DCB_MAX;
 	tx_fdma->n_dbs = FDMA_TX_DCB_MAX_DBS;
 	tx_fdma->priv = lan966x;
 	tx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
@@ -365,9 +370,155 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
 	return 0;
 }
 
+/* Reset existing rx and tx buffers. */
+static void lan966x_fdma_pci_reset_mem(struct lan966x *lan966x)
+{
+	struct lan966x_rx *rx = &lan966x->rx;
+	struct lan966x_tx *tx = &lan966x->tx;
+
+	memset(rx->fdma.dcbs, 0, rx->fdma.size);
+	memset(tx->fdma.dcbs, 0, tx->fdma.size);
+
+	fdma_dcbs_init(&rx->fdma,
+		       FDMA_DCB_INFO_DATAL(rx->fdma.db_size),
+		       FDMA_DCB_STATUS_INTR);
+
+	fdma_dcbs_init(&tx->fdma,
+		       FDMA_DCB_INFO_DATAL(tx->fdma.db_size),
+		       FDMA_DCB_STATUS_DONE);
+
+	lan966x_fdma_llp_configure(lan966x,
+				   tx->fdma.atu_region->base_addr,
+				   tx->fdma.channel_id);
+	lan966x_fdma_llp_configure(lan966x,
+				   rx->fdma.atu_region->base_addr,
+				   rx->fdma.channel_id);
+}
+
+/* Drain in-flight xmit callers and stop all TX queues on every port. */
+static void lan966x_fdma_pci_stop_netdev(struct lan966x *lan966x)
+{
+	for (int i = 0; i < lan966x->num_phys_ports; ++i) {
+		struct lan966x_port *port = lan966x->ports[i];
+
+		if (port)
+			netif_tx_disable(port->dev);
+	}
+}
+
+/* Wake all TX queues on every port (undoes lan966x_fdma_pci_stop_netdev). */
+static void lan966x_fdma_pci_wakeup_netdev(struct lan966x *lan966x)
+{
+	for (int i = 0; i < lan966x->num_phys_ports; ++i) {
+		struct lan966x_port *port = lan966x->ports[i];
+
+		if (port)
+			netif_tx_wake_all_queues(port->dev);
+	}
+}
+
+static int lan966x_fdma_pci_reload(struct lan966x *lan966x, int new_mtu)
+{
+	struct fdma tx_fdma_old = lan966x->tx.fdma;
+	struct fdma rx_fdma_old = lan966x->rx.fdma;
+	u32 old_mtu = lan966x->rx.max_mtu;
+	int err;
+
+	napi_synchronize(&lan966x->napi);
+	napi_disable(&lan966x->napi);
+	lan966x_fdma_pci_stop_netdev(lan966x);
+	lan966x_fdma_rx_disable(&lan966x->rx);
+	lan966x_fdma_tx_disable(&lan966x->tx);
+
+	lan966x->rx.max_mtu = new_mtu;
+
+	lan966x->tx.fdma.db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	lan966x->tx.fdma.size = fdma_get_size_contiguous(&lan966x->tx.fdma);
+	lan966x->rx.fdma.db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	lan966x->rx.fdma.size = fdma_get_size_contiguous(&lan966x->rx.fdma);
+
+	err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
+	if (err)
+		goto restore;
+
+	err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);
+	if (err) {
+		fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
+		goto restore;
+	}
+
+	/* Free and unmap old memory. */
+	fdma_free_coherent_and_unmap(lan966x->dev, &rx_fdma_old);
+	fdma_free_coherent_and_unmap(lan966x->dev, &tx_fdma_old);
+
+	/* Keep this order: rx_start, wakeup_netdev, napi_enable. */
+	lan966x_fdma_rx_start(&lan966x->rx);
+	lan966x_fdma_pci_wakeup_netdev(lan966x);
+	napi_enable(&lan966x->napi);
+
+	return err;
+restore:
+
+	/* No new buffers are allocated at this point. Use the old buffers,
+	 * but reset them before starting the FDMA again.
+	 */
+
+	memcpy(&lan966x->tx.fdma, &tx_fdma_old, sizeof(struct fdma));
+	memcpy(&lan966x->rx.fdma, &rx_fdma_old, sizeof(struct fdma));
+
+	lan966x->rx.max_mtu = old_mtu;
+
+	lan966x_fdma_pci_reset_mem(lan966x);
+
+	/* Keep this order: rx_start, wakeup_netdev, napi_enable. */
+	lan966x_fdma_rx_start(&lan966x->rx);
+	lan966x_fdma_pci_wakeup_netdev(lan966x);
+	napi_enable(&lan966x->napi);
+
+	return err;
+}
+
+static int __lan966x_fdma_pci_reload(struct lan966x *lan966x, int max_mtu)
+{
+	int err;
+	u32 val;
+
+	/* Disable the CPU port. */
+	lan_rmw(QSYS_SW_PORT_MODE_PORT_ENA_SET(0),
+		QSYS_SW_PORT_MODE_PORT_ENA,
+		lan966x, QSYS_SW_PORT_MODE(CPU_PORT));
+
+	/* Flush the CPU queues. */
+	readx_poll_timeout(lan966x_qsys_sw_status,
+			   lan966x,
+			   val,
+			   !(QSYS_SW_STATUS_EQ_AVAIL_GET(val)),
+			   READL_SLEEP_US, READL_TIMEOUT_US);
+
+	/* Add a sleep in case there are frames between the queues and the CPU
+	 * port
+	 */
+	usleep_range(USEC_PER_MSEC, 2 * USEC_PER_MSEC);
+
+	err = lan966x_fdma_pci_reload(lan966x, max_mtu);
+
+	/* Enable back the CPU port. */
+	lan_rmw(QSYS_SW_PORT_MODE_PORT_ENA_SET(1),
+		QSYS_SW_PORT_MODE_PORT_ENA,
+		lan966x, QSYS_SW_PORT_MODE(CPU_PORT));
+
+	return err;
+}
+
 static int lan966x_fdma_pci_resize(struct lan966x *lan966x)
 {
-	return -EOPNOTSUPP;
+	int max_mtu;
+
+	max_mtu = lan966x_fdma_get_max_frame(lan966x);
+	if (max_mtu == lan966x->rx.max_mtu)
+		return 0;
+
+	return __lan966x_fdma_pci_reload(lan966x, max_mtu);
 }
 
 static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)

-- 
2.34.1



* [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (9 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-23  2:01   ` Jakub Kicinski
  2026-05-20  8:12 ` [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

Add XDP support for the PCIe FDMA path. The implementation operates on
contiguous ATU-mapped buffers with memcpy-based XDP_TX, unlike the
platform path which uses page_pool.

XDP sees the frame with IFH and FCS stripped. These are removed in
lan966x_fdma_pci_rx_check_frame() before the BPF program runs, because
after the program returns the driver cannot tell whether the tail
region was modified. The skb_pull/skb_trim previously done in
lan966x_fdma_pci_rx_get_frame() are removed for the same reason; the
frame pointer and length are pre-computed by rx_check_frame() and
passed through rx_get_frame() and lan966x_xdp_pci_run() to the caller.
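The pointer and length handed to XDP follow from the slot layout described in the patch: XDP_PACKET_HEADROOM bytes of headroom, then the IFH, then the frame and its FCS, with BLOCKL covering IFH + frame + FCS. The sketch below is a userspace illustration with assumed constants (256-byte headroom, a 28-byte IFH, a 4-byte FCS) and a hypothetical `sk_rx_frame()` helper; it is not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_XDP_HEADROOM	256u	/* XDP_PACKET_HEADROOM */
#define SK_IFH_LEN	28u	/* assumed lan966x IFH size in bytes */
#define SK_FCS_LEN	4u	/* ETH_FCS_LEN */

struct sk_frame {
	uint8_t *data;		/* first byte of the Ethernet header */
	uint32_t len;		/* frame length, FCS excluded */
	uint32_t headroom;	/* bytes from slot start to data */
};

/*
 * @slot:   start of one RX data buffer
 * @blockl: BLOCKL from the DCB status (IFH + frame + FCS)
 * Returns 0 on success, -1 if BLOCKL is too short to hold IFH and FCS.
 */
static int sk_rx_frame(uint8_t *slot, uint32_t blockl, struct sk_frame *f)
{
	uint8_t *ifh = slot + SK_XDP_HEADROOM;	/* HW writes the IFH here */

	if (blockl < SK_IFH_LEN + SK_FCS_LEN)
		return -1;

	f->data = ifh + SK_IFH_LEN;			/* skip the IFH */
	f->len = blockl - SK_IFH_LEN - SK_FCS_LEN;	/* drop the FCS */
	f->headroom = SK_XDP_HEADROOM + SK_IFH_LEN;
	return 0;
}
```

Because the stripping happens before the program runs, whatever data/data_end the program leaves behind is exactly what XDP_PASS copies into the skb and XDP_TX copies into the TX slot, with no later skb_pull()/skb_trim() that could disagree with it.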

lan966x_fdma_pci_xmit_xdpf() handles XDP_TX: it rebuilds a fresh IFH
in the TX slot, copies the post-XDP frame after it, and lets HW insert
a new FCS.
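The XDP_TX slot layout can be sketched the same way. `sk_xdp_tx_fill()` and `sk_ifh_build()` are hypothetical; the real IFH is a packed bitfield written by lan966x_ifh_set_bypass() and lan966x_ifh_set_port(), which this sketch replaces with placeholder bytes, and the 28-byte IFH size is an assumption. The returned value mirrors the BLOCKL the patch programs: IFH + frame + ETH_FCS_LEN, leaving room for the FCS the switch inserts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_IFH_LEN	28u	/* assumed lan966x IFH size in bytes */
#define SK_FCS_LEN	4u	/* ETH_FCS_LEN */

/* Placeholder IFH: the real one packs bypass and port-mask bitfields. */
static void sk_ifh_build(uint8_t *ifh, unsigned int chip_port)
{
	memset(ifh, 0, SK_IFH_LEN);	/* fresh IFH, nothing from RX survives */
	ifh[0] = 1;			/* stands in for the bypass bit */
	ifh[1] = (uint8_t)chip_port;	/* stands in for the destination port */
}

/*
 * Write IFH + frame into @slot and return the BLOCKL value for the DCB,
 * or 0 if IFH + frame + FCS does not fit in @slot_size.
 */
static uint32_t sk_xdp_tx_fill(uint8_t *slot, uint32_t slot_size,
			       const uint8_t *frame, uint32_t len,
			       unsigned int chip_port)
{
	uint32_t blockl = SK_IFH_LEN + len + SK_FCS_LEN;

	if (blockl > slot_size)
		return 0;

	sk_ifh_build(slot, chip_port);
	memcpy(slot + SK_IFH_LEN, frame, len);
	/* The trailing FCS bytes are left unwritten; hardware fills them. */
	return blockl;
}
```

In the driver, a dma_wmb() sits between these copies and the DCB status update so the FDMA engine cannot observe the descriptor before the frame bytes.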

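lan966x_xdp_setup() is extended so the PCIe path skips the page_pool
reload that the platform path needs; when an existing program is
replaced or removed, it only waits for in-flight NAPI polls before the
old program is released.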

Only NETDEV_XDP_ACT_BASIC is advertised on the PCIe path;
NETDEV_XDP_ACT_REDIRECT and NETDEV_XDP_ACT_NDO_XMIT remain
platform-only.
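The verdict handling in lan966x_xdp_pci_run() reduces to a small mapping. The enums below are local mirrors (the verdict values follow the ordering of the kernel's enum xdp_action) and `sk_map_verdict()` is a hypothetical helper; the warning and tracepoint calls are omitted.

```c
#include <assert.h>

/* Local mirror of the XDP verdicts, same order as enum xdp_action. */
enum sk_xdp_act {
	SK_XDP_ABORTED,
	SK_XDP_DROP,
	SK_XDP_PASS,
	SK_XDP_TX,
	SK_XDP_REDIRECT,
};

/* Driver-side outcomes, named after the FDMA_* results in the patch. */
enum sk_fdma_res { SK_FDMA_PASS, SK_FDMA_TX, SK_FDMA_DROP };

/* @tx_err: result of the XDP_TX transmit attempt (nonzero means failed). */
static enum sk_fdma_res sk_map_verdict(int act, int tx_err)
{
	switch (act) {
	case SK_XDP_PASS:
		return SK_FDMA_PASS;
	case SK_XDP_TX:
		return tx_err ? SK_FDMA_DROP : SK_FDMA_TX;
	default:	/* includes XDP_REDIRECT: not advertised on PCIe */
	case SK_XDP_ABORTED:
	case SK_XDP_DROP:
		return SK_FDMA_DROP;
	}
}
```

Because XDP_REDIRECT is not advertised on the PCIe path, a program returning it lands in the default branch and the frame is dropped, the same as an unknown verdict.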

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 162 ++++++++++++++++++---
 .../net/ethernet/microchip/lan966x/lan966x_main.c  |  11 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  10 ++
 .../net/ethernet/microchip/lan966x/lan966x_xdp.c   |  10 ++
 4 files changed, 169 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
index 3ea6d22ee573..f9db66e3e753 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 
+#include <linux/bpf_trace.h>
+
 #include "fdma_api.h"
 #include "lan966x_main.h"
 
@@ -114,7 +116,118 @@ static bool lan966x_fdma_pci_rx_size_fits(struct fdma *fdma, u32 blockl)
 	       blockl <= fdma->db_size - XDP_PACKET_HEADROOM;
 }
 
-static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
+static int lan966x_fdma_pci_xmit_xdpf(struct lan966x_port *port,
+				      void *ptr, u32 len)
+{
+	struct lan966x *lan966x = port->lan966x;
+	struct lan966x_tx *tx = &lan966x->tx;
+	struct fdma *fdma = &tx->fdma;
+	int next_to_use, ret = 0;
+	void *virt_addr;
+
+	spin_lock(&lan966x->tx_lock);
+
+	next_to_use = lan966x_fdma_pci_get_next_dcb(fdma);
+
+	if (next_to_use < 0) {
+		netif_stop_queue(port->dev);
+		ret = NETDEV_TX_BUSY;
+		goto out;
+	}
+
+	if (!lan966x_fdma_pci_tx_size_fits(fdma, len)) {
+		port->dev->stats.tx_dropped++;
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* virt_addr points to the IFH. */
+	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma, next_to_use, 0);
+
+	/* Construct a fresh IFH. */
+	memset(virt_addr, 0, IFH_LEN_BYTES);
+	lan966x_ifh_set_bypass(virt_addr, 1);
+	lan966x_ifh_set_port(virt_addr, BIT_ULL(port->chip_port));
+
+	/* Copy the (post-XDP) frame after the IFH. */
+	memcpy(virt_addr + IFH_LEN_BYTES, ptr, len);
+
+	/* Order frame write before DCB status write below. */
+	dma_wmb();
+
+	/* Reserve ETH_FCS_LEN for the HW-inserted FCS (len is FCS-stripped). */
+	fdma_dcb_add(fdma,
+		     next_to_use,
+		     0,
+		     FDMA_DCB_STATUS_INTR |
+		     FDMA_DCB_STATUS_SOF |
+		     FDMA_DCB_STATUS_EOF |
+		     FDMA_DCB_STATUS_BLOCKO(0) |
+		     FDMA_DCB_STATUS_BLOCKL(IFH_LEN_BYTES + len + ETH_FCS_LEN));
+
+	/* Start the transmission. */
+	lan966x_fdma_tx_start(tx);
+
+	port->dev->stats.tx_bytes += len;
+	port->dev->stats.tx_packets++;
+
+out:
+	spin_unlock(&lan966x->tx_lock);
+
+	return ret;
+}
+
+static int lan966x_xdp_pci_run(struct lan966x_port *port, void *data,
+			       u32 data_len, void **xdp_data, u32 *xdp_len)
+{
+	/* Read once so the NULL check and bpf_prog_run_xdp() see the same
+	 * pointer.
+	 */
+	struct bpf_prog *xdp_prog = READ_ONCE(port->xdp_prog);
+	struct lan966x *lan966x = port->lan966x;
+	struct fdma *fdma = &lan966x->rx.fdma;
+	struct xdp_buff xdp;
+	u32 act;
+
+	if (!xdp_prog)
+		return FDMA_PASS;
+
+	xdp_init_buff(&xdp, fdma->db_size, &port->xdp_rxq);
+
+	/* hard_start is set to slot start (virt_addr is XDP_PACKET_HEADROOM
+	 * into the slot). Headroom includes the IFH; BPF may grow into it
+	 * via adjust_head. IFH is rebuilt on XDP_TX and unread on XDP_PASS.
+	 */
+	xdp_prepare_buff(&xdp,
+			 data - XDP_PACKET_HEADROOM,
+			 XDP_PACKET_HEADROOM + IFH_LEN_BYTES,
+			 data_len,
+			 false);
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+	*xdp_data = xdp.data;
+	*xdp_len = xdp.data_end - xdp.data;
+
+	switch (act) {
+	case XDP_PASS:
+		return FDMA_PASS;
+	case XDP_TX:
+		return lan966x_fdma_pci_xmit_xdpf(port, *xdp_data, *xdp_len) ?
+		       FDMA_DROP : FDMA_TX;
+	default:
+		bpf_warn_invalid_xdp_action(port->dev, xdp_prog, act);
+		fallthrough;
+	case XDP_ABORTED:
+		trace_xdp_exception(port->dev, xdp_prog, act);
+		fallthrough;
+	case XDP_DROP:
+		return FDMA_DROP;
+	}
+}
+
+static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port,
+					   void **data, u32 *data_len)
 {
 	struct lan966x *lan966x = rx->lan966x;
 	struct fdma *fdma = &rx->fdma;
@@ -146,38 +259,33 @@ static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
 	if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
 		return FDMA_ERROR;
 
-	return FDMA_PASS;
+	/* Present the Ethernet frame (no IFH, no FCS). HW re-inserts the
+	 * FCS on TX; see lan966x_fdma_pci_xmit_xdpf(). May be overridden
+	 * by XDP. The FCS strip is unconditional because NETIF_F_RXFCS
+	 * is not advertised in hw_features.
+	 */
+	*data = virt_addr + IFH_LEN_BYTES;
+	*data_len = blockl - IFH_LEN_BYTES - ETH_FCS_LEN;
+
+	return lan966x_xdp_pci_run(port, virt_addr, *data_len, data, data_len);
 }
 
 static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
-						     u64 src_port)
+						     u64 src_port, void *data,
+						     u32 data_len)
 {
 	struct lan966x *lan966x = rx->lan966x;
-	struct fdma *fdma = &rx->fdma;
 	struct sk_buff *skb;
-	struct fdma_db *db;
-	u32 data_len;
-
-	/* Get the received frame and create an SKB for it. */
-	db = fdma_db_next_get(fdma);
-	data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
 
 	skb = napi_alloc_skb(&lan966x->napi, data_len);
 	if (unlikely(!skb))
 		return NULL;
 
-	memcpy(skb->data,
-	       fdma_dataptr_virt_addr_contiguous(fdma,
-						 fdma->dcb_index,
-						 fdma->db_index),
-						 data_len);
+	memcpy(skb->data, data, data_len);
 
 	skb_put(skb, data_len);
 
 	skb->dev = lan966x->ports[src_port]->dev;
-	skb_pull(skb, IFH_LEN_BYTES);
-
-	skb_trim(skb, skb->len - ETH_FCS_LEN);
 
 	skb->protocol = eth_type_trans(skb, skb->dev);
 
@@ -266,6 +374,8 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
 	struct sk_buff *skb;
 	int counter = 0;
 	u64 src_port;
+	u32 data_len;
+	void *data;
 
 	/* Wake any stopped TX queues if a TX DCB is available. */
 	spin_lock(&lan966x->tx_lock);
@@ -282,7 +392,10 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
 		/* Order DONE read before DCB/frame reads below. */
 		dma_rmb();
 		counter++;
-		switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
+		switch (lan966x_fdma_pci_rx_check_frame(rx,
+							&src_port,
+							&data,
+							&data_len)) {
 		case FDMA_PASS:
 			break;
 		case FDMA_ERROR:
@@ -291,8 +404,17 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
 			 */
 			fdma_dcb_advance(fdma);
 			continue;
+		case FDMA_TX:
+			fdma_dcb_advance(fdma);
+			continue;
+		case FDMA_DROP:
+			fdma_dcb_advance(fdma);
+			continue;
 		}
-		skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
+		skb = lan966x_fdma_pci_rx_get_frame(rx,
+						    src_port,
+						    data,
+						    data_len);
 		fdma_dcb_advance(fdma);
 		if (!skb) {
 			lan966x->ports[src_port]->dev->stats.rx_dropped++;
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 7036b1d937d5..b984e819312d 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -876,10 +876,13 @@ static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
 
 	port->phylink = phylink;
 
-	if (lan966x->fdma)
-		dev->xdp_features = NETDEV_XDP_ACT_BASIC |
-				    NETDEV_XDP_ACT_REDIRECT |
-				    NETDEV_XDP_ACT_NDO_XMIT;
+	if (lan966x->fdma) {
+		dev->xdp_features = NETDEV_XDP_ACT_BASIC;
+
+		if (!lan966x_is_pci(lan966x))
+			dev->xdp_features |= NETDEV_XDP_ACT_REDIRECT |
+					     NETDEV_XDP_ACT_NDO_XMIT;
+	}
 
 	err = register_netdev(dev);
 	if (err) {
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index e7fdd4447fb6..8911825eab77 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -595,6 +595,16 @@ int lan966x_qsys_sw_status(struct lan966x *lan966x);
 
 #if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
 extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
+
+static inline bool lan966x_is_pci(struct lan966x *lan966x)
+{
+	return lan966x->ops == &lan966x_fdma_pci_ops;
+}
+#else
+static inline bool lan966x_is_pci(struct lan966x *lan966x)
+{
+	return false;
+}
 #endif
 
 int lan966x_lag_port_join(struct lan966x_port *port,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
index 9ee61db8690b..b470f731e25c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
@@ -24,6 +24,16 @@ static int lan966x_xdp_setup(struct net_device *dev, struct netdev_bpf *xdp)
 	old_prog = xchg(&port->xdp_prog, xdp->prog);
 	new_xdp = lan966x_xdp_present(lan966x);
 
+	/* PCIe FDMA uses contiguous buffers, so no page_pool reload
+	 * is needed. Drain NAPI before freeing the old program so
+	 * no in-flight poll holds a stale pointer.
+	 */
+	if (lan966x_is_pci(lan966x)) {
+		if (old_prog)
+			napi_synchronize(&lan966x->napi);
+		goto out;
+	}
+
 	if (old_xdp == new_xdp)
 		goto out;
 

-- 
2.34.1



* [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (10 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-23  2:01   ` Jakub Kicinski
  2026-05-20  8:12 ` [PATCH net-next v5 13/13] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
  2026-05-21 14:12 ` [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
  13 siblings, 1 reply; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

The ATU outbound windows used by the FDMA engine are programmed through
registers starting at offset 0x400000, which lies outside the current
cpu reg mapping. Extend the cpu reg size from 0x100000 (1 MiB) to
0x800000 (8 MiB) to cover the full PCIe DBI and iATU register space.
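A quick bounds check shows why the old window was too small. The offsets come from the series (cpu reg base 0xe0000000, TARGET_PCIE_DBI at offset 0x400000); the `sk_*` names are hypothetical, and since the exact extent of the DBI/iATU space above 0x400000 is not stated here, the check only confirms what each window size can cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_CPU_BASE	0xe0000000ULL	/* "cpu" reg base in the overlay */
#define SK_PCIE_DBI_OFF	0x400000ULL	/* TARGET_PCIE_DBI offset */

/* Is [off, off + len) inside a window of @size bytes? Overflow-safe. */
static bool sk_in_window(uint64_t size, uint64_t off, uint64_t len)
{
	return off <= size && len <= size - off;
}

/* Can a window of @size bytes reach at least the first DBI register? */
static bool sk_dbi_mapped(uint64_t size)
{
	return sk_in_window(size, SK_PCIE_DBI_OFF, 4);
}
```

With the old 0x100000 size the DBI base is not reachable at all; with 0x800000 the window leaves 0x400000 bytes above it.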

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/misc/lan966x_pci.dtso | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/lan966x_pci.dtso b/drivers/misc/lan966x_pci.dtso
index 7b196b0a0eb6..7bb726550caf 100644
--- a/drivers/misc/lan966x_pci.dtso
+++ b/drivers/misc/lan966x_pci.dtso
@@ -135,7 +135,7 @@ lan966x_phy1: ethernet-lan966x_phy@2 {
 
 				switch: switch@e0000000 {
 					compatible = "microchip,lan966x-switch";
-					reg = <0xe0000000 0x0100000>,
+					reg = <0xe0000000 0x0800000>,
 					      <0xe2000000 0x0800000>;
 					reg-names = "cpu", "gcb";
 

-- 
2.34.1




* [PATCH net-next v5 13/13] misc: lan966x-pci: dts: add fdma interrupt to overlay
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (11 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
@ 2026-05-20  8:12 ` Daniel Machon
  2026-05-21 14:12 ` [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
  13 siblings, 0 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-20  8:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

Add the fdma interrupt (OIC interrupt 14) to the lan966x PCI device
tree overlay, enabling FDMA-based frame injection/extraction when
the switch is connected over PCIe.

Tested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/misc/lan966x_pci.dtso | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/lan966x_pci.dtso b/drivers/misc/lan966x_pci.dtso
index 7bb726550caf..5bb12dbc0843 100644
--- a/drivers/misc/lan966x_pci.dtso
+++ b/drivers/misc/lan966x_pci.dtso
@@ -141,8 +141,9 @@ switch: switch@e0000000 {
 
 					interrupt-parent = <&oic>;
 					interrupts = <12 IRQ_TYPE_LEVEL_HIGH>,
+						     <14 IRQ_TYPE_LEVEL_HIGH>,
 						     <9 IRQ_TYPE_LEVEL_HIGH>;
-					interrupt-names = "xtr", "ana";
+					interrupt-names = "xtr", "fdma", "ana";
 
 					resets = <&reset 0>;
 					reset-names = "switch";

-- 
2.34.1



* Re: [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA
  2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (12 preceding siblings ...)
  2026-05-20  8:12 ` [PATCH net-next v5 13/13] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
@ 2026-05-21 14:12 ` Daniel Machon
  13 siblings, 0 replies; 26+ messages in thread
From: Daniel Machon @ 2026-05-21 14:12 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir
  Cc: netdev, linux-kernel, bpf, linux-arm-kernel

> This series adds FDMA (Frame DMA) support for the PCIe path. When
> operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> directly access host memory, so DMA buffers are allocated as contiguous
> coherent memory and mapped through the PCIe Address Translation Unit
> (ATU). The ATU provides outbound windows that translate internal FDMA
> addresses to PCIe bus addresses, allowing the FDMA engine to read and
> write host memory. Because the ATU requires contiguous address regions,
> page_pool and normal per-page DMA mappings cannot be used. Instead,
> frames are transferred using memcpy between the ATU-mapped buffers and
> the network stack. With this, throughput increases from ~33 Mbps to
> ~620 Mbps for default MTU.

I went through the sashiko review for v5 (gemini and nipa). AFAICT no new
series-level issues were raised — the only Critical (XDP features exposed before
the xdp_setup() PCIe short-circuit lands) is a per-patch bisectability concern
that sashiko itself flags as "likely fixed by the subsequent patch", and indeed
patch [11/13] adds the guard. The NAPI bailout issue Paolo raised on v4 is
fixed in this version and not reflagged.


* Re: [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot
  2026-05-20  8:12 ` [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot Daniel Machon
@ 2026-05-23  1:45   ` Jakub Kicinski
  2026-05-30  0:44   ` sashiko-bot
  1 sibling, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-05-23  1:45 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir, netdev, linux-kernel, bpf,
	linux-arm-kernel

On Wed, 20 May 2026 10:12:20 +0200 Daniel Machon wrote:
> +	napi_synchronize(&lan966x->napi);
> +	napi_disable(&lan966x->napi);

Why the synchronize() before disable() ?
napi_synchronize() is a simplistic disable() + enable()



* Re: [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support
  2026-05-20  8:12 ` [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support Daniel Machon
@ 2026-05-23  1:56   ` Jakub Kicinski
  2026-05-30  0:44   ` sashiko-bot
  1 sibling, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-05-23  1:56 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir, netdev, linux-kernel, bpf,
	linux-arm-kernel

On Wed, 20 May 2026 10:12:21 +0200 Daniel Machon wrote:
> Add PCIe FDMA support for lan966x. The PCIe FDMA path uses contiguous
> DMA buffers mapped through the endpoint's ATU, with memcpy-based frame
> transfer instead of per-page DMA mappings.
> 
> With PCIe FDMA, throughput increases from ~33 Mbps (register-based I/O)
> to ~620 Mbps on an Intel x86 host with a lan966x PCIe card.

> diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
> index 4cdbe263502c..ac0beceb2a0d 100644
> --- a/drivers/net/ethernet/microchip/lan966x/Makefile
> +++ b/drivers/net/ethernet/microchip/lan966x/Makefile
> @@ -18,6 +18,10 @@ lan966x-switch-objs  := lan966x_main.o lan966x_phylink.o lan966x_port.o \
>  lan966x-switch-$(CONFIG_LAN966X_DCB) += lan966x_dcb.o
>  lan966x-switch-$(CONFIG_DEBUG_FS) += lan966x_vcap_debugfs.o
>  
> +ifdef CONFIG_MCHP_LAN966X_PCI

ifeq ()

would be more common in a Makefile?

> +lan966x-switch-y += lan966x_fdma_pci.o
> +endif
> +

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> index cc3c7b6c65ae..7036b1d937d5 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> @@ -7,6 +7,7 @@
>  #include <linux/ip.h>
>  #include <linux/of.h>
>  #include <linux/of_net.h>
> +#include <linux/pci.h>
>  #include <linux/phy/phy.h>
>  #include <linux/platform_device.h>
>  #include <linux/reset.h>
> @@ -49,6 +50,9 @@ struct lan966x_main_io_resource {
>  static const struct lan966x_main_io_resource lan966x_main_iomap[] =  {
>  	{ TARGET_CPU,                   0xc0000, 0 }, /* 0xe00c0000 */
>  	{ TARGET_FDMA,                  0xc0400, 0 }, /* 0xe00c0400 */
> +#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)

Why does enabling the config option change the targets?
Can someone with a non-PCI device enable that option too?
(Sure, it would be useless, but if they can, guarding this
with an #if seems like a waste of LoC.)

> +	{ TARGET_PCIE_DBI,             0x400000, 0 }, /* 0xe0400000 */
> +#endif
>  	{ TARGET_ORG,                         0, 1 }, /* 0xe2000000 */
>  	{ TARGET_GCB,                    0x4000, 1 }, /* 0xe2004000 */
>  	{ TARGET_QS,                     0x8000, 1 }, /* 0xe2008000 */
> @@ -1100,6 +1104,13 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
>  
>  static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
>  {
> +#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
> +	for (struct device *p = dev->parent; p; p = p->parent) {
> +		if (dev_is_pci(p))

If the PCIe devices also use an intermediate platform device for
probing, can't they explicitly have some flag or state to indicate
they are PCIe? The device walk in such a constrained env seems
like overkill.
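A minimal userspace sketch contrasting the parent walk with an explicit
flag. `struct switch_priv`, its `on_pcie` field and both `select_ops_*()`
helpers are hypothetical; in the driver, such a flag would have to be set
by whatever code creates the intermediate platform device, which is an
assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the two lan966x_fdma_ops instances. */
struct fdma_ops {
	const char *name;
};

static const struct fdma_ops platform_ops = { "platform" };
static const struct fdma_ops pci_ops = { "pci" };

/* Simplified device node: only what ops selection needs. */
struct device {
	struct device *parent;
	bool is_pci;            /* what dev_is_pci() would report */
};

/* Hypothetical driver state with an explicit PCIe flag, set once by
 * the code that creates the intermediate platform device.
 */
struct switch_priv {
	bool on_pcie;
};

/* Current approach: walk all ancestors looking for a PCI device. */
static const struct fdma_ops *select_ops_walk(const struct device *dev)
{
	for (const struct device *p = dev->parent; p; p = p->parent)
		if (p->is_pci)
			return &pci_ops;
	return &platform_ops;
}

/* Suggested alternative: a single check of an explicit flag. */
static const struct fdma_ops *select_ops_flag(const struct switch_priv *priv)
{
	return priv->on_pcie ? &pci_ops : &platform_ops;
}
```

The flag version does constant work and does not depend on how deep the
PCI device sits in the hierarchy.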

> +			return &lan966x_fdma_pci_ops;
> +	}
> +#endif
> +
>  	return &lan966x_fdma_ops;
>  }
>  
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> index 5f4dbeda17cd..e7fdd4447fb6 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> @@ -17,6 +17,9 @@
>  #include <net/xdp.h>
>  
>  #include <fdma_api.h>
> +#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
> +#include <fdma_pci.h>
> +#endif

Conditional #includes make build testing harder; better to avoid them.

>  #include <vcap_api.h>
>  #include <vcap_api_client.h>
>  
> @@ -288,6 +291,10 @@ struct lan966x {
>  
>  	void __iomem *regs[NUM_TARGETS];
>  
> +#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
> +	struct fdma_pci_atu atu;
> +#endif
> +
>  	int shared_queue_sz;
>  
>  	u8 base_mac[ETH_ALEN];
> @@ -586,6 +593,10 @@ void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x);
>  int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
>  int lan966x_qsys_sw_status(struct lan966x *lan966x);
>  
> +#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
> +extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
> +#endif

There should be no need to wrap an extern declaration in an #if.

* Re: [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support
  2026-05-20  8:12 ` [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support Daniel Machon
@ 2026-05-23  2:01   ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-05-23  2:01 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir, netdev, linux-kernel, bpf,
	linux-arm-kernel

On Wed, 20 May 2026 10:12:23 +0200 Daniel Machon wrote:
>  #if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
>  extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
> +
> +static inline bool lan966x_is_pci(struct lan966x *lan966x)
> +{
> +	return lan966x->ops == &lan966x_fdma_pci_ops;

	return IS_ENABLED(CONFIG_MCHP_LAN966X_PCI) &&
		lan966x->ops == &lan966x_fdma_pci_ops;

The compiler will eliminate the reference to the ops as dead code,
and the linker will not complain that the object is missing.
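A small userspace demonstration of the idiom, assuming `DEMO_PCI_ENABLED`
stands in for `IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)` in a non-PCI build.
`pci_ops` is declared but deliberately never defined, like the PCI ops
object when lan966x_fdma_pci.o is not built; the program still links
because the constant-false left operand of `&&` lets GCC and Clang drop
the reference. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IS_ENABLED(CONFIG_MCHP_LAN966X_PCI) in a non-PCI build. */
#define DEMO_PCI_ENABLED 0

struct fdma_ops {
	int id;
};

/* Declared unconditionally, no #if. Intentionally never defined,
 * mirroring a build where the PCI object file is not compiled.
 */
extern const struct fdma_ops pci_ops;

struct sw {
	const struct fdma_ops *ops;
};

static inline bool sw_is_pci(const struct sw *sw)
{
	/* The constant-false left operand makes the comparison dead
	 * code, so no reference to pci_ops reaches the linker.
	 */
	return DEMO_PCI_ENABLED && sw->ops == &pci_ops;
}
```

The same pattern is what lets the extern declaration stay outside any
#if.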

> +}
> +#else
> +static inline bool lan966x_is_pci(struct lan966x *lan966x)
> +{
> +	return false;
> +}
>  #endif
>  
>  int lan966x_lag_port_join(struct lan966x_port *port,
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
> index 9ee61db8690b..b470f731e25c 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
> @@ -24,6 +24,16 @@ static int lan966x_xdp_setup(struct net_device *dev, struct netdev_bpf *xdp)
>  	old_prog = xchg(&port->xdp_prog, xdp->prog);
>  	new_xdp = lan966x_xdp_present(lan966x);
>  
> +	/* PCIe FDMA uses contiguous buffers, so no page_pool reload
> +	 * is needed. Drain NAPI before freeing the old program so
> +	 * no in-flight poll holds a stale pointer.
> +	 */

Not sure what this does. BPF programs are RCU-protected; they are not
freed immediately (if that's what you mean).

> +	if (lan966x_is_pci(lan966x)) {
> +		if (old_prog)
> +			napi_synchronize(&lan966x->napi);
> +		goto out;
> +	}

* Re: [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
  2026-05-20  8:12 ` [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
@ 2026-05-23  2:01   ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-05-23  2:01 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir, netdev, linux-kernel, bpf,
	linux-arm-kernel

On Wed, 20 May 2026 10:12:24 +0200 Daniel Machon wrote:
> The ATU outbound windows used by the FDMA engine are programmed through
> registers at offset 0x400000+, which falls outside the current cpu reg
> mapping. Extend the cpu reg size from 0x100000 (1MB) to 0x800000 (8MB)
> to cover the full PCIE DBI and iATU register space.

Are we supposed to take these to net-next?

* Re: [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse
  2026-05-20  8:12 ` [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse Daniel Machon
@ 2026-05-30  0:44   ` sashiko-bot
  0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] Missing dma_wmb() when appending new DMA descriptors to an active ring.
- [High] Lack of endianness conversions for hardware DMA descriptor fields, incompatible with PCIe on Big-Endian hosts.
- [High] Missing bounds validation on hardware-provided length before calling skb_put(), risking kernel panic.
--

commit 71ed27dbfea60bc58684878c1afb6584c28b04fb
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: export FDMA helpers for reuse

This patch makes shared FDMA helpers non-static to allow their reuse by the
PCIe FDMA implementation.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> index 6c5761e886d47..25e673bdf0841 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c

[ ... ]

> @@ -471,7 +471,7 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx,
>  	return NULL;
>  }
>  
> -static int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
> +int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
>  {

[Severity: High]
This isn't a bug introduced by this patch, but since this helper is being
exported for PCIe reuse, does the platform RX path need bounds validation on
the hardware-provided length?

In lan966x_fdma_rx_get_frame() (called by this NAPI poll function), the
packet length is extracted directly from the hardware descriptor and passed
unconditionally to skb_put():

	skb_put(skb, FDMA_DCB_STATUS_BLOCKL(db->status));

If the hardware returns a length exceeding the allocated SKB data region
minus the XDP headroom, skb_put() will extend the tail beyond skb->end,
resulting in an immediate skb_over_panic(). Should the length be checked
to fit in the space left after the headroom, and to be at least
IFH_LEN_BYTES, before calling skb_put() and skb_pull()?
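A sketch of such a check as a standalone helper. `rx_frame_len_ok()` and
its parameters are hypothetical, and the 28-byte `DEMO_IFH_LEN_BYTES` is
an assumed value for illustration rather than something taken from this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the driver defines IFH_LEN_BYTES. */
#define DEMO_IFH_LEN_BYTES 28u

/* Return true if a hardware-reported BLOCKL can be passed to skb_put()
 * on a buffer of buf_size bytes that reserves headroom bytes in front
 * of the frame, and is long enough to hold the IFH that skb_pull()
 * strips afterwards.
 */
static bool rx_frame_len_ok(uint32_t blockl, uint32_t buf_size,
			    uint32_t headroom)
{
	if (headroom >= buf_size)
		return false;
	if (blockl < DEMO_IFH_LEN_BYTES)
		return false;
	return blockl <= buf_size - headroom;
}
```

A frame failing the check would be dropped and counted as an error
instead of reaching skb_put().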

[Severity: High]
This is a pre-existing issue, but since the FDMA implementation is being
exposed for PCIe, could the lack of endianness conversions cause issues on
Big-Endian hosts?

PCIe endpoints use strictly little-endian DMA descriptors. However, the driver
defines descriptor fields (like status and dataptr in struct fdma_db) as
native u64 and accesses them using native bitwise macros. 

If this code runs on a Big-Endian architecture with the PCIe device, will the
CPU read and write the status word in Big-Endian, causing the Little-Endian
hardware to interpret it incorrectly and fail?
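For reference, the usual kernel remedy is to declare descriptor fields as
`__le64` and access them with `cpu_to_le64()` / `le64_to_cpu()`. The
userspace sketch below only models the property those helpers provide,
namely a fixed little-endian byte layout regardless of host byte order;
`put_le64()` and `get_le64()` are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Store v into p as little-endian bytes, independent of host order. */
static void put_le64(uint8_t p[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Load a little-endian 64-bit value from p, independent of host order. */
static uint64_t get_le64(const uint8_t p[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}
```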

[Severity: High]
This is a pre-existing issue, but is there a missing dma_wmb() when appending
new DMA descriptors to an active ring?

When lan966x_fdma_napi_poll() allocates new pages, it calls fdma_dcb_add()
which initializes the new descriptor's contents and then updates the previous
descriptor's nextptr to link it.

There doesn't appear to be a dma_wmb() executed between writing the new
descriptor's status/data pointers and updating the previous descriptor's
nextptr. Without this barrier, could the CPU reorder the memory writes,
allowing the hardware DMA engine to follow the updated nextptr and read the
new descriptor before its data and status fields are fully written to memory?

>  	struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
>  	struct lan966x_rx *rx = &lan966x->rx;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=5

* Re: [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support
  2026-05-20  8:12 ` [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support Daniel Machon
@ 2026-05-30  0:44   ` sashiko-bot
  0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] The ATU outbound window limit is statically configured to its maximum size instead of being tightly constrained to the allocated DMA buffer size.
- [Medium] Hardware ATU regions are not explicitly disabled during initialization, risking stray DMAs from stale configuration.
- [Low] The block comment detailing the ATU regions contains a mathematical error, contradicting the actual hardware configuration.
--

commit 474737a9138a576d895670ba7334d761b2fb9464
Author: Daniel Machon <daniel.machon@microchip.com>
net: microchip: fdma: add PCIe ATU support

When lan966x or lan969x operates as a PCIe endpoint, the internal FDMA
engine cannot directly access host memory. Instead, DMA addresses must
be translated through the PCIe Address Translation Unit (ATU), which
provides outbound windows mapping internal addresses to PCIe bus addresses.

> diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.c b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
> new file mode 100644
> index 0000000000000..1bd41eaa58a4b
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
> @@ -0,0 +1,182 @@
[ ... ]
> + * +-------------+------------+------------+-----+------------+
> + * | Index       | Region 0   | Region 1   | ... | Region 5   |
> + * +-------------+------------+------------+-----+------------+
> + * | Base addr   | 0x10000000 | 0x12aa0000 | ... | 0x1d500000 |

[Severity: Low]
This isn't a bug, but there appears to be a mathematical error in the
comment. The computed region size is round_down((0x1fffffff - 0x10000000) /
6, BIT(16)) which equals 0x2aa0000. Using this size, Region 5's base address
would be 0x10000000 + (5 * 0x2aa0000) = 0x1d520000, not 0x1d500000.
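The arithmetic can be checked in a few lines of C. `OB_START`, `OB_END`
and the six-region count are taken from the formula above; this mirrors
that formula, not necessarily the driver's exact code. By the same
computation, Region 5's limit comes out as 0x1ffbffff, so the 0x1ff9ffff
in the table appears to be off as well.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the formula in the review comment. */
#define OB_START    0x10000000u
#define OB_END      0x1fffffffu
#define NUM_REGIONS 6u
#define ALIGN_64K   0x10000u   /* BIT(16) */

/* round_down((OB_END - OB_START) / NUM_REGIONS, BIT(16)) */
static uint32_t region_size(void)
{
	uint32_t sz = (OB_END - OB_START) / NUM_REGIONS;

	return sz & ~(ALIGN_64K - 1);
}

static uint32_t region_base(uint32_t i)
{
	return OB_START + i * region_size();
}

static uint32_t region_limit(uint32_t i)
{
	return region_base(i) + region_size() - 1;
}
```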

> + * | Limit addr  | 0x12a9ffff | 0x1553ffff | ... | 0x1ff9ffff |
> + * | Target addr | host dma   | host dma   | ... | host dma   |
> + * +-------------+------------+------------+-----+------------+

[ ... ]

> +struct fdma_pci_atu_region *
> +fdma_pci_atu_region_map(struct fdma_pci_atu *atu, u64 target_addr, int size)
> +{
> +	struct fdma_pci_atu_region *region;
> +
> +	if (!atu)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (size <= 0)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (size > fdma_pci_atu_region_size())
> +		return ERR_PTR(-E2BIG);
> +
> +	region = fdma_pci_atu_region_get_free(atu);
> +	if (IS_ERR(region))
> +		return region;
> +
> +	region->target_addr = target_addr;
> +	region->in_use = true;
> +
> +	/* Enable first, according to datasheet section 3.24.7.4.1 */
> +	fdma_pci_atu_region_enable(region);
> +	fdma_pci_atu_configure_translation(region);
> +
> +	return region;
> +}

[Severity: Medium]
Could this dynamically program the limit_addr to base_addr + size - 1
to enforce strict DMA bounds, rather than using the statically
configured ~42MB limit?

Since the actual DMA buffer allocations are typically much smaller, the
current approach massively over-provisions the DMA window. Could a
hardware bug or malicious endpoint firmware exploit this to perform
out-of-bounds DMAs and silently corrupt adjacent host physical memory?
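A sketch of the tighter limit computation, assuming a simplified region
struct and a power-of-two programming granule (the real ATU granularity
is not stated here and is an assumption). `region_tight_limit()` is
hypothetical; it never returns a limit beyond the region's static one.

```c
#include <assert.h>
#include <stdint.h>

struct atu_region {
	uint32_t base_addr;
	uint32_t limit_addr;   /* static limit computed at init */
};

/* Limit to program for a mapping of size bytes (size > 0), rounded up
 * to granule and clamped to the region's static limit.
 */
static uint32_t region_tight_limit(const struct atu_region *r, uint32_t size,
				   uint32_t granule)
{
	uint32_t rounded, limit;

	assert(granule && !(granule & (granule - 1)));
	rounded = (size + granule - 1) & ~(granule - 1);
	limit = r->base_addr + rounded - 1;

	return limit < r->limit_addr ? limit : r->limit_addr;
}
```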

[ ... ]

> +void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr)
> +{
> +	struct fdma_pci_atu_region *regions = atu->regions;
> +	u32 region_size = fdma_pci_atu_region_size();
> +
> +	atu->addr = addr;
> +
> +	for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
> +		regions[i].base_addr =
> +			FDMA_PCI_ATU_OB_START + (i * region_size);
> +		regions[i].limit_addr =
> +			regions[i].base_addr + region_size - 1;
> +		regions[i].idx = i;
> +		regions[i].atu = atu;
> +	}
> +}

[Severity: Medium]
Should the initialization explicitly disable all hardware ATU regions by
clearing their enable bits?

Because the PCIe endpoint controller cannot be hard-reset by the driver
without dropping the PCIe link to the host, the ATU registers might
retain state from the bootloader or a previous kernel crash. Could
stray DMAs to host memory occur before the kernel explicitly reclaims
and reconfigures these regions if they are left enabled?
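A userspace model of that init step. The `region_ctrl2[]` array,
`REGION_CTRL2_EN` and its bit position are hypothetical stand-ins for the
real iATU per-region control register, which the driver would access
through its MMIO helpers.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGIONS     6
#define REGION_CTRL2_EN (1u << 31)   /* hypothetical enable bit */

/* Stand-in for each region's control register in MMIO space. */
static volatile uint32_t region_ctrl2[NUM_REGIONS];

/* Clear the enable bit of every outbound region so that stale windows
 * left by a bootloader or a previous kernel cannot be used until the
 * driver maps them again. Other bits are preserved.
 */
static void atu_disable_all(void)
{
	for (int i = 0; i < NUM_REGIONS; i++)
		region_ctrl2[i] &= ~REGION_CTRL2_EN;
}
```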

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=3

* Re: [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support
  2026-05-20  8:12 ` [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support Daniel Machon
  2026-05-23  1:56   ` Jakub Kicinski
@ 2026-05-30  0:44   ` sashiko-bot
  1 sibling, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [Critical] The driver unconditionally exposes XDP features for the new PCIe FDMA path, which lacks the required data structures, leading to a kernel crash.
- [High] Unconditional freeing of SKBs in `lan966x_fdma_pci_xmit` causes a Use-After-Free if hardware timestamping (PTP) is enabled.
- [High] The DMA descriptor status (`db->status`) is fetched multiple times from PCIe coherent memory, creating a Time-of-Check to Time-of-Use (TOCTOU) vulnerability that bypasses frame size validation.

Pre-existing issues:
- [High] Calling `lan966x_hw_offload()` can free the SKB if `skb_vlan_untag()` fails or reallocates it, leading to a Use-After-Free in the caller.
- [High] Missing `dma_wmb()` and incorrect assignment order in `__fdma_dcb_add()` leads to hardware ring corruption.
--

commit 120b358d3f4f3104d4b61923f6ff96cd2e8cc282
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: add PCIe FDMA support

Add PCIe FDMA support for lan966x. The PCIe FDMA path uses contiguous
DMA buffers mapped through the endpoint's ATU, with memcpy-based frame
transfer instead of per-page DMA mappings.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
> new file mode 100644
> index 0000000000000..c27d9e76e7358
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
> @@ -0,0 +1,394 @@
> +// SPDX-License-Identifier: GPL-2.0+
[ ... ]
> +static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
> +{
[ ... ]
> +	db = fdma_db_next_get(fdma);
> +
> +	/* BLOCKL is a 16-bit HW-populated field; reject obviously-bad
> +	 * values before they feed memcpy/XDP sizes.
> +	 */
> +	blockl = FDMA_DCB_STATUS_BLOCKL(db->status);
> +	if (!lan966x_fdma_pci_rx_size_fits(fdma, blockl))
> +		return FDMA_ERROR;
> +
> +	return FDMA_PASS;
> +}
> +
> +static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
> +						     u64 src_port)
> +{
> +	struct lan966x *lan966x = rx->lan966x;
> +	struct fdma *fdma = &rx->fdma;
> +	struct sk_buff *skb;
> +	struct fdma_db *db;
> +	u32 data_len;
> +
> +	/* Get the received frame and create an SKB for it. */
> +	db = fdma_db_next_get(fdma);
> +	data_len = FDMA_DCB_STATUS_BLOCKL(db->status);

[Severity: High]
Does fetching db->status again here create a Time-of-Check to Time-of-Use
(TOCTOU) vulnerability?

The driver first calls lan966x_fdma_pci_rx_check_frame(), which reads
db->status from PCIe-accessible coherent memory and validates that the
BLOCKL frame size fits within the buffer.

Immediately after, it calls lan966x_fdma_pci_rx_get_frame(), which fetches
db->status again to determine data_len.

Could a compromised or buggy PCIe endpoint modify the status word between
these fetches, changing the frame size to a large value, bypassing the
validation, and causing an out-of-bounds memcpy()?
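A userspace model of the single-fetch pattern: read the status word once,
validate the local copy, and use only that copy. The `volatile` read
stands in for what would typically be `READ_ONCE(db->status)` in the
kernel; the struct, `rx_copy_frame()` and the 16-bit BLOCKL mask are
simplified assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified descriptor; status is written by the device. */
struct db {
	uint64_t status;
};

/* Assumed layout: BLOCKL in the low 16 bits of status. */
#define STATUS_BLOCKL(s) ((uint32_t)((s) & 0xffffu))

/* Copy one frame from src (at least dst_size bytes) into dst. The
 * status is fetched exactly once; the check and memcpy() both use the
 * local copy, so a later device write cannot change the length.
 * Returns the copied length, or -1 if the length is invalid.
 */
static int rx_copy_frame(const volatile struct db *db, const uint8_t *src,
			 uint8_t *dst, uint32_t dst_size)
{
	uint64_t status = db->status;   /* single fetch */
	uint32_t len = STATUS_BLOCKL(status);

	if (len == 0 || len > dst_size)
		return -1;
	memcpy(dst, src, len);
	return (int)len;
}
```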

> +
> +	skb = napi_alloc_skb(&lan966x->napi, data_len);
[ ... ]
> +	if (lan966x->bridge_mask & BIT(src_port)) {
> +		skb->offload_fwd_mark = 1;
> +
> +		skb_reset_network_header(skb);
> +		if (!lan966x_hw_offload(lan966x, src_port, skb))
> +			skb->offload_fwd_mark = 0;
> +	}
> +
> +	skb->dev->stats.rx_bytes += skb->len;

[Severity: High]
This isn't a bug introduced by this patch, but can lan966x_hw_offload() free
the SKB and lead to a Use-After-Free here?

lan966x_hw_offload() takes the skb pointer by value and calls
skb_vlan_untag(skb). If skb_vlan_untag() fails, it frees the SKB and
returns NULL; if the SKB is shared, skb_share_check() replaces it with a
clone and drops the caller's reference to the original.

However, because the SKB is passed by value, this caller retains the original
freed pointer. It then accesses skb->offload_fwd_mark, increments statistics
using skb->len, and passes the freed SKB to napi_gro_receive().

Does this cause a use-after-free when receiving VLAN-tagged packets while
bridging and IGMP snooping are enabled?
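A userspace model of the hazard and of one safer calling convention,
where the helper updates the caller's pointer so the caller never keeps
a stale one. `struct buf`, `maybe_realloc()` and `rx_vlan_untag()` are
hypothetical; `maybe_realloc()` only mimics the contract of
skb_vlan_untag(), which returns the possibly new skb or NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct buf {
	size_t len;
	char data[64];
};

/* Mimics the skb_vlan_untag() contract: on failure the buffer is freed
 * and NULL returned; otherwise the result may be a different buffer
 * (here always a fresh copy) and the original must not be used again.
 */
static struct buf *maybe_realloc(struct buf *b, bool fail)
{
	struct buf *n = NULL;

	if (!fail) {
		n = malloc(sizeof(*n));
		if (n)
			memcpy(n, b, sizeof(*n));
	}
	free(b);
	return n;
}

/* The caller's pointer is updated in place, so it always refers to the
 * live buffer or is NULL. Returns true if a buffer is still available.
 */
static bool rx_vlan_untag(struct buf **bp, bool fail)
{
	*bp = maybe_realloc(*bp, fail);
	return *bp != NULL;
}
```

In the driver, one option would be for the offload helper to return the
resulting skb (or take a `struct sk_buff **`) so the RX path stops using
the old pointer.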

> +	skb->dev->stats.rx_packets++;
> +
> +	return skb;
> +}
[ ... ]
> +static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
> +				 struct net_device *dev)
> +{
[ ... ]
> +	/* Order frame write before DCB status write below. */
> +	dma_wmb();
> +
> +	fdma_dcb_add(fdma,
> +		     next_to_use,
> +		     0,
> +		     FDMA_DCB_STATUS_INTR |
> +		     FDMA_DCB_STATUS_SOF |
> +		     FDMA_DCB_STATUS_EOF |
> +		     FDMA_DCB_STATUS_BLOCKO(0) |
> +		     FDMA_DCB_STATUS_BLOCKL(IFH_LEN_BYTES + skb->len + ETH_FCS_LEN));

[Severity: High]
This is a pre-existing issue, but does fdma_dcb_add() lack a memory barrier
when linking the new DCB into the hardware ring?

In __fdma_dcb_add(), the driver links the new DMA Control Block (DCB) into the
hardware ring by updating last_dcb->nextptr before initializing the new DCB's
own nextptr and info fields:

drivers/net/ethernet/microchip/fdma/fdma_api.c:__fdma_dcb_add() {
	...
	err = dcb_cb(fdma, dcb_idx, &fdma->last_dcb->nextptr);
	...
	fdma->last_dcb = dcb;

	dcb->nextptr = FDMA_DCB_INVALID_DATA;
	dcb->info = info;
	...
}

Furthermore, there is no dma_wmb() barrier between setting up the DBs and
linking the DCB.

Since the memory is coherent DMA memory, can the hardware immediately follow
the nextptr link and read uninitialized or stale info and nextptr values from
the new DCB, causing the DMA controller to process garbage data?
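A userspace model of the ordering described above: fully initialize the
new DCB, then publish it by writing the previous DCB's nextptr. The C11
release store plays the role that dma_wmb() followed by a plain store
plays in the kernel; this models CPU-side ordering only, and the struct
layout and `DCB_INVALID_DATA` value are simplified assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for FDMA_DCB_INVALID_DATA; the value is an assumption. */
#define DCB_INVALID_DATA 0x1u

struct dcb {
	_Atomic uint64_t nextptr; /* followed by the "device" */
	uint64_t info;
	uint64_t status;
};

/* Link new_dcb (at bus address new_addr) after last. Every field of
 * new_dcb is written first; the release store on last->nextptr then
 * publishes it, so an observer that reads the link with acquire
 * semantics also sees the initialized contents.
 */
static void dcb_append(struct dcb *last, struct dcb *new_dcb,
		       uint64_t new_addr, uint64_t info, uint64_t status)
{
	atomic_store_explicit(&new_dcb->nextptr, DCB_INVALID_DATA,
			      memory_order_relaxed);
	new_dcb->info = info;
	new_dcb->status = status;

	/* Kernel equivalent: dma_wmb(); then write last->nextptr. */
	atomic_store_explicit(&last->nextptr, new_addr, memory_order_release);
}
```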

> +
> +	/* Start the transmission. */
> +	lan966x_fdma_tx_start(tx);
> +
> +	dev->stats.tx_bytes += skb->len;
> +	dev->stats.tx_packets++;
> +
> +	/* Safe to free: the PCIe DTBO does not enable the PTP interrupt,
> +	 * so lan966x->ptp stays 0 and lan966x_port_xmit() never enqueues
> +	 * this skb on port->tx_skbs for a TX timestamp.
> +	 */
> +	dev_consume_skb_any(skb);

[Severity: High]
Can unconditionally freeing the SKB here cause a Use-After-Free if hardware
timestamping (PTP) is enabled?

If PTP is enabled via the device tree (setting lan966x->ptp to 1),
lan966x_port_xmit() enqueues the SKB to port->tx_skbs for hardware
timestamping before passing it to lan966x_fdma_pci_xmit().

However, dev_consume_skb_any(skb) frees the SKB unconditionally. When the PTP
TX timestamp interrupt later fires, lan966x_ptp_irq_handler() will traverse
the tx_skbs queue and dereference the freed SKB.

The comment explicitly relies on a specific Device Tree Overlay to keep PTP
disabled. Is it safe to assume that nobody supplies a device tree that
enables PTP, given that doing so would lead to memory corruption?

> +
> +	return NETDEV_TX_OK;
> +}
[ ... ]
> +static int lan966x_fdma_pci_init(struct lan966x *lan966x)
> +{
[ ... ]
> +	err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
> +	if (err)
> +		return err;
> +
> +	err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);

[Severity: Critical]
Does this PCIe FDMA path lack the required data structures for XDP, leading
to a kernel crash if XDP is attached?

Because lan966x->fdma is enabled, lan966x_probe_port() unconditionally exposes
dev->xdp_features. If a user attaches an XDP program, lan966x_xdp_setup()
calls lan966x_fdma_reload_page_pool(), which uses the platform (non-PCIe)
lan966x_fdma_rx_alloc() function instead of the new
lan966x_fdma_pci_rx_alloc().

This replaces the PCIe ATU-mapped coherent ring with an incompatible page_pool
based ring without ATU mappings, destroying the DMA setup and potentially
causing IOMMU faults or memory corruption.

Furthermore, XDP_REDIRECT will crash because lan966x_fdma_pci_tx_alloc() does
not allocate tx->dcbs_buf, resulting in a NULL pointer dereference in
lan966x_fdma_xmit_xdpf().

Is this regression addressed by the subsequent patch 'net: lan966x: add PCIe
FDMA XDP support' in the series?

> +	if (err) {
> +		fdma_free_coherent_and_unmap(lan966x->dev, rx_fdma);
> +		return err;
> +	}
> +
> +	lan966x_fdma_rx_start(&lan966x->rx);
> +
> +	return 0;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=9

* Re: [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper
  2026-05-20  8:12 ` [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper Daniel Machon
@ 2026-05-30  0:44   ` sashiko-bot
  0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Moving the LLP register configuration to the allocation function breaks the MTU change error recovery path, leaving the RX channel reactivated without a valid LLP.

Pre-existing issues:
- [Critical] The return value of `fdma_dcbs_init()` is ignored in `lan966x_fdma_rx_alloc()`, leading to partial initialization and potential memory corruption.
- [Critical] Setting the PTP flag for the TX buffer after handing the DCB to hardware creates a race condition with the TX completion handler.
- [High] The hardware and software DMA states become desynchronized during the MTU reload error path due to taking a stale snapshot of the FDMA state before stopping the traffic.
- [High] Missing `dma_wmb()` before updating the `nextptr` of the previous DCB in `__fdma_dcb_add()`.
--

commit 4390e4c62295adb12eb1d72e9557c5052040eb8a
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: add FDMA LLP register write helper

This patch introduces a shared helper for configuring the FDMA Link List
Pointer (LLP) register and moves the LLP configuration from the channel
start/activate functions to the allocation functions. This change
prevents the shared start/activate path from overwriting ATU-translated
addresses with DMA addresses in the upcoming PCIe FDMA path.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> index f8ce735a7fc09..6c5761e886d47 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c

[ ... ]

> @@ -127,6 +134,9 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
>  	fdma_dcbs_init(fdma, FDMA_DCB_INFO_DATAL(fdma->db_size),
>  		       FDMA_DCB_STATUS_INTR);

[Severity: Critical]
This isn't a bug introduced by this patch, but is it safe to ignore the
return value of fdma_dcbs_init() here?

If fdma_dcbs_init() fails (e.g., due to page allocation failure in the
data pointer callback), it returns an error code, leaving subsequent
DCBs with uninitialized or zeroed data pointers. Because the return
value is ignored, could the driver incorrectly proceed to activate the
DMA ring, causing the hardware to attempt to DMA incoming packets into
physical address 0?
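A sketch of the error handling being asked about: propagate the failure
and release what was set up instead of configuring the LLP and returning
0. `struct ring`, `ring_fill()`, `ring_free()` and `rx_alloc()` are
hypothetical stand-ins for `fdma_dcbs_init()` and the allocation path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct ring {
	int filled;        /* descriptors with a valid data pointer */
	bool allocated;
	bool llp_written;  /* stand-in for lan966x_fdma_llp_configure() */
};

/* Fill n descriptors; fail with -ENOMEM at index fail_at (if >= 0). */
static int ring_fill(struct ring *r, int n, int fail_at)
{
	for (int i = 0; i < n; i++) {
		if (i == fail_at)
			return -ENOMEM;
		r->filled++;
	}
	return 0;
}

static void ring_free(struct ring *r)
{
	r->filled = 0;
	r->allocated = false;
}

static int rx_alloc(struct ring *r, int n, int fail_at)
{
	int err;

	r->allocated = true;
	err = ring_fill(r, n, fail_at);
	if (err) {
		/* Never hand a partially filled ring to the hardware. */
		ring_free(r);
		return err;
	}
	r->llp_written = true;
	return 0;
}
```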

> +	lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
> +				   fdma->channel_id);
> +
>  	return 0;
>  }

[ ... ]

> @@ -136,14 +146,6 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
>  	struct fdma *fdma = &rx->fdma;
>  	u32 mask;
>  
> -	/* When activating a channel, first is required to write the first DCB
> -	 * address and then to activate it
> -	 */
> -	lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
> -	       FDMA_DCB_LLP(fdma->channel_id));
> -	lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
> -	       FDMA_DCB_LLP1(fdma->channel_id));
> -

[Severity: High]
Does moving the LLP register configuration to the allocation function
break the MTU change error recovery path?

During an MTU reload failure, the driver restores the software state and
calls lan966x_fdma_rx_start(), which now fails to write the required
LLP register. Could this activate the hardware DMA channel without a
valid Link List Pointer, leading to DMA failures or hardware stalls?

[Severity: High]
This is a pre-existing issue, but does taking a stale snapshot of the
FDMA state before stopping the traffic cause the hardware and software
DMA states to become desynchronized during the MTU reload error path?

In lan966x_fdma_reload(), the software snapshot fdma_rx_old is taken
before NAPI and the hardware channel are disabled. If packets arrive in
this window, the hardware advances.

On allocation failure, restoring this stale snapshot rolls back the
software indices while the hardware's internal state remains advanced.
Could this permanently desynchronize the ring indices and cause ring
corruption?

>  	lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
>  	       FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
>  	       FDMA_CH_CFG_CH_INJ_PORT_SET(0) |

[ ... ]

[Severity: High]
This is a pre-existing issue, but is there a missing dma_wmb() before
updating the nextptr of the previous DCB in __fdma_dcb_add()?

When appending a new DCB to a running TX ring in lan966x_fdma_xmit(),
the CPU updates the new DB's status and data pointers, and then updates
the nextptr of the previous DCB to link it. Without a dma_wmb() separating
these writes, could the CPU or compiler reorder them?

If the hardware polling the previous DCB's nextptr observes the new link
before the new DCB's contents are visible in memory, could this lead to
the hardware reading garbage descriptors?

[Severity: Critical]
This isn't a bug introduced by this patch, but does setting the PTP flag
for the TX buffer after handing the DCB to hardware create a race condition
with the TX completion handler?

In lan966x_fdma_xmit(), the DCB is submitted to the hardware via
__fdma_dcb_add() and lan966x_fdma_tx_start() before the software sets
next_dcb_buf->ptp = true.

If the hardware transmits the packet immediately and raises a completion
interrupt, could a concurrent NAPI poll process the completion, read the
stale ptp == false state, and free the SKB? Could the submitting CPU then
set the PTP flag on the already freed SKB, causing a use-after-free during
subsequent PTP timestamp processing?
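A userspace model of the ordering that would close this window: record
per-buffer metadata such as the PTP flag before the buffer becomes
visible to the completion path. `struct tx_buf`, its `hw_owned` flag and
both helpers are hypothetical, and the release/acquire pair stands in for
whatever ordering the kernel establishes between submission and
completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct tx_buf {
	bool ptp;               /* completion must not free if set */
	atomic_bool hw_owned;   /* set when handed to the "hardware" */
};

/* Fill in all software state first, then hand the buffer over with a
 * release store, so a completion path that observes hw_owned (acquire)
 * also observes the final ptp value.
 */
static void tx_submit(struct tx_buf *b, bool want_ts)
{
	b->ptp = want_ts;
	atomic_store_explicit(&b->hw_owned, true, memory_order_release);
}

/* Completion side: returns true if the buffer may be freed now. */
static bool tx_complete_may_free(struct tx_buf *b)
{
	if (!atomic_load_explicit(&b->hw_owned, memory_order_acquire))
		return false;
	return !b->ptp;
}
```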

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=4

* Re: [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot
  2026-05-20  8:12 ` [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot Daniel Machon
  2026-05-23  1:45   ` Jakub Kicinski
@ 2026-05-30  0:44   ` sashiko-bot
  1 sibling, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] Calling napi_disable() unconditionally during shutdown causes a NULL pointer dereference if the device tree contains no valid ports.
- [High] The lan966x_shutdown() callback disables hardware channels before stopping the network stack and NAPI, allowing concurrent traffic to re-activate the DMA channels.

Pre-existing issues:
- [Critical] The driver's `remove` method tears down the FDMA structures before unregistering the network devices, leading to a Use-After-Free upon concurrent transmission.
- [High] NAPI is deleted without being disabled in the probe error path if fdma_init fails.
--

commit 4567cfb4a8261113fe86f623b6d21c6d3a34dd6b
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: add shutdown callback to stop FDMA on reboot

This commit adds a shutdown callback to stop the FDMA engine when the host
reboots. It masks FDMA interrupts and disables the RX and TX channels.
This prevents the shared PCIe INTx from asserting before the driver
re-probes on the card.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c

[ ... ]

> static void lan966x_remove(struct platform_device *pdev)
> {
> 	struct lan966x *lan966x = platform_get_drvdata(pdev);
> 
> 	lan966x_taprio_deinit(lan966x);
> 	lan966x_vcap_deinit(lan966x);
> 	lan966x->ops->fdma_deinit(lan966x);
> 	lan966x_cleanup_ports(lan966x);

[Severity: Critical]
This is a pre-existing issue, but does the remove method tear down FDMA
structures before unregistering the network devices?

Because lan966x_cleanup_ports() has not yet unregistered the netdevs, they
remain active. Could a concurrent frame transmission call lan966x_fdma_xmit(),
access the freed DMA coherent memory, and result in a use-after-free?

> 
> 	cancel_delayed_work_sync(&lan966x->stats_work);
> 	destroy_workqueue(lan966x->stats_queue);

[ ... ]

> @@ -1313,9 +1313,27 @@ static void lan966x_remove(struct platform_device *pdev)
>  	debugfs_remove_recursive(lan966x->debugfs_root);
>  }
>  
> +static void lan966x_shutdown(struct platform_device *pdev)
> +{
> +	struct lan966x *lan966x = platform_get_drvdata(pdev);
> +
> +	if (!lan966x->fdma)
> +		return;
> +
> +	lan966x_fdma_rx_disable(&lan966x->rx);
> +	lan966x_fdma_tx_disable(&lan966x->tx);

[Severity: High]
Should the network stack and NAPI be disabled before turning off the
hardware channels?

If the networking stack invokes ndo_start_xmit concurrently, could the driver
observe that the channel is deactivated and write to FDMA_CH_ACTIVATE to turn
the DMA engine back on? This might defeat the purpose of the shutdown callback
and risk DMA memory corruption.

> +
> +	napi_synchronize(&lan966x->napi);
> +	napi_disable(&lan966x->napi);

[Severity: High]
What happens here if the device tree contains no valid ports?

If the ports list is empty, lan966x_fdma_netdev_init() is never called, leaving
lan966x->napi zero-initialized (meaning napi->dev is NULL). Does calling
napi_disable() here unconditionally attempt to acquire netdev_lock(n->dev) and
cause a NULL pointer dereference?
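To make the two questions above concrete, here is a simplified userspace model (not kernel code) of one possible shutdown ordering: quiesce the transmit path first, disable NAPI only if a port actually registered it, and only then disable the FDMA channels and mask interrupts. Every type and function below is hypothetical. `model_xmit()` stands in for the PCIe ndo_start_xmit path, and `napi.dev` plays the role of `lan966x->napi.dev`. In the driver, the first step would presumably be something like netif_tx_disable() on each port's net_device, but that mapping is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the shutdown path; not kernel APIs. */
struct model_napi {
	void *dev;		/* stays NULL if lan966x_fdma_netdev_init() never ran */
	bool enabled;
};

struct model_lan966x {
	bool fdma;
	bool tx_queues_stopped;	/* models stopping TX on every port netdev */
	struct model_napi napi;
	bool rx_chan_active;
	bool tx_chan_active;
	bool irq_enabled;
};

/* Models ndo_start_xmit(): the xmit path re-activates an idle TX channel. */
static bool model_xmit(struct model_lan966x *l)
{
	if (l->tx_queues_stopped)
		return false;	/* the stack no longer calls into the driver */
	l->tx_chan_active = true;
	return true;
}

static void model_napi_disable(struct model_napi *n)
{
	assert(n->dev != NULL);	/* the real napi_disable() uses n->dev */
	n->enabled = false;
}

/* Proposed order: stop producers, then NAPI, then the hardware. */
static void model_shutdown(struct model_lan966x *l)
{
	if (!l->fdma)
		return;

	l->tx_queues_stopped = true;
	if (l->napi.dev)	/* skip NAPI if no port ever registered it */
		model_napi_disable(&l->napi);

	l->rx_chan_active = false;
	l->tx_chan_active = false;
	l->irq_enabled = false;
}
```

With this ordering, a transmit attempted after model_shutdown() returns can no longer re-activate the TX channel, and a configuration with no valid ports skips the NAPI step instead of dereferencing a NULL device.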

> +
> +	lan_wr(0, lan966x, FDMA_INTR_ENA);
> +	lan_wr(0, lan966x, FDMA_INTR_DB_ENA);
> +}

[Severity: High]
This is a pre-existing issue, but I noticed a potential problem in
lan966x_probe() during error handling.

If lan966x->ops->fdma_init() fails after the global NAPI instance is
initialized and enabled by lan966x_port_init(), the error path jumps to
cleanup_ptp and bypasses cleanup_fdma.

This leads to lan966x_cleanup_ports() removing the NAPI instance via
netif_napi_del() without first calling napi_disable(). Does deleting an
enabled NAPI instance trigger a WARN_ON and potentially lead to a
use-after-free if a poll happens concurrently?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=8

* Re: [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support
  2026-05-20  8:12 ` [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
@ 2026-05-30  0:44   ` sashiko-bot
  0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Race condition leading to NULL pointer dereference, and improper error cleanup causing active NAPI deletion during probe.
--

commit 451c636b2ada8ed9e0353b85f81b6fd1d9682b69
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: add FDMA ops dispatch for PCIe support

This commit introduces an ops table for FDMA operations to support
different implementations for platform and PCIe instances. It plumbs
the various FDMA functions through this new ops structure.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c

[ ... ]

> @@ -1234,7 +1249,7 @@ static int lan966x_probe(struct platform_device *pdev)
>  	if (err)
>  		goto cleanup_fdb;
>  
> -	err = lan966x_fdma_init(lan966x);
> +	err = lan966x->ops->fdma_init(lan966x);
>  	if (err)
>  		goto cleanup_ptp;

[Severity: High]
This isn't a bug introduced by this patch, but looking at the probe
sequence, could this late initialization of FDMA lead to a NULL pointer
dereference?

Earlier in lan966x_probe(), the ports are initialized and
lan966x_fdma_netdev_init() is called, which in turn calls napi_enable().
The FDMA interrupt is also requested early. 

Since the FDMA queue backing memory isn't allocated until
lan966x->ops->fdma_init() is called here, what happens if an interrupt fires
before this point? It appears the interrupt handler could schedule the NAPI
instance, and the poll routine would execute fdma_has_frames(), which then
dereferences the unallocated fdma->dcbs pointer.

>  
> @@ -1247,7 +1262,7 @@ static int lan966x_probe(struct platform_device *pdev)
>  	return 0;
>  
>  cleanup_fdma:
> -	lan966x_fdma_deinit(lan966x);
> +	lan966x->ops->fdma_deinit(lan966x);
>  
>  cleanup_ptp:
>  	lan966x_ptp_deinit(lan966x);

[Severity: High]
This is a pre-existing issue, but does skipping fdma_deinit() here leave
the NAPI instance enabled during the rest of the error cleanup?

When jumping to cleanup_ptp, it skips the fdma_deinit() call which would
normally disable NAPI. The cleanup path eventually reaches
lan966x_fdma_netdev_deinit(), which calls netif_napi_del() on the
still-enabled NAPI structure. 

Does this create a use-after-free vulnerability in softnet_data.poll_list by
deleting an active NAPI structure?
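A minimal sketch of the unwind ordering this question (and the matching one on patch 08) points at, written as a userspace model rather than against the real lan966x_probe(): if the late FDMA init step fails, NAPI is disabled before the port cleanup deletes it. The `disable_napi` label and all `model_*` names are hypothetical; `model_napi_del()` asserts on an enabled instance only to flag the state the review is concerned about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the NAPI lifecycle during probe; not kernel APIs. */
struct model_napi {
	bool added;
	bool enabled;
};

/* Models netif_napi_del(): flags deletion of a still-enabled instance. */
static void model_napi_del(struct model_napi *n)
{
	assert(!n->enabled);
	n->added = false;
}

/* fdma_init_ok selects whether the late FDMA init step succeeds. */
static int model_probe(struct model_napi *n, bool fdma_init_ok)
{
	n->added = true;	/* models netif_napi_add() during port init */
	n->enabled = true;	/* models napi_enable() during port init */

	if (!fdma_init_ok)
		goto disable_napi;	/* rather than jumping straight to cleanup_ptp */

	return 0;

disable_napi:
	n->enabled = false;	/* models napi_disable() */
	/* cleanup_ptp and the remaining unwind labels would follow here */
	model_napi_del(n);	/* models netif_napi_del() in port cleanup */
	return -1;
}
```

The design point is simply that every label reachable after NAPI has been enabled must pass through a disable step before the delete.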

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=6

* Re: [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support
  2026-05-20  8:12 ` [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
@ 2026-05-30  0:44   ` sashiko-bot
  0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-30  0:44 UTC (permalink / raw)
  To: Daniel Machon; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Enabling NAPI after starting FDMA RX and waking TX queues introduces a race condition that can result in permanently masked hardware interrupts.
- [Low] Calling napi_synchronize() immediately before napi_disable() is an anti-pattern and inherently racy.

Pre-existing issues:
- [High] Missing tx_lock in lan966x_fdma_pci_xmit causes TX ring corruption when multiple ports transmit concurrently.
--

commit 57dba359ada8c7a7372a6ba3549f892f7d7fd051
Author: Daniel Machon <daniel.machon@microchip.com>

net: lan966x: add PCIe FDMA MTU change support

This commit adds MTU change support for the PCIe FDMA path. It reallocates
the RX and TX buffers with the new size when the MTU changes, and caps the
PCIe DCB ring at 256 to prevent the contiguous allocation from overflowing
MAX_PAGE_ORDER at jumbo MTUs.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
> index c27d9e76e7358..3ea6d22ee573b 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
[ ... ]
> @@ -342,7 +347,7 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
>  
>  	lan966x->tx.lan966x = lan966x;
>  	tx_fdma->channel_id = FDMA_INJ_CHANNEL;
> -	tx_fdma->n_dcbs = FDMA_DCB_MAX;
> +	tx_fdma->n_dcbs = FDMA_PCI_DCB_MAX;
>  	tx_fdma->n_dbs = FDMA_TX_DCB_MAX_DBS;
>  	tx_fdma->priv = lan966x;
>  	tx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);

[Severity: High]
This is a pre-existing issue, but does the shared FDMA TX ring require
synchronization in the transmission path?

The driver shares a single FDMA TX ring (lan966x->tx.fdma) across multiple
physical switch ports, each represented by a separate net_device.

Since the network stack synchronizes ndo_start_xmit on a per-queue
(per net_device) basis, multiple ports can execute lan966x_fdma_pci_xmit()
concurrently on different CPUs. Because lan966x_fdma_pci_xmit() accesses
and modifies the shared tx->fdma state without holding the
spin_lock(&lan966x->tx_lock) (unlike the non-PCI lan966x_fdma_xmit() path),
could concurrent transmissions lead to unsynchronized TX ring accesses and
ring corruption?
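The interleaving described above can be reproduced deterministically with a toy ring model. The names are hypothetical and this is not the driver's data structure: two ports that both read the producer index before either advances it end up filling the same DCB. Holding a lock such as `lan966x->tx_lock` across the read, fill and advance steps serializes them.

```c
#include <assert.h>

#define MODEL_RING_SIZE 8

/* Toy model of one TX ring shared by all ports (hypothetical). */
struct model_ring {
	int owner[MODEL_RING_SIZE];	/* port that filled each DCB, 0 = free */
	int next;			/* index of the next free DCB */
};

/* Step 1 of a transmit: read the producer index. */
static int model_reserve(const struct model_ring *r)
{
	return r->next;
}

/* Step 2: fill the reserved DCB and advance the producer index. */
static void model_commit(struct model_ring *r, int slot, int port)
{
	assert(slot >= 0 && slot < MODEL_RING_SIZE);
	r->owner[slot] = port;
	r->next = (slot + 1) % MODEL_RING_SIZE;
}

/*
 * In the driver this body would sit between spin_lock(&lan966x->tx_lock)
 * and spin_unlock(), so no other port can run model_reserve() until
 * model_commit() has finished. The model is single-threaded, so the lock
 * itself is not shown.
 */
static void model_xmit_serialized(struct model_ring *r, int port)
{
	int slot = model_reserve(r);

	model_commit(r, slot, port);
}
```

Calling model_reserve() for port 1, then for port 2, and only then committing both hands slot 0 to both ports: port 1's frame is overwritten and the ring advances by one slot for two frames. Two calls to model_xmit_serialized() fill slots 0 and 1 as expected.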

[ ... ]
> @@ -365,9 +370,155 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
>  	return 0;
>  }
>  
[ ... ]
> +static int lan966x_fdma_pci_reload(struct lan966x *lan966x, int new_mtu)
> +{
> +	struct fdma tx_fdma_old = lan966x->tx.fdma;
> +	struct fdma rx_fdma_old = lan966x->rx.fdma;
> +	u32 old_mtu = lan966x->rx.max_mtu;
> +	int err;
> +
> +	napi_synchronize(&lan966x->napi);
> +	napi_disable(&lan966x->napi);

[Severity: Low]
Is this call to napi_synchronize() necessary before napi_disable()?

Since napi_disable() safely sets NAPI_STATE_DISABLE and internally blocks
until the NAPI_STATE_SCHED flag clears, adding a manual napi_synchronize()
right before it seems redundant. Does this leave a tiny race window where NAPI
could be rescheduled before napi_disable() takes effect?

> +	lan966x_fdma_pci_stop_netdev(lan966x);
> +	lan966x_fdma_rx_disable(&lan966x->rx);
> +	lan966x_fdma_tx_disable(&lan966x->tx);
[ ... ]
> +	/* Free and unmap old memory. */
> +	fdma_free_coherent_and_unmap(lan966x->dev, &rx_fdma_old);
> +	fdma_free_coherent_and_unmap(lan966x->dev, &tx_fdma_old);
> +
> +	/* Keep this order: rx_start, wakeup_netdev, napi_enable. */
> +	lan966x_fdma_rx_start(&lan966x->rx);
> +	lan966x_fdma_pci_wakeup_netdev(lan966x);
> +	napi_enable(&lan966x->napi);

[Severity: High]
Does this order introduce a race condition that can result in permanently
masked hardware interrupts?

If a packet is received right after lan966x_fdma_rx_start() enables hardware
interrupts, the device raises an interrupt triggering the handler. The ISR
would then mask the hardware interrupt and call napi_schedule().

Because napi_enable() has not yet been called, NAPI_STATE_SCHED is still set
from the earlier napi_disable(), which causes napi_schedule() to fail to
queue the NAPI poll. Subsequently, napi_enable() clears the state flags but
does not schedule a poll.

Should napi_enable() be called before lan966x_fdma_rx_start() to prevent
hardware interrupts from remaining permanently masked?
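A small state-machine model of the sequence described above, reduced to the NAPI bits involved. It is a userspace sketch with hypothetical names. It assumes that napi_disable() leaves NAPI_STATE_SCHED set, that napi_schedule() drops the request while NAPI is disabled, and that only the NAPI poll unmasks the FDMA interrupt again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the NAPI state bits relevant here. */
struct model_napi {
	bool sched;		/* NAPI_STATE_SCHED */
	bool disabled;		/* NAPI_STATE_DISABLE */
	bool poll_queued;	/* a poll is pending on the poll list */
};

struct model_dev {
	struct model_napi napi;
	bool irq_masked;	/* set by the ISR, cleared only by the poll */
	bool rx_running;	/* set by lan966x_fdma_rx_start() */
};

static void model_napi_disable(struct model_napi *n)
{
	n->disabled = true;
	n->sched = true;	/* napi_disable() leaves SCHED set */
}

static void model_napi_enable(struct model_napi *n)
{
	n->disabled = false;
	n->sched = false;	/* clears the bits but does not schedule a poll */
}

static void model_napi_schedule(struct model_napi *n)
{
	if (n->disabled || n->sched)
		return;		/* scheduling fails: the request is dropped */
	n->sched = true;
	n->poll_queued = true;
}

/* Models the ISR: mask the interrupt and hand off to NAPI. */
static void model_irq(struct model_dev *d)
{
	if (!d->rx_running)
		return;
	d->irq_masked = true;
	model_napi_schedule(&d->napi);
}

/* True if the interrupt is masked and no poll will ever unmask it. */
static bool model_stuck(const struct model_dev *d)
{
	return d->irq_masked && !d->napi.poll_queued;
}
```

Running disable, RX start, an interrupt, then enable (the order in the patch) leaves model_stuck() true. Calling model_napi_enable() before setting `rx_running` (the order the question suggests) leaves it false.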

> +
> +	return err;
> +restore:
> +
> +	/* No new buffers are allocated at this point. Use the old buffers,
> +	 * but reset them before starting the FDMA again.
> +	 */
> +
> +	memcpy(&lan966x->tx.fdma, &tx_fdma_old, sizeof(struct fdma));
> +	memcpy(&lan966x->rx.fdma, &rx_fdma_old, sizeof(struct fdma));
> +
> +	lan966x->rx.max_mtu = old_mtu;
> +
> +	lan966x_fdma_pci_reset_mem(lan966x);
> +
> +	/* Keep this order: rx_start, wakeup_netdev, napi_enable. */
> +	lan966x_fdma_rx_start(&lan966x->rx);
> +	lan966x_fdma_pci_wakeup_netdev(lan966x);
> +	napi_enable(&lan966x->napi);

[Severity: High]
Does this restore path have the same race condition with hardware interrupts
as mentioned above?

> +
> +	return err;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520-lan966x-pci-fdma-v5-0-ca56197ae05b@microchip.com?part=10

end of thread

Thread overview: 26+ messages
2026-05-20  8:12 [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
2026-05-20  8:12 ` [PATCH net-next v5 01/13] MAINTAINERS: add FDMA library to Sparx5 SoC entry Daniel Machon
2026-05-20  8:12 ` [PATCH net-next v5 02/13] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
2026-05-20  8:12 ` [PATCH net-next v5 03/13] net: microchip: fdma: add PCIe ATU support Daniel Machon
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 04/13] net: lan966x: add FDMA LLP register write helper Daniel Machon
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 05/13] net: lan966x: export FDMA helpers for reuse Daniel Machon
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 06/13] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 07/13] net: lan966x: clear FDMA interrupt stickies after switch reset Daniel Machon
2026-05-20  8:12 ` [PATCH net-next v5 08/13] net: lan966x: add shutdown callback to stop FDMA on reboot Daniel Machon
2026-05-23  1:45   ` Jakub Kicinski
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 09/13] net: lan966x: add PCIe FDMA support Daniel Machon
2026-05-23  1:56   ` Jakub Kicinski
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 10/13] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
2026-05-30  0:44   ` sashiko-bot
2026-05-20  8:12 ` [PATCH net-next v5 11/13] net: lan966x: add PCIe FDMA XDP support Daniel Machon
2026-05-23  2:01   ` Jakub Kicinski
2026-05-20  8:12 ` [PATCH net-next v5 12/13] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
2026-05-23  2:01   ` Jakub Kicinski
2026-05-20  8:12 ` [PATCH net-next v5 13/13] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
2026-05-21 14:12 ` [PATCH net-next v5 00/13] net: lan966x: add support for PCIe FDMA Daniel Machon
