public inbox for netdev@vger.kernel.org
* [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
@ 2026-03-20 15:00 Daniel Machon
  2026-03-20 15:00 ` [PATCH net-next 01/10] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

When lan966x operates as a PCIe endpoint, the driver currently uses
register-based I/O for frame injection and extraction. This approach is
functional but slow, topping out at around 33 Mbps on an Intel x86 host
with a lan966x PCIe card.

This series adds FDMA (Frame DMA) support for the PCIe path. When
operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
directly access host memory, so DMA buffers are allocated as contiguous
coherent memory and mapped through the PCIe Address Translation Unit
(ATU). The ATU provides outbound windows that translate internal FDMA
addresses to PCIe bus addresses, allowing the FDMA engine to read and
write host memory. Because the ATU requires contiguous address regions,
page_pool and normal per-page DMA mappings cannot be used. Instead,
frames are transferred using memcpy between the ATU-mapped buffers and
the network stack. With this, throughput increases from ~33 Mbps to ~620
Mbps at the default MTU.
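
The window arithmetic described above can be sketched in a few lines.
This is an illustrative model only; the struct and function names are
hypothetical, not the actual fdma_pci API added by this series:

```c
#include <stdint.h>

/* Illustrative model of one outbound (OB) ATU window. Names here are
 * hypothetical; the real implementation lives in fdma_pci.c. */
struct ob_window {
	uint64_t base;   /* window start in the endpoint's OB space */
	uint64_t target; /* host DMA address that 'base' maps to */
};

/* The driver places OB-space addresses in FDMA descriptors; compute
 * one from a host DMA address. */
static uint64_t host_to_ob(const struct ob_window *w, uint64_t dma_addr)
{
	return w->base + (dma_addr - w->target);
}

/* The ATU performs the inverse rewrite on the PCIe bus. */
static uint64_t ob_to_host(const struct ob_window *w, uint64_t ob_addr)
{
	return w->target + (ob_addr - w->base);
}
```

So a buffer at host DMA address target + 0x1000 appears to the FDMA
engine at base + 0x1000, which is the same calculation
fdma_pci_atu_translate_addr() performs in patch 2.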

Patches 1-2 prepare the shared FDMA library: patch 1 renames the
contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
management and coherent DMA allocation with ATU mapping.

Patches 3-5 refactor the lan966x FDMA code to support both platform and
PCIe paths: extracting the LLP register write into a helper, exporting
shared functions, and introducing an ops dispatch table selected at
probe time.

Patch 6 adds the core PCIe FDMA implementation with RX/TX using
contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
change and XDP support respectively.

Patches 9-10 update the lan966x PCI device tree overlay to extend the
cpu register mapping to cover the ATU register space and add the FDMA
interrupt.

To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Horatiu Vultur <horatiu.vultur@microchip.com>
To: Steen Hegelund <steen.hegelund@microchip.com>
To: UNGLinuxDriver@microchip.com
To: Alexei Starovoitov <ast@kernel.org>
To: Daniel Borkmann <daniel@iogearbox.net>
To: Jesper Dangaard Brouer <hawk@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
To: Herve Codina <herve.codina@bootlin.com>
To: Arnd Bergmann <arnd@arndb.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
Daniel Machon (10):
      net: microchip: fdma: rename contiguous dataptr helpers
      net: microchip: fdma: add PCIe ATU support
      net: lan966x: add FDMA LLP register write helper
      net: lan966x: export FDMA helpers for reuse
      net: lan966x: add FDMA ops dispatch for PCIe support
      net: lan966x: add PCIe FDMA support
      net: lan966x: add PCIe FDMA MTU change support
      net: lan966x: add PCIe FDMA XDP support
      misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
      misc: lan966x-pci: dts: add fdma interrupt to overlay

 drivers/misc/lan966x_pci.dtso                      |   5 +-
 drivers/net/ethernet/microchip/fdma/Makefile       |   4 +
 drivers/net/ethernet/microchip/fdma/fdma_api.c     |  33 ++
 drivers/net/ethernet/microchip/fdma/fdma_api.h     |  25 +-
 drivers/net/ethernet/microchip/fdma/fdma_pci.c     | 177 +++++++
 drivers/net/ethernet/microchip/fdma/fdma_pci.h     |  41 ++
 drivers/net/ethernet/microchip/lan966x/Makefile    |   4 +
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  |  51 +-
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 551 +++++++++++++++++++++
 .../net/ethernet/microchip/lan966x/lan966x_main.c  |  47 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  46 ++
 .../net/ethernet/microchip/lan966x/lan966x_regs.h  |   1 +
 .../net/ethernet/microchip/lan966x/lan966x_xdp.c   |   6 +
 13 files changed, 949 insertions(+), 42 deletions(-)
---
base-commit: 9ac76f3d0bb2940db3a9684d596b9c8f301ef315
change-id: 20260313-lan966x-pci-fdma-94ed485d23fa

Best regards,
-- 
Daniel Machon <daniel.machon@microchip.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 01/10] net: microchip: fdma: rename contiguous dataptr helpers
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
@ 2026-03-20 15:00 ` Daniel Machon
  2026-03-20 15:00 ` [PATCH net-next 02/10] net: microchip: fdma: add PCIe ATU support Daniel Machon
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

When the FDMA library was introduced [1], two helpers were added to get
the DMA and virtual addresses of a DCB in contiguous memory. These
helpers have had no callers until this series, and the naming I
initially chose is confusing and inconsistent.

Rename fdma_dataptr_get_contiguous() and
fdma_dataptr_virt_get_contiguous() to fdma_dataptr_dma_addr_contiguous()
and fdma_dataptr_virt_addr_contiguous(). This makes the pair symmetric
and clarifies what type of address each returns.

[1]: commit 30e48a75df9c ("net: microchip: add FDMA library")

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/fdma/fdma_api.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.h b/drivers/net/ethernet/microchip/fdma/fdma_api.h
index d91affe8bd98..94f1a6596097 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.h
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.h
@@ -197,8 +197,9 @@ static inline int fdma_nextptr_cb(struct fdma *fdma, int dcb_idx, u64 *nextptr)
  * if the dataptr addresses and DCB's are in contiguous memory and the driver
  * supports XDP.
  */
-static inline u64 fdma_dataptr_get_contiguous(struct fdma *fdma, int dcb_idx,
-					      int db_idx)
+static inline u64 fdma_dataptr_dma_addr_contiguous(struct fdma *fdma,
+						   int dcb_idx,
+						   int db_idx)
 {
 	return fdma->dma + (sizeof(struct fdma_dcb) * fdma->n_dcbs) +
 	       (dcb_idx * fdma->n_dbs + db_idx) * fdma->db_size +
@@ -209,8 +210,8 @@ static inline u64 fdma_dataptr_get_contiguous(struct fdma *fdma, int dcb_idx,
  * applicable if the dataptr addresses and DCB's are in contiguous memory and
  * the driver supports XDP.
  */
-static inline void *fdma_dataptr_virt_get_contiguous(struct fdma *fdma,
-						     int dcb_idx, int db_idx)
+static inline void *fdma_dataptr_virt_addr_contiguous(struct fdma *fdma,
+						      int dcb_idx, int db_idx)
 {
 	return (u8 *)fdma->dcbs + (sizeof(struct fdma_dcb) * fdma->n_dcbs) +
 	       (dcb_idx * fdma->n_dbs + db_idx) * fdma->db_size +

-- 
2.34.1



* [PATCH net-next 02/10] net: microchip: fdma: add PCIe ATU support
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
  2026-03-20 15:00 ` [PATCH net-next 01/10] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
@ 2026-03-20 15:00 ` Daniel Machon
  2026-03-20 15:00 ` [PATCH net-next 03/10] net: lan966x: add FDMA LLP register write helper Daniel Machon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

When lan966x or lan969x operates as a PCIe endpoint, the internal FDMA
engine cannot directly access host memory. Instead, DMA addresses must
be translated through the PCIe Address Translation Unit (ATU). The ATU
provides outbound windows that map internal addresses to PCIe bus
addresses.

The ATU outbound address space (0x10000000-0x1fffffff) is divided into
six equally-sized regions (~42MB each). When FDMA buffers are allocated,
a free ATU region is claimed and programmed with the DMA target address.
The FDMA engine then uses the region's base address in its descriptors,
and the ATU translates these to the actual DMA addresses on the PCIe bus.

Add the required functions and helpers that combine the DMA allocation
with the ATU region mapping, effectively adding support for PCIe FDMA.

This implementation will also be used by lan969x when PCIe FDMA support
is added for that platform in the future.
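
As an illustrative cross-check of the layout above (standalone
arithmetic, not driver code), the ~42MB region size and the base
addresses in patch 2's table follow directly from the constants:

```c
#include <stdint.h>

/* Constants from the commit message: OB space 0x10000000-0x1fffffff,
 * six regions, 64KB alignment. */
#define OB_START  0x10000000u
#define OB_END    0x1fffffffu
#define N_REGIONS 6u
#define ALIGN_64K 0x10000u

/* Equivalent of round_down((OB_END - OB_START) / N_REGIONS, 64KB). */
static uint32_t ob_region_size(void)
{
	uint32_t sz = (OB_END - OB_START) / N_REGIONS;

	return sz - (sz % ALIGN_64K);
}

static uint32_t ob_region_base(unsigned int idx)
{
	return OB_START + idx * ob_region_size();
}
```

This yields a region size of 0x2aa0000 (~42MB), region 1 starting at
0x12aa0000 and region 5 at 0x1d520000, matching the table in the
fdma_pci.c comment.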

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/fdma/Makefile   |   4 +
 drivers/net/ethernet/microchip/fdma/fdma_api.c |  33 +++++
 drivers/net/ethernet/microchip/fdma/fdma_api.h |  16 +++
 drivers/net/ethernet/microchip/fdma/fdma_pci.c | 177 +++++++++++++++++++++++++
 drivers/net/ethernet/microchip/fdma/fdma_pci.h |  41 ++++++
 5 files changed, 271 insertions(+)

diff --git a/drivers/net/ethernet/microchip/fdma/Makefile b/drivers/net/ethernet/microchip/fdma/Makefile
index cc9a736be357..eed4df6f7158 100644
--- a/drivers/net/ethernet/microchip/fdma/Makefile
+++ b/drivers/net/ethernet/microchip/fdma/Makefile
@@ -5,3 +5,7 @@
 
 obj-$(CONFIG_FDMA) += fdma.o
 fdma-y += fdma_api.o
+
+ifdef CONFIG_MCHP_LAN966X_PCI
+fdma-y += fdma_pci.o
+endif
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.c b/drivers/net/ethernet/microchip/fdma/fdma_api.c
index e78c3590da9e..072d36773835 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.c
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.c
@@ -127,6 +127,39 @@ void fdma_free_phys(struct fdma *fdma)
 }
 EXPORT_SYMBOL_GPL(fdma_free_phys);
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+/* Allocate coherent DMA memory and map it in the ATU. */
+int fdma_alloc_coherent_and_map(struct device *dev, struct fdma *fdma,
+				struct fdma_pci_atu *atu)
+{
+	int err;
+
+	err = fdma_alloc_coherent(dev, fdma);
+	if (err)
+		return err;
+
+	fdma->atu_region = fdma_pci_atu_region_map(atu,
+						   fdma->dma,
+						   fdma->size);
+
+	if (IS_ERR(fdma->atu_region)) {
+		fdma_free_coherent(dev, fdma);
+		return PTR_ERR(fdma->atu_region);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fdma_alloc_coherent_and_map);
+
+/* Free coherent DMA memory and unmap the memory in the ATU. */
+void fdma_free_coherent_and_unmap(struct device *dev, struct fdma *fdma)
+{
+	fdma_pci_atu_region_unmap(fdma->atu_region);
+	fdma_free_coherent(dev, fdma);
+}
+EXPORT_SYMBOL_GPL(fdma_free_coherent_and_unmap);
+#endif
+
 /* Get the size of the FDMA memory */
 u32 fdma_get_size(struct fdma *fdma)
 {
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_api.h b/drivers/net/ethernet/microchip/fdma/fdma_api.h
index 94f1a6596097..0e0f8af7463f 100644
--- a/drivers/net/ethernet/microchip/fdma/fdma_api.h
+++ b/drivers/net/ethernet/microchip/fdma/fdma_api.h
@@ -7,6 +7,10 @@
 #include <linux/etherdevice.h>
 #include <linux/types.h>
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+#include "fdma_pci.h"
+#endif
+
 /* This provides a common set of functions and data structures for interacting
  * with the Frame DMA engine on multiple Microchip switchcores.
  *
@@ -109,6 +113,11 @@ struct fdma {
 	u32 channel_id;
 
 	struct fdma_ops ops;
+
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	/* PCI ATU region for this FDMA instance. */
+	struct fdma_pci_atu_region *atu_region;
+#endif
 };
 
 /* Advance the DCB index and wrap if required. */
@@ -234,9 +243,16 @@ int __fdma_dcb_add(struct fdma *fdma, int dcb_idx, u64 info, u64 status,
 
 int fdma_alloc_coherent(struct device *dev, struct fdma *fdma);
 int fdma_alloc_phys(struct fdma *fdma);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+int fdma_alloc_coherent_and_map(struct device *dev, struct fdma *fdma,
+				struct fdma_pci_atu *atu);
+#endif
 
 void fdma_free_coherent(struct device *dev, struct fdma *fdma);
 void fdma_free_phys(struct fdma *fdma);
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+void fdma_free_coherent_and_unmap(struct device *dev, struct fdma *fdma);
+#endif
 
 u32 fdma_get_size(struct fdma *fdma);
 u32 fdma_get_size_contiguous(struct fdma *fdma);
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.c b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
new file mode 100644
index 000000000000..d9d3b61a73ef
--- /dev/null
+++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/bitfield.h>
+#include <linux/bug.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+
+#include "fdma_pci.h"
+
+/* When the switch operates as a PCIe endpoint, the FDMA engine needs to
+ * DMA to/from host memory. The FDMA writes to addresses within the endpoint's
+ * internal Outbound (OB) address space, and the PCIe ATU translates these to
+ * DMA addresses on the PCIe bus, targeting host memory.
+ *
+ * The ATU supports up to six outbound regions. This implementation divides
+ * the OB address space into six equally sized chunks.
+ *
+ * +-------------+------------+------------+-----+------------+
+ * | Index       | Region 0   | Region 1   | ... | Region 5   |
+ * +-------------+------------+------------+-----+------------+
+ * | Base addr   | 0x10000000 | 0x12aa0000 | ... | 0x1d520000 |
+ * | Limit addr  | 0x12a9ffff | 0x1553ffff | ... | 0x1ffbffff |
+ * | Target addr | host dma   | host dma   | ... | host dma   |
+ * +-------------+------------+------------+-----+------------+
+ *
+ * Base addr is the start address of the region within the OB address space.
+ * Limit addr is the end address of the region within the OB address space.
+ * Target addr is the host DMA address that the base addr translates to.
+ */
+
+#define FDMA_PCI_ATU_REGION_ALIGN    BIT(16) /* 64KB */
+#define FDMA_PCI_ATU_OB_START        0x10000000
+#define FDMA_PCI_ATU_OB_END          0x1fffffff
+
+#define FDMA_PCI_ATU_ADDR            0x300000
+#define FDMA_PCI_ATU_IDX_SIZE        0x200
+#define FDMA_PCI_ATU_ENA_REG         0x4
+#define FDMA_PCI_ATU_ENA_BIT         BIT(31)
+#define FDMA_PCI_ATU_LWR_BASE_ADDR   0x8
+#define FDMA_PCI_ATU_UPP_BASE_ADDR   0xc
+#define FDMA_PCI_ATU_LIMIT_ADDR      0x10
+#define FDMA_PCI_ATU_LWR_TARGET_ADDR 0x14
+#define FDMA_PCI_ATU_UPP_TARGET_ADDR 0x18
+
+static u32 fdma_pci_atu_region_size(void)
+{
+	return round_down((FDMA_PCI_ATU_OB_END - FDMA_PCI_ATU_OB_START) /
+			  FDMA_PCI_ATU_REGION_MAX, FDMA_PCI_ATU_REGION_ALIGN);
+}
+
+static void __iomem *fdma_pci_atu_addr_get(void __iomem *addr, int offset,
+					   int idx)
+{
+	return addr + FDMA_PCI_ATU_ADDR + FDMA_PCI_ATU_IDX_SIZE * idx + offset;
+}
+
+static void fdma_pci_atu_region_enable(struct fdma_pci_atu_region *region)
+{
+	writel(FDMA_PCI_ATU_ENA_BIT,
+	       fdma_pci_atu_addr_get(region->atu->addr, FDMA_PCI_ATU_ENA_REG,
+				     region->idx));
+}
+
+static void fdma_pci_atu_region_disable(struct fdma_pci_atu_region *region)
+{
+	writel(0, fdma_pci_atu_addr_get(region->atu->addr, FDMA_PCI_ATU_ENA_REG,
+					region->idx));
+}
+
+/* Configure the address translation in the ATU. */
+static void
+fdma_pci_atu_configure_translation(struct fdma_pci_atu_region *region)
+{
+	struct fdma_pci_atu *atu = region->atu;
+	int idx = region->idx;
+
+	writel(lower_32_bits(region->base_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_LWR_BASE_ADDR, idx));
+
+	writel(upper_32_bits(region->base_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_UPP_BASE_ADDR, idx));
+
+	/* Upper limit register only needed with REGION_SIZE > 4GB. */
+	writel(region->limit_addr,
+	       fdma_pci_atu_addr_get(atu->addr, FDMA_PCI_ATU_LIMIT_ADDR, idx));
+
+	writel(lower_32_bits(region->target_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_LWR_TARGET_ADDR, idx));
+
+	writel(upper_32_bits(region->target_addr),
+	       fdma_pci_atu_addr_get(atu->addr,
+				     FDMA_PCI_ATU_UPP_TARGET_ADDR, idx));
+}
+
+/* Find an unused ATU region (target_addr == 0). */
+static struct fdma_pci_atu_region *
+fdma_pci_atu_region_get_free(struct fdma_pci_atu *atu)
+{
+	struct fdma_pci_atu_region *regions = atu->regions;
+
+	for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
+		if (regions[i].target_addr)
+			continue;
+
+		return &regions[i];
+	}
+
+	return ERR_PTR(-ENOMEM);
+}
+
+/* Unmap an ATU region, clearing its translation and disabling it. */
+void fdma_pci_atu_region_unmap(struct fdma_pci_atu_region *region)
+{
+	region->target_addr = 0;
+
+	fdma_pci_atu_configure_translation(region);
+	fdma_pci_atu_region_disable(region);
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_region_unmap);
+
+/* Map a host DMA address into a free outbound region. */
+struct fdma_pci_atu_region *
+fdma_pci_atu_region_map(struct fdma_pci_atu *atu, u64 target_addr, int size)
+{
+	struct fdma_pci_atu_region *region;
+
+	if (!atu)
+		return ERR_PTR(-EINVAL);
+
+	if (size > fdma_pci_atu_region_size())
+		return ERR_PTR(-E2BIG);
+
+	region = fdma_pci_atu_region_get_free(atu);
+	if (IS_ERR(region))
+		return region;
+
+	region->target_addr = target_addr;
+
+	/* Enable first, according to datasheet section 3.24.7.4.1 */
+	fdma_pci_atu_region_enable(region);
+	fdma_pci_atu_configure_translation(region);
+
+	return region;
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_region_map);
+
+/* Translate a host DMA address to the corresponding OB address. */
+u64 fdma_pci_atu_translate_addr(struct fdma_pci_atu_region *region, u64 addr)
+{
+	return region->base_addr + (addr - region->target_addr);
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_translate_addr);
+
+/* Initialize ATU, dividing the OB space into equally sized regions. */
+void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr)
+{
+	struct fdma_pci_atu_region *regions = atu->regions;
+	u32 region_size = fdma_pci_atu_region_size();
+
+	atu->addr = addr;
+
+	for (int i = 0; i < FDMA_PCI_ATU_REGION_MAX; i++) {
+		regions[i].base_addr =
+			FDMA_PCI_ATU_OB_START + (i * region_size);
+		regions[i].limit_addr =
+			regions[i].base_addr + region_size - 1;
+		regions[i].idx = i;
+		regions[i].atu = atu;
+	}
+}
+EXPORT_SYMBOL_GPL(fdma_pci_atu_init);
diff --git a/drivers/net/ethernet/microchip/fdma/fdma_pci.h b/drivers/net/ethernet/microchip/fdma/fdma_pci.h
new file mode 100644
index 000000000000..359950ccabac
--- /dev/null
+++ b/drivers/net/ethernet/microchip/fdma/fdma_pci.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _FDMA_PCI_H_
+#define _FDMA_PCI_H_
+
+#include <linux/types.h>
+
+#define FDMA_PCI_ATU_REGION_MAX 6
+#define FDMA_PCI_DB_ALIGN 128
+#define FDMA_PCI_DB_SIZE(mtu) ALIGN(mtu, FDMA_PCI_DB_ALIGN)
+
+struct fdma_pci_atu;
+
+struct fdma_pci_atu_region {
+	struct fdma_pci_atu *atu;
+	u64 base_addr; /* Base addr of the OB window */
+	u64 limit_addr; /* Limit addr of the OB window */
+	u64 target_addr; /* Host DMA address this region maps to */
+	int idx;
+};
+
+struct fdma_pci_atu {
+	void __iomem *addr;
+	struct fdma_pci_atu_region regions[FDMA_PCI_ATU_REGION_MAX];
+};
+
+/* Initialize ATU, dividing OB space into regions. */
+void fdma_pci_atu_init(struct fdma_pci_atu *atu, void __iomem *addr);
+
+/* Unmap an ATU region, clearing its translation and disabling it. */
+void fdma_pci_atu_region_unmap(struct fdma_pci_atu_region *region);
+
+/* Map a host DMA address into a free ATU region. */
+struct fdma_pci_atu_region *fdma_pci_atu_region_map(struct fdma_pci_atu *atu,
+						    u64 target_addr,
+						    int size);
+
+/* Translate a host DMA address to the OB address space. */
+u64 fdma_pci_atu_translate_addr(struct fdma_pci_atu_region *region, u64 addr);
+
+#endif

-- 
2.34.1



* [PATCH net-next 03/10] net: lan966x: add FDMA LLP register write helper
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
  2026-03-20 15:00 ` [PATCH net-next 01/10] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
  2026-03-20 15:00 ` [PATCH net-next 02/10] net: microchip: fdma: add PCIe ATU support Daniel Machon
@ 2026-03-20 15:00 ` Daniel Machon
  2026-03-20 15:01 ` [PATCH net-next 04/10] net: lan966x: export FDMA helpers for reuse Daniel Machon
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

The FDMA Link List Pointer (LLP) register points to the first DCB in the
chain and must be written before the channel is activated. This tells
the FDMA engine where to begin DMA transfers.

Move the LLP register writes from the channel start/activate functions
into the allocation functions and introduce a shared
lan966x_fdma_llp_configure() helper. This is needed because the upcoming
PCIe FDMA path writes ATU-translated addresses to the LLP registers
instead of DMA addresses. Keeping the writes in the shared
start/activate path would overwrite these translated addresses.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  | 29 ++++++++++------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 7b6369e43451..5c5ae8b36058 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -107,6 +107,13 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
 	return PTR_ERR_OR_ZERO(rx->page_pool);
 }
 
+static void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+				       u8 channel_id)
+{
+	lan_wr(lower_32_bits(addr), lan966x, FDMA_DCB_LLP(channel_id));
+	lan_wr(upper_32_bits(addr), lan966x, FDMA_DCB_LLP1(channel_id));
+}
+
 static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
@@ -123,6 +130,9 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
 	fdma_dcbs_init(fdma, FDMA_DCB_INFO_DATAL(fdma->db_size),
 		       FDMA_DCB_STATUS_INTR);
 
+	lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
+				   fdma->channel_id);
+
 	return 0;
 }
 
@@ -132,14 +142,6 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 	struct fdma *fdma = &rx->fdma;
 	u32 mask;
 
-	/* When activating a channel, first is required to write the first DCB
-	 * address and then to activate it
-	 */
-	lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP(fdma->channel_id));
-	lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP1(fdma->channel_id));
-
 	lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
 	       FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
 	       FDMA_CH_CFG_CH_INJ_PORT_SET(0) |
@@ -210,6 +212,9 @@ static int lan966x_fdma_tx_alloc(struct lan966x_tx *tx)
 
 	fdma_dcbs_init(fdma, 0, 0);
 
+	lan966x_fdma_llp_configure(lan966x, (u64)fdma->dma,
+				   fdma->channel_id);
+
 	return 0;
 
 out:
@@ -231,14 +236,6 @@ static void lan966x_fdma_tx_activate(struct lan966x_tx *tx)
 	struct fdma *fdma = &tx->fdma;
 	u32 mask;
 
-	/* When activating a channel, first is required to write the first DCB
-	 * address and then to activate it
-	 */
-	lan_wr(lower_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP(fdma->channel_id));
-	lan_wr(upper_32_bits((u64)fdma->dma), lan966x,
-	       FDMA_DCB_LLP1(fdma->channel_id));
-
 	lan_wr(FDMA_CH_CFG_CH_DCB_DB_CNT_SET(fdma->n_dbs) |
 	       FDMA_CH_CFG_CH_INTR_DB_EOF_ONLY_SET(1) |
 	       FDMA_CH_CFG_CH_INJ_PORT_SET(0) |

-- 
2.34.1



* [PATCH net-next 04/10] net: lan966x: export FDMA helpers for reuse
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (2 preceding siblings ...)
  2026-03-20 15:00 ` [PATCH net-next 03/10] net: lan966x: add FDMA LLP register write helper Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-20 15:01 ` [PATCH net-next 05/10] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

Make shared FDMA helpers non-static so they can be reused by the PCIe
FDMA implementation.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  | 24 +++++++++++-----------
 .../net/ethernet/microchip/lan966x/lan966x_main.h  | 12 +++++++++++
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 5c5ae8b36058..870f7d00d325 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -107,8 +107,8 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
 	return PTR_ERR_OR_ZERO(rx->page_pool);
 }
 
-static void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
-				       u8 channel_id)
+void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+				u8 channel_id)
 {
 	lan_wr(lower_32_bits(addr), lan966x, FDMA_DCB_LLP(channel_id));
 	lan_wr(upper_32_bits(addr), lan966x, FDMA_DCB_LLP1(channel_id));
@@ -136,7 +136,7 @@ static int lan966x_fdma_rx_alloc(struct lan966x_rx *rx)
 	return 0;
 }
 
-static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
+void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
 	struct fdma *fdma = &rx->fdma;
@@ -167,7 +167,7 @@ static void lan966x_fdma_rx_start(struct lan966x_rx *rx)
 		lan966x, FDMA_CH_ACTIVATE);
 }
 
-static void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
+void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
 	struct fdma *fdma = &rx->fdma;
@@ -187,7 +187,7 @@ static void lan966x_fdma_rx_disable(struct lan966x_rx *rx)
 		lan966x, FDMA_CH_DB_DISCARD);
 }
 
-static void lan966x_fdma_rx_reload(struct lan966x_rx *rx)
+void lan966x_fdma_rx_reload(struct lan966x_rx *rx)
 {
 	struct lan966x *lan966x = rx->lan966x;
 
@@ -261,7 +261,7 @@ static void lan966x_fdma_tx_activate(struct lan966x_tx *tx)
 		lan966x, FDMA_CH_ACTIVATE);
 }
 
-static void lan966x_fdma_tx_disable(struct lan966x_tx *tx)
+void lan966x_fdma_tx_disable(struct lan966x_tx *tx)
 {
 	struct lan966x *lan966x = tx->lan966x;
 	struct fdma *fdma = &tx->fdma;
@@ -293,7 +293,7 @@ static void lan966x_fdma_tx_reload(struct lan966x_tx *tx)
 		lan966x, FDMA_CH_RELOAD);
 }
 
-static void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
+void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
 {
 	struct lan966x_port *port;
 	int i;
@@ -308,7 +308,7 @@ static void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x)
 	}
 }
 
-static void lan966x_fdma_stop_netdev(struct lan966x *lan966x)
+void lan966x_fdma_stop_netdev(struct lan966x *lan966x)
 {
 	struct lan966x_port *port;
 	int i;
@@ -467,7 +467,7 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx,
 	return NULL;
 }
 
-static int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
+int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight)
 {
 	struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
 	struct lan966x_rx *rx = &lan966x->rx;
@@ -580,7 +580,7 @@ static int lan966x_fdma_get_next_dcb(struct lan966x_tx *tx)
 	return -1;
 }
 
-static void lan966x_fdma_tx_start(struct lan966x_tx *tx)
+void lan966x_fdma_tx_start(struct lan966x_tx *tx)
 {
 	struct lan966x *lan966x = tx->lan966x;
 
@@ -798,7 +798,7 @@ static int lan966x_fdma_get_max_mtu(struct lan966x *lan966x)
 	return max_mtu;
 }
 
-static int lan966x_qsys_sw_status(struct lan966x *lan966x)
+int lan966x_qsys_sw_status(struct lan966x *lan966x)
 {
 	return lan_rd(lan966x, QSYS_SW_STATUS(CPU_PORT));
 }
@@ -842,7 +842,7 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
 	return err;
 }
 
-static int lan966x_fdma_get_max_frame(struct lan966x *lan966x)
+int lan966x_fdma_get_max_frame(struct lan966x *lan966x)
 {
 	return lan966x_fdma_get_max_mtu(lan966x) +
 	       IFH_LEN_BYTES +
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index eea286c29474..a1f590f81cbb 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -561,6 +561,18 @@ int lan966x_fdma_init(struct lan966x *lan966x);
 void lan966x_fdma_deinit(struct lan966x *lan966x);
 irqreturn_t lan966x_fdma_irq_handler(int irq, void *args);
 int lan966x_fdma_reload_page_pool(struct lan966x *lan966x);
+int lan966x_fdma_napi_poll(struct napi_struct *napi, int weight);
+void lan966x_fdma_llp_configure(struct lan966x *lan966x, u64 addr,
+				u8 channel_id);
+void lan966x_fdma_rx_start(struct lan966x_rx *rx);
+void lan966x_fdma_rx_disable(struct lan966x_rx *rx);
+void lan966x_fdma_rx_reload(struct lan966x_rx *rx);
+void lan966x_fdma_tx_start(struct lan966x_tx *tx);
+void lan966x_fdma_tx_disable(struct lan966x_tx *tx);
+void lan966x_fdma_wakeup_netdev(struct lan966x *lan966x);
+void lan966x_fdma_stop_netdev(struct lan966x *lan966x);
+int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
+int lan966x_qsys_sw_status(struct lan966x *lan966x);
 
 int lan966x_lag_port_join(struct lan966x_port *port,
 			  struct net_device *brport_dev,

-- 
2.34.1



* [PATCH net-next 05/10] net: lan966x: add FDMA ops dispatch for PCIe support
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (3 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 04/10] net: lan966x: export FDMA helpers for reuse Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-20 15:01 ` [PATCH net-next 06/10] net: lan966x: add PCIe FDMA support Daniel Machon
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

Introduce lan966x_fdma_ops to support different FDMA implementations
for platform and PCIe. Plumb fdma_init, fdma_deinit, fdma_xmit,
fdma_poll and fdma_resize through the ops table, and select the
implementation at probe time based on runtime PCI bus detection.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../net/ethernet/microchip/lan966x/lan966x_fdma.c  |  2 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.c  | 25 +++++++++++++++++-----
 .../net/ethernet/microchip/lan966x/lan966x_main.h  | 13 +++++++++++
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index 870f7d00d325..be6e4044d6f5 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -906,7 +906,7 @@ void lan966x_fdma_netdev_init(struct lan966x *lan966x, struct net_device *dev)
 		return;
 
 	lan966x->fdma_ndev = dev;
-	netif_napi_add(dev, &lan966x->napi, lan966x_fdma_napi_poll);
+	netif_napi_add(dev, &lan966x->napi, lan966x->ops->fdma_poll);
 	napi_enable(&lan966x->napi);
 }
 
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 47752d3fde0b..9f69634ebb0a 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -26,6 +26,14 @@
 
 #define IO_RANGES 2
 
+static const struct lan966x_fdma_ops lan966x_fdma_ops = {
+	.fdma_init = &lan966x_fdma_init,
+	.fdma_deinit = &lan966x_fdma_deinit,
+	.fdma_xmit = &lan966x_fdma_xmit,
+	.fdma_poll = &lan966x_fdma_napi_poll,
+	.fdma_resize = &lan966x_fdma_change_mtu,
+};
+
 static const struct of_device_id lan966x_match[] = {
 	{ .compatible = "microchip,lan966x-switch" },
 	{ }
@@ -391,7 +399,7 @@ static netdev_tx_t lan966x_port_xmit(struct sk_buff *skb,
 
 	spin_lock(&lan966x->tx_lock);
 	if (port->lan966x->fdma)
-		err = lan966x_fdma_xmit(skb, ifh, dev);
+		err = lan966x->ops->fdma_xmit(skb, ifh, dev);
 	else
 		err = lan966x_port_ifh_xmit(skb, ifh, dev);
 	spin_unlock(&lan966x->tx_lock);
@@ -413,7 +421,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
 	if (!lan966x->fdma)
 		return 0;
 
-	err = lan966x_fdma_change_mtu(lan966x);
+	err = lan966x->ops->fdma_resize(lan966x);
 	if (err) {
 		lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(LAN966X_HW_MTU(old_mtu)),
 		       lan966x, DEV_MAC_MAXLEN_CFG(port->chip_port));
@@ -1079,6 +1087,11 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
 	return 0;
 }
 
+static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
+{
+	return &lan966x_fdma_ops;
+}
+
 static int lan966x_probe(struct platform_device *pdev)
 {
 	struct fwnode_handle *ports, *portnp;
@@ -1093,6 +1106,8 @@ static int lan966x_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, lan966x);
 	lan966x->dev = &pdev->dev;
 
+	lan966x->ops = lan966x_get_fdma_ops(&pdev->dev);
+
 	if (!device_get_mac_address(&pdev->dev, mac_addr)) {
 		ether_addr_copy(lan966x->base_mac, mac_addr);
 	} else {
@@ -1232,7 +1247,7 @@ static int lan966x_probe(struct platform_device *pdev)
 	if (err)
 		goto cleanup_fdb;
 
-	err = lan966x_fdma_init(lan966x);
+	err = lan966x->ops->fdma_init(lan966x);
 	if (err)
 		goto cleanup_ptp;
 
@@ -1245,7 +1260,7 @@ static int lan966x_probe(struct platform_device *pdev)
 	return 0;
 
 cleanup_fdma:
-	lan966x_fdma_deinit(lan966x);
+	lan966x->ops->fdma_deinit(lan966x);
 
 cleanup_ptp:
 	lan966x_ptp_deinit(lan966x);
@@ -1273,7 +1288,7 @@ static void lan966x_remove(struct platform_device *pdev)
 
 	lan966x_taprio_deinit(lan966x);
 	lan966x_vcap_deinit(lan966x);
-	lan966x_fdma_deinit(lan966x);
+	lan966x->ops->fdma_deinit(lan966x);
 	lan966x_cleanup_ports(lan966x);
 
 	cancel_delayed_work_sync(&lan966x->stats_work);
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index a1f590f81cbb..ed2707079d3e 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -193,6 +193,17 @@ enum vcap_is1_port_sel_rt {
 	VCAP_IS1_PS_RT_FOLLOW_OTHER = 7,
 };
 
+struct lan966x;
+
+struct lan966x_fdma_ops {
+	int (*fdma_init)(struct lan966x *lan966x);
+	void (*fdma_deinit)(struct lan966x *lan966x);
+	int (*fdma_xmit)(struct sk_buff *skb, __be32 *ifh,
+			 struct net_device *dev);
+	int (*fdma_poll)(struct napi_struct *napi, int weight);
+	int (*fdma_resize)(struct lan966x *lan966x);
+};
+
 struct lan966x_port;
 
 struct lan966x_rx {
@@ -270,6 +281,8 @@ struct lan966x_skb_cb {
 struct lan966x {
 	struct device *dev;
 
+	const struct lan966x_fdma_ops *ops;
+
 	u8 num_phys_ports;
 	struct lan966x_port **ports;
 

-- 
2.34.1




* [PATCH net-next 06/10] net: lan966x: add PCIe FDMA support
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (4 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 05/10] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-20 15:01 ` [PATCH net-next 07/10] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

Add PCIe FDMA support for lan966x. The PCIe FDMA path uses contiguous
DMA buffers mapped through the endpoint's ATU, with memcpy-based frame
transfer instead of per-page DMA mappings.

With PCIe FDMA, throughput increases from ~33 Mbps (register-based I/O)
to ~620 Mbps on an Intel x86 host with a lan966x PCIe card.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/net/ethernet/microchip/lan966x/Makefile    |   4 +
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 329 +++++++++++++++++++++
 .../net/ethernet/microchip/lan966x/lan966x_main.c  |  11 +
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  11 +
 .../net/ethernet/microchip/lan966x/lan966x_regs.h  |   1 +
 5 files changed, 356 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
index 4cdbe263502c..ac0beceb2a0d 100644
--- a/drivers/net/ethernet/microchip/lan966x/Makefile
+++ b/drivers/net/ethernet/microchip/lan966x/Makefile
@@ -18,6 +18,10 @@ lan966x-switch-objs  := lan966x_main.o lan966x_phylink.o lan966x_port.o \
 lan966x-switch-$(CONFIG_LAN966X_DCB) += lan966x_dcb.o
 lan966x-switch-$(CONFIG_DEBUG_FS) += lan966x_vcap_debugfs.o
 
+ifdef CONFIG_MCHP_LAN966X_PCI
+lan966x-switch-y += lan966x_fdma_pci.o
+endif
+
 # Provide include files
 ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/vcap
 ccflags-y += -I$(srctree)/drivers/net/ethernet/microchip/fdma
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
new file mode 100644
index 000000000000..a92862b386ab
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -0,0 +1,329 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include "fdma_api.h"
+#include "lan966x_main.h"
+
+static int lan966x_fdma_pci_dataptr_cb(struct fdma *fdma, int dcb, int db,
+				       u64 *dataptr)
+{
+	u64 addr;
+
+	addr = fdma_dataptr_dma_addr_contiguous(fdma, dcb, db);
+
+	*dataptr = fdma_pci_atu_translate_addr(fdma->atu_region, addr);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_nextptr_cb(struct fdma *fdma, int dcb, u64 *nextptr)
+{
+	u64 addr;
+
+	fdma_nextptr_cb(fdma, dcb, &addr);
+
+	*nextptr = fdma_pci_atu_translate_addr(fdma->atu_region, addr);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_rx_alloc(struct lan966x_rx *rx)
+{
+	struct lan966x *lan966x = rx->lan966x;
+	struct fdma *fdma = &rx->fdma;
+	int err;
+
+	err = fdma_alloc_coherent_and_map(lan966x->dev, fdma, &lan966x->atu);
+	if (err)
+		return err;
+
+	fdma_dcbs_init(fdma,
+		       FDMA_DCB_INFO_DATAL(fdma->db_size),
+		       FDMA_DCB_STATUS_INTR);
+
+	lan966x_fdma_llp_configure(lan966x,
+				   fdma->atu_region->base_addr,
+				   fdma->channel_id);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_tx_alloc(struct lan966x_tx *tx)
+{
+	struct lan966x *lan966x = tx->lan966x;
+	struct fdma *fdma = &tx->fdma;
+	int err;
+
+	err = fdma_alloc_coherent_and_map(lan966x->dev, fdma, &lan966x->atu);
+	if (err)
+		return err;
+
+	fdma_dcbs_init(fdma,
+		       FDMA_DCB_INFO_DATAL(fdma->db_size),
+		       FDMA_DCB_STATUS_DONE);
+
+	lan966x_fdma_llp_configure(lan966x,
+				   fdma->atu_region->base_addr,
+				   fdma->channel_id);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
+{
+	struct lan966x *lan966x = rx->lan966x;
+	struct fdma *fdma = &rx->fdma;
+	void *virt_addr;
+
+	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
+						      fdma->dcb_index,
+						      fdma->db_index);
+
+	lan966x_ifh_get_src_port(virt_addr, src_port);
+
+	if (WARN_ON(*src_port >= lan966x->num_phys_ports))
+		return FDMA_ERROR;
+
+	return FDMA_PASS;
+}
+
+static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
+						     u64 src_port)
+{
+	struct lan966x *lan966x = rx->lan966x;
+	struct fdma *fdma = &rx->fdma;
+	struct sk_buff *skb;
+	struct fdma_db *db;
+	u32 data_len;
+
+	/* Get the received frame and create an SKB for it. */
+	db = fdma_db_next_get(fdma);
+	data_len = FDMA_DCB_STATUS_BLOCKL(db->status);
+
+	skb = napi_alloc_skb(&lan966x->napi, data_len);
+	if (unlikely(!skb))
+		return NULL;
+
+	memcpy(skb->data,
+	       fdma_dataptr_virt_addr_contiguous(fdma,
+						 fdma->dcb_index,
+						 fdma->db_index),
+						 data_len);
+
+	skb_put(skb, data_len);
+
+	skb->dev = lan966x->ports[src_port]->dev;
+	skb_pull(skb, IFH_LEN_BYTES);
+
+	if (likely(!(skb->dev->features & NETIF_F_RXFCS)))
+		skb_trim(skb, skb->len - ETH_FCS_LEN);
+
+	skb->protocol = eth_type_trans(skb, skb->dev);
+
+	if (lan966x->bridge_mask & BIT(src_port)) {
+		skb->offload_fwd_mark = 1;
+
+		skb_reset_network_header(skb);
+		if (!lan966x_hw_offload(lan966x, src_port, skb))
+			skb->offload_fwd_mark = 0;
+	}
+
+	skb->dev->stats.rx_bytes += skb->len;
+	skb->dev->stats.rx_packets++;
+
+	return skb;
+}
+
+static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
+{
+	struct fdma_db *db;
+
+	for (int i = 0; i < fdma->n_dcbs; i++) {
+		db = fdma_db_get(fdma, i, 0);
+
+		if (!fdma_db_is_done(db))
+			continue;
+		if (fdma_is_last(fdma, &fdma->dcbs[i]))
+			continue;
+
+		return i;
+	}
+
+	return -1;
+}
+
+static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
+				 struct net_device *dev)
+{
+	struct lan966x_port *port = netdev_priv(dev);
+	struct lan966x *lan966x = port->lan966x;
+	struct lan966x_tx *tx = &lan966x->tx;
+	struct fdma *fdma = &tx->fdma;
+	int next_to_use;
+	void *virt_addr;
+
+	next_to_use = lan966x_fdma_pci_get_next_dcb(fdma);
+
+	if (next_to_use < 0) {
+		netif_stop_queue(dev);
+		return NETDEV_TX_BUSY;
+	}
+
+	if (skb_put_padto(skb, ETH_ZLEN)) {
+		dev->stats.tx_dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	skb_tx_timestamp(skb);
+
+	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma, next_to_use, 0);
+	memcpy(virt_addr, ifh, IFH_LEN_BYTES);
+	memcpy((u8 *)virt_addr + IFH_LEN_BYTES, skb->data, skb->len);
+
+	fdma_dcb_add(fdma,
+		     next_to_use,
+		     0,
+		     FDMA_DCB_STATUS_SOF |
+		     FDMA_DCB_STATUS_EOF |
+		     FDMA_DCB_STATUS_BLOCKO(0) |
+		     FDMA_DCB_STATUS_BLOCKL(IFH_LEN_BYTES + skb->len + ETH_FCS_LEN));
+
+	/* Start the transmission. */
+	lan966x_fdma_tx_start(tx);
+
+	dev->stats.tx_bytes += skb->len;
+	dev->stats.tx_packets++;
+
+	dev_consume_skb_any(skb);
+
+	return NETDEV_TX_OK;
+}
+
+static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
+{
+	struct lan966x *lan966x = container_of(napi, struct lan966x, napi);
+	struct lan966x_rx *rx = &lan966x->rx;
+	struct fdma *fdma = &rx->fdma;
+	int dcb_reload, old_dcb;
+	struct sk_buff *skb;
+	int counter = 0;
+	u64 src_port;
+
+	/* Wake any stopped TX queues if a TX DCB is available. */
+	if (lan966x_fdma_pci_get_next_dcb(&lan966x->tx.fdma) >= 0)
+		lan966x_fdma_wakeup_netdev(lan966x);
+
+	dcb_reload = fdma->dcb_index;
+
+	/* Get all received skbs. */
+	while (counter < weight) {
+		if (!fdma_has_frames(fdma))
+			break;
+		counter++;
+		switch (lan966x_fdma_pci_rx_check_frame(rx, &src_port)) {
+		case FDMA_PASS:
+			break;
+		case FDMA_ERROR:
+			fdma_dcb_advance(fdma);
+			goto allocate_new;
+		}
+		skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
+		fdma_dcb_advance(fdma);
+		if (!skb)
+			goto allocate_new;
+
+		napi_gro_receive(&lan966x->napi, skb);
+	}
+allocate_new:
+	while (dcb_reload != fdma->dcb_index) {
+		old_dcb = dcb_reload;
+		dcb_reload++;
+		dcb_reload &= fdma->n_dcbs - 1;
+
+		fdma_dcb_add(fdma,
+			     old_dcb,
+			     FDMA_DCB_INFO_DATAL(fdma->db_size),
+			     FDMA_DCB_STATUS_INTR);
+
+		lan966x_fdma_rx_reload(rx);
+	}
+
+	if (counter < weight && napi_complete_done(napi, counter))
+		lan_wr(0xff, lan966x, FDMA_INTR_DB_ENA);
+
+	return counter;
+}
+
+static int lan966x_fdma_pci_init(struct lan966x *lan966x)
+{
+	struct fdma *rx_fdma = &lan966x->rx.fdma;
+	struct fdma *tx_fdma = &lan966x->tx.fdma;
+	int err;
+
+	if (!lan966x->fdma)
+		return 0;
+
+	fdma_pci_atu_init(&lan966x->atu, lan966x->regs[TARGET_PCIE_DBI]);
+
+	lan966x->rx.lan966x = lan966x;
+	lan966x->rx.max_mtu = lan966x_fdma_get_max_frame(lan966x);
+	rx_fdma->channel_id = FDMA_XTR_CHANNEL;
+	rx_fdma->n_dcbs = FDMA_DCB_MAX;
+	rx_fdma->n_dbs = FDMA_RX_DCB_MAX_DBS;
+	rx_fdma->priv = lan966x;
+	rx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	rx_fdma->size = fdma_get_size_contiguous(rx_fdma);
+	rx_fdma->ops.nextptr_cb = &lan966x_fdma_pci_nextptr_cb;
+	rx_fdma->ops.dataptr_cb = &lan966x_fdma_pci_dataptr_cb;
+
+	lan966x->tx.lan966x = lan966x;
+	tx_fdma->channel_id = FDMA_INJ_CHANNEL;
+	tx_fdma->n_dcbs = FDMA_DCB_MAX;
+	tx_fdma->n_dbs = FDMA_TX_DCB_MAX_DBS;
+	tx_fdma->priv = lan966x;
+	tx_fdma->db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	tx_fdma->size = fdma_get_size_contiguous(tx_fdma);
+	tx_fdma->ops.nextptr_cb = &lan966x_fdma_pci_nextptr_cb;
+	tx_fdma->ops.dataptr_cb = &lan966x_fdma_pci_dataptr_cb;
+
+	err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
+	if (err)
+		return err;
+
+	err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);
+	if (err) {
+		fdma_free_coherent_and_unmap(lan966x->dev, rx_fdma);
+		return err;
+	}
+
+	lan966x_fdma_rx_start(&lan966x->rx);
+
+	return 0;
+}
+
+static int lan966x_fdma_pci_resize(struct lan966x *lan966x)
+{
+	return -EOPNOTSUPP;
+}
+
+static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)
+{
+	if (!lan966x->fdma)
+		return;
+
+	lan966x_fdma_rx_disable(&lan966x->rx);
+	lan966x_fdma_tx_disable(&lan966x->tx);
+
+	napi_synchronize(&lan966x->napi);
+	napi_disable(&lan966x->napi);
+
+	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
+	fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->tx.fdma);
+}
+
+const struct lan966x_fdma_ops lan966x_fdma_pci_ops = {
+	.fdma_init = &lan966x_fdma_pci_init,
+	.fdma_deinit = &lan966x_fdma_pci_deinit,
+	.fdma_xmit = &lan966x_fdma_pci_xmit,
+	.fdma_poll = &lan966x_fdma_pci_napi_poll,
+	.fdma_resize = &lan966x_fdma_pci_resize,
+};
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 9f69634ebb0a..fc14738774ec 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -7,6 +7,7 @@
 #include <linux/ip.h>
 #include <linux/of.h>
 #include <linux/of_net.h>
+#include <linux/pci.h>
 #include <linux/phy/phy.h>
 #include <linux/platform_device.h>
 #include <linux/reset.h>
@@ -49,6 +50,9 @@ struct lan966x_main_io_resource {
 static const struct lan966x_main_io_resource lan966x_main_iomap[] =  {
 	{ TARGET_CPU,                   0xc0000, 0 }, /* 0xe00c0000 */
 	{ TARGET_FDMA,                  0xc0400, 0 }, /* 0xe00c0400 */
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	{ TARGET_PCIE_DBI,             0x400000, 0 }, /* 0xe0400000 */
+#endif
 	{ TARGET_ORG,                         0, 1 }, /* 0xe2000000 */
 	{ TARGET_GCB,                    0x4000, 1 }, /* 0xe2004000 */
 	{ TARGET_QS,                     0x8000, 1 }, /* 0xe2008000 */
@@ -1089,6 +1093,13 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
 
 static const struct lan966x_fdma_ops *lan966x_get_fdma_ops(struct device *dev)
 {
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	struct device *parent = dev->parent;
+
+	if (parent && parent->parent && dev_is_pci(parent->parent))
+		return &lan966x_fdma_pci_ops;
+#endif
+
 	return &lan966x_fdma_ops;
 }
 
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index ed2707079d3e..8fcc51133417 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -17,6 +17,9 @@
 #include <net/xdp.h>
 
 #include <fdma_api.h>
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+#include <fdma_pci.h>
+#endif
 #include <vcap_api.h>
 #include <vcap_api_client.h>
 
@@ -288,6 +291,10 @@ struct lan966x {
 
 	void __iomem *regs[NUM_TARGETS];
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+	struct fdma_pci_atu atu;
+#endif
+
 	int shared_queue_sz;
 
 	u8 base_mac[ETH_ALEN];
@@ -587,6 +594,10 @@ void lan966x_fdma_stop_netdev(struct lan966x *lan966x);
 int lan966x_fdma_get_max_frame(struct lan966x *lan966x);
 int lan966x_qsys_sw_status(struct lan966x *lan966x);
 
+#if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
+extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
+#endif
+
 int lan966x_lag_port_join(struct lan966x_port *port,
 			  struct net_device *brport_dev,
 			  struct net_device *bond,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
index 4b553927d2e0..f9448780bd4f 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h
@@ -20,6 +20,7 @@ enum lan966x_target {
 	TARGET_FDMA = 21,
 	TARGET_GCB = 27,
 	TARGET_ORG = 36,
+	TARGET_PCIE_DBI = 40,
 	TARGET_PTP = 41,
 	TARGET_QS = 42,
 	TARGET_QSYS = 46,

-- 
2.34.1



* [PATCH net-next 07/10] net: lan966x: add PCIe FDMA MTU change support
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (5 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 06/10] net: lan966x: add PCIe FDMA support Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-20 15:01 ` [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support Daniel Machon
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

Add MTU change support for the PCIe FDMA path. When the MTU changes,
the contiguous ATU-mapped RX and TX buffers are reallocated with the
new size. On allocation failure, the old buffers are restored.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 126 ++++++++++++++++++++-
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
index a92862b386ab..4d69beb41c0c 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -300,9 +300,133 @@ static int lan966x_fdma_pci_init(struct lan966x *lan966x)
 	return 0;
 }
 
+/* Reset existing rx and tx buffers. */
+static void lan966x_fdma_pci_reset_mem(struct lan966x *lan966x)
+{
+	struct lan966x_rx *rx = &lan966x->rx;
+	struct lan966x_tx *tx = &lan966x->tx;
+
+	memset(rx->fdma.dcbs, 0, rx->fdma.size);
+	memset(tx->fdma.dcbs, 0, tx->fdma.size);
+
+	fdma_dcbs_init(&rx->fdma,
+		       FDMA_DCB_INFO_DATAL(rx->fdma.db_size),
+		       FDMA_DCB_STATUS_INTR);
+
+	fdma_dcbs_init(&tx->fdma,
+		       FDMA_DCB_INFO_DATAL(tx->fdma.db_size),
+		       FDMA_DCB_STATUS_DONE);
+
+	lan966x_fdma_llp_configure(lan966x,
+				   tx->fdma.atu_region->base_addr,
+				   tx->fdma.channel_id);
+	lan966x_fdma_llp_configure(lan966x,
+				   rx->fdma.atu_region->base_addr,
+				   rx->fdma.channel_id);
+}
+
+static int lan966x_fdma_pci_reload(struct lan966x *lan966x, int new_mtu)
+{
+	struct fdma tx_fdma_old = lan966x->tx.fdma;
+	struct fdma rx_fdma_old = lan966x->rx.fdma;
+	u32 old_mtu = lan966x->rx.max_mtu;
+	int err;
+
+	napi_synchronize(&lan966x->napi);
+	napi_disable(&lan966x->napi);
+	lan966x_fdma_stop_netdev(lan966x);
+	lan966x_fdma_rx_disable(&lan966x->rx);
+	lan966x_fdma_tx_disable(&lan966x->tx);
+	lan966x->tx.activated = false;
+
+	lan966x->rx.max_mtu = new_mtu;
+
+	lan966x->tx.fdma.db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	lan966x->tx.fdma.size = fdma_get_size_contiguous(&lan966x->tx.fdma);
+	lan966x->rx.fdma.db_size = FDMA_PCI_DB_SIZE(lan966x->rx.max_mtu);
+	lan966x->rx.fdma.size = fdma_get_size_contiguous(&lan966x->rx.fdma);
+
+	err = lan966x_fdma_pci_rx_alloc(&lan966x->rx);
+	if (err)
+		goto restore;
+
+	err = lan966x_fdma_pci_tx_alloc(&lan966x->tx);
+	if (err) {
+		fdma_free_coherent_and_unmap(lan966x->dev, &lan966x->rx.fdma);
+		goto restore;
+	}
+
+	lan966x_fdma_rx_start(&lan966x->rx);
+
+	/* Free and unmap old memory. */
+	fdma_free_coherent_and_unmap(lan966x->dev, &rx_fdma_old);
+	fdma_free_coherent_and_unmap(lan966x->dev, &tx_fdma_old);
+
+	lan966x_fdma_wakeup_netdev(lan966x);
+	napi_enable(&lan966x->napi);
+
+	return err;
+restore:
+
+	/* No new buffers are allocated at this point. Use the old buffers,
+	 * but reset them before starting the FDMA again.
+	 */
+
+	memcpy(&lan966x->tx.fdma, &tx_fdma_old, sizeof(struct fdma));
+	memcpy(&lan966x->rx.fdma, &rx_fdma_old, sizeof(struct fdma));
+
+	lan966x->rx.max_mtu = old_mtu;
+
+	lan966x_fdma_pci_reset_mem(lan966x);
+
+	lan966x_fdma_rx_start(&lan966x->rx);
+	lan966x_fdma_wakeup_netdev(lan966x);
+	napi_enable(&lan966x->napi);
+
+	return err;
+}
+
+static int __lan966x_fdma_pci_reload(struct lan966x *lan966x, int max_mtu)
+{
+	int err;
+	u32 val;
+
+	/* Disable the CPU port. */
+	lan_rmw(QSYS_SW_PORT_MODE_PORT_ENA_SET(0),
+		QSYS_SW_PORT_MODE_PORT_ENA,
+		lan966x, QSYS_SW_PORT_MODE(CPU_PORT));
+
+	/* Flush the CPU queues. */
+	readx_poll_timeout(lan966x_qsys_sw_status,
+			   lan966x,
+			   val,
+			   !(QSYS_SW_STATUS_EQ_AVAIL_GET(val)),
+			   READL_SLEEP_US, READL_TIMEOUT_US);
+
+	/* Add a sleep in case there are frames between the queues and the CPU
+	 * port
+	 */
+	usleep_range(USEC_PER_MSEC, 2 * USEC_PER_MSEC);
+
+	err = lan966x_fdma_pci_reload(lan966x, max_mtu);
+
+	/* Enable back the CPU port. */
+	lan_rmw(QSYS_SW_PORT_MODE_PORT_ENA_SET(1),
+		QSYS_SW_PORT_MODE_PORT_ENA,
+		lan966x, QSYS_SW_PORT_MODE(CPU_PORT));
+
+	return err;
+}
+
 static int lan966x_fdma_pci_resize(struct lan966x *lan966x)
 {
-	return -EOPNOTSUPP;
+	int max_mtu;
+
+	max_mtu = lan966x_fdma_get_max_frame(lan966x);
+	if (max_mtu == lan966x->rx.max_mtu)
+		return 0;
+
+	return __lan966x_fdma_pci_reload(lan966x, max_mtu);
 }
 
 static void lan966x_fdma_pci_deinit(struct lan966x *lan966x)

-- 
2.34.1



* [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (6 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 07/10] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-22  7:11   ` Mohsin Bashir
  2026-03-20 15:01 ` [PATCH net-next 09/10] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

Add XDP support for the PCIe FDMA path. The PCIe XDP implementation
operates on contiguous ATU-mapped buffers with memcpy-based XDP_TX
transmit, unlike the platform path which uses page_pool.

Only NETDEV_XDP_ACT_BASIC (XDP_PASS, XDP_DROP, XDP_TX, XDP_ABORTED) is
supported; XDP_REDIRECT and NDO_XMIT are not available on the PCIe path,
as they, to my knowledge, require page_pool-backed buffers.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 .../ethernet/microchip/lan966x/lan966x_fdma_pci.c  | 136 ++++++++++++++++++---
 .../net/ethernet/microchip/lan966x/lan966x_main.c  |  11 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  10 ++
 .../net/ethernet/microchip/lan966x/lan966x_xdp.c   |   6 +
 4 files changed, 140 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
index 4d69beb41c0c..f2c8c6aa3d4f 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma_pci.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 
+#include <linux/bpf_trace.h>
+
 #include "fdma_api.h"
 #include "lan966x_main.h"
 
@@ -68,10 +70,110 @@ static int lan966x_fdma_pci_tx_alloc(struct lan966x_tx *tx)
 	return 0;
 }
 
+static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
+{
+	struct fdma_db *db;
+
+	for (int i = 0; i < fdma->n_dcbs; i++) {
+		db = fdma_db_get(fdma, i, 0);
+
+		if (!fdma_db_is_done(db))
+			continue;
+		if (fdma_is_last(fdma, &fdma->dcbs[i]))
+			continue;
+
+		return i;
+	}
+
+	return -1;
+}
+
+static int lan966x_fdma_pci_xmit_xdpf(struct lan966x_port *port,
+				      void *ptr, u32 len)
+{
+	struct lan966x *lan966x = port->lan966x;
+	struct lan966x_tx *tx = &lan966x->tx;
+	struct fdma *fdma = &tx->fdma;
+	int next_to_use, ret = 0;
+	void *virt_addr;
+	__be32 *ifh;
+
+	spin_lock(&lan966x->tx_lock);
+
+	next_to_use = lan966x_fdma_pci_get_next_dcb(fdma);
+
+	if (next_to_use < 0) {
+		netif_stop_queue(port->dev);
+		ret = NETDEV_TX_BUSY;
+		goto out;
+	}
+
+	ifh = ptr;
+	memset(ifh, 0, IFH_LEN_BYTES);
+	lan966x_ifh_set_bypass(ifh, 1);
+	lan966x_ifh_set_port(ifh, BIT_ULL(port->chip_port));
+
+	/* Get the virtual addr of the next DB and copy frame to it. */
+	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma, next_to_use, 0);
+	memcpy(virt_addr, ptr, len);
+
+	fdma_dcb_add(fdma,
+		     next_to_use,
+		     0,
+		     FDMA_DCB_STATUS_SOF |
+		     FDMA_DCB_STATUS_EOF |
+		     FDMA_DCB_STATUS_BLOCKO(0) |
+		     FDMA_DCB_STATUS_BLOCKL(len));
+
+	/* Start the transmission. */
+	lan966x_fdma_tx_start(tx);
+
+out:
+	spin_unlock(&lan966x->tx_lock);
+
+	return ret;
+}
+
+static int lan966x_xdp_pci_run(struct lan966x_port *port, void *data,
+			       u32 data_len)
+{
+	struct bpf_prog *xdp_prog = port->xdp_prog;
+	struct lan966x *lan966x = port->lan966x;
+	struct xdp_buff xdp;
+	u32 act;
+
+	xdp_init_buff(&xdp, lan966x->rx.max_mtu, &port->xdp_rxq);
+
+	xdp_prepare_buff(&xdp,
+			 data - XDP_PACKET_HEADROOM,
+			 IFH_LEN_BYTES + XDP_PACKET_HEADROOM,
+			 data_len - IFH_LEN_BYTES,
+			 false);
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	switch (act) {
+	case XDP_PASS:
+		return FDMA_PASS;
+	case XDP_TX:
+		return lan966x_fdma_pci_xmit_xdpf(port, data, data_len) ?
+		       FDMA_DROP : FDMA_TX;
+	default:
+		bpf_warn_invalid_xdp_action(port->dev, xdp_prog, act);
+		fallthrough;
+	case XDP_ABORTED:
+		trace_xdp_exception(port->dev, xdp_prog, act);
+		fallthrough;
+	case XDP_DROP:
+		return FDMA_DROP;
+	}
+}
+
 static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
 {
 	struct lan966x *lan966x = rx->lan966x;
 	struct fdma *fdma = &rx->fdma;
+	struct lan966x_port *port;
+	struct fdma_db *db;
 	void *virt_addr;
 
 	virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
@@ -83,7 +185,15 @@ static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
 	if (WARN_ON(*src_port >= lan966x->num_phys_ports))
 		return FDMA_ERROR;
 
-	return FDMA_PASS;
+	port = lan966x->ports[*src_port];
+	if (!lan966x_xdp_port_present(port))
+		return FDMA_PASS;
+
+	db = fdma_db_next_get(fdma);
+
+	return lan966x_xdp_pci_run(port,
+				   virt_addr,
+				   FDMA_DCB_STATUS_BLOCKL(db->status));
 }
 
 static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
@@ -133,24 +243,6 @@ static struct sk_buff *lan966x_fdma_pci_rx_get_frame(struct lan966x_rx *rx,
 	return skb;
 }
 
-static int lan966x_fdma_pci_get_next_dcb(struct fdma *fdma)
-{
-	struct fdma_db *db;
-
-	for (int i = 0; i < fdma->n_dcbs; i++) {
-		db = fdma_db_get(fdma, i, 0);
-
-		if (!fdma_db_is_done(db))
-			continue;
-		if (fdma_is_last(fdma, &fdma->dcbs[i]))
-			continue;
-
-		return i;
-	}
-
-	return -1;
-}
-
 static int lan966x_fdma_pci_xmit(struct sk_buff *skb, __be32 *ifh,
 				 struct net_device *dev)
 {
@@ -225,6 +317,12 @@ static int lan966x_fdma_pci_napi_poll(struct napi_struct *napi, int weight)
 		case FDMA_ERROR:
 			fdma_dcb_advance(fdma);
 			goto allocate_new;
+		case FDMA_TX:
+			fdma_dcb_advance(fdma);
+			continue;
+		case FDMA_DROP:
+			fdma_dcb_advance(fdma);
+			continue;
 		}
 		skb = lan966x_fdma_pci_rx_get_frame(rx, src_port);
 		fdma_dcb_advance(fdma);
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index fc14738774ec..b42e044da735 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -877,10 +877,13 @@ static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
 
 	port->phylink = phylink;
 
-	if (lan966x->fdma)
-		dev->xdp_features = NETDEV_XDP_ACT_BASIC |
-				    NETDEV_XDP_ACT_REDIRECT |
-				    NETDEV_XDP_ACT_NDO_XMIT;
+	if (lan966x->fdma) {
+		dev->xdp_features = NETDEV_XDP_ACT_BASIC;
+
+		if (!lan966x_is_pci(lan966x))
+			dev->xdp_features |= NETDEV_XDP_ACT_REDIRECT |
+					     NETDEV_XDP_ACT_NDO_XMIT;
+	}
 
 	err = register_netdev(dev);
 	if (err) {
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index 8fcc51133417..2491fe937e36 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -596,6 +596,16 @@ int lan966x_qsys_sw_status(struct lan966x *lan966x);
 
 #if IS_ENABLED(CONFIG_MCHP_LAN966X_PCI)
 extern const struct lan966x_fdma_ops lan966x_fdma_pci_ops;
+
+static inline bool lan966x_is_pci(struct lan966x *lan966x)
+{
+	return lan966x->ops == &lan966x_fdma_pci_ops;
+}
+#else
+static inline bool lan966x_is_pci(struct lan966x *lan966x)
+{
+	return false;
+}
 #endif
 
 int lan966x_lag_port_join(struct lan966x_port *port,
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
index 9ee61db8690b..9b3356ba6ba8 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_xdp.c
@@ -27,6 +27,12 @@ static int lan966x_xdp_setup(struct net_device *dev, struct netdev_bpf *xdp)
 	if (old_xdp == new_xdp)
 		goto out;
 
+	/* PCIe FDMA uses contiguous buffers, so no page_pool reload
+	 * is needed.
+	 */
+	if (lan966x_is_pci(lan966x))
+		goto out;
+
 	err = lan966x_fdma_reload_page_pool(lan966x);
 	if (err) {
 		xchg(&port->xdp_prog, old_prog);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 09/10] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (7 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-20 15:01 ` [PATCH net-next 10/10] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
  2026-03-23 14:52 ` [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Herve Codina
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

The ATU outbound windows used by the FDMA engine are programmed through
registers at offset 0x400000+, which falls outside the current cpu reg
mapping. Extend the cpu reg size from 0x100000 (1MB) to 0x800000 (8MB)
to cover the full PCIE DBI and iATU register space.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/misc/lan966x_pci.dtso | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/lan966x_pci.dtso b/drivers/misc/lan966x_pci.dtso
index 7b196b0a0eb6..7bb726550caf 100644
--- a/drivers/misc/lan966x_pci.dtso
+++ b/drivers/misc/lan966x_pci.dtso
@@ -135,7 +135,7 @@ lan966x_phy1: ethernet-lan966x_phy@2 {
 
 				switch: switch@e0000000 {
 					compatible = "microchip,lan966x-switch";
-					reg = <0xe0000000 0x0100000>,
+					reg = <0xe0000000 0x0800000>,
 					      <0xe2000000 0x0800000>;
 					reg-names = "cpu", "gcb";
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 10/10] misc: lan966x-pci: dts: add fdma interrupt to overlay
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (8 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 09/10] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
@ 2026-03-20 15:01 ` Daniel Machon
  2026-03-23 14:52 ` [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Herve Codina
  10 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf

Add the fdma interrupt (OIC interrupt 14) to the lan966x PCI device
tree overlay, enabling FDMA-based frame injection/extraction when
the switch is connected over PCIe.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
---
 drivers/misc/lan966x_pci.dtso | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/lan966x_pci.dtso b/drivers/misc/lan966x_pci.dtso
index 7bb726550caf..5bb12dbc0843 100644
--- a/drivers/misc/lan966x_pci.dtso
+++ b/drivers/misc/lan966x_pci.dtso
@@ -141,8 +141,9 @@ switch: switch@e0000000 {
 
 					interrupt-parent = <&oic>;
 					interrupts = <12 IRQ_TYPE_LEVEL_HIGH>,
+						     <14 IRQ_TYPE_LEVEL_HIGH>,
 						     <9 IRQ_TYPE_LEVEL_HIGH>;
-					interrupt-names = "xtr", "ana";
+					interrupt-names = "xtr", "fdma", "ana";
 
 					resets = <&reset 0>;
 					reset-names = "switch";

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support
  2026-03-20 15:01 ` [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support Daniel Machon
@ 2026-03-22  7:11   ` Mohsin Bashir
  2026-03-22 20:30     ` Daniel Machon
  0 siblings, 1 reply; 18+ messages in thread
From: Mohsin Bashir @ 2026-03-22  7:11 UTC (permalink / raw)
  To: Daniel Machon, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Horatiu Vultur, Steen Hegelund,
	UNGLinuxDriver, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Herve Codina, Arnd Bergmann, Greg Kroah-Hartman
  Cc: netdev, linux-kernel, bpf


> +static int lan966x_xdp_pci_run(struct lan966x_port *port, void *data,
> +			       u32 data_len)
> +{
> +	struct bpf_prog *xdp_prog = port->xdp_prog;
> +	struct lan966x *lan966x = port->lan966x;
> +	struct xdp_buff xdp;
> +	u32 act;
> +
> +	xdp_init_buff(&xdp, lan966x->rx.max_mtu, &port->xdp_rxq);
> +
> +	xdp_prepare_buff(&xdp,
> +			 data - XDP_PACKET_HEADROOM,
> +			 IFH_LEN_BYTES + XDP_PACKET_HEADROOM,
> +			 data_len - IFH_LEN_BYTES,
> +			 false);
> +
> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +	switch (act) {
> +	case XDP_PASS:
> +		return FDMA_PASS;
> +	case XDP_TX:
> +		return lan966x_fdma_pci_xmit_xdpf(port, data, data_len) ?
> +		       FDMA_DROP : FDMA_TX;

What if the BPF program modifies packet boundaries (e.g., 
headroom/tailroom adjustment)? After bpf_prog_run_xdp(), xdp.data and 
xdp.data_end may differ from the original data and data_len, but 
lan966x_fdma_pci_xmit_xdpf() is called with the original values. 
Wouldn't any adjustments made by the XDP program be silently lost?

> +	default:
> +		bpf_warn_invalid_xdp_action(port->dev, xdp_prog, act);
> +		fallthrough;
> +	case XDP_ABORTED:
> +		trace_xdp_exception(port->dev, xdp_prog, act);
> +		fallthrough;
> +	case XDP_DROP:
> +		return FDMA_DROP;
> +	}
> +}
> +

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support
  2026-03-22  7:11   ` Mohsin Bashir
@ 2026-03-22 20:30     ` Daniel Machon
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-22 20:30 UTC (permalink / raw)
  To: Mohsin Bashir
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, netdev, linux-kernel, bpf

Hi Mohsin,

> > +static int lan966x_xdp_pci_run(struct lan966x_port *port, void *data,
> > +                            u32 data_len)
> > +{
> > +     struct bpf_prog *xdp_prog = port->xdp_prog;
> > +     struct lan966x *lan966x = port->lan966x;
> > +     struct xdp_buff xdp;
> > +     u32 act;
> > +
> > +     xdp_init_buff(&xdp, lan966x->rx.max_mtu, &port->xdp_rxq);
> > +
> > +     xdp_prepare_buff(&xdp,
> > +                      data - XDP_PACKET_HEADROOM,
> > +                      IFH_LEN_BYTES + XDP_PACKET_HEADROOM,
> > +                      data_len - IFH_LEN_BYTES,
> > +                      false);
> > +
> > +     act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > +     switch (act) {
> > +     case XDP_PASS:
> > +             return FDMA_PASS;
> > +     case XDP_TX:
> > +             return lan966x_fdma_pci_xmit_xdpf(port, data, data_len) ?
> > +                    FDMA_DROP : FDMA_TX;
> 
> What if the BPF program modifies packet boundaries (e.g.,
> headroom/tailroom adjustment)? After bpf_prog_run_xdp(), xdp.data and
> xdp.data_end may differ from the original data and data_len, but
> lan966x_fdma_pci_xmit_xdpf() is called with the original values.
> Wouldn't any adjustments made by the XDP program be silently lost?
>

I think you are absolutely right. Thanks for catching this. We might have the
same issue in the platform XDP path as well.

I will fix this in v2.

> > +     default:
> > +             bpf_warn_invalid_xdp_action(port->dev, xdp_prog, act);
> > +             fallthrough;
> > +     case XDP_ABORTED:
> > +             trace_xdp_exception(port->dev, xdp_prog, act);
> > +             fallthrough;
> > +     case XDP_DROP:
> > +             return FDMA_DROP;
> > +     }
> > +}
> > +

/Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
  2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
                   ` (9 preceding siblings ...)
  2026-03-20 15:01 ` [PATCH net-next 10/10] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
@ 2026-03-23 14:52 ` Herve Codina
  2026-03-23 16:26   ` Herve Codina
  10 siblings, 1 reply; 18+ messages in thread
From: Herve Codina @ 2026-03-23 14:52 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Arnd Bergmann,
	Greg Kroah-Hartman, netdev, linux-kernel, bpf

Hi Daniel,

On Fri, 20 Mar 2026 16:00:56 +0100
Daniel Machon <daniel.machon@microchip.com> wrote:

> When lan966x operates as a PCIe endpoint, the driver currently uses
> register-based I/O for frame injection and extraction. This approach is
> functional but slow, topping out at around 33 Mbps on an Intel x86 host
> with a lan966x PCIe card.
> 
> This series adds FDMA (Frame DMA) support for the PCIe path. When
> operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> directly access host memory, so DMA buffers are allocated as contiguous
> coherent memory and mapped through the PCIe Address Translation Unit
> (ATU). The ATU provides outbound windows that translate internal FDMA
> addresses to PCIe bus addresses, allowing the FDMA engine to read and
> write host memory. Because the ATU requires contiguous address regions,
> page_pool and normal per-page DMA mappings cannot be used. Instead,
> frames are transferred using memcpy between the ATU-mapped buffers and
> the network stack. With this, throughput increases from ~33 Mbps to ~620
> Mbps for default MTU.
> 
> Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> management and coherent DMA allocation with ATU mapping.
> 
> Patches 3-5 refactor the lan966x FDMA code to support both platform and
> PCIe paths: extracting the LLP register write into a helper, exporting
> shared functions, and introducing an ops dispatch table selected at
> probe time.
> 
> Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> change and XDP support respectively.
> 
> Patches 9-10 update the lan966x PCI device tree overlay to extend the
> cpu register mapping to cover the ATU register space and add the FDMA
> interrupt.
> 

Thanks a lot for the series taking care of DMA and ATU in PCIe variants.

I have tested the whole series on both my ARM and x86 systems.

With a simple wget on my x86 system, throughput went from 3.8 MB/s to
11.2 MB/s, so the improvement is obvious.

Tested-by: Herve Codina <herve.codina@bootlin.com>

Best regards,
Hervé


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
  2026-03-23 14:52 ` [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Herve Codina
@ 2026-03-23 16:26   ` Herve Codina
  2026-03-23 19:40     ` Daniel Machon
  0 siblings, 1 reply; 18+ messages in thread
From: Herve Codina @ 2026-03-23 16:26 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Arnd Bergmann,
	Greg Kroah-Hartman, netdev, linux-kernel, bpf

Hi Daniel,

On Mon, 23 Mar 2026 15:52:04 +0100
Herve Codina <herve.codina@bootlin.com> wrote:

> Hi Daniel,
> 
> On Fri, 20 Mar 2026 16:00:56 +0100
> Daniel Machon <daniel.machon@microchip.com> wrote:
> 
> > When lan966x operates as a PCIe endpoint, the driver currently uses
> > register-based I/O for frame injection and extraction. This approach is
> > functional but slow, topping out at around 33 Mbps on an Intel x86 host
> > with a lan966x PCIe card.
> > 
> > This series adds FDMA (Frame DMA) support for the PCIe path. When
> > operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> > directly access host memory, so DMA buffers are allocated as contiguous
> > coherent memory and mapped through the PCIe Address Translation Unit
> > (ATU). The ATU provides outbound windows that translate internal FDMA
> > addresses to PCIe bus addresses, allowing the FDMA engine to read and
> > write host memory. Because the ATU requires contiguous address regions,
> > page_pool and normal per-page DMA mappings cannot be used. Instead,
> > frames are transferred using memcpy between the ATU-mapped buffers and
> > the network stack. With this, throughput increases from ~33 Mbps to ~620
> > Mbps for default MTU.
> > 
> > Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> > contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> > management and coherent DMA allocation with ATU mapping.
> > 
> > Patches 3-5 refactor the lan966x FDMA code to support both platform and
> > PCIe paths: extracting the LLP register write into a helper, exporting
> > shared functions, and introducing an ops dispatch table selected at
> > probe time.
> > 
> > Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> > contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> > change and XDP support respectively.
> > 
> > Patches 9-10 update the lan966x PCI device tree overlay to extend the
> > cpu register mapping to cover the ATU register space and add the FDMA
> > interrupt.
> >   
> 
> Thanks a lot for the series taking care of DMA and ATU in PCIe variants.
> 
> I have tested the whole series on both my ARM and x86 systems.
> 
> With a simple wget on my x86 system, throughput went from 3.8 MB/s to
> 11.2 MB/s, so the improvement is obvious.
> 
> Tested-by: Herve Codina <herve.codina@bootlin.com>
> 

Hum, I think I found an issue.

If I remove the lan966x_pci module (modprobe -r lan966x_pci), and reload
it (modprobe lan966x_pci), the board is not working.

The system sends DHCP requests. Those requests are answered by my PC (observed
with Wireshark), but the system does not see the replies and keeps sending
DHCP requests.

Looks like the lan966x_pci module removal leaves the board in a bad state.

Without the series applied, DHCP request answers from my PC are seen by the
system after any module unloading / reloading.

Do you have any ideas of what could be wrong?

Best regards,
Hervé

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
  2026-03-23 16:26   ` Herve Codina
@ 2026-03-23 19:40     ` Daniel Machon
  2026-03-24  8:07       ` Herve Codina
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Machon @ 2026-03-23 19:40 UTC (permalink / raw)
  To: Herve Codina
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Arnd Bergmann,
	Greg Kroah-Hartman, netdev, linux-kernel, bpf

Hi Hervé,

> Hi Daniel,
> 
> On Mon, 23 Mar 2026 15:52:04 +0100
> Herve Codina <herve.codina@bootlin.com> wrote:
> 
> > Hi Daniel,
> >
> > On Fri, 20 Mar 2026 16:00:56 +0100
> > Daniel Machon <daniel.machon@microchip.com> wrote:
> >
> > > When lan966x operates as a PCIe endpoint, the driver currently uses
> > > register-based I/O for frame injection and extraction. This approach is
> > > functional but slow, topping out at around 33 Mbps on an Intel x86 host
> > > with a lan966x PCIe card.
> > >
> > > This series adds FDMA (Frame DMA) support for the PCIe path. When
> > > operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> > > directly access host memory, so DMA buffers are allocated as contiguous
> > > coherent memory and mapped through the PCIe Address Translation Unit
> > > (ATU). The ATU provides outbound windows that translate internal FDMA
> > > addresses to PCIe bus addresses, allowing the FDMA engine to read and
> > > write host memory. Because the ATU requires contiguous address regions,
> > > page_pool and normal per-page DMA mappings cannot be used. Instead,
> > > frames are transferred using memcpy between the ATU-mapped buffers and
> > > the network stack. With this, throughput increases from ~33 Mbps to ~620
> > > Mbps for default MTU.
> > >
> > > Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> > > contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> > > management and coherent DMA allocation with ATU mapping.
> > >
> > > Patches 3-5 refactor the lan966x FDMA code to support both platform and
> > > PCIe paths: extracting the LLP register write into a helper, exporting
> > > shared functions, and introducing an ops dispatch table selected at
> > > probe time.
> > >
> > > Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> > > contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> > > change and XDP support respectively.
> > >
> > > Patches 9-10 update the lan966x PCI device tree overlay to extend the
> > > cpu register mapping to cover the ATU register space and add the FDMA
> > > interrupt.
> > >
> >
> > Thanks a lot for the series taking care of DMA and ATU in PCIe variants.
> >
> > I have tested the whole series on both my ARM and x86 systems.
> >
> > With a simple wget on my x86 system, throughput went from 3.8 MB/s to
> > 11.2 MB/s, so the improvement is obvious.
> >
> > Tested-by: Herve Codina <herve.codina@bootlin.com>
> >
> 
> Hum, I think I found an issue.
> 
> If I remove the lan966x_pci module (modprobe -r lan966x_pci), and reload
> it (modprobe lan966x_pci), the board is not working.
> 
> The system sends DHCP requests. Those requests are answered by my PC (observed
> with Wireshark), but the system does not see the replies and keeps sending
> DHCP requests.
> 
> Looks like the lan966x_pci module removal leaves the board in a bad state.
> 
> Without the series applied, DHCP request answers from my PC are seen by the
> system after any module unloading / reloading.
> 
> Do you have any ideas of what could be wrong?
> 
> Best regards,
> Hervé

Thanks for testing this!

As part of my testing I did unload/load the lan966x_switch module to ensure the
ATU was properly reset and reconfigured, and that seemed to work fine. I must
admit, I did not try with the lan966x_pci module.

From what I hear, when you are in the bad state, TX is still working, so it's an
RX issue. Could be the interrupt is not firing, so the napi poll is not
scheduled.

Anyway, I will have a look at it during the week. Will let you know.

/Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
  2026-03-23 19:40     ` Daniel Machon
@ 2026-03-24  8:07       ` Herve Codina
  2026-03-26 15:48         ` Daniel Machon
  0 siblings, 1 reply; 18+ messages in thread
From: Herve Codina @ 2026-03-24  8:07 UTC (permalink / raw)
  To: Daniel Machon
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Arnd Bergmann,
	Greg Kroah-Hartman, netdev, linux-kernel, bpf

Hi Daniel,

On Mon, 23 Mar 2026 20:40:59 +0100
Daniel Machon <daniel.machon@microchip.com> wrote:

> Hi Hervé,
> 
> > Hi Daniel,
> > 
> > On Mon, 23 Mar 2026 15:52:04 +0100
> > Herve Codina <herve.codina@bootlin.com> wrote:
> >   
> > > Hi Daniel,
> > >
> > > On Fri, 20 Mar 2026 16:00:56 +0100
> > > Daniel Machon <daniel.machon@microchip.com> wrote:
> > >  
> > > > When lan966x operates as a PCIe endpoint, the driver currently uses
> > > > register-based I/O for frame injection and extraction. This approach is
> > > > functional but slow, topping out at around 33 Mbps on an Intel x86 host
> > > > with a lan966x PCIe card.
> > > >
> > > > This series adds FDMA (Frame DMA) support for the PCIe path. When
> > > > operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> > > > directly access host memory, so DMA buffers are allocated as contiguous
> > > > coherent memory and mapped through the PCIe Address Translation Unit
> > > > (ATU). The ATU provides outbound windows that translate internal FDMA
> > > > addresses to PCIe bus addresses, allowing the FDMA engine to read and
> > > > write host memory. Because the ATU requires contiguous address regions,
> > > > page_pool and normal per-page DMA mappings cannot be used. Instead,
> > > > frames are transferred using memcpy between the ATU-mapped buffers and
> > > > the network stack. With this, throughput increases from ~33 Mbps to ~620
> > > > Mbps for default MTU.
> > > >
> > > > Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> > > > contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> > > > management and coherent DMA allocation with ATU mapping.
> > > >
> > > > Patches 3-5 refactor the lan966x FDMA code to support both platform and
> > > > PCIe paths: extracting the LLP register write into a helper, exporting
> > > > shared functions, and introducing an ops dispatch table selected at
> > > > probe time.
> > > >
> > > > Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> > > > contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> > > > change and XDP support respectively.
> > > >
> > > > Patches 9-10 update the lan966x PCI device tree overlay to extend the
> > > > cpu register mapping to cover the ATU register space and add the FDMA
> > > > interrupt.
> > > >  
> > >
> > > Thanks a lot for the series taking care of DMA and ATU in PCIe variants.
> > >
> > > I have tested the whole series on both my ARM and x86 systems.
> > >
> > > With a simple wget on my x86 system, throughput went from 3.8 MB/s to
> > > 11.2 MB/s, so the improvement is obvious.
> > >
> > > Tested-by: Herve Codina <herve.codina@bootlin.com>
> > >  
> > 
> > Hum, I think I found an issue.
> > 
> > If I remove the lan966x_pci module (modprobe -r lan966x_pci), and reload
> > it (modprobe lan966x_pci), the board is not working.
> > 
> > The system sends DHCP requests. Those requests are answered by my PC (observed
> > with Wireshark), but the system does not see the replies and keeps sending
> > DHCP requests.
> > 
> > Looks like the lan966x_pci module removal leaves the board in a bad state.
> > 
> > Without the series applied, DHCP request answers from my PC are seen by the
> > system after any module unloading / reloading.
> > 
> > Do you have any ideas of what could be wrong?
> > 
> > Best regards,
> > Hervé  
> 
> Thanks for testing this!
> 
> As part of my testing I did unload/load the lan966x_switch module to ensure the
> ATU was properly reset and reconfigured, and that seemed to work fine. I must
> admit, I did not try with the lan966x_pci module.
> 
From what you describe, TX still works in the bad state, so it's an RX issue.
It could be that the interrupt is not firing, so the NAPI poll is never
scheduled.

Yes, confirmed that the issue is on the Rx path. Tx data were received by my PC.

> 
> Anyway, I will have a look at it during the week. Will let you know.
> 

Some more interesting information.

I tested lan966x_pci module unloading and re-loading on my ARM system.

On this system the following traces are present when I reload the lan966x_pci
module. Those traces were not present on my x86 system.

    [  104.715031] ------------[ cut here ]------------
    [  104.719746] Unexpected error: 64, error_type: 1073741824
    [  104.725217] WARNING: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:558 at lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch], CPU#0: swapper/0/0
    [  104.739119] Modules linked in: lan966x_pci irq_lan966x_oic reset_microchip_sparx5 pinctrl_ocelot lan966x_serdes mdio_mscc_miim lan966x_switch rtc_ds1307 marvell [last unloaded: lan966x_pci]
    [  104.756250] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc1-00010-gfa357f2a6a00 #565 PREEMPT 
    [  104.765743] Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
    [  104.773579] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [  104.780551] pc : lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch]
    [  104.787046] lr : lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch]
    [  104.793538] sp : ffff800082a7bc00
    [  104.796860] x29: ffff800082a7bc00 x28: ffff8000819abd00 x27: ffff800081a57757
    [  104.804033] x26: ffff8000819a5d88 x25: 0000000000012400 x24: ffff800081a57758
    [  104.811205] x23: 0000000000000038 x22: ffff00000f70ec00 x21: 0000000040000000
    [  104.818376] x20: ffff00000f7a8080 x19: 0000000000000040 x18: 00000000ffffffff
    [  104.825548] x17: ffff7fff9e7d6000 x16: ffff800082a78000 x15: ffff800102a7b837
    [  104.832719] x14: 0000000000000000 x13: 0000000000000000 x12: 3031203a65707974
    [  104.839891] x11: ffff8000819ca758 x10: 0000000000000018 x9 : ffff8000819ca758
    [  104.847062] x8 : 00000000ffffefff x7 : ffff800081a22758 x6 : 00000000fffff000
    [  104.854233] x5 : ffff00001fea1588 x4 : 0000000000000000 x3 : 0000000000000027
    [  104.861404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8000819abd00
    [  104.868576] Call trace:
    [  104.871034]  lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch] (P)
    [  104.877531]  __handle_irq_event_percpu+0xa0/0x4c4
    [  104.882261]  handle_irq_event+0x4c/0xf8
    [  104.886117]  handle_level_irq+0xec/0x17c
    [  104.890064]  handle_irq_desc+0x40/0x58
    [  104.893831]  generic_handle_domain_irq+0x18/0x24
    [  104.898467]  lan966x_oic_irq_handler_domain+0x64/0xb0 [irq_lan966x_oic]
    [  104.905105]  lan966x_oic_irq_handler+0x34/0xb4 [irq_lan966x_oic]
    [  104.911131]  handle_irq_desc+0x40/0x58
    [  104.914900]  generic_handle_domain_irq+0x18/0x24
    [  104.919536]  pci_dev_irq_handler+0x1c/0x30 [lan966x_pci]
    [  104.924871]  __handle_irq_event_percpu+0xa0/0x4c4
    [  104.929594]  handle_irq_event+0x4c/0xf8
    [  104.933449]  handle_level_irq+0xec/0x17c
    [  104.937394]  handle_irq_desc+0x40/0x58
    [  104.941162]  generic_handle_domain_irq+0x18/0x24
    [  104.945797]  advk_pcie_irq_handler+0x160/0x390
    [  104.950260]  __handle_irq_event_percpu+0xa0/0x4c4
    [  104.954983]  handle_irq_event+0x4c/0xf8
    [  104.958839]  handle_fasteoi_irq+0x108/0x20c
    [  104.963043]  handle_irq_desc+0x40/0x58
    [  104.966811]  generic_handle_domain_irq+0x18/0x24
    [  104.971447]  gic_handle_irq+0x4c/0x110
    [  104.975214]  call_on_irq_stack+0x30/0x48
    [  104.979156]  do_interrupt_handler+0x80/0x84
    [  104.983360]  el1_interrupt+0x3c/0x60
    [  104.986959]  el1h_64_irq_handler+0x18/0x24
    [  104.991076]  el1h_64_irq+0x6c/0x70
    [  104.994495]  default_idle_call+0x80/0x138 (P)
    [  104.998870]  do_idle+0x220/0x290
    [  105.002121]  cpu_startup_entry+0x34/0x3c
    [  105.006064]  rest_init+0xf8/0x188
    [  105.009398]  start_kernel+0x818/0x8ec
    [  105.013085]  __primary_switched+0x88/0x90
    [  105.017119] irq event stamp: 60740
    [  105.020529] hardirqs last  enabled at (60739): [<ffff800080fb37bc>] default_idle_call+0x7c/0x138
    [  105.029328] hardirqs last disabled at (60740): [<ffff800080fab5c4>] enter_from_kernel_mode+0x10/0x3c
    [  105.038477] softirqs last  enabled at (60728): [<ffff8000800cb774>] handle_softirqs+0x624/0x63c
    [  105.047193] softirqs last disabled at (60711): [<ffff8000800102d8>] __do_softirq+0x14/0x20
    [  105.055471] ---[ end trace 0000000000000000 ]---
    [  105.060274] ------------[ cut here ]------------
    [  105.064963] Unexpected error: 64, error_type: 1073741824
    [  105.070440] WARNING: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:558 at lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch], CPU#0: swapper/0/0
    ...
    [  105.443891] ------------[ cut here ]------------
    [  105.448536] Unexpected error: 64, error_type: 0
    [  105.453235] WARNING: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:558 at lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch], CPU#0: swapper/0/0
    ...

And after these traces, my ARM system doesn't see the reply to a ping command
answered by my PC (I don't use DHCP on my ARM system).

I don't know why those traces are not present on my x86 system (config, race
condition, or some other reason), but they may help provide some clues about
what's going on.

Of course, feel free to ask for more tests or anything else I can do to help
on this topic.

Best regards,
Hervé

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
  2026-03-24  8:07       ` Herve Codina
@ 2026-03-26 15:48         ` Daniel Machon
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Machon @ 2026-03-26 15:48 UTC (permalink / raw)
  To: Herve Codina
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Arnd Bergmann,
	Greg Kroah-Hartman, netdev, linux-kernel, bpf

Hi Hervé,

> Hi Daniel,
> 
> On Mon, 23 Mar 2026 20:40:59 +0100
> Daniel Machon <daniel.machon@microchip.com> wrote:
> 
> > Hi Hervé,
> >
> > > Hi Daniel,
> > >
> > > On Mon, 23 Mar 2026 15:52:04 +0100
> > > Herve Codina <herve.codina@bootlin.com> wrote:
> > >
> > > > Hi Daniel,
> > > >
> > > > On Fri, 20 Mar 2026 16:00:56 +0100
> > > > Daniel Machon <daniel.machon@microchip.com> wrote:
> > > >
> > > > > When lan966x operates as a PCIe endpoint, the driver currently uses
> > > > > register-based I/O for frame injection and extraction. This approach is
> > > > > functional but slow, topping out at around 33 Mbps on an Intel x86 host
> > > > > with a lan966x PCIe card.
> > > > >
> > > > > This series adds FDMA (Frame DMA) support for the PCIe path. When
> > > > > operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> > > > > directly access host memory, so DMA buffers are allocated as contiguous
> > > > > coherent memory and mapped through the PCIe Address Translation Unit
> > > > > (ATU). The ATU provides outbound windows that translate internal FDMA
> > > > > addresses to PCIe bus addresses, allowing the FDMA engine to read and
> > > > > write host memory. Because the ATU requires contiguous address regions,
> > > > > page_pool and normal per-page DMA mappings cannot be used. Instead,
> > > > > frames are transferred using memcpy between the ATU-mapped buffers and
> > > > > the network stack. With this, throughput increases from ~33 Mbps to ~620
> > > > > Mbps for default MTU.
> > > > >
> > > > > Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> > > > > contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> > > > > management and coherent DMA allocation with ATU mapping.
> > > > >
> > > > > Patches 3-5 refactor the lan966x FDMA code to support both platform and
> > > > > PCIe paths: extracting the LLP register write into a helper, exporting
> > > > > shared functions, and introducing an ops dispatch table selected at
> > > > > probe time.
> > > > >
> > > > > Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> > > > > contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> > > > > change and XDP support respectively.
> > > > >
> > > > > Patches 9-10 update the lan966x PCI device tree overlay to extend the
> > > > > cpu register mapping to cover the ATU register space and add the FDMA
> > > > > interrupt.
> > > > >
> > > >
> > > > Thanks a lot for the series taking care of DMA and ATU in PCIe variants.
> > > >
> > > > I have tested the whole series on both my ARM and x86 systems.
> > > >
> > > > Doing a simple wget on my x86 system, I moved from 3.8MB/s to 11.2MB/s and
> > > > so the improvement is obvious.
> > > >
> > > > Tested-by: Herve Codina <herve.codina@bootlin.com>
> > > >
> > >
> > > Hum, I think I found an issue.
> > >
> > > If I remove the lan966x_pci module (modprobe -r lan966x_pci), and reload
> > > it (modprobe lan966x_pci), the board is not working.
> > >
> > > The system performs DHCP requests. Those requests are served by my PC (observed
> > > with Wireshark) but the system doesn't see those answers. Indeed, he continues
> > > to perform DHCP requests.
> > >
> > > Looks like the lan966x_pci module removal leaves the board in a bad state.
> > >
> > > Without the series applied, DHCP request answers from my PC are seen by the
> > > system after any module unloading / reloading.
> > >
> > > Do you have any ideas of what could be wrong?
> > >
> > > Best regards,
> > > Hervé
> >
> > Thanks for testing this!
> >
> > As part of my testing I did unload/load the lan966x_switch module to ensure the
> > ATU was properly reset and reconfigured, and that seemed to work fine. I must
> > admit, I did not try with the lan966x_pci module.
> >
> > From what I hear, when you are in the bad state, TX is still working, so it's an
> > RX issue. It could be that the interrupt is not firing, so the NAPI poll is
> > not being scheduled.
> 
> Yes, confirmed that the issue is on Rx path. Tx data were received by my PC.
> 
> >
> > Anyway, I will have a look at it during the week. Will let you know.
> >
> 
> Some more interesting information.
> 
> I tested lan966x_pci module unloading and re-loading on my ARM system.
> 
> On this system the following traces are present when I reload the lan966x_pci
> module. Those traces were not present on my x86 system.
> 
>     [  104.715031] ------------[ cut here ]------------
>     [  104.719746] Unexpected error: 64, error_type: 1073741824
>     [  104.725217] WARNING: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:558 at lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch], CPU#0: swapper/0/0
>     [  104.739119] Modules linked in: lan966x_pci irq_lan966x_oic reset_microchip_sparx5 pinctrl_ocelot lan966x_serdes mdio_mscc_miim lan966x_switch rtc_ds1307 marvell [last unloaded: lan966x_pci]
>     [  104.756250] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc1-00010-gfa357f2a6a00 #565 PREEMPT
>     [  104.765743] Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
>     [  104.773579] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>     [  104.780551] pc : lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch]
>     [  104.787046] lr : lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch]
>     [  104.793538] sp : ffff800082a7bc00
>     [  104.796860] x29: ffff800082a7bc00 x28: ffff8000819abd00 x27: ffff800081a57757
>     [  104.804033] x26: ffff8000819a5d88 x25: 0000000000012400 x24: ffff800081a57758
>     [  104.811205] x23: 0000000000000038 x22: ffff00000f70ec00 x21: 0000000040000000
>     [  104.818376] x20: ffff00000f7a8080 x19: 0000000000000040 x18: 00000000ffffffff
>     [  104.825548] x17: ffff7fff9e7d6000 x16: ffff800082a78000 x15: ffff800102a7b837
>     [  104.832719] x14: 0000000000000000 x13: 0000000000000000 x12: 3031203a65707974
>     [  104.839891] x11: ffff8000819ca758 x10: 0000000000000018 x9 : ffff8000819ca758
>     [  104.847062] x8 : 00000000ffffefff x7 : ffff800081a22758 x6 : 00000000fffff000
>     [  104.854233] x5 : ffff00001fea1588 x4 : 0000000000000000 x3 : 0000000000000027
>     [  104.861404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8000819abd00
>     [  104.868576] Call trace:
>     [  104.871034]  lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch] (P)
>     [  104.877531]  __handle_irq_event_percpu+0xa0/0x4c4
>     [  104.882261]  handle_irq_event+0x4c/0xf8
>     [  104.886117]  handle_level_irq+0xec/0x17c
>     [  104.890064]  handle_irq_desc+0x40/0x58
>     [  104.893831]  generic_handle_domain_irq+0x18/0x24
>     [  104.898467]  lan966x_oic_irq_handler_domain+0x64/0xb0 [irq_lan966x_oic]
>     [  104.905105]  lan966x_oic_irq_handler+0x34/0xb4 [irq_lan966x_oic]
>     [  104.911131]  handle_irq_desc+0x40/0x58
>     [  104.914900]  generic_handle_domain_irq+0x18/0x24
>     [  104.919536]  pci_dev_irq_handler+0x1c/0x30 [lan966x_pci]
>     [  104.924871]  __handle_irq_event_percpu+0xa0/0x4c4
>     [  104.929594]  handle_irq_event+0x4c/0xf8
>     [  104.933449]  handle_level_irq+0xec/0x17c
>     [  104.937394]  handle_irq_desc+0x40/0x58
>     [  104.941162]  generic_handle_domain_irq+0x18/0x24
>     [  104.945797]  advk_pcie_irq_handler+0x160/0x390
>     [  104.950260]  __handle_irq_event_percpu+0xa0/0x4c4
>     [  104.954983]  handle_irq_event+0x4c/0xf8
>     [  104.958839]  handle_fasteoi_irq+0x108/0x20c
>     [  104.963043]  handle_irq_desc+0x40/0x58
>     [  104.966811]  generic_handle_domain_irq+0x18/0x24
>     [  104.971447]  gic_handle_irq+0x4c/0x110
>     [  104.975214]  call_on_irq_stack+0x30/0x48
>     [  104.979156]  do_interrupt_handler+0x80/0x84
>     [  104.983360]  el1_interrupt+0x3c/0x60
>     [  104.986959]  el1h_64_irq_handler+0x18/0x24
>     [  104.991076]  el1h_64_irq+0x6c/0x70
>     [  104.994495]  default_idle_call+0x80/0x138 (P)
>     [  104.998870]  do_idle+0x220/0x290
>     [  105.002121]  cpu_startup_entry+0x34/0x3c
>     [  105.006064]  rest_init+0xf8/0x188
>     [  105.009398]  start_kernel+0x818/0x8ec
>     [  105.013085]  __primary_switched+0x88/0x90
>     [  105.017119] irq event stamp: 60740
>     [  105.020529] hardirqs last  enabled at (60739): [<ffff800080fb37bc>] default_idle_call+0x7c/0x138
>     [  105.029328] hardirqs last disabled at (60740): [<ffff800080fab5c4>] enter_from_kernel_mode+0x10/0x3c
>     [  105.038477] softirqs last  enabled at (60728): [<ffff8000800cb774>] handle_softirqs+0x624/0x63c
>     [  105.047193] softirqs last disabled at (60711): [<ffff8000800102d8>] __do_softirq+0x14/0x20
>     [  105.055471] ---[ end trace 0000000000000000 ]---
>     [  105.060274] ------------[ cut here ]------------
>     [  105.064963] Unexpected error: 64, error_type: 1073741824
>     [  105.070440] WARNING: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:558 at lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch], CPU#0: swapper/0/0
>     ...
>     [  105.443891] ------------[ cut here ]------------
>     [  105.448536] Unexpected error: 64, error_type: 0
>     [  105.453235] WARNING: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:558 at lan966x_fdma_irq_handler+0xe0/0x12c [lan966x_switch], CPU#0: swapper/0/0
>     ...
> 
> And after them, my ARM system doesn't see the reply to a ping command
> answered by my PC (I don't use DHCP on my ARM system).
> 
> I don't know why those traces are not present on my x86 system (config, race
> condition, or another reason), but they may help to find some clues
> about what's going on.
> 
> Of course, feel free to ask me to run more tests or anything else I can do
> to help on this topic.
> 
> Best regards,
> Hervé

As I recall, doing rmmod on lan966x_switch followed by modprobe
lan966x_switch works fine. This is because neither the switch core nor the FDMA
engine is reset, so they remain in sync.

When the lan966x_pci module is removed and reloaded (as you did), the DT
overlay is re-applied, which causes the reset controller
(reset-microchip-sparx5) to re-probe. During probe, it performs a GCB soft reset
that resets the switch core, but protects the CPU domain from the reset. The
FDMA engine is part of the CPU domain, so it is not reset.

This leaves the switch core in a reset state while the FDMA
retains state from the previous driver instance. When the switch driver
subsequently probes and activates the FDMA channels, the two are out of
sync, and the FDMA immediately reports extraction errors.

There's actually an FDMA register called NRESET that resets the FDMA controller
state. Toggling it in the FDMA init path makes traffic work correctly after a
lan966x_pci reload, but it does not get rid of the FDMA splats you posted above.
They get queued up between the switch core reset (in the reset controller) and
the FDMA being enabled. I tried different approaches to drain or flush the
queues, but the splats won't go away entirely.

The only thing that seems to work consistently is to *not* do the soft reset in
the reset controller for the PCI path. The soft reset is actually the problem:
it only resets the switch core while protecting the CPU domain (including FDMA),
causing a desync.

A simple fix could be (in reset-microchip-sparx5.c):

+static bool mchp_reset_is_pci(struct device *dev)
+{
+	for (dev = dev->parent; dev; dev = dev->parent) {
+		if (dev_is_pci(dev))
+			return true;
+	}
+	return false;
+}

-	/* Issue the reset very early, our actual reset callback is a noop. */
-	err = sparx5_switch_reset(ctx);
-	if (err)
-		return err;
+	/* Issue the reset very early, our actual reset callback is a noop.
+	 *
+	 * On the PCI path, skip the reset. The endpoint is already in
+	 * power-on reset state on the first probe. On subsequent probes
+	 * (after driver reload), resetting the switch core while the FDMA
+	 * retains state (CPU domain is protected from the soft reset)
+	 * causes the two to go out of sync, leading to FDMA extraction
+	 * errors.
+	 */
+	if (!mchp_reset_is_pci(&pdev->dev)) {
+		err = sparx5_switch_reset(ctx);
+		if (err)
+			return err;
+	}

Could you test it and see if it fixes the problem on your side?

/Daniel


end of thread, other threads:[~2026-03-26 15:49 UTC | newest]

Thread overview: 18+ messages
2026-03-20 15:00 [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Daniel Machon
2026-03-20 15:00 ` [PATCH net-next 01/10] net: microchip: fdma: rename contiguous dataptr helpers Daniel Machon
2026-03-20 15:00 ` [PATCH net-next 02/10] net: microchip: fdma: add PCIe ATU support Daniel Machon
2026-03-20 15:00 ` [PATCH net-next 03/10] net: lan966x: add FDMA LLP register write helper Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 04/10] net: lan966x: export FDMA helpers for reuse Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 05/10] net: lan966x: add FDMA ops dispatch for PCIe support Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 06/10] net: lan966x: add PCIe FDMA support Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 07/10] net: lan966x: add PCIe FDMA MTU change support Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 08/10] net: lan966x: add PCIe FDMA XDP support Daniel Machon
2026-03-22  7:11   ` Mohsin Bashir
2026-03-22 20:30     ` Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 09/10] misc: lan966x-pci: dts: extend cpu reg to cover PCIE DBI space Daniel Machon
2026-03-20 15:01 ` [PATCH net-next 10/10] misc: lan966x-pci: dts: add fdma interrupt to overlay Daniel Machon
2026-03-23 14:52 ` [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA Herve Codina
2026-03-23 16:26   ` Herve Codina
2026-03-23 19:40     ` Daniel Machon
2026-03-24  8:07       ` Herve Codina
2026-03-26 15:48         ` Daniel Machon
