netdev.vger.kernel.org archive mirror
* [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
@ 2025-11-19 13:53 Paolo Valerio
  2025-11-19 13:53 ` [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support Paolo Valerio
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Testing was performed on a Raspberry Pi 5 with an upstream kernel,
and all the changes are intended for gem only.

The series consists of two main changes:

- Migration from netdev_alloc_skb() to page pool allocation,
  enabling skb recycling.
  This also adds support for multi-descriptor frame reception,
  removing the previous single-descriptor approach and avoiding
  potentially large contiguous allocations, e.g. for jumbo frames
  with CONFIG_PAGE_SIZE_4KB.

- XDP support: a complete XDP implementation supporting all major
  verdicts (XDP_PASS, XDP_DROP, XDP_REDIRECT, XDP_TX), along with
  the ndo_xdp_xmit function for packet redirection.

The driver now advertises the NETDEV_XDP_ACT_BASIC, NETDEV_XDP_ACT_REDIRECT,
and NETDEV_XDP_ACT_NDO_XMIT capabilities.
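
As an illustration only (not part of the series), a minimal XDP program
such as the sketch below can be used to exercise the XDP_PASS and
XDP_DROP verdicts on gem. It assumes a libbpf toolchain (clang -target
bpf) and can be attached with e.g. "ip link set dev <iface> xdp obj
<file> sec xdp":

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Drop IPv4 UDP, pass everything else. */
SEC("xdp")
int gem_xdp_test(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph = data + sizeof(*eth);

	/* Single bounds check covering both headers */
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	return iph->protocol == IPPROTO_UDP ? XDP_DROP : XDP_PASS;
}

char _license[] SEC("license") = "GPL";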

Paolo Valerio (6):
  cadence: macb/gem: Add page pool support
  cadence: macb/gem: handle multi-descriptor frame reception
  cadence: macb/gem: use the current queue number for stats
  cadence: macb/gem: add XDP support for gem
  cadence: macb/gem: make tx path skb agnostic
  cadence: macb/gem: introduce xmit support

 drivers/net/ethernet/cadence/Kconfig     |   1 +
 drivers/net/ethernet/cadence/macb.h      |  42 +-
 drivers/net/ethernet/cadence/macb_main.c | 680 ++++++++++++++++++-----
 3 files changed, 567 insertions(+), 156 deletions(-)

-- 
2.51.1


* [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
@ 2025-11-19 13:53 ` Paolo Valerio
  2025-11-27 13:21   ` Théo Lebrun
  2025-11-19 13:53 ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Use the page pool allocator for the data buffers and enable skb recycling,
instead of relying on netdev_alloc_skb() to allocate the entire skb during
refill.

The change replaces the rx_skbuff array with an rx_buff array that stores
page-pool-allocated buffers instead of pre-allocated skbs. The DMA mapping,
rx, and refill paths were updated accordingly.
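
For reference, a condensed sketch (illustrative names, not the ones added
by the patch) of the refill pattern the diff below implements; the pool is
created with PP_FLAG_DMA_MAP, so pages come back pre-mapped and refill only
needs an allocation plus the DMA address the pool already recorded:

static void *pp_get_rx_buff(struct page_pool *pool, dma_addr_t *dma_addr,
			    gfp_t gfp_mask)
{
	struct page *page;

	/* No dma_map_single() on the hot path: the pool maps pages once */
	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
	if (!page)
		return NULL;

	*dma_addr = page_pool_get_dma_addr(page) + XDP_PACKET_HEADROOM;
	return page_address(page);
}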

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
---
 drivers/net/ethernet/cadence/Kconfig     |   1 +
 drivers/net/ethernet/cadence/macb.h      |  10 +-
 drivers/net/ethernet/cadence/macb_main.c | 169 +++++++++++++++--------
 3 files changed, 121 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index 5b2a461dfd28..ae500f717433 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -25,6 +25,7 @@ config MACB
 	depends on PTP_1588_CLOCK_OPTIONAL
 	select PHYLINK
 	select CRC32
+	select PAGE_POOL
 	help
 	  The Cadence MACB ethernet interface is found on many Atmel AT32 and
 	  AT91 parts.  This driver also supports the Cadence GEM (Gigabit
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 87414a2ddf6e..dcf768bd1bc1 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -14,6 +14,8 @@
 #include <linux/interrupt.h>
 #include <linux/phy/phy.h>
 #include <linux/workqueue.h>
+#include <net/page_pool/helpers.h>
+#include <net/xdp.h>
 
 #define MACB_GREGS_NBR 16
 #define MACB_GREGS_VERSION 2
@@ -957,6 +959,10 @@ struct macb_dma_desc_ptp {
 /* Scaled PPM fraction */
 #define PPM_FRACTION	16
 
+/* The buf includes headroom compatible with both skb and xdpf */
+#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
+#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
+
 /* struct macb_tx_skb - data about an skb which is being transmitted
  * @skb: skb currently being transmitted, only set for the last buffer
  *       of the frame
@@ -1262,10 +1268,11 @@ struct macb_queue {
 	unsigned int		rx_tail;
 	unsigned int		rx_prepared_head;
 	struct macb_dma_desc	*rx_ring;
-	struct sk_buff		**rx_skbuff;
+	void			**rx_buff;
 	void			*rx_buffers;
 	struct napi_struct	napi_rx;
 	struct queue_stats stats;
+	struct page_pool	*page_pool;
 };
 
 struct ethtool_rx_fs_item {
@@ -1289,6 +1296,7 @@ struct macb {
 	struct macb_dma_desc	*rx_ring_tieoff;
 	dma_addr_t		rx_ring_tieoff_dma;
 	size_t			rx_buffer_size;
+	u16			rx_offset;
 
 	unsigned int		rx_ring_size;
 	unsigned int		tx_ring_size;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e461f5072884..985c81913ba6 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1250,11 +1250,28 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 	return packets;
 }
 
-static void gem_rx_refill(struct macb_queue *queue)
+static void *gem_page_pool_get_buff(struct page_pool *pool,
+				    dma_addr_t *dma_addr, gfp_t gfp_mask)
+{
+	struct page *page;
+
+	if (!pool)
+		return NULL;
+
+	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
+	if (!page)
+		return NULL;
+
+	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
+
+	return page_address(page);
+}
+
+static void gem_rx_refill(struct macb_queue *queue, bool napi)
 {
 	unsigned int		entry;
-	struct sk_buff		*skb;
 	dma_addr_t		paddr;
+	void			*data;
 	struct macb *bp = queue->bp;
 	struct macb_dma_desc *desc;
 
@@ -1267,25 +1284,17 @@ static void gem_rx_refill(struct macb_queue *queue)
 
 		desc = macb_rx_desc(queue, entry);
 
-		if (!queue->rx_skbuff[entry]) {
+		if (!queue->rx_buff[entry]) {
 			/* allocate sk_buff for this free entry in ring */
-			skb = netdev_alloc_skb(bp->dev, bp->rx_buffer_size);
-			if (unlikely(!skb)) {
+			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
+						      napi ? GFP_ATOMIC : GFP_KERNEL);
+			if (unlikely(!data)) {
 				netdev_err(bp->dev,
-					   "Unable to allocate sk_buff\n");
+					   "Unable to allocate page\n");
 				break;
 			}
 
-			/* now fill corresponding descriptor entry */
-			paddr = dma_map_single(&bp->pdev->dev, skb->data,
-					       bp->rx_buffer_size,
-					       DMA_FROM_DEVICE);
-			if (dma_mapping_error(&bp->pdev->dev, paddr)) {
-				dev_kfree_skb(skb);
-				break;
-			}
-
-			queue->rx_skbuff[entry] = skb;
+			queue->rx_buff[entry] = data;
 
 			if (entry == bp->rx_ring_size - 1)
 				paddr |= MACB_BIT(RX_WRAP);
@@ -1295,20 +1304,6 @@ static void gem_rx_refill(struct macb_queue *queue)
 			 */
 			dma_wmb();
 			macb_set_addr(bp, desc, paddr);
-
-			/* Properly align Ethernet header.
-			 *
-			 * Hardware can add dummy bytes if asked using the RBOF
-			 * field inside the NCFGR register. That feature isn't
-			 * available if hardware is RSC capable.
-			 *
-			 * We cannot fallback to doing the 2-byte shift before
-			 * DMA mapping because the address field does not allow
-			 * setting the low 2/3 bits.
-			 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
-			 */
-			if (!(bp->caps & MACB_CAPS_RSC))
-				skb_reserve(skb, NET_IP_ALIGN);
 		} else {
 			desc->ctrl = 0;
 			dma_wmb();
@@ -1349,12 +1344,16 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		  int budget)
 {
 	struct macb *bp = queue->bp;
+	int			buffer_size;
 	unsigned int		len;
 	unsigned int		entry;
+	void			*data;
 	struct sk_buff		*skb;
 	struct macb_dma_desc	*desc;
 	int			count = 0;
 
+	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
+
 	while (count < budget) {
 		u32 ctrl;
 		dma_addr_t addr;
@@ -1387,24 +1386,49 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 			queue->stats.rx_dropped++;
 			break;
 		}
-		skb = queue->rx_skbuff[entry];
-		if (unlikely(!skb)) {
+		data = queue->rx_buff[entry];
+		if (unlikely(!data)) {
 			netdev_err(bp->dev,
 				   "inconsistent Rx descriptor chain\n");
 			bp->dev->stats.rx_dropped++;
 			queue->stats.rx_dropped++;
 			break;
 		}
+
+		skb = napi_build_skb(data, buffer_size);
+		if (unlikely(!skb)) {
+			netdev_err(bp->dev,
+				   "Unable to allocate sk_buff\n");
+			page_pool_put_full_page(queue->page_pool,
+						virt_to_head_page(data),
+						false);
+			break;
+		}
+
+		/* Properly align Ethernet header.
+		 *
+		 * Hardware can add dummy bytes if asked using the RBOF
+		 * field inside the NCFGR register. That feature isn't
+		 * available if hardware is RSC capable.
+		 *
+		 * We cannot fallback to doing the 2-byte shift before
+		 * DMA mapping because the address field does not allow
+		 * setting the low 2/3 bits.
+		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
+		 */
+		skb_reserve(skb, bp->rx_offset);
+		skb_mark_for_recycle(skb);
+
 		/* now everything is ready for receiving packet */
-		queue->rx_skbuff[entry] = NULL;
+		queue->rx_buff[entry] = NULL;
 		len = ctrl & bp->rx_frm_len_mask;
 
 		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
 
+		dma_sync_single_for_cpu(&bp->pdev->dev,
+					addr, len,
+					page_pool_get_dma_dir(queue->page_pool));
 		skb_put(skb, len);
-		dma_unmap_single(&bp->pdev->dev, addr,
-				 bp->rx_buffer_size, DMA_FROM_DEVICE);
-
 		skb->protocol = eth_type_trans(skb, bp->dev);
 		skb_checksum_none_assert(skb);
 		if (bp->dev->features & NETIF_F_RXCSUM &&
@@ -1431,7 +1455,7 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		napi_gro_receive(napi, skb);
 	}
 
-	gem_rx_refill(queue);
+	gem_rx_refill(queue, true);
 
 	return count;
 }
@@ -2387,34 +2411,30 @@ static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
 
 static void gem_free_rx_buffers(struct macb *bp)
 {
-	struct sk_buff		*skb;
-	struct macb_dma_desc	*desc;
 	struct macb_queue *queue;
-	dma_addr_t		addr;
 	unsigned int q;
+	void *data;
 	int i;
 
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
-		if (!queue->rx_skbuff)
+		if (!queue->rx_buff)
 			continue;
 
 		for (i = 0; i < bp->rx_ring_size; i++) {
-			skb = queue->rx_skbuff[i];
-
-			if (!skb)
+			data = queue->rx_buff[i];
+			if (!data)
 				continue;
 
-			desc = macb_rx_desc(queue, i);
-			addr = macb_get_addr(bp, desc);
-
-			dma_unmap_single(&bp->pdev->dev, addr, bp->rx_buffer_size,
-					DMA_FROM_DEVICE);
-			dev_kfree_skb_any(skb);
-			skb = NULL;
+			page_pool_put_full_page(queue->page_pool,
+						virt_to_head_page(data),
+						false);
+			queue->rx_buff[i] = NULL;
 		}
 
-		kfree(queue->rx_skbuff);
-		queue->rx_skbuff = NULL;
+		kfree(queue->rx_buff);
+		queue->rx_buff = NULL;
+		page_pool_destroy(queue->page_pool);
+		queue->page_pool = NULL;
 	}
 }
 
@@ -2477,13 +2497,13 @@ static int gem_alloc_rx_buffers(struct macb *bp)
 
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		size = bp->rx_ring_size * sizeof(struct sk_buff *);
-		queue->rx_skbuff = kzalloc(size, GFP_KERNEL);
-		if (!queue->rx_skbuff)
+		queue->rx_buff = kzalloc(size, GFP_KERNEL);
+		if (!queue->rx_buff)
 			return -ENOMEM;
 		else
 			netdev_dbg(bp->dev,
-				   "Allocated %d RX struct sk_buff entries at %p\n",
-				   bp->rx_ring_size, queue->rx_skbuff);
+				   "Allocated %d RX buff entries at %p\n",
+				   bp->rx_ring_size, queue->rx_buff);
 	}
 	return 0;
 }
@@ -2567,6 +2587,32 @@ static int macb_alloc_consistent(struct macb *bp)
 	return -ENOMEM;
 }
 
+static void gem_create_page_pool(struct macb_queue *queue)
+{
+	unsigned int num_pages = DIV_ROUND_UP(queue->bp->rx_buffer_size, PAGE_SIZE);
+	struct macb *bp = queue->bp;
+	struct page_pool_params pp_params = {
+		.order = order_base_2(num_pages),
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.pool_size = queue->bp->rx_ring_size,
+		.nid = NUMA_NO_NODE,
+		.dma_dir = DMA_FROM_DEVICE,
+		.dev = &queue->bp->pdev->dev,
+		.napi = &queue->napi_rx,
+		.offset = bp->rx_offset,
+		.max_len = MACB_PP_MAX_BUF_SIZE(num_pages),
+	};
+	struct page_pool *pool;
+
+	pool = page_pool_create(&pp_params);
+	if (IS_ERR(pool)) {
+		netdev_err(queue->bp->dev, "cannot create rx page pool\n");
+		pool = NULL;
+	}
+
+	queue->page_pool = pool;
+}
+
 static void macb_init_tieoff(struct macb *bp)
 {
 	struct macb_dma_desc *desc = bp->rx_ring_tieoff;
@@ -2600,7 +2646,8 @@ static void gem_init_rings(struct macb *bp)
 		queue->rx_tail = 0;
 		queue->rx_prepared_head = 0;
 
-		gem_rx_refill(queue);
+		gem_create_page_pool(queue);
+		gem_rx_refill(queue, false);
 	}
 
 	macb_init_tieoff(bp);
@@ -5604,6 +5651,12 @@ static int macb_probe(struct platform_device *pdev)
 	if (err)
 		goto err_out_phy_exit;
 
+	if (macb_is_gem(bp)) {
+		bp->rx_offset = MACB_PP_HEADROOM;
+		if (!(bp->caps & MACB_CAPS_RSC))
+			bp->rx_offset += NET_IP_ALIGN;
+	}
+
 	netif_carrier_off(dev);
 
 	err = register_netdev(dev);
-- 
2.51.1


* [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
  2025-11-19 13:53 ` [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support Paolo Valerio
@ 2025-11-19 13:53 ` Paolo Valerio
  2025-11-27 13:38   ` Théo Lebrun
  2025-11-19 13:53 ` [PATCH RFC net-next 3/6] cadence: macb/gem: use the current queue number for stats Paolo Valerio
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Add support for receiving network frames that span multiple DMA
descriptors in the Cadence MACB/GEM Ethernet driver.

The patch removes the restriction that limited frame reception to a
single descriptor (RX_SOF && RX_EOF), which also avoids potentially
large contiguous multi-page allocations for large frames. It also uses
page pool fragment allocation instead of full pages to save memory.
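
For reference, a sketch (illustrative names) of how a continuation
descriptor is attached to the frame started at the RX_SOF descriptor,
matching the skb_add_rx_frag() call in the diff below; "data" is the
fragment returned by page_pool_alloc_frag() and "headroom" corresponds
to bp->rx_offset:

static bool rx_add_frag(struct sk_buff *skb, void *data, int data_len,
			unsigned int truesize, unsigned int headroom)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	struct page *page = virt_to_head_page(data);

	if (shinfo->nr_frags >= ARRAY_SIZE(shinfo->frags))
		return false;	/* caller drops the frame */

	skb_add_rx_frag(skb, shinfo->nr_frags, page,
			data - page_address(page) + headroom,
			data_len, truesize);
	return true;
}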

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
---
 drivers/net/ethernet/cadence/macb.h      |   4 +-
 drivers/net/ethernet/cadence/macb_main.c | 180 ++++++++++++++---------
 2 files changed, 111 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index dcf768bd1bc1..e2f397b7a27f 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -960,8 +960,7 @@ struct macb_dma_desc_ptp {
 #define PPM_FRACTION	16
 
 /* The buf includes headroom compatible with both skb and xdpf */
-#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
-#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
+#define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
 
 /* struct macb_tx_skb - data about an skb which is being transmitted
  * @skb: skb currently being transmitted, only set for the last buffer
@@ -1273,6 +1272,7 @@ struct macb_queue {
 	struct napi_struct	napi_rx;
 	struct queue_stats stats;
 	struct page_pool	*page_pool;
+	struct sk_buff		*skb;
 };
 
 struct ethtool_rx_fs_item {
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 985c81913ba6..be0c8e101639 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1250,21 +1250,25 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 	return packets;
 }
 
-static void *gem_page_pool_get_buff(struct page_pool *pool,
+static void *gem_page_pool_get_buff(struct  macb_queue *queue,
 				    dma_addr_t *dma_addr, gfp_t gfp_mask)
 {
+	struct macb *bp = queue->bp;
 	struct page *page;
+	int offset;
 
-	if (!pool)
+	if (!queue->page_pool)
 		return NULL;
 
-	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
+	page = page_pool_alloc_frag(queue->page_pool, &offset,
+				    bp->rx_buffer_size,
+				    gfp_mask | __GFP_NOWARN);
 	if (!page)
 		return NULL;
 
-	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
+	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM + offset;
 
-	return page_address(page);
+	return page_address(page) + offset;
 }
 
 static void gem_rx_refill(struct macb_queue *queue, bool napi)
@@ -1286,7 +1290,7 @@ static void gem_rx_refill(struct macb_queue *queue, bool napi)
 
 		if (!queue->rx_buff[entry]) {
 			/* allocate sk_buff for this free entry in ring */
-			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
+			data = gem_page_pool_get_buff(queue, &paddr,
 						      napi ? GFP_ATOMIC : GFP_KERNEL);
 			if (unlikely(!data)) {
 				netdev_err(bp->dev,
@@ -1344,20 +1348,17 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		  int budget)
 {
 	struct macb *bp = queue->bp;
-	int			buffer_size;
 	unsigned int		len;
 	unsigned int		entry;
 	void			*data;
-	struct sk_buff		*skb;
 	struct macb_dma_desc	*desc;
+	int			data_len;
 	int			count = 0;
 
-	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
-
 	while (count < budget) {
 		u32 ctrl;
 		dma_addr_t addr;
-		bool rxused;
+		bool rxused, first_frame;
 
 		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
 		desc = macb_rx_desc(queue, entry);
@@ -1368,6 +1369,9 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
 		addr = macb_get_addr(bp, desc);
 
+		dma_sync_single_for_cpu(&bp->pdev->dev,
+					addr, bp->rx_buffer_size - bp->rx_offset,
+					page_pool_get_dma_dir(queue->page_pool));
 		if (!rxused)
 			break;
 
@@ -1379,13 +1383,6 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		queue->rx_tail++;
 		count++;
 
-		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
-			netdev_err(bp->dev,
-				   "not whole frame pointed by descriptor\n");
-			bp->dev->stats.rx_dropped++;
-			queue->stats.rx_dropped++;
-			break;
-		}
 		data = queue->rx_buff[entry];
 		if (unlikely(!data)) {
 			netdev_err(bp->dev,
@@ -1395,64 +1392,102 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 			break;
 		}
 
-		skb = napi_build_skb(data, buffer_size);
-		if (unlikely(!skb)) {
-			netdev_err(bp->dev,
-				   "Unable to allocate sk_buff\n");
-			page_pool_put_full_page(queue->page_pool,
-						virt_to_head_page(data),
-						false);
-			break;
+		first_frame = ctrl & MACB_BIT(RX_SOF);
+		len = ctrl & bp->rx_frm_len_mask;
+
+		if (len) {
+			data_len = len;
+			if (!first_frame)
+				data_len -= queue->skb->len;
+		} else {
+			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
 		}
 
-		/* Properly align Ethernet header.
-		 *
-		 * Hardware can add dummy bytes if asked using the RBOF
-		 * field inside the NCFGR register. That feature isn't
-		 * available if hardware is RSC capable.
-		 *
-		 * We cannot fallback to doing the 2-byte shift before
-		 * DMA mapping because the address field does not allow
-		 * setting the low 2/3 bits.
-		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
-		 */
-		skb_reserve(skb, bp->rx_offset);
-		skb_mark_for_recycle(skb);
+		if (first_frame) {
+			queue->skb = napi_build_skb(data, bp->rx_buffer_size);
+			if (unlikely(!queue->skb)) {
+				netdev_err(bp->dev,
+					   "Unable to allocate sk_buff\n");
+				goto free_frags;
+			}
+
+			/* Properly align Ethernet header.
+			 *
+			 * Hardware can add dummy bytes if asked using the RBOF
+			 * field inside the NCFGR register. That feature isn't
+			 * available if hardware is RSC capable.
+			 *
+			 * We cannot fallback to doing the 2-byte shift before
+			 * DMA mapping because the address field does not allow
+			 * setting the low 2/3 bits.
+			 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
+			 */
+			skb_reserve(queue->skb, bp->rx_offset);
+			skb_mark_for_recycle(queue->skb);
+			skb_put(queue->skb, data_len);
+			queue->skb->protocol = eth_type_trans(queue->skb, bp->dev);
+
+			skb_checksum_none_assert(queue->skb);
+			if (bp->dev->features & NETIF_F_RXCSUM &&
+			    !(bp->dev->flags & IFF_PROMISC) &&
+			    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
+				queue->skb->ip_summed = CHECKSUM_UNNECESSARY;
+		} else {
+			if (!queue->skb) {
+				netdev_err(bp->dev,
+					   "Received non-starting frame while expecting it\n");
+				goto free_frags;
+			}
+
+			struct skb_shared_info *shinfo = skb_shinfo(queue->skb);
+			struct page *page = virt_to_head_page(data);
+			int nr_frags = shinfo->nr_frags;
+
+			if (nr_frags >= ARRAY_SIZE(shinfo->frags))
+				goto free_frags;
+
+			skb_add_rx_frag(queue->skb, nr_frags, page,
+					data - page_address(page) + bp->rx_offset,
+					data_len, bp->rx_buffer_size);
+		}
 
 		/* now everything is ready for receiving packet */
 		queue->rx_buff[entry] = NULL;
-		len = ctrl & bp->rx_frm_len_mask;
-
-		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
 
-		dma_sync_single_for_cpu(&bp->pdev->dev,
-					addr, len,
-					page_pool_get_dma_dir(queue->page_pool));
-		skb_put(skb, len);
-		skb->protocol = eth_type_trans(skb, bp->dev);
-		skb_checksum_none_assert(skb);
-		if (bp->dev->features & NETIF_F_RXCSUM &&
-		    !(bp->dev->flags & IFF_PROMISC) &&
-		    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
-			skb->ip_summed = CHECKSUM_UNNECESSARY;
+		netdev_vdbg(bp->dev, "%s %u (len %u)\n", __func__, entry, data_len);
 
-		bp->dev->stats.rx_packets++;
-		queue->stats.rx_packets++;
-		bp->dev->stats.rx_bytes += skb->len;
-		queue->stats.rx_bytes += skb->len;
+		if (ctrl & MACB_BIT(RX_EOF)) {
+			bp->dev->stats.rx_packets++;
+			queue->stats.rx_packets++;
+			bp->dev->stats.rx_bytes += queue->skb->len;
+			queue->stats.rx_bytes += queue->skb->len;
 
-		gem_ptp_do_rxstamp(bp, skb, desc);
+			gem_ptp_do_rxstamp(bp, queue->skb, desc);
 
 #if defined(DEBUG) && defined(VERBOSE_DEBUG)
-		netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
-			    skb->len, skb->csum);
-		print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
-			       skb_mac_header(skb), 16, true);
-		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
-			       skb->data, 32, true);
+			netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
+				    queue->skb->len, queue->skb->csum);
+			print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
+				       skb_mac_header(queue->skb), 16, true);
+			print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
+				       queue->skb->data, 32, true);
 #endif
 
-		napi_gro_receive(napi, skb);
+			napi_gro_receive(napi, queue->skb);
+			queue->skb = NULL;
+		}
+
+		continue;
+
+free_frags:
+		if (queue->skb) {
+			dev_kfree_skb(queue->skb);
+			queue->skb = NULL;
+		} else {
+			page_pool_put_full_page(queue->page_pool,
+						virt_to_head_page(data),
+						false);
+		}
 	}
 
 	gem_rx_refill(queue, true);
@@ -2394,7 +2429,10 @@ static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
 	if (!macb_is_gem(bp)) {
 		bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
 	} else {
-		bp->rx_buffer_size = size;
+		bp->rx_buffer_size = size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
+			+ MACB_PP_HEADROOM;
+		if (bp->rx_buffer_size > PAGE_SIZE)
+			bp->rx_buffer_size = PAGE_SIZE;
 
 		if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
 			netdev_dbg(bp->dev,
@@ -2589,18 +2627,15 @@ static int macb_alloc_consistent(struct macb *bp)
 
 static void gem_create_page_pool(struct macb_queue *queue)
 {
-	unsigned int num_pages = DIV_ROUND_UP(queue->bp->rx_buffer_size, PAGE_SIZE);
-	struct macb *bp = queue->bp;
 	struct page_pool_params pp_params = {
-		.order = order_base_2(num_pages),
+		.order = 0,
 		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.pool_size = queue->bp->rx_ring_size,
 		.nid = NUMA_NO_NODE,
 		.dma_dir = DMA_FROM_DEVICE,
 		.dev = &queue->bp->pdev->dev,
 		.napi = &queue->napi_rx,
-		.offset = bp->rx_offset,
-		.max_len = MACB_PP_MAX_BUF_SIZE(num_pages),
+		.max_len = PAGE_SIZE,
 	};
 	struct page_pool *pool;
 
@@ -2784,8 +2819,9 @@ static void macb_configure_dma(struct macb *bp)
 	unsigned int q;
 	u32 dmacfg;
 
-	buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
 	if (macb_is_gem(bp)) {
+		buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
+		buffer_size /= RX_BUFFER_MULTIPLE;
 		dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L);
 		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 			if (q)
@@ -2816,6 +2852,8 @@ static void macb_configure_dma(struct macb *bp)
 		netdev_dbg(bp->dev, "Cadence configure DMA with 0x%08x\n",
 			   dmacfg);
 		gem_writel(bp, DMACFG, dmacfg);
+	} else {
+		buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
 	}
 }
 
-- 
2.51.1


* [PATCH RFC net-next 3/6] cadence: macb/gem: use the current queue number for stats
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
  2025-11-19 13:53 ` [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support Paolo Valerio
  2025-11-19 13:53 ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
@ 2025-11-19 13:53 ` Paolo Valerio
  2025-11-19 13:53 ` [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem Paolo Valerio
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

gem_get_ethtool_stats() always sizes the statistics data to copy based
on the maximum number of queues.

The patch makes sure the statistics are copied only for the active
queues, matching the count returned by the string set count op.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
---
This is not related to XDP; the issue was spotted while introducing
page pool ethtool stats support, where it resulted in page pool stats
pollution. The page pool stats support patch was later dropped from the
series once it was realized that the interface is deprecated.
---
 drivers/net/ethernet/cadence/macb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index be0c8e101639..5829c1f773dd 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -3212,7 +3212,7 @@ static void gem_get_ethtool_stats(struct net_device *dev,
 	spin_lock_irq(&bp->stats_lock);
 	gem_update_stats(bp);
 	memcpy(data, &bp->ethtool_stats, sizeof(u64)
-			* (GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES));
+			* (GEM_STATS_LEN + QUEUE_STATS_LEN * bp->num_queues));
 	spin_unlock_irq(&bp->stats_lock);
 }
 
-- 
2.51.1


* [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
                   ` (2 preceding siblings ...)
  2025-11-19 13:53 ` [PATCH RFC net-next 3/6] cadence: macb/gem: use the current queue number for stats Paolo Valerio
@ 2025-11-19 13:53 ` Paolo Valerio
  2025-11-27 14:41   ` Théo Lebrun
  2025-11-19 13:53 ` [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic Paolo Valerio
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Introduce basic XDP support for macb/gem, covering the XDP_PASS,
XDP_DROP, and XDP_REDIRECT verdicts.
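
For reference, a condensed sketch of the per-packet verdict handling added
in the diff below (names follow the diff; the real gem_xdp_run() also takes
the RCU read lock around the program and recycles dropped buffers through
the page pool):

static u32 xdp_run_sketch(struct bpf_prog *prog, struct xdp_buff *xdp,
			  struct net_device *dev)
{
	u32 act = bpf_prog_run_xdp(prog, xdp);

	switch (act) {
	case XDP_PASS:		/* fall back to the normal skb path */
		break;
	case XDP_REDIRECT:	/* hand the frame to another device */
		if (xdp_do_redirect(dev, xdp, prog))
			act = XDP_DROP;
		break;
	default:
		bpf_warn_invalid_xdp_action(dev, prog, act);
		fallthrough;
	case XDP_ABORTED:
		trace_xdp_exception(dev, prog, act);
		fallthrough;
	case XDP_DROP:		/* caller recycles the page pool buffer */
		break;
	}

	return act;
}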

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
---
 drivers/net/ethernet/cadence/macb.h      |   4 +
 drivers/net/ethernet/cadence/macb_main.c | 161 +++++++++++++++++++++--
 2 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index e2f397b7a27f..2f665260a84d 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -16,6 +16,7 @@
 #include <linux/workqueue.h>
 #include <net/page_pool/helpers.h>
 #include <net/xdp.h>
+#include <linux/bpf_trace.h>
 
 #define MACB_GREGS_NBR 16
 #define MACB_GREGS_VERSION 2
@@ -961,6 +962,7 @@ struct macb_dma_desc_ptp {
 
 /* The buf includes headroom compatible with both skb and xdpf */
 #define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
+#define MACB_MAX_PAD		(MACB_PP_HEADROOM + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 /* struct macb_tx_skb - data about an skb which is being transmitted
  * @skb: skb currently being transmitted, only set for the last buffer
@@ -1273,6 +1275,7 @@ struct macb_queue {
 	struct queue_stats stats;
 	struct page_pool	*page_pool;
 	struct sk_buff		*skb;
+	struct xdp_rxq_info	xdp_q;
 };
 
 struct ethtool_rx_fs_item {
@@ -1372,6 +1375,7 @@ struct macb {
 
 	struct macb_pm_data pm_data;
 	const struct macb_usrio_config *usrio;
+	struct bpf_prog	__rcu *prog;
 };
 
 #ifdef CONFIG_MACB_USE_HWSTAMP
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 5829c1f773dd..53ea1958b8e4 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1344,10 +1344,51 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
 	 */
 }
 
+static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
+		       struct net_device *dev)
+{
+	struct bpf_prog *prog;
+	u32 act = XDP_PASS;
+
+	rcu_read_lock();
+
+	prog = rcu_dereference(queue->bp->prog);
+	if (!prog)
+		goto out;
+
+	act = bpf_prog_run_xdp(prog, xdp);
+	switch (act) {
+	case XDP_PASS:
+		goto out;
+	case XDP_REDIRECT:
+		if (unlikely(xdp_do_redirect(dev, xdp, prog))) {
+			act = XDP_DROP;
+			break;
+		}
+		goto out;
+	default:
+		bpf_warn_invalid_xdp_action(dev, prog, act);
+		fallthrough;
+	case XDP_ABORTED:
+		trace_xdp_exception(dev, prog, act);
+		fallthrough;
+	case XDP_DROP:
+		break;
+	}
+
+	page_pool_put_full_page(queue->page_pool,
+				virt_to_head_page(xdp->data), true);
+out:
+	rcu_read_unlock();
+
+	return act;
+}
+
 static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		  int budget)
 {
 	struct macb *bp = queue->bp;
+	bool			xdp_flush = false;
 	unsigned int		len;
 	unsigned int		entry;
 	void			*data;
@@ -1356,9 +1397,11 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 	int			count = 0;
 
 	while (count < budget) {
-		u32 ctrl;
-		dma_addr_t addr;
 		bool rxused, first_frame;
+		struct xdp_buff xdp;
+		dma_addr_t addr;
+		u32 ctrl;
+		u32 ret;
 
 		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
 		desc = macb_rx_desc(queue, entry);
@@ -1403,6 +1446,22 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
 		}
 
+		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF)))
+			goto skip_xdp;
+
+		xdp_init_buff(&xdp, bp->rx_buffer_size, &queue->xdp_q);
+		xdp_prepare_buff(&xdp, data, bp->rx_offset, len,
+				 false);
+		xdp_buff_clear_frags_flag(&xdp);
+
+		ret = gem_xdp_run(queue, &xdp, bp->dev);
+		if (ret == XDP_REDIRECT)
+			xdp_flush = true;
+
+		if (ret != XDP_PASS)
+			goto next_frame;
+
+skip_xdp:
 		if (first_frame) {
 			queue->skb = napi_build_skb(data, bp->rx_buffer_size);
 			if (unlikely(!queue->skb)) {
@@ -1452,10 +1511,6 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		}
 
 		/* now everything is ready for receiving packet */
-		queue->rx_buff[entry] = NULL;
-
-		netdev_vdbg(bp->dev, "%s %u (len %u)\n", __func__, entry, data_len);
-
 		if (ctrl & MACB_BIT(RX_EOF)) {
 			bp->dev->stats.rx_packets++;
 			queue->stats.rx_packets++;
@@ -1477,6 +1532,8 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 			queue->skb = NULL;
 		}
 
+next_frame:
+		queue->rx_buff[entry] = NULL;
 		continue;
 
 free_frags:
@@ -1490,6 +1547,9 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 		}
 	}
 
+	if (xdp_flush)
+		xdp_do_flush();
+
 	gem_rx_refill(queue, true);
 
 	return count;
@@ -2471,6 +2531,8 @@ static void gem_free_rx_buffers(struct macb *bp)
 
 		kfree(queue->rx_buff);
 		queue->rx_buff = NULL;
+		if (xdp_rxq_info_is_reg(&queue->xdp_q))
+			xdp_rxq_info_unreg(&queue->xdp_q);
 		page_pool_destroy(queue->page_pool);
 		queue->page_pool = NULL;
 	}
@@ -2625,19 +2687,22 @@ static int macb_alloc_consistent(struct macb *bp)
 	return -ENOMEM;
 }
 
-static void gem_create_page_pool(struct macb_queue *queue)
+static void gem_create_page_pool(struct macb_queue *queue, int qid)
 {
 	struct page_pool_params pp_params = {
 		.order = 0,
 		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.pool_size = queue->bp->rx_ring_size,
 		.nid = NUMA_NO_NODE,
-		.dma_dir = DMA_FROM_DEVICE,
+		.dma_dir = rcu_access_pointer(queue->bp->prog)
+				? DMA_BIDIRECTIONAL
+				: DMA_FROM_DEVICE,
 		.dev = &queue->bp->pdev->dev,
 		.napi = &queue->napi_rx,
 		.max_len = PAGE_SIZE,
 	};
 	struct page_pool *pool;
+	int err;
 
 	pool = page_pool_create(&pp_params);
 	if (IS_ERR(pool)) {
@@ -2646,6 +2711,28 @@ static void gem_create_page_pool(struct macb_queue *queue)
 	}
 
 	queue->page_pool = pool;
+
+	err = xdp_rxq_info_reg(&queue->xdp_q, queue->bp->dev, qid,
+			       queue->napi_rx.napi_id);
+	if (err < 0) {
+		netdev_err(queue->bp->dev, "xdp: failed to register rxq info\n");
+		goto destroy_pool;
+	}
+
+	err = xdp_rxq_info_reg_mem_model(&queue->xdp_q, MEM_TYPE_PAGE_POOL,
+					 queue->page_pool);
+	if (err) {
+		netdev_err(queue->bp->dev, "xdp: failed to register rxq memory model\n");
+		goto unreg_info;
+	}
+
+	return;
+
+unreg_info:
+	xdp_rxq_info_unreg(&queue->xdp_q);
+destroy_pool:
+	page_pool_destroy(pool);
+	queue->page_pool = NULL;
 }
 
 static void macb_init_tieoff(struct macb *bp)
@@ -2681,7 +2768,7 @@ static void gem_init_rings(struct macb *bp)
 		queue->rx_tail = 0;
 		queue->rx_prepared_head = 0;
 
-		gem_create_page_pool(queue);
+		gem_create_page_pool(queue, q);
 		gem_rx_refill(queue, false);
 	}
 
@@ -3117,9 +3204,18 @@ static int macb_close(struct net_device *dev)
 
 static int macb_change_mtu(struct net_device *dev, int new_mtu)
 {
+	int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + MACB_MAX_PAD;
+	struct macb *bp = netdev_priv(dev);
+	struct bpf_prog *prog = bp->prog;
+
 	if (netif_running(dev))
 		return -EBUSY;
 
+	if (prog && frame_size > PAGE_SIZE) {
+		netdev_err(dev, "MTU %d too large for XDP", new_mtu);
+		return -EINVAL;
+	}
+
 	WRITE_ONCE(dev->mtu, new_mtu);
 
 	return 0;
@@ -3137,6 +3233,49 @@ static int macb_set_mac_addr(struct net_device *dev, void *addr)
 	return 0;
 }
 
+static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
+			 struct netlink_ext_ack *extack)
+{
+	int frame = ETH_HLEN + ETH_FCS_LEN + MACB_MAX_PAD;
+	struct macb *bp = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+	bool need_update, running;
+
+	if (prog && dev->mtu + frame > bp->rx_buffer_size) {
+		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
+		return -EOPNOTSUPP;
+	}
+
+	running = netif_running(dev);
+	need_update = !!bp->prog != !!prog;
+	if (running && need_update)
+		macb_close(dev);
+
+	old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (running && need_update)
+		return macb_open(dev);
+
+	return 0;
+}
+
+static int macb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	struct macb *bp = netdev_priv(dev);
+
+	if (!macb_is_gem(bp))
+		return 0;
+
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return gem_xdp_setup(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static void gem_update_stats(struct macb *bp)
 {
 	struct macb_queue *queue;
@@ -4390,6 +4529,7 @@ static const struct net_device_ops macb_netdev_ops = {
 	.ndo_hwtstamp_set	= macb_hwtstamp_set,
 	.ndo_hwtstamp_get	= macb_hwtstamp_get,
 	.ndo_setup_tc		= macb_setup_tc,
+	.ndo_bpf		= macb_xdp,
 };
 
 /* Configure peripheral capabilities according to device tree
@@ -5693,6 +5833,9 @@ static int macb_probe(struct platform_device *pdev)
 		bp->rx_offset = MACB_PP_HEADROOM;
 		if (!(bp->caps & MACB_CAPS_RSC))
 			bp->rx_offset += NET_IP_ALIGN;
+
+		dev->xdp_features = NETDEV_XDP_ACT_BASIC |
+				    NETDEV_XDP_ACT_REDIRECT;
 	}
 
 	netif_carrier_off(dev);
-- 
2.51.1


* [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
                   ` (3 preceding siblings ...)
  2025-11-19 13:53 ` [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem Paolo Valerio
@ 2025-11-19 13:53 ` Paolo Valerio
  2025-11-27 14:49   ` Théo Lebrun
  2025-11-19 13:53 ` [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support Paolo Valerio
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Rename the macb_tx_skb structure to macb_tx_buff. The skb member is now
called data, since it can point to either an sk_buff or an xdp_frame
depending on the tx path, and a new enum identifies the buffer type.

This is a preparatory step for adding XDP xmit support.
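
The resulting layout, reproduced from the diff below for quick reference:

enum macb_tx_buff_type {
	MACB_TYPE_SKB,
	MACB_TYPE_XDP_TX,
	MACB_TYPE_XDP_NDO,
};

struct macb_tx_buff {
	void				*data;	/* sk_buff or xdp_frame */
	dma_addr_t			mapping;
	size_t				size;
	bool				mapped_as_page;
	enum macb_tx_buff_type		type;
};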

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
---
 drivers/net/ethernet/cadence/macb.h      |  28 ++++--
 drivers/net/ethernet/cadence/macb_main.c | 120 +++++++++++++----------
 2 files changed, 86 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 2f665260a84d..67bb98d3cb00 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -964,19 +964,27 @@ struct macb_dma_desc_ptp {
 #define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
 #define MACB_MAX_PAD		(MACB_PP_HEADROOM + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
-/* struct macb_tx_skb - data about an skb which is being transmitted
- * @skb: skb currently being transmitted, only set for the last buffer
- *       of the frame
+enum macb_tx_buff_type {
+	MACB_TYPE_SKB,
+	MACB_TYPE_XDP_TX,
+	MACB_TYPE_XDP_NDO,
+};
+
+/* struct macb_tx_buff - data about an skb or xdp frame which is being transmitted
+ * @data: pointer to skb or xdp frame being transmitted, only set
+ *        for the last buffer for sk_buff
  * @mapping: DMA address of the skb's fragment buffer
  * @size: size of the DMA mapped buffer
  * @mapped_as_page: true when buffer was mapped with skb_frag_dma_map(),
  *                  false when buffer was mapped with dma_map_single()
+ * @type: type of buffer (MACB_TYPE_SKB, MACB_TYPE_XDP_TX, MACB_TYPE_XDP_NDO)
  */
-struct macb_tx_skb {
-	struct sk_buff		*skb;
-	dma_addr_t		mapping;
-	size_t			size;
-	bool			mapped_as_page;
+struct macb_tx_buff {
+	void				*data;
+	dma_addr_t			mapping;
+	size_t				size;
+	bool				mapped_as_page;
+	enum macb_tx_buff_type		type;
 };
 
 /* Hardware-collected statistics. Used when updating the network
@@ -1258,7 +1266,7 @@ struct macb_queue {
 	spinlock_t		tx_ptr_lock;
 	unsigned int		tx_head, tx_tail;
 	struct macb_dma_desc	*tx_ring;
-	struct macb_tx_skb	*tx_skb;
+	struct macb_tx_buff	*tx_buff;
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
 	bool			txubr_pending;
@@ -1336,7 +1344,7 @@ struct macb {
 	phy_interface_t		phy_interface;
 
 	/* AT91RM9200 transmit queue (1 on wire + 1 queued) */
-	struct macb_tx_skb	rm9200_txq[2];
+	struct macb_tx_buff	rm9200_txq[2];
 	unsigned int		max_tx_length;
 
 	u64			ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES];
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 53ea1958b8e4..eeda1a3871a6 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -156,10 +156,10 @@ static struct macb_dma_desc *macb_tx_desc(struct macb_queue *queue,
 	return &queue->tx_ring[index];
 }
 
-static struct macb_tx_skb *macb_tx_skb(struct macb_queue *queue,
-				       unsigned int index)
+static struct macb_tx_buff *macb_tx_buff(struct macb_queue *queue,
+					 unsigned int index)
 {
-	return &queue->tx_skb[macb_tx_ring_wrap(queue->bp, index)];
+	return &queue->tx_buff[macb_tx_ring_wrap(queue->bp, index)];
 }
 
 static dma_addr_t macb_tx_dma(struct macb_queue *queue, unsigned int index)
@@ -969,21 +969,25 @@ static int macb_halt_tx(struct macb *bp)
 					bp, TSR);
 }
 
-static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int budget)
+static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
+			  int budget)
 {
-	if (tx_skb->mapping) {
-		if (tx_skb->mapped_as_page)
-			dma_unmap_page(&bp->pdev->dev, tx_skb->mapping,
-				       tx_skb->size, DMA_TO_DEVICE);
+	if (tx_buff->mapping) {
+		if (tx_buff->mapped_as_page)
+			dma_unmap_page(&bp->pdev->dev, tx_buff->mapping,
+				       tx_buff->size, DMA_TO_DEVICE);
 		else
-			dma_unmap_single(&bp->pdev->dev, tx_skb->mapping,
-					 tx_skb->size, DMA_TO_DEVICE);
-		tx_skb->mapping = 0;
+			dma_unmap_single(&bp->pdev->dev, tx_buff->mapping,
+					 tx_buff->size, DMA_TO_DEVICE);
+		tx_buff->mapping = 0;
 	}
 
-	if (tx_skb->skb) {
-		napi_consume_skb(tx_skb->skb, budget);
-		tx_skb->skb = NULL;
+	if (tx_buff->data) {
+		if (tx_buff->type != MACB_TYPE_SKB)
+			netdev_err(bp->dev, "BUG: Unexpected tx buffer type while unmapping (%d)",
+				   tx_buff->type);
+		napi_consume_skb(tx_buff->data, budget);
+		tx_buff->data = NULL;
 	}
 }
 
@@ -1029,7 +1033,7 @@ static void macb_tx_error_task(struct work_struct *work)
 	u32			queue_index;
 	u32			packets = 0;
 	u32			bytes = 0;
-	struct macb_tx_skb	*tx_skb;
+	struct macb_tx_buff	*tx_buff;
 	struct macb_dma_desc	*desc;
 	struct sk_buff		*skb;
 	unsigned int		tail;
@@ -1069,16 +1073,23 @@ static void macb_tx_error_task(struct work_struct *work)
 
 		desc = macb_tx_desc(queue, tail);
 		ctrl = desc->ctrl;
-		tx_skb = macb_tx_skb(queue, tail);
-		skb = tx_skb->skb;
+		tx_buff = macb_tx_buff(queue, tail);
+
+		if (tx_buff->type != MACB_TYPE_SKB)
+			netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
+				   tx_buff->type);
+		skb = tx_buff->data;
 
 		if (ctrl & MACB_BIT(TX_USED)) {
 			/* skb is set for the last buffer of the frame */
 			while (!skb) {
-				macb_tx_unmap(bp, tx_skb, 0);
+				macb_tx_unmap(bp, tx_buff, 0);
 				tail++;
-				tx_skb = macb_tx_skb(queue, tail);
-				skb = tx_skb->skb;
+				tx_buff = macb_tx_buff(queue, tail);
+				if (tx_buff->type != MACB_TYPE_SKB)
+					netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
+						   tx_buff->type);
+				skb = tx_buff->data;
 			}
 
 			/* ctrl still refers to the first buffer descriptor
@@ -1107,7 +1118,7 @@ static void macb_tx_error_task(struct work_struct *work)
 			desc->ctrl = ctrl | MACB_BIT(TX_USED);
 		}
 
-		macb_tx_unmap(bp, tx_skb, 0);
+		macb_tx_unmap(bp, tx_buff, 0);
 	}
 
 	netdev_tx_completed_queue(netdev_get_tx_queue(bp->dev, queue_index),
@@ -1185,7 +1196,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
 	head = queue->tx_head;
 	for (tail = queue->tx_tail; tail != head && packets < budget; tail++) {
-		struct macb_tx_skb	*tx_skb;
+		struct macb_tx_buff	*tx_buff;
 		struct sk_buff		*skb;
 		struct macb_dma_desc	*desc;
 		u32			ctrl;
@@ -1205,8 +1216,10 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 
 		/* Process all buffers of the current transmitted frame */
 		for (;; tail++) {
-			tx_skb = macb_tx_skb(queue, tail);
-			skb = tx_skb->skb;
+			tx_buff = macb_tx_buff(queue, tail);
+
+			if (tx_buff->type == MACB_TYPE_SKB)
+				skb = tx_buff->data;
 
 			/* First, update TX stats if needed */
 			if (skb) {
@@ -1226,7 +1239,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 			}
 
 			/* Now we can safely release resources */
-			macb_tx_unmap(bp, tx_skb, budget);
+			macb_tx_unmap(bp, tx_buff, budget);
 
 			/* skb is set only for the last buffer of the frame.
 			 * WARNING: at this point skb has been freed by
@@ -2117,8 +2130,8 @@ static unsigned int macb_tx_map(struct macb *bp,
 	unsigned int f, nr_frags = skb_shinfo(skb)->nr_frags;
 	unsigned int len, i, tx_head = queue->tx_head;
 	u32 ctrl, lso_ctrl = 0, seq_ctrl = 0;
+	struct macb_tx_buff *tx_buff = NULL;
 	unsigned int eof = 1, mss_mfs = 0;
-	struct macb_tx_skb *tx_skb = NULL;
 	struct macb_dma_desc *desc;
 	unsigned int offset, size;
 	dma_addr_t mapping;
@@ -2141,7 +2154,7 @@ static unsigned int macb_tx_map(struct macb *bp,
 
 	offset = 0;
 	while (len) {
-		tx_skb = macb_tx_skb(queue, tx_head);
+		tx_buff = macb_tx_buff(queue, tx_head);
 
 		mapping = dma_map_single(&bp->pdev->dev,
 					 skb->data + offset,
@@ -2150,10 +2163,11 @@ static unsigned int macb_tx_map(struct macb *bp,
 			goto dma_error;
 
 		/* Save info to properly release resources */
-		tx_skb->skb = NULL;
-		tx_skb->mapping = mapping;
-		tx_skb->size = size;
-		tx_skb->mapped_as_page = false;
+		tx_buff->data = NULL;
+		tx_buff->type = MACB_TYPE_SKB;
+		tx_buff->mapping = mapping;
+		tx_buff->size = size;
+		tx_buff->mapped_as_page = false;
 
 		len -= size;
 		offset += size;
@@ -2170,7 +2184,7 @@ static unsigned int macb_tx_map(struct macb *bp,
 		offset = 0;
 		while (len) {
 			size = umin(len, bp->max_tx_length);
-			tx_skb = macb_tx_skb(queue, tx_head);
+			tx_buff = macb_tx_buff(queue, tx_head);
 
 			mapping = skb_frag_dma_map(&bp->pdev->dev, frag,
 						   offset, size, DMA_TO_DEVICE);
@@ -2178,10 +2192,11 @@ static unsigned int macb_tx_map(struct macb *bp,
 				goto dma_error;
 
 			/* Save info to properly release resources */
-			tx_skb->skb = NULL;
-			tx_skb->mapping = mapping;
-			tx_skb->size = size;
-			tx_skb->mapped_as_page = true;
+			tx_buff->data = NULL;
+			tx_buff->type = MACB_TYPE_SKB;
+			tx_buff->mapping = mapping;
+			tx_buff->size = size;
+			tx_buff->mapped_as_page = true;
 
 			len -= size;
 			offset += size;
@@ -2190,13 +2205,14 @@ static unsigned int macb_tx_map(struct macb *bp,
 	}
 
 	/* Should never happen */
-	if (unlikely(!tx_skb)) {
+	if (unlikely(!tx_buff)) {
 		netdev_err(bp->dev, "BUG! empty skb!\n");
 		return 0;
 	}
 
 	/* This is the last buffer of the frame: save socket buffer */
-	tx_skb->skb = skb;
+	tx_buff->data = skb;
+	tx_buff->type = MACB_TYPE_SKB;
 
 	/* Update TX ring: update buffer descriptors in reverse order
 	 * to avoid race condition
@@ -2227,10 +2243,10 @@ static unsigned int macb_tx_map(struct macb *bp,
 
 	do {
 		i--;
-		tx_skb = macb_tx_skb(queue, i);
+		tx_buff = macb_tx_buff(queue, i);
 		desc = macb_tx_desc(queue, i);
 
-		ctrl = (u32)tx_skb->size;
+		ctrl = (u32)tx_buff->size;
 		if (eof) {
 			ctrl |= MACB_BIT(TX_LAST);
 			eof = 0;
@@ -2253,7 +2269,7 @@ static unsigned int macb_tx_map(struct macb *bp,
 			ctrl |= MACB_BF(MSS_MFS, mss_mfs);
 
 		/* Set TX buffer descriptor */
-		macb_set_addr(bp, desc, tx_skb->mapping);
+		macb_set_addr(bp, desc, tx_buff->mapping);
 		/* desc->addr must be visible to hardware before clearing
 		 * 'TX_USED' bit in desc->ctrl.
 		 */
@@ -2269,9 +2285,9 @@ static unsigned int macb_tx_map(struct macb *bp,
 	netdev_err(bp->dev, "TX DMA map failed\n");
 
 	for (i = queue->tx_head; i != tx_head; i++) {
-		tx_skb = macb_tx_skb(queue, i);
+		tx_buff = macb_tx_buff(queue, i);
 
-		macb_tx_unmap(bp, tx_skb, 0);
+		macb_tx_unmap(bp, tx_buff, 0);
 	}
 
 	return -ENOMEM;
@@ -2582,8 +2598,8 @@ static void macb_free_consistent(struct macb *bp)
 	dma_free_coherent(dev, size, bp->queues[0].rx_ring, bp->queues[0].rx_ring_dma);
 
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
-		kfree(queue->tx_skb);
-		queue->tx_skb = NULL;
+		kfree(queue->tx_buff);
+		queue->tx_buff = NULL;
 		queue->tx_ring = NULL;
 		queue->rx_ring = NULL;
 	}
@@ -2662,9 +2678,9 @@ static int macb_alloc_consistent(struct macb *bp)
 		queue->rx_ring = rx + macb_rx_ring_size_per_queue(bp) * q;
 		queue->rx_ring_dma = rx_dma + macb_rx_ring_size_per_queue(bp) * q;
 
-		size = bp->tx_ring_size * sizeof(struct macb_tx_skb);
-		queue->tx_skb = kmalloc(size, GFP_KERNEL);
-		if (!queue->tx_skb)
+		size = bp->tx_ring_size * sizeof(struct macb_tx_buff);
+		queue->tx_buff = kmalloc(size, GFP_KERNEL);
+		if (!queue->tx_buff)
 			goto out_err;
 	}
 	if (bp->macbgem_ops.mog_alloc_rx_buffers(bp))
@@ -5050,7 +5066,7 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
 		netif_stop_queue(dev);
 
 		/* Store packet information (to free when Tx completed) */
-		lp->rm9200_txq[desc].skb = skb;
+		lp->rm9200_txq[desc].data = skb;
 		lp->rm9200_txq[desc].size = skb->len;
 		lp->rm9200_txq[desc].mapping = dma_map_single(&lp->pdev->dev, skb->data,
 							      skb->len, DMA_TO_DEVICE);
@@ -5143,9 +5159,9 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
 			dev->stats.tx_errors++;
 
 		desc = 0;
-		if (lp->rm9200_txq[desc].skb) {
-			dev_consume_skb_irq(lp->rm9200_txq[desc].skb);
-			lp->rm9200_txq[desc].skb = NULL;
+		if (lp->rm9200_txq[desc].data) {
+			dev_consume_skb_irq(lp->rm9200_txq[desc].data);
+			lp->rm9200_txq[desc].data = NULL;
 			dma_unmap_single(&lp->pdev->dev, lp->rm9200_txq[desc].mapping,
 					 lp->rm9200_txq[desc].size, DMA_TO_DEVICE);
 			dev->stats.tx_packets++;
-- 
2.51.1


* [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
                   ` (4 preceding siblings ...)
  2025-11-19 13:53 ` [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic Paolo Valerio
@ 2025-11-19 13:53 ` Paolo Valerio
  2025-11-27 15:07   ` Théo Lebrun
  2025-11-25 16:50 ` [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Théo Lebrun
  2025-11-26 18:08 ` Théo Lebrun
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-11-19 13:53 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Add XDP_TX verdict support and introduce an ndo_xdp_xmit function for
redirection. Update macb_tx_unmap() to handle both skbs and xdp frames,
and advertise the NETDEV_XDP_ACT_NDO_XMIT capability along with the
ability to process XDP_TX verdicts.
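
For reference, a condensed sketch of the .ndo_xdp_xmit contract implemented
below: submit what fits in the TX ring and report how many frames were
queued. macb_xdp_submit_frame() is the helper introduced by this patch; a
DMA address of 0 selects the dma_map_single() path used for redirected
(NDO) frames:

static int xdp_xmit_sketch(struct net_device *dev, int num_frames,
			   struct xdp_frame **frames, u32 flags)
{
	struct macb *bp = netdev_priv(dev);
	int i, xmitted = 0;

	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
		return -EINVAL;

	for (i = 0; i < num_frames; i++) {
		if (macb_xdp_submit_frame(bp, frames[i], dev, 0))
			break;
		xmitted++;
	}

	return xmitted;
}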

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 166 +++++++++++++++++++++--
 1 file changed, 153 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index eeda1a3871a6..bd62d3febeb1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -969,6 +969,17 @@ static int macb_halt_tx(struct macb *bp)
 					bp, TSR);
 }
 
+static void release_buff(void *buff, enum macb_tx_buff_type type, int budget)
+{
+	if (type == MACB_TYPE_SKB) {
+		napi_consume_skb(buff, budget);
+	} else if (type == MACB_TYPE_XDP_TX) {
+		xdp_return_frame_rx_napi(buff);
+	} else {
+		xdp_return_frame(buff);
+	}
+}
+
 static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
 			  int budget)
 {
@@ -983,10 +994,7 @@ static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
 	}
 
 	if (tx_buff->data) {
-		if (tx_buff->type != MACB_TYPE_SKB)
-			netdev_err(bp->dev, "BUG: Unexpected tx buffer type while unmapping (%d)",
-				   tx_buff->type);
-		napi_consume_skb(tx_buff->data, budget);
+		release_buff(tx_buff->data, tx_buff->type, budget);
 		tx_buff->data = NULL;
 	}
 }
@@ -1076,8 +1084,8 @@ static void macb_tx_error_task(struct work_struct *work)
 		tx_buff = macb_tx_buff(queue, tail);
 
 		if (tx_buff->type != MACB_TYPE_SKB)
-			netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
-				   tx_buff->type);
+			goto unmap;
+
 		skb = tx_buff->data;
 
 		if (ctrl & MACB_BIT(TX_USED)) {
@@ -1118,6 +1126,7 @@ static void macb_tx_error_task(struct work_struct *work)
 			desc->ctrl = ctrl | MACB_BIT(TX_USED);
 		}
 
+unmap:
 		macb_tx_unmap(bp, tx_buff, 0);
 	}
 
@@ -1196,6 +1205,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
 	head = queue->tx_head;
 	for (tail = queue->tx_tail; tail != head && packets < budget; tail++) {
+		void			*data = NULL;
 		struct macb_tx_buff	*tx_buff;
 		struct sk_buff		*skb;
 		struct macb_dma_desc	*desc;
@@ -1218,11 +1228,16 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 		for (;; tail++) {
 			tx_buff = macb_tx_buff(queue, tail);
 
-			if (tx_buff->type == MACB_TYPE_SKB)
-				skb = tx_buff->data;
+			if (tx_buff->type != MACB_TYPE_SKB) {
+				data = tx_buff->data;
+				goto unmap;
+			}
 
 			/* First, update TX stats if needed */
-			if (skb) {
+			if (tx_buff->type == MACB_TYPE_SKB && tx_buff->data) {
+				data = tx_buff->data;
+				skb = tx_buff->data;
+
 				if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
 				    !ptp_one_step_sync(skb))
 					gem_ptp_do_txstamp(bp, skb, desc);
@@ -1238,6 +1253,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 				bytes += skb->len;
 			}
 
+unmap:
 			/* Now we can safely release resources */
 			macb_tx_unmap(bp, tx_buff, budget);
 
@@ -1245,7 +1261,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 			 * WARNING: at this point skb has been freed by
 			 * macb_tx_unmap().
 			 */
-			if (skb)
+			if (data)
 				break;
 		}
 	}
@@ -1357,8 +1373,124 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
 	 */
 }
 
+static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
+				 struct net_device *dev, dma_addr_t addr)
+{
+	enum macb_tx_buff_type buff_type;
+	struct macb_tx_buff *tx_buff;
+	int cpu = smp_processor_id();
+	struct macb_dma_desc *desc;
+	struct macb_queue *queue;
+	unsigned long flags;
+	dma_addr_t mapping;
+	u16 queue_index;
+	int err = 0;
+	u32 ctrl;
+
+	queue_index = cpu % bp->num_queues;
+	queue = &bp->queues[queue_index];
+	buff_type = !addr ? MACB_TYPE_XDP_NDO : MACB_TYPE_XDP_TX;
+
+	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
+
+	/* This is a hard error, log it. */
+	if (CIRC_SPACE(queue->tx_head, queue->tx_tail,
+		       bp->tx_ring_size) < 1) {
+		netif_stop_subqueue(dev, queue_index);
+		netdev_dbg(bp->dev, "tx_head = %u, tx_tail = %u\n",
+			   queue->tx_head, queue->tx_tail);
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	if (!addr) {
+		mapping = dma_map_single(&bp->pdev->dev,
+					 xdpf->data,
+					 xdpf->len, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
+			err = -ENOMEM;
+			goto unlock;
+		}
+	} else {
+		mapping = addr;
+		dma_sync_single_for_device(&bp->pdev->dev, mapping,
+					   xdpf->len, DMA_BIDIRECTIONAL);
+	}
+
+	unsigned int tx_head = queue->tx_head + 1;
+
+	ctrl = MACB_BIT(TX_USED);
+	desc = macb_tx_desc(queue, tx_head);
+	desc->ctrl = ctrl;
+
+	desc = macb_tx_desc(queue, queue->tx_head);
+	tx_buff = macb_tx_buff(queue, queue->tx_head);
+	tx_buff->data = xdpf;
+	tx_buff->type = buff_type;
+	tx_buff->mapping = mapping;
+	tx_buff->size = xdpf->len;
+	tx_buff->mapped_as_page = false;
+
+	ctrl = (u32)tx_buff->size;
+	ctrl |= MACB_BIT(TX_LAST);
+
+	if (unlikely(macb_tx_ring_wrap(bp, queue->tx_head) == (bp->tx_ring_size - 1)))
+		ctrl |= MACB_BIT(TX_WRAP);
+
+	/* Set TX buffer descriptor */
+	macb_set_addr(bp, desc, tx_buff->mapping);
+	/* desc->addr must be visible to hardware before clearing
+	 * 'TX_USED' bit in desc->ctrl.
+	 */
+	wmb();
+	desc->ctrl = ctrl;
+	queue->tx_head = tx_head;
+
+	/* Make newly initialized descriptor visible to hardware */
+	wmb();
+
+	spin_lock(&bp->lock);
+	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	spin_unlock(&bp->lock);
+
+	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
+		netif_stop_subqueue(dev, queue_index);
+
+unlock:
+	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
+
+	if (err)
+		release_buff(xdpf, buff_type, 0);
+
+	return err;
+}
+
+static int
+macb_xdp_xmit(struct net_device *dev, int num_frame,
+	      struct xdp_frame **frames, u32 flags)
+{
+	struct macb *bp = netdev_priv(dev);
+	u32 xmitted = 0;
+	int i;
+
+	if (!macb_is_gem(bp))
+		return -EOPNOTSUPP;
+
+	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+		return -EINVAL;
+
+	for (i = 0; i < num_frame; i++) {
+		if (macb_xdp_submit_frame(bp, frames[i], dev, 0))
+			break;
+
+		xmitted++;
+	}
+
+	return xmitted;
+}
+
 static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
-		       struct net_device *dev)
+		       struct net_device *dev, dma_addr_t addr)
 {
 	struct bpf_prog *prog;
 	u32 act = XDP_PASS;
@@ -1379,6 +1511,12 @@ static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
 			break;
 		}
 		goto out;
+	case XDP_TX:
+		struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
+
+		if (!xdpf || macb_xdp_submit_frame(queue->bp, xdpf, dev, addr))
+			act = XDP_DROP;
+		goto out;
 	default:
 		bpf_warn_invalid_xdp_action(dev, prog, act);
 		fallthrough;
@@ -1467,7 +1605,7 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
 				 false);
 		xdp_buff_clear_frags_flag(&xdp);
 
-		ret = gem_xdp_run(queue, &xdp, bp->dev);
+		ret = gem_xdp_run(queue, &xdp, bp->dev, addr);
 		if (ret == XDP_REDIRECT)
 			xdp_flush = true;
 
@@ -4546,6 +4684,7 @@ static const struct net_device_ops macb_netdev_ops = {
 	.ndo_hwtstamp_get	= macb_hwtstamp_get,
 	.ndo_setup_tc		= macb_setup_tc,
 	.ndo_bpf		= macb_xdp,
+	.ndo_xdp_xmit		= macb_xdp_xmit,
 };
 
 /* Configure peripheral capabilities according to device tree
@@ -5851,7 +5990,8 @@ static int macb_probe(struct platform_device *pdev)
 			bp->rx_offset += NET_IP_ALIGN;
 
 		dev->xdp_features = NETDEV_XDP_ACT_BASIC |
-				    NETDEV_XDP_ACT_REDIRECT;
+				    NETDEV_XDP_ACT_REDIRECT |
+				    NETDEV_XDP_ACT_NDO_XMIT;
 	}
 
 	netif_carrier_off(dev);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
                   ` (5 preceding siblings ...)
  2025-11-19 13:53 ` [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support Paolo Valerio
@ 2025-11-25 16:50 ` Théo Lebrun
  2025-11-25 23:11   ` Paolo Valerio
  2025-11-26 18:08 ` Théo Lebrun
  7 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-25 16:50 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Hello Paolo,

This is a small message to tell you your RFC series hasn't been ignored.
I have been working on the same topic for a while, so I'm happy to see
this series! I will provide feedback soon, but of course there is a lot
to cover.

Today I managed to:

Tested-by: Théo Lebrun <theo.lebrun@bootlin.com> # on EyeQ5 hardware

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
  2025-11-25 16:50 ` [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Théo Lebrun
@ 2025-11-25 23:11   ` Paolo Valerio
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Valerio @ 2025-11-25 23:11 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

On 25 Nov 2025 at 05:50:25 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> Hello Paolo,
>
> This is a small message to tell you your RFC series hasn't been ignored.
> I had been working on the same topic for a while, I'm therefore happy
> to see this series! Will provide you soon with feedback, but of course
> there is a lot to cover.
>

sure, thank you Théo!

> Today I managed to:
>
> Tested-by: Théo Lebrun <theo.lebrun@bootlin.com> # on EyeQ5 hardware
>
> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
  2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
                   ` (6 preceding siblings ...)
  2025-11-25 16:50 ` [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Théo Lebrun
@ 2025-11-26 18:08 ` Théo Lebrun
  2025-12-02 17:24   ` Paolo Valerio
  7 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-26 18:08 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Hello Paolo,

So this is an initial review: I'll start here with five series-wide
topics and send small per-line comments (i.e. nitpicks) in a second stage.



### Rx buffer size computation

The buffer size computation should be reworked. At the end of the series
it looks like:

static int macb_open(struct net_device *dev)
{
    size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;

    // ...

    macb_init_rx_buffer_size(bp, bufsz);

    // ...
}

static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
{
    if (!macb_is_gem(bp)) {
        bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
    } else {
        bp->rx_buffer_size = size
            + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
            + MACB_PP_HEADROOM;

        if (bp->rx_buffer_size > PAGE_SIZE)
            bp->rx_buffer_size = PAGE_SIZE;

        if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE)
            bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
    }
}

Most of the issues with this code do not stem from your series, but
this big rework is the right moment to fix them all.

 - NET_IP_ALIGN is accounted for in the headroom even though it isn't
   present if !RSC.

 - When skbuff.c/h allocates an SKB buffer, it SKB_DATA_ALIGN()s
   headroom+data. We should probably do the same. In our case it would
   be:

   bp->rx_buffer_size = SKB_DATA_ALIGN(MACB_PP_HEADROOM + size) +
                        SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
   // or
   bp->rx_buffer_size = SKB_HEAD_ALIGN(MACB_PP_HEADROOM + size);

   I've not computed if it can ever give a different value to your
   series in terms of total size or shinfo alignment. I'd guess it is
   unlikely.

 - If the size clamping to PAGE_SIZE comes into play, we are probably
   doomed. It means we cannot deal with the MTU and we'll probably get
   corruption. If we do put a check in place, it should loudly fail
   rather than silently clamp.

TLDR: I think macb_init_rx_buffer_size() should encapsulate the whole
rx buffer size computation. It can use bp->rx_offset and add on top
MTU & co. It might start failing if >PAGE_SIZE (?).
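
Roughly what I have in mind, as an untested sketch (it reuses the
series' current names, with bp->rx_offset being the headroom, and turns
the function into one that can fail):

static int macb_init_rx_buffer_size(struct macb *bp)
{
	size_t payload = bp->dev->mtu + ETH_HLEN + ETH_FCS_LEN;

	if (!macb_is_gem(bp)) {
		bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
		return 0;
	}

	/* align headroom + data the way skbuff.c does, then add shinfo */
	bp->rx_buffer_size = SKB_DATA_ALIGN(bp->rx_offset + payload) +
			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);

	/* open question: fail loudly instead of silently clamping */
	if (bp->rx_buffer_size > PAGE_SIZE)
		return -EINVAL;

	return 0;
}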



### Buffer variable names

Related: so many variables, fields or constants have ambiguous names,
can we do something about it?

 - bp->rx_offset is named oddly to my ears. Offset to what?
   Maybe bp->rx_head or bp->rx_headroom?

 - bp->rx_buffer_size: it used to be approximately the payload size
   (plus NET_IP_ALIGN). Now it contains the XDP headroom and shinfo.
   That's on GEM, because on MACB it means something different.

   This line is a bit ironic and illustrates the trouble:
      buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset

 - data in gem_rx|gem_rx_refill|gem_free_rx_buffers() points to what we
   could call skb->head, ie start of buffer & start of XDP headroom.
   Its name implied skb->data to me, ie after the headroom.

   It also made `data - page_address(page) + bp->rx_offset` hard to
   understand. It is easier for me to understand that the following is
   the page fragment offset till skb->data:

      buff_head + bp->rx_headroom - page_address(page)

 - MACB_MAX_PAD is ambiguous. It rhymes with NET_SKB_PAD which is the
   default SKB headroom but here it contains part of the headroom
   (XDP_PACKET_HEADROOM but not NET_IP_ALIGN) and the tailroom (shinfo).

 - Field `data` in `struct macb_tx_buff` points to skb/xdp_frame but my
   initial thought was skb->data pointer (ie after headroom).
   What about va or ptr or buff or frame or ...?

TLDR: I had a hard time understanding size/offset expressions (both from
existing code and the series) because of variable names.



### Duplicated buffer size computations

Last point related to buffer size computation:

 - it is duplicated in macb_set_mtu() but forgets NET_IP_ALIGN & proper
   SKB_DATA_ALIGN() and,

 - it is duplicated in gem_xdp_setup() but I don't see why because the
   buffer size is computed to work fine with/without XDP enabled. Also
   this check means we cannot load an XDP program before macb_open()
   because bp->rx_buffer_size == 0.

TLDR: Let's deduplicate size computations to minimise chances of getting
it wrong.



### Allocation changes

I am not convinced by patches 1/6 and 2/6 that change the alloc strategy
in two steps, from netdev_alloc_skb() to page_pool_alloc_pages() to
page_pool_alloc_frag().

 - The transient solution isn't acceptable when PAGE_SIZE is large.
   We have 16K and would never want one buffer per page.

 - It forces you to introduce temporary code & constants, which is added
   noise IMO. MACB_PP_MAX_BUF_SIZE() is odd, as is the alignment of
   buffer sizes to page sizes. It also forces you to deal with
   `bp->rx_buffer_size > PAGE_SIZE`, which we could otherwise ignore. Right?

TLDR: do alloc changes in one step.



### XDP_SETUP_PROG if netif_running()

I'd like to start a discussion on the expected behavior on XDP program
change if netif_running(). Summarised:

static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
             struct netlink_ext_ack *extack)
{
    bool running = netif_running(dev);
    bool need_update = !!bp->prog != !!prog;

    if (running && need_update)
        macb_close(dev);
    old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
    if (running && need_update)
        return macb_open(dev);
}

Have you experimented with that? I don't see anything graceful in our
close operation; it looks like we'll get corruption, dropped packets,
or both. We shouldn't impose that on a user who just wanted to swap
the program.

I cannot find any good reason that implies we wouldn't be able to swap
our XDP program on the fly. If we think it is unsafe, I'd vote for
starting with a -EBUSY return code and iterating on that.
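
As a conservative first step, something along these lines (untested
sketch; the extack message wording is just an example):

static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
			 struct netlink_ext_ack *extack)
{
	struct macb *bp = netdev_priv(dev);
	struct bpf_prog *old_prog;

	/* Swapping one program for another is fine; only refuse to add
	 * or remove a program while the interface is up, since that is
	 * what requires the close/open cycle today.
	 */
	if (netif_running(dev) && !!bp->prog != !!prog) {
		NL_SET_ERR_MSG_MOD(extack, "interface must be down to add or remove an XDP program");
		return -EBUSY;
	}

	old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
	if (old_prog)
		bpf_prog_put(old_prog);

	return 0;
}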

TLDR: macb_close() on XDP program change is probably a bad idea.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support
  2025-11-19 13:53 ` [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support Paolo Valerio
@ 2025-11-27 13:21   ` Théo Lebrun
  2025-12-02 17:30     ` Paolo Valerio
  0 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-27 13:21 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

Hello Paolo,

On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
> Subject: [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support

`git log --oneline drivers/net/ethernet/cadence/` tells us all commits
in this series should be prefixed with "net: macb: ".

> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index 87414a2ddf6e..dcf768bd1bc1 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -14,6 +14,8 @@
>  #include <linux/interrupt.h>
>  #include <linux/phy/phy.h>
>  #include <linux/workqueue.h>
> +#include <net/page_pool/helpers.h>
> +#include <net/xdp.h>
>  
>  #define MACB_GREGS_NBR 16
>  #define MACB_GREGS_VERSION 2
> @@ -957,6 +959,10 @@ struct macb_dma_desc_ptp {
>  /* Scaled PPM fraction */
>  #define PPM_FRACTION	16
>  
> +/* The buf includes headroom compatible with both skb and xdpf */
> +#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
> +#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)

From my previous review, you know I think MACB_PP_MAX_BUF_SIZE() should
disappear.

I also don't see the point of MACB_PP_HEADROOM. Maybe if it was
max(XDP_PACKET_HEADROOM, NET_SKB_PAD) it would be more useful, but that
isn't useful anyway. Can we drop it and use XDP_PACKET_HEADROOM directly
in the code?

>  /* struct macb_tx_skb - data about an skb which is being transmitted
>   * @skb: skb currently being transmitted, only set for the last buffer
>   *       of the frame
> @@ -1262,10 +1268,11 @@ struct macb_queue {
>  	unsigned int		rx_tail;
>  	unsigned int		rx_prepared_head;
>  	struct macb_dma_desc	*rx_ring;
> -	struct sk_buff		**rx_skbuff;
> +	void			**rx_buff;

It would help review if the s/rx_skbuff/rx_buff/ renaming was done in a
separate commit with a commit message being "this only renames X and
implies no functional changes".

>  	void			*rx_buffers;
>  	struct napi_struct	napi_rx;
>  	struct queue_stats stats;
> +	struct page_pool	*page_pool;
>  };
>  
>  struct ethtool_rx_fs_item {
> @@ -1289,6 +1296,7 @@ struct macb {
>  	struct macb_dma_desc	*rx_ring_tieoff;
>  	dma_addr_t		rx_ring_tieoff_dma;
>  	size_t			rx_buffer_size;
> +	u16			rx_offset;

u16 makes me worried that we might make mistakes. For example, the
following propagates the u16 type.

        bp->rx_offset + data - page_address(page)

We can spare the additional 6 bytes and turn it into a size_t. It'll
fall in holes anyway, at least it does for my target according to pahole.

> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index e461f5072884..985c81913ba6 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1250,11 +1250,28 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  	return packets;
>  }
>  
> -static void gem_rx_refill(struct macb_queue *queue)
> +static void *gem_page_pool_get_buff(struct page_pool *pool,
> +				    dma_addr_t *dma_addr, gfp_t gfp_mask)
> +{
> +	struct page *page;
> +
> +	if (!pool)
> +		return NULL;
> +
> +	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
> +	if (!page)
> +		return NULL;
> +
> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
> +
> +	return page_address(page);
> +}

Do we need a separate function called from only one location? Or we
could change its name to highlight it allocates and does not just "get"
a buffer.

> @@ -1267,25 +1284,17 @@ static void gem_rx_refill(struct macb_queue *queue)
>  
>  		desc = macb_rx_desc(queue, entry);
>  
> -		if (!queue->rx_skbuff[entry]) {
> +		if (!queue->rx_buff[entry]) {
>  			/* allocate sk_buff for this free entry in ring */
> -			skb = netdev_alloc_skb(bp->dev, bp->rx_buffer_size);
> -			if (unlikely(!skb)) {
> +			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
> +						      napi ? GFP_ATOMIC : GFP_KERNEL);

I don't get why the gfp flags computation is spread across
gem_page_pool_get_buff() and gem_rx_refill().

> @@ -1349,12 +1344,16 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  		  int budget)
>  {
>  	struct macb *bp = queue->bp;
> +	int			buffer_size;
>  	unsigned int		len;
>  	unsigned int		entry;
> +	void			*data;
>  	struct sk_buff		*skb;
>  	struct macb_dma_desc	*desc;
>  	int			count = 0;
>  
> +	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;

This looks like

        buffer_size = ALIGN(bp->rx_buffer_size, PAGE_SIZE);

no? Anyway I think it should be dropped. It does get dropped next patch
in this RFC.

> @@ -1387,24 +1386,49 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  			queue->stats.rx_dropped++;
>  			break;
>  		}
> -		skb = queue->rx_skbuff[entry];
> -		if (unlikely(!skb)) {
> +		data = queue->rx_buff[entry];
> +		if (unlikely(!data)) {
>  			netdev_err(bp->dev,
>  				   "inconsistent Rx descriptor chain\n");
>  			bp->dev->stats.rx_dropped++;
>  			queue->stats.rx_dropped++;
>  			break;
>  		}
> +
> +		skb = napi_build_skb(data, buffer_size);
> +		if (unlikely(!skb)) {
> +			netdev_err(bp->dev,
> +				   "Unable to allocate sk_buff\n");
> +			page_pool_put_full_page(queue->page_pool,
> +						virt_to_head_page(data),
> +						false);
> +			break;

We should do `queue->rx_buff[entry] = NULL` here, no?
We free a page but keep a pointer to it.

> +		}
> +
> +		/* Properly align Ethernet header.
> +		 *
> +		 * Hardware can add dummy bytes if asked using the RBOF
> +		 * field inside the NCFGR register. That feature isn't
> +		 * available if hardware is RSC capable.
> +		 *
> +		 * We cannot fallback to doing the 2-byte shift before
> +		 * DMA mapping because the address field does not allow
> +		 * setting the low 2/3 bits.
> +		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
> +		 */
> +		skb_reserve(skb, bp->rx_offset);
> +		skb_mark_for_recycle(skb);

I have a platform with RSC support and NET_IP_ALIGN=2. What is yours
like? It'd be nice if we can test different cases of this RBOF topic.

>  		/* now everything is ready for receiving packet */
> -		queue->rx_skbuff[entry] = NULL;
> +		queue->rx_buff[entry] = NULL;
>  		len = ctrl & bp->rx_frm_len_mask;
>  
>  		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>  
> +		dma_sync_single_for_cpu(&bp->pdev->dev,
> +					addr, len,
> +					page_pool_get_dma_dir(queue->page_pool));

Any reason for the page_pool_get_dma_dir() call in this
dma_sync_single_for_cpu()? We could hardcode DMA_FROM_DEVICE, no?

> @@ -2477,13 +2497,13 @@ static int gem_alloc_rx_buffers(struct macb *bp)
>  
>  	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>  		size = bp->rx_ring_size * sizeof(struct sk_buff *);

sizeof is called with the wrong type. Not that it matters because
pointers are pointers, but we can take this opportunity to move to

        sizeof(*queue->rx_buff)

> -		queue->rx_skbuff = kzalloc(size, GFP_KERNEL);
> -		if (!queue->rx_skbuff)
> +		queue->rx_buff = kzalloc(size, GFP_KERNEL);
> +		if (!queue->rx_buff)
>  			return -ENOMEM;
>  		else
>  			netdev_dbg(bp->dev,
> -				   "Allocated %d RX struct sk_buff entries at %p\n",
> -				   bp->rx_ring_size, queue->rx_skbuff);
> +				   "Allocated %d RX buff entries at %p\n",
> +				   bp->rx_ring_size, queue->rx_buff);

Opportunity to deindent this block?
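
i.e. something like:

		queue->rx_buff = kzalloc(size, GFP_KERNEL);
		if (!queue->rx_buff)
			return -ENOMEM;

		netdev_dbg(bp->dev,
			   "Allocated %d RX buff entries at %p\n",
			   bp->rx_ring_size, queue->rx_buff);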

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception
  2025-11-19 13:53 ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
@ 2025-11-27 13:38   ` Théo Lebrun
  2025-12-02 17:32     ` Paolo Valerio
  0 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-27 13:38 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
> Add support for receiving network frames that span multiple DMA
> descriptors in the Cadence MACB/GEM Ethernet driver.
>
> The patch removes the requirement that limited frame reception to
> a single descriptor (RX_SOF && RX_EOF), also avoiding potential
> contiguous multi-page allocation for large frames. It also uses
> page pool fragments allocation instead of full page for saving
> memory.
>
> Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
> ---
>  drivers/net/ethernet/cadence/macb.h      |   4 +-
>  drivers/net/ethernet/cadence/macb_main.c | 180 ++++++++++++++---------
>  2 files changed, 111 insertions(+), 73 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index dcf768bd1bc1..e2f397b7a27f 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -960,8 +960,7 @@ struct macb_dma_desc_ptp {
>  #define PPM_FRACTION	16
>  
>  /* The buf includes headroom compatible with both skb and xdpf */
> -#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
> -#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
> +#define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
>  
>  /* struct macb_tx_skb - data about an skb which is being transmitted
>   * @skb: skb currently being transmitted, only set for the last buffer
> @@ -1273,6 +1272,7 @@ struct macb_queue {
>  	struct napi_struct	napi_rx;
>  	struct queue_stats stats;
>  	struct page_pool	*page_pool;
> +	struct sk_buff		*skb;
>  };
>  
>  struct ethtool_rx_fs_item {
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 985c81913ba6..be0c8e101639 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1250,21 +1250,25 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  	return packets;
>  }
>  
> -static void *gem_page_pool_get_buff(struct page_pool *pool,
> +static void *gem_page_pool_get_buff(struct  macb_queue *queue,
>  				    dma_addr_t *dma_addr, gfp_t gfp_mask)
>  {
> +	struct macb *bp = queue->bp;
>  	struct page *page;
> +	int offset;
>  
> -	if (!pool)
> +	if (!queue->page_pool)
>  		return NULL;
>  
> -	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
> +	page = page_pool_alloc_frag(queue->page_pool, &offset,
> +				    bp->rx_buffer_size,
> +				    gfp_mask | __GFP_NOWARN);
>  	if (!page)
>  		return NULL;
>  
> -	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM + offset;
>  
> -	return page_address(page);
> +	return page_address(page) + offset;
>  }
>  
>  static void gem_rx_refill(struct macb_queue *queue, bool napi)
> @@ -1286,7 +1290,7 @@ static void gem_rx_refill(struct macb_queue *queue, bool napi)
>  
>  		if (!queue->rx_buff[entry]) {
>  			/* allocate sk_buff for this free entry in ring */
> -			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
> +			data = gem_page_pool_get_buff(queue, &paddr,
>  						      napi ? GFP_ATOMIC : GFP_KERNEL);
>  			if (unlikely(!data)) {
>  				netdev_err(bp->dev,
> @@ -1344,20 +1348,17 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  		  int budget)
>  {
>  	struct macb *bp = queue->bp;
> -	int			buffer_size;
>  	unsigned int		len;
>  	unsigned int		entry;
>  	void			*data;
> -	struct sk_buff		*skb;
>  	struct macb_dma_desc	*desc;
> +	int			data_len;
>  	int			count = 0;
>  
> -	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
> -
>  	while (count < budget) {
>  		u32 ctrl;
>  		dma_addr_t addr;
> -		bool rxused;
> +		bool rxused, first_frame;
>  
>  		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
>  		desc = macb_rx_desc(queue, entry);
> @@ -1368,6 +1369,9 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  		rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>  		addr = macb_get_addr(bp, desc);
>  
> +		dma_sync_single_for_cpu(&bp->pdev->dev,
> +					addr, bp->rx_buffer_size - bp->rx_offset,
> +					page_pool_get_dma_dir(queue->page_pool));

page_pool_get_dma_dir() will always return the same value, we could
hardcode it.

>  		if (!rxused)
>  			break;
>  
> @@ -1379,13 +1383,6 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  		queue->rx_tail++;
>  		count++;
>  
> -		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
> -			netdev_err(bp->dev,
> -				   "not whole frame pointed by descriptor\n");
> -			bp->dev->stats.rx_dropped++;
> -			queue->stats.rx_dropped++;
> -			break;
> -		}
>  		data = queue->rx_buff[entry];
>  		if (unlikely(!data)) {
>  			netdev_err(bp->dev,
> @@ -1395,64 +1392,102 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  			break;
>  		}
>  
> -		skb = napi_build_skb(data, buffer_size);
> -		if (unlikely(!skb)) {
> -			netdev_err(bp->dev,
> -				   "Unable to allocate sk_buff\n");
> -			page_pool_put_full_page(queue->page_pool,
> -						virt_to_head_page(data),
> -						false);
> -			break;
> +		first_frame = ctrl & MACB_BIT(RX_SOF);
> +		len = ctrl & bp->rx_frm_len_mask;
> +
> +		if (len) {
> +			data_len = len;
> +			if (!first_frame)
> +				data_len -= queue->skb->len;
> +		} else {
> +			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>  		}
>  
> -		/* Properly align Ethernet header.
> -		 *
> -		 * Hardware can add dummy bytes if asked using the RBOF
> -		 * field inside the NCFGR register. That feature isn't
> -		 * available if hardware is RSC capable.
> -		 *
> -		 * We cannot fallback to doing the 2-byte shift before
> -		 * DMA mapping because the address field does not allow
> -		 * setting the low 2/3 bits.
> -		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
> -		 */
> -		skb_reserve(skb, bp->rx_offset);
> -		skb_mark_for_recycle(skb);
> +		if (first_frame) {
> +			queue->skb = napi_build_skb(data, bp->rx_buffer_size);
> +			if (unlikely(!queue->skb)) {
> +				netdev_err(bp->dev,
> +					   "Unable to allocate sk_buff\n");
> +				goto free_frags;
> +			}
> +
> +			/* Properly align Ethernet header.
> +			 *
> +			 * Hardware can add dummy bytes if asked using the RBOF
> +			 * field inside the NCFGR register. That feature isn't
> +			 * available if hardware is RSC capable.
> +			 *
> +			 * We cannot fallback to doing the 2-byte shift before
> +			 * DMA mapping because the address field does not allow
> +			 * setting the low 2/3 bits.
> +			 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
> +			 */
> +			skb_reserve(queue->skb, bp->rx_offset);
> +			skb_mark_for_recycle(queue->skb);
> +			skb_put(queue->skb, data_len);
> +			queue->skb->protocol = eth_type_trans(queue->skb, bp->dev);
> +
> +			skb_checksum_none_assert(queue->skb);
> +			if (bp->dev->features & NETIF_F_RXCSUM &&
> +			    !(bp->dev->flags & IFF_PROMISC) &&
> +			    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
> +				queue->skb->ip_summed = CHECKSUM_UNNECESSARY;
> +		} else {
> +			if (!queue->skb) {
> +				netdev_err(bp->dev,
> +					   "Received non-starting frame while expecting it\n");
> +				goto free_frags;
> +			}

Here as well we want `queue->rx_buff[entry] = NULL;` no?
Maybe put it in the free_frags codepath to ensure it isn't forgotten?
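
Something like this (untested; the buffer ownership on each goto
free_frags path would need double-checking):

free_frags:
		queue->rx_buff[entry] = NULL;
		/* ... existing skb / page pool freeing ... */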

> +			struct skb_shared_info *shinfo = skb_shinfo(queue->skb);
> +			struct page *page = virt_to_head_page(data);
> +			int nr_frags = shinfo->nr_frags;

Variable definitions in the middle of a scope? I think it is acceptable
when using __attribute__((cleanup)) but otherwise not so much (yet).

> +			if (nr_frags >= ARRAY_SIZE(shinfo->frags))
> +				goto free_frags;
> +
> +			skb_add_rx_frag(queue->skb, nr_frags, page,
> +					data - page_address(page) + bp->rx_offset,
> +					data_len, bp->rx_buffer_size);
> +		}
>  
>  		/* now everything is ready for receiving packet */
>  		queue->rx_buff[entry] = NULL;
> -		len = ctrl & bp->rx_frm_len_mask;
> -
> -		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>  
> -		dma_sync_single_for_cpu(&bp->pdev->dev,
> -					addr, len,
> -					page_pool_get_dma_dir(queue->page_pool));
> -		skb_put(skb, len);
> -		skb->protocol = eth_type_trans(skb, bp->dev);
> -		skb_checksum_none_assert(skb);
> -		if (bp->dev->features & NETIF_F_RXCSUM &&
> -		    !(bp->dev->flags & IFF_PROMISC) &&
> -		    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
> -			skb->ip_summed = CHECKSUM_UNNECESSARY;
> +		netdev_vdbg(bp->dev, "%s %u (len %u)\n", __func__, entry, data_len);
>  
> -		bp->dev->stats.rx_packets++;
> -		queue->stats.rx_packets++;
> -		bp->dev->stats.rx_bytes += skb->len;
> -		queue->stats.rx_bytes += skb->len;
> +		if (ctrl & MACB_BIT(RX_EOF)) {
> +			bp->dev->stats.rx_packets++;
> +			queue->stats.rx_packets++;
> +			bp->dev->stats.rx_bytes += queue->skb->len;
> +			queue->stats.rx_bytes += queue->skb->len;
>  
> -		gem_ptp_do_rxstamp(bp, skb, desc);
> +			gem_ptp_do_rxstamp(bp, queue->skb, desc);
>  
>  #if defined(DEBUG) && defined(VERBOSE_DEBUG)
> -		netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
> -			    skb->len, skb->csum);
> -		print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
> -			       skb_mac_header(skb), 16, true);
> -		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
> -			       skb->data, 32, true);
> +			netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
> +				    queue->skb->len, queue->skb->csum);
> +			print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
> +				       skb_mac_header(queue->skb), 16, true);
> +			print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
> +				       queue->skb->data, 32, true);
>  #endif
>  
> -		napi_gro_receive(napi, skb);
> +			napi_gro_receive(napi, queue->skb);
> +			queue->skb = NULL;
> +		}
> +
> +		continue;
> +
> +free_frags:
> +		if (queue->skb) {
> +			dev_kfree_skb(queue->skb);
> +			queue->skb = NULL;
> +		} else {
> +			page_pool_put_full_page(queue->page_pool,
> +						virt_to_head_page(data),
> +						false);
> +		}
>  	}
>  
>  	gem_rx_refill(queue, true);
> @@ -2394,7 +2429,10 @@ static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
>  	if (!macb_is_gem(bp)) {
>  		bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>  	} else {
> -		bp->rx_buffer_size = size;
> +		bp->rx_buffer_size = size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> +			+ MACB_PP_HEADROOM;
> +		if (bp->rx_buffer_size > PAGE_SIZE)
> +			bp->rx_buffer_size = PAGE_SIZE;
>  
>  		if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
>  			netdev_dbg(bp->dev,
> @@ -2589,18 +2627,15 @@ static int macb_alloc_consistent(struct macb *bp)
>  
>  static void gem_create_page_pool(struct macb_queue *queue)
>  {
> -	unsigned int num_pages = DIV_ROUND_UP(queue->bp->rx_buffer_size, PAGE_SIZE);
> -	struct macb *bp = queue->bp;
>  	struct page_pool_params pp_params = {
> -		.order = order_base_2(num_pages),
> +		.order = 0,
>  		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>  		.pool_size = queue->bp->rx_ring_size,
>  		.nid = NUMA_NO_NODE,
>  		.dma_dir = DMA_FROM_DEVICE,
>  		.dev = &queue->bp->pdev->dev,
>  		.napi = &queue->napi_rx,
> -		.offset = bp->rx_offset,
> -		.max_len = MACB_PP_MAX_BUF_SIZE(num_pages),
> +		.max_len = PAGE_SIZE,
>  	};
>  	struct page_pool *pool;

I forgot about it in [PATCH 1/6], but the error handling if
gem_create_page_pool() fails is odd. We set queue->page_pool to NULL
and keep on going. Then once opened we'll fail allocating any buffer
but still be open. Shouldn't we fail the link up operation?

If we want to support this codepath (page pool not allocated), then we
should unmask Rx interrupts only if alloc succeeded. I don't know if
we'd want that though.

	queue_writel(queue, IER, bp->rx_intr_mask | ...);
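
If we would rather fail the open instead, I picture something like this
(untested sketch, with the caller propagating the error up to
macb_open()):

static int gem_create_page_pool(struct macb_queue *queue)
{
	struct page_pool_params pp_params = {
		/* ... as in this patch ... */
	};
	struct page_pool *pool;

	pool = page_pool_create(&pp_params);
	if (IS_ERR(pool))
		return PTR_ERR(pool);

	queue->page_pool = pool;
	return 0;
}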

> @@ -2784,8 +2819,9 @@ static void macb_configure_dma(struct macb *bp)
>  	unsigned int q;
>  	u32 dmacfg;
>  
> -	buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
>  	if (macb_is_gem(bp)) {
> +		buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
> +		buffer_size /= RX_BUFFER_MULTIPLE;
>  		dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L);
>  		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>  			if (q)
> @@ -2816,6 +2852,8 @@ static void macb_configure_dma(struct macb *bp)
>  		netdev_dbg(bp->dev, "Cadence configure DMA with 0x%08x\n",
>  			   dmacfg);
>  		gem_writel(bp, DMACFG, dmacfg);
> +	} else {
> +		buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
>  	}
>  }

You do:

static void macb_configure_dma(struct macb *bp)
{
	u32 buffer_size;

	if (macb_is_gem(bp)) {
		buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
		buffer_size /= RX_BUFFER_MULTIPLE;
		// ... use buffer_size ...
	} else {
		buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
	}
}

Instead I'd vote for:

static void macb_configure_dma(struct macb *bp)
{
	u32 buffer_size;

	if (!macb_is_gem(bp))
		return;

	buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
	buffer_size /= RX_BUFFER_MULTIPLE;
	// ... use buffer_size ...
}

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem
  2025-11-19 13:53 ` [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem Paolo Valerio
@ 2025-11-27 14:41   ` Théo Lebrun
  2025-12-02 17:32     ` Paolo Valerio
  0 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-27 14:41 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Thomas Petazzoni, Grégory Clement, Benoît Monin

On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
> @@ -1273,6 +1275,7 @@ struct macb_queue {
>  	struct queue_stats stats;
>  	struct page_pool	*page_pool;
>  	struct sk_buff		*skb;
> +	struct xdp_rxq_info	xdp_q;
>  };

Those are always named `xdp_rxq` inside the kernel; we should stick
with the usual naming convention, no?

> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 5829c1f773dd..53ea1958b8e4 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1344,10 +1344,51 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
>  	 */
>  }
>  
> +static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
> +		       struct net_device *dev)

Why pass `struct net_device` explicitly? It is in queue->bp->dev.

> +{
> +	struct bpf_prog *prog;
> +	u32 act = XDP_PASS;
> +
> +	rcu_read_lock();
> +
> +	prog = rcu_dereference(queue->bp->prog);
> +	if (!prog)
> +		goto out;
> +
> +	act = bpf_prog_run_xdp(prog, xdp);
> +	switch (act) {
> +	case XDP_PASS:
> +		goto out;
> +	case XDP_REDIRECT:
> +		if (unlikely(xdp_do_redirect(dev, xdp, prog))) {
> +			act = XDP_DROP;
> +			break;
> +		}
> +		goto out;

Why the `unlikely()`?

> +	default:
> +		bpf_warn_invalid_xdp_action(dev, prog, act);
> +		fallthrough;
> +	case XDP_ABORTED:
> +		trace_xdp_exception(dev, prog, act);
> +		fallthrough;
> +	case XDP_DROP:
> +		break;
> +	}
> +
> +	page_pool_put_full_page(queue->page_pool,
> +				virt_to_head_page(xdp->data), true);

Maybe move that to the XDP_DROP case; it is the only `break` in the
above switch statement. It would then be reached by the default and
XDP_ABORTED cases through fallthrough. We can avoid the out label and
its two gotos that way.
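
Roughly the shape I have in mind (untested; the redirect error handling
moves ahead of the switch so that the XDP_DROP case is the only place
freeing the page):

	act = bpf_prog_run_xdp(prog, xdp);

	/* on redirect failure, fall back to dropping the buffer */
	if (act == XDP_REDIRECT && xdp_do_redirect(dev, xdp, prog))
		act = XDP_DROP;

	switch (act) {
	case XDP_PASS:
	case XDP_REDIRECT:
		break;
	default:
		bpf_warn_invalid_xdp_action(dev, prog, act);
		fallthrough;
	case XDP_ABORTED:
		trace_xdp_exception(dev, prog, act);
		fallthrough;
	case XDP_DROP:
		page_pool_put_full_page(queue->page_pool,
					virt_to_head_page(xdp->data), true);
		break;
	}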

>  static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  		  int budget)
>  {
>  	struct macb *bp = queue->bp;
> +	bool			xdp_flush = false;
>  	unsigned int		len;
>  	unsigned int		entry;
>  	void			*data;
> @@ -1356,9 +1397,11 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  	int			count = 0;
>  
>  	while (count < budget) {
> -		u32 ctrl;
> -		dma_addr_t addr;
>  		bool rxused, first_frame;
> +		struct xdp_buff xdp;
> +		dma_addr_t addr;
> +		u32 ctrl;
> +		u32 ret;
>  
>  		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
>  		desc = macb_rx_desc(queue, entry);
> @@ -1403,6 +1446,22 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>  		}
>  
> +		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF)))
> +			goto skip_xdp;
> +
> +		xdp_init_buff(&xdp, bp->rx_buffer_size, &queue->xdp_q);
> +		xdp_prepare_buff(&xdp, data, bp->rx_offset, len,
> +				 false);
> +		xdp_buff_clear_frags_flag(&xdp);

You prepare the XDP buffer before checking that an XDP program is attached.
Could we avoid this work? We'd move the xdp_buff preparation into
gem_xdp_run(), after the RCU pointer dereference.
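
A sketch of what that could look like (untested; keeps the current
field names from the series):

static u32 gem_xdp_run(struct macb_queue *queue, void *data, unsigned int len)
{
	struct macb *bp = queue->bp;
	struct bpf_prog *prog;
	struct xdp_buff xdp;
	u32 act = XDP_PASS;

	rcu_read_lock();

	prog = rcu_dereference(bp->prog);
	if (!prog)
		goto out;

	/* only build the xdp_buff once we know a program is attached */
	xdp_init_buff(&xdp, bp->rx_buffer_size, &queue->xdp_q);
	xdp_prepare_buff(&xdp, data, bp->rx_offset, len, false);
	xdp_buff_clear_frags_flag(&xdp);

	act = bpf_prog_run_xdp(prog, &xdp);
	/* ... verdict handling as before ... */
out:
	rcu_read_unlock();
	return act;
}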

> -static void gem_create_page_pool(struct macb_queue *queue)
> +static void gem_create_page_pool(struct macb_queue *queue, int qid)
>  {
>  	struct page_pool_params pp_params = {
>  		.order = 0,
>  		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>  		.pool_size = queue->bp->rx_ring_size,
>  		.nid = NUMA_NO_NODE,
> -		.dma_dir = DMA_FROM_DEVICE,
> +		.dma_dir = rcu_access_pointer(queue->bp->prog)
> +				? DMA_BIDIRECTIONAL
> +				: DMA_FROM_DEVICE,

Ah, that is the reason for page_pool_get_dma_dir() calls!

>  static int macb_change_mtu(struct net_device *dev, int new_mtu)
>  {
> +	int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + MACB_MAX_PAD;
> +	struct macb *bp = netdev_priv(dev);
> +	struct bpf_prog *prog = bp->prog;

No fancy RCU macro?

> +static int macb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> +{
> +	struct macb *bp = netdev_priv(dev);
> +
> +	if (!macb_is_gem(bp))
> +		return 0;

Returning 0 sounds like a mistake, -EOPNOTSUPP sounds more appropriate.

> +	switch (xdp->command) {
> +	case XDP_SETUP_PROG:
> +		return gem_xdp_setup(dev, xdp->prog, xdp->extack);
> +	default:
> +		return -EINVAL;

Same here: we want -EOPNOTSUPP. Otherwise the caller cannot distinguish
an unsupported call from one that is supported but failed.

> +	}
> +}
> +
>  static void gem_update_stats(struct macb *bp)
>  {
>  	struct macb_queue *queue;
> @@ -4390,6 +4529,7 @@ static const struct net_device_ops macb_netdev_ops = {
>  	.ndo_hwtstamp_set	= macb_hwtstamp_set,
>  	.ndo_hwtstamp_get	= macb_hwtstamp_get,
>  	.ndo_setup_tc		= macb_setup_tc,
> +	.ndo_bpf		= macb_xdp,

We want it to be "gem_" prefixed as it does not support MACB.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic
  2025-11-19 13:53 ` [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic Paolo Valerio
@ 2025-11-27 14:49   ` Théo Lebrun
  2025-12-02 17:33     ` Paolo Valerio
  0 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-27 14:49 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Thomas Petazzoni, Grégory Clement, Benoît Monin

On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index 2f665260a84d..67bb98d3cb00 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -964,19 +964,27 @@ struct macb_dma_desc_ptp {
>  #define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
>  #define MACB_MAX_PAD		(MACB_PP_HEADROOM + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
>  
> -/* struct macb_tx_skb - data about an skb which is being transmitted
> - * @skb: skb currently being transmitted, only set for the last buffer
> - *       of the frame
> +enum macb_tx_buff_type {
> +	MACB_TYPE_SKB,
> +	MACB_TYPE_XDP_TX,
> +	MACB_TYPE_XDP_NDO,
> +};
> +
> +/* struct macb_tx_buff - data about an skb or xdp frame which is being transmitted
> + * @data: pointer to skb or xdp frame being transmitted, only set
> + *        for the last buffer for sk_buff
>   * @mapping: DMA address of the skb's fragment buffer
>   * @size: size of the DMA mapped buffer
>   * @mapped_as_page: true when buffer was mapped with skb_frag_dma_map(),
>   *                  false when buffer was mapped with dma_map_single()
> + * @type: type of buffer (MACB_TYPE_SKB, MACB_TYPE_XDP_TX, MACB_TYPE_XDP_NDO)
>   */
> -struct macb_tx_skb {
> -	struct sk_buff		*skb;
> -	dma_addr_t		mapping;
> -	size_t			size;
> -	bool			mapped_as_page;
> +struct macb_tx_buff {
> +	void				*data;
> +	dma_addr_t			mapping;
> +	size_t				size;
> +	bool				mapped_as_page;
> +	enum macb_tx_buff_type		type;
>  };

Here as well, reviewing would be helped by moving the tx_skb to tx_buff
renaming to a separate commit that has no functional change.

As said in [0], I am not a fan of the field name `data`.
Let's discuss it there.

[0]: https://lore.kernel.org/all/DEITSIO441QL.X81MVLL3EIV4@bootlin.com/

> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int budget)
> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
> +			  int budget)
>  {
> -	if (tx_skb->mapping) {
> -		if (tx_skb->mapped_as_page)
> -			dma_unmap_page(&bp->pdev->dev, tx_skb->mapping,
> -				       tx_skb->size, DMA_TO_DEVICE);
> +	if (tx_buff->mapping) {
> +		if (tx_buff->mapped_as_page)
> +			dma_unmap_page(&bp->pdev->dev, tx_buff->mapping,
> +				       tx_buff->size, DMA_TO_DEVICE);
>  		else
> -			dma_unmap_single(&bp->pdev->dev, tx_skb->mapping,
> -					 tx_skb->size, DMA_TO_DEVICE);
> -		tx_skb->mapping = 0;
> +			dma_unmap_single(&bp->pdev->dev, tx_buff->mapping,
> +					 tx_buff->size, DMA_TO_DEVICE);
> +		tx_buff->mapping = 0;
>  	}
>  
> -	if (tx_skb->skb) {
> -		napi_consume_skb(tx_skb->skb, budget);
> -		tx_skb->skb = NULL;
> +	if (tx_buff->data) {
> +		if (tx_buff->type != MACB_TYPE_SKB)
> +			netdev_err(bp->dev, "BUG: Unexpected tx buffer type while unmapping (%d)",
> +				   tx_buff->type);
> +		napi_consume_skb(tx_buff->data, budget);
> +		tx_buff->data = NULL;
>  	}

This code does not make much sense by itself. We check `tx_buff->type !=
MACB_TYPE_SKB` but call napi_consume_skb() in all cases. I remember it
changes in the next commit.

> @@ -1069,16 +1073,23 @@ static void macb_tx_error_task(struct work_struct *work)
>  
>  		desc = macb_tx_desc(queue, tail);
>  		ctrl = desc->ctrl;
> -		tx_skb = macb_tx_skb(queue, tail);
> -		skb = tx_skb->skb;
> +		tx_buff = macb_tx_buff(queue, tail);
> +
> +		if (tx_buff->type != MACB_TYPE_SKB)
> +			netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
> +				   tx_buff->type);
> +		skb = tx_buff->data;

Same here: `tx_buff->type != MACB_TYPE_SKB` does not make sense if we
keep going with the SKB case anyway.

>  
>  		if (ctrl & MACB_BIT(TX_USED)) {
>  			/* skb is set for the last buffer of the frame */
>  			while (!skb) {
> -				macb_tx_unmap(bp, tx_skb, 0);
> +				macb_tx_unmap(bp, tx_buff, 0);
>  				tail++;
> -				tx_skb = macb_tx_skb(queue, tail);
> -				skb = tx_skb->skb;
> +				tx_buff = macb_tx_buff(queue, tail);
> +				if (tx_buff->type != MACB_TYPE_SKB)
> +					netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
> +						   tx_buff->type);
> +				skb = tx_buff->data;

Same.

> @@ -5050,7 +5066,7 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
>  		netif_stop_queue(dev);
>  
>  		/* Store packet information (to free when Tx completed) */
> -		lp->rm9200_txq[desc].skb = skb;
> +		lp->rm9200_txq[desc].data = skb;
>  		lp->rm9200_txq[desc].size = skb->len;
>  		lp->rm9200_txq[desc].mapping = dma_map_single(&lp->pdev->dev, skb->data,
>  							      skb->len, DMA_TO_DEVICE);

We might want to assign `lp->rm9200_txq[desc].type` here, to ensure all
`struct macb_tx_buff` instances are fully initialised.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support
  2025-11-19 13:53 ` [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support Paolo Valerio
@ 2025-11-27 15:07   ` Théo Lebrun
  2025-12-02 17:34     ` Paolo Valerio
  0 siblings, 1 reply; 28+ messages in thread
From: Théo Lebrun @ 2025-11-27 15:07 UTC (permalink / raw)
  To: Paolo Valerio, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Hello Paolo, netdev,

On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
> Add XDP_TX verdict support, also introduce ndo_xdp_xmit function for
> redirection, and update macb_tx_unmap() to handle both skbs and xdp
> frames advertising NETDEV_XDP_ACT_NDO_XMIT capability and the ability
> to process XDP_TX verdicts.
>
> Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 166 +++++++++++++++++++++--
>  1 file changed, 153 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index eeda1a3871a6..bd62d3febeb1 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -969,6 +969,17 @@ static int macb_halt_tx(struct macb *bp)
>  					bp, TSR);
>  }
>  
> +static void release_buff(void *buff, enum macb_tx_buff_type type, int budget)
> +{
> +	if (type == MACB_TYPE_SKB) {
> +		napi_consume_skb(buff, budget);
> +	} else if (type == MACB_TYPE_XDP_TX) {
> +		xdp_return_frame_rx_napi(buff);
> +	} else {
> +		xdp_return_frame(buff);
> +	}
> +}
> +
>  static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
>  			  int budget)
>  {
> @@ -983,10 +994,7 @@ static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
>  	}
>  
>  	if (tx_buff->data) {
> -		if (tx_buff->type != MACB_TYPE_SKB)
> -			netdev_err(bp->dev, "BUG: Unexpected tx buffer type while unmapping (%d)",
> -				   tx_buff->type);
> -		napi_consume_skb(tx_buff->data, budget);
> +		release_buff(tx_buff->data, tx_buff->type, budget);
>  		tx_buff->data = NULL;
>  	}
>  }
> @@ -1076,8 +1084,8 @@ static void macb_tx_error_task(struct work_struct *work)
>  		tx_buff = macb_tx_buff(queue, tail);
>  
>  		if (tx_buff->type != MACB_TYPE_SKB)
> -			netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
> -				   tx_buff->type);
> +			goto unmap;
> +
>  		skb = tx_buff->data;
>  
>  		if (ctrl & MACB_BIT(TX_USED)) {
> @@ -1118,6 +1126,7 @@ static void macb_tx_error_task(struct work_struct *work)
>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>  		}
>  
> +unmap:
>  		macb_tx_unmap(bp, tx_buff, 0);
>  	}
>  
> @@ -1196,6 +1205,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
>  	head = queue->tx_head;
>  	for (tail = queue->tx_tail; tail != head && packets < budget; tail++) {
> +		void			*data = NULL;
>  		struct macb_tx_buff	*tx_buff;
>  		struct sk_buff		*skb;
>  		struct macb_dma_desc	*desc;
> @@ -1218,11 +1228,16 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  		for (;; tail++) {
>  			tx_buff = macb_tx_buff(queue, tail);
>  
> -			if (tx_buff->type == MACB_TYPE_SKB)
> -				skb = tx_buff->data;
> +			if (tx_buff->type != MACB_TYPE_SKB) {
> +				data = tx_buff->data;
> +				goto unmap;
> +			}
>  
>  			/* First, update TX stats if needed */
> -			if (skb) {
> +			if (tx_buff->type == MACB_TYPE_SKB && tx_buff->data) {
> +				data = tx_buff->data;
> +				skb = tx_buff->data;
> +
>  				if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
>  				    !ptp_one_step_sync(skb))
>  					gem_ptp_do_txstamp(bp, skb, desc);
> @@ -1238,6 +1253,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  				bytes += skb->len;
>  			}
>  
> +unmap:
>  			/* Now we can safely release resources */
>  			macb_tx_unmap(bp, tx_buff, budget);
>  
> @@ -1245,7 +1261,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  			 * WARNING: at this point skb has been freed by
>  			 * macb_tx_unmap().
>  			 */
> -			if (skb)
> +			if (data)
>  				break;
>  		}
>  	}
> @@ -1357,8 +1373,124 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
>  	 */
>  }
>  
> +static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
> +				 struct net_device *dev, dma_addr_t addr)
> +{
> +	enum macb_tx_buff_type buff_type;
> +	struct macb_tx_buff *tx_buff;
> +	int cpu = smp_processor_id();
> +	struct macb_dma_desc *desc;
> +	struct macb_queue *queue;
> +	unsigned long flags;
> +	dma_addr_t mapping;
> +	u16 queue_index;
> +	int err = 0;
> +	u32 ctrl;
> +
> +	queue_index = cpu % bp->num_queues;
> +	queue = &bp->queues[queue_index];
> +	buff_type = !addr ? MACB_TYPE_XDP_NDO : MACB_TYPE_XDP_TX;

I am not the biggest fan of piggy-backing on !!addr to know which
codepath called us. If the macb_xdp_submit_frame() call in gem_xdp_run()
ever gives an addr=0 coming from macb_get_addr(bp, desc), then we will
be submitting NDO typed frames and creating additional DMA mappings
which would be a really hard to debug bug.
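
Passing the type explicitly would make both call sites self-describing,
e.g. (sketch):

	/* XDP_TX: buffer is already mapped by the page pool */
	macb_xdp_submit_frame(bp, xdpf, dev, MACB_TYPE_XDP_TX, addr);

	/* ndo_xdp_xmit(): the frame needs its own DMA mapping */
	macb_xdp_submit_frame(bp, frames[i], dev, MACB_TYPE_XDP_NDO, 0);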

> +	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
> +
> +	/* This is a hard error, log it. */
> +	if (CIRC_SPACE(queue->tx_head, queue->tx_tail,
> +		       bp->tx_ring_size) < 1) {

The hard wrap is not required; it fits on one line.

> +		netif_stop_subqueue(dev, queue_index);
> +		netdev_dbg(bp->dev, "tx_head = %u, tx_tail = %u\n",
> +			   queue->tx_head, queue->tx_tail);
> +		err = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	if (!addr) {
> +		mapping = dma_map_single(&bp->pdev->dev,
> +					 xdpf->data,
> +					 xdpf->len, DMA_TO_DEVICE);
> +		if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
> +			err = -ENOMEM;
> +			goto unlock;
> +		}
> +	} else {
> +		mapping = addr;
> +		dma_sync_single_for_device(&bp->pdev->dev, mapping,
> +					   xdpf->len, DMA_BIDIRECTIONAL);
> +	}
> +
> +	unsigned int tx_head = queue->tx_head + 1;

Middle scope variable definition. Weirdly named as it isn't storing the
current head offset but the future head offset.

> +	ctrl = MACB_BIT(TX_USED);
> +	desc = macb_tx_desc(queue, tx_head);
> +	desc->ctrl = ctrl;
> +
> +	desc = macb_tx_desc(queue, queue->tx_head);
> +	tx_buff = macb_tx_buff(queue, queue->tx_head);
> +	tx_buff->data = xdpf;
> +	tx_buff->type = buff_type;
> +	tx_buff->mapping = mapping;
> +	tx_buff->size = xdpf->len;
> +	tx_buff->mapped_as_page = false;
> +
> +	ctrl = (u32)tx_buff->size;
> +	ctrl |= MACB_BIT(TX_LAST);
> +
> +	if (unlikely(macb_tx_ring_wrap(bp, queue->tx_head) == (bp->tx_ring_size - 1)))
> +		ctrl |= MACB_BIT(TX_WRAP);
> +
> +	/* Set TX buffer descriptor */
> +	macb_set_addr(bp, desc, tx_buff->mapping);
> +	/* desc->addr must be visible to hardware before clearing
> +	 * 'TX_USED' bit in desc->ctrl.
> +	 */
> +	wmb();
> +	desc->ctrl = ctrl;
> +	queue->tx_head = tx_head;
> +
> +	/* Make newly initialized descriptor visible to hardware */
> +	wmb();
> +
> +	spin_lock(&bp->lock);
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +	spin_unlock(&bp->lock);
> +
> +	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
> +		netif_stop_subqueue(dev, queue_index);

The above 30~40 lines are super similar to macb_start_xmit() &
macb_tx_map(). They implement almost the same logic; can we avoid the
duplication?

> +
> +unlock:
> +	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
> +
> +	if (err)
> +		release_buff(xdpf, buff_type, 0);
> +
> +	return err;
> +}
> +
> +static int
> +macb_xdp_xmit(struct net_device *dev, int num_frame,
> +	      struct xdp_frame **frames, u32 flags)
> +{
> +	struct macb *bp = netdev_priv(dev);
> +	u32 xmitted = 0;
> +	int i;
> +
> +	if (!macb_is_gem(bp))
> +		return -EOPNOTSUPP;
> +
> +	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
> +		return -EINVAL;
> +
> +	for (i = 0; i < num_frame; i++) {
> +		if (macb_xdp_submit_frame(bp, frames[i], dev, 0))
> +			break;
> +
> +		xmitted++;
> +	}
> +
> +	return xmitted;
> +}
> +
>  static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
> -		       struct net_device *dev)
> +		       struct net_device *dev, dma_addr_t addr)
>  {
>  	struct bpf_prog *prog;
>  	u32 act = XDP_PASS;
> @@ -1379,6 +1511,12 @@ static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
>  			break;
>  		}
>  		goto out;
> +	case XDP_TX:
> +		struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
> +
> +		if (!xdpf || macb_xdp_submit_frame(queue->bp, xdpf, dev, addr))
> +			act = XDP_DROP;
> +		goto out;
>  	default:
>  		bpf_warn_invalid_xdp_action(dev, prog, act);
>  		fallthrough;
> @@ -1467,7 +1605,7 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>  				 false);
>  		xdp_buff_clear_frags_flag(&xdp);
>  
> -		ret = gem_xdp_run(queue, &xdp, bp->dev);
> +		ret = gem_xdp_run(queue, &xdp, bp->dev, addr);
>  		if (ret == XDP_REDIRECT)
>  			xdp_flush = true;
>  
> @@ -4546,6 +4684,7 @@ static const struct net_device_ops macb_netdev_ops = {
>  	.ndo_hwtstamp_get	= macb_hwtstamp_get,
>  	.ndo_setup_tc		= macb_setup_tc,
>  	.ndo_bpf		= macb_xdp,
> +	.ndo_xdp_xmit		= macb_xdp_xmit,

I'd expect macb_xdp_xmit() to be called gem_xdp_xmit() as well.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
  2025-11-26 18:08 ` Théo Lebrun
@ 2025-12-02 17:24   ` Paolo Valerio
  2025-12-03 14:28     ` Théo Lebrun
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-12-02 17:24 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

Hello Théo,

thank you for the feedback

On 26 Nov 2025 at 07:08:14 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> Hello Paolo,
>
> So this is an initial review, I'll start here with five series-wide
> topics and send small per-line comments (ie nitpicks) in a second stage.
>
>
>
> ### Rx buffer size computation
>
> The buffer size computation should be reworked. At the end of the series
> it looks like:
>
> static int macb_open(struct net_device *dev)
> {
>     size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
>
>     // ...
>
>     macb_init_rx_buffer_size(bp, bufsz);
>
>     // ...
> }
>
> static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
> {
>     if (!macb_is_gem(bp)) {
>         bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>     } else {
>         bp->rx_buffer_size = size
>             + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>             + MACB_PP_HEADROOM;
>
>         if (bp->rx_buffer_size > PAGE_SIZE)
>             bp->rx_buffer_size = PAGE_SIZE;
>
>         if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE)
>             bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
>     }
> }
>
> Most of the issues with this code is not stemming from your series, but
> this big rework is the right moment to fix it all.
>
>  - NET_IP_ALIGN is accounted for in the headroom even though it isn't
>    present if !RSC.
>

that's something I noticed as well, and I was unsure about the reason.

>  - When skbuff.c/h allocates an SKB buffer, it SKB_DATA_ALIGN()s
>    headroom+data. We should probably do the same. In our case it would
>    be:
>
>    bp->rx_buffer_size = SKB_DATA_ALIGN(MACB_PP_HEADROOM + size) +
>                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>    // or
>    bp->rx_buffer_size = SKB_HEAD_ALIGN(MACB_PP_HEADROOM + size);
>
>    I've not computed if it can ever give a different value to your
>    series in terms of total size or shinfo alignment. I'd guess it is
>    unlikely.
>

I'll take a look at this

>  - If the size clamping to PAGE_SIZE comes into play, we are probably
>    doomed. It means we cannot deal with the MTU and we'll probably get
>    corruption. If we do put a check in place, it should loudly fail
>    rather than silently clamp.
>

That should not happen, unless I'm missing something.
E.g., a 9000B MTU on a 4K PAGE_SIZE kernel should be handled with
multiple descriptors. The clamping is there because, given how the
series creates the pool, the maximum buffer size is a single order-0 page.

Hardware-wise, bp->rx_buffer_size also needs to be taken into account for
the receive buffer size.
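
To put rough numbers on it (assuming 64-bit and 4K pages, so
XDP_PACKET_HEADROOM is 256B and struct skb_shared_info is roughly 320B
after alignment): rx_buffer_size is clamped to 4096, which leaves about
3.5KB of payload per descriptor, so a ~9KB jumbo frame simply spans
three descriptors and gets assembled via skb_add_rx_frag().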

> TLDR: I think macb_init_rx_buffer_size() should encapsulate the whole
> rx buffer size computation. It can use bp->rx_offset and add on top
> MTU & co. It might start failing if >PAGE_SIZE (?).
>

ack, I'll move that part

>
>
> ### Buffer variable names
>
> Related: so many variables, fields or constants have ambiguous names,
> can we do something about it?
>
>  - bp->rx_offset is named oddly to my ears. Offset to what?
>    Maybe bp->rx_head or bp->rx_headroom?
>

bp->rx_headroom sounds like a good choice to me, but if you have a stronger
preference for bp->rx_head, just let me know.

>  - bp->rx_buffer_size: it used to be approximately the payload size
>    (plus NET_IP_ALIGN). Now it contains the XDP headroom and shinfo.
>    That's on GEM, because on MACB it means something different.
>
>    This line is a bit ironic and illustrates the trouble:
>       buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset
>
>  - data in gem_rx|gem_rx_refill|gem_free_rx_buffers() points to what we
>    could call skb->head, ie start of buffer & start of XDP headroom.
>    Its name implied skb->data to me, ie after the headroom.
>
>    It also made `data - page_address(page) + bp->rx_offset` hard to
>    understand. It is easier for me to understand that the following is
>    the page fragment offset till skb->data:
>
>       buff_head + bp->rx_headroom - page_address(page)
>

ack, will change this

>  - MACB_MAX_PAD is ambiguous. It rhymes with NET_SKB_PAD which is the
>    default SKB headroom but here it contains part of the headroom
>    (XDP_PACKET_HEADROOM but not NET_IP_ALIGN) and the tailroom (shinfo).
>

uhm, looking at this, I think NET_IP_ALIGN should be part of it (for the
!rsc case).
I'll revisit this part, though.

>  - Field `data` in `struct macb_tx_buff` points to skb/xdp_frame but my
>    initial thought was skb->data pointer (ie after headroom).
>    What about va or ptr or buff or frame or ...?
>

I see. At some point I considered buff, but then I realized
tx_buff->buff was not perfect, hence data :)

I guess either frame or va would work, thanks.

> TLDR: I had a hard time understanding size/offset expressions (both from
> existing code and the series) because of variable names.
>

ack. Will revisit this aspect based on your suggestions.

>
>
> ### Duplicated buffer size computations
>
> Last point related to buffer size computation:
>
>  - it is duplicated in macb_set_mtu() but forgets NET_IP_ALIGN & proper
>    SKB_DATA_ALIGN() and,
>
>  - it is duplicated in gem_xdp_setup() but I don't see why because the
>    buffer size is computed to work fine with/without XDP enabled. Also
>    this check means we cannot load an XDP program before macb_open()
>    because bp->rx_buffer_size == 0.
>
> TLDR: Let's deduplicate size computations to minimise chances of getting
> it wrong.
>

ack

>
>
> ### Allocation changes
>
> I am not convinced by patches 1/6 and 2/6 that change the alloc strategy
> in two steps, from netdev_alloc_skb() to page_pool_alloc_pages() to
> page_pool_alloc_frag().
>
>  - The transient solution isn't acceptable when PAGE_SIZE is large.
>    We have 16K and would never want one buffer per page.
>
>  - It forces you to introduce temporary code & constants which is added
>    noise IMO. MACB_PP_MAX_BUF_SIZE() is odd as is the alignment of
>    buffer sizes to page sizes. It forces you to deal with
>    `bp->rx_buffer_size > PAGE_SIZE` which we could ignore. Right?
>
> TLDR: do alloc changes in one step.
>

Yes, that makes sense. I'll squash them.

>
>
> ### XDP_SETUP_PROG if netif_running()
>
> I'd like to start a discussion on the expected behavior on XDP program
> change if netif_running(). Summarised:
>
> static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
>              struct netlink_ext_ack *extack)
> {
>     bool running = netif_running(dev);
>     bool need_update = !!bp->prog != !!prog;
>
>     if (running && need_update)
>         macb_close(dev);
>     old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
>     if (running && need_update)
>         return macb_open(dev);
> }
>
> Have you experimented with that? I don't see anything graceful in our
> close operation, it looks like we'll get corruption or dropped packets
> or both. We shouldn't impose that on the user who just wanted to swap
> the program.
>
> I cannot find any good reason that implies we wouldn't be able to swap
> our XDP program on the fly. If we think it is unsafe, I'd vote for
> starting with a -EBUSY return code and iterating on that.
>

I didn't experiment much with this, other than simply adding and
removing programs as needed during my tests. Didn't experience
particular issues.

The reason a close/open sequence was added here was mostly because I was
considering accounting for XDP_PACKET_HEADROOM only when a program was
present. I later decided to not proceed with that (mostly to avoid
changing too many things at once).

Given the geometry of the buffer remains untouched in either case, I
see no particular reasons we can't swap on the fly as you suggest.

I'll try this and change it, thanks!
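
Roughly what I'd try (just a sketch, assuming the buffer geometry really
stays identical with and without a program):

static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
                         struct netlink_ext_ack *extack)
{
    struct macb *bp = netdev_priv(dev);
    struct bpf_prog *old_prog;

    /* Buffer geometry is the same in both cases, so swap in place;
     * packets in flight see either the old or the new program.
     */
    old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
    if (old_prog)
        bpf_prog_put(old_prog);

    return 0;
}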

> TLDR: macb_close() on XDP program change is probably a bad idea.
>
> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support
  2025-11-27 13:21   ` Théo Lebrun
@ 2025-12-02 17:30     ` Paolo Valerio
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Valerio @ 2025-12-02 17:30 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

On 27 Nov 2025 at 02:21:52 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> Hello Paolo,
>
> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>> Subject: [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support
>
> `git log --oneline drivers/net/ethernet/cadence/` tells us all commits
> in this series should be prefixed with "net: macb: ".
>

ack

>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>> index 87414a2ddf6e..dcf768bd1bc1 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -14,6 +14,8 @@
>>  #include <linux/interrupt.h>
>>  #include <linux/phy/phy.h>
>>  #include <linux/workqueue.h>
>> +#include <net/page_pool/helpers.h>
>> +#include <net/xdp.h>
>>  
>>  #define MACB_GREGS_NBR 16
>>  #define MACB_GREGS_VERSION 2
>> @@ -957,6 +959,10 @@ struct macb_dma_desc_ptp {
>>  /* Scaled PPM fraction */
>>  #define PPM_FRACTION	16
>>  
>> +/* The buf includes headroom compatible with both skb and xdpf */
>> +#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
>> +#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
>
> From my previous review, you know I think MACB_PP_MAX_BUF_SIZE() should
> disappear.
>
> I also don't see the point of MACB_PP_HEADROOM. Maybe if it was
> max(XDP_PACKET_HEADROOM, NET_SKB_PAD) it would be more useful, but that
> isn't useful anyway. Can we drop it and use XDP_PACKET_HEADROOM directly
> in the code?
>

sure

>>  /* struct macb_tx_skb - data about an skb which is being transmitted
>>   * @skb: skb currently being transmitted, only set for the last buffer
>>   *       of the frame
>> @@ -1262,10 +1268,11 @@ struct macb_queue {
>>  	unsigned int		rx_tail;
>>  	unsigned int		rx_prepared_head;
>>  	struct macb_dma_desc	*rx_ring;
>> -	struct sk_buff		**rx_skbuff;
>> +	void			**rx_buff;
>
> It would help review if the s/rx_skbuff/rx_buff/ renaming was done in a
> separate commit with a commit message being "this only renames X and
> implies no functional changes".
>

ack, will do

>>  	void			*rx_buffers;
>>  	struct napi_struct	napi_rx;
>>  	struct queue_stats stats;
>> +	struct page_pool	*page_pool;
>>  };
>>  
>>  struct ethtool_rx_fs_item {
>> @@ -1289,6 +1296,7 @@ struct macb {
>>  	struct macb_dma_desc	*rx_ring_tieoff;
>>  	dma_addr_t		rx_ring_tieoff_dma;
>>  	size_t			rx_buffer_size;
>> +	u16			rx_offset;
>
> u16 makes me worried that we might do mistakes. For example the
> following propagates the u16 type.
>
>         bp->rx_offset + data - page_address(page)
>
> We can spare the additional 6 bytes and turn it into a size_t. It'll
> fall in holes anyway, at least it does for my target according to pahole.
>

will do

>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index e461f5072884..985c81913ba6 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1250,11 +1250,28 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  	return packets;
>>  }
>>  
>> -static void gem_rx_refill(struct macb_queue *queue)
>> +static void *gem_page_pool_get_buff(struct page_pool *pool,
>> +				    dma_addr_t *dma_addr, gfp_t gfp_mask)
>> +{
>> +	struct page *page;
>> +
>> +	if (!pool)
>> +		return NULL;
>> +
>> +	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
>> +	if (!page)
>> +		return NULL;
>> +
>> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
>> +
>> +	return page_address(page);
>> +}
>
> Do we need a separate function called from only one location? Or we
> could change its name to highlight it allocates and does not just "get"
> a buffer.
>

I guess it's fine to open-code this.

>> @@ -1267,25 +1284,17 @@ static void gem_rx_refill(struct macb_queue *queue)
>>  
>>  		desc = macb_rx_desc(queue, entry);
>>  
>> -		if (!queue->rx_skbuff[entry]) {
>> +		if (!queue->rx_buff[entry]) {
>>  			/* allocate sk_buff for this free entry in ring */
>> -			skb = netdev_alloc_skb(bp->dev, bp->rx_buffer_size);
>> -			if (unlikely(!skb)) {
>> +			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
>> +						      napi ? GFP_ATOMIC : GFP_KERNEL);
>
> I don't get why the gfp flags computation is spread across
> gem_page_pool_get_buff() and gem_rx_refill().
>

Not sure I get the point here.
This will be open-coded, so the atomic vs non-atomic flag will no longer be
passed to gem_page_pool_get_buff() after the change.
Not sure this answers your question.
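
To be concrete, the refill loop would then do something like this in place
of the helper call (sketch, reusing the names from the patch;
MACB_PP_HEADROOM may become XDP_PACKET_HEADROOM per the other comment):

    gfp_t gfp = (napi ? GFP_ATOMIC : GFP_KERNEL) | __GFP_NOWARN;
    unsigned int offset;
    struct page *page;

    page = page_pool_alloc_frag(queue->page_pool, &offset,
                                bp->rx_buffer_size, gfp);
    if (unlikely(!page)) {
        netdev_err(bp->dev, "Unable to allocate rx buffer\n");
        break;
    }

    paddr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM + offset;
    data = page_address(page) + offset;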

>> @@ -1349,12 +1344,16 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  		  int budget)
>>  {
>>  	struct macb *bp = queue->bp;
>> +	int			buffer_size;
>>  	unsigned int		len;
>>  	unsigned int		entry;
>> +	void			*data;
>>  	struct sk_buff		*skb;
>>  	struct macb_dma_desc	*desc;
>>  	int			count = 0;
>>  
>> +	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
>
> This looks like
>
>         buffer_size = ALIGN(bp->rx_buffer_size, PAGE_SIZE);
>
> no? Anyway I think it should be dropped. It does get dropped next patch
> in this RFC.
>

will proceed squashing the two patches

>> @@ -1387,24 +1386,49 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  			queue->stats.rx_dropped++;
>>  			break;
>>  		}
>> -		skb = queue->rx_skbuff[entry];
>> -		if (unlikely(!skb)) {
>> +		data = queue->rx_buff[entry];
>> +		if (unlikely(!data)) {
>>  			netdev_err(bp->dev,
>>  				   "inconsistent Rx descriptor chain\n");
>>  			bp->dev->stats.rx_dropped++;
>>  			queue->stats.rx_dropped++;
>>  			break;
>>  		}
>> +
>> +		skb = napi_build_skb(data, buffer_size);
>> +		if (unlikely(!skb)) {
>> +			netdev_err(bp->dev,
>> +				   "Unable to allocate sk_buff\n");
>> +			page_pool_put_full_page(queue->page_pool,
>> +						virt_to_head_page(data),
>> +						false);
>> +			break;
>
> We should `queue->rx_skbuff[entry] = NULL` here no?
> We free a page and keep a pointer to it.
>

This will be squashed, but the point is still valid, as the same issue
applies to the other patch.

>> +		}
>> +
>> +		/* Properly align Ethernet header.
>> +		 *
>> +		 * Hardware can add dummy bytes if asked using the RBOF
>> +		 * field inside the NCFGR register. That feature isn't
>> +		 * available if hardware is RSC capable.
>> +		 *
>> +		 * We cannot fallback to doing the 2-byte shift before
>> +		 * DMA mapping because the address field does not allow
>> +		 * setting the low 2/3 bits.
>> +		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>> +		 */
>> +		skb_reserve(skb, bp->rx_offset);
>> +		skb_mark_for_recycle(skb);
>
> I have a platform with RSC support and NET_IP_ALIGN=2. What is yours
> like? It'd be nice if we can test different cases of this RBOF topic.
>

my caps:
macb 1f00100000.ethernet: Cadence caps 0x00548061

The RSC bit looks unset, and NET_IP_ALIGN is 0 in my case.

>>  		/* now everything is ready for receiving packet */
>> -		queue->rx_skbuff[entry] = NULL;
>> +		queue->rx_buff[entry] = NULL;
>>  		len = ctrl & bp->rx_frm_len_mask;
>>  
>>  		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>>  
>> +		dma_sync_single_for_cpu(&bp->pdev->dev,
>> +					addr, len,
>> +					page_pool_get_dma_dir(queue->page_pool));
>
> Any reason for the call to dma_sync_single_for_cpu(), we could hardcode
> it to DMA_FROM_DEVICE no?
>

I saw in the other reply you found the answer

>> @@ -2477,13 +2497,13 @@ static int gem_alloc_rx_buffers(struct macb *bp)
>>  
>>  	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>>  		size = bp->rx_ring_size * sizeof(struct sk_buff *);
>
> sizeof is called with the wrong type. Not that it matters because
> pointers are pointers, but we can take this opportunity to move to
>
>         sizeof(*queue->rx_buff)
>

definitely

>> -		queue->rx_skbuff = kzalloc(size, GFP_KERNEL);
>> -		if (!queue->rx_skbuff)
>> +		queue->rx_buff = kzalloc(size, GFP_KERNEL);
>> +		if (!queue->rx_buff)
>>  			return -ENOMEM;
>>  		else
>>  			netdev_dbg(bp->dev,
>> -				   "Allocated %d RX struct sk_buff entries at %p\n",
>> -				   bp->rx_ring_size, queue->rx_skbuff);
>> +				   "Allocated %d RX buff entries at %p\n",
>> +				   bp->rx_ring_size, queue->rx_buff);
>
> Opportunity to deindent this block?
>

ack

> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception
  2025-11-27 13:38   ` Théo Lebrun
@ 2025-12-02 17:32     ` Paolo Valerio
  2025-12-08 10:21       ` Théo Lebrun
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-12-02 17:32 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

On 27 Nov 2025 at 02:38:45 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>> Add support for receiving network frames that span multiple DMA
>> descriptors in the Cadence MACB/GEM Ethernet driver.
>>
>> The patch removes the requirement that limited frame reception to
>> a single descriptor (RX_SOF && RX_EOF), also avoiding potential
>> contiguous multi-page allocation for large frames. It also uses
>> page pool fragments allocation instead of full page for saving
>> memory.
>>
>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
>> ---
>>  drivers/net/ethernet/cadence/macb.h      |   4 +-
>>  drivers/net/ethernet/cadence/macb_main.c | 180 ++++++++++++++---------
>>  2 files changed, 111 insertions(+), 73 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>> index dcf768bd1bc1..e2f397b7a27f 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -960,8 +960,7 @@ struct macb_dma_desc_ptp {
>>  #define PPM_FRACTION	16
>>  
>>  /* The buf includes headroom compatible with both skb and xdpf */
>> -#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
>> -#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
>> +#define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
>>  
>>  /* struct macb_tx_skb - data about an skb which is being transmitted
>>   * @skb: skb currently being transmitted, only set for the last buffer
>> @@ -1273,6 +1272,7 @@ struct macb_queue {
>>  	struct napi_struct	napi_rx;
>>  	struct queue_stats stats;
>>  	struct page_pool	*page_pool;
>> +	struct sk_buff		*skb;
>>  };
>>  
>>  struct ethtool_rx_fs_item {
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index 985c81913ba6..be0c8e101639 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1250,21 +1250,25 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  	return packets;
>>  }
>>  
>> -static void *gem_page_pool_get_buff(struct page_pool *pool,
>> +static void *gem_page_pool_get_buff(struct  macb_queue *queue,
>>  				    dma_addr_t *dma_addr, gfp_t gfp_mask)
>>  {
>> +	struct macb *bp = queue->bp;
>>  	struct page *page;
>> +	int offset;
>>  
>> -	if (!pool)
>> +	if (!queue->page_pool)
>>  		return NULL;
>>  
>> -	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
>> +	page = page_pool_alloc_frag(queue->page_pool, &offset,
>> +				    bp->rx_buffer_size,
>> +				    gfp_mask | __GFP_NOWARN);
>>  	if (!page)
>>  		return NULL;
>>  
>> -	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
>> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM + offset;
>>  
>> -	return page_address(page);
>> +	return page_address(page) + offset;
>>  }
>>  
>>  static void gem_rx_refill(struct macb_queue *queue, bool napi)
>> @@ -1286,7 +1290,7 @@ static void gem_rx_refill(struct macb_queue *queue, bool napi)
>>  
>>  		if (!queue->rx_buff[entry]) {
>>  			/* allocate sk_buff for this free entry in ring */
>> -			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
>> +			data = gem_page_pool_get_buff(queue, &paddr,
>>  						      napi ? GFP_ATOMIC : GFP_KERNEL);
>>  			if (unlikely(!data)) {
>>  				netdev_err(bp->dev,
>> @@ -1344,20 +1348,17 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  		  int budget)
>>  {
>>  	struct macb *bp = queue->bp;
>> -	int			buffer_size;
>>  	unsigned int		len;
>>  	unsigned int		entry;
>>  	void			*data;
>> -	struct sk_buff		*skb;
>>  	struct macb_dma_desc	*desc;
>> +	int			data_len;
>>  	int			count = 0;
>>  
>> -	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
>> -
>>  	while (count < budget) {
>>  		u32 ctrl;
>>  		dma_addr_t addr;
>> -		bool rxused;
>> +		bool rxused, first_frame;
>>  
>>  		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
>>  		desc = macb_rx_desc(queue, entry);
>> @@ -1368,6 +1369,9 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  		rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>>  		addr = macb_get_addr(bp, desc);
>>  
>> +		dma_sync_single_for_cpu(&bp->pdev->dev,
>> +					addr, bp->rx_buffer_size - bp->rx_offset,
>> +					page_pool_get_dma_dir(queue->page_pool));
>
> page_pool_get_dma_dir() will always return the same value, we could
> hardcode it.
>
>>  		if (!rxused)
>>  			break;
>>  
>> @@ -1379,13 +1383,6 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  		queue->rx_tail++;
>>  		count++;
>>  
>> -		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
>> -			netdev_err(bp->dev,
>> -				   "not whole frame pointed by descriptor\n");
>> -			bp->dev->stats.rx_dropped++;
>> -			queue->stats.rx_dropped++;
>> -			break;
>> -		}
>>  		data = queue->rx_buff[entry];
>>  		if (unlikely(!data)) {
>>  			netdev_err(bp->dev,
>> @@ -1395,64 +1392,102 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  			break;
>>  		}
>>  
>> -		skb = napi_build_skb(data, buffer_size);
>> -		if (unlikely(!skb)) {
>> -			netdev_err(bp->dev,
>> -				   "Unable to allocate sk_buff\n");
>> -			page_pool_put_full_page(queue->page_pool,
>> -						virt_to_head_page(data),
>> -						false);
>> -			break;
>> +		first_frame = ctrl & MACB_BIT(RX_SOF);
>> +		len = ctrl & bp->rx_frm_len_mask;
>> +
>> +		if (len) {
>> +			data_len = len;
>> +			if (!first_frame)
>> +				data_len -= queue->skb->len;
>> +		} else {
>> +			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>>  		}
>>  
>> -		/* Properly align Ethernet header.
>> -		 *
>> -		 * Hardware can add dummy bytes if asked using the RBOF
>> -		 * field inside the NCFGR register. That feature isn't
>> -		 * available if hardware is RSC capable.
>> -		 *
>> -		 * We cannot fallback to doing the 2-byte shift before
>> -		 * DMA mapping because the address field does not allow
>> -		 * setting the low 2/3 bits.
>> -		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>> -		 */
>> -		skb_reserve(skb, bp->rx_offset);
>> -		skb_mark_for_recycle(skb);
>> +		if (first_frame) {
>> +			queue->skb = napi_build_skb(data, bp->rx_buffer_size);
>> +			if (unlikely(!queue->skb)) {
>> +				netdev_err(bp->dev,
>> +					   "Unable to allocate sk_buff\n");
>> +				goto free_frags;
>> +			}
>> +
>> +			/* Properly align Ethernet header.
>> +			 *
>> +			 * Hardware can add dummy bytes if asked using the RBOF
>> +			 * field inside the NCFGR register. That feature isn't
>> +			 * available if hardware is RSC capable.
>> +			 *
>> +			 * We cannot fallback to doing the 2-byte shift before
>> +			 * DMA mapping because the address field does not allow
>> +			 * setting the low 2/3 bits.
>> +			 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>> +			 */
>> +			skb_reserve(queue->skb, bp->rx_offset);
>> +			skb_mark_for_recycle(queue->skb);
>> +			skb_put(queue->skb, data_len);
>> +			queue->skb->protocol = eth_type_trans(queue->skb, bp->dev);
>> +
>> +			skb_checksum_none_assert(queue->skb);
>> +			if (bp->dev->features & NETIF_F_RXCSUM &&
>> +			    !(bp->dev->flags & IFF_PROMISC) &&
>> +			    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>> +				queue->skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +		} else {
>> +			if (!queue->skb) {
>> +				netdev_err(bp->dev,
>> +					   "Received non-starting frame while expecting it\n");
>> +				goto free_frags;
>> +			}
>
> Here as well we want `queue->rx_buff[entry] = NULL;` no?
> Maybe put it in the free_frags codepath to ensure it isn't forgotten?
>

That's correct, that slipped in this RFC.

>> +			struct skb_shared_info *shinfo = skb_shinfo(queue->skb);
>> +			struct page *page = virt_to_head_page(data);
>> +			int nr_frags = shinfo->nr_frags;
>
> Variable definitions in the middle of a scope? I think it is acceptable
> when using __attribute__((cleanup)) but otherwise not so much (yet).
>

I guess I simply forgot to move them.
Ack for this and for the previous ones.

>> +			if (nr_frags >= ARRAY_SIZE(shinfo->frags))
>> +				goto free_frags;
>> +
>> +			skb_add_rx_frag(queue->skb, nr_frags, page,
>> +					data - page_address(page) + bp->rx_offset,
>> +					data_len, bp->rx_buffer_size);
>> +		}
>>  
>>  		/* now everything is ready for receiving packet */
>>  		queue->rx_buff[entry] = NULL;
>> -		len = ctrl & bp->rx_frm_len_mask;
>> -
>> -		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>>  
>> -		dma_sync_single_for_cpu(&bp->pdev->dev,
>> -					addr, len,
>> -					page_pool_get_dma_dir(queue->page_pool));
>> -		skb_put(skb, len);
>> -		skb->protocol = eth_type_trans(skb, bp->dev);
>> -		skb_checksum_none_assert(skb);
>> -		if (bp->dev->features & NETIF_F_RXCSUM &&
>> -		    !(bp->dev->flags & IFF_PROMISC) &&
>> -		    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>> -			skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +		netdev_vdbg(bp->dev, "%s %u (len %u)\n", __func__, entry, data_len);
>>  
>> -		bp->dev->stats.rx_packets++;
>> -		queue->stats.rx_packets++;
>> -		bp->dev->stats.rx_bytes += skb->len;
>> -		queue->stats.rx_bytes += skb->len;
>> +		if (ctrl & MACB_BIT(RX_EOF)) {
>> +			bp->dev->stats.rx_packets++;
>> +			queue->stats.rx_packets++;
>> +			bp->dev->stats.rx_bytes += queue->skb->len;
>> +			queue->stats.rx_bytes += queue->skb->len;
>>  
>> -		gem_ptp_do_rxstamp(bp, skb, desc);
>> +			gem_ptp_do_rxstamp(bp, queue->skb, desc);
>>  
>>  #if defined(DEBUG) && defined(VERBOSE_DEBUG)
>> -		netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>> -			    skb->len, skb->csum);
>> -		print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
>> -			       skb_mac_header(skb), 16, true);
>> -		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
>> -			       skb->data, 32, true);
>> +			netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>> +				    queue->skb->len, queue->skb->csum);
>> +			print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
>> +				       skb_mac_header(queue->skb), 16, true);
>> +			print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
>> +				       queue->skb->data, 32, true);
>>  #endif
>>  
>> -		napi_gro_receive(napi, skb);
>> +			napi_gro_receive(napi, queue->skb);
>> +			queue->skb = NULL;
>> +		}
>> +
>> +		continue;
>> +
>> +free_frags:
>> +		if (queue->skb) {
>> +			dev_kfree_skb(queue->skb);
>> +			queue->skb = NULL;
>> +		} else {
>> +			page_pool_put_full_page(queue->page_pool,
>> +						virt_to_head_page(data),
>> +						false);
>> +		}
>>  	}
>>  
>>  	gem_rx_refill(queue, true);
>> @@ -2394,7 +2429,10 @@ static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
>>  	if (!macb_is_gem(bp)) {
>>  		bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>>  	} else {
>> -		bp->rx_buffer_size = size;
>> +		bp->rx_buffer_size = size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>> +			+ MACB_PP_HEADROOM;
>> +		if (bp->rx_buffer_size > PAGE_SIZE)
>> +			bp->rx_buffer_size = PAGE_SIZE;
>>  
>>  		if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
>>  			netdev_dbg(bp->dev,
>> @@ -2589,18 +2627,15 @@ static int macb_alloc_consistent(struct macb *bp)
>>  
>>  static void gem_create_page_pool(struct macb_queue *queue)
>>  {
>> -	unsigned int num_pages = DIV_ROUND_UP(queue->bp->rx_buffer_size, PAGE_SIZE);
>> -	struct macb *bp = queue->bp;
>>  	struct page_pool_params pp_params = {
>> -		.order = order_base_2(num_pages),
>> +		.order = 0,
>>  		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>  		.pool_size = queue->bp->rx_ring_size,
>>  		.nid = NUMA_NO_NODE,
>>  		.dma_dir = DMA_FROM_DEVICE,
>>  		.dev = &queue->bp->pdev->dev,
>>  		.napi = &queue->napi_rx,
>> -		.offset = bp->rx_offset,
>> -		.max_len = MACB_PP_MAX_BUF_SIZE(num_pages),
>> +		.max_len = PAGE_SIZE,
>>  	};
>>  	struct page_pool *pool;
>
> I forgot about it in [PATCH 1/6], but the error handling if
> gem_create_page_pool() fails is odd. We set queue->page_pool to NULL
> and keep on going. Then once opened we'll fail allocating any buffer
> but still be open. Shouldn't we fail the link up operation?
>
> If we want to support this codepath (page pool not allocated), then we
> should unmask Rx interrupts only if alloc succeeded. I don't know if
> we'd want that though.
>
> 	queue_writel(queue, IER, bp->rx_intr_mask | ...);
>

Makes sense to fail the link up.
Doesn't this imply moving the page pool creation and refill into
macb_open()?

I haven't looked into it yet; I'm not sure whether this could turn into a
bigger change.
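
Regardless of where the call ends up, I guess the first step would be
something like this (sketch), with the caller propagating the error instead
of carrying on with a NULL pool:

static int gem_create_page_pool(struct macb_queue *queue)
{
    struct page_pool_params pp_params = {
        /* ... same parameters as in the patch ... */
    };
    struct page_pool *pool;

    pool = page_pool_create(&pp_params);
    if (IS_ERR(pool)) {
        queue->page_pool = NULL;
        return PTR_ERR(pool);
    }

    queue->page_pool = pool;
    return 0;
}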

>> @@ -2784,8 +2819,9 @@ static void macb_configure_dma(struct macb *bp)
>>  	unsigned int q;
>>  	u32 dmacfg;
>>  
>> -	buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
>>  	if (macb_is_gem(bp)) {
>> +		buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>> +		buffer_size /= RX_BUFFER_MULTIPLE;
>>  		dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L);
>>  		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>>  			if (q)
>> @@ -2816,6 +2852,8 @@ static void macb_configure_dma(struct macb *bp)
>>  		netdev_dbg(bp->dev, "Cadence configure DMA with 0x%08x\n",
>>  			   dmacfg);
>>  		gem_writel(bp, DMACFG, dmacfg);
>> +	} else {
>> +		buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
>>  	}
>>  }
>
> You do:
>
> static void macb_configure_dma(struct macb *bp)
> {
> 	u32 buffer_size;
>
> 	if (macb_is_gem(bp)) {
> 		buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
> 		buffer_size /= RX_BUFFER_MULTIPLE;
> 		// ... use buffer_size ...
> 	} else {
> 		buffer_size = bp->rx_buffer_size / RX_BUFFER_MULTIPLE;
> 	}
> }
>
> Instead I'd vote for:
>
> static void macb_configure_dma(struct macb *bp)
> {
> 	u32 buffer_size;
>
> 	if (!macb_is_gem(bp))
> 		return;
>
> 	buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
> 	buffer_size /= RX_BUFFER_MULTIPLE;
> 	// ... use buffer_size ...
> }
>
> Thanks,
>

I vote for this as well :)

> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem
  2025-11-27 14:41   ` Théo Lebrun
@ 2025-12-02 17:32     ` Paolo Valerio
  2025-12-08 10:59       ` Théo Lebrun
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-12-02 17:32 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Thomas Petazzoni, Grégory Clement, Benoît Monin

On 27 Nov 2025 at 03:41:56 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>> @@ -1273,6 +1275,7 @@ struct macb_queue {
>>  	struct queue_stats stats;
>>  	struct page_pool	*page_pool;
>>  	struct sk_buff		*skb;
>> +	struct xdp_rxq_info	xdp_q;
>>  };
>
> Those are always named `xdp_rxq` inside the kernel, we should stick with
> the calling convention no?
>

sure

>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index 5829c1f773dd..53ea1958b8e4 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1344,10 +1344,51 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
>>  	 */
>>  }
>>  
>> +static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
>> +		       struct net_device *dev)
>
> Why pass `struct net_device` explicitly? It is in queue->bp->dev.
>

let's avoid this

>> +{
>> +	struct bpf_prog *prog;
>> +	u32 act = XDP_PASS;
>> +
>> +	rcu_read_lock();
>> +
>> +	prog = rcu_dereference(queue->bp->prog);
>> +	if (!prog)
>> +		goto out;
>> +
>> +	act = bpf_prog_run_xdp(prog, xdp);
>> +	switch (act) {
>> +	case XDP_PASS:
>> +		goto out;
>> +	case XDP_REDIRECT:
>> +		if (unlikely(xdp_do_redirect(dev, xdp, prog))) {
>> +			act = XDP_DROP;
>> +			break;
>> +		}
>> +		goto out;
>
> Why the `unlikely()`?
>

I was just expecting the error path to be the exception, although this is
not consistent with the XDP_TX path.
Would you prefer to remove it?

>> +	default:
>> +		bpf_warn_invalid_xdp_action(dev, prog, act);
>> +		fallthrough;
>> +	case XDP_ABORTED:
>> +		trace_xdp_exception(dev, prog, act);
>> +		fallthrough;
>> +	case XDP_DROP:
>> +		break;
>> +	}
>> +
>> +	page_pool_put_full_page(queue->page_pool,
>> +				virt_to_head_page(xdp->data), true);
>
> Maybe move that to the XDP_DROP, it is the only `break` in the above
> switch statement. It will be used by the default and XDP_ABORTED cases
> through fallthrough. We can avoid the out label and its two gotos that
> way.
>

We wouldn't return the buffer to the page pool in case the redirect fails,
or am I missing something?
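
i.e. if the put only lives in the XDP_DROP case, the failed-redirect path
would need its own put to avoid leaking the buffer, something like (sketch):

    case XDP_REDIRECT:
        if (xdp_do_redirect(dev, xdp, prog)) {
            page_pool_put_full_page(queue->page_pool,
                                    virt_to_head_page(xdp->data),
                                    true);
            act = XDP_DROP;
        }
        goto out;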

>>  static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  		  int budget)
>>  {
>>  	struct macb *bp = queue->bp;
>> +	bool			xdp_flush = false;
>>  	unsigned int		len;
>>  	unsigned int		entry;
>>  	void			*data;
>> @@ -1356,9 +1397,11 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  	int			count = 0;
>>  
>>  	while (count < budget) {
>> -		u32 ctrl;
>> -		dma_addr_t addr;
>>  		bool rxused, first_frame;
>> +		struct xdp_buff xdp;
>> +		dma_addr_t addr;
>> +		u32 ctrl;
>> +		u32 ret;
>>  
>>  		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
>>  		desc = macb_rx_desc(queue, entry);
>> @@ -1403,6 +1446,22 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>>  		}
>>  
>> +		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF)))
>> +			goto skip_xdp;
>> +
>> +		xdp_init_buff(&xdp, bp->rx_buffer_size, &queue->xdp_q);
>> +		xdp_prepare_buff(&xdp, data, bp->rx_offset, len,
>> +				 false);
>> +		xdp_buff_clear_frags_flag(&xdp);
>
> You prepare the XDP buffer before checking an XDP program is attached.
> Could we avoid this work? We'd move the xdp_buff preparation into
> gem_xdp_run(), after the RCU pointer dereference.
>

ack, makes sense

>> -static void gem_create_page_pool(struct macb_queue *queue)
>> +static void gem_create_page_pool(struct macb_queue *queue, int qid)
>>  {
>>  	struct page_pool_params pp_params = {
>>  		.order = 0,
>>  		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>  		.pool_size = queue->bp->rx_ring_size,
>>  		.nid = NUMA_NO_NODE,
>> -		.dma_dir = DMA_FROM_DEVICE,
>> +		.dma_dir = rcu_access_pointer(queue->bp->prog)
>> +				? DMA_BIDIRECTIONAL
>> +				: DMA_FROM_DEVICE,
>
> Ah, that is the reason for page_pool_get_dma_dir() calls!
>

:)

>>  static int macb_change_mtu(struct net_device *dev, int new_mtu)
>>  {
>> +	int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + MACB_MAX_PAD;
>> +	struct macb *bp = netdev_priv(dev);
>> +	struct bpf_prog *prog = bp->prog;
>
> No fancy RCU macro?
>

right, guess an rcu_access_pointer() call is needed here

>> +static int macb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>> +{
>> +	struct macb *bp = netdev_priv(dev);
>> +
>> +	if (!macb_is_gem(bp))
>> +		return 0;
>
> Returning 0 sounds like a mistake, -EOPNOTSUPP sounds more appropriate.
>

agreed

>> +	switch (xdp->command) {
>> +	case XDP_SETUP_PROG:
>> +		return gem_xdp_setup(dev, xdp->prog, xdp->extack);
>> +	default:
>> +		return -EINVAL;
>
> Same here: we want -EOPNOTSUPP. Otherwise caller cannot dissociate an
> unsupported call from one that is supported but failed.
>

ack

>> +	}
>> +}
>> +
>>  static void gem_update_stats(struct macb *bp)
>>  {
>>  	struct macb_queue *queue;
>> @@ -4390,6 +4529,7 @@ static const struct net_device_ops macb_netdev_ops = {
>>  	.ndo_hwtstamp_set	= macb_hwtstamp_set,
>>  	.ndo_hwtstamp_get	= macb_hwtstamp_get,
>>  	.ndo_setup_tc		= macb_setup_tc,
>> +	.ndo_bpf		= macb_xdp,
>
> We want it to be "gem_" prefixed as it does not support MACB.
>

ack

> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic
  2025-11-27 14:49   ` Théo Lebrun
@ 2025-12-02 17:33     ` Paolo Valerio
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Valerio @ 2025-12-02 17:33 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Thomas Petazzoni, Grégory Clement, Benoît Monin

On 27 Nov 2025 at 03:49:13 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>> index 2f665260a84d..67bb98d3cb00 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -964,19 +964,27 @@ struct macb_dma_desc_ptp {
>>  #define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
>>  #define MACB_MAX_PAD		(MACB_PP_HEADROOM + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
>>  
>> -/* struct macb_tx_skb - data about an skb which is being transmitted
>> - * @skb: skb currently being transmitted, only set for the last buffer
>> - *       of the frame
>> +enum macb_tx_buff_type {
>> +	MACB_TYPE_SKB,
>> +	MACB_TYPE_XDP_TX,
>> +	MACB_TYPE_XDP_NDO,
>> +};
>> +
>> +/* struct macb_tx_buff - data about an skb or xdp frame which is being transmitted
>> + * @data: pointer to skb or xdp frame being transmitted, only set
>> + *        for the last buffer for sk_buff
>>   * @mapping: DMA address of the skb's fragment buffer
>>   * @size: size of the DMA mapped buffer
>>   * @mapped_as_page: true when buffer was mapped with skb_frag_dma_map(),
>>   *                  false when buffer was mapped with dma_map_single()
>> + * @type: type of buffer (MACB_TYPE_SKB, MACB_TYPE_XDP_TX, MACB_TYPE_XDP_NDO)
>>   */
>> -struct macb_tx_skb {
>> -	struct sk_buff		*skb;
>> -	dma_addr_t		mapping;
>> -	size_t			size;
>> -	bool			mapped_as_page;
>> +struct macb_tx_buff {
>> +	void				*data;
>> +	dma_addr_t			mapping;
>> +	size_t				size;
>> +	bool				mapped_as_page;
>> +	enum macb_tx_buff_type		type;
>>  };
>
> Here as well, reviewing would be helped by moving the tx_skb to tx_buff
> renaming to a separate commit that has no functional change.
>
> As said in [0], I am not a fan of the field name `data`.
> Let's discuss it there.
>
> [0]: https://lore.kernel.org/all/DEITSIO441QL.X81MVLL3EIV4@bootlin.com/
>

sure

>> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int budget)
>> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
>> +			  int budget)
>>  {
>> -	if (tx_skb->mapping) {
>> -		if (tx_skb->mapped_as_page)
>> -			dma_unmap_page(&bp->pdev->dev, tx_skb->mapping,
>> -				       tx_skb->size, DMA_TO_DEVICE);
>> +	if (tx_buff->mapping) {
>> +		if (tx_buff->mapped_as_page)
>> +			dma_unmap_page(&bp->pdev->dev, tx_buff->mapping,
>> +				       tx_buff->size, DMA_TO_DEVICE);
>>  		else
>> -			dma_unmap_single(&bp->pdev->dev, tx_skb->mapping,
>> -					 tx_skb->size, DMA_TO_DEVICE);
>> -		tx_skb->mapping = 0;
>> +			dma_unmap_single(&bp->pdev->dev, tx_buff->mapping,
>> +					 tx_buff->size, DMA_TO_DEVICE);
>> +		tx_buff->mapping = 0;
>>  	}
>>  
>> -	if (tx_skb->skb) {
>> -		napi_consume_skb(tx_skb->skb, budget);
>> -		tx_skb->skb = NULL;
>> +	if (tx_buff->data) {
>> +		if (tx_buff->type != MACB_TYPE_SKB)
>> +			netdev_err(bp->dev, "BUG: Unexpected tx buffer type while unmapping (%d)",
>> +				   tx_buff->type);
>> +		napi_consume_skb(tx_buff->data, budget);
>> +		tx_buff->data = NULL;
>>  	}
>
> This code does not make much sense by itself. We check `tx_buff->type !=
> MACB_TYPE_SKB` but call napi_consume_skb() in all cases. I remember it
> changes in the next commit.
>
>> @@ -1069,16 +1073,23 @@ static void macb_tx_error_task(struct work_struct *work)
>>  
>>  		desc = macb_tx_desc(queue, tail);
>>  		ctrl = desc->ctrl;
>> -		tx_skb = macb_tx_skb(queue, tail);
>> -		skb = tx_skb->skb;
>> +		tx_buff = macb_tx_buff(queue, tail);
>> +
>> +		if (tx_buff->type != MACB_TYPE_SKB)
>> +			netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
>> +				   tx_buff->type);
>> +		skb = tx_buff->data;
>
> Same here: `tx_buff->type != MACB_TYPE_SKB` does not make sense if we
> keep on going with the SKB case anyways.
>
>>  
>>  		if (ctrl & MACB_BIT(TX_USED)) {
>>  			/* skb is set for the last buffer of the frame */
>>  			while (!skb) {
>> -				macb_tx_unmap(bp, tx_skb, 0);
>> +				macb_tx_unmap(bp, tx_buff, 0);
>>  				tail++;
>> -				tx_skb = macb_tx_skb(queue, tail);
>> -				skb = tx_skb->skb;
>> +				tx_buff = macb_tx_buff(queue, tail);
>> +				if (tx_buff->type != MACB_TYPE_SKB)
>> +					netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
>> +						   tx_buff->type);
>> +				skb = tx_buff->data;
>
> Same.
>

yeah, I'll revisit this and the ones above

>> @@ -5050,7 +5066,7 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
>>  		netif_stop_queue(dev);
>>  
>>  		/* Store packet information (to free when Tx completed) */
>> -		lp->rm9200_txq[desc].skb = skb;
>> +		lp->rm9200_txq[desc].data = skb;
>>  		lp->rm9200_txq[desc].size = skb->len;
>>  		lp->rm9200_txq[desc].mapping = dma_map_single(&lp->pdev->dev, skb->data,
>>  							      skb->len, DMA_TO_DEVICE);
>
> We might want to assign `lp->rm9200_txq[desc].type` here, to ensure all
> `struct macb_tx_buff` instances are fully initialised.
>

good catch

> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support
  2025-11-27 15:07   ` Théo Lebrun
@ 2025-12-02 17:34     ` Paolo Valerio
  2025-12-08 11:01       ` Théo Lebrun
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-12-02 17:34 UTC (permalink / raw)
  To: Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

On 27 Nov 2025 at 04:07:52 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> Hello Paolo, netdev,
>
> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>> Add XDP_TX verdict support, also introduce ndo_xdp_xmit function for
>> redirection, and update macb_tx_unmap() to handle both skbs and xdp
>> frames advertising NETDEV_XDP_ACT_NDO_XMIT capability and the ability
>> to process XDP_TX verdicts.
>>
>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
>> ---
>>  drivers/net/ethernet/cadence/macb_main.c | 166 +++++++++++++++++++++--
>>  1 file changed, 153 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index eeda1a3871a6..bd62d3febeb1 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -969,6 +969,17 @@ static int macb_halt_tx(struct macb *bp)
>>  					bp, TSR);
>>  }
>>  
>> +static void release_buff(void *buff, enum macb_tx_buff_type type, int budget)
>> +{
>> +	if (type == MACB_TYPE_SKB) {
>> +		napi_consume_skb(buff, budget);
>> +	} else if (type == MACB_TYPE_XDP_TX) {
>> +		xdp_return_frame_rx_napi(buff);
>> +	} else {
>> +		xdp_return_frame(buff);
>> +	}
>> +}
>> +
>>  static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
>>  			  int budget)
>>  {
>> @@ -983,10 +994,7 @@ static void macb_tx_unmap(struct macb *bp, struct macb_tx_buff *tx_buff,
>>  	}
>>  
>>  	if (tx_buff->data) {
>> -		if (tx_buff->type != MACB_TYPE_SKB)
>> -			netdev_err(bp->dev, "BUG: Unexpected tx buffer type while unmapping (%d)",
>> -				   tx_buff->type);
>> -		napi_consume_skb(tx_buff->data, budget);
>> +		release_buff(tx_buff->data, tx_buff->type, budget);
>>  		tx_buff->data = NULL;
>>  	}
>>  }
>> @@ -1076,8 +1084,8 @@ static void macb_tx_error_task(struct work_struct *work)
>>  		tx_buff = macb_tx_buff(queue, tail);
>>  
>>  		if (tx_buff->type != MACB_TYPE_SKB)
>> -			netdev_err(bp->dev, "BUG: Unexpected tx buffer type (%d)",
>> -				   tx_buff->type);
>> +			goto unmap;
>> +
>>  		skb = tx_buff->data;
>>  
>>  		if (ctrl & MACB_BIT(TX_USED)) {
>> @@ -1118,6 +1126,7 @@ static void macb_tx_error_task(struct work_struct *work)
>>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>>  		}
>>  
>> +unmap:
>>  		macb_tx_unmap(bp, tx_buff, 0);
>>  	}
>>  
>> @@ -1196,6 +1205,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
>>  	head = queue->tx_head;
>>  	for (tail = queue->tx_tail; tail != head && packets < budget; tail++) {
>> +		void			*data = NULL;
>>  		struct macb_tx_buff	*tx_buff;
>>  		struct sk_buff		*skb;
>>  		struct macb_dma_desc	*desc;
>> @@ -1218,11 +1228,16 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  		for (;; tail++) {
>>  			tx_buff = macb_tx_buff(queue, tail);
>>  
>> -			if (tx_buff->type == MACB_TYPE_SKB)
>> -				skb = tx_buff->data;
>> +			if (tx_buff->type != MACB_TYPE_SKB) {
>> +				data = tx_buff->data;
>> +				goto unmap;
>> +			}
>>  
>>  			/* First, update TX stats if needed */
>> -			if (skb) {
>> +			if (tx_buff->type == MACB_TYPE_SKB && tx_buff->data) {
>> +				data = tx_buff->data;
>> +				skb = tx_buff->data;
>> +
>>  				if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
>>  				    !ptp_one_step_sync(skb))
>>  					gem_ptp_do_txstamp(bp, skb, desc);
>> @@ -1238,6 +1253,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  				bytes += skb->len;
>>  			}
>>  
>> +unmap:
>>  			/* Now we can safely release resources */
>>  			macb_tx_unmap(bp, tx_buff, budget);
>>  
>> @@ -1245,7 +1261,7 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  			 * WARNING: at this point skb has been freed by
>>  			 * macb_tx_unmap().
>>  			 */
>> -			if (skb)
>> +			if (data)
>>  				break;
>>  		}
>>  	}
>> @@ -1357,8 +1373,124 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
>>  	 */
>>  }
>>  
>> +static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
>> +				 struct net_device *dev, dma_addr_t addr)
>> +{
>> +	enum macb_tx_buff_type buff_type;
>> +	struct macb_tx_buff *tx_buff;
>> +	int cpu = smp_processor_id();
>> +	struct macb_dma_desc *desc;
>> +	struct macb_queue *queue;
>> +	unsigned long flags;
>> +	dma_addr_t mapping;
>> +	u16 queue_index;
>> +	int err = 0;
>> +	u32 ctrl;
>> +
>> +	queue_index = cpu % bp->num_queues;
>> +	queue = &bp->queues[queue_index];
>> +	buff_type = !addr ? MACB_TYPE_XDP_NDO : MACB_TYPE_XDP_TX;
>
> I am not the biggest fan of piggy-backing on !!addr to know which
> codepath called us. If the macb_xdp_submit_frame() call in gem_xdp_run()
> ever gives an addr=0 coming from macb_get_addr(bp, desc), then we will
> be submitting NDO typed frames and creating additional DMA mappings
> which would be a really hard to debug bug.
>

I guess we can add a separate boolean, WDYT?
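
e.g. (sketch):

static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
                                 struct net_device *dev, dma_addr_t addr,
                                 bool ndo)
{
    enum macb_tx_buff_type buff_type = ndo ? MACB_TYPE_XDP_NDO
                                           : MACB_TYPE_XDP_TX;

    /* ... rest as before; ndo would also select dma_map_single()
     * vs dma_sync_single_for_device() instead of testing addr.
     */
}

with gem_xdp_run() passing ndo=false along with the Rx DMA address, and
macb_xdp_xmit() passing ndo=true.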

>> +	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
>> +
>> +	/* This is a hard error, log it. */
>> +	if (CIRC_SPACE(queue->tx_head, queue->tx_tail,
>> +		       bp->tx_ring_size) < 1) {
>
> Hard wrapped line is not required, it fits in one line.
>

ack to this and all the remaining ones

>> +		netif_stop_subqueue(dev, queue_index);
>> +		netdev_dbg(bp->dev, "tx_head = %u, tx_tail = %u\n",
>> +			   queue->tx_head, queue->tx_tail);
>> +		err = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +
>> +	if (!addr) {
>> +		mapping = dma_map_single(&bp->pdev->dev,
>> +					 xdpf->data,
>> +					 xdpf->len, DMA_TO_DEVICE);
>> +		if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
>> +			err = -ENOMEM;
>> +			goto unlock;
>> +		}
>> +	} else {
>> +		mapping = addr;
>> +		dma_sync_single_for_device(&bp->pdev->dev, mapping,
>> +					   xdpf->len, DMA_BIDIRECTIONAL);
>> +	}
>> +
>> +	unsigned int tx_head = queue->tx_head + 1;
>
> Middle scope variable definition. Weirdly named as it isn't storing the
> current head offset but the future head offset.
>
>> +	ctrl = MACB_BIT(TX_USED);
>> +	desc = macb_tx_desc(queue, tx_head);
>> +	desc->ctrl = ctrl;
>> +
>> +	desc = macb_tx_desc(queue, queue->tx_head);
>> +	tx_buff = macb_tx_buff(queue, queue->tx_head);
>> +	tx_buff->data = xdpf;
>> +	tx_buff->type = buff_type;
>> +	tx_buff->mapping = mapping;
>> +	tx_buff->size = xdpf->len;
>> +	tx_buff->mapped_as_page = false;
>> +
>> +	ctrl = (u32)tx_buff->size;
>> +	ctrl |= MACB_BIT(TX_LAST);
>> +
>> +	if (unlikely(macb_tx_ring_wrap(bp, queue->tx_head) == (bp->tx_ring_size - 1)))
>> +		ctrl |= MACB_BIT(TX_WRAP);
>> +
>> +	/* Set TX buffer descriptor */
>> +	macb_set_addr(bp, desc, tx_buff->mapping);
>> +	/* desc->addr must be visible to hardware before clearing
>> +	 * 'TX_USED' bit in desc->ctrl.
>> +	 */
>> +	wmb();
>> +	desc->ctrl = ctrl;
>> +	queue->tx_head = tx_head;
>> +
>> +	/* Make newly initialized descriptor visible to hardware */
>> +	wmb();
>> +
>> +	spin_lock(&bp->lock);
>> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>> +	spin_unlock(&bp->lock);
>> +
>> +	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
>> +		netif_stop_subqueue(dev, queue_index);
>
> The above 30~40 lines are super similar to macb_start_xmit() &
> macb_tx_map(). They implement almost the same logic; can we avoid the
> duplication?
>
>> +
>> +unlock:
>> +	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
>> +
>> +	if (err)
>> +		release_buff(xdpf, buff_type, 0);
>> +
>> +	return err;
>> +}
>> +
>> +static int
>> +macb_xdp_xmit(struct net_device *dev, int num_frame,
>> +	      struct xdp_frame **frames, u32 flags)
>> +{
>> +	struct macb *bp = netdev_priv(dev);
>> +	u32 xmitted = 0;
>> +	int i;
>> +
>> +	if (!macb_is_gem(bp))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < num_frame; i++) {
>> +		if (macb_xdp_submit_frame(bp, frames[i], dev, 0))
>> +			break;
>> +
>> +		xmitted++;
>> +	}
>> +
>> +	return xmitted;
>> +}
>> +
>>  static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
>> -		       struct net_device *dev)
>> +		       struct net_device *dev, dma_addr_t addr)
>>  {
>>  	struct bpf_prog *prog;
>>  	u32 act = XDP_PASS;
>> @@ -1379,6 +1511,12 @@ static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
>>  			break;
>>  		}
>>  		goto out;
>> +	case XDP_TX:
>> +		struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
>> +
>> +		if (!xdpf || macb_xdp_submit_frame(queue->bp, xdpf, dev, addr))
>> +			act = XDP_DROP;
>> +		goto out;
>>  	default:
>>  		bpf_warn_invalid_xdp_action(dev, prog, act);
>>  		fallthrough;
>> @@ -1467,7 +1605,7 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  				 false);
>>  		xdp_buff_clear_frags_flag(&xdp);
>>  
>> -		ret = gem_xdp_run(queue, &xdp, bp->dev);
>> +		ret = gem_xdp_run(queue, &xdp, bp->dev, addr);
>>  		if (ret == XDP_REDIRECT)
>>  			xdp_flush = true;
>>  
>> @@ -4546,6 +4684,7 @@ static const struct net_device_ops macb_netdev_ops = {
>>  	.ndo_hwtstamp_get	= macb_hwtstamp_get,
>>  	.ndo_setup_tc		= macb_setup_tc,
>>  	.ndo_bpf		= macb_xdp,
>> +	.ndo_xdp_xmit		= macb_xdp_xmit,
>
> I'd expect macb_xdp_xmit() to be called gem_xdp_xmit() as well.
>
> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
  2025-12-02 17:24   ` Paolo Valerio
@ 2025-12-03 14:28     ` Théo Lebrun
  0 siblings, 0 replies; 28+ messages in thread
From: Théo Lebrun @ 2025-12-03 14:28 UTC (permalink / raw)
  To: Paolo Valerio, Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi

On Tue Dec 2, 2025 at 6:24 PM CET, Paolo Valerio wrote:
> On 26 Nov 2025 at 07:08:14 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>> ### Rx buffer size computation

[...]

>>  - NET_IP_ALIGN is accounted for in the headroom even though it isn't
>>    present if !RSC.
>
> that's something I noticed and I was unsure about the reason.

Mistake because I forgot, nothing more than that.

>>  - If the size clamping to PAGE_SIZE comes into play, we are probably
>>    doomed. It means we cannot deal with the MTU and we'll probably get
>>    corruption. If we do put a check in place, it should loudly fail
>>    rather than silently clamp.
>
> That should not happen, unless I'm missing something.
> E.g., 9000B mtu on a 4K PAGE_SIZE kernel should be handled with multiple
> descriptors. The clamping is there because, according to how the series
> creates the pool, the maximum buffer size is page order 0.
>
> Hardware-wise bp->rx_buffer_size should also be taken into account for
> the receive buffer size.

Yes, I agree. We can drop the check; I was not implying we *had* to keep it.

[...]

>> ### Buffer variable names
>>
>> Related: so many variables, fields or constants have ambiguous names,
>> can we do something about it?
>>
>>  - bp->rx_offset is named oddly to my ears. Offset to what?
>>    Maybe bp->rx_head or bp->rx_headroom?
>
> bp->rx_headroom sounds like a good choice to me, but if you have a stronger
> preference for bp->rx_head, just let me know.

No strong preference, ack for bp->rx_headroom.

[...]

>> ### XDP_SETUP_PROG if netif_running()
>>
>> I'd like to start a discussion on the expected behavior on XDP program
>> change if netif_running(). Summarised:
>>
>> static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
>>              struct netlink_ext_ack *extack)
>> {
>>     bool running = netif_running(dev);
>>     bool need_update = !!bp->prog != !!prog;
>>
>>     if (running && need_update)
>>         macb_close(dev);
>>     old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
>>     if (running && need_update)
>>         return macb_open(dev);
>> }
>>
>> Have you experimented with that? I don't see anything graceful in our
>> close operation, it looks like we'll get corruption or dropped packets
>> or both. We shouldn't impose that on the user who just wanted to swap
>> the program.
>>
>> I cannot find any good reason that implies we wouldn't be able to swap
>> our XDP program on the fly. If we think it is unsafe, I'd vote for
>> starting with a -EBUSY return code and iterating on that.
>>
>
> I didn't experiment much with this, other than simply adding and
> removing programs as needed during my tests. Didn't experience
> particular issues.
>
> The reason a close/open sequence was added here was mostly because I was
> considering accounting for XDP_PACKET_HEADROOM only when a program was
> present. I later decided to not proceed with that (mostly to avoid
> changing too many things at once).
>
> Given the geometry of the buffer remains untouched in either case, I
> see no particular reasons we can't swap on the fly as you suggest.
>
> I'll try this and change it, thanks!

Yes! From reading the code, I had guessed that you thought about changing
the headroom based on XDP or !XDP. :-)

I agree we should aim for on-the-fly swapping in all cases; it sounds
reasonable to achieve and is a nice-to-have feature.

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception
  2025-12-02 17:32     ` Paolo Valerio
@ 2025-12-08 10:21       ` Théo Lebrun
  2025-12-08 10:22         ` [PATCH 1/8] net: macb: move Rx buffers alloc from link up to open Théo Lebrun
  2025-12-08 12:53         ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
  0 siblings, 2 replies; 28+ messages in thread
From: Théo Lebrun @ 2025-12-08 10:21 UTC (permalink / raw)
  To: Paolo Valerio, Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

Hello Paolo, Théo, netdev,

On Tue Dec 2, 2025 at 6:32 PM CET, Paolo Valerio wrote:
> On 27 Nov 2025 at 02:38:45 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>
>> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>>> Add support for receiving network frames that span multiple DMA
>>> descriptors in the Cadence MACB/GEM Ethernet driver.
>>>
>>> The patch removes the requirement that limited frame reception to
>>> a single descriptor (RX_SOF && RX_EOF), also avoiding potential
>>> contiguous multi-page allocation for large frames. It also uses
>>> page pool fragments allocation instead of full page for saving
>>> memory.
>>>
>>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
>>> ---
>>>  drivers/net/ethernet/cadence/macb.h      |   4 +-
>>>  drivers/net/ethernet/cadence/macb_main.c | 180 ++++++++++++++---------
>>>  2 files changed, 111 insertions(+), 73 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>>> index dcf768bd1bc1..e2f397b7a27f 100644
>>> --- a/drivers/net/ethernet/cadence/macb.h
>>> +++ b/drivers/net/ethernet/cadence/macb.h
>>> @@ -960,8 +960,7 @@ struct macb_dma_desc_ptp {
>>>  #define PPM_FRACTION	16
>>>  
>>>  /* The buf includes headroom compatible with both skb and xdpf */
>>> -#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
>>> -#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
>>> +#define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
>>>  
>>>  /* struct macb_tx_skb - data about an skb which is being transmitted
>>>   * @skb: skb currently being transmitted, only set for the last buffer
>>> @@ -1273,6 +1272,7 @@ struct macb_queue {
>>>  	struct napi_struct	napi_rx;
>>>  	struct queue_stats stats;
>>>  	struct page_pool	*page_pool;
>>> +	struct sk_buff		*skb;
>>>  };
>>>  
>>>  struct ethtool_rx_fs_item {
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>>> index 985c81913ba6..be0c8e101639 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> @@ -1250,21 +1250,25 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>>  	return packets;
>>>  }
>>>  
>>> -static void *gem_page_pool_get_buff(struct page_pool *pool,
>>> +static void *gem_page_pool_get_buff(struct  macb_queue *queue,
>>>  				    dma_addr_t *dma_addr, gfp_t gfp_mask)
>>>  {
>>> +	struct macb *bp = queue->bp;
>>>  	struct page *page;
>>> +	int offset;
>>>  
>>> -	if (!pool)
>>> +	if (!queue->page_pool)
>>>  		return NULL;
>>>  
>>> -	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
>>> +	page = page_pool_alloc_frag(queue->page_pool, &offset,
>>> +				    bp->rx_buffer_size,
>>> +				    gfp_mask | __GFP_NOWARN);
>>>  	if (!page)
>>>  		return NULL;
>>>  
>>> -	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
>>> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM + offset;
>>>  
>>> -	return page_address(page);
>>> +	return page_address(page) + offset;
>>>  }
>>>  
>>>  static void gem_rx_refill(struct macb_queue *queue, bool napi)
>>> @@ -1286,7 +1290,7 @@ static void gem_rx_refill(struct macb_queue *queue, bool napi)
>>>  
>>>  		if (!queue->rx_buff[entry]) {
>>>  			/* allocate sk_buff for this free entry in ring */
>>> -			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
>>> +			data = gem_page_pool_get_buff(queue, &paddr,
>>>  						      napi ? GFP_ATOMIC : GFP_KERNEL);
>>>  			if (unlikely(!data)) {
>>>  				netdev_err(bp->dev,
>>> @@ -1344,20 +1348,17 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>  		  int budget)
>>>  {
>>>  	struct macb *bp = queue->bp;
>>> -	int			buffer_size;
>>>  	unsigned int		len;
>>>  	unsigned int		entry;
>>>  	void			*data;
>>> -	struct sk_buff		*skb;
>>>  	struct macb_dma_desc	*desc;
>>> +	int			data_len;
>>>  	int			count = 0;
>>>  
>>> -	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
>>> -
>>>  	while (count < budget) {
>>>  		u32 ctrl;
>>>  		dma_addr_t addr;
>>> -		bool rxused;
>>> +		bool rxused, first_frame;
>>>  
>>>  		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
>>>  		desc = macb_rx_desc(queue, entry);
>>> @@ -1368,6 +1369,9 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>  		rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>>>  		addr = macb_get_addr(bp, desc);
>>>  
>>> +		dma_sync_single_for_cpu(&bp->pdev->dev,
>>> +					addr, bp->rx_buffer_size - bp->rx_offset,
>>> +					page_pool_get_dma_dir(queue->page_pool));
>>
>> page_pool_get_dma_dir() will always return the same value, we could
>> hardcode it.
>>
>>>  		if (!rxused)
>>>  			break;
>>>  
>>> @@ -1379,13 +1383,6 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>  		queue->rx_tail++;
>>>  		count++;
>>>  
>>> -		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
>>> -			netdev_err(bp->dev,
>>> -				   "not whole frame pointed by descriptor\n");
>>> -			bp->dev->stats.rx_dropped++;
>>> -			queue->stats.rx_dropped++;
>>> -			break;
>>> -		}
>>>  		data = queue->rx_buff[entry];
>>>  		if (unlikely(!data)) {
>>>  			netdev_err(bp->dev,
>>> @@ -1395,64 +1392,102 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>  			break;
>>>  		}
>>>  
>>> -		skb = napi_build_skb(data, buffer_size);
>>> -		if (unlikely(!skb)) {
>>> -			netdev_err(bp->dev,
>>> -				   "Unable to allocate sk_buff\n");
>>> -			page_pool_put_full_page(queue->page_pool,
>>> -						virt_to_head_page(data),
>>> -						false);
>>> -			break;
>>> +		first_frame = ctrl & MACB_BIT(RX_SOF);
>>> +		len = ctrl & bp->rx_frm_len_mask;
>>> +
>>> +		if (len) {
>>> +			data_len = len;
>>> +			if (!first_frame)
>>> +				data_len -= queue->skb->len;
>>> +		} else {
>>> +			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>>>  		}
>>>  
>>> -		/* Properly align Ethernet header.
>>> -		 *
>>> -		 * Hardware can add dummy bytes if asked using the RBOF
>>> -		 * field inside the NCFGR register. That feature isn't
>>> -		 * available if hardware is RSC capable.
>>> -		 *
>>> -		 * We cannot fallback to doing the 2-byte shift before
>>> -		 * DMA mapping because the address field does not allow
>>> -		 * setting the low 2/3 bits.
>>> -		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>>> -		 */
>>> -		skb_reserve(skb, bp->rx_offset);
>>> -		skb_mark_for_recycle(skb);
>>> +		if (first_frame) {
>>> +			queue->skb = napi_build_skb(data, bp->rx_buffer_size);
>>> +			if (unlikely(!queue->skb)) {
>>> +				netdev_err(bp->dev,
>>> +					   "Unable to allocate sk_buff\n");
>>> +				goto free_frags;
>>> +			}
>>> +
>>> +			/* Properly align Ethernet header.
>>> +			 *
>>> +			 * Hardware can add dummy bytes if asked using the RBOF
>>> +			 * field inside the NCFGR register. That feature isn't
>>> +			 * available if hardware is RSC capable.
>>> +			 *
>>> +			 * We cannot fallback to doing the 2-byte shift before
>>> +			 * DMA mapping because the address field does not allow
>>> +			 * setting the low 2/3 bits.
>>> +			 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>>> +			 */
>>> +			skb_reserve(queue->skb, bp->rx_offset);
>>> +			skb_mark_for_recycle(queue->skb);
>>> +			skb_put(queue->skb, data_len);
>>> +			queue->skb->protocol = eth_type_trans(queue->skb, bp->dev);
>>> +
>>> +			skb_checksum_none_assert(queue->skb);
>>> +			if (bp->dev->features & NETIF_F_RXCSUM &&
>>> +			    !(bp->dev->flags & IFF_PROMISC) &&
>>> +			    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>>> +				queue->skb->ip_summed = CHECKSUM_UNNECESSARY;
>>> +		} else {
>>> +			if (!queue->skb) {
>>> +				netdev_err(bp->dev,
>>> +					   "Received non-starting frame while expecting it\n");
>>> +				goto free_frags;
>>> +			}
>>
>> Here as well we want `queue->rx_buff[entry] = NULL;` no?
>> Maybe put it in the free_frags codepath to ensure it isn't forgotten?
>>
>
> That's correct, that slipped in this RFC.
>
>>> +			struct skb_shared_info *shinfo = skb_shinfo(queue->skb);
>>> +			struct page *page = virt_to_head_page(data);
>>> +			int nr_frags = shinfo->nr_frags;
>>
>> Variable definitions in the middle of a scope? I think it is acceptable
>> when using __attribute__((cleanup)) but otherwise not so much (yet).
>>
>
> I guess I simply forgot to move them.
> Ack for this and for the previous ones.
>
>>> +			if (nr_frags >= ARRAY_SIZE(shinfo->frags))
>>> +				goto free_frags;
>>> +
>>> +			skb_add_rx_frag(queue->skb, nr_frags, page,
>>> +					data - page_address(page) + bp->rx_offset,
>>> +					data_len, bp->rx_buffer_size);
>>> +		}
>>>  
>>>  		/* now everything is ready for receiving packet */
>>>  		queue->rx_buff[entry] = NULL;
>>> -		len = ctrl & bp->rx_frm_len_mask;
>>> -
>>> -		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>>>  
>>> -		dma_sync_single_for_cpu(&bp->pdev->dev,
>>> -					addr, len,
>>> -					page_pool_get_dma_dir(queue->page_pool));
>>> -		skb_put(skb, len);
>>> -		skb->protocol = eth_type_trans(skb, bp->dev);
>>> -		skb_checksum_none_assert(skb);
>>> -		if (bp->dev->features & NETIF_F_RXCSUM &&
>>> -		    !(bp->dev->flags & IFF_PROMISC) &&
>>> -		    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>>> -			skb->ip_summed = CHECKSUM_UNNECESSARY;
>>> +		netdev_vdbg(bp->dev, "%s %u (len %u)\n", __func__, entry, data_len);
>>>  
>>> -		bp->dev->stats.rx_packets++;
>>> -		queue->stats.rx_packets++;
>>> -		bp->dev->stats.rx_bytes += skb->len;
>>> -		queue->stats.rx_bytes += skb->len;
>>> +		if (ctrl & MACB_BIT(RX_EOF)) {
>>> +			bp->dev->stats.rx_packets++;
>>> +			queue->stats.rx_packets++;
>>> +			bp->dev->stats.rx_bytes += queue->skb->len;
>>> +			queue->stats.rx_bytes += queue->skb->len;
>>>  
>>> -		gem_ptp_do_rxstamp(bp, skb, desc);
>>> +			gem_ptp_do_rxstamp(bp, queue->skb, desc);
>>>  
>>>  #if defined(DEBUG) && defined(VERBOSE_DEBUG)
>>> -		netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>>> -			    skb->len, skb->csum);
>>> -		print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>> -			       skb_mac_header(skb), 16, true);
>>> -		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>> -			       skb->data, 32, true);
>>> +			netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>>> +				    queue->skb->len, queue->skb->csum);
>>> +			print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>> +				       skb_mac_header(queue->skb), 16, true);
>>> +			print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>> +				       queue->skb->data, 32, true);
>>>  #endif
>>>  
>>> -		napi_gro_receive(napi, skb);
>>> +			napi_gro_receive(napi, queue->skb);
>>> +			queue->skb = NULL;
>>> +		}
>>> +
>>> +		continue;
>>> +
>>> +free_frags:
>>> +		if (queue->skb) {
>>> +			dev_kfree_skb(queue->skb);
>>> +			queue->skb = NULL;
>>> +		} else {
>>> +			page_pool_put_full_page(queue->page_pool,
>>> +						virt_to_head_page(data),
>>> +						false);
>>> +		}
>>>  	}
>>>  
>>>  	gem_rx_refill(queue, true);
>>> @@ -2394,7 +2429,10 @@ static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
>>>  	if (!macb_is_gem(bp)) {
>>>  		bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>>>  	} else {
>>> -		bp->rx_buffer_size = size;
>>> +		bp->rx_buffer_size = size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>>> +			+ MACB_PP_HEADROOM;
>>> +		if (bp->rx_buffer_size > PAGE_SIZE)
>>> +			bp->rx_buffer_size = PAGE_SIZE;
>>>  
>>>  		if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
>>>  			netdev_dbg(bp->dev,
>>> @@ -2589,18 +2627,15 @@ static int macb_alloc_consistent(struct macb *bp)
>>>  
>>>  static void gem_create_page_pool(struct macb_queue *queue)
>>>  {
>>> -	unsigned int num_pages = DIV_ROUND_UP(queue->bp->rx_buffer_size, PAGE_SIZE);
>>> -	struct macb *bp = queue->bp;
>>>  	struct page_pool_params pp_params = {
>>> -		.order = order_base_2(num_pages),
>>> +		.order = 0,
>>>  		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>>  		.pool_size = queue->bp->rx_ring_size,
>>>  		.nid = NUMA_NO_NODE,
>>>  		.dma_dir = DMA_FROM_DEVICE,
>>>  		.dev = &queue->bp->pdev->dev,
>>>  		.napi = &queue->napi_rx,
>>> -		.offset = bp->rx_offset,
>>> -		.max_len = MACB_PP_MAX_BUF_SIZE(num_pages),
>>> +		.max_len = PAGE_SIZE,
>>>  	};
>>>  	struct page_pool *pool;
>>
>> I forgot about it in [PATCH 1/6], but the error handling if
>> gem_create_page_pool() fails is odd. We set queue->page_pool to NULL
>> and keep on going. Then once opened we'll fail allocating any buffer
>> but still be open. Shouldn't we fail the link up operation?
>>
>> If we want to support this codepath (page pool not allocated), then we
>> should unmask Rx interrupts only if alloc succeeded. I don't know if
>> we'd want that though.
>>
>> 	queue_writel(queue, IER, bp->rx_intr_mask | ...);
>>
>
> Makes sense to fail the link up.
> Doesn't this imply to move the page pool creation and refill into
> macb_open()?
>
> I didn't look into it, I'm not sure if this can potentially become a
> bigger change.

So I looked into it. Indeed buffer alloc should be done at open, doing
it at link up (that cannot fail) makes no sense. It is probably
historical, because on MACB it is mog_alloc_rx_buffers() that does all
the alloc. On GEM it only does the ring buffer but not each individual
slot buffer, which is done by ->mog_init_rings() / gem_rx_refill().

I am linking a patch that applies before your series. Then the rebase
conflict resolution is pretty simple, and the gem_create_page_pool()
function should be adapted something like:

-- >8 ------------------------------------------------------------------

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 3dcba7d672bf..55237b840289 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2865,7 +2865,7 @@ static int macb_alloc_consistent(struct macb *bp)
   return -ENOMEM;
 }

-static void gem_create_page_pool(struct macb_queue *queue, int qid)
+static int gem_create_page_pool(struct macb_queue *queue, int qid)
 {
   struct page_pool_params pp_params = {
      .order = 0,
@@ -2885,7 +2885,8 @@ static void gem_create_page_pool(struct macb_queue *queue, int qid)
   pool = page_pool_create(&pp_params);
   if (IS_ERR(pool)) {
      netdev_err(queue->bp->dev, "cannot create rx page pool\n");
-     pool = NULL;
+     err = PTR_ERR(pool);
+     goto clear_pool_ptr;
   }

   queue->page_pool = pool;
@@ -2904,13 +2905,15 @@ static void gem_create_page_pool(struct macb_queue *queue, int qid)
      goto unreg_info;
   }

-  return;
+  return 0;

 unreg_info:
   xdp_rxq_info_unreg(&queue->xdp_q);
 destroy_pool:
   page_pool_destroy(pool);
+clear_pool_ptr:
   queue->page_pool = NULL;
+  return err;
 }

 static void macb_init_tieoff(struct macb *bp)

-- >8 ------------------------------------------------------------------

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 1/8] net: macb: move Rx buffers alloc from link up to open
  2025-12-08 10:21       ` Théo Lebrun
@ 2025-12-08 10:22         ` Théo Lebrun
  2025-12-08 12:53         ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
  1 sibling, 0 replies; 28+ messages in thread
From: Théo Lebrun @ 2025-12-08 10:22 UTC (permalink / raw)
  To: theo.lebrun
  Cc: andrew+netdev, benoit.monin, claudiu.beznea, davem, edumazet,
	gregory.clement, kuba, lorenzo, netdev, nicolas.ferre, pabeni,
	pvalerio, thomas.petazzoni

mog_alloc_rx_buffers(), which is called at open, does not allocate Rx
buffers on GEM. The bulk of the work is done by gem_rx_refill(), which
fills all slots with valid buffers.

gem_rx_refill() is called at link up by
gem_init_rings() == bp->macbgem_ops.mog_init_rings().

Move the operation to macb_open(), mostly to allow it to fail early and
loudly rather than init the device with Rx mostly broken.

About `bool fail_early`:
 - When called from macb_open(), ring init fails as soon as a queue
   cannot be refilled.
 - When called from macb_hresp_error_task(), we do our best to reinit
   the device: we still iterate over all queues and try refilling all
   even if a previous queue failed.

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
 drivers/net/ethernet/cadence/macb.h      |  2 +-
 drivers/net/ethernet/cadence/macb_main.c | 34 ++++++++++++++++++------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 87414a2ddf6e..2cb65ec37d44 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1180,7 +1180,7 @@ struct macb_queue;
 struct macb_or_gem_ops {
 	int	(*mog_alloc_rx_buffers)(struct macb *bp);
 	void	(*mog_free_rx_buffers)(struct macb *bp);
-	void	(*mog_init_rings)(struct macb *bp);
+	int	(*mog_init_rings)(struct macb *bp, bool fail_early);
 	int	(*mog_rx)(struct macb_queue *queue, struct napi_struct *napi,
 			  int budget);
 };
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e461f5072884..65431a7e3533 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -705,10 +705,9 @@ static void macb_mac_link_up(struct phylink_config *config,
 		if (rx_pause)
 			ctrl |= MACB_BIT(PAE);
 
-		/* Initialize rings & buffers as clearing MACB_BIT(TE) in link down
-		 * cleared the pipeline and control registers.
+		/* Initialize buffer registers as clearing MACB_BIT(TE) in link
+		 * down cleared the pipeline and control registers.
 		 */
-		bp->macbgem_ops.mog_init_rings(bp);
 		macb_init_buffers(bp);
 
 		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
@@ -1250,13 +1249,14 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 	return packets;
 }
 
-static void gem_rx_refill(struct macb_queue *queue)
+static int gem_rx_refill(struct macb_queue *queue)
 {
 	unsigned int		entry;
 	struct sk_buff		*skb;
 	dma_addr_t		paddr;
 	struct macb *bp = queue->bp;
 	struct macb_dma_desc *desc;
+	int err = 0;
 
 	while (CIRC_SPACE(queue->rx_prepared_head, queue->rx_tail,
 			bp->rx_ring_size) > 0) {
@@ -1273,6 +1273,7 @@ static void gem_rx_refill(struct macb_queue *queue)
 			if (unlikely(!skb)) {
 				netdev_err(bp->dev,
 					   "Unable to allocate sk_buff\n");
+				err = -ENOMEM;
 				break;
 			}
 
@@ -1322,6 +1323,7 @@ static void gem_rx_refill(struct macb_queue *queue)
 
 	netdev_vdbg(bp->dev, "rx ring: queue: %p, prepared head %d, tail %d\n",
 			queue, queue->rx_prepared_head, queue->rx_tail);
+	return err;
 }
 
 /* Mark DMA descriptors from begin up to and not including end as unused */
@@ -1774,7 +1776,7 @@ static void macb_hresp_error_task(struct work_struct *work)
 	netif_tx_stop_all_queues(dev);
 	netif_carrier_off(dev);
 
-	bp->macbgem_ops.mog_init_rings(bp);
+	bp->macbgem_ops.mog_init_rings(bp, false);
 
 	/* Initialize TX and RX buffers */
 	macb_init_buffers(bp);
@@ -2549,6 +2551,8 @@ static int macb_alloc_consistent(struct macb *bp)
 	}
 	if (bp->macbgem_ops.mog_alloc_rx_buffers(bp))
 		goto out_err;
+	if (bp->macbgem_ops.mog_init_rings(bp, true))
+		goto out_err;
 
 	/* Required for tie off descriptor for PM cases */
 	if (!(bp->caps & MACB_CAPS_QUEUE_DISABLE)) {
@@ -2580,11 +2584,13 @@ static void macb_init_tieoff(struct macb *bp)
 	desc->ctrl = 0;
 }
 
-static void gem_init_rings(struct macb *bp)
+static int gem_init_rings(struct macb *bp, bool fail_early)
 {
 	struct macb_queue *queue;
 	struct macb_dma_desc *desc = NULL;
+	int last_err = 0;
 	unsigned int q;
+	int err;
 	int i;
 
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
@@ -2600,13 +2606,24 @@ static void gem_init_rings(struct macb *bp)
 		queue->rx_tail = 0;
 		queue->rx_prepared_head = 0;
 
-		gem_rx_refill(queue);
+		/* We get called in two cases:
+		 *  - open: we can propagate alloc errors (so fail early),
+		 *  - HRESP error: cannot propagate, we attempt to reinit
+		 *    all queues in case of failure.
+		 */
+		err = gem_rx_refill(queue);
+		if (err) {
+			last_err = err;
+			if (fail_early)
+				break;
+		}
 	}
 
 	macb_init_tieoff(bp);
+	return last_err;
 }
 
-static void macb_init_rings(struct macb *bp)
+static int macb_init_rings(struct macb *bp, bool fail_early)
 {
 	int i;
 	struct macb_dma_desc *desc = NULL;
@@ -2623,6 +2640,7 @@ static void macb_init_rings(struct macb *bp)
 	desc->ctrl |= MACB_BIT(TX_WRAP);
 
 	macb_init_tieoff(bp);
+	return 0;
 }
 
 static void macb_reset_hw(struct macb *bp)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem
  2025-12-02 17:32     ` Paolo Valerio
@ 2025-12-08 10:59       ` Théo Lebrun
  0 siblings, 0 replies; 28+ messages in thread
From: Théo Lebrun @ 2025-12-08 10:59 UTC (permalink / raw)
  To: Paolo Valerio, Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Thomas Petazzoni, Grégory Clement, Benoît Monin

On Tue Dec 2, 2025 at 6:32 PM CET, Paolo Valerio wrote:
> On 27 Nov 2025 at 03:41:56 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>>> index 5829c1f773dd..53ea1958b8e4 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> +static u32 gem_xdp_run(struct macb_queue *queue, struct xdp_buff *xdp,
>>> +		       struct net_device *dev)

[...]

>>> +{
>>> +	struct bpf_prog *prog;
>>> +	u32 act = XDP_PASS;
>>> +
>>> +	rcu_read_lock();
>>> +
>>> +	prog = rcu_dereference(queue->bp->prog);
>>> +	if (!prog)
>>> +		goto out;
>>> +
>>> +	act = bpf_prog_run_xdp(prog, xdp);
>>> +	switch (act) {
>>> +	case XDP_PASS:
>>> +		goto out;
>>> +	case XDP_REDIRECT:
>>> +		if (unlikely(xdp_do_redirect(dev, xdp, prog))) {
>>> +			act = XDP_DROP;
>>> +			break;
>>> +		}
>>> +		goto out;
>>
>> Why the `unlikely()`?
>
> just expecting the err path to be the exception, although this is not
> consistent with the XDP_TX path.
> Do you prefer to remove it?

No, we can keep it; I had missed that this was the error case.
Sorry about that.

>
>>> +	default:
>>> +		bpf_warn_invalid_xdp_action(dev, prog, act);
>>> +		fallthrough;
>>> +	case XDP_ABORTED:
>>> +		trace_xdp_exception(dev, prog, act);
>>> +		fallthrough;
>>> +	case XDP_DROP:
>>> +		break;
>>> +	}
>>> +
>>> +	page_pool_put_full_page(queue->page_pool,
>>> +				virt_to_head_page(xdp->data), true);
>>
>> Maybe move that to the XDP_DROP, it is the only `break` in the above
>> switch statement. It will be used by the default and XDP_ABORTED cases
>> through fallthrough. We can avoid the out label and its two gotos that
>> way.
>
> We'd not return the page to the page pool in case the redirect fails, or
> am I missing something?

Ah yes there are two break statements. I missed it.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support
  2025-12-02 17:34     ` Paolo Valerio
@ 2025-12-08 11:01       ` Théo Lebrun
  0 siblings, 0 replies; 28+ messages in thread
From: Théo Lebrun @ 2025-12-08 11:01 UTC (permalink / raw)
  To: Paolo Valerio, Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Thomas Petazzoni, Grégory Clement

On Tue Dec 2, 2025 at 6:34 PM CET, Paolo Valerio wrote:
> On 27 Nov 2025 at 04:07:52 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>>> index eeda1a3871a6..bd62d3febeb1 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> +static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
>>> +				 struct net_device *dev, dma_addr_t addr)
>>> +{
>>> +	enum macb_tx_buff_type buff_type;
>>> +	struct macb_tx_buff *tx_buff;
>>> +	int cpu = smp_processor_id();
>>> +	struct macb_dma_desc *desc;
>>> +	struct macb_queue *queue;
>>> +	unsigned long flags;
>>> +	dma_addr_t mapping;
>>> +	u16 queue_index;
>>> +	int err = 0;
>>> +	u32 ctrl;
>>> +
>>> +	queue_index = cpu % bp->num_queues;
>>> +	queue = &bp->queues[queue_index];
>>> +	buff_type = !addr ? MACB_TYPE_XDP_NDO : MACB_TYPE_XDP_TX;
>>
>> I am not the biggest fan of piggy-backing on !!addr to know which
>> codepath called us. If the macb_xdp_submit_frame() call in gem_xdp_run()
>> ever gives an addr=0 coming from macb_get_addr(bp, desc), then we will
>> be submitting NDO typed frames and creating additional DMA mappings
>> which would be a really hard to debug bug.
>
> I guess we can add a separate boolean, WDYT?

I agree!
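
To spell it out, something along these lines is what I would expect
(an untested sketch against the RFC with only the relevant lines shown;
"ndo" is a made-up parameter name). gem_xdp_run() would pass false for
the XDP_TX path, and the ndo_xdp_xmit path would pass true:

-static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
-				 struct net_device *dev, dma_addr_t addr)
+static int macb_xdp_submit_frame(struct macb *bp, struct xdp_frame *xdpf,
+				 struct net_device *dev, dma_addr_t addr,
+				 bool ndo)
[...]
-	buff_type = !addr ? MACB_TYPE_XDP_NDO : MACB_TYPE_XDP_TX;
+	buff_type = ndo ? MACB_TYPE_XDP_NDO : MACB_TYPE_XDP_TX;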

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception
  2025-12-08 10:21       ` Théo Lebrun
  2025-12-08 10:22         ` [PATCH 1/8] net: macb: move Rx buffers alloc from link up to open Théo Lebrun
@ 2025-12-08 12:53         ` Paolo Valerio
  2025-12-09  9:01           ` Théo Lebrun
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Valerio @ 2025-12-08 12:53 UTC (permalink / raw)
  To: Théo Lebrun, Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

On 08 Dec 2025 at 11:21:00 AM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> Hello Paolo, Théo, netdev,
>
> On Tue Dec 2, 2025 at 6:32 PM CET, Paolo Valerio wrote:
>> On 27 Nov 2025 at 02:38:45 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>>
>>> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>>>> Add support for receiving network frames that span multiple DMA
>>>> descriptors in the Cadence MACB/GEM Ethernet driver.
>>>>
>>>> The patch removes the requirement that limited frame reception to
>>>> a single descriptor (RX_SOF && RX_EOF), also avoiding potential
>>>> contiguous multi-page allocation for large frames. It also uses
>>>> page pool fragments allocation instead of full page for saving
>>>> memory.
>>>>
>>>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
>>>> ---
>>>>  drivers/net/ethernet/cadence/macb.h      |   4 +-
>>>>  drivers/net/ethernet/cadence/macb_main.c | 180 ++++++++++++++---------
>>>>  2 files changed, 111 insertions(+), 73 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>>>> index dcf768bd1bc1..e2f397b7a27f 100644
>>>> --- a/drivers/net/ethernet/cadence/macb.h
>>>> +++ b/drivers/net/ethernet/cadence/macb.h
>>>> @@ -960,8 +960,7 @@ struct macb_dma_desc_ptp {
>>>>  #define PPM_FRACTION	16
>>>>  
>>>>  /* The buf includes headroom compatible with both skb and xdpf */
>>>> -#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
>>>> -#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
>>>> +#define MACB_PP_HEADROOM	XDP_PACKET_HEADROOM
>>>>  
>>>>  /* struct macb_tx_skb - data about an skb which is being transmitted
>>>>   * @skb: skb currently being transmitted, only set for the last buffer
>>>> @@ -1273,6 +1272,7 @@ struct macb_queue {
>>>>  	struct napi_struct	napi_rx;
>>>>  	struct queue_stats stats;
>>>>  	struct page_pool	*page_pool;
>>>> +	struct sk_buff		*skb;
>>>>  };
>>>>  
>>>>  struct ethtool_rx_fs_item {
>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>>>> index 985c81913ba6..be0c8e101639 100644
>>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>>> @@ -1250,21 +1250,25 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>>>  	return packets;
>>>>  }
>>>>  
>>>> -static void *gem_page_pool_get_buff(struct page_pool *pool,
>>>> +static void *gem_page_pool_get_buff(struct  macb_queue *queue,
>>>>  				    dma_addr_t *dma_addr, gfp_t gfp_mask)
>>>>  {
>>>> +	struct macb *bp = queue->bp;
>>>>  	struct page *page;
>>>> +	int offset;
>>>>  
>>>> -	if (!pool)
>>>> +	if (!queue->page_pool)
>>>>  		return NULL;
>>>>  
>>>> -	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
>>>> +	page = page_pool_alloc_frag(queue->page_pool, &offset,
>>>> +				    bp->rx_buffer_size,
>>>> +				    gfp_mask | __GFP_NOWARN);
>>>>  	if (!page)
>>>>  		return NULL;
>>>>  
>>>> -	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
>>>> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM + offset;
>>>>  
>>>> -	return page_address(page);
>>>> +	return page_address(page) + offset;
>>>>  }
>>>>  
>>>>  static void gem_rx_refill(struct macb_queue *queue, bool napi)
>>>> @@ -1286,7 +1290,7 @@ static void gem_rx_refill(struct macb_queue *queue, bool napi)
>>>>  
>>>>  		if (!queue->rx_buff[entry]) {
>>>>  			/* allocate sk_buff for this free entry in ring */
>>>> -			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
>>>> +			data = gem_page_pool_get_buff(queue, &paddr,
>>>>  						      napi ? GFP_ATOMIC : GFP_KERNEL);
>>>>  			if (unlikely(!data)) {
>>>>  				netdev_err(bp->dev,
>>>> @@ -1344,20 +1348,17 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>>  		  int budget)
>>>>  {
>>>>  	struct macb *bp = queue->bp;
>>>> -	int			buffer_size;
>>>>  	unsigned int		len;
>>>>  	unsigned int		entry;
>>>>  	void			*data;
>>>> -	struct sk_buff		*skb;
>>>>  	struct macb_dma_desc	*desc;
>>>> +	int			data_len;
>>>>  	int			count = 0;
>>>>  
>>>> -	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
>>>> -
>>>>  	while (count < budget) {
>>>>  		u32 ctrl;
>>>>  		dma_addr_t addr;
>>>> -		bool rxused;
>>>> +		bool rxused, first_frame;
>>>>  
>>>>  		entry = macb_rx_ring_wrap(bp, queue->rx_tail);
>>>>  		desc = macb_rx_desc(queue, entry);
>>>> @@ -1368,6 +1369,9 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>>  		rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>>>>  		addr = macb_get_addr(bp, desc);
>>>>  
>>>> +		dma_sync_single_for_cpu(&bp->pdev->dev,
>>>> +					addr, bp->rx_buffer_size - bp->rx_offset,
>>>> +					page_pool_get_dma_dir(queue->page_pool));
>>>
>>> page_pool_get_dma_dir() will always return the same value, we could
>>> hardcode it.
>>>
>>>>  		if (!rxused)
>>>>  			break;
>>>>  
>>>> @@ -1379,13 +1383,6 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>>  		queue->rx_tail++;
>>>>  		count++;
>>>>  
>>>> -		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
>>>> -			netdev_err(bp->dev,
>>>> -				   "not whole frame pointed by descriptor\n");
>>>> -			bp->dev->stats.rx_dropped++;
>>>> -			queue->stats.rx_dropped++;
>>>> -			break;
>>>> -		}
>>>>  		data = queue->rx_buff[entry];
>>>>  		if (unlikely(!data)) {
>>>>  			netdev_err(bp->dev,
>>>> @@ -1395,64 +1392,102 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>>>  			break;
>>>>  		}
>>>>  
>>>> -		skb = napi_build_skb(data, buffer_size);
>>>> -		if (unlikely(!skb)) {
>>>> -			netdev_err(bp->dev,
>>>> -				   "Unable to allocate sk_buff\n");
>>>> -			page_pool_put_full_page(queue->page_pool,
>>>> -						virt_to_head_page(data),
>>>> -						false);
>>>> -			break;
>>>> +		first_frame = ctrl & MACB_BIT(RX_SOF);
>>>> +		len = ctrl & bp->rx_frm_len_mask;
>>>> +
>>>> +		if (len) {
>>>> +			data_len = len;
>>>> +			if (!first_frame)
>>>> +				data_len -= queue->skb->len;
>>>> +		} else {
>>>> +			data_len = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset;
>>>>  		}
>>>>  
>>>> -		/* Properly align Ethernet header.
>>>> -		 *
>>>> -		 * Hardware can add dummy bytes if asked using the RBOF
>>>> -		 * field inside the NCFGR register. That feature isn't
>>>> -		 * available if hardware is RSC capable.
>>>> -		 *
>>>> -		 * We cannot fallback to doing the 2-byte shift before
>>>> -		 * DMA mapping because the address field does not allow
>>>> -		 * setting the low 2/3 bits.
>>>> -		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>>>> -		 */
>>>> -		skb_reserve(skb, bp->rx_offset);
>>>> -		skb_mark_for_recycle(skb);
>>>> +		if (first_frame) {
>>>> +			queue->skb = napi_build_skb(data, bp->rx_buffer_size);
>>>> +			if (unlikely(!queue->skb)) {
>>>> +				netdev_err(bp->dev,
>>>> +					   "Unable to allocate sk_buff\n");
>>>> +				goto free_frags;
>>>> +			}
>>>> +
>>>> +			/* Properly align Ethernet header.
>>>> +			 *
>>>> +			 * Hardware can add dummy bytes if asked using the RBOF
>>>> +			 * field inside the NCFGR register. That feature isn't
>>>> +			 * available if hardware is RSC capable.
>>>> +			 *
>>>> +			 * We cannot fallback to doing the 2-byte shift before
>>>> +			 * DMA mapping because the address field does not allow
>>>> +			 * setting the low 2/3 bits.
>>>> +			 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>>>> +			 */
>>>> +			skb_reserve(queue->skb, bp->rx_offset);
>>>> +			skb_mark_for_recycle(queue->skb);
>>>> +			skb_put(queue->skb, data_len);
>>>> +			queue->skb->protocol = eth_type_trans(queue->skb, bp->dev);
>>>> +
>>>> +			skb_checksum_none_assert(queue->skb);
>>>> +			if (bp->dev->features & NETIF_F_RXCSUM &&
>>>> +			    !(bp->dev->flags & IFF_PROMISC) &&
>>>> +			    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>>>> +				queue->skb->ip_summed = CHECKSUM_UNNECESSARY;
>>>> +		} else {
>>>> +			if (!queue->skb) {
>>>> +				netdev_err(bp->dev,
>>>> +					   "Received non-starting frame while expecting it\n");
>>>> +				goto free_frags;
>>>> +			}
>>>
>>> Here as well we want `queue->rx_buff[entry] = NULL;` no?
>>> Maybe put it in the free_frags codepath to ensure it isn't forgotten?
>>>
>>
>> That's correct, that slipped in this RFC.
>>
>>>> +			struct skb_shared_info *shinfo = skb_shinfo(queue->skb);
>>>> +			struct page *page = virt_to_head_page(data);
>>>> +			int nr_frags = shinfo->nr_frags;
>>>
>>> Variable definitions in the middle of a scope? I think it is acceptable
>>> when using __attribute__((cleanup)) but otherwise not so much (yet).
>>>
>>
>> I guess I simply forgot to move them.
>> Ack for this and for the previous ones.
>>
>>>> +			if (nr_frags >= ARRAY_SIZE(shinfo->frags))
>>>> +				goto free_frags;
>>>> +
>>>> +			skb_add_rx_frag(queue->skb, nr_frags, page,
>>>> +					data - page_address(page) + bp->rx_offset,
>>>> +					data_len, bp->rx_buffer_size);
>>>> +		}
>>>>  
>>>>  		/* now everything is ready for receiving packet */
>>>>  		queue->rx_buff[entry] = NULL;
>>>> -		len = ctrl & bp->rx_frm_len_mask;
>>>> -
>>>> -		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>>>>  
>>>> -		dma_sync_single_for_cpu(&bp->pdev->dev,
>>>> -					addr, len,
>>>> -					page_pool_get_dma_dir(queue->page_pool));
>>>> -		skb_put(skb, len);
>>>> -		skb->protocol = eth_type_trans(skb, bp->dev);
>>>> -		skb_checksum_none_assert(skb);
>>>> -		if (bp->dev->features & NETIF_F_RXCSUM &&
>>>> -		    !(bp->dev->flags & IFF_PROMISC) &&
>>>> -		    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>>>> -			skb->ip_summed = CHECKSUM_UNNECESSARY;
>>>> +		netdev_vdbg(bp->dev, "%s %u (len %u)\n", __func__, entry, data_len);
>>>>  
>>>> -		bp->dev->stats.rx_packets++;
>>>> -		queue->stats.rx_packets++;
>>>> -		bp->dev->stats.rx_bytes += skb->len;
>>>> -		queue->stats.rx_bytes += skb->len;
>>>> +		if (ctrl & MACB_BIT(RX_EOF)) {
>>>> +			bp->dev->stats.rx_packets++;
>>>> +			queue->stats.rx_packets++;
>>>> +			bp->dev->stats.rx_bytes += queue->skb->len;
>>>> +			queue->stats.rx_bytes += queue->skb->len;
>>>>  
>>>> -		gem_ptp_do_rxstamp(bp, skb, desc);
>>>> +			gem_ptp_do_rxstamp(bp, queue->skb, desc);
>>>>  
>>>>  #if defined(DEBUG) && defined(VERBOSE_DEBUG)
>>>> -		netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>>>> -			    skb->len, skb->csum);
>>>> -		print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>>> -			       skb_mac_header(skb), 16, true);
>>>> -		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>>> -			       skb->data, 32, true);
>>>> +			netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>>>> +				    queue->skb->len, queue->skb->csum);
>>>> +			print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>>> +				       skb_mac_header(queue->skb), 16, true);
>>>> +			print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
>>>> +				       queue->skb->data, 32, true);
>>>>  #endif
>>>>  
>>>> -		napi_gro_receive(napi, skb);
>>>> +			napi_gro_receive(napi, queue->skb);
>>>> +			queue->skb = NULL;
>>>> +		}
>>>> +
>>>> +		continue;
>>>> +
>>>> +free_frags:
>>>> +		if (queue->skb) {
>>>> +			dev_kfree_skb(queue->skb);
>>>> +			queue->skb = NULL;
>>>> +		} else {
>>>> +			page_pool_put_full_page(queue->page_pool,
>>>> +						virt_to_head_page(data),
>>>> +						false);
>>>> +		}
>>>>  	}
>>>>  
>>>>  	gem_rx_refill(queue, true);
>>>> @@ -2394,7 +2429,10 @@ static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
>>>>  	if (!macb_is_gem(bp)) {
>>>>  		bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>>>>  	} else {
>>>> -		bp->rx_buffer_size = size;
>>>> +		bp->rx_buffer_size = size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>>>> +			+ MACB_PP_HEADROOM;
>>>> +		if (bp->rx_buffer_size > PAGE_SIZE)
>>>> +			bp->rx_buffer_size = PAGE_SIZE;
>>>>  
>>>>  		if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
>>>>  			netdev_dbg(bp->dev,
>>>> @@ -2589,18 +2627,15 @@ static int macb_alloc_consistent(struct macb *bp)
>>>>  
>>>>  static void gem_create_page_pool(struct macb_queue *queue)
>>>>  {
>>>> -	unsigned int num_pages = DIV_ROUND_UP(queue->bp->rx_buffer_size, PAGE_SIZE);
>>>> -	struct macb *bp = queue->bp;
>>>>  	struct page_pool_params pp_params = {
>>>> -		.order = order_base_2(num_pages),
>>>> +		.order = 0,
>>>>  		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>>>  		.pool_size = queue->bp->rx_ring_size,
>>>>  		.nid = NUMA_NO_NODE,
>>>>  		.dma_dir = DMA_FROM_DEVICE,
>>>>  		.dev = &queue->bp->pdev->dev,
>>>>  		.napi = &queue->napi_rx,
>>>> -		.offset = bp->rx_offset,
>>>> -		.max_len = MACB_PP_MAX_BUF_SIZE(num_pages),
>>>> +		.max_len = PAGE_SIZE,
>>>>  	};
>>>>  	struct page_pool *pool;
>>>
>>> I forgot about it in [PATCH 1/6], but the error handling if
>>> gem_create_page_pool() fails is odd. We set queue->page_pool to NULL
>>> and keep on going. Then once opened we'll fail allocating any buffer
>>> but still be open. Shouldn't we fail the link up operation?
>>>
>>> If we want to support this codepath (page pool not allocated), then we
>>> should unmask Rx interrupts only if alloc succeeded. I don't know if
>>> we'd want that though.
>>>
>>> 	queue_writel(queue, IER, bp->rx_intr_mask | ...);
>>>
>>
>> Makes sense to fail the link up.
>> Doesn't this imply to move the page pool creation and refill into
>> macb_open()?
>>
>> I didn't look into it, I'm not sure if this can potentially become a
>> bigger change.
>
> So I looked into it. Indeed buffer alloc should be done at open, doing
> it at link up (that cannot fail) makes no sense. It is probably
> historical, because on MACB it is mog_alloc_rx_buffers() that does all
> the alloc. On GEM it only does the ring buffer but not each individual
> slot buffer, which is done by ->mog_init_rings() / gem_rx_refill().
>
> I am linking a patch that applies before your series. Then the rebase
> conflict resolution is pretty simple, and the gem_create_page_pool()
> function should be adapted something like:
>

Théo, thanks for looking into this. I was pretty much working on
something similar for my next respin.

Do you prefer to post the patch separately, or are you ok if I pick this
up and send it on your behalf as part of this set?

> -- >8 ------------------------------------------------------------------
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 3dcba7d672bf..55237b840289 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -2865,7 +2865,7 @@ static int macb_alloc_consistent(struct macb *bp)
>    return -ENOMEM;
>  }
>
> -static void gem_create_page_pool(struct macb_queue *queue, int qid)
> +static int gem_create_page_pool(struct macb_queue *queue, int qid)
>  {
>    struct page_pool_params pp_params = {
>       .order = 0,
> @@ -2885,7 +2885,8 @@ static void gem_create_page_pool(struct macb_queue *queue, int qid)
>    pool = page_pool_create(&pp_params);
>    if (IS_ERR(pool)) {
>       netdev_err(queue->bp->dev, "cannot create rx page pool\n");
> -     pool = NULL;
> +     err = PTR_ERR(pool);
> +     goto clear_pool_ptr;
>    }
>
>    queue->page_pool = pool;
> @@ -2904,13 +2905,15 @@ static void gem_create_page_pool(struct macb_queue *queue, int qid)
>       goto unreg_info;
>    }
>
> -  return;
> +  return 0;
>
>  unreg_info:
>    xdp_rxq_info_unreg(&queue->xdp_q);
>  destroy_pool:
>    page_pool_destroy(pool);
> +clear_pool_ptr:
>    queue->page_pool = NULL;
> +  return err;
>  }
>
>  static void macb_init_tieoff(struct macb *bp)
>
> -- >8 ------------------------------------------------------------------
>
> Regards,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception
  2025-12-08 12:53         ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
@ 2025-12-09  9:01           ` Théo Lebrun
  0 siblings, 0 replies; 28+ messages in thread
From: Théo Lebrun @ 2025-12-09  9:01 UTC (permalink / raw)
  To: Paolo Valerio, Théo Lebrun, netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Grégory Clement, Benoît Monin, Thomas Petazzoni

On Mon Dec 8, 2025 at 1:53 PM CET, Paolo Valerio wrote:
> On 08 Dec 2025 at 11:21:00 AM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>> On Tue Dec 2, 2025 at 6:32 PM CET, Paolo Valerio wrote:
>>> On 27 Nov 2025 at 02:38:45 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>>>> I forgot about it in [PATCH 1/6], but the error handling if
>>>> gem_create_page_pool() fails is odd. We set queue->page_pool to NULL
>>>> and keep on going. Then once opened we'll fail allocating any buffer
>>>> but still be open. Shouldn't we fail the link up operation?
>>>>
>>>> If we want to support this codepath (page pool not allocated), then we
>>>> should unmask Rx interrupts only if alloc succeeded. I don't know if
>>>> we'd want that though.
>>>>
>>>> 	queue_writel(queue, IER, bp->rx_intr_mask | ...);
>>>
>>> Makes sense to fail the link up.
>>> Doesn't this imply to move the page pool creation and refill into
>>> macb_open()?
>>>
>>> I didn't look into it, I'm not sure if this can potentially become a
>>> bigger change.
>>
>> So I looked into it. Indeed buffer alloc should be done at open, doing
>> it at link up (that cannot fail) makes no sense. It is probably
>> historical, because on MACB it is mog_alloc_rx_buffers() that does all
>> the alloc. On GEM it only does the ring buffer but not each individual
>> slot buffer, which is done by ->mog_init_rings() / gem_rx_refill().
>>
>> I am linking a patch that applies before your series. Then the rebase
>> conflict resolution is pretty simple, and the gem_create_page_pool()
>> function should be adapted something like:
>
> Théo, thanks for looking into this. I was pretty much working on
> something similar for my next respin.
>
> Do you prefer to post the patch separately, or are you ok if I pick this
> up and send it on your behalf as part of this set?

You can pick it up if you agree with the patch; it'll avoid series
dependencies, which are annoying to deal with for everyone.

Thanks Paolo,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-12-09  9:01 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support Paolo Valerio
2025-11-27 13:21   ` Théo Lebrun
2025-12-02 17:30     ` Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
2025-11-27 13:38   ` Théo Lebrun
2025-12-02 17:32     ` Paolo Valerio
2025-12-08 10:21       ` Théo Lebrun
2025-12-08 10:22         ` [PATCH 1/8] net: macb: move Rx buffers alloc from link up to open Théo Lebrun
2025-12-08 12:53         ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
2025-12-09  9:01           ` Théo Lebrun
2025-11-19 13:53 ` [PATCH RFC net-next 3/6] cadence: macb/gem: use the current queue number for stats Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem Paolo Valerio
2025-11-27 14:41   ` Théo Lebrun
2025-12-02 17:32     ` Paolo Valerio
2025-12-08 10:59       ` Théo Lebrun
2025-11-19 13:53 ` [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic Paolo Valerio
2025-11-27 14:49   ` Théo Lebrun
2025-12-02 17:33     ` Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support Paolo Valerio
2025-11-27 15:07   ` Théo Lebrun
2025-12-02 17:34     ` Paolo Valerio
2025-12-08 11:01       ` Théo Lebrun
2025-11-25 16:50 ` [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Théo Lebrun
2025-11-25 23:11   ` Paolo Valerio
2025-11-26 18:08 ` Théo Lebrun
2025-12-02 17:24   ` Paolo Valerio
2025-12-03 14:28     ` Théo Lebrun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).