* [PATCH net-next v9 0/7] net: bcmgenet: add XDP support
@ 2026-05-06 9:55 Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
` (6 more replies)
0 siblings, 7 replies; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
Alexei Starovoitov, Daniel Borkmann, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, bpf
Add XDP support to the bcmgenet driver, covering XDP_PASS, XDP_DROP,
XDP_TX, XDP_REDIRECT, and ndo_xdp_xmit.
The first patch converts the RX path from the existing kmalloc-based
allocation to page_pool, which is a prerequisite for XDP. The remaining
patches incrementally add XDP functionality and per-action statistics.
Tested on Raspberry Pi CM4 (BCM2711, bcmgenet, 1Gbps link):
- XDP_PASS: 943 Mbit/s TX, 935 Mbit/s RX (no regression vs baseline)
- XDP_PASS latency: 0.164ms avg, 0% packet loss
- XDP_DROP: all inbound traffic blocked as expected
- XDP_TX: TX counter increments (packet reflection working)
- Link flap with XDP attached: no errors
- Program swap under iperf3 load: no errors
- Upstream XDP selftests (xdp.py): pass_sb, drop_sb, tx_sb passing
- XDP-based EtherCAT master (~37 kHz cycle rate, all packet processing
in BPF/XDP), stable over multiple days
Previous versions:
v8: https://lore.kernel.org/netdev/20260428205846.2625550-1-nb@tipi-net.de/
v7: https://lore.kernel.org/netdev/20260416054743.1289191-1-nb@tipi-net.de/
v6: https://lore.kernel.org/netdev/20260406083536.839517-1-nb@tipi-net.de/
v5: https://lore.kernel.org/netdev/20260328230513.415790-1-nb@tipi-net.de/
v4: https://lore.kernel.org/netdev/20260323120539.136029-1-nb@tipi-net.de/
v3: https://lore.kernel.org/netdev/20260319115402.353509-1-nb@tipi-net.de/
v2: https://lore.kernel.org/netdev/20260315214914.1555777-1-nb@tipi-net.de/
v1: https://lore.kernel.org/netdev/20260313092101.1344954-1-nb@tipi-net.de/
Changes since v8 (all from Jakub Kicinski's review):
- Patch 1:
* Add explicit #include <linux/bpf.h> for XDP_PACKET_HEADROOM.
* Remove unused priv->rx_buf_len after page_pool conversion.
* Sync only the RSB and the actual received frame length in
page_pool_dma_sync_for_cpu(), instead of the full RX_BUF_LENGTH.
* Add a runt length guard (len < GENET_RSB_PAD) to prevent
__skb_put underflow on broken HW.
- Patch 4:
* Guard netdev_tx_reset_queue() in bcmgenet_tx_reclaim()'s
all=true path against ring->index == DESC_INDEX, fixing an
out-of-bounds dev->_tx[16] access during interface teardown.
* Assign explicit lowest priority to ring 16 so XDP_TX does not
preempt normal SKB TX under strict-priority arbitration.
- Patch 5:
* Stop manipulating xdp_features_clear/set_redirect_target() in
bcmgenet_xdp_setup(); ndo_xdp_xmit works without a local XDP
program, so leave NETDEV_XDP_ACT_NDO_XMIT advertised at all times.
- Patch 6:
* Increment xdp_tx_err on xdp_convert_buff_to_frame() failure.
* Add separate xdp_aborted counter for XDP_ABORTED and invalid-
action returns, instead of folding them into xdp_drop.
Changes since v7:
- No code changes; resubmitted after net-next reopened.
Changes since v6:
- Removed GENET_XDP_HEADROOM alias, use XDP_PACKET_HEADROOM
directly. (Jakub Kicinski)
- Dropped redundant __GFP_NOWARN from page_pool_alloc_pages();
page_pool adds it automatically. (Jakub Kicinski)
- Removed floating code block in desc_rx, moved variables to outer
scope. (Jakub Kicinski)
- Made bcmgenet_run_xdp() return XDP_PASS when no program is set,
removing the if (xdp_prog) indentation from desc_rx.
(Jakub Kicinski)
Changes since v5:
- Refactored desc_rx: always prepare xdp_buff and use
bcmgenet_xdp_build_skb for both XDP and non-XDP paths, treating
no-prog as XDP_PASS. (Jakub Kicinski)
- Removed synchronize_net() before bpf_prog_put(); RCU handles
the grace period. (Jakub Kicinski)
- Save status->rx_csum before running XDP program to prevent
bpf_xdp_adjust_head from corrupting the RSB checksum.
(Jakub Kicinski)
- Tightened TSB headroom check to include sizeof(struct xdp_frame).
(Jakub Kicinski)
- Fixed reclaim gating: check for pending frames on the XDP TX ring
instead of priv->xdp_prog, so in-flight frames are still reclaimed
after XDP program detach. (Jakub Kicinski)
- Removed dead len -= ETH_FCS_LEN in patch 1. (Mohsin Bashir)
- Added patch 7: minimal ndo_change_mtu that rejects MTU values
incompatible with XDP when a program is attached. (Mohsin Bashir,
Florian Fainelli)
Changes since v4:
- Fixed unused variable warning: moved tx_ring declaration from
patch 4 to patch 5 where it is first used. (Jakub Kicinski)
Changes since v3:
- Fixed xdp_prepare_buff() being called with meta_valid=false,
which caused bcmgenet_xdp_build_skb() to compute metasize=UINT_MAX
and corrupt skb meta_len. Now passes true. (Simon Horman)
- Removed bcmgenet_dump_tx_queue() for ring 16 in bcmgenet_timeout().
Ring 16 has no netdev TX queue, so netdev_get_tx_queue(dev, 16)
accessed beyond the allocated _tx array. (Simon Horman)
- Fixed checkpatch alignment warnings in patches 4 and 5.
Changes since v2:
- Fixed page leak on partial bcmgenet_alloc_rx_buffers() failure:
free already-allocated rx_cbs before destroying page pool.
(Simon Horman)
- Fixed GENET_Q16_TX_BD_CNT defined as 64 instead of 32.
(Simon Horman)
- Moved XDP TX ring to a separate struct member (xdp_tx_ring)
instead of expanding tx_rings[] to DESC_INDEX+1. (Justin Chen)
- Added synchronize_net() before bpf_prog_put() in XDP prog swap.
- Removed goto drop_page inside switch; inlined page_pool_put
calls in each failure path. (Justin Chen)
- Removed unnecessary curly braces around case XDP_TX. (Justin Chen)
- Moved int err hoisting from patch 2 to patch 1. (Justin Chen)
- Kept return type on same line as function name, per driver
convention. (Justin Chen)
- XDP TX packets/bytes now counted in TX reclaim for standard
network statistics.
Changes since v1:
- Fixed tx_rings[DESC_INDEX] out-of-bounds access. Expanded array
to DESC_INDEX+1 and initialized ring 16 with dedicated BDs.
- Use ring 16 (hardware default descriptor ring) for XDP TX,
isolating from normal SKB TX queues.
- Piggyback ring 16 TX completion on RX NAPI poll (INTRL2_1 bit
collision with RX ring 0).
- Fixed ring 16 TX reclaim: skip INTRL2_1 clear, skip BQL
completion, use non-destructive reclaim in RX poll path.
- Prepend zeroed TSB before XDP TX frame data (TBUF_64B_EN requires
64-byte struct status_64 prefix on all TX buffers).
- Tested with upstream XDP selftests (xdp.py): pass_sb, drop_sb,
tx_sb all passing. The multi-buffer tests (pass_mb, drop_mb,
tx_mb) fail because bcmgenet does not support jumbo frames /
MTU changes; I plan to add ndo_change_mtu support in a follow-up
series.
Nicolai Buchwitz (7):
net: bcmgenet: convert RX path to page_pool
net: bcmgenet: register xdp_rxq_info for each RX ring
net: bcmgenet: add basic XDP support (PASS/DROP)
net: bcmgenet: add XDP_TX support
net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
net: bcmgenet: add XDP statistics counters
net: bcmgenet: reject MTU changes incompatible with XDP
drivers/net/ethernet/broadcom/Kconfig | 1 +
.../net/ethernet/broadcom/genet/bcmgenet.c | 655 +++++++++++++++---
.../net/ethernet/broadcom/genet/bcmgenet.h | 21 +-
3 files changed, 573 insertions(+), 104 deletions(-)
--
2.51.0
* [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:46 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 2/7] net: bcmgenet: register xdp_rxq_info for each RX ring Nicolai Buchwitz
` (5 subsequent siblings)
6 siblings, 2 replies; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
David S. Miller, Jakub Kicinski, Bhargava Marreddy, Vikas Gupta,
Rajashekar Hudumula, Eric Biggers, linux-kernel, bpf
Replace the per-packet __netdev_alloc_skb() + dma_map_single() in the
RX path with page_pool, which provides efficient page recycling and
DMA mapping management. This is a prerequisite for XDP support (which
requires stable page-backed buffers rather than SKB linear data).
Key changes:
- Create a page_pool per RX ring (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
- bcmgenet_rx_refill() allocates pages via page_pool_alloc_pages()
- bcmgenet_desc_rx() builds SKBs from pages via napi_build_skb() with
skb_mark_for_recycle() for automatic page_pool return
- Buffer layout reserves XDP_PACKET_HEADROOM (256 bytes) before the HW
RSB (64 bytes) + alignment pad (2 bytes) for future XDP headroom
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
drivers/net/ethernet/broadcom/Kconfig | 1 +
.../net/ethernet/broadcom/genet/bcmgenet.c | 232 +++++++++++-------
.../net/ethernet/broadcom/genet/bcmgenet.h | 5 +-
3 files changed, 154 insertions(+), 84 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 4287edc7ddd6..f0bac0dd1439 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -78,6 +78,7 @@ config BCMGENET
select BCM7XXX_PHY
select MDIO_BCM_UNIMAC
select DIMLIB
+ select PAGE_POOL
select BROADCOM_PHY if ARCH_BCM2835
help
This driver supports the built-in Ethernet MACs found in the
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 54f71b1e85fc..df11c4977e8f 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -35,6 +35,7 @@
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/phy.h>
+#include <linux/bpf.h>
#include <linux/unaligned.h>
@@ -52,6 +53,13 @@
#define RX_BUF_LENGTH 2048
#define SKB_ALIGNMENT 32
+/* Page pool RX buffer layout:
+ * XDP_PACKET_HEADROOM | RSB(64) + pad(2) | frame data | skb_shared_info
+ * The HW writes the 64B RSB + 2B alignment padding before the frame.
+ */
+#define GENET_RSB_PAD (sizeof(struct status_64) + 2)
+#define GENET_RX_HEADROOM (XDP_PACKET_HEADROOM + GENET_RSB_PAD)
+
/* Tx/Rx DMA register offset, skip 256 descriptors */
#define WORDS_PER_BD(p) (p->hw_params->words_per_bd)
#define DMA_DESC_SIZE (WORDS_PER_BD(priv) * sizeof(u32))
@@ -1895,21 +1903,13 @@ static struct sk_buff *bcmgenet_free_tx_cb(struct device *dev,
}
/* Simple helper to free a receive control block's resources */
-static struct sk_buff *bcmgenet_free_rx_cb(struct device *dev,
- struct enet_cb *cb)
+static void bcmgenet_free_rx_cb(struct enet_cb *cb,
+ struct page_pool *pool)
{
- struct sk_buff *skb;
-
- skb = cb->skb;
- cb->skb = NULL;
-
- if (dma_unmap_addr(cb, dma_addr)) {
- dma_unmap_single(dev, dma_unmap_addr(cb, dma_addr),
- dma_unmap_len(cb, dma_len), DMA_FROM_DEVICE);
- dma_unmap_addr_set(cb, dma_addr, 0);
+ if (cb->rx_page) {
+ page_pool_put_full_page(pool, cb->rx_page, false);
+ cb->rx_page = NULL;
}
-
- return skb;
}
/* Unlocked version of the reclaim routine */
@@ -2250,46 +2250,30 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
goto out;
}
-static struct sk_buff *bcmgenet_rx_refill(struct bcmgenet_priv *priv,
- struct enet_cb *cb)
+static int bcmgenet_rx_refill(struct bcmgenet_rx_ring *ring,
+ struct enet_cb *cb)
{
- struct device *kdev = &priv->pdev->dev;
- struct sk_buff *skb;
- struct sk_buff *rx_skb;
+ struct bcmgenet_priv *priv = ring->priv;
dma_addr_t mapping;
+ struct page *page;
- /* Allocate a new Rx skb */
- skb = __netdev_alloc_skb(priv->dev, priv->rx_buf_len + SKB_ALIGNMENT,
- GFP_ATOMIC | __GFP_NOWARN);
- if (!skb) {
+ page = page_pool_alloc_pages(ring->page_pool,
+ GFP_ATOMIC);
+ if (!page) {
priv->mib.alloc_rx_buff_failed++;
netif_err(priv, rx_err, priv->dev,
- "%s: Rx skb allocation failed\n", __func__);
- return NULL;
- }
-
- /* DMA-map the new Rx skb */
- mapping = dma_map_single(kdev, skb->data, priv->rx_buf_len,
- DMA_FROM_DEVICE);
- if (dma_mapping_error(kdev, mapping)) {
- priv->mib.rx_dma_failed++;
- dev_kfree_skb_any(skb);
- netif_err(priv, rx_err, priv->dev,
- "%s: Rx skb DMA mapping failed\n", __func__);
- return NULL;
+ "%s: Rx page allocation failed\n", __func__);
+ return -ENOMEM;
}
- /* Grab the current Rx skb from the ring and DMA-unmap it */
- rx_skb = bcmgenet_free_rx_cb(kdev, cb);
+ /* page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
+ mapping = page_pool_get_dma_addr(page) + XDP_PACKET_HEADROOM;
- /* Put the new Rx skb on the ring */
- cb->skb = skb;
- dma_unmap_addr_set(cb, dma_addr, mapping);
- dma_unmap_len_set(cb, dma_len, priv->rx_buf_len);
+ cb->rx_page = page;
+ cb->rx_page_offset = XDP_PACKET_HEADROOM;
dmadesc_set_addr(priv, cb->bd_addr, mapping);
- /* Return the current Rx skb to caller */
- return rx_skb;
+ return 0;
}
/* bcmgenet_desc_rx - descriptor based rx process.
@@ -2341,25 +2325,29 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
while ((rxpktprocessed < rxpkttoprocess) &&
(rxpktprocessed < budget)) {
struct status_64 *status;
+ struct page *rx_page;
+ unsigned int rx_off;
__be16 rx_csum;
+ void *hard_start;
cb = &priv->rx_cbs[ring->read_ptr];
- skb = bcmgenet_rx_refill(priv, cb);
- if (unlikely(!skb)) {
+ /* Save the received page before refilling */
+ rx_page = cb->rx_page;
+ rx_off = cb->rx_page_offset;
+
+ if (bcmgenet_rx_refill(ring, cb)) {
BCMGENET_STATS64_INC(stats, dropped);
goto next;
}
- status = (struct status_64 *)skb->data;
+ /* Sync the RSB first to read the frame length */
+ page_pool_dma_sync_for_cpu(ring->page_pool, rx_page, 0,
+ sizeof(struct status_64));
+
+ hard_start = page_address(rx_page) + rx_off;
+ status = (struct status_64 *)hard_start;
dma_length_status = status->length_status;
- if (dev->features & NETIF_F_RXCSUM) {
- rx_csum = (__force __be16)(status->rx_csum & 0xffff);
- if (rx_csum) {
- skb->csum = (__force __wsum)ntohs(rx_csum);
- skb->ip_summed = CHECKSUM_COMPLETE;
- }
- }
/* DMA flags and length are still valid no matter how
* we got the Receive Status Vector (64B RSB or register)
@@ -2367,15 +2355,23 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
dma_flag = dma_length_status & 0xffff;
len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
+ /* Sync the rest of the actual received frame */
+ if (len > sizeof(struct status_64))
+ page_pool_dma_sync_for_cpu(ring->page_pool, rx_page,
+ sizeof(struct status_64),
+ len - sizeof(struct status_64));
+
netif_dbg(priv, rx_status, dev,
"%s:p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
__func__, p_index, ring->c_index,
ring->read_ptr, dma_length_status);
- if (unlikely(len > RX_BUF_LENGTH)) {
- netif_err(priv, rx_status, dev, "oversized packet\n");
+ if (unlikely(len > RX_BUF_LENGTH || len < GENET_RSB_PAD)) {
+ netif_err(priv, rx_status, dev,
+ "invalid packet length %d\n", len);
BCMGENET_STATS64_INC(stats, length_errors);
- dev_kfree_skb_any(skb);
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
goto next;
}
@@ -2383,7 +2379,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
netif_err(priv, rx_status, dev,
"dropping fragmented packet!\n");
BCMGENET_STATS64_INC(stats, fragmented_errors);
- dev_kfree_skb_any(skb);
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
goto next;
}
@@ -2411,24 +2408,47 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
DMA_RX_RXER)) == DMA_RX_RXER)
u64_stats_inc(&stats->errors);
u64_stats_update_end(&stats->syncp);
- dev_kfree_skb_any(skb);
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
goto next;
} /* error packet */
- skb_put(skb, len);
+ /* Build SKB from the page - data starts at hard_start,
+ * frame begins after RSB(64) + pad(2) = 66 bytes.
+ */
+ skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
+ if (unlikely(!skb)) {
+ BCMGENET_STATS64_INC(stats, dropped);
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
+ goto next;
+ }
+
+ skb_mark_for_recycle(skb);
- /* remove RSB and hardware 2bytes added for IP alignment */
- skb_pull(skb, 66);
- len -= 66;
+ /* Reserve the RSB + pad, then set the data length */
+ skb_reserve(skb, GENET_RSB_PAD);
+ __skb_put(skb, len - GENET_RSB_PAD);
if (priv->crc_fwd_en) {
- skb_trim(skb, len - ETH_FCS_LEN);
- len -= ETH_FCS_LEN;
+ skb_trim(skb, skb->len - ETH_FCS_LEN);
}
+ /* Set up checksum offload */
+ if (dev->features & NETIF_F_RXCSUM) {
+ rx_csum = (__force __be16)(status->rx_csum & 0xffff);
+ if (rx_csum) {
+ skb->csum = (__force __wsum)ntohs(rx_csum);
+ skb->ip_summed = CHECKSUM_COMPLETE;
+ }
+ }
+
+ len = skb->len;
bytes_processed += len;
- /*Finish setting up the received SKB and send it to the kernel*/
+ /* Finish setting up the received SKB and send it to the
+ * kernel.
+ */
skb->protocol = eth_type_trans(skb, priv->dev);
u64_stats_update_begin(&stats->syncp);
@@ -2497,12 +2517,11 @@ static void bcmgenet_dim_work(struct work_struct *work)
dim->state = DIM_START_MEASURE;
}
-/* Assign skb to RX DMA descriptor. */
+/* Assign page_pool pages to RX DMA descriptors. */
static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
struct bcmgenet_rx_ring *ring)
{
struct enet_cb *cb;
- struct sk_buff *skb;
int i;
netif_dbg(priv, hw, priv->dev, "%s\n", __func__);
@@ -2510,10 +2529,7 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
/* loop here for each buffer needing assign */
for (i = 0; i < ring->size; i++) {
cb = ring->cbs + i;
- skb = bcmgenet_rx_refill(priv, cb);
- if (skb)
- dev_consume_skb_any(skb);
- if (!cb->skb)
+ if (bcmgenet_rx_refill(ring, cb))
return -ENOMEM;
}
@@ -2522,16 +2538,18 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
static void bcmgenet_free_rx_buffers(struct bcmgenet_priv *priv)
{
- struct sk_buff *skb;
+ struct bcmgenet_rx_ring *ring;
struct enet_cb *cb;
- int i;
-
- for (i = 0; i < priv->num_rx_bds; i++) {
- cb = &priv->rx_cbs[i];
+ int q, i;
- skb = bcmgenet_free_rx_cb(&priv->pdev->dev, cb);
- if (skb)
- dev_consume_skb_any(skb);
+ for (q = 0; q <= priv->hw_params->rx_queues; q++) {
+ ring = &priv->rx_rings[q];
+ if (!ring->page_pool)
+ continue;
+ for (i = 0; i < ring->size; i++) {
+ cb = ring->cbs + i;
+ bcmgenet_free_rx_cb(cb, ring->page_pool);
+ }
}
}
@@ -2749,6 +2767,31 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
}
+static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
+ struct bcmgenet_rx_ring *ring)
+{
+ struct page_pool_params pp_params = {
+ .order = 0,
+ .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+ .pool_size = ring->size,
+ .nid = NUMA_NO_NODE,
+ .dev = &priv->pdev->dev,
+ .dma_dir = DMA_FROM_DEVICE,
+ .offset = XDP_PACKET_HEADROOM,
+ .max_len = RX_BUF_LENGTH,
+ };
+ int err;
+
+ ring->page_pool = page_pool_create(&pp_params);
+ if (IS_ERR(ring->page_pool)) {
+ err = PTR_ERR(ring->page_pool);
+ ring->page_pool = NULL;
+ return err;
+ }
+
+ return 0;
+}
+
/* Initialize a RDMA ring */
static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
unsigned int index, unsigned int size,
@@ -2756,7 +2799,7 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
{
struct bcmgenet_rx_ring *ring = &priv->rx_rings[index];
u32 words_per_bd = WORDS_PER_BD(priv);
- int ret;
+ int ret, i;
ring->priv = priv;
ring->index = index;
@@ -2767,10 +2810,19 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
ring->cb_ptr = start_ptr;
ring->end_ptr = end_ptr - 1;
- ret = bcmgenet_alloc_rx_buffers(priv, ring);
+ ret = bcmgenet_rx_ring_create_pool(priv, ring);
if (ret)
return ret;
+ ret = bcmgenet_alloc_rx_buffers(priv, ring);
+ if (ret) {
+ for (i = 0; i < ring->size; i++)
+ bcmgenet_free_rx_cb(ring->cbs + i, ring->page_pool);
+ page_pool_destroy(ring->page_pool);
+ ring->page_pool = NULL;
+ return ret;
+ }
+
bcmgenet_init_dim(ring, bcmgenet_dim_work);
bcmgenet_init_rx_coalesce(ring);
@@ -2963,6 +3015,20 @@ static void bcmgenet_fini_rx_napi(struct bcmgenet_priv *priv)
}
}
+static void bcmgenet_destroy_rx_page_pools(struct bcmgenet_priv *priv)
+{
+ struct bcmgenet_rx_ring *ring;
+ unsigned int i;
+
+ for (i = 0; i <= priv->hw_params->rx_queues; ++i) {
+ ring = &priv->rx_rings[i];
+ if (ring->page_pool) {
+ page_pool_destroy(ring->page_pool);
+ ring->page_pool = NULL;
+ }
+ }
+}
+
/* Initialize Rx queues
*
* Queues 0-15 are priority queues. Hardware Filtering Block (HFB) can be
@@ -3034,6 +3100,7 @@ static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
}
bcmgenet_free_rx_buffers(priv);
+ bcmgenet_destroy_rx_page_pools(priv);
kfree(priv->rx_cbs);
kfree(priv->tx_cbs);
}
@@ -3110,6 +3177,7 @@ static int bcmgenet_init_dma(struct bcmgenet_priv *priv, bool flush_rx)
if (ret) {
netdev_err(priv->dev, "failed to initialize Rx queues\n");
bcmgenet_free_rx_buffers(priv);
+ bcmgenet_destroy_rx_page_pools(priv);
kfree(priv->rx_cbs);
kfree(priv->tx_cbs);
return ret;
@@ -4027,8 +4095,6 @@ static int bcmgenet_probe(struct platform_device *pdev)
/* Mii wait queue */
init_waitqueue_head(&priv->wq);
- /* Always use RX_BUF_LENGTH (2KB) buffer for all chips */
- priv->rx_buf_len = RX_BUF_LENGTH;
INIT_WORK(&priv->bcmgenet_irq_work, bcmgenet_irq_task);
priv->clk_wol = devm_clk_get_optional(&priv->pdev->dev, "enet-wol");
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 9e4110c7fdf6..7203bde37b78 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -15,6 +15,7 @@
#include <linux/phy.h>
#include <linux/dim.h>
#include <linux/ethtool.h>
+#include <net/page_pool/helpers.h>
#include "../unimac.h"
@@ -469,6 +470,8 @@ struct bcmgenet_rx_stats64 {
struct enet_cb {
struct sk_buff *skb;
+ struct page *rx_page;
+ unsigned int rx_page_offset;
void __iomem *bd_addr;
DEFINE_DMA_UNMAP_ADDR(dma_addr);
DEFINE_DMA_UNMAP_LEN(dma_len);
@@ -575,6 +578,7 @@ struct bcmgenet_rx_ring {
struct bcmgenet_net_dim dim;
u32 rx_max_coalesced_frames;
u32 rx_coalesce_usecs;
+ struct page_pool *page_pool;
struct bcmgenet_priv *priv;
};
@@ -609,7 +613,6 @@ struct bcmgenet_priv {
void __iomem *rx_bds;
struct enet_cb *rx_cbs;
unsigned int num_rx_bds;
- unsigned int rx_buf_len;
struct bcmgenet_rxnfc_rule rxnfc_rules[MAX_NUM_OF_FS_RULES];
struct list_head rxnfc_list;
--
2.51.0
* [PATCH net-next v9 2/7] net: bcmgenet: register xdp_rxq_info for each RX ring
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP) Nicolai Buchwitz
` (4 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
David S. Miller, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, linux-kernel, bpf
Register an xdp_rxq_info per RX ring and associate it with the ring's
page_pool via MEM_TYPE_PAGE_POOL. This is required infrastructure for
XDP program execution: the XDP framework needs to know the memory model
backing each RX queue for correct page lifecycle management.
No functional change - XDP programs are not yet attached or executed.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 18 ++++++++++++++++++
drivers/net/ethernet/broadcom/genet/bcmgenet.h | 2 ++
2 files changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index df11c4977e8f..5bedc18685b0 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2789,7 +2789,23 @@ static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
return err;
}
+ err = xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, ring->index, 0);
+ if (err)
+ goto err_free_pp;
+
+ err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
+ ring->page_pool);
+ if (err)
+ goto err_unreg_rxq;
+
return 0;
+
+err_unreg_rxq:
+ xdp_rxq_info_unreg(&ring->xdp_rxq);
+err_free_pp:
+ page_pool_destroy(ring->page_pool);
+ ring->page_pool = NULL;
+ return err;
}
/* Initialize a RDMA ring */
@@ -2818,6 +2834,7 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
if (ret) {
for (i = 0; i < ring->size; i++)
bcmgenet_free_rx_cb(ring->cbs + i, ring->page_pool);
+ xdp_rxq_info_unreg(&ring->xdp_rxq);
page_pool_destroy(ring->page_pool);
ring->page_pool = NULL;
return ret;
@@ -3023,6 +3040,7 @@ static void bcmgenet_destroy_rx_page_pools(struct bcmgenet_priv *priv)
for (i = 0; i <= priv->hw_params->rx_queues; ++i) {
ring = &priv->rx_rings[i];
if (ring->page_pool) {
+ xdp_rxq_info_unreg(&ring->xdp_rxq);
page_pool_destroy(ring->page_pool);
ring->page_pool = NULL;
}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 7203bde37b78..da7b7fee896f 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -16,6 +16,7 @@
#include <linux/dim.h>
#include <linux/ethtool.h>
#include <net/page_pool/helpers.h>
+#include <net/xdp.h>
#include "../unimac.h"
@@ -579,6 +580,7 @@ struct bcmgenet_rx_ring {
u32 rx_max_coalesced_frames;
u32 rx_coalesce_usecs;
struct page_pool *page_pool;
+ struct xdp_rxq_info xdp_rxq;
struct bcmgenet_priv *priv;
};
--
2.51.0
* [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP)
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 2/7] net: bcmgenet: register xdp_rxq_info for each RX ring Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:47 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support Nicolai Buchwitz
` (3 subsequent siblings)
6 siblings, 2 replies; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
David S. Miller, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, linux-kernel, bpf
Add XDP program attachment via ndo_bpf and execute XDP programs in the
RX path. XDP_PASS builds an SKB from the xdp_buff (handling
xdp_adjust_head/tail), XDP_DROP returns the page to page_pool without
SKB allocation.
XDP_TX and XDP_REDIRECT are not yet supported and return XDP_ABORTED.
Advertise NETDEV_XDP_ACT_BASIC in xdp_features.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
.../net/ethernet/broadcom/genet/bcmgenet.c | 129 +++++++++++++++---
.../net/ethernet/broadcom/genet/bcmgenet.h | 4 +
2 files changed, 116 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 5bedc18685b0..ee1d4ecc2b87 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -36,6 +36,8 @@
#include <linux/ipv6.h>
#include <linux/phy.h>
#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
+#include <linux/filter.h>
#include <linux/unaligned.h>
@@ -2276,6 +2278,56 @@ static int bcmgenet_rx_refill(struct bcmgenet_rx_ring *ring,
return 0;
}
+static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
+ struct xdp_buff *xdp)
+{
+ unsigned int metasize;
+ struct sk_buff *skb;
+
+ skb = napi_build_skb(xdp->data_hard_start, PAGE_SIZE);
+ if (unlikely(!skb))
+ return NULL;
+
+ skb_mark_for_recycle(skb);
+
+ metasize = xdp->data - xdp->data_meta;
+ skb_reserve(skb, xdp->data - xdp->data_hard_start);
+ __skb_put(skb, xdp->data_end - xdp->data);
+
+ if (metasize)
+ skb_metadata_set(skb, metasize);
+
+ return skb;
+}
+
+static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
+ struct bpf_prog *prog,
+ struct xdp_buff *xdp,
+ struct page *rx_page)
+{
+ unsigned int act;
+
+ if (!prog)
+ return XDP_PASS;
+
+ act = bpf_prog_run_xdp(prog, xdp);
+
+ switch (act) {
+ case XDP_PASS:
+ return XDP_PASS;
+ case XDP_DROP:
+ page_pool_put_full_page(ring->page_pool, rx_page, true);
+ return XDP_DROP;
+ default:
+ bpf_warn_invalid_xdp_action(ring->priv->dev, prog, act);
+ fallthrough;
+ case XDP_ABORTED:
+ trace_xdp_exception(ring->priv->dev, prog, act);
+ page_pool_put_full_page(ring->page_pool, rx_page, true);
+ return XDP_ABORTED;
+ }
+}
+
/* bcmgenet_desc_rx - descriptor based rx process.
* this could be called from bottom half, or from NAPI polling method.
*/
@@ -2285,6 +2337,7 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
struct bcmgenet_rx_stats64 *stats = &ring->stats64;
struct bcmgenet_priv *priv = ring->priv;
struct net_device *dev = priv->dev;
+ struct bpf_prog *xdp_prog;
struct enet_cb *cb;
struct sk_buff *skb;
u32 dma_length_status;
@@ -2295,6 +2348,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
unsigned int p_index, mask;
unsigned int discards;
+ xdp_prog = READ_ONCE(priv->xdp_prog);
+
/* Clear status before servicing to reduce spurious interrupts */
mask = 1 << (UMAC_IRQ1_RX_INTR_SHIFT + ring->index);
bcmgenet_intrl2_1_writel(priv, mask, INTRL2_CPU_CLEAR);
@@ -2326,9 +2381,12 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
(rxpktprocessed < budget)) {
struct status_64 *status;
struct page *rx_page;
+ unsigned int xdp_act;
unsigned int rx_off;
- __be16 rx_csum;
+ struct xdp_buff xdp;
+ __be16 rx_csum = 0;
void *hard_start;
+ int pkt_len;
cb = &priv->rx_cbs[ring->read_ptr];
@@ -2413,30 +2471,34 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
goto next;
} /* error packet */
- /* Build SKB from the page - data starts at hard_start,
- * frame begins after RSB(64) + pad(2) = 66 bytes.
+ pkt_len = len - GENET_RSB_PAD;
+ if (priv->crc_fwd_en)
+ pkt_len -= ETH_FCS_LEN;
+
+ /* Save rx_csum before XDP runs - an XDP program
+ * could overwrite the RSB via bpf_xdp_adjust_head.
*/
- skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
- if (unlikely(!skb)) {
- BCMGENET_STATS64_INC(stats, dropped);
- page_pool_put_full_page(ring->page_pool, rx_page,
- true);
- goto next;
- }
+ if (dev->features & NETIF_F_RXCSUM)
+ rx_csum = (__force __be16)(status->rx_csum & 0xffff);
- skb_mark_for_recycle(skb);
+ xdp_init_buff(&xdp, PAGE_SIZE, &ring->xdp_rxq);
+ xdp_prepare_buff(&xdp, page_address(rx_page),
+ GENET_RX_HEADROOM, pkt_len, true);
- /* Reserve the RSB + pad, then set the data length */
- skb_reserve(skb, GENET_RSB_PAD);
- __skb_put(skb, len - GENET_RSB_PAD);
+ xdp_act = bcmgenet_run_xdp(ring, xdp_prog, &xdp, rx_page);
+ if (xdp_act != XDP_PASS)
+ goto next;
- if (priv->crc_fwd_en) {
- skb_trim(skb, skb->len - ETH_FCS_LEN);
+ skb = bcmgenet_xdp_build_skb(ring, &xdp);
+ if (unlikely(!skb)) {
+ BCMGENET_STATS64_INC(stats, dropped);
+ page_pool_put_full_page(ring->page_pool,
+ rx_page, true);
+ goto next;
}
/* Set up checksum offload */
if (dev->features & NETIF_F_RXCSUM) {
- rx_csum = (__force __be16)(status->rx_csum & 0xffff);
if (rx_csum) {
skb->csum = (__force __wsum)ntohs(rx_csum);
skb->ip_summed = CHECKSUM_COMPLETE;
@@ -3750,6 +3812,37 @@ static int bcmgenet_change_carrier(struct net_device *dev, bool new_carrier)
return 0;
}
+static int bcmgenet_xdp_setup(struct net_device *dev,
+ struct netdev_bpf *xdp)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct bpf_prog *old_prog;
+ struct bpf_prog *prog = xdp->prog;
+
+ if (prog && dev->mtu > PAGE_SIZE - GENET_RX_HEADROOM -
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) {
+ NL_SET_ERR_MSG_MOD(xdp->extack,
+ "MTU too large for single-page XDP buffer");
+ return -EOPNOTSUPP;
+ }
+
+ old_prog = xchg(&priv->xdp_prog, prog);
+ if (old_prog)
+ bpf_prog_put(old_prog);
+
+ return 0;
+}
+
+static int bcmgenet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+ switch (xdp->command) {
+ case XDP_SETUP_PROG:
+ return bcmgenet_xdp_setup(dev, xdp);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_open = bcmgenet_open,
.ndo_stop = bcmgenet_close,
@@ -3761,6 +3854,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_set_features = bcmgenet_set_features,
.ndo_get_stats64 = bcmgenet_get_stats64,
.ndo_change_carrier = bcmgenet_change_carrier,
+ .ndo_bpf = bcmgenet_xdp,
};
/* GENET hardware parameters/characteristics */
@@ -4063,6 +4157,7 @@ static int bcmgenet_probe(struct platform_device *pdev)
NETIF_F_RXCSUM;
dev->hw_features |= dev->features;
dev->vlan_features |= dev->features;
+ dev->xdp_features = NETDEV_XDP_ACT_BASIC;
netdev_sw_irq_coalesce_default_on(dev);
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index da7b7fee896f..3d65f0e4b4b4 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -16,6 +16,7 @@
#include <linux/dim.h>
#include <linux/ethtool.h>
#include <net/page_pool/helpers.h>
+#include <linux/bpf.h>
#include <net/xdp.h>
#include "../unimac.h"
@@ -670,6 +671,9 @@ struct bcmgenet_priv {
u8 sopass[SOPASS_MAX];
struct bcmgenet_mib_counters mib;
+
+ /* XDP */
+ struct bpf_prog *xdp_prog;
};
static inline bool bcmgenet_has_40bits(struct bcmgenet_priv *priv)
--
2.51.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
` (2 preceding siblings ...)
2026-05-06 9:55 ` [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP) Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:52 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support Nicolai Buchwitz
` (2 subsequent siblings)
6 siblings, 2 replies; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
David S. Miller, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, linux-kernel, bpf
Implement XDP_TX using ring 16 (DESC_INDEX), the hardware default
descriptor ring, which is dedicated to XDP TX to keep it isolated from
the SKB TX queues.
Ring 16 gets 32 BDs carved from ring 0's allocation. TX completion is
piggybacked on RX NAPI poll since ring 16's INTRL2_1 bit collides with
RX ring 0, similar to how bnxt, ice, and other XDP drivers handle TX
completion within the RX poll path.
The GENET MAC has TBUF_64B_EN set globally, requiring every TX buffer
to start with a 64-byte struct status_64 (TSB). For local XDP_TX, the
TSB is prepended by backing xdp->data up into the RSB area (which is
unused once the BPF program has run) and zeroing it. For foreign frames
devices, the TSB is written into the xdp_frame headroom.
The page_pool DMA direction is changed from DMA_FROM_DEVICE to
DMA_BIDIRECTIONAL to allow TX reuse of the existing DMA mapping.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
.../net/ethernet/broadcom/genet/bcmgenet.c | 231 ++++++++++++++++--
.../net/ethernet/broadcom/genet/bcmgenet.h | 3 +
2 files changed, 211 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index ee1d4ecc2b87..f1e515526787 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -49,8 +49,10 @@
#define GENET_Q0_RX_BD_CNT \
(TOTAL_DESC - priv->hw_params->rx_queues * priv->hw_params->rx_bds_per_q)
+#define GENET_Q16_TX_BD_CNT 32
#define GENET_Q0_TX_BD_CNT \
- (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q)
+ (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q \
+ - GENET_Q16_TX_BD_CNT)
#define RX_BUF_LENGTH 2048
#define SKB_ALIGNMENT 32
@@ -1893,6 +1895,14 @@ static struct sk_buff *bcmgenet_free_tx_cb(struct device *dev,
if (cb == GENET_CB(skb)->last_cb)
return skb;
+ } else if (cb->xdpf) {
+ if (cb->xdp_dma_map)
+ dma_unmap_single(dev, dma_unmap_addr(cb, dma_addr),
+ dma_unmap_len(cb, dma_len),
+ DMA_TO_DEVICE);
+ dma_unmap_addr_set(cb, dma_addr, 0);
+ xdp_return_frame(cb->xdpf);
+ cb->xdpf = NULL;
} else if (dma_unmap_addr(cb, dma_addr)) {
dma_unmap_page(dev,
dma_unmap_addr(cb, dma_addr),
@@ -1925,10 +1935,16 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
unsigned int pkts_compl = 0;
unsigned int txbds_ready;
unsigned int c_index;
+ struct enet_cb *tx_cb;
struct sk_buff *skb;
- /* Clear status before servicing to reduce spurious interrupts */
- bcmgenet_intrl2_1_writel(priv, (1 << ring->index), INTRL2_CPU_CLEAR);
+ /* Clear status before servicing to reduce spurious interrupts.
+ * Ring DESC_INDEX (XDP TX) has no interrupt; skip the clear to
+ * avoid clobbering RX ring 0's bit at the same position.
+ */
+ if (ring->index != DESC_INDEX)
+ bcmgenet_intrl2_1_writel(priv, BIT(ring->index),
+ INTRL2_CPU_CLEAR);
/* Compute how many buffers are transmitted since last xmit call */
c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX)
@@ -1941,8 +1957,15 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
/* Reclaim transmitted buffers */
while (txbds_processed < txbds_ready) {
- skb = bcmgenet_free_tx_cb(&priv->pdev->dev,
- &priv->tx_cbs[ring->clean_ptr]);
+ tx_cb = &priv->tx_cbs[ring->clean_ptr];
+ if (tx_cb->xdpf) {
+ pkts_compl++;
+ bytes_compl += tx_cb->xdp_dma_map
+ ? tx_cb->xdpf->len
+ : tx_cb->xdpf->len -
+ sizeof(struct status_64);
+ }
+ skb = bcmgenet_free_tx_cb(&priv->pdev->dev, tx_cb);
if (skb) {
pkts_compl++;
bytes_compl += GENET_CB(skb)->bytes_sent;
@@ -1964,8 +1987,11 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
u64_stats_add(&stats->bytes, bytes_compl);
u64_stats_update_end(&stats->syncp);
- netdev_tx_completed_queue(netdev_get_tx_queue(dev, ring->index),
- pkts_compl, bytes_compl);
+ /* Ring DESC_INDEX (XDP TX) has no netdev TX queue; skip BQL */
+ if (ring->index != DESC_INDEX)
+ netdev_tx_completed_queue(netdev_get_tx_queue(dev,
+ ring->index),
+ pkts_compl, bytes_compl);
return txbds_processed;
}
@@ -1999,7 +2025,10 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
}
if (skb)
dev_consume_skb_any(skb);
- netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
+ /* Ring DESC_INDEX (XDP TX) has no netdev TX queue; skip BQL */
+ if (ring->index != DESC_INDEX)
+ netdev_tx_reset_queue(netdev_get_tx_queue(dev,
+ ring->index));
bcmgenet_tdma_ring_writel(priv, ring->index,
ring->prod_index, TDMA_PROD_INDEX);
wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
@@ -2044,6 +2073,9 @@ static void bcmgenet_tx_reclaim_all(struct net_device *dev)
do {
bcmgenet_tx_reclaim(dev, &priv->tx_rings[i++], true);
} while (i <= priv->hw_params->tx_queues && netif_is_multiqueue(dev));
+
+ /* Also reclaim XDP TX ring */
+ bcmgenet_tx_reclaim(dev, &priv->xdp_tx_ring, true);
}
/* Reallocate the SKB to put enough headroom in front of it and insert
@@ -2300,11 +2332,96 @@ static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
return skb;
}
+static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
+ struct xdp_frame *xdpf, bool dma_map)
+{
+ struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
+ struct device *kdev = &priv->pdev->dev;
+ struct enet_cb *tx_cb_ptr;
+ dma_addr_t mapping;
+ unsigned int dma_len;
+ u32 len_stat;
+
+ spin_lock(&ring->lock);
+
+ if (ring->free_bds < 1) {
+ spin_unlock(&ring->lock);
+ return false;
+ }
+
+ tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
+
+ if (dma_map) {
+ void *tsb_start;
+
+ /* The GENET MAC has TBUF_64B_EN set globally, so hardware
+ * expects a 64-byte TSB prefix on every TX buffer. For
+ * redirected frames (ndo_xdp_xmit) we prepend a zeroed TSB
+ * using the frame's headroom.
+ */
+ if (unlikely(xdpf->headroom < sizeof(struct status_64))) {
+ bcmgenet_put_txcb(priv, ring);
+ spin_unlock(&ring->lock);
+ return false;
+ }
+
+ tsb_start = xdpf->data - sizeof(struct status_64);
+ memset(tsb_start, 0, sizeof(struct status_64));
+
+ dma_len = xdpf->len + sizeof(struct status_64);
+ mapping = dma_map_single(kdev, tsb_start, dma_len,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(kdev, mapping)) {
+ tx_cb_ptr->skb = NULL;
+ tx_cb_ptr->xdpf = NULL;
+ bcmgenet_put_txcb(priv, ring);
+ spin_unlock(&ring->lock);
+ return false;
+ }
+ } else {
+ struct page *page = virt_to_page(xdpf->data);
+
+ /* For local XDP_TX the caller already prepended the TSB
+ * into xdpf->data/len, so dma_len == xdpf->len.
+ */
+ dma_len = xdpf->len;
+ mapping = page_pool_get_dma_addr(page) +
+ sizeof(*xdpf) + xdpf->headroom;
+ dma_sync_single_for_device(kdev, mapping, dma_len,
+ DMA_BIDIRECTIONAL);
+ }
+
+ dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
+ dma_unmap_len_set(tx_cb_ptr, dma_len, dma_len);
+ tx_cb_ptr->skb = NULL;
+ tx_cb_ptr->xdpf = xdpf;
+ tx_cb_ptr->xdp_dma_map = dma_map;
+
+ len_stat = (dma_len << DMA_BUFLENGTH_SHIFT) |
+ (priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT) |
+ DMA_TX_APPEND_CRC | DMA_SOP | DMA_EOP;
+
+ dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping, len_stat);
+
+ ring->free_bds--;
+ ring->prod_index++;
+ ring->prod_index &= DMA_P_INDEX_MASK;
+
+ bcmgenet_tdma_ring_writel(priv, ring->index, ring->prod_index,
+ TDMA_PROD_INDEX);
+
+ spin_unlock(&ring->lock);
+
+ return true;
+}
+
static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
struct bpf_prog *prog,
struct xdp_buff *xdp,
struct page *rx_page)
{
+ struct bcmgenet_priv *priv = ring->priv;
+ struct xdp_frame *xdpf;
unsigned int act;
if (!prog)
@@ -2315,14 +2432,42 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
switch (act) {
case XDP_PASS:
return XDP_PASS;
+ case XDP_TX:
+ /* Prepend a zeroed TSB (Transmit Status Block). The GENET
+ * MAC has TBUF_64B_EN set globally, so hardware expects every
+ * TX buffer to begin with a 64-byte struct status_64. Back
+ * up xdp->data into the RSB area (which is no longer needed
+ * after the BPF program ran) and zero it.
+ */
+ if (xdp->data - xdp->data_hard_start <
+ sizeof(struct status_64) + sizeof(struct xdp_frame)) {
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
+ return XDP_DROP;
+ }
+ xdp->data -= sizeof(struct status_64);
+ xdp->data_meta -= sizeof(struct status_64);
+ memset(xdp->data, 0, sizeof(struct status_64));
+
+ xdpf = xdp_convert_buff_to_frame(xdp);
+ if (unlikely(!xdpf)) {
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
+ return XDP_DROP;
+ }
+ if (unlikely(!bcmgenet_xdp_xmit_frame(priv, xdpf, false))) {
+ xdp_return_frame_rx_napi(xdpf);
+ return XDP_DROP;
+ }
+ return XDP_TX;
case XDP_DROP:
page_pool_put_full_page(ring->page_pool, rx_page, true);
return XDP_DROP;
default:
- bpf_warn_invalid_xdp_action(ring->priv->dev, prog, act);
+ bpf_warn_invalid_xdp_action(priv->dev, prog, act);
fallthrough;
case XDP_ABORTED:
- trace_xdp_exception(ring->priv->dev, prog, act);
+ trace_xdp_exception(priv->dev, prog, act);
page_pool_put_full_page(ring->page_pool, rx_page, true);
return XDP_ABORTED;
}
@@ -2548,9 +2693,15 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
{
struct bcmgenet_rx_ring *ring = container_of(napi,
struct bcmgenet_rx_ring, napi);
+ struct bcmgenet_priv *priv = ring->priv;
struct dim_sample dim_sample = {};
unsigned int work_done;
+ /* Reclaim completed XDP TX frames (ring 16 has no interrupt) */
+ if (priv->xdp_tx_ring.free_bds < priv->xdp_tx_ring.size)
+ bcmgenet_tx_reclaim(priv->dev,
+ &priv->xdp_tx_ring, false);
+
work_done = bcmgenet_desc_rx(ring, budget);
if (work_done < budget && napi_complete_done(napi, work_done))
@@ -2781,10 +2932,11 @@ static void bcmgenet_init_rx_coalesce(struct bcmgenet_rx_ring *ring)
/* Initialize a Tx ring along with corresponding hardware registers */
static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring,
unsigned int index, unsigned int size,
- unsigned int start_ptr, unsigned int end_ptr)
+ unsigned int start_ptr,
+ unsigned int end_ptr)
{
- struct bcmgenet_tx_ring *ring = &priv->tx_rings[index];
u32 words_per_bd = WORDS_PER_BD(priv);
u32 flow_period_val = 0;
@@ -2825,8 +2977,11 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
bcmgenet_tdma_ring_writel(priv, index, end_ptr * words_per_bd - 1,
DMA_END_ADDR);
- /* Initialize Tx NAPI */
- netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
+ /* Initialize Tx NAPI for priority queues only; ring DESC_INDEX
+ * (XDP TX) has its completions handled inline in RX NAPI.
+ */
+ if (index != DESC_INDEX)
+ netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
}
static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
@@ -2838,7 +2993,7 @@ static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
.pool_size = ring->size,
.nid = NUMA_NO_NODE,
.dev = &priv->pdev->dev,
- .dma_dir = DMA_FROM_DEVICE,
+ .dma_dir = DMA_BIDIRECTIONAL,
.offset = XDP_PACKET_HEADROOM,
.max_len = RX_BUF_LENGTH,
};
@@ -2972,6 +3127,7 @@ static int bcmgenet_tdma_disable(struct bcmgenet_priv *priv)
reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
mask = (1 << (priv->hw_params->tx_queues + 1)) - 1;
+ mask |= BIT(DESC_INDEX);
mask = (mask << DMA_RING_BUF_EN_SHIFT) | DMA_EN;
reg &= ~mask;
bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
@@ -3017,14 +3173,18 @@ static int bcmgenet_rdma_disable(struct bcmgenet_priv *priv)
* with queue 1 being the highest priority queue.
*
* Queue 0 is the default Tx queue with
- * GENET_Q0_TX_BD_CNT = 256 - 4 * 32 = 128 descriptors.
+ * GENET_Q0_TX_BD_CNT = 256 - 4 * 32 - 32 = 96 descriptors.
+ *
+ * Ring 16 (DESC_INDEX) is used for XDP TX with
+ * GENET_Q16_TX_BD_CNT = 32 descriptors.
*
* The transmit control block pool is then partitioned as follows:
- * - Tx queue 0 uses tx_cbs[0..127]
- * - Tx queue 1 uses tx_cbs[128..159]
- * - Tx queue 2 uses tx_cbs[160..191]
- * - Tx queue 3 uses tx_cbs[192..223]
- * - Tx queue 4 uses tx_cbs[224..255]
+ * - Tx queue 0 uses tx_cbs[0..95]
+ * - Tx queue 1 uses tx_cbs[96..127]
+ * - Tx queue 2 uses tx_cbs[128..159]
+ * - Tx queue 3 uses tx_cbs[160..191]
+ * - Tx queue 4 uses tx_cbs[192..223]
+ * - Tx queue 16 uses tx_cbs[224..255]
*/
static void bcmgenet_init_tx_queues(struct net_device *dev)
{
@@ -3037,7 +3197,8 @@ static void bcmgenet_init_tx_queues(struct net_device *dev)
/* Initialize Tx priority queues */
for (i = 0; i <= priv->hw_params->tx_queues; i++) {
- bcmgenet_init_tx_ring(priv, i, end - start, start, end);
+ bcmgenet_init_tx_ring(priv, &priv->tx_rings[i],
+ i, end - start, start, end);
start = end;
end += priv->hw_params->tx_bds_per_q;
dma_priority[DMA_PRIO_REG_INDEX(i)] |=
@@ -3045,13 +3206,21 @@ static void bcmgenet_init_tx_queues(struct net_device *dev)
<< DMA_PRIO_REG_SHIFT(i);
}
+ /* Initialize ring 16 (descriptor ring) for XDP TX */
+ bcmgenet_init_tx_ring(priv, &priv->xdp_tx_ring,
+ DESC_INDEX, GENET_Q16_TX_BD_CNT,
+ TOTAL_DESC - GENET_Q16_TX_BD_CNT, TOTAL_DESC);
+ dma_priority[DMA_PRIO_REG_INDEX(DESC_INDEX)] |=
+ GENET_Q0_PRIORITY << DMA_PRIO_REG_SHIFT(DESC_INDEX);
+
/* Set Tx queue priorities */
bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0);
bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1);
bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2);
- /* Configure Tx queues as descriptor rings */
+ /* Configure Tx queues as descriptor rings, including ring 16 */
ring_mask = (1 << (priv->hw_params->tx_queues + 1)) - 1;
+ ring_mask |= BIT(DESC_INDEX);
bcmgenet_tdma_writel(priv, ring_mask, DMA_RING_CFG);
/* Enable Tx rings */
@@ -3761,6 +3930,21 @@ static void bcmgenet_get_stats64(struct net_device *dev,
stats->tx_dropped += tx_dropped;
}
+ /* Include XDP TX ring (DESC_INDEX) stats */
+ tx_stats = &priv->xdp_tx_ring.stats64;
+ do {
+ start = u64_stats_fetch_begin(&tx_stats->syncp);
+ tx_bytes = u64_stats_read(&tx_stats->bytes);
+ tx_packets = u64_stats_read(&tx_stats->packets);
+ tx_errors = u64_stats_read(&tx_stats->errors);
+ tx_dropped = u64_stats_read(&tx_stats->dropped);
+ } while (u64_stats_fetch_retry(&tx_stats->syncp, start));
+
+ stats->tx_bytes += tx_bytes;
+ stats->tx_packets += tx_packets;
+ stats->tx_errors += tx_errors;
+ stats->tx_dropped += tx_dropped;
+
for (q = 0; q <= priv->hw_params->rx_queues; q++) {
rx_stats = &priv->rx_rings[q].stats64;
do {
@@ -4262,6 +4446,7 @@ static int bcmgenet_probe(struct platform_device *pdev)
u64_stats_init(&priv->rx_rings[i].stats64.syncp);
for (i = 0; i <= priv->hw_params->tx_queues; i++)
u64_stats_init(&priv->tx_rings[i].stats64.syncp);
+ u64_stats_init(&priv->xdp_tx_ring.stats64.syncp);
/* libphy will determine the link state */
netif_carrier_off(dev);
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 3d65f0e4b4b4..7e5d9ab0050b 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -472,6 +472,8 @@ struct bcmgenet_rx_stats64 {
struct enet_cb {
struct sk_buff *skb;
+ struct xdp_frame *xdpf;
+ bool xdp_dma_map;
struct page *rx_page;
unsigned int rx_page_offset;
void __iomem *bd_addr;
@@ -611,6 +613,7 @@ struct bcmgenet_priv {
unsigned int num_tx_bds;
struct bcmgenet_tx_ring tx_rings[GENET_MAX_MQ_CNT + 1];
+ struct bcmgenet_tx_ring xdp_tx_ring;
/* receive variables */
void __iomem *rx_bds;
--
2.51.0
* [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
` (3 preceding siblings ...)
2026-05-06 9:55 ` [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:55 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 6/7] net: bcmgenet: add XDP statistics counters Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 7/7] net: bcmgenet: reject MTU changes incompatible with XDP Nicolai Buchwitz
6 siblings, 2 replies; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
David S. Miller, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, linux-kernel, bpf
Add XDP_REDIRECT support and implement ndo_xdp_xmit for receiving
redirected frames from other devices.
XDP_REDIRECT calls xdp_do_redirect() from the RX path, deferring
xdp_do_flush() to once per NAPI poll cycle. ndo_xdp_xmit batches frames
into ring 16 under a single spinlock acquisition, ringing the TX
doorbell once per batch.
Advertise NETDEV_XDP_ACT_REDIRECT and NETDEV_XDP_ACT_NDO_XMIT in
xdp_features.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
.../net/ethernet/broadcom/genet/bcmgenet.c | 81 +++++++++++++++----
1 file changed, 67 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index f1e515526787..4e4fe785f0bf 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2332,22 +2332,22 @@ static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
return skb;
}
+/* Submit a single XDP frame to the TX ring. Caller must hold ring->lock.
+ * Returns true on success. Does not ring the doorbell - caller must
+ * write TDMA_PROD_INDEX after batching.
+ */
static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring,
struct xdp_frame *xdpf, bool dma_map)
{
- struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
struct device *kdev = &priv->pdev->dev;
struct enet_cb *tx_cb_ptr;
dma_addr_t mapping;
unsigned int dma_len;
u32 len_stat;
- spin_lock(&ring->lock);
-
- if (ring->free_bds < 1) {
- spin_unlock(&ring->lock);
+ if (ring->free_bds < 1)
return false;
- }
tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
@@ -2361,7 +2361,6 @@ static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
*/
if (unlikely(xdpf->headroom < sizeof(struct status_64))) {
bcmgenet_put_txcb(priv, ring);
- spin_unlock(&ring->lock);
return false;
}
@@ -2375,7 +2374,6 @@ static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
tx_cb_ptr->skb = NULL;
tx_cb_ptr->xdpf = NULL;
bcmgenet_put_txcb(priv, ring);
- spin_unlock(&ring->lock);
return false;
}
} else {
@@ -2407,12 +2405,14 @@ static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
ring->prod_index++;
ring->prod_index &= DMA_P_INDEX_MASK;
+ return true;
+}
+
+static void bcmgenet_xdp_ring_doorbell(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring)
+{
bcmgenet_tdma_ring_writel(priv, ring->index, ring->prod_index,
TDMA_PROD_INDEX);
-
- spin_unlock(&ring->lock);
-
- return true;
}
static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
@@ -2421,6 +2421,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
struct page *rx_page)
{
struct bcmgenet_priv *priv = ring->priv;
+ struct bcmgenet_tx_ring *tx_ring;
struct xdp_frame *xdpf;
unsigned int act;
@@ -2455,11 +2456,25 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
true);
return XDP_DROP;
}
- if (unlikely(!bcmgenet_xdp_xmit_frame(priv, xdpf, false))) {
+
+ tx_ring = &priv->xdp_tx_ring;
+ spin_lock(&tx_ring->lock);
+ if (unlikely(!bcmgenet_xdp_xmit_frame(priv, tx_ring,
+ xdpf, false))) {
+ spin_unlock(&tx_ring->lock);
xdp_return_frame_rx_napi(xdpf);
return XDP_DROP;
}
+ bcmgenet_xdp_ring_doorbell(priv, tx_ring);
+ spin_unlock(&tx_ring->lock);
return XDP_TX;
+ case XDP_REDIRECT:
+ if (unlikely(xdp_do_redirect(priv->dev, xdp, prog))) {
+ page_pool_put_full_page(ring->page_pool, rx_page,
+ true);
+ return XDP_DROP;
+ }
+ return XDP_REDIRECT;
case XDP_DROP:
page_pool_put_full_page(ring->page_pool, rx_page, true);
return XDP_DROP;
@@ -2483,6 +2498,7 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
struct bcmgenet_priv *priv = ring->priv;
struct net_device *dev = priv->dev;
struct bpf_prog *xdp_prog;
+ bool xdp_flush = false;
struct enet_cb *cb;
struct sk_buff *skb;
u32 dma_length_status;
@@ -2631,6 +2647,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
GENET_RX_HEADROOM, pkt_len, true);
xdp_act = bcmgenet_run_xdp(ring, xdp_prog, &xdp, rx_page);
+ if (xdp_act == XDP_REDIRECT)
+ xdp_flush = true;
if (xdp_act != XDP_PASS)
goto next;
@@ -2682,6 +2700,9 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
bcmgenet_rdma_ring_writel(priv, ring->index, ring->c_index, RDMA_CONS_INDEX);
}
+ if (xdp_flush)
+ xdp_do_flush();
+
ring->dim.bytes = bytes_processed;
ring->dim.packets = rxpktprocessed;
@@ -4027,6 +4048,36 @@ static int bcmgenet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
}
+static int bcmgenet_xdp_xmit(struct net_device *dev, int num_frames,
+ struct xdp_frame **frames, u32 flags)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
+ int sent = 0;
+ int i;
+
+ if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+ return -EINVAL;
+
+ if (unlikely(!netif_running(dev)))
+ return -ENETDOWN;
+
+ spin_lock(&ring->lock);
+
+ for (i = 0; i < num_frames; i++) {
+ if (!bcmgenet_xdp_xmit_frame(priv, ring, frames[i], true))
+ break;
+ sent++;
+ }
+
+ if (sent)
+ bcmgenet_xdp_ring_doorbell(priv, ring);
+
+ spin_unlock(&ring->lock);
+
+ return sent;
+}
+
static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_open = bcmgenet_open,
.ndo_stop = bcmgenet_close,
@@ -4039,6 +4090,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_get_stats64 = bcmgenet_get_stats64,
.ndo_change_carrier = bcmgenet_change_carrier,
.ndo_bpf = bcmgenet_xdp,
+ .ndo_xdp_xmit = bcmgenet_xdp_xmit,
};
/* GENET hardware parameters/characteristics */
@@ -4341,7 +4393,8 @@ static int bcmgenet_probe(struct platform_device *pdev)
NETIF_F_RXCSUM;
dev->hw_features |= dev->features;
dev->vlan_features |= dev->features;
- dev->xdp_features = NETDEV_XDP_ACT_BASIC;
+ dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+ NETDEV_XDP_ACT_NDO_XMIT;
netdev_sw_irq_coalesce_default_on(dev);
--
2.51.0
* [PATCH net-next v9 6/7] net: bcmgenet: add XDP statistics counters
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
` (4 preceding siblings ...)
2026-05-06 9:55 ` [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-06 9:55 ` [PATCH net-next v9 7/7] net: bcmgenet: reject MTU changes incompatible with XDP Nicolai Buchwitz
6 siblings, 1 reply; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
David S. Miller, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, linux-kernel, bpf
Expose per-action XDP counters via ethtool -S: xdp_pass, xdp_drop,
xdp_aborted, xdp_tx, xdp_tx_err, xdp_redirect, and xdp_redirect_err.
These use the existing soft MIB infrastructure and are incremented in
bcmgenet_run_xdp() alongside the existing driver statistics.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 17 +++++++++++++++++
drivers/net/ethernet/broadcom/genet/bcmgenet.h | 7 +++++++
2 files changed, 24 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 4e4fe785f0bf..359a297a25e6 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1170,6 +1170,14 @@ static const struct bcmgenet_stats bcmgenet_gstrings_stats[] = {
STAT_GENET_SOFT_MIB("tx_realloc_tsb", mib.tx_realloc_tsb),
STAT_GENET_SOFT_MIB("tx_realloc_tsb_failed",
mib.tx_realloc_tsb_failed),
+ /* XDP counters */
+ STAT_GENET_SOFT_MIB("xdp_pass", mib.xdp_pass),
+ STAT_GENET_SOFT_MIB("xdp_drop", mib.xdp_drop),
+ STAT_GENET_SOFT_MIB("xdp_aborted", mib.xdp_aborted),
+ STAT_GENET_SOFT_MIB("xdp_tx", mib.xdp_tx),
+ STAT_GENET_SOFT_MIB("xdp_tx_err", mib.xdp_tx_err),
+ STAT_GENET_SOFT_MIB("xdp_redirect", mib.xdp_redirect),
+ STAT_GENET_SOFT_MIB("xdp_redirect_err", mib.xdp_redirect_err),
/* Per TX queues */
STAT_GENET_Q(0),
STAT_GENET_Q(1),
@@ -2432,6 +2440,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
switch (act) {
case XDP_PASS:
+ priv->mib.xdp_pass++;
return XDP_PASS;
case XDP_TX:
/* Prepend a zeroed TSB (Transmit Status Block). The GENET
@@ -2444,6 +2453,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
sizeof(struct status_64) + sizeof(struct xdp_frame)) {
page_pool_put_full_page(ring->page_pool, rx_page,
true);
+ priv->mib.xdp_tx_err++;
return XDP_DROP;
}
xdp->data -= sizeof(struct status_64);
@@ -2454,6 +2464,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
if (unlikely(!xdpf)) {
page_pool_put_full_page(ring->page_pool, rx_page,
true);
+ priv->mib.xdp_tx_err++;
return XDP_DROP;
}
@@ -2463,19 +2474,24 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
xdpf, false))) {
spin_unlock(&tx_ring->lock);
xdp_return_frame_rx_napi(xdpf);
+ priv->mib.xdp_tx_err++;
return XDP_DROP;
}
bcmgenet_xdp_ring_doorbell(priv, tx_ring);
spin_unlock(&tx_ring->lock);
+ priv->mib.xdp_tx++;
return XDP_TX;
case XDP_REDIRECT:
if (unlikely(xdp_do_redirect(priv->dev, xdp, prog))) {
+ priv->mib.xdp_redirect_err++;
page_pool_put_full_page(ring->page_pool, rx_page,
true);
return XDP_DROP;
}
+ priv->mib.xdp_redirect++;
return XDP_REDIRECT;
case XDP_DROP:
+ priv->mib.xdp_drop++;
page_pool_put_full_page(ring->page_pool, rx_page, true);
return XDP_DROP;
default:
@@ -2483,6 +2499,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
fallthrough;
case XDP_ABORTED:
trace_xdp_exception(priv->dev, prog, act);
+ priv->mib.xdp_aborted++;
page_pool_put_full_page(ring->page_pool, rx_page, true);
return XDP_ABORTED;
}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 7e5d9ab0050b..e5e775a53f6d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -156,6 +156,13 @@ struct bcmgenet_mib_counters {
u32 tx_dma_failed;
u32 tx_realloc_tsb;
u32 tx_realloc_tsb_failed;
+ u32 xdp_pass;
+ u32 xdp_drop;
+ u32 xdp_aborted;
+ u32 xdp_tx;
+ u32 xdp_tx_err;
+ u32 xdp_redirect;
+ u32 xdp_redirect_err;
};
struct bcmgenet_tx_stats64 {
--
2.51.0
* [PATCH net-next v9 7/7] net: bcmgenet: reject MTU changes incompatible with XDP
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
` (5 preceding siblings ...)
2026-05-06 9:55 ` [PATCH net-next v9 6/7] net: bcmgenet: add XDP statistics counters Nicolai Buchwitz
@ 2026-05-06 9:55 ` Nicolai Buchwitz
2026-05-07 19:18 ` sashiko-bot
6 siblings, 1 reply; 18+ messages in thread
From: Nicolai Buchwitz @ 2026-05-06 9:55 UTC (permalink / raw)
To: netdev
Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
Florian Fainelli, Broadcom internal kernel review list,
Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
Mohsin Bashir, David S. Miller, Jakub Kicinski,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, linux-kernel, bpf
Add a minimal ndo_change_mtu that rejects MTU values too large for
single-page XDP buffers when an XDP program is attached. Without this,
users could change the MTU at runtime and break the XDP buffer layout.
When no XDP program is attached, any MTU change is accepted, matching
the existing behavior without ndo_change_mtu.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Mohsin Bashir <hmohsin@meta.com>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 359a297a25e6..6ef76e93d9ad 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -4095,6 +4095,20 @@ static int bcmgenet_xdp_xmit(struct net_device *dev, int num_frames,
return sent;
}
+static int bcmgenet_change_mtu(struct net_device *dev, int new_mtu)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ if (priv->xdp_prog && new_mtu > PAGE_SIZE - GENET_RX_HEADROOM -
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) {
+ netdev_warn(dev, "MTU too large for single-page XDP buffer\n");
+ return -EINVAL;
+ }
+
+ WRITE_ONCE(dev->mtu, new_mtu);
+ return 0;
+}
+
static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_open = bcmgenet_open,
.ndo_stop = bcmgenet_close,
@@ -4105,6 +4119,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_eth_ioctl = phy_do_ioctl_running,
.ndo_set_features = bcmgenet_set_features,
.ndo_get_stats64 = bcmgenet_get_stats64,
+ .ndo_change_mtu = bcmgenet_change_mtu,
.ndo_change_carrier = bcmgenet_change_carrier,
.ndo_bpf = bcmgenet_xdp,
.ndo_xdp_xmit = bcmgenet_xdp_xmit,
--
2.51.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool
2026-05-06 9:55 ` [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
@ 2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:46 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-07 19:17 UTC (permalink / raw)
To: Nicolai Buchwitz; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Out-of-bounds DMA sync in bcmgenet_desc_rx() due to unvalidated packet length
--
commit dcf6f3586789c6e4c09c02881b479202e9ad31ad
Author: Nicolai Buchwitz <nb@tipi-net.de>
net: bcmgenet: convert RX path to page_pool
This commit replaces the per-packet allocation and DMA mapping in the RX path
with a page_pool implementation. It updates the refill and descriptor routines
to allocate from the pool and build skbs using napi_build_skb.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 54f71b1e85fc4..df11c4977e8f7 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2367,15 +2355,23 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> dma_flag = dma_length_status & 0xffff;
> len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
>
> + /* Sync the rest of the actual received frame */
> + if (len > sizeof(struct status_64))
> + page_pool_dma_sync_for_cpu(ring->page_pool, rx_page,
> + sizeof(struct status_64),
> + len - sizeof(struct status_64));
Can this cause an out-of-bounds DMA sync regression?
The length is read from the hardware-written Receive Status Block and then
immediately used to sync the remainder of the frame data.
However, the bounds check against RX_BUF_LENGTH is performed after this
page_pool_dma_sync_for_cpu call.
If the hardware provides a corrupted or abnormally large length, could the
driver attempt to sync memory outside of the mapped page pool buffer? On
systems using SWIOTLB, passing a sync size larger than the mapped region
might lead to out-of-bounds copies and memory corruption.
> netif_dbg(priv, rx_status, dev,
> "%s:p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
> __func__, p_index, ring->c_index,
> ring->read_ptr, dma_length_status);
>
> - if (unlikely(len > RX_BUF_LENGTH)) {
> - netif_err(priv, rx_status, dev, "oversized packet\n");
> + if (unlikely(len > RX_BUF_LENGTH || len < GENET_RSB_PAD)) {
> + netif_err(priv, rx_status, dev,
> + "invalid packet length %d\n", len);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260506095553.55357-1-nb@tipi-net.de?part=1
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP)
2026-05-06 9:55 ` [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP) Nicolai Buchwitz
@ 2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:47 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-07 19:17 UTC (permalink / raw)
To: Nicolai Buchwitz; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unconditionally applying CHECKSUM_COMPLETE using the original hardware checksum after an XDP program has executed.
- [High] Calculation of pkt_len can underflow when receiving short frames with crc_fwd_en enabled, leading to massive out-of-bounds SKB lengths.
--
commit a85e13efa285d96e84232f2f89b59157c43c2cf5
Author: Nicolai Buchwitz <nb@tipi-net.de>
net: bcmgenet: add basic XDP support (PASS/DROP)
This commit adds basic XDP program attachment via ndo_bpf and executes
XDP programs in the RX path. It handles building SKBs from the xdp_buff
on XDP_PASS and returning pages to the page_pool on XDP_DROP.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 5bedc18685b09..ee1d4ecc2b874 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2413,30 +2471,34 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> goto next;
> } /* error packet */
>
> - /* Build SKB from the page - data starts at hard_start,
> - * frame begins after RSB(64) + pad(2) = 66 bytes.
> - */
> - skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
> - if (unlikely(!skb)) {
> - BCMGENET_STATS64_INC(stats, dropped);
> - page_pool_put_full_page(ring->page_pool, rx_page,
> - true);
> - goto next;
> - }
> + pkt_len = len - GENET_RSB_PAD;
> + if (priv->crc_fwd_en)
> + pkt_len -= ETH_FCS_LEN;
Can pkt_len underflow here if we receive a short frame?
The RX validation earlier in bcmgenet_desc_rx() only ensures that
len >= GENET_RSB_PAD. If len is less than GENET_RSB_PAD + 4 (ETH_FCS_LEN),
subtracting ETH_FCS_LEN will result in a negative pkt_len.
When this negative pkt_len is passed to xdp_prepare_buff(), xdp->data_end
will be set before xdp->data. Later, bcmgenet_xdp_build_skb() calls
__skb_put(skb, xdp->data_end - xdp->data), which lacks bounds checking and
converts the negative difference into a very large unsigned integer, causing
an out-of-bounds SKB length.
> - skb_mark_for_recycle(skb);
> + /* Save rx_csum before XDP runs - an XDP program
> + * could overwrite the RSB via bpf_xdp_adjust_head.
> + */
> + if (dev->features & NETIF_F_RXCSUM)
> + rx_csum = (__force __be16)(status->rx_csum & 0xffff);
>
> - /* Reserve the RSB + pad, then set the data length */
> - skb_reserve(skb, GENET_RSB_PAD);
> - __skb_put(skb, len - GENET_RSB_PAD);
> + xdp_init_buff(&xdp, PAGE_SIZE, &ring->xdp_rxq);
> + xdp_prepare_buff(&xdp, page_address(rx_page),
> + GENET_RX_HEADROOM, pkt_len, true);
>
> - if (priv->crc_fwd_en) {
> - skb_trim(skb, skb->len - ETH_FCS_LEN);
> + xdp_act = bcmgenet_run_xdp(ring, xdp_prog, &xdp, rx_page);
> + if (xdp_act != XDP_PASS)
> + goto next;
>
> - }
>
> - /* Set up checksum offload */
> - if (dev->features & NETIF_F_RXCSUM) {
> - rx_csum = (__force __be16)(status->rx_csum & 0xffff);
> + skb = bcmgenet_xdp_build_skb(ring, &xdp);
> + if (unlikely(!skb)) {
> + BCMGENET_STATS64_INC(stats, dropped);
> + page_pool_put_full_page(ring->page_pool,
> + rx_page, true);
> + goto next;
> + }
> +
> + /* Set up checksum offload */
> + if (dev->features & NETIF_F_RXCSUM) {
> if (rx_csum) {
> skb->csum = (__force __wsum)ntohs(rx_csum);
> skb->ip_summed = CHECKSUM_COMPLETE;
Is it safe to unconditionally apply CHECKSUM_COMPLETE using the original
hardware checksum after an XDP program has executed?
Since XDP programs are allowed to modify the packet headers and payload,
if the program modifies the packet and returns XDP_PASS, the hardware-computed
1's complement sum saved in rx_csum will no longer match the packet contents.
This could lead to the network stack detecting a checksum mismatch and
incorrectly dropping a valid packet. Should the checksum status be cleared
or set to CHECKSUM_NONE if an XDP program is present?
> }
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260506095553.55357-1-nb@tipi-net.de?part=3
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support
2026-05-06 9:55 ` [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support Nicolai Buchwitz
@ 2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:52 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-07 19:17 UTC (permalink / raw)
To: Nicolai Buchwitz; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Deterministic packet drops on XDP_TX because the dedicated TX ring capacity (32) is smaller than the NAPI budget (64).
- [High] XDP_TX corrupts XDP metadata and copies uninitialized memory into the `xdp_frame`.
- [Medium] XDP TX completions can stall indefinitely, leading to page pool exhaustion.
- [Medium] Severe lock and MMIO contention across multiple RX NAPI contexts sharing a single XDP TX ring.
--
commit 3ad0ff68731af021753c8986d28477830093832b
Author: Nicolai Buchwitz <nb@tipi-net.de>
net: bcmgenet: add XDP_TX support
Implement XDP_TX using ring 16 (DESC_INDEX), the hardware default
descriptor ring, dedicated to XDP TX for isolation from SKB TX queues.
[ ... ]
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index ee1d4ecc2b874..f1e515526787a 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -49,8 +49,10 @@
>
> #define GENET_Q0_RX_BD_CNT \
> (TOTAL_DESC - priv->hw_params->rx_queues * priv->hw_params->rx_bds_per_q)
> +#define GENET_Q16_TX_BD_CNT 32
Will this cause deterministic packet drops on XDP_TX? The NAPI budget is
typically 64. Since the RX poll reclaims XDP TX completions only once at the
beginning of the NAPI cycle, a burst of more than 32 XDP_TX actions would
exhaust the TX ring and drop subsequent packets in that burst.
> #define GENET_Q0_TX_BD_CNT \
> - (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q)
> + (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q \
> + - GENET_Q16_TX_BD_CNT)
[ ... ]
> @@ -2315,14 +2432,42 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
> switch (act) {
> case XDP_PASS:
> return XDP_PASS;
> + case XDP_TX:
> + /* Prepend a zeroed TSB (Transmit Status Block). The GENET
> + * MAC has TBUF_64B_EN set globally, so hardware expects every
> + * TX buffer to begin with a 64-byte struct status_64. Back
> + * up xdp->data into the RSB area (which is no longer needed
> + * after the BPF program ran) and zero it.
> + */
> + if (xdp->data - xdp->data_hard_start <
> + sizeof(struct status_64) + sizeof(struct xdp_frame)) {
Does this headroom check need to account for the metadata size? If headroom
is tight, shifting the pointers might cause xdp_update_frame_from_buff()
to fail with -ENOSPC and drop the packet.
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> + return XDP_DROP;
> + }
> + xdp->data -= sizeof(struct status_64);
> + xdp->data_meta -= sizeof(struct status_64);
> + memset(xdp->data, 0, sizeof(struct status_64));
Could this corrupt XDP metadata and copy uninitialized memory?
If the XDP program attached metadata, the physical metadata bytes immediately
precede the original xdp->data. The memset() here spans the exact memory area
where the original metadata resides, overwriting it with zeros.
Because xdp->data_meta is decremented, it now points to uninitialized
headroom memory, which xdp_convert_buff_to_frame() would read as the frame's
metadata.
> +
> + xdpf = xdp_convert_buff_to_frame(xdp);
> + if (unlikely(!xdpf)) {
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> + return XDP_DROP;
> + }
[ ... ]
> @@ -2548,9 +2693,15 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
> {
> struct bcmgenet_rx_ring *ring = container_of(napi,
> struct bcmgenet_rx_ring, napi);
> + struct bcmgenet_priv *priv = ring->priv;
> struct dim_sample dim_sample = {};
> unsigned int work_done;
>
> + /* Reclaim completed XDP TX frames (ring 16 has no interrupt) */
> + if (priv->xdp_tx_ring.free_bds < priv->xdp_tx_ring.size)
> + bcmgenet_tx_reclaim(priv->dev,
> + &priv->xdp_tx_ring, false);
If the NAPI budget is not fully consumed, napi_complete_done() is called and
NAPI exits. Since there is no TX interrupt for ring 16, could XDP TX
completions stall indefinitely if RX traffic stops? This might leave buffers
in the TX ring, eventually leading to page pool exhaustion.
Also, is there a risk of severe lock and MMIO contention here?
Since the single XDP TX ring is shared across all RX queues, every active
RX NAPI will concurrently evaluate this condition, and because hardware
transmission takes time, it can remain true across multiple concurrent
NAPI polls. As a result, all active RX queues could simultaneously call
bcmgenet_tx_reclaim(), heavily contending for spin_lock_bh(&ring->lock) and
performing expensive MMIO reads.
> +
> work_done = bcmgenet_desc_rx(ring, budget);
>
> if (work_done < budget && napi_complete_done(napi, work_done))
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260506095553.55357-1-nb@tipi-net.de?part=4
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
2026-05-06 9:55 ` [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support Nicolai Buchwitz
@ 2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:55 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-07 19:17 UTC (permalink / raw)
To: Nicolai Buchwitz; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The ndo_xdp_xmit implementation can permanently stall because the XDP TX ring (ring 16) is only reclaimed during RX NAPI polling, which may never run if the interface receives no packets.
--
commit 606bc29cba0382a693d9785fe2fe2ff9ee75e598
Author: Nicolai Buchwitz <nb@tipi-net.de>
net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
This commit adds XDP_REDIRECT support and implements ndo_xdp_xmit for
receiving redirected frames from other devices.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index f1e515526787a..4e4fe785f0bf5 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2332,22 +2332,22 @@ static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
> return skb;
> }
>
> +/* Submit a single XDP frame to the TX ring. Caller must hold ring->lock.
> + * Returns true on success. Does not ring the doorbell - caller must
> + * write TDMA_PROD_INDEX after batching.
> + */
> static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
> + struct bcmgenet_tx_ring *ring,
> struct xdp_frame *xdpf, bool dma_map)
> {
> - struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
> struct device *kdev = &priv->pdev->dev;
> struct enet_cb *tx_cb_ptr;
> dma_addr_t mapping;
> unsigned int dma_len;
> u32 len_stat;
>
> - spin_lock(&ring->lock);
> -
> - if (ring->free_bds < 1) {
> - spin_unlock(&ring->lock);
> + if (ring->free_bds < 1)
> return false;
Can this ring permanently stall if the interface is only used as a redirect
destination?
Because the XDP TX ring has no TX completion interrupts, it relies entirely
on the RX NAPI handler to reclaim completed TX buffers. If the interface
receives no incoming traffic, the RX NAPI handler will never run.
Once the ring descriptors are filled, this check will continuously return
false, permanently dropping all subsequent redirected frames from
ndo_xdp_xmit.
Does ndo_xdp_xmit need to proactively reclaim TX buffers when the ring is
full, or use another mechanism to ensure completion?
> - }
>
> tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260506095553.55357-1-nb@tipi-net.de?part=5
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 6/7] net: bcmgenet: add XDP statistics counters
2026-05-06 9:55 ` [PATCH net-next v9 6/7] net: bcmgenet: add XDP statistics counters Nicolai Buchwitz
@ 2026-05-07 19:17 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-07 19:17 UTC (permalink / raw)
To: Nicolai Buchwitz; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Global XDP statistics are updated concurrently from multiple NAPI contexts without synchronization, leading to data races, lost increments, and rapid overflows.
--
commit 04cdeba59fa183a07b2fd6c8dba32b2dd262ae4d
Author: Nicolai Buchwitz <nb@tipi-net.de>
net: bcmgenet: add XDP statistics counters
Exposes per-action XDP counters via ethtool -S. These counters use the
existing soft MIB infrastructure and are incremented alongside existing
driver statistics in bcmgenet_run_xdp().
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2432,6 +2440,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
>
> switch (act) {
> case XDP_PASS:
> + priv->mib.xdp_pass++;
Since the driver supports multiple RX queues, bcmgenet_run_xdp() can be
executed concurrently by independent NAPI polling instances on different
CPUs.
Will these non-atomic read-modify-write operations on the shared priv->mib
structure lead to data races and lost increments?
> return XDP_PASS;
> case XDP_TX:
[ ... ]
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> @@ -156,6 +156,13 @@ struct bcmgenet_mib_counters {
> u32 tx_dma_failed;
> u32 tx_realloc_tsb;
> u32 tx_realloc_tsb_failed;
> + u32 xdp_pass;
Because XDP processes packets at high rates, could these 32-bit packet
counters overflow rapidly under heavy traffic?
Would it be safer to maintain these XDP counters as 64-bit values on a
per-queue basis, using u64_stats_t, and aggregate them when queried?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260506095553.55357-1-nb@tipi-net.de?part=6
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 7/7] net: bcmgenet: reject MTU changes incompatible with XDP
2026-05-06 9:55 ` [PATCH net-next v9 7/7] net: bcmgenet: reject MTU changes incompatible with XDP Nicolai Buchwitz
@ 2026-05-07 19:18 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-07 19:18 UTC (permalink / raw)
To: Nicolai Buchwitz; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The introduced `ndo_change_mtu` callback is entirely dead code based on a false premise.
--
commit 4a63ea349d9dde80e552364564a8fc480bce7552
Author: Nicolai Buchwitz <nb@tipi-net.de>
net: bcmgenet: reject MTU changes incompatible with XDP
Add a minimal ndo_change_mtu that rejects MTU values too large for
single-page XDP buffers when an XDP program is attached.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -4095,6 +4095,20 @@ static int bcmgenet_xdp_xmit(struct net_device *dev, int num_frames,
[ ... ]
> +static int bcmgenet_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + struct bcmgenet_priv *priv = netdev_priv(dev);
> +
> + if (priv->xdp_prog && new_mtu > PAGE_SIZE - GENET_RX_HEADROOM -
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) {
Is this check actually reachable?
The networking core (dev_validate_mtu()) strictly enforces that any
requested MTU cannot exceed dev->max_mtu. Because the bcmgenet driver
does not manually override dev->max_mtu, it inherits the default value
of ETH_DATA_LEN (1500 bytes) set by ether_setup().
On a system with the minimum 4KB page size, the right side of this
condition evaluates to approximately 3400 bytes. Since new_mtu is
strictly bounded to 1500, the condition new_mtu > ~3400 seems
mathematically impossible to satisfy, rendering the check dead code.
Additionally, the driver statically configures the hardware MAC to a
maximum frame length of 1522 bytes (ENET_MAX_MTU_SIZE) and the DMA ring
to a maximum receive buffer length of 2048 bytes (RX_BUF_LENGTH). Thus,
a buffer overflow rewriting the skb_shared_info appears structurally
impossible even if the MTU check were somehow bypassed.
> + netdev_warn(dev, "MTU too large for single-page XDP buffer\n");
> + return -EINVAL;
> + }
> +
> + WRITE_ONCE(dev->mtu, new_mtu);
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260506095553.55357-1-nb@tipi-net.de?part=7
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool
2026-05-06 9:55 ` [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
@ 2026-05-10 18:46 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-10 18:46 UTC (permalink / raw)
To: nb
Cc: Jakub Kicinski, netdev, justin.chen, horms, mohsin.bashr, opendmb,
florian.fainelli, bcm-kernel-feedback-list, andrew+netdev,
edumazet, pabeni, hmohsin, davem, ast, daniel, hawk,
john.fastabend, sdf, linux-kernel, bpf
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: bcmgenet: convert RX path to page_pool
This patch replaces the per-packet __netdev_alloc_skb() + dma_map_single()
in the RX path with page_pool, and builds skbs via napi_build_skb() with
skb_mark_for_recycle() so pages are returned to the pool automatically.
A per-ring page_pool is created with PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
and the buffer layout reserves XDP_PACKET_HEADROOM before the HW RSB.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 54f71b1e85fc..df11c4977e8f 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2341,25 +2325,29 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> while ((rxpktprocessed < rxpkttoprocess) &&
> (rxpktprocessed < budget)) {
> struct status_64 *status;
> + struct page *rx_page;
> + unsigned int rx_off;
> __be16 rx_csum;
> + void *hard_start;
>
> cb = &priv->rx_cbs[ring->read_ptr];
> - skb = bcmgenet_rx_refill(priv, cb);
>
> - if (unlikely(!skb)) {
> + /* Save the received page before refilling */
> + rx_page = cb->rx_page;
> + rx_off = cb->rx_page_offset;
> +
> + if (bcmgenet_rx_refill(ring, cb)) {
> BCMGENET_STATS64_INC(stats, dropped);
> goto next;
> }
>
> - status = (struct status_64 *)skb->data;
> + /* Sync the RSB first to read the frame length */
> + page_pool_dma_sync_for_cpu(ring->page_pool, rx_page, 0,
> + sizeof(struct status_64));
> +
> + hard_start = page_address(rx_page) + rx_off;
> + status = (struct status_64 *)hard_start;
> dma_length_status = status->length_status;
rx_off is saved from cb->rx_page_offset, but bcmgenet_rx_refill() always
sets that field to XDP_PACKET_HEADROOM. The adjacent napi_build_skb()
call further down hardcodes the same constant as:
napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
Is the intent that rx_page_offset eventually varies (for XDP layouts),
or should the two sites either both use the constant or both derive the
size from rx_off (for example PAGE_SIZE - rx_off)? If the offset ever
changes without updating the hardcoded napi_build_skb() argument, the
skb_shared_info placement becomes incorrect.
>
> /* DMA flags and length are still valid no matter how
> * we got the Receive Status Vector (64B RSB or register)
> @@ -2367,15 +2355,23 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> dma_flag = dma_length_status & 0xffff;
> len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
>
> + /* Sync the rest of the actual received frame */
> + if (len > sizeof(struct status_64))
> + page_pool_dma_sync_for_cpu(ring->page_pool, rx_page,
> + sizeof(struct status_64),
> + len - sizeof(struct status_64));
> +
Can this drive a cache-maintenance operation with an unbounded length?
len is extracted from the HW-written RSB as:
len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
which is a 16-bit field (up to 65535), and the bounds check against
RX_BUF_LENGTH only happens in the next hunk. page_pool_dma_sync_for_cpu()
on a non-coherent system ends up invalidating the cache over
[dma_addr + 64, dma_addr + 64 + (len - 64)]. If the hardware ever reports
a bogus length (glitch, stale RSB on an error condition, etc.) the
invalidation would drop dirty cache lines belonging to whatever kernel
allocation follows this page.
The previous code used a fixed priv->rx_buf_len for dma_unmap_single(),
so the HW-controlled length never sized a DMA-maintenance operation.
Would it be safer to validate len against RX_BUF_LENGTH / GENET_RSB_PAD
before this second page_pool_dma_sync_for_cpu() call, or clamp the sync
size to min(len - 64, RX_BUF_LENGTH - 64)?
> netif_dbg(priv, rx_status, dev,
> "%s:p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
> __func__, p_index, ring->c_index,
> ring->read_ptr, dma_length_status);
>
> - if (unlikely(len > RX_BUF_LENGTH)) {
> - netif_err(priv, rx_status, dev, "oversized packet\n");
> + if (unlikely(len > RX_BUF_LENGTH || len < GENET_RSB_PAD)) {
> + netif_err(priv, rx_status, dev,
> + "invalid packet length %d\n", len);
> BCMGENET_STATS64_INC(stats, length_errors);
> - dev_kfree_skb_any(skb);
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> goto next;
> }
Is the lower bound here tight enough?
len == GENET_RSB_PAD (66) is accepted and then:
__skb_put(skb, len - GENET_RSB_PAD);
yields a 0-byte skb. eth_type_trans() -> eth_skb_pull_mac() will still
read h_proto from skb->data, which points into page_pool memory that the
hardware did not write for this packet.
Similarly, if priv->crc_fwd_en is set and skb->len < ETH_FCS_LEN (len
in [66..69]):
skb_trim(skb, skb->len - ETH_FCS_LEN);
underflows to a huge unsigned value, but skb_trim()'s internal
"if (skb->len > len)" evaluates to false so the trim silently no-ops and
FCS bytes remain in the payload.
Would a bound like
len < GENET_RSB_PAD + ETH_ZLEN + (crc_fwd_en ? ETH_FCS_LEN : 0)
reject these runts more cleanly?
[ ... ]
> @@ -2411,24 +2408,47 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
[ ... ]
> - skb_put(skb, len);
> + /* Build SKB from the page - data starts at hard_start,
> + * frame begins after RSB(64) + pad(2) = 66 bytes.
> + */
> + skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
> + if (unlikely(!skb)) {
> + BCMGENET_STATS64_INC(stats, dropped);
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> + goto next;
> + }
> +
> + skb_mark_for_recycle(skb);
>
> - /* remove RSB and hardware 2bytes added for IP alignment */
> - skb_pull(skb, 66);
> - len -= 66;
> + /* Reserve the RSB + pad, then set the data length */
> + skb_reserve(skb, GENET_RSB_PAD);
> + __skb_put(skb, len - GENET_RSB_PAD);
>
> if (priv->crc_fwd_en) {
> - skb_trim(skb, len - ETH_FCS_LEN);
> - len -= ETH_FCS_LEN;
> + skb_trim(skb, skb->len - ETH_FCS_LEN);
> }
Separately, the pre-patch code incremented priv->mib.rx_dma_failed on
dma_mapping_error() inside bcmgenet_rx_refill(). With page_pool taking
over DMA mapping, nothing increments rx_dma_failed anymore, but the
counter is still exported via ethtool -S through:
STAT_GENET_SOFT_MIB("rx_dma_failed", mib.rx_dma_failed)
So the field will be permanently zero after this patch, which is an
implicit user-visible change not mentioned in the commit message. Should
rx_dma_failed be removed from the stats table (or folded into
alloc_rx_buff_failed) as part of this conversion, and the change noted in
the commit log?
[ ... ]
--
pw-bot: cr
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP)
2026-05-06 9:55 ` [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP) Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
@ 2026-05-10 18:47 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-10 18:47 UTC (permalink / raw)
To: nb
Cc: Jakub Kicinski, netdev, justin.chen, horms, mohsin.bashr, opendmb,
florian.fainelli, bcm-kernel-feedback-list, andrew+netdev,
edumazet, pabeni, hmohsin, davem, ast, daniel, hawk,
john.fastabend, sdf, linux-kernel, bpf
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: bcmgenet: add basic XDP support (PASS/DROP)
This patch adds XDP program attachment via ndo_bpf and executes XDP
programs in the RX path, handling XDP_PASS by building an SKB from the
xdp_buff and XDP_DROP by returning the page to the page pool.
NETDEV_XDP_ACT_BASIC is advertised in xdp_features.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 5bedc18685b0..ee1d4ecc2b87 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2326,9 +2381,12 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> (rxpktprocessed < budget)) {
> struct status_64 *status;
> struct page *rx_page;
> + unsigned int xdp_act;
> unsigned int rx_off;
> - __be16 rx_csum;
> + struct xdp_buff xdp;
> + __be16 rx_csum = 0;
> void *hard_start;
> + int pkt_len;
>
> cb = &priv->rx_cbs[ring->read_ptr];
>
> @@ -2413,30 +2471,34 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> goto next;
> } /* error packet */
>
> - /* Build SKB from the page - data starts at hard_start,
> - * frame begins after RSB(64) + pad(2) = 66 bytes.
> + pkt_len = len - GENET_RSB_PAD;
> + if (priv->crc_fwd_en)
> + pkt_len -= ETH_FCS_LEN;
Can this produce a negative pkt_len on short frames?
The earlier sanity check only rejects len < GENET_RSB_PAD:
if (unlikely(len > RX_BUF_LENGTH || len < GENET_RSB_PAD)) {
...
goto next;
}
So len in the range [GENET_RSB_PAD, GENET_RSB_PAD + ETH_FCS_LEN - 1]
(i.e. 66..69) will pass that check. When crc_fwd_en is set, the
computation then yields pkt_len in [-4, -1].
The driver already tracks such frames via rx_runt_cnt / rx_runt_fcs /
rx_runt_bytes in its MIB counters, so the hardware can deliver them.
> +
> + /* Save rx_csum before XDP runs - an XDP program
> + * could overwrite the RSB via bpf_xdp_adjust_head.
> */
> - skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
> - if (unlikely(!skb)) {
> - BCMGENET_STATS64_INC(stats, dropped);
> - page_pool_put_full_page(ring->page_pool, rx_page,
> - true);
> - goto next;
> - }
> + if (dev->features & NETIF_F_RXCSUM)
> + rx_csum = (__force __be16)(status->rx_csum & 0xffff);
>
> - skb_mark_for_recycle(skb);
> + xdp_init_buff(&xdp, PAGE_SIZE, &ring->xdp_rxq);
> + xdp_prepare_buff(&xdp, page_address(rx_page),
> + GENET_RX_HEADROOM, pkt_len, true);
A negative pkt_len gets passed here as data_len, so
xdp_prepare_buff() sets xdp->data_end = xdp->data + pkt_len and
xdp->data_end is now below xdp->data.
>
> - /* Reserve the RSB + pad, then set the data length */
> - skb_reserve(skb, GENET_RSB_PAD);
> - __skb_put(skb, len - GENET_RSB_PAD);
> + xdp_act = bcmgenet_run_xdp(ring, xdp_prog, &xdp, rx_page);
> + if (xdp_act != XDP_PASS)
> + goto next;
>
> - if (priv->crc_fwd_en) {
> - skb_trim(skb, skb->len - ETH_FCS_LEN);
> + skb = bcmgenet_xdp_build_skb(ring, &xdp);
With no XDP program attached, bcmgenet_run_xdp() unconditionally
returns XDP_PASS and bcmgenet_xdp_build_skb() is called, which does:
__skb_put(skb, xdp->data_end - xdp->data);
The difference xdp->data_end - xdp->data is a negative ptrdiff_t, but
__skb_put() takes an unsigned int, so the value wraps to just under
4GB. __skb_put() does not bounds-check and unconditionally advances
skb->tail and skb->len by that amount. The resulting SKB is then
handed to napi_gro_receive().
Would it make sense to tighten the length check to include ETH_FCS_LEN
when crc_fwd_en is set, or validate pkt_len >= 0 before calling
xdp_prepare_buff()?
The pre-patch code happened to be safe here because skb_trim() has
implicit underflow protection:
if (skb->len > len)
...
So removing the skb_trim path drops that protection.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support
2026-05-06 9:55 ` [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
@ 2026-05-10 18:52 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-10 18:52 UTC (permalink / raw)
To: nb
Cc: Jakub Kicinski, netdev, justin.chen, horms, mohsin.bashr, opendmb,
florian.fainelli, bcm-kernel-feedback-list, andrew+netdev,
edumazet, pabeni, davem, ast, daniel, hawk, john.fastabend, sdf,
linux-kernel, bpf
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: bcmgenet: add XDP_TX support
Implement XDP_TX using ring 16 (DESC_INDEX), the hardware default
descriptor ring, dedicated to XDP TX for isolation from SKB TX queues.
[ ... ]
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index ee1d4ecc2b874..f1e515526787a 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -49,8 +49,10 @@
>
> #define GENET_Q0_RX_BD_CNT \
> (TOTAL_DESC - priv->hw_params->rx_queues * priv->hw_params->rx_bds_per_q)
> +#define GENET_Q16_TX_BD_CNT 32
Will this cause deterministic packet drops on XDP_TX? The NAPI budget is
typically 64. Since the RX poll reclaims XDP TX completions only once at the
beginning of the NAPI cycle, a burst of more than 32 XDP_TX actions would
exhaust the TX ring and drop subsequent packets in that burst.
> #define GENET_Q0_TX_BD_CNT \
> - (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q)
> + (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q \
> + - GENET_Q16_TX_BD_CNT)
[ ... ]
> @@ -2315,14 +2432,42 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
> switch (act) {
> case XDP_PASS:
> return XDP_PASS;
> + case XDP_TX:
> + /* Prepend a zeroed TSB (Transmit Status Block). The GENET
> + * MAC has TBUF_64B_EN set globally, so hardware expects every
> + * TX buffer to begin with a 64-byte struct status_64. Back
> + * up xdp->data into the RSB area (which is no longer needed
> + * after the BPF program ran) and zero it.
> + */
> + if (xdp->data - xdp->data_hard_start <
> + sizeof(struct status_64) + sizeof(struct xdp_frame)) {
Does this headroom check need to account for the metadata size? If headroom
is tight, shifting the pointers might cause xdp_update_frame_from_buff()
to fail with -ENOSPC and drop the packet.
Jakub: it may be cleaner to invalidate the XDP metadata instead of half-handling it.
AFAIU it doesn't get used past this point anyway.
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> + return XDP_DROP;
> + }
> + xdp->data -= sizeof(struct status_64);
> + xdp->data_meta -= sizeof(struct status_64);
> + memset(xdp->data, 0, sizeof(struct status_64));
Could this corrupt XDP metadata and copy uninitialized memory?
If the XDP program attached metadata, the physical metadata bytes immediately
precede the original xdp->data. The memset() here spans the exact memory area
where the original metadata resides, overwriting it with zeros.
Because xdp->data_meta is decremented, it now points to uninitialized
headroom memory, which xdp_convert_buff_to_frame() would read as the frame's
metadata.
> +
> + xdpf = xdp_convert_buff_to_frame(xdp);
> + if (unlikely(!xdpf)) {
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> + return XDP_DROP;
> + }
[ ... ]
> @@ -2548,9 +2693,15 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
> {
> struct bcmgenet_rx_ring *ring = container_of(napi,
> struct bcmgenet_rx_ring, napi);
> + struct bcmgenet_priv *priv = ring->priv;
> struct dim_sample dim_sample = {};
> unsigned int work_done;
>
> + /* Reclaim completed XDP TX frames (ring 16 has no interrupt) */
> + if (priv->xdp_tx_ring.free_bds < priv->xdp_tx_ring.size)
> + bcmgenet_tx_reclaim(priv->dev,
> + &priv->xdp_tx_ring, false);
If the NAPI budget is not fully consumed, napi_complete_done() is called and
NAPI exits. Since there is no TX interrupt for ring 16, could XDP TX
completions stall indefinitely if RX traffic stops? This might leave buffers
in the TX ring, eventually leading to page pool exhaustion.
Also, is there a risk of severe lock and MMIO contention here?
The single XDP TX ring is shared across all RX queues, so every active
RX NAPI will evaluate this condition concurrently. Hardware
transmission takes time, so the condition can remain true across
multiple concurrent NAPI polls. As a result, all active RX queues may
simultaneously call bcmgenet_tx_reclaim(), contending heavily for
spin_lock_bh(&ring->lock) and performing expensive MMIO reads.
Jakub: also, you can't reclaim XDP Tx if budget is 0
> +
> work_done = bcmgenet_desc_rx(ring, budget);
>
> if (work_done < budget && napi_complete_done(napi, work_done))
* Re: [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
2026-05-06 9:55 ` [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
@ 2026-05-10 18:55 ` Jakub Kicinski
1 sibling, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-10 18:55 UTC (permalink / raw)
To: nb
Cc: Jakub Kicinski, netdev, justin.chen, horms, mohsin.bashr, opendmb,
florian.fainelli, bcm-kernel-feedback-list, andrew+netdev,
edumazet, pabeni, hmohsin, davem, ast, daniel, hawk,
john.fastabend, sdf, linux-kernel, bpf
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
This commit adds XDP_REDIRECT handling in the RX path and implements
ndo_xdp_xmit so other devices can redirect frames into bcmgenet via
ring 16. It also advertises NETDEV_XDP_ACT_REDIRECT and
NETDEV_XDP_ACT_NDO_XMIT in xdp_features.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index f1e515526787..4e4fe785f0bf 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -4027,6 +4048,36 @@ static int bcmgenet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> }
> }
>
> +static int bcmgenet_xdp_xmit(struct net_device *dev, int num_frames,
> + struct xdp_frame **frames, u32 flags)
> +{
> + struct bcmgenet_priv *priv = netdev_priv(dev);
> + struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
> + int sent = 0;
> + int i;
> +
> + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
> + return -EINVAL;
> +
> + if (unlikely(!netif_running(dev)))
> + return -ENETDOWN;
> +
> + spin_lock(&ring->lock);
> +
> + for (i = 0; i < num_frames; i++) {
> + if (!bcmgenet_xdp_xmit_frame(priv, ring, frames[i], true))
> + break;
> + sent++;
> + }
> +
> + if (sent)
> + bcmgenet_xdp_ring_doorbell(priv, ring);
> +
> + spin_unlock(&ring->lock);
> +
> + return sent;
> +}
Can this path make forward progress when bcmgenet is used purely as an
XDP_REDIRECT destination (for example, redirecting from a USB NIC or
Wi-Fi adapter into the on-SoC bcmgenet)?
The xdp_tx_ring (ring DESC_INDEX) has no dedicated TX interrupt and no
TX NAPI. bcmgenet_init_tx_ring() explicitly skips the TX NAPI for this
ring:
/* Initialize Tx NAPI for priority queues only; ring DESC_INDEX
* (XDP TX) has its completions handled inline in RX NAPI.
*/
if (index != DESC_INDEX)
netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
The only place that reclaims completed descriptors on this ring is
bcmgenet_rx_poll():
static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
{
...
/* Reclaim completed XDP TX frames (ring 16 has no interrupt) */
if (priv->xdp_tx_ring.free_bds < priv->xdp_tx_ring.size)
bcmgenet_tx_reclaim(priv->dev,
&priv->xdp_tx_ring, false);
...
}
Before this patch, the only producer on this ring was XDP_TX from
bcmgenet's own RX path, so reclaim and production were always paired
with RX NAPI.
With ndo_xdp_xmit, a remote device becomes a producer on the same
ring. If bcmgenet itself has no (or very sparse) RX traffic, its RX
NAPI is never scheduled, bcmgenet_tx_reclaim() on xdp_tx_ring is never
called, free_bds drains to 0, and:
> + for (i = 0; i < num_frames; i++) {
> + if (!bcmgenet_xdp_xmit_frame(priv, ring, frames[i], true))
> + break;
> + sent++;
> + }
bcmgenet_xdp_xmit_frame() returns false on the free_bds < 1 check and
bcmgenet_xdp_xmit() returns 0, so bq_xmit_all() in the redirect caller
counts every frame as a drop.
There does not appear to be a TX completion IRQ, a watchdog, or a
napi_schedule() kick from bcmgenet_xdp_xmit() to trigger reclaim
independently of RX traffic. Is there something else that drains the
ring in this configuration, or does ndo_xdp_xmit need to schedule the
RX NAPI (or an independent reclaim path) when it enqueues frames?
> @@ -4341,7 +4393,8 @@ static int bcmgenet_probe(struct platform_device *pdev)
> NETIF_F_RXCSUM;
> dev->hw_features |= dev->features;
> dev->vlan_features |= dev->features;
> - dev->xdp_features = NETDEV_XDP_ACT_BASIC;
> + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
> + NETDEV_XDP_ACT_NDO_XMIT;
Relatedly, advertising NETDEV_XDP_ACT_NDO_XMIT tells user space and
redirect maps that this device is a usable xmit target. Should this
bit only be advertised once ring 16 has an independent completion
source, or is the expectation that users of this feature will always
have concurrent RX traffic on bcmgenet to drive the reclaim?
--
pw-bot: cr
Thread overview: 18+ messages
2026-05-06 9:55 [PATCH net-next v9 0/7] net: bcmgenet: add XDP support Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:46 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 2/7] net: bcmgenet: register xdp_rxq_info for each RX ring Nicolai Buchwitz
2026-05-06 9:55 ` [PATCH net-next v9 3/7] net: bcmgenet: add basic XDP support (PASS/DROP) Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:47 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 4/7] net: bcmgenet: add XDP_TX support Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:52 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-10 18:55 ` Jakub Kicinski
2026-05-06 9:55 ` [PATCH net-next v9 6/7] net: bcmgenet: add XDP statistics counters Nicolai Buchwitz
2026-05-07 19:17 ` sashiko-bot
2026-05-06 9:55 ` [PATCH net-next v9 7/7] net: bcmgenet: reject MTU changes incompatible with XDP Nicolai Buchwitz
2026-05-07 19:18 ` sashiko-bot