Netdev List
* Re: [PATCH net v2 2/2] bnge: remove unsupported backing store type
From: Vikas Gupta @ 2026-04-16  5:22 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: dharmender.garg, netdev, davem, edumazet, kuba, pabeni,
	andrew+netdev, horms, linux-kernel, vsrama-krishna.nemani,
	bhargava.marreddy, rajashekar.hudumula, ajit.khaparde,
	rahul-rg.gupta
In-Reply-To: <b2735cbf-34ac-4ad8-b524-2aa0f57511f8@intel.com>

On Thu, Apr 16, 2026 at 9:24 AM Przemek Kitszel
<przemyslaw.kitszel@intel.com> wrote:
>
> On 4/15/26 17:16, Vikas Gupta wrote:
> > The backing store type, BNGE_CTX_MRAV, is not applicable in Thor Ultra
> > devices. Remove it from the backing store configuration, as the firmware
>
> I guess the removed code was needed for previous devices, what is the
> impact for them?

This driver does not support previous devices. Thor Ultra devices have
split MRAV into two separate contexts, MR and AV. Support for them will
be added in a future patch series.

>
> > will not populate entities in this backing store type, due to which the
> > driver load fails.
> >
> > Fixes: 29c5b358f385 ("bng_en: Add backing store support")
> > Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
> > Reviewed-by: Dharmender Garg <dharmender.garg@broadcom.com>
> > ---
> >   drivers/net/ethernet/broadcom/bnge/bnge_rmem.c | 16 ----------------
> >   1 file changed, 16 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
> > index 94f15e08a88c..b066ee887a09 100644
> > --- a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
> > +++ b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
> > @@ -324,7 +324,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
> >       u32 l2_qps, qp1_qps, max_qps;
> >       u32 ena, entries_sp, entries;
> >       u32 srqs, max_srqs, min;
> > -     u32 num_mr, num_ah;
> >       u32 extra_srqs = 0;
> >       u32 extra_qps = 0;
> >       u32 fast_qpmd_qps;
> > @@ -390,21 +389,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
> >       if (!bnge_is_roce_en(bd))
> >               goto skip_rdma;
> >
> > -     ctxm = &ctx->ctx_arr[BNGE_CTX_MRAV];
> > -     /* 128K extra is needed to accommodate static AH context
> > -      * allocation by f/w.
> > -      */
> > -     num_mr = min_t(u32, ctxm->max_entries / 2, 1024 * 256);
> > -     num_ah = min_t(u32, num_mr, 1024 * 128);
> > -     ctxm->split_entry_cnt = BNGE_CTX_MRAV_AV_SPLIT_ENTRY + 1;
> > -     if (!ctxm->mrav_av_entries || ctxm->mrav_av_entries > num_ah)
> > -             ctxm->mrav_av_entries = num_ah;
> > -
> > -     rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, num_mr + num_ah, 2);
> > -     if (rc)
> > -             return rc;
> > -     ena |= FUNC_BACKING_STORE_CFG_REQ_ENABLES_MRAV;
> > -
> >       ctxm = &ctx->ctx_arr[BNGE_CTX_TIM];
> >       rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, l2_qps + qp1_qps + extra_qps, 1);
> >       if (rc)
>



* Re: [RFC PATCH 3/4] nfs: make nfs_page pin-aware
From: Christoph Hellwig @ 2026-04-16  5:28 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Christoph Hellwig, trond.myklebust, anna, davem, kuba, edumazet,
	pabeni, chuck.lever, jlayton, tom, okorniev, neil, dai.ngo,
	linux-nfs, netdev
In-Reply-To: <ad6cVbDGy3alQ2uK@google.com>

On Tue, Apr 14, 2026 at 07:58:13PM +0000, Pranjal Shrivastava wrote:
> > > +			req = nfs_page_create_from_page(dreq->ctx, pagevec[i], false,
> > >  							pgbase, pos, req_len);
> > >
> > 
> > A lot of this code reads pretty odd as it's overflowing the lines.
> > 
> 
> Ahh, my bad. For some reason even checkpatch didn't catch this, I'll fix
> this here and everywhere else.

checkpatch is unfortunately completely broken :(  It misses lots of
important bits, but at the same time has completely incoherent and crazy
warnings.


* Re: [RFC PATCH 4/4] nfs: allow P2PDMA in direct I/O path
From: Christoph Hellwig @ 2026-04-16  5:29 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Christoph Hellwig, trond.myklebust, anna, davem, kuba, edumazet,
	pabeni, chuck.lever, jlayton, tom, okorniev, neil, dai.ngo,
	linux-nfs, netdev
In-Reply-To: <ad6c5fI0HsHkUbKH@google.com>

On Tue, Apr 14, 2026 at 08:00:37PM +0000, Pranjal Shrivastava wrote:
> > Please split the conversion to iov_iter_extract_pages into a separate
> > preparation patch, and even series.  That is a long overdue change
> > that fixes potential data corruption in XFS.
> > 
> 
> Sure, I'll send out a series with the migration to 
> iov_iter_extract_pages, should I club this with the pin-aware + folios
> for direct I/O or send it as a separate series?

I think combining all this sounds fine.  I'd just do the P2P separately
as it is bound to get quite a bit more complicated.



* Re: [PATCH net 1/1] 8021q: free cleared egress QoS mappings safely
From: Yuan Tan @ 2026-04-16  5:35 UTC (permalink / raw)
  To: Simon Horman, Ren Wei
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, kees,
	yifanwucs, tomapufckgml, bird, ylong030
In-Reply-To: <20260415151545.GM772670@horms.kernel.org>


On 4/15/26 08:15, Simon Horman wrote:
> On Mon, Apr 13, 2026 at 05:07:20PM +0800, Ren Wei wrote:
>> From: Longxuan Yu <ylong030@ucr.edu>
>>
>> vlan_dev_set_egress_priority() leaves cleared egress priority mapping
>> nodes in the hash until device teardown. Repeated set/clear cycles with
>> distinct skb priorities therefore allocate an unbounded number of
>> vlan_priority_tci_mapping objects and leak memory.
>>
>> Delete mappings when vlan_prio is cleared instead of keeping
>> tombstones. The TX fast path and reporting paths walk the lists without
>> RTNL, so convert the egress mapping lists to RCU-protected pointers and
>> defer freeing removed nodes until after a grace period.
>>
>> Cc: stable@kernel.org
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Reported-by: Yifan Wu <yifanwucs@gmail.com>
>> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
>> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
>> Suggested-by: Xin Liu <bird@lzu.edu.cn>
>> Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
>> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>> ---
>>  include/linux/if_vlan.h  | 23 +++++++++++--------
>>  net/8021q/vlan_dev.c     | 48 +++++++++++++++++++++++-----------------
>>  net/8021q/vlan_netlink.c |  9 +++-----
>>  net/8021q/vlanproc.c     | 12 ++++++----
>>  4 files changed, 53 insertions(+), 39 deletions(-)
> There is a lot of change here. And I'd suggest splitting the patch up into
> (at least) two patches:
>
> 1. Convert mappings to use RCU
> 2. Fix bug
>
> As is, the bug fix itself is difficult to isolate amongst the other changes.
>
> Also, AI generated review suggests that this bug was introduced by commit
> b020cb488586 ("[VLAN]: Keep track of number of QoS mappings"). If so,
> it would be appropriate to use that commit in the Fixes tag.
>
Thank you very much for your review and suggestions. We will try to
revise it in this direction.
May I ask whether we should include your “Suggested-by” tag in the patch?



* Re: [PATCH v3 net] ax25: fix OOB read after address header strip in ax25_rcv()
From: Ashutosh Desai @ 2026-04-16  5:39 UTC (permalink / raw)
  To: David Laight
  Cc: netdev, linux-hams, jreuter, davem, edumazet, kuba, pabeni, horms,
	stable, linux-kernel
In-Reply-To: <20260415085921.757b48a0@pumpkin>

On Wed, 15 Apr 2026 08:59:21 +0100, David Laight wrote:
> Is it just worth linearising the skb on entry to all this code?

Thanks for the feedback, David.

skb_linearize() on entry is a nice idea for simplifying sanity checks
overall, but it wouldn't fix this particular bug on its own - the issue
is skb->len dropping to zero after skb_pull(), not non-linear data. We'd
still need a length check regardless. pskb_may_pull(skb, 2) handles both
in one call.

That said, linearizing on entry to ax25_rcv() as a cleanup to simplify
future checks sounds worthwhile - happy to send that as a separate
net-next patch.
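
To make the length argument concrete, here is a minimal userspace sketch
(not the ax25 code). struct toy_buf, toy_pull(), toy_may_pull() and
toy_read_two() are hypothetical stand-ins for an skb, skb_pull(), the
length half of pskb_may_pull() and the caller in ax25_rcv(); the sketch
models only the length accounting, not paged (non-linear) data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: linear bytes plus a remaining length. */
struct toy_buf {
	const unsigned char *data;
	size_t len;		/* bytes still readable at data */
};

/* Models skb_pull(): step over a header if enough bytes remain. */
static bool toy_pull(struct toy_buf *b, size_t n)
{
	if (n > b->len)
		return false;
	b->data += n;
	b->len -= n;
	return true;
}

/* Models the length check done by pskb_may_pull(): are n bytes readable? */
static bool toy_may_pull(const struct toy_buf *b, size_t n)
{
	return b->len >= n;
}

/* Strip an address header of hdr_len bytes, then read the next two bytes.
 * Returns 0 on success, -1 if the frame is too short at either step.
 */
int toy_read_two(struct toy_buf *b, size_t hdr_len,
		 unsigned char *b0, unsigned char *b1)
{
	if (!toy_pull(b, hdr_len))
		return -1;
	/* len may now be 0; without this check data[0..1] is out of bounds */
	if (!toy_may_pull(b, 2))
		return -1;
	*b0 = b->data[0];
	*b1 = b->data[1];
	return 0;
}
```

A frame that ends exactly at the address header passes the pull but fails
the second check, which is the case linearizing alone would not catch.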


* [PATCH RFC] xfrm: enforce SPI uniqueness for inbound SAs only
From: Antony Antony @ 2026-04-16  5:44 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Aakash Kumar S, Yan Yan
  Cc: Abed Mohammad Kamaluddin, Nathan Harold, netdev, devel,
	Antony Antony

Per RFC 4301 section 4.4.2.1, the SPI is selected by the receiving
end, which is interpreted as making SPI uniqueness an inbound-only
requirement.

Commit 94f39804d891 ("xfrm: Duplicate SPI Handling") introduced
xfrm_state_lookup_spi_proto() to fix duplicate SPI allocation for
inbound SAs with different destination addresses.  However, it enforces
global uniqueness by (spi, proto) across all states regardless of
direction, which causes SPI allocation to fail for outbound SAs when
the same (spi, proto) is already in use by an inbound SA.

When x->dir == XFRM_SA_DIR_IN, enforce SPI uniqueness via
xfrm_state_lookup_input_spi(), which is scoped to inbound SAs. SAs
created via PF_KEY, without direction, or with XFRM_SA_DIR_OUT fall back
to the pre-94f39804d891, RFC 2401 lookup by (daddr, spi, proto).

Reported-by: Yan Yan <evitayan@google.com>
Fixes: 94f39804d891 ("xfrm: Duplicate SPI Handling")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
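The rule described in the commit message can be sketched in userspace as
follows. This is an illustration only, under stated assumptions: struct
toy_sa, enum toy_dir and the toy_* functions are hypothetical, the state
table is a flat array rather than the byspi hash, and the mark and family
arguments that xfrm_state_lookup() takes are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direction values; TOY_DIR_UNSET models PF_KEY / no XFRMA_SA_DIR. */
enum toy_dir { TOY_DIR_UNSET = 0, TOY_DIR_IN, TOY_DIR_OUT };

struct toy_sa {
	uint32_t daddr;
	uint32_t spi;
	uint8_t proto;
	enum toy_dir dir;
};

/* Inbound rule: (spi, proto) must be unique among inbound SAs only. */
static bool toy_input_spi_in_use(const struct toy_sa *tbl, size_t n,
				 uint32_t spi, uint8_t proto)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dir != TOY_DIR_IN)
			continue;
		if (tbl[i].spi == spi && tbl[i].proto == proto)
			return true;
	}
	return false;
}

/* All other cases: lookup keyed by (daddr, spi, proto), any direction. */
static bool toy_tuple_in_use(const struct toy_sa *tbl, size_t n,
			     uint32_t daddr, uint32_t spi, uint8_t proto)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].daddr == daddr && tbl[i].spi == spi &&
		    tbl[i].proto == proto)
			return true;
	}
	return false;
}

/* Would candidate spi collide for new state x? */
bool toy_spi_conflicts(const struct toy_sa *tbl, size_t n,
		       const struct toy_sa *x, uint32_t spi)
{
	if (x->dir == TOY_DIR_IN)
		return toy_input_spi_in_use(tbl, n, spi, x->proto);
	return toy_tuple_in_use(tbl, n, x->daddr, spi, x->proto);
}
```

With one inbound SA holding (spi, proto), a second inbound SA is refused
the same SPI even for a different daddr, while an outbound SA towards a
different daddr can still take it.
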
 net/xfrm/xfrm_state.c | 16 ++++++++++++++--
 net/xfrm/xfrm_user.c  |  6 +++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1748d374abca..b1ec95141512 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1698,15 +1698,21 @@ struct xfrm_state *xfrm_state_lookup_byspi(struct net *net, __be32 spi,
 }
 EXPORT_SYMBOL(xfrm_state_lookup_byspi);
 
-static struct xfrm_state *xfrm_state_lookup_spi_proto(struct net *net, __be32 spi, u8 proto)
+static struct xfrm_state *xfrm_state_lookup_input_spi(struct net *net,
+						      __be32 spi, u8 proto,
+						      u8 dir)
 {
 	struct xfrm_state *x;
 	unsigned int i;
 
 	for (i = 0; i <= net->xfrm.state_hmask; i++) {
 		hlist_for_each_entry(x, xfrm_state_deref_prot(net->xfrm.state_byspi, net) + i, byspi) {
+			if (x->dir != dir)
+				continue;
+
 			if (x->id.spi == spi && x->id.proto == proto)
 				return x;
+
 		}
 	}
 	return NULL;
@@ -2578,6 +2584,7 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high,
 	struct xfrm_state *x0;
 	int err = -ENOENT;
 	u32 range = high - low + 1;
+	u32 mark = x->mark.v & x->mark.m;
 	__be32 newspi = 0;
 
 	spin_lock_bh(&x->lock);
@@ -2599,7 +2606,12 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high,
 		newspi = htonl(spi);
 
 		spin_lock_bh(&net->xfrm.xfrm_state_lock);
-		x0 = xfrm_state_lookup_spi_proto(net, newspi, x->id.proto);
+		if (x->dir == XFRM_SA_DIR_IN)
+			x0 = xfrm_state_lookup_input_spi(net, newspi,
+							 x->id.proto, x->dir);
+		else
+			x0 = xfrm_state_lookup(net, mark, &x->id.daddr, newspi,
+					       x->id.proto, x->props.family);
 		if (!x0) {
 			x->id.spi = newspi;
 			h = xfrm_spi_hash(net, &x->id.daddr, newspi, x->id.proto, x->props.family);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d56450f61669..f9db2d2c392b 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1883,13 +1883,13 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto out_noput;
 	}
 
+	if (attrs[XFRMA_SA_DIR])
+		x->dir = nla_get_u8(attrs[XFRMA_SA_DIR]);
+
 	err = xfrm_alloc_spi(x, p->min, p->max, extack);
 	if (err)
 		goto out;
 
-	if (attrs[XFRMA_SA_DIR])
-		x->dir = nla_get_u8(attrs[XFRMA_SA_DIR]);
-
 	resp_skb = xfrm_state_netlink(skb, x, nlh->nlmsg_seq);
 	if (IS_ERR(resp_skb)) {
 		err = PTR_ERR(resp_skb);

---
base-commit: 426c355742f02cf743b347d9d7dbdc1bfbfa31ef
change-id: 20260330-alloc-spi-dir-3b6e2f4b34e9

Best regards,
--  
Antony Antony <antony.antony@secunet.com>



* [PATCH net-next v7 3/7] net: bcmgenet: add basic XDP support (PASS/DROP)
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Add XDP program attachment via ndo_bpf and execute XDP programs in the
RX path. XDP_PASS builds an SKB from the xdp_buff (handling
xdp_adjust_head/tail), XDP_DROP returns the page to page_pool without
SKB allocation.

XDP_TX and XDP_REDIRECT are not yet supported and return XDP_ABORTED.

Advertise NETDEV_XDP_ACT_BASIC in xdp_features.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
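For readers skimming the diff, the verdict handling at this stage of the
series can be modelled in userspace roughly as below. This is a sketch
under assumptions, not driver code: the toy_* names are hypothetical, the
BPF program is replaced by a plain function pointer, and the page_pool
return and trace_xdp_exception() call are reduced to flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as enum xdp_action in the UAPI header. */
enum toy_xdp_action {
	TOY_XDP_ABORTED = 0,
	TOY_XDP_DROP,
	TOY_XDP_PASS,
	TOY_XDP_TX,
	TOY_XDP_REDIRECT,
};

/* Hypothetical stand-in for an attached program. */
typedef unsigned int (*toy_prog_fn)(void);

struct toy_result {
	enum toy_xdp_action verdict;
	bool page_recycled;	/* models page_pool_put_full_page() */
	bool exception_traced;	/* models trace_xdp_exception() */
};

/* Example programs used to exercise the dispatcher. */
unsigned int toy_prog_drop(void) { return TOY_XDP_DROP; }
unsigned int toy_prog_tx(void) { return TOY_XDP_TX; }

struct toy_result toy_run_xdp(toy_prog_fn prog)
{
	struct toy_result r = { TOY_XDP_PASS, false, false };

	/* No program attached: behave as XDP_PASS, caller builds the SKB. */
	if (!prog)
		return r;

	switch (prog()) {
	case TOY_XDP_PASS:
		return r;
	case TOY_XDP_DROP:
		r.verdict = TOY_XDP_DROP;
		r.page_recycled = true;
		return r;
	default:	/* TX, REDIRECT and unknown values are not handled yet */
	case TOY_XDP_ABORTED:
		r.verdict = TOY_XDP_ABORTED;
		r.page_recycled = true;
		r.exception_traced = true;
		return r;
	}
}
```

Only the PASS verdict leaves the page with the caller; every other
outcome recycles it before returning.
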
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 129 +++++++++++++++---
 .../net/ethernet/broadcom/genet/bcmgenet.h    |   4 +
 2 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index dd00196b9d4b..b09e5c3c3543 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -35,6 +35,8 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/phy.h>
+#include <linux/bpf_trace.h>
+#include <linux/filter.h>
 
 #include <linux/unaligned.h>
 
@@ -2273,6 +2275,56 @@ static int bcmgenet_rx_refill(struct bcmgenet_rx_ring *ring,
 	return 0;
 }
 
+static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
+					      struct xdp_buff *xdp)
+{
+	unsigned int metasize;
+	struct sk_buff *skb;
+
+	skb = napi_build_skb(xdp->data_hard_start, PAGE_SIZE);
+	if (unlikely(!skb))
+		return NULL;
+
+	skb_mark_for_recycle(skb);
+
+	metasize = xdp->data - xdp->data_meta;
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	__skb_put(skb, xdp->data_end - xdp->data);
+
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	return skb;
+}
+
+static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
+				     struct bpf_prog *prog,
+				     struct xdp_buff *xdp,
+				     struct page *rx_page)
+{
+	unsigned int act;
+
+	if (!prog)
+		return XDP_PASS;
+
+	act = bpf_prog_run_xdp(prog, xdp);
+
+	switch (act) {
+	case XDP_PASS:
+		return XDP_PASS;
+	case XDP_DROP:
+		page_pool_put_full_page(ring->page_pool, rx_page, true);
+		return XDP_DROP;
+	default:
+		bpf_warn_invalid_xdp_action(ring->priv->dev, prog, act);
+		fallthrough;
+	case XDP_ABORTED:
+		trace_xdp_exception(ring->priv->dev, prog, act);
+		page_pool_put_full_page(ring->page_pool, rx_page, true);
+		return XDP_ABORTED;
+	}
+}
+
 /* bcmgenet_desc_rx - descriptor based rx process.
  * this could be called from bottom half, or from NAPI polling method.
  */
@@ -2282,6 +2334,7 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 	struct bcmgenet_rx_stats64 *stats = &ring->stats64;
 	struct bcmgenet_priv *priv = ring->priv;
 	struct net_device *dev = priv->dev;
+	struct bpf_prog *xdp_prog;
 	struct enet_cb *cb;
 	struct sk_buff *skb;
 	u32 dma_length_status;
@@ -2292,6 +2345,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 	unsigned int p_index, mask;
 	unsigned int discards;
 
+	xdp_prog = READ_ONCE(priv->xdp_prog);
+
 	/* Clear status before servicing to reduce spurious interrupts */
 	mask = 1 << (UMAC_IRQ1_RX_INTR_SHIFT + ring->index);
 	bcmgenet_intrl2_1_writel(priv, mask, INTRL2_CPU_CLEAR);
@@ -2323,9 +2378,12 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 	       (rxpktprocessed < budget)) {
 		struct status_64 *status;
 		struct page *rx_page;
+		unsigned int xdp_act;
 		unsigned int rx_off;
-		__be16 rx_csum;
+		struct xdp_buff xdp;
+		__be16 rx_csum = 0;
 		void *hard_start;
+		int pkt_len;
 
 		cb = &priv->rx_cbs[ring->read_ptr];
 
@@ -2402,30 +2460,34 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 			goto next;
 		} /* error packet */
 
-		/* Build SKB from the page - data starts at hard_start,
-		 * frame begins after RSB(64) + pad(2) = 66 bytes.
+		pkt_len = len - GENET_RSB_PAD;
+		if (priv->crc_fwd_en)
+			pkt_len -= ETH_FCS_LEN;
+
+		/* Save rx_csum before XDP runs - an XDP program
+		 * could overwrite the RSB via bpf_xdp_adjust_head.
 		 */
-		skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
-		if (unlikely(!skb)) {
-			BCMGENET_STATS64_INC(stats, dropped);
-			page_pool_put_full_page(ring->page_pool, rx_page,
-						true);
-			goto next;
-		}
+		if (dev->features & NETIF_F_RXCSUM)
+			rx_csum = (__force __be16)(status->rx_csum & 0xffff);
 
-		skb_mark_for_recycle(skb);
+		xdp_init_buff(&xdp, PAGE_SIZE, &ring->xdp_rxq);
+		xdp_prepare_buff(&xdp, page_address(rx_page),
+				 GENET_RX_HEADROOM, pkt_len, true);
 
-		/* Reserve the RSB + pad, then set the data length */
-		skb_reserve(skb, GENET_RSB_PAD);
-		__skb_put(skb, len - GENET_RSB_PAD);
+		xdp_act = bcmgenet_run_xdp(ring, xdp_prog, &xdp, rx_page);
+		if (xdp_act != XDP_PASS)
+			goto next;
 
-		if (priv->crc_fwd_en) {
-			skb_trim(skb, skb->len - ETH_FCS_LEN);
+		skb = bcmgenet_xdp_build_skb(ring, &xdp);
+		if (unlikely(!skb)) {
+			BCMGENET_STATS64_INC(stats, dropped);
+			page_pool_put_full_page(ring->page_pool,
+						rx_page, true);
+			goto next;
 		}
 
 		/* Set up checksum offload */
 		if (dev->features & NETIF_F_RXCSUM) {
-			rx_csum = (__force __be16)(status->rx_csum & 0xffff);
 			if (rx_csum) {
 				skb->csum = (__force __wsum)ntohs(rx_csum);
 				skb->ip_summed = CHECKSUM_COMPLETE;
@@ -3743,6 +3805,37 @@ static int bcmgenet_change_carrier(struct net_device *dev, bool new_carrier)
 	return 0;
 }
 
+static int bcmgenet_xdp_setup(struct net_device *dev,
+			      struct netdev_bpf *xdp)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+	struct bpf_prog *prog = xdp->prog;
+
+	if (prog && dev->mtu > PAGE_SIZE - GENET_RX_HEADROOM -
+	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) {
+		NL_SET_ERR_MSG_MOD(xdp->extack,
+				   "MTU too large for single-page XDP buffer");
+		return -EOPNOTSUPP;
+	}
+
+	old_prog = xchg(&priv->xdp_prog, prog);
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	return 0;
+}
+
+static int bcmgenet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bcmgenet_xdp_setup(dev, xdp);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static const struct net_device_ops bcmgenet_netdev_ops = {
 	.ndo_open		= bcmgenet_open,
 	.ndo_stop		= bcmgenet_close,
@@ -3754,6 +3847,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
 	.ndo_set_features	= bcmgenet_set_features,
 	.ndo_get_stats64	= bcmgenet_get_stats64,
 	.ndo_change_carrier	= bcmgenet_change_carrier,
+	.ndo_bpf		= bcmgenet_xdp,
 };
 
 /* GENET hardware parameters/characteristics */
@@ -4056,6 +4150,7 @@ static int bcmgenet_probe(struct platform_device *pdev)
 			 NETIF_F_RXCSUM;
 	dev->hw_features |= dev->features;
 	dev->vlan_features |= dev->features;
+	dev->xdp_features = NETDEV_XDP_ACT_BASIC;
 
 	netdev_sw_irq_coalesce_default_on(dev);
 
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 82a6d29f481d..1459473ac1b0 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -16,6 +16,7 @@
 #include <linux/dim.h>
 #include <linux/ethtool.h>
 #include <net/page_pool/helpers.h>
+#include <linux/bpf.h>
 #include <net/xdp.h>
 
 #include "../unimac.h"
@@ -671,6 +672,9 @@ struct bcmgenet_priv {
 	u8 sopass[SOPASS_MAX];
 
 	struct bcmgenet_mib_counters mib;
+
+	/* XDP */
+	struct bpf_prog *xdp_prog;
 };
 
 static inline bool bcmgenet_has_40bits(struct bcmgenet_priv *priv)
-- 
2.51.0



* [PATCH net-next v7 0/7] net: bcmgenet: add XDP support
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, bpf

Add XDP support to the bcmgenet driver, covering XDP_PASS, XDP_DROP,
XDP_TX, XDP_REDIRECT, and ndo_xdp_xmit.

The first patch converts the RX path from the existing kmalloc-based
allocation to page_pool, which is a prerequisite for XDP. The remaining
patches incrementally add XDP functionality and per-action statistics.

Tested on Raspberry Pi CM4 (BCM2711, bcmgenet, 1Gbps link):
- XDP_PASS: 943 Mbit/s TX, 935 Mbit/s RX (no regression vs baseline)
- XDP_PASS latency: 0.164ms avg, 0% packet loss
- XDP_DROP: all inbound traffic blocked as expected
- XDP_TX: TX counter increments (packet reflection working)
- Link flap with XDP attached: no errors
- Program swap under iperf3 load: no errors
- Upstream XDP selftests (xdp.py): pass_sb, drop_sb, tx_sb passing
- XDP-based EtherCAT master (~37 kHz cycle rate, all packet processing
  in BPF/XDP), stable over multiple days

Previous versions:
  v6: https://lore.kernel.org/netdev/20260406083536.839517-1-nb@tipi-net.de/
  v5: https://lore.kernel.org/netdev/20260328230513.415790-1-nb@tipi-net.de/
  v4: https://lore.kernel.org/netdev/20260323120539.136029-1-nb@tipi-net.de/
  v3: https://lore.kernel.org/netdev/20260319115402.353509-1-nb@tipi-net.de/
  v2: https://lore.kernel.org/netdev/20260315214914.1555777-1-nb@tipi-net.de/
  v1: https://lore.kernel.org/netdev/20260313092101.1344954-1-nb@tipi-net.de/

Changes since v6:
  - Removed GENET_XDP_HEADROOM alias, use XDP_PACKET_HEADROOM
    directly. (Jakub Kicinski)
  - Dropped redundant __GFP_NOWARN from page_pool_alloc_pages(),
    page_pool adds it automatically. (Jakub Kicinski)
  - Removed floating code block in desc_rx, moved variables to outer
    scope. (Jakub Kicinski)
  - Make bcmgenet_run_xdp() return XDP_PASS when no program is set,
    removing the if (xdp_prog) indentation from desc_rx.
    (Jakub Kicinski)

Changes since v5:
  - Refactored desc_rx: always prepare xdp_buff and use
    bcmgenet_xdp_build_skb for both XDP and non-XDP paths, treating
    no-prog as XDP_PASS. (Jakub Kicinski)
  - Removed synchronize_net() before bpf_prog_put(), RCU handles
    the grace period. (Jakub Kicinski)
  - Save status->rx_csum before running XDP program to prevent
    bpf_xdp_adjust_head from corrupting the RSB checksum.
    (Jakub Kicinski)
  - Tightened TSB headroom check to include sizeof(struct xdp_frame).
    (Jakub Kicinski)
  - Fixed reclaim gating: check for pending frames on the XDP TX ring
    instead of priv->xdp_prog, so in-flight frames are still reclaimed
    after XDP program detach. (Jakub Kicinski)
  - Removed dead len -= ETH_FCS_LEN in patch 1. (Mohsin Bashir)
  - Added patch 7: minimal ndo_change_mtu that rejects MTU values
    incompatible with XDP when a program is attached. (Mohsin Bashir,
    Florian Fainelli)

Changes since v4:
  - Fixed unused variable warning: moved tx_ring declaration from
    patch 4 to patch 5 where it is first used. (Jakub Kicinski)

Changes since v3:
  - Fixed xdp_prepare_buff() called with meta_valid=false, causing
    bcmgenet_xdp_build_skb() to compute metasize=UINT_MAX and corrupt
    skb meta_len. Now passes true. (Simon Horman)
  - Removed bcmgenet_dump_tx_queue() for ring 16 in bcmgenet_timeout().
    Ring 16 has no netdev TX queue, so netdev_get_tx_queue(dev, 16)
    accessed beyond the allocated _tx array. (Simon Horman)
  - Fixed checkpatch alignment warnings in patches 4 and 5.
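
The metasize=UINT_MAX symptom from the v3 item above can be reproduced in
a small userspace model. This is a sketch: struct toy_xdp_buff and the
toy_* functions are hypothetical copies of the relevant fields and
arithmetic, assuming xdp_prepare_buff() stores data + 1 in data_meta as
its "no metadata" marker when meta_valid is false.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical subset of struct xdp_buff. */
struct toy_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_meta;
};

/* Models xdp_prepare_buff(): lay out the pointers for one buffer. */
void toy_prepare_buff(struct toy_xdp_buff *xdp, unsigned char *hard_start,
		      int headroom, int data_len, bool meta_valid)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	/* data + 1 marks "metadata not supported" */
	xdp->data_meta = meta_valid ? data : data + 1;
}

/* Same subtraction as in bcmgenet_xdp_build_skb(), stored as unsigned. */
unsigned int toy_metasize(const struct toy_xdp_buff *xdp)
{
	return (unsigned int)(xdp->data - xdp->data_meta);
}
```

With meta_valid=false the difference is -1, which wraps to UINT_MAX once
stored in an unsigned int; with meta_valid=true it is 0 until a program
moves data_meta.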

Changes since v2:
  - Fixed page leak on partial bcmgenet_alloc_rx_buffers() failure:
    free already-allocated rx_cbs before destroying page pool.
    (Simon Horman)
  - Fixed GENET_Q16_TX_BD_CNT defined as 64 instead of 32.
    (Simon Horman)
  - Moved XDP TX ring to a separate struct member (xdp_tx_ring)
    instead of expanding tx_rings[] to DESC_INDEX+1. (Justin Chen)
  - Added synchronize_net() before bpf_prog_put() in XDP prog swap.
  - Removed goto drop_page inside switch; inlined page_pool_put
    calls in each failure path. (Justin Chen)
  - Removed unnecessary curly braces around case XDP_TX. (Justin Chen)
  - Moved int err hoisting from patch 2 to patch 1. (Justin Chen)
  - Kept return type on same line as function name, per driver
    convention. (Justin Chen)
  - XDP TX packets/bytes now counted in TX reclaim for standard
    network statistics.

Changes since v1:
  - Fixed tx_rings[DESC_INDEX] out-of-bounds access. Expanded array
    to DESC_INDEX+1 and initialized ring 16 with dedicated BDs.
  - Use ring 16 (hardware default descriptor ring) for XDP TX,
    isolating from normal SKB TX queues.
  - Piggyback ring 16 TX completion on RX NAPI poll (INTRL2_1 bit
    collision with RX ring 0).
  - Fixed ring 16 TX reclaim: skip INTRL2_1 clear, skip BQL
    completion, use non-destructive reclaim in RX poll path.
  - Prepend zeroed TSB before XDP TX frame data (TBUF_64B_EN requires
    64-byte struct status_64 prefix on all TX buffers).
  - Tested with upstream XDP selftests (xdp.py): pass_sb, drop_sb,
    tx_sb all passing. The multi-buffer tests (pass_mb, drop_mb,
    tx_mb) fail because bcmgenet does not support jumbo frames /
    MTU changes; I plan to add ndo_change_mtu support in a follow-up
    series.

Nicolai Buchwitz (7):
  net: bcmgenet: convert RX path to page_pool
  net: bcmgenet: register xdp_rxq_info for each RX ring
  net: bcmgenet: add basic XDP support (PASS/DROP)
  net: bcmgenet: add XDP_TX support
  net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
  net: bcmgenet: add XDP statistics counters
  net: bcmgenet: reject MTU changes incompatible with XDP

 drivers/net/ethernet/broadcom/Kconfig         |   1 +
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 637 +++++++++++++++---
 .../net/ethernet/broadcom/genet/bcmgenet.h    |  19 +
 3 files changed, 559 insertions(+), 98 deletions(-)

--
2.51.0



* [PATCH net-next v7 4/7] net: bcmgenet: add XDP_TX support
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Implement XDP_TX using ring 16 (DESC_INDEX), the hardware default
descriptor ring, dedicated to XDP TX for isolation from SKB TX queues.

Ring 16 gets 32 BDs carved from ring 0's allocation. TX completion is
piggybacked on RX NAPI poll since ring 16's INTRL2_1 bit collides with
RX ring 0, similar to how bnxt, ice, and other XDP drivers handle TX
completion within the RX poll path.

The GENET MAC has TBUF_64B_EN set globally, requiring every TX buffer
to start with a 64-byte struct status_64 (TSB). For local XDP_TX, the
TSB is prepended by backing xdp->data into the RSB area (unused after
BPF execution) and zeroing it. For foreign frames redirected from other
devices, the TSB is written into the xdp_frame headroom.

The page_pool DMA direction is changed from DMA_FROM_DEVICE to
DMA_BIDIRECTIONAL to allow TX reuse of the existing DMA mapping.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
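The TSB handling for redirected frames described above comes down to a
headroom check plus a zeroed 64-byte prefix. A minimal userspace sketch,
assuming hypothetical names (struct toy_frame, toy_prepend_tsb(),
TOY_TSB_LEN) and leaving out the DMA mapping and descriptor writes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TSB_LEN 64	/* size of struct status_64 per the commit message */

/* Hypothetical stand-in for the xdp_frame fields used here. */
struct toy_frame {
	unsigned char *data;	/* first packet byte */
	size_t len;		/* packet length */
	size_t headroom;	/* writable bytes in front of data */
};

/* Zero a TSB directly in front of the packet.  Returns the address the
 * hardware would be given and stores the total length in *dma_len, or
 * returns NULL when the headroom cannot hold the TSB.
 */
unsigned char *toy_prepend_tsb(const struct toy_frame *f, size_t *dma_len)
{
	unsigned char *tsb;

	if (f->headroom < TOY_TSB_LEN)
		return NULL;

	tsb = f->data - TOY_TSB_LEN;
	memset(tsb, 0, TOY_TSB_LEN);
	*dma_len = f->len + TOY_TSB_LEN;
	return tsb;
}
```

The packet bytes themselves are untouched; only the 64 bytes in front of
them are overwritten, and the transmit length grows by the same amount.
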
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 224 ++++++++++++++++--
 .../net/ethernet/broadcom/genet/bcmgenet.h    |   3 +
 2 files changed, 205 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index b09e5c3c3543..3f3682e39267 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -48,8 +48,10 @@
 
 #define GENET_Q0_RX_BD_CNT	\
 	(TOTAL_DESC - priv->hw_params->rx_queues * priv->hw_params->rx_bds_per_q)
+#define GENET_Q16_TX_BD_CNT	32
 #define GENET_Q0_TX_BD_CNT	\
-	(TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q)
+	(TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q \
+	 - GENET_Q16_TX_BD_CNT)
 
 #define RX_BUF_LENGTH		2048
 #define SKB_ALIGNMENT		32
@@ -1892,6 +1894,14 @@ static struct sk_buff *bcmgenet_free_tx_cb(struct device *dev,
 		if (cb == GENET_CB(skb)->last_cb)
 			return skb;
 
+	} else if (cb->xdpf) {
+		if (cb->xdp_dma_map)
+			dma_unmap_single(dev, dma_unmap_addr(cb, dma_addr),
+					 dma_unmap_len(cb, dma_len),
+					 DMA_TO_DEVICE);
+		dma_unmap_addr_set(cb, dma_addr, 0);
+		xdp_return_frame(cb->xdpf);
+		cb->xdpf = NULL;
 	} else if (dma_unmap_addr(cb, dma_addr)) {
 		dma_unmap_page(dev,
 			       dma_unmap_addr(cb, dma_addr),
@@ -1924,10 +1934,16 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
 	unsigned int pkts_compl = 0;
 	unsigned int txbds_ready;
 	unsigned int c_index;
+	struct enet_cb *tx_cb;
 	struct sk_buff *skb;
 
-	/* Clear status before servicing to reduce spurious interrupts */
-	bcmgenet_intrl2_1_writel(priv, (1 << ring->index), INTRL2_CPU_CLEAR);
+	/* Clear status before servicing to reduce spurious interrupts.
+	 * Ring DESC_INDEX (XDP TX) has no interrupt; skip the clear to
+	 * avoid clobbering RX ring 0's bit at the same position.
+	 */
+	if (ring->index != DESC_INDEX)
+		bcmgenet_intrl2_1_writel(priv, BIT(ring->index),
+					 INTRL2_CPU_CLEAR);
 
 	/* Compute how many buffers are transmitted since last xmit call */
 	c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX)
@@ -1940,8 +1956,15 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
 
 	/* Reclaim transmitted buffers */
 	while (txbds_processed < txbds_ready) {
-		skb = bcmgenet_free_tx_cb(&priv->pdev->dev,
-					  &priv->tx_cbs[ring->clean_ptr]);
+		tx_cb = &priv->tx_cbs[ring->clean_ptr];
+		if (tx_cb->xdpf) {
+			pkts_compl++;
+			bytes_compl += tx_cb->xdp_dma_map
+				? tx_cb->xdpf->len
+				: tx_cb->xdpf->len -
+				  sizeof(struct status_64);
+		}
+		skb = bcmgenet_free_tx_cb(&priv->pdev->dev, tx_cb);
 		if (skb) {
 			pkts_compl++;
 			bytes_compl += GENET_CB(skb)->bytes_sent;
@@ -1963,8 +1986,11 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
 	u64_stats_add(&stats->bytes, bytes_compl);
 	u64_stats_update_end(&stats->syncp);
 
-	netdev_tx_completed_queue(netdev_get_tx_queue(dev, ring->index),
-				  pkts_compl, bytes_compl);
+	/* Ring DESC_INDEX (XDP TX) has no netdev TX queue; skip BQL */
+	if (ring->index != DESC_INDEX)
+		netdev_tx_completed_queue(netdev_get_tx_queue(dev,
+							      ring->index),
+					  pkts_compl, bytes_compl);
 
 	return txbds_processed;
 }
@@ -2041,6 +2067,9 @@ static void bcmgenet_tx_reclaim_all(struct net_device *dev)
 	do {
 		bcmgenet_tx_reclaim(dev, &priv->tx_rings[i++], true);
 	} while (i <= priv->hw_params->tx_queues && netif_is_multiqueue(dev));
+
+	/* Also reclaim XDP TX ring */
+	bcmgenet_tx_reclaim(dev, &priv->xdp_tx_ring, true);
 }
 
 /* Reallocate the SKB to put enough headroom in front of it and insert
@@ -2297,11 +2326,96 @@ static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
 	return skb;
 }
 
+static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
+				     struct xdp_frame *xdpf, bool dma_map)
+{
+	struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
+	struct device *kdev = &priv->pdev->dev;
+	struct enet_cb *tx_cb_ptr;
+	dma_addr_t mapping;
+	unsigned int dma_len;
+	u32 len_stat;
+
+	spin_lock(&ring->lock);
+
+	if (ring->free_bds < 1) {
+		spin_unlock(&ring->lock);
+		return false;
+	}
+
+	tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
+
+	if (dma_map) {
+		void *tsb_start;
+
+		/* The GENET MAC has TBUF_64B_EN set globally, so hardware
+		 * expects a 64-byte TSB prefix on every TX buffer.  For
+		 * redirected frames (ndo_xdp_xmit) we prepend a zeroed TSB
+		 * using the frame's headroom.
+		 */
+		if (unlikely(xdpf->headroom < sizeof(struct status_64))) {
+			bcmgenet_put_txcb(priv, ring);
+			spin_unlock(&ring->lock);
+			return false;
+		}
+
+		tsb_start = xdpf->data - sizeof(struct status_64);
+		memset(tsb_start, 0, sizeof(struct status_64));
+
+		dma_len = xdpf->len + sizeof(struct status_64);
+		mapping = dma_map_single(kdev, tsb_start, dma_len,
+					 DMA_TO_DEVICE);
+		if (dma_mapping_error(kdev, mapping)) {
+			tx_cb_ptr->skb = NULL;
+			tx_cb_ptr->xdpf = NULL;
+			bcmgenet_put_txcb(priv, ring);
+			spin_unlock(&ring->lock);
+			return false;
+		}
+	} else {
+		struct page *page = virt_to_page(xdpf->data);
+
+		/* For local XDP_TX the caller already prepended the TSB
+		 * into xdpf->data/len, so dma_len == xdpf->len.
+		 */
+		dma_len = xdpf->len;
+		mapping = page_pool_get_dma_addr(page) +
+			  sizeof(*xdpf) + xdpf->headroom;
+		dma_sync_single_for_device(kdev, mapping, dma_len,
+					   DMA_BIDIRECTIONAL);
+	}
+
+	dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
+	dma_unmap_len_set(tx_cb_ptr, dma_len, dma_len);
+	tx_cb_ptr->skb = NULL;
+	tx_cb_ptr->xdpf = xdpf;
+	tx_cb_ptr->xdp_dma_map = dma_map;
+
+	len_stat = (dma_len << DMA_BUFLENGTH_SHIFT) |
+		   (priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT) |
+		   DMA_TX_APPEND_CRC | DMA_SOP | DMA_EOP;
+
+	dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping, len_stat);
+
+	ring->free_bds--;
+	ring->prod_index++;
+	ring->prod_index &= DMA_P_INDEX_MASK;
+
+	bcmgenet_tdma_ring_writel(priv, ring->index, ring->prod_index,
+				  TDMA_PROD_INDEX);
+
+	spin_unlock(&ring->lock);
+
+	return true;
+}
+
 static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 				     struct bpf_prog *prog,
 				     struct xdp_buff *xdp,
 				     struct page *rx_page)
 {
+	struct bcmgenet_priv *priv = ring->priv;
+	struct xdp_frame *xdpf;
 	unsigned int act;
 
 	if (!prog)
@@ -2312,14 +2426,42 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 	switch (act) {
 	case XDP_PASS:
 		return XDP_PASS;
+	case XDP_TX:
+		/* Prepend a zeroed TSB (Transmit Status Block).  The GENET
+		 * MAC has TBUF_64B_EN set globally, so hardware expects every
+		 * TX buffer to begin with a 64-byte struct status_64.  Back
+		 * up xdp->data into the RSB area (which is no longer needed
+		 * after the BPF program ran) and zero it.
+		 */
+		if (xdp->data - xdp->data_hard_start <
+		    sizeof(struct status_64) + sizeof(struct xdp_frame)) {
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
+			return XDP_DROP;
+		}
+		xdp->data -= sizeof(struct status_64);
+		xdp->data_meta -= sizeof(struct status_64);
+		memset(xdp->data, 0, sizeof(struct status_64));
+
+		xdpf = xdp_convert_buff_to_frame(xdp);
+		if (unlikely(!xdpf)) {
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
+			return XDP_DROP;
+		}
+		if (unlikely(!bcmgenet_xdp_xmit_frame(priv, xdpf, false))) {
+			xdp_return_frame_rx_napi(xdpf);
+			return XDP_DROP;
+		}
+		return XDP_TX;
 	case XDP_DROP:
 		page_pool_put_full_page(ring->page_pool, rx_page, true);
 		return XDP_DROP;
 	default:
-		bpf_warn_invalid_xdp_action(ring->priv->dev, prog, act);
+		bpf_warn_invalid_xdp_action(priv->dev, prog, act);
 		fallthrough;
 	case XDP_ABORTED:
-		trace_xdp_exception(ring->priv->dev, prog, act);
+		trace_xdp_exception(priv->dev, prog, act);
 		page_pool_put_full_page(ring->page_pool, rx_page, true);
 		return XDP_ABORTED;
 	}
@@ -2537,9 +2679,15 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 {
 	struct bcmgenet_rx_ring *ring = container_of(napi,
 			struct bcmgenet_rx_ring, napi);
+	struct bcmgenet_priv *priv = ring->priv;
 	struct dim_sample dim_sample = {};
 	unsigned int work_done;
 
+	/* Reclaim completed XDP TX frames (ring 16 has no interrupt) */
+	if (priv->xdp_tx_ring.free_bds < priv->xdp_tx_ring.size)
+		bcmgenet_tx_reclaim(priv->dev,
+				    &priv->xdp_tx_ring, false);
+
 	work_done = bcmgenet_desc_rx(ring, budget);
 
 	if (work_done < budget && napi_complete_done(napi, work_done))
@@ -2770,10 +2918,11 @@ static void bcmgenet_init_rx_coalesce(struct bcmgenet_rx_ring *ring)
 
 /* Initialize a Tx ring along with corresponding hardware registers */
 static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
+				  struct bcmgenet_tx_ring *ring,
 				  unsigned int index, unsigned int size,
-				  unsigned int start_ptr, unsigned int end_ptr)
+				  unsigned int start_ptr,
+				  unsigned int end_ptr)
 {
-	struct bcmgenet_tx_ring *ring = &priv->tx_rings[index];
 	u32 words_per_bd = WORDS_PER_BD(priv);
 	u32 flow_period_val = 0;
 
@@ -2814,8 +2963,11 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
 	bcmgenet_tdma_ring_writel(priv, index, end_ptr * words_per_bd - 1,
 				  DMA_END_ADDR);
 
-	/* Initialize Tx NAPI */
-	netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
+	/* Initialize Tx NAPI for priority queues only; ring DESC_INDEX
+	 * (XDP TX) has its completions handled inline in RX NAPI.
+	 */
+	if (index != DESC_INDEX)
+		netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
 }
 
 static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
@@ -2827,7 +2979,7 @@ static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
 		.pool_size = ring->size,
 		.nid = NUMA_NO_NODE,
 		.dev = &priv->pdev->dev,
-		.dma_dir = DMA_FROM_DEVICE,
+		.dma_dir = DMA_BIDIRECTIONAL,
 		.offset = XDP_PACKET_HEADROOM,
 		.max_len = RX_BUF_LENGTH,
 	};
@@ -2961,6 +3113,7 @@ static int bcmgenet_tdma_disable(struct bcmgenet_priv *priv)
 
 	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
 	mask = (1 << (priv->hw_params->tx_queues + 1)) - 1;
+	mask |= BIT(DESC_INDEX);
 	mask = (mask << DMA_RING_BUF_EN_SHIFT) | DMA_EN;
 	reg &= ~mask;
 	bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
@@ -3006,14 +3159,18 @@ static int bcmgenet_rdma_disable(struct bcmgenet_priv *priv)
  * with queue 1 being the highest priority queue.
  *
  * Queue 0 is the default Tx queue with
- * GENET_Q0_TX_BD_CNT = 256 - 4 * 32 = 128 descriptors.
+ * GENET_Q0_TX_BD_CNT = 256 - 4 * 32 - 32 = 96 descriptors.
+ *
+ * Ring 16 (DESC_INDEX) is used for XDP TX with
+ * GENET_Q16_TX_BD_CNT = 32 descriptors.
  *
  * The transmit control block pool is then partitioned as follows:
- * - Tx queue 0 uses tx_cbs[0..127]
- * - Tx queue 1 uses tx_cbs[128..159]
- * - Tx queue 2 uses tx_cbs[160..191]
- * - Tx queue 3 uses tx_cbs[192..223]
- * - Tx queue 4 uses tx_cbs[224..255]
+ * - Tx queue 0 uses tx_cbs[0..95]
+ * - Tx queue 1 uses tx_cbs[96..127]
+ * - Tx queue 2 uses tx_cbs[128..159]
+ * - Tx queue 3 uses tx_cbs[160..191]
+ * - Tx queue 4 uses tx_cbs[192..223]
+ * - Tx queue 16 uses tx_cbs[224..255]
  */
 static void bcmgenet_init_tx_queues(struct net_device *dev)
 {
@@ -3026,7 +3183,8 @@ static void bcmgenet_init_tx_queues(struct net_device *dev)
 
 	/* Initialize Tx priority queues */
 	for (i = 0; i <= priv->hw_params->tx_queues; i++) {
-		bcmgenet_init_tx_ring(priv, i, end - start, start, end);
+		bcmgenet_init_tx_ring(priv, &priv->tx_rings[i],
+				      i, end - start, start, end);
 		start = end;
 		end += priv->hw_params->tx_bds_per_q;
 		dma_priority[DMA_PRIO_REG_INDEX(i)] |=
@@ -3034,13 +3192,19 @@ static void bcmgenet_init_tx_queues(struct net_device *dev)
 			<< DMA_PRIO_REG_SHIFT(i);
 	}
 
+	/* Initialize ring 16 (descriptor ring) for XDP TX */
+	bcmgenet_init_tx_ring(priv, &priv->xdp_tx_ring,
+			      DESC_INDEX, GENET_Q16_TX_BD_CNT,
+			      TOTAL_DESC - GENET_Q16_TX_BD_CNT, TOTAL_DESC);
+
 	/* Set Tx queue priorities */
 	bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0);
 	bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1);
 	bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2);
 
-	/* Configure Tx queues as descriptor rings */
+	/* Configure Tx queues as descriptor rings, including ring 16 */
 	ring_mask = (1 << (priv->hw_params->tx_queues + 1)) - 1;
+	ring_mask |= BIT(DESC_INDEX);
 	bcmgenet_tdma_writel(priv, ring_mask, DMA_RING_CFG);
 
 	/* Enable Tx rings */
@@ -3754,6 +3918,21 @@ static void bcmgenet_get_stats64(struct net_device *dev,
 		stats->tx_dropped += tx_dropped;
 	}
 
+	/* Include XDP TX ring (DESC_INDEX) stats */
+	tx_stats = &priv->xdp_tx_ring.stats64;
+	do {
+		start = u64_stats_fetch_begin(&tx_stats->syncp);
+		tx_bytes = u64_stats_read(&tx_stats->bytes);
+		tx_packets = u64_stats_read(&tx_stats->packets);
+		tx_errors = u64_stats_read(&tx_stats->errors);
+		tx_dropped = u64_stats_read(&tx_stats->dropped);
+	} while (u64_stats_fetch_retry(&tx_stats->syncp, start));
+
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets += tx_packets;
+	stats->tx_errors += tx_errors;
+	stats->tx_dropped += tx_dropped;
+
 	for (q = 0; q <= priv->hw_params->rx_queues; q++) {
 		rx_stats = &priv->rx_rings[q].stats64;
 		do {
@@ -4257,6 +4436,7 @@ static int bcmgenet_probe(struct platform_device *pdev)
 		u64_stats_init(&priv->rx_rings[i].stats64.syncp);
 	for (i = 0; i <= priv->hw_params->tx_queues; i++)
 		u64_stats_init(&priv->tx_rings[i].stats64.syncp);
+	u64_stats_init(&priv->xdp_tx_ring.stats64.syncp);
 
 	/* libphy will determine the link state */
 	netif_carrier_off(dev);
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 1459473ac1b0..8966d32efe2f 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -472,6 +472,8 @@ struct bcmgenet_rx_stats64 {
 
 struct enet_cb {
 	struct sk_buff      *skb;
+	struct xdp_frame    *xdpf;
+	bool                xdp_dma_map;
 	struct page         *rx_page;
 	unsigned int        rx_page_offset;
 	void __iomem *bd_addr;
@@ -611,6 +613,7 @@ struct bcmgenet_priv {
 	unsigned int num_tx_bds;
 
 	struct bcmgenet_tx_ring tx_rings[GENET_MAX_MQ_CNT + 1];
+	struct bcmgenet_tx_ring xdp_tx_ring;
 
 	/* receive variables */
 	void __iomem *rx_bds;
-- 
2.51.0



* [PATCH net-next v7 1/7] net: bcmgenet: convert RX path to page_pool
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	David S. Miller, Jakub Kicinski, Vikas Gupta, Rajashekar Hudumula,
	Bhargava Marreddy, Sasha Levin, Arnd Bergmann, Eric Biggers,
	linux-kernel
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Replace the per-packet __netdev_alloc_skb() + dma_map_single() in the
RX path with page_pool, which provides efficient page recycling and
DMA mapping management. This is a prerequisite for XDP support (which
requires stable page-backed buffers rather than SKB linear data).

Key changes:
- Create a page_pool per RX ring (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
- bcmgenet_rx_refill() allocates pages via page_pool_alloc_pages()
- bcmgenet_desc_rx() builds SKBs from pages via napi_build_skb() with
  skb_mark_for_recycle() for automatic page_pool return
- Buffer layout reserves XDP_PACKET_HEADROOM (256 bytes) before the HW
  RSB (64 bytes) + alignment pad (2 bytes) for future XDP headroom

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
---
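Note: as a rough illustration of the buffer layout described in the
commit message, the userspace sketch below computes where the RSB and
the frame data land inside an RX page. The 256/64/2 byte values are
taken from the description above; the names (SKETCH_*, struct
genet_layout, genet_rx_layout) are hypothetical and exist only for
this example, they are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the commit message; the names are illustrative only. */
#define SKETCH_XDP_HEADROOM 256u  /* XDP_PACKET_HEADROOM */
#define SKETCH_RSB_SIZE      64u  /* hardware Receive Status Block */
#define SKETCH_ALIGN_PAD      2u  /* hardware pad for IP alignment */

struct genet_layout {
	size_t rsb_off;    /* offset in the page where HW starts writing */
	size_t frame_off;  /* offset of the first byte of the frame */
	size_t frame_len;  /* frame bytes, excluding RSB and pad */
};

/* hw_len is the length reported by hardware, which includes RSB + pad. */
static struct genet_layout genet_rx_layout(size_t hw_len)
{
	struct genet_layout l;

	assert(hw_len >= SKETCH_RSB_SIZE + SKETCH_ALIGN_PAD);
	l.rsb_off = SKETCH_XDP_HEADROOM;
	l.frame_off = l.rsb_off + SKETCH_RSB_SIZE + SKETCH_ALIGN_PAD;
	l.frame_len = hw_len - (SKETCH_RSB_SIZE + SKETCH_ALIGN_PAD);
	return l;
}
```

In the patch itself the same arithmetic appears as GENET_RSB_PAD and
GENET_RX_HEADROOM, with skb_reserve()/__skb_put() applying it to the
SKB built from the page.
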
 drivers/net/ethernet/broadcom/Kconfig         |   1 +
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 217 +++++++++++-------
 .../net/ethernet/broadcom/genet/bcmgenet.h    |   4 +
 3 files changed, 143 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index dd164acafd01..a6c388dacba6 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -78,6 +78,7 @@ config BCMGENET
 	select BCM7XXX_PHY
 	select MDIO_BCM_UNIMAC
 	select DIMLIB
+	select PAGE_POOL
 	select BROADCOM_PHY if ARCH_BCM2835
 	help
 	  This driver supports the built-in Ethernet MACs found in the
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 482a31e7b72b..94732414b48b 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -52,6 +52,13 @@
 #define RX_BUF_LENGTH		2048
 #define SKB_ALIGNMENT		32
 
+/* Page pool RX buffer layout:
+ * XDP_PACKET_HEADROOM | RSB(64) + pad(2) | frame data | skb_shared_info
+ * The HW writes the 64B RSB + 2B alignment padding before the frame.
+ */
+#define GENET_RSB_PAD		(sizeof(struct status_64) + 2)
+#define GENET_RX_HEADROOM	(XDP_PACKET_HEADROOM + GENET_RSB_PAD)
+
 /* Tx/Rx DMA register offset, skip 256 descriptors */
 #define WORDS_PER_BD(p)		(p->hw_params->words_per_bd)
 #define DMA_DESC_SIZE		(WORDS_PER_BD(priv) * sizeof(u32))
@@ -1895,21 +1902,13 @@ static struct sk_buff *bcmgenet_free_tx_cb(struct device *dev,
 }
 
 /* Simple helper to free a receive control block's resources */
-static struct sk_buff *bcmgenet_free_rx_cb(struct device *dev,
-					   struct enet_cb *cb)
+static void bcmgenet_free_rx_cb(struct enet_cb *cb,
+				struct page_pool *pool)
 {
-	struct sk_buff *skb;
-
-	skb = cb->skb;
-	cb->skb = NULL;
-
-	if (dma_unmap_addr(cb, dma_addr)) {
-		dma_unmap_single(dev, dma_unmap_addr(cb, dma_addr),
-				 dma_unmap_len(cb, dma_len), DMA_FROM_DEVICE);
-		dma_unmap_addr_set(cb, dma_addr, 0);
+	if (cb->rx_page) {
+		page_pool_put_full_page(pool, cb->rx_page, false);
+		cb->rx_page = NULL;
 	}
-
-	return skb;
 }
 
 /* Unlocked version of the reclaim routine */
@@ -2248,46 +2247,30 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
 	goto out;
 }
 
-static struct sk_buff *bcmgenet_rx_refill(struct bcmgenet_priv *priv,
-					  struct enet_cb *cb)
+static int bcmgenet_rx_refill(struct bcmgenet_rx_ring *ring,
+			      struct enet_cb *cb)
 {
-	struct device *kdev = &priv->pdev->dev;
-	struct sk_buff *skb;
-	struct sk_buff *rx_skb;
+	struct bcmgenet_priv *priv = ring->priv;
 	dma_addr_t mapping;
+	struct page *page;
 
-	/* Allocate a new Rx skb */
-	skb = __netdev_alloc_skb(priv->dev, priv->rx_buf_len + SKB_ALIGNMENT,
-				 GFP_ATOMIC | __GFP_NOWARN);
-	if (!skb) {
+	page = page_pool_alloc_pages(ring->page_pool,
+				     GFP_ATOMIC);
+	if (!page) {
 		priv->mib.alloc_rx_buff_failed++;
 		netif_err(priv, rx_err, priv->dev,
-			  "%s: Rx skb allocation failed\n", __func__);
-		return NULL;
-	}
-
-	/* DMA-map the new Rx skb */
-	mapping = dma_map_single(kdev, skb->data, priv->rx_buf_len,
-				 DMA_FROM_DEVICE);
-	if (dma_mapping_error(kdev, mapping)) {
-		priv->mib.rx_dma_failed++;
-		dev_kfree_skb_any(skb);
-		netif_err(priv, rx_err, priv->dev,
-			  "%s: Rx skb DMA mapping failed\n", __func__);
-		return NULL;
+			  "%s: Rx page allocation failed\n", __func__);
+		return -ENOMEM;
 	}
 
-	/* Grab the current Rx skb from the ring and DMA-unmap it */
-	rx_skb = bcmgenet_free_rx_cb(kdev, cb);
+	/* page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
+	mapping = page_pool_get_dma_addr(page) + XDP_PACKET_HEADROOM;
 
-	/* Put the new Rx skb on the ring */
-	cb->skb = skb;
-	dma_unmap_addr_set(cb, dma_addr, mapping);
-	dma_unmap_len_set(cb, dma_len, priv->rx_buf_len);
+	cb->rx_page = page;
+	cb->rx_page_offset = XDP_PACKET_HEADROOM;
 	dmadesc_set_addr(priv, cb->bd_addr, mapping);
 
-	/* Return the current Rx skb to caller */
-	return rx_skb;
+	return 0;
 }
 
 /* bcmgenet_desc_rx - descriptor based rx process.
@@ -2339,25 +2322,28 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 	while ((rxpktprocessed < rxpkttoprocess) &&
 	       (rxpktprocessed < budget)) {
 		struct status_64 *status;
+		struct page *rx_page;
+		unsigned int rx_off;
 		__be16 rx_csum;
+		void *hard_start;
 
 		cb = &priv->rx_cbs[ring->read_ptr];
-		skb = bcmgenet_rx_refill(priv, cb);
 
-		if (unlikely(!skb)) {
+		/* Save the received page before refilling */
+		rx_page = cb->rx_page;
+		rx_off = cb->rx_page_offset;
+
+		if (bcmgenet_rx_refill(ring, cb)) {
 			BCMGENET_STATS64_INC(stats, dropped);
 			goto next;
 		}
 
-		status = (struct status_64 *)skb->data;
+		page_pool_dma_sync_for_cpu(ring->page_pool, rx_page, 0,
+					   RX_BUF_LENGTH);
+
+		hard_start = page_address(rx_page) + rx_off;
+		status = (struct status_64 *)hard_start;
 		dma_length_status = status->length_status;
-		if (dev->features & NETIF_F_RXCSUM) {
-			rx_csum = (__force __be16)(status->rx_csum & 0xffff);
-			if (rx_csum) {
-				skb->csum = (__force __wsum)ntohs(rx_csum);
-				skb->ip_summed = CHECKSUM_COMPLETE;
-			}
-		}
 
 		/* DMA flags and length are still valid no matter how
 		 * we got the Receive Status Vector (64B RSB or register)
@@ -2373,7 +2359,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 		if (unlikely(len > RX_BUF_LENGTH)) {
 			netif_err(priv, rx_status, dev, "oversized packet\n");
 			BCMGENET_STATS64_INC(stats, length_errors);
-			dev_kfree_skb_any(skb);
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
 			goto next;
 		}
 
@@ -2381,7 +2368,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 			netif_err(priv, rx_status, dev,
 				  "dropping fragmented packet!\n");
 			BCMGENET_STATS64_INC(stats, fragmented_errors);
-			dev_kfree_skb_any(skb);
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
 			goto next;
 		}
 
@@ -2409,24 +2397,47 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 						DMA_RX_RXER)) == DMA_RX_RXER)
 				u64_stats_inc(&stats->errors);
 			u64_stats_update_end(&stats->syncp);
-			dev_kfree_skb_any(skb);
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
 			goto next;
 		} /* error packet */
 
-		skb_put(skb, len);
+		/* Build SKB from the page - data starts at hard_start,
+		 * frame begins after RSB(64) + pad(2) = 66 bytes.
+		 */
+		skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
+		if (unlikely(!skb)) {
+			BCMGENET_STATS64_INC(stats, dropped);
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
+			goto next;
+		}
 
-		/* remove RSB and hardware 2bytes added for IP alignment */
-		skb_pull(skb, 66);
-		len -= 66;
+		skb_mark_for_recycle(skb);
+
+		/* Reserve the RSB + pad, then set the data length */
+		skb_reserve(skb, GENET_RSB_PAD);
+		__skb_put(skb, len - GENET_RSB_PAD);
 
 		if (priv->crc_fwd_en) {
-			skb_trim(skb, len - ETH_FCS_LEN);
-			len -= ETH_FCS_LEN;
+			skb_trim(skb, skb->len - ETH_FCS_LEN);
 		}
 
+		/* Set up checksum offload */
+		if (dev->features & NETIF_F_RXCSUM) {
+			rx_csum = (__force __be16)(status->rx_csum & 0xffff);
+			if (rx_csum) {
+				skb->csum = (__force __wsum)ntohs(rx_csum);
+				skb->ip_summed = CHECKSUM_COMPLETE;
+			}
+		}
+
+		len = skb->len;
 		bytes_processed += len;
 
-		/*Finish setting up the received SKB and send it to the kernel*/
+		/* Finish setting up the received SKB and send it to the
+		 * kernel.
+		 */
 		skb->protocol = eth_type_trans(skb, priv->dev);
 
 		u64_stats_update_begin(&stats->syncp);
@@ -2495,12 +2506,11 @@ static void bcmgenet_dim_work(struct work_struct *work)
 	dim->state = DIM_START_MEASURE;
 }
 
-/* Assign skb to RX DMA descriptor. */
+/* Assign page_pool pages to RX DMA descriptors. */
 static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
 				     struct bcmgenet_rx_ring *ring)
 {
 	struct enet_cb *cb;
-	struct sk_buff *skb;
 	int i;
 
 	netif_dbg(priv, hw, priv->dev, "%s\n", __func__);
@@ -2508,10 +2518,7 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
 	/* loop here for each buffer needing assign */
 	for (i = 0; i < ring->size; i++) {
 		cb = ring->cbs + i;
-		skb = bcmgenet_rx_refill(priv, cb);
-		if (skb)
-			dev_consume_skb_any(skb);
-		if (!cb->skb)
+		if (bcmgenet_rx_refill(ring, cb))
 			return -ENOMEM;
 	}
 
@@ -2520,16 +2527,18 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
 
 static void bcmgenet_free_rx_buffers(struct bcmgenet_priv *priv)
 {
-	struct sk_buff *skb;
+	struct bcmgenet_rx_ring *ring;
 	struct enet_cb *cb;
-	int i;
+	int q, i;
 
-	for (i = 0; i < priv->num_rx_bds; i++) {
-		cb = &priv->rx_cbs[i];
-
-		skb = bcmgenet_free_rx_cb(&priv->pdev->dev, cb);
-		if (skb)
-			dev_consume_skb_any(skb);
+	for (q = 0; q <= priv->hw_params->rx_queues; q++) {
+		ring = &priv->rx_rings[q];
+		if (!ring->page_pool)
+			continue;
+		for (i = 0; i < ring->size; i++) {
+			cb = ring->cbs + i;
+			bcmgenet_free_rx_cb(cb, ring->page_pool);
+		}
 	}
 }
 
@@ -2747,6 +2756,31 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
 	netif_napi_add_tx(priv->dev, &ring->napi, bcmgenet_tx_poll);
 }
 
+static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
+					struct bcmgenet_rx_ring *ring)
+{
+	struct page_pool_params pp_params = {
+		.order = 0,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.pool_size = ring->size,
+		.nid = NUMA_NO_NODE,
+		.dev = &priv->pdev->dev,
+		.dma_dir = DMA_FROM_DEVICE,
+		.offset = XDP_PACKET_HEADROOM,
+		.max_len = RX_BUF_LENGTH,
+	};
+	int err;
+
+	ring->page_pool = page_pool_create(&pp_params);
+	if (IS_ERR(ring->page_pool)) {
+		err = PTR_ERR(ring->page_pool);
+		ring->page_pool = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
 /* Initialize a RDMA ring */
 static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
 				 unsigned int index, unsigned int size,
@@ -2754,7 +2788,7 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
 {
 	struct bcmgenet_rx_ring *ring = &priv->rx_rings[index];
 	u32 words_per_bd = WORDS_PER_BD(priv);
-	int ret;
+	int ret, i;
 
 	ring->priv = priv;
 	ring->index = index;
@@ -2765,10 +2799,19 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
 	ring->cb_ptr = start_ptr;
 	ring->end_ptr = end_ptr - 1;
 
-	ret = bcmgenet_alloc_rx_buffers(priv, ring);
+	ret = bcmgenet_rx_ring_create_pool(priv, ring);
 	if (ret)
 		return ret;
 
+	ret = bcmgenet_alloc_rx_buffers(priv, ring);
+	if (ret) {
+		for (i = 0; i < ring->size; i++)
+			bcmgenet_free_rx_cb(ring->cbs + i, ring->page_pool);
+		page_pool_destroy(ring->page_pool);
+		ring->page_pool = NULL;
+		return ret;
+	}
+
 	bcmgenet_init_dim(ring, bcmgenet_dim_work);
 	bcmgenet_init_rx_coalesce(ring);
 
@@ -2961,6 +3004,20 @@ static void bcmgenet_fini_rx_napi(struct bcmgenet_priv *priv)
 	}
 }
 
+static void bcmgenet_destroy_rx_page_pools(struct bcmgenet_priv *priv)
+{
+	struct bcmgenet_rx_ring *ring;
+	unsigned int i;
+
+	for (i = 0; i <= priv->hw_params->rx_queues; ++i) {
+		ring = &priv->rx_rings[i];
+		if (ring->page_pool) {
+			page_pool_destroy(ring->page_pool);
+			ring->page_pool = NULL;
+		}
+	}
+}
+
 /* Initialize Rx queues
  *
  * Queues 0-15 are priority queues. Hardware Filtering Block (HFB) can be
@@ -3032,6 +3089,7 @@ static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
 	}
 
 	bcmgenet_free_rx_buffers(priv);
+	bcmgenet_destroy_rx_page_pools(priv);
 	kfree(priv->rx_cbs);
 	kfree(priv->tx_cbs);
 }
@@ -3108,6 +3166,7 @@ static int bcmgenet_init_dma(struct bcmgenet_priv *priv, bool flush_rx)
 	if (ret) {
 		netdev_err(priv->dev, "failed to initialize Rx queues\n");
 		bcmgenet_free_rx_buffers(priv);
+		bcmgenet_destroy_rx_page_pools(priv);
 		kfree(priv->rx_cbs);
 		kfree(priv->tx_cbs);
 		return ret;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 9e4110c7fdf6..11a0ec563a89 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -15,6 +15,7 @@
 #include <linux/phy.h>
 #include <linux/dim.h>
 #include <linux/ethtool.h>
+#include <net/page_pool/helpers.h>
 
 #include "../unimac.h"
 
@@ -469,6 +470,8 @@ struct bcmgenet_rx_stats64 {
 
 struct enet_cb {
 	struct sk_buff      *skb;
+	struct page         *rx_page;
+	unsigned int        rx_page_offset;
 	void __iomem *bd_addr;
 	DEFINE_DMA_UNMAP_ADDR(dma_addr);
 	DEFINE_DMA_UNMAP_LEN(dma_len);
@@ -575,6 +578,7 @@ struct bcmgenet_rx_ring {
 	struct bcmgenet_net_dim dim;
 	u32		rx_max_coalesced_frames;
 	u32		rx_coalesce_usecs;
+	struct page_pool *page_pool;
 	struct bcmgenet_priv *priv;
 };
 
-- 
2.51.0



* [PATCH net-next v7 2/7] net: bcmgenet: register xdp_rxq_info for each RX ring
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Register an xdp_rxq_info per RX ring and associate it with the ring's
page_pool via MEM_TYPE_PAGE_POOL. This is required infrastructure for
XDP program execution: the XDP framework needs to know the memory model
backing each RX queue for correct page lifecycle management.

No functional change - XDP programs are not yet attached or executed.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 18 ++++++++++++++++++
 drivers/net/ethernet/broadcom/genet/bcmgenet.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 94732414b48b..dd00196b9d4b 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2778,7 +2778,23 @@ static int bcmgenet_rx_ring_create_pool(struct bcmgenet_priv *priv,
 		return err;
 	}
 
+	err = xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, ring->index, 0);
+	if (err)
+		goto err_free_pp;
+
+	err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
+					 ring->page_pool);
+	if (err)
+		goto err_unreg_rxq;
+
 	return 0;
+
+err_unreg_rxq:
+	xdp_rxq_info_unreg(&ring->xdp_rxq);
+err_free_pp:
+	page_pool_destroy(ring->page_pool);
+	ring->page_pool = NULL;
+	return err;
 }
 
 /* Initialize a RDMA ring */
@@ -2807,6 +2823,7 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
 	if (ret) {
 		for (i = 0; i < ring->size; i++)
 			bcmgenet_free_rx_cb(ring->cbs + i, ring->page_pool);
+		xdp_rxq_info_unreg(&ring->xdp_rxq);
 		page_pool_destroy(ring->page_pool);
 		ring->page_pool = NULL;
 		return ret;
@@ -3012,6 +3029,7 @@ static void bcmgenet_destroy_rx_page_pools(struct bcmgenet_priv *priv)
 	for (i = 0; i <= priv->hw_params->rx_queues; ++i) {
 		ring = &priv->rx_rings[i];
 		if (ring->page_pool) {
+			xdp_rxq_info_unreg(&ring->xdp_rxq);
 			page_pool_destroy(ring->page_pool);
 			ring->page_pool = NULL;
 		}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 11a0ec563a89..82a6d29f481d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -16,6 +16,7 @@
 #include <linux/dim.h>
 #include <linux/ethtool.h>
 #include <net/page_pool/helpers.h>
+#include <net/xdp.h>
 
 #include "../unimac.h"
 
@@ -579,6 +580,7 @@ struct bcmgenet_rx_ring {
 	u32		rx_max_coalesced_frames;
 	u32		rx_coalesce_usecs;
 	struct page_pool *page_pool;
+	struct xdp_rxq_info xdp_rxq;
 	struct bcmgenet_priv *priv;
 };
 
-- 
2.51.0



* [PATCH net-next v7 7/7] net: bcmgenet: reject MTU changes incompatible with XDP
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	Mohsin Bashir, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Add a minimal ndo_change_mtu that rejects MTU values too large for
single-page XDP buffers when an XDP program is attached. Without this,
users could change the MTU at runtime and break the XDP buffer layout.

When no XDP program is attached, any MTU change is accepted, matching
the existing behavior without ndo_change_mtu.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Mohsin Bashir <hmohsin@meta.com>
---
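Note: the limit enforced by this patch can be sketched in userspace as
below. This is only an approximation under stated assumptions: the
page size, the size of struct skb_shared_info and the cache line size
used by SKB_DATA_ALIGN() all depend on architecture and kernel
configuration, so they are passed in as parameters here. All sketch_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* XDP_PACKET_HEADROOM (256) + RSB (64) + pad (2), as in GENET_RX_HEADROOM */
#define SKETCH_RX_HEADROOM (256u + 64u + 2u)

/* Round n up to a multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Largest MTU that still fits a single-page XDP buffer. */
static size_t sketch_xdp_mtu_limit(size_t page_size, size_t shinfo_size,
				   size_t cache_line)
{
	size_t tail = sketch_align_up(shinfo_size, cache_line);

	assert(page_size > SKETCH_RX_HEADROOM + tail);
	return page_size - SKETCH_RX_HEADROOM - tail;
}

/* Mirrors the check: only restrict the MTU while XDP is attached. */
static int sketch_mtu_ok(int xdp_attached, size_t new_mtu, size_t limit)
{
	return !xdp_attached || new_mtu <= limit;
}
```

With an assumed 4096-byte page, a 320-byte skb_shared_info and 64-byte
cache lines, the limit works out to 4096 - 322 - 320 = 3454 bytes; the
real value on a given system may differ.
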
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index e74494be7a23..e74cc055210c 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -4087,6 +4087,20 @@ static int bcmgenet_xdp_xmit(struct net_device *dev, int num_frames,
 	return sent;
 }
 
+static int bcmgenet_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	if (priv->xdp_prog && new_mtu > PAGE_SIZE - GENET_RX_HEADROOM -
+	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) {
+		netdev_warn(dev, "MTU too large for single-page XDP buffer\n");
+		return -EINVAL;
+	}
+
+	WRITE_ONCE(dev->mtu, new_mtu);
+	return 0;
+}
+
 static const struct net_device_ops bcmgenet_netdev_ops = {
 	.ndo_open		= bcmgenet_open,
 	.ndo_stop		= bcmgenet_close,
@@ -4097,6 +4111,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
 	.ndo_eth_ioctl		= phy_do_ioctl_running,
 	.ndo_set_features	= bcmgenet_set_features,
 	.ndo_get_stats64	= bcmgenet_get_stats64,
+	.ndo_change_mtu		= bcmgenet_change_mtu,
 	.ndo_change_carrier	= bcmgenet_change_carrier,
 	.ndo_bpf		= bcmgenet_xdp,
 	.ndo_xdp_xmit		= bcmgenet_xdp_xmit,
-- 
2.51.0



* [PATCH net-next v7 5/7] net: bcmgenet: add XDP_REDIRECT and ndo_xdp_xmit support
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Add XDP_REDIRECT support and implement ndo_xdp_xmit for receiving
redirected frames from other devices.

XDP_REDIRECT calls xdp_do_redirect() in the RX path with
xdp_do_flush() once per NAPI poll cycle. ndo_xdp_xmit batches frames
into ring 16 under a single spinlock acquisition.

Advertise NETDEV_XDP_ACT_REDIRECT and NETDEV_XDP_ACT_NDO_XMIT in
xdp_features.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
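Note: the batching scheme described above (queue several frames, then
write the producer index once) can be modelled in userspace roughly as
follows. This is a simplified sketch, not the driver code: there is no
DMA, the lock is only indicated by comments, and struct sketch_ring
and the sketch_* functions are hypothetical names. The ring size of 32
matches GENET_Q16_TX_BD_CNT from this series; the 16-bit index mask is
an assumption about DMA_P_INDEX_MASK.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RING_SIZE    32u      /* descriptors in ring 16 */
#define SKETCH_P_INDEX_MASK 0xFFFFu  /* assumed producer index width */

struct sketch_ring {
	unsigned int free_bds;       /* free descriptors */
	unsigned int prod_index;     /* software producer index */
	unsigned int hw_prod_index;  /* stand-in for TDMA_PROD_INDEX */
	unsigned int doorbells;      /* number of "register" writes */
};

static void sketch_ring_init(struct sketch_ring *r)
{
	r->free_bds = SKETCH_RING_SIZE;
	r->prod_index = 0;
	r->hw_prod_index = 0;
	r->doorbells = 0;
}

/* Queue one frame; does not touch the hardware index. */
static int sketch_queue_frame(struct sketch_ring *r)
{
	if (r->free_bds < 1)
		return 0;
	r->free_bds--;
	r->prod_index = (r->prod_index + 1) & SKETCH_P_INDEX_MASK;
	return 1;
}

/* Publish the producer index to "hardware". */
static void sketch_doorbell(struct sketch_ring *r)
{
	r->hw_prod_index = r->prod_index;
	r->doorbells++;
}

/* Queue up to n frames, then ring once; returns the number accepted. */
static unsigned int sketch_xmit_batch(struct sketch_ring *r, unsigned int n)
{
	unsigned int sent = 0;

	assert(r != NULL);
	/* a real driver would take the ring lock here */
	while (sent < n && sketch_queue_frame(r))
		sent++;
	if (sent)
		sketch_doorbell(r);
	/* ...and release it here */
	return sent;
}
```

The point of the split is that the per-frame helper only updates
software state, so a batch of frames costs one lock round trip and one
register write instead of one per frame.
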
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 87 ++++++++++++++++---
 1 file changed, 73 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 3f3682e39267..f94e9e287fe9 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2326,22 +2326,22 @@ static struct sk_buff *bcmgenet_xdp_build_skb(struct bcmgenet_rx_ring *ring,
 	return skb;
 }
 
+/* Submit a single XDP frame to the TX ring. Caller must hold ring->lock.
+ * Returns true on success. Does not ring the doorbell - caller must
+ * write TDMA_PROD_INDEX after batching.
+ */
 static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
+				     struct bcmgenet_tx_ring *ring,
 				     struct xdp_frame *xdpf, bool dma_map)
 {
-	struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
 	struct device *kdev = &priv->pdev->dev;
 	struct enet_cb *tx_cb_ptr;
 	dma_addr_t mapping;
 	unsigned int dma_len;
 	u32 len_stat;
 
-	spin_lock(&ring->lock);
-
-	if (ring->free_bds < 1) {
-		spin_unlock(&ring->lock);
+	if (ring->free_bds < 1)
 		return false;
-	}
 
 	tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
 
@@ -2355,7 +2355,6 @@ static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
 		 */
 		if (unlikely(xdpf->headroom < sizeof(struct status_64))) {
 			bcmgenet_put_txcb(priv, ring);
-			spin_unlock(&ring->lock);
 			return false;
 		}
 
@@ -2369,7 +2368,6 @@ static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
 			tx_cb_ptr->skb = NULL;
 			tx_cb_ptr->xdpf = NULL;
 			bcmgenet_put_txcb(priv, ring);
-			spin_unlock(&ring->lock);
 			return false;
 		}
 	} else {
@@ -2401,12 +2399,14 @@ static bool bcmgenet_xdp_xmit_frame(struct bcmgenet_priv *priv,
 	ring->prod_index++;
 	ring->prod_index &= DMA_P_INDEX_MASK;
 
+	return true;
+}
+
+static void bcmgenet_xdp_ring_doorbell(struct bcmgenet_priv *priv,
+				       struct bcmgenet_tx_ring *ring)
+{
 	bcmgenet_tdma_ring_writel(priv, ring->index, ring->prod_index,
 				  TDMA_PROD_INDEX);
-
-	spin_unlock(&ring->lock);
-
-	return true;
 }
 
 static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
@@ -2415,6 +2415,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 				     struct page *rx_page)
 {
 	struct bcmgenet_priv *priv = ring->priv;
+	struct bcmgenet_tx_ring *tx_ring;
 	struct xdp_frame *xdpf;
 	unsigned int act;
 
@@ -2449,11 +2450,25 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 						true);
 			return XDP_DROP;
 		}
-		if (unlikely(!bcmgenet_xdp_xmit_frame(priv, xdpf, false))) {
+
+		tx_ring = &priv->xdp_tx_ring;
+		spin_lock(&tx_ring->lock);
+		if (unlikely(!bcmgenet_xdp_xmit_frame(priv, tx_ring,
+						      xdpf, false))) {
+			spin_unlock(&tx_ring->lock);
 			xdp_return_frame_rx_napi(xdpf);
 			return XDP_DROP;
 		}
+		bcmgenet_xdp_ring_doorbell(priv, tx_ring);
+		spin_unlock(&tx_ring->lock);
 		return XDP_TX;
+	case XDP_REDIRECT:
+		if (unlikely(xdp_do_redirect(priv->dev, xdp, prog))) {
+			page_pool_put_full_page(ring->page_pool, rx_page,
+						true);
+			return XDP_DROP;
+		}
+		return XDP_REDIRECT;
 	case XDP_DROP:
 		page_pool_put_full_page(ring->page_pool, rx_page, true);
 		return XDP_DROP;
@@ -2477,6 +2492,7 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 	struct bcmgenet_priv *priv = ring->priv;
 	struct net_device *dev = priv->dev;
 	struct bpf_prog *xdp_prog;
+	bool xdp_flush = false;
 	struct enet_cb *cb;
 	struct sk_buff *skb;
 	u32 dma_length_status;
@@ -2617,6 +2633,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 				 GENET_RX_HEADROOM, pkt_len, true);
 
 		xdp_act = bcmgenet_run_xdp(ring, xdp_prog, &xdp, rx_page);
+		if (xdp_act == XDP_REDIRECT)
+			xdp_flush = true;
 		if (xdp_act != XDP_PASS)
 			goto next;
 
@@ -2668,6 +2686,9 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
 		bcmgenet_rdma_ring_writel(priv, ring->index, ring->c_index, RDMA_CONS_INDEX);
 	}
 
+	if (xdp_flush)
+		xdp_do_flush();
+
 	ring->dim.bytes = bytes_processed;
 	ring->dim.packets = rxpktprocessed;
 
@@ -3998,10 +4019,16 @@ static int bcmgenet_xdp_setup(struct net_device *dev,
 		return -EOPNOTSUPP;
 	}
 
+	if (!prog)
+		xdp_features_clear_redirect_target(dev);
+
 	old_prog = xchg(&priv->xdp_prog, prog);
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
+	if (prog)
+		xdp_features_set_redirect_target(dev, false);
+
 	return 0;
 }
 
@@ -4015,6 +4042,36 @@ static int bcmgenet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
+static int bcmgenet_xdp_xmit(struct net_device *dev, int num_frames,
+			     struct xdp_frame **frames, u32 flags)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct bcmgenet_tx_ring *ring = &priv->xdp_tx_ring;
+	int sent = 0;
+	int i;
+
+	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+		return -EINVAL;
+
+	if (unlikely(!netif_running(dev)))
+		return -ENETDOWN;
+
+	spin_lock(&ring->lock);
+
+	for (i = 0; i < num_frames; i++) {
+		if (!bcmgenet_xdp_xmit_frame(priv, ring, frames[i], true))
+			break;
+		sent++;
+	}
+
+	if (sent)
+		bcmgenet_xdp_ring_doorbell(priv, ring);
+
+	spin_unlock(&ring->lock);
+
+	return sent;
+}
+
 static const struct net_device_ops bcmgenet_netdev_ops = {
 	.ndo_open		= bcmgenet_open,
 	.ndo_stop		= bcmgenet_close,
@@ -4027,6 +4084,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
 	.ndo_get_stats64	= bcmgenet_get_stats64,
 	.ndo_change_carrier	= bcmgenet_change_carrier,
 	.ndo_bpf		= bcmgenet_xdp,
+	.ndo_xdp_xmit		= bcmgenet_xdp_xmit,
 };
 
 /* GENET hardware parameters/characteristics */
@@ -4329,7 +4387,8 @@ static int bcmgenet_probe(struct platform_device *pdev)
 			 NETIF_F_RXCSUM;
 	dev->hw_features |= dev->features;
 	dev->vlan_features |= dev->features;
-	dev->xdp_features = NETDEV_XDP_ACT_BASIC;
+	dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+			    NETDEV_XDP_ACT_NDO_XMIT;
 
 	netdev_sw_irq_coalesce_default_on(dev);
 
-- 
2.51.0


^ permalink raw reply related

* [PATCH net-next v7 6/7] net: bcmgenet: add XDP statistics counters
From: Nicolai Buchwitz @ 2026-04-16  5:47 UTC (permalink / raw)
  To: netdev
  Cc: Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Nicolai Buchwitz,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260416054743.1289191-1-nb@tipi-net.de>

Expose per-action XDP counters via ethtool -S: xdp_pass, xdp_drop,
xdp_tx, xdp_tx_err, xdp_redirect, and xdp_redirect_err.

These use the existing soft MIB infrastructure and are incremented in
bcmgenet_run_xdp() alongside the existing driver statistics.
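
As a rough illustration of the accounting, here is a user-space sketch
of how each verdict maps to a counter. The toy_* names and the hw_ok
flag are hypothetical stand-ins (hw_ok models whether the TX submit or
the redirect succeeded); only the counter names come from this patch.

```c
/* Toy model of the per-action accounting; not the driver code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_act { TOY_ABORTED, TOY_DROP, TOY_PASS, TOY_TX, TOY_REDIRECT };

struct toy_mib {
	uint32_t xdp_pass;
	uint32_t xdp_drop;
	uint32_t xdp_tx;
	uint32_t xdp_tx_err;
	uint32_t xdp_redirect;
	uint32_t xdp_redirect_err;
};

/* Bump the counter for the verdict and return the resulting action. */
static enum toy_act toy_run_xdp(struct toy_mib *mib, enum toy_act act,
				bool hw_ok)
{
	switch (act) {
	case TOY_PASS:
		mib->xdp_pass++;
		return TOY_PASS;
	case TOY_TX:
		if (!hw_ok) {
			/* A failed transmit is an error, not a plain drop. */
			mib->xdp_tx_err++;
			return TOY_DROP;
		}
		mib->xdp_tx++;
		return TOY_TX;
	case TOY_REDIRECT:
		if (!hw_ok) {
			mib->xdp_redirect_err++;
			return TOY_DROP;
		}
		mib->xdp_redirect++;
		return TOY_REDIRECT;
	case TOY_DROP:
		mib->xdp_drop++;
		return TOY_DROP;
	default:
		/* Unknown and aborted verdicts are counted as drops. */
		mib->xdp_drop++;
		return TOY_ABORTED;
	}
}
```

Note that a failed XDP_TX or XDP_REDIRECT increments only the matching
*_err counter, so xdp_drop reflects explicit drops and aborted verdicts.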

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 15 +++++++++++++++
 drivers/net/ethernet/broadcom/genet/bcmgenet.h |  6 ++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index f94e9e287fe9..e74494be7a23 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1169,6 +1169,13 @@ static const struct bcmgenet_stats bcmgenet_gstrings_stats[] = {
 	STAT_GENET_SOFT_MIB("tx_realloc_tsb", mib.tx_realloc_tsb),
 	STAT_GENET_SOFT_MIB("tx_realloc_tsb_failed",
 			    mib.tx_realloc_tsb_failed),
+	/* XDP counters */
+	STAT_GENET_SOFT_MIB("xdp_pass", mib.xdp_pass),
+	STAT_GENET_SOFT_MIB("xdp_drop", mib.xdp_drop),
+	STAT_GENET_SOFT_MIB("xdp_tx", mib.xdp_tx),
+	STAT_GENET_SOFT_MIB("xdp_tx_err", mib.xdp_tx_err),
+	STAT_GENET_SOFT_MIB("xdp_redirect", mib.xdp_redirect),
+	STAT_GENET_SOFT_MIB("xdp_redirect_err", mib.xdp_redirect_err),
 	/* Per TX queues */
 	STAT_GENET_Q(0),
 	STAT_GENET_Q(1),
@@ -2426,6 +2433,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 
 	switch (act) {
 	case XDP_PASS:
+		priv->mib.xdp_pass++;
 		return XDP_PASS;
 	case XDP_TX:
 		/* Prepend a zeroed TSB (Transmit Status Block).  The GENET
@@ -2438,6 +2446,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 		    sizeof(struct status_64) + sizeof(struct xdp_frame)) {
 			page_pool_put_full_page(ring->page_pool, rx_page,
 						true);
+			priv->mib.xdp_tx_err++;
 			return XDP_DROP;
 		}
 		xdp->data -= sizeof(struct status_64);
@@ -2457,19 +2466,24 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 						      xdpf, false))) {
 			spin_unlock(&tx_ring->lock);
 			xdp_return_frame_rx_napi(xdpf);
+			priv->mib.xdp_tx_err++;
 			return XDP_DROP;
 		}
 		bcmgenet_xdp_ring_doorbell(priv, tx_ring);
 		spin_unlock(&tx_ring->lock);
+		priv->mib.xdp_tx++;
 		return XDP_TX;
 	case XDP_REDIRECT:
 		if (unlikely(xdp_do_redirect(priv->dev, xdp, prog))) {
+			priv->mib.xdp_redirect_err++;
 			page_pool_put_full_page(ring->page_pool, rx_page,
 						true);
 			return XDP_DROP;
 		}
+		priv->mib.xdp_redirect++;
 		return XDP_REDIRECT;
 	case XDP_DROP:
+		priv->mib.xdp_drop++;
 		page_pool_put_full_page(ring->page_pool, rx_page, true);
 		return XDP_DROP;
 	default:
@@ -2477,6 +2491,7 @@ static unsigned int bcmgenet_run_xdp(struct bcmgenet_rx_ring *ring,
 		fallthrough;
 	case XDP_ABORTED:
 		trace_xdp_exception(priv->dev, prog, act);
+		priv->mib.xdp_drop++;
 		page_pool_put_full_page(ring->page_pool, rx_page, true);
 		return XDP_ABORTED;
 	}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 8966d32efe2f..c4e85c185702 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -156,6 +156,12 @@ struct bcmgenet_mib_counters {
 	u32	tx_dma_failed;
 	u32	tx_realloc_tsb;
 	u32	tx_realloc_tsb_failed;
+	u32	xdp_pass;
+	u32	xdp_drop;
+	u32	xdp_tx;
+	u32	xdp_tx_err;
+	u32	xdp_redirect;
+	u32	xdp_redirect_err;
 };
 
 struct bcmgenet_tx_stats64 {
-- 
2.51.0


^ permalink raw reply related

* Re: [EXTERNAL] Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
From: Antony Antony @ 2026-04-16  5:51 UTC (permalink / raw)
  To: Yan Yan
  Cc: antony.antony, Aakash Kumar Shankarappa, Nathan Harold,
	Tobias Brunner, Steffen Klassert, paul@nohats.ca,
	netdev@vger.kernel.org, Herbert Xu, David S . Miller,
	Eric Dumazet, Jakub Kicinski, pabeni@redhat.com, horms@kernel.org,
	akamluddin@marvell.com, greg@kroah.com
In-Reply-To: <CADHa2dA+Yq_K2A=Pk131yYYOX3Btncbp4Q+9UoTu_7egLPJhQg@mail.gmail.com>


On Wed, Apr 08, 2026 at 15:49:00 -0700, Yan Yan wrote:
>    Hi Antony,
>    The patch looks good to me, and I verified that it fixes the
>    regression.

I rebased the patch onto ipsec and sent it as an RFC for ipsec. I am
sending it as a fix rather than for ipsec-next.

I could not easily test both cases.
Aakash and Yan, would you like to test it and maybe send a tag? That may
motivate Steffen to accept it instead of waiting for me to test it.

regards,
-antony
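
For readers following the thread, the x->dir gating discussed in the
quoted messages can be modelled roughly as below. This is a toy
user-space sketch and not the patch itself: the toy_* names are
hypothetical, addresses are plain integers, and it assumes that the
strict check only compares against other inbound SAs.

```c
/* Toy model of direction-gated SPI uniqueness; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_dir { TOY_DIR_UNSET, TOY_DIR_IN, TOY_DIR_OUT };

struct toy_sa {
	uint32_t spi;
	uint32_t daddr;		/* destination address, simplified */
	enum toy_dir dir;
};

/*
 * With dir == TOY_DIR_IN the SPI alone must be unique among inbound SAs.
 * Otherwise fall back to the older rule: only (daddr, SPI) must be unique.
 */
static bool toy_spi_in_use(const struct toy_sa *sad, size_t n,
			   uint32_t spi, uint32_t daddr, enum toy_dir dir)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (sad[i].spi != spi)
			continue;
		if (dir == TOY_DIR_IN) {
			if (sad[i].dir == TOY_DIR_IN)
				return true;
		} else if (sad[i].daddr == daddr) {
			return true;
		}
	}
	return false;
}
```

In this model a caller that never sets a direction keeps the
(daddr, SPI) behaviour, while a caller that marks the SA as inbound
opts in to the stricter check.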

> 
>    On Tue, Mar 31, 2026 at 3:47 AM Antony Antony
>    <[1]antony.antony@secunet.com> wrote:
> 
>      Hi,
>      I have tweaked the patch a bit more, uniqueness is only when
>      x->dir == XFRM_SA_DIR_IN. See the attached patch.
>      I have added tags Fixes:
>      and
>      Reported-by: Yan Yan <[2]evitayan@google.com>
>      Yan, are you ok with this?
>      So far I don't have any tests for this, so more tags are welcome :)
>      regards,
>      -antony
>      PS: recent libreswan is setting direction. That should not be
>      a problem.
>      On Mon, Mar 30, 2026 at 20:34:07 +0000, Aakash Kumar Shankarappa
>      wrote:
>      >    Hi Antony,
>      >
>      >    Thanks for the patch. Yes the x->dir based gating approach
>      looks good
>      >    to me and it works for Marvell.
>      >
>      >    Also this seems like the right direction. It preserves backward
>      >    compatibility for existing users, while still allowing strict
>      RFC 4301
>      >    compliant SPI uniqueness as an opt-in feature. Anybody who
>      wants the
>      >    stricter behaviour can upgrade to Strongswan 6.0.0+ and
>      leverage
>      >    XFRM_SA_DIR_IN.
>      >
>      >    Thanks, Aakash
>      >
>      >    From: Antony Antony <[3]antony.antony@secunet.com>
>      >    Date: Monday, 30 March 2026 at 10:24 PM
>      >    To: Yan Yan <[4]evitayan@google.com>
>      >    Cc: Nathan Harold <[5]nharold@google.com>, Tobias Brunner
>      >    <[6]tobias@strongswan.org>, [7]antony.antony@secunet.com
>      >    <[8]antony.antony@secunet.com>, Steffen Klassert
>      >    <[9]steffen.klassert@secunet.com>, [10]paul@nohats.ca
>      <[11]paul@nohats.ca>,
>      >    [12]netdev@vger.kernel.org <[13]netdev@vger.kernel.org>,
>      Herbert Xu
>      >    <[14]herbert@gondor.apana.org.au>, David S . Miller
>      <[15]davem@davemloft.net>,
>      >    Eric Dumazet <[16]edumazet@google.com>, Jakub Kicinski
>      <[17]kuba@kernel.org>,
>      >    [18]pabeni@redhat.com <[19]pabeni@redhat.com>,
>      [20]horms@kernel.org
>      >    <[21]horms@kernel.org>, Aakash Kumar Shankarappa
>      >    <[22]saakashkumar@marvell.com>, [23]akamluddin@marvell.com
>      >    <[24]akamluddin@marvell.com>, [25]greg@kroah.com
>      <[26]greg@kroah.com>
>      >    Subject: [EXTERNAL] Re: [REGRESSION] Discussion on "xfrm:
>      Duplicate SPI
>      >    Handling"
>      >
>      >
>      >    Hi, I looked into this. I feel a simple solution is use x->dir
>      as
>      >    Nathan proposed. When dir is not set we get pre commit
>      94f39804d891
>      >    ("xfrm: Duplicate SPI Handling") behaviour. When XFRM_SA_DIR is
>      set
>      >    alloc_spi() returns per direction unique spi. Another benefit
>      is, this
>      >    would also keep PF_KEY use case as it was before that commit.
>      Here is
>      >    simple RFC patch attached. How does this look? strongswan
>      6.0.0, from
>      >    Dec 2024, sets x->dir. Aakash would this work for Marvell?
>      regards,
>      >    -antony On Fri, Mar 27, 2026 at 17:05:13 -0700, Yan Yan wrote:
>      > Hi
>      >    all, > I wanted to send a friendly ping to see if we are
>      aligning on
>      >    making > the strict global SPI uniqueness requirement optional,
>      perhaps
>      >    via a > toggle or by leveraging the XFRM_SA_DIR attribute as
>      previously
>      >    > discussed. > Are there any other questions or concerns
>      regarding this
>      >    approach, or > anything else we should clarify to ensure
>      backward
>      >    compatibility while > meeting the needs of modern standards? >
>      Best, >
>      >    Yan > > On Tue, Feb 24, 2026 at 3:53 PM Nathan Harold
>      >    <[1][27]nharold@google.com> > wrote: > > > That should still be
>      allowed
>      >    when using the intended APIs (i.e. > ALLOCSPI > > for the
>      inbound and
>      >    NEWSA for the outbound SA). ALLOCSPI might > enforce > > a
>      unique SPI
>      >    without considering the address, as that's intended > for > >
>      local,
>      >    inbound SAs, where the kernel has full control (looking at >
>      the > >
>      >    the patch, it's certainly not ideal, as it goes through all >
>      installed
>      >    > > SAs to find a duplicate and it prevents an inbound SPI that
>      >
>      >    matches an > > existing outbound SPI - I guess that could be
>      resolved
>      >    by using > separate > > tables for in- and outbound SAs). But
>      that must
>      >    not prevent > installing > > outbound SAs with the same SPI to
>      another
>      >    peer using NEWSA, which > still > > uses a hash that includes
>      the
>      >    destination address (that must > always be > > the case because
>      peers
>      >    are free to allocate whatever SPI they > want). > Agreed that
>      there are
>      >    some unfortunate limitations with the current > patch. Keying
>      off the
>      >    inclusion of XFRM_SA_DIR would resolve the > issue > you noted
>      >    (conflating inbound and outbound SPIs) and function as an >
>      opt-in for
>      >    this enforcement. Whatever the mechanism though, the new >
>      behavior
>      >    should be opt-in rather than opt-out in order to maintain >
>      backwards
>      >    compatibility. > > In my opinion, you are using the API
>      incorrectly...
>      >    I also > > don't think there are any benefits in that
>      "consistent
>      >    larval > lifecycle" > > (if you found any, please let us know).
>      > The
>      >    Android architecture is multi-tenant and allows userspace apps
>      > to >
>      >    establish SAs. At the time we designed it, this felt like the >
>      >    cleanest > way to facilitate leak-free resource management
>      because the
>      >    chain of > associations between kernel resources could be
>      symmetrical
>      >    (and > managing them was already quite complicated). Mea culpa
>      >    (Nathan). > But, > correctly or not, it has/had worked for many
>      years.
>      >    > > By the way, are you using the min/max option for inbound
>      SAs as >
>      >    well, > > with an SPI generated in userland? That would seem
>      like a >
>      >    violation of > > the intention of the API as well (i.e. letting
>      the
>      >    kernel control > the > > local SPIs). > We provide following
>      two
>      >    Android APIs for app developers: >
>      >
>      [2][2][28]https://urldefense.proofpoint.com/v2/url?u=https-3A__devel
>      oper.an
>      >
>      droid.com_reference_android_net_IpSecManager-23&d=DwIDaQ&c=nKjWec2b6
>      R0m
>      >
>      OyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKyp
>      JWI
>      >
>      SL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=9txNFXC-wFiRF
>      V_Q
>      >    JlIQLW2AWSbERIOWcGHJfuQP2ZA&e= >
>      >    allocateSecurityParameterIndex(java.net.InetAddress) >
>      >
>      [3][3][29]https://urldefense.proofpoint.com/v2/url?u=https-3A__devel
>      oper.an
>      >
>      droid.com_reference_android_net_IpSecManager-23&d=DwIDaQ&c=nKjWec2b6
>      R0m
>      >
>      OyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKyp
>      JWI
>      >
>      SL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=9txNFXC-wFiRF
>      V_Q
>      >    JlIQLW2AWSbERIOWcGHJfuQP2ZA&e= >
>      >    allocateSecurityParameterIndex(java.net.InetAddress,%20int) >
>      Indeed,
>      >    allocateSecurityParameterIndex is direction-agnostic; both >
>      overloads
>      >    are implemented internally by including the min and max >
>      values in
>      >    ALLOCSPI. For the variant where app developers provide a >
>      specific SPI
>      >    (which is also useful in testing), Android simply sets > both
>      the min
>      >    and max parameters to that exact value. Our > understanding >
>      of
>      >    #xfrm_alloc_spi is that min/max are required for ALLOCSPI, and
>      >
>      >    otherwise ENOENT will be returned. > Note that we also use the
>      DADDR as
>      >    a mandatory part of the tuple > because of the issue mentioned
>      above:
>      >    SPIs are only unique in > conjunction with a DADDR, regardless
>      of
>      >    direction, and accordingly, > that’s how Android is expecting
>      the
>      >    uniqueness requirement be > enforced. In this way, 5 duplicate
>      SPIs can
>      >    be used on 5 unique IP > addresses on the same machine;
>      therefore, a
>      >    strict "SPI only" > interpretation for ALLOCSPI (or SPI
>      handling in
>      >    general) is curious. > We feel that ALLOCSPI should really
>      enforce the
>      >    same uniqueness > requirements as the SAD. > Best, > Nathan and
>      Yan >
>      >    -Nathan > On Wed, Feb 18, 2026 at 12:42 AM Tobias Brunner >
>      >    <[4][30]tobias@strongswan.org> wrote: > > > > Hi Yan, > > > > >
>      For every
>      >    inbound SA, we allocate SPIs before negotiation. For > > >
>      outbound
>      >    SAs, we allocate SPIs once requested by the peer. We > only > >
>      >
>      >    require the (SPI, destination address) combo to be unique.
>      Thus, > we >
>      >    > > may have an inbound and outbound SA sharing an SPI with >
>      different
>      >    > > > destinations, or multiple outbound SAs to different peers
>      >
>      >    sharing an > > > SPI. > > > > That should still be allowed when
>      using
>      >    the intended APIs (i.e. > ALLOCSPI > > for the inbound and
>      NEWSA for
>      >    the outbound SA). ALLOCSPI might > enforce > > a unique SPI
>      without
>      >    considering the address, as that's intended > for > > local,
>      inbound
>      >    SAs, where the kernel has full control (looking at > the > >
>      the patch,
>      >    it's certainly not ideal, as it goes through all > installed >
>      > SAs to
>      >    find a duplicate and it prevents an inbound SPI that > matches
>      an > >
>      >    existing outbound SPI - I guess that could be resolved by using
>      >
>      >    separate > > tables for in- and outbound SAs). But that must
>      not
>      >    prevent > installing > > outbound SAs with the same SPI to
>      another peer
>      >    using NEWSA, which > still > > uses a hash that includes the
>      >    destination address (that must > always be > > the case because
>      peers
>      >    are free to allocate whatever SPI they > want). > > > > >> If
>      so, why
>      >    would you use ALLOCSPI and not just install the > outbound SA?
>      Is it to
>      >    avoid differences for in- and outbound SAs > (ALLOCSPI+UPDSA
>      vs.
>      >    NEWSA)?" > > > > > > Exactly—it is primarily for code symmetry.
>      By
>      >    using ALLOCSPI + > UPDSA > > > for both directions, we maintain
>      a
>      >    consistent larval lifecycle > and > > > make it easier to
>      maintain. > >
>      >    > > In my opinion, you are using the API incorrectly. ALLOCSPI
>      is >
>      >    intended > > to allocate a free local SPI for an inbound SA.
>      That is,
>      >    reserve > it > > before and while the details of the SA are
>      negotiated
>      >    with the > peer > > using IKE. This step isn't necessary for
>      outbound
>      >    SAs and forcing > such > > an allocation, after all the details
>      are
>      >    known, to the responder's > SPI > > (which I assume you do via
>      min/max
>      >    option) doesn't feel right. I > also > > don't think there are
>      any
>      >    benefits in that "consistent larval > lifecycle" > > (if you
>      found any,
>      >    please let us know). And the difference > between > > UPDSA and
>      NEWSA
>      >    is the nlmsg_type (there are some attributes that > are > >
>      different
>      >    for in- and outbound SAs, especially if you set the > direction
>      > > in
>      >    newer kernels, but that's the case regardless of the message >
>      type). >
>      >    > > > By the way, are you using the min/max option for inbound
>      SAs as >
>      >    well, > > with an SPI generated in userland? That would seem
>      like a >
>      >    violation of > > the intention of the API as well (i.e. letting
>      the
>      >    kernel control > the > > local SPIs). > > > > As the XFRM API
>      basically
>      >    mirrors PF_KEYv2 here, you can find more > about > > the two
>      ways to
>      >    install SAs in RFC 2367 (SADB_GETSPI/UPDATE vs. > SADB_ADD). >
>      > > >
>      >    Regards, > > Tobias > > > > -- > > -- > Best, > Yan > >
>      References > >
>      >    1. mailto:[31]nharold@google.com > 2.
>      >
>      [4][32]https://urldefense.proofpoint.com/v2/url?u=https-3A__develope
>      r.andro
>      >
>      id.com_reference_android_net_IpSecManager-23allocateSecurityParamete
>      rIn
>      >
>      dex-28java.net.InetAddress-29&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=r6
>      Wzn
>      >
>      5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkfA2yD1_H
>      51r
>      >
>      d5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=UtpwkuoswT-6aK0he6dSS-0dvZfIcRj
>      bfj
>      >    _Eu4A-c6E&e= > 3.
>      >
>      [5][33]https://urldefense.proofpoint.com/v2/url?u=https-3A__develope
>      r.andro
>      >
>      id.com_reference_android_net_IpSecManager-23allocateSecurityParamete
>      rIn
>      >
>      dex-28java.net.InetAddress&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=r6Wzn
>      5Ln
>      >
>      Vsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkfA2yD1_H51r
>      d5M
>      >
>      6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=fVmYoLIpyuMX3AtAkU7ejR1dno_8QtnhCK
>      LMD
>      >    4BmURc&e=, int) > 4. mailto:[34]tobias@strongswan.org
>      >
>      > References
>      >
>      >    Visible links:
>      >    1.
>      [35]https://us-phishalarm-ewt.proofpoint.com/EWT/v1/CRVmXkqW!tG3Tv5d
>      8inv1_6DXc1X1B4ctthRq2qCkR8nIF_n_SOJiXQ-SqG_LUk--J5LZEwV9jGdXABQriDr
>      uLYnPEg$
>      >    2.
>      [36]https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.a
>      ndroid.com_reference_android_net_IpSecManager-23&d=DwIDaQ&c=nKjWec2b
>      6R0mOyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNM
>      KKypJWISL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=9txNFX
>      C-wFiRFV_QJlIQLW2AWSbERIOWcGHJfuQP2ZA&e=
>      >    3.
>      [37]https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.a
>      ndroid.com_reference_android_net_IpSecManager-23&d=DwIDaQ&c=nKjWec2b
>      6R0mOyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNM
>      KKypJWISL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=9txNFX
>      C-wFiRFV_QJlIQLW2AWSbERIOWcGHJfuQP2ZA&e=
>      >    4.
>      [38]https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.a
>      ndroid.com_reference_android_net_IpSecManager-23allocateSecurityPara
>      meterIndex-28java.net.InetAddress-29&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xt
>      fQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5Bb
>      GkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=UtpwkuoswT-6aK0he6
>      dSS-0dvZfIcRjbfj_Eu4A-c6E&e=
>      >    5.
>      [39]https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.a
>      ndroid.com_reference_android_net_IpSecManager-23allocateSecurityPara
>      meterIndex-28java.net.InetAddress&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&
>      r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkf
>      A2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=fVmYoLIpyuMX3AtAkU7ej
>      R1dno_8QtnhCKLMD4BmURc&e=
>      >
>      >    Hidden links:
>      >    7. [40]https://aka.ms/GetOutlookForMac
> 
>    --
> 
>    --
>    Best,
>    Yan
> 
> References
> 
>    1. mailto:antony.antony@secunet.com
>    2. mailto:evitayan@google.com
>    3. mailto:antony.antony@secunet.com
>    4. mailto:evitayan@google.com
>    5. mailto:nharold@google.com
>    6. mailto:tobias@strongswan.org
>    7. mailto:antony.antony@secunet.com
>    8. mailto:antony.antony@secunet.com
>    9. mailto:steffen.klassert@secunet.com
>   10. mailto:paul@nohats.ca
>   11. mailto:paul@nohats.ca
>   12. mailto:netdev@vger.kernel.org
>   13. mailto:netdev@vger.kernel.org
>   14. mailto:herbert@gondor.apana.org.au
>   15. mailto:davem@davemloft.net
>   16. mailto:edumazet@google.com
>   17. mailto:kuba@kernel.org
>   18. mailto:pabeni@redhat.com
>   19. mailto:pabeni@redhat.com
>   20. mailto:horms@kernel.org
>   21. mailto:horms@kernel.org
>   22. mailto:saakashkumar@marvell.com
>   23. mailto:akamluddin@marvell.com
>   24. mailto:akamluddin@marvell.com
>   25. mailto:greg@kroah.com
>   26. mailto:greg@kroah.com
>   27. mailto:nharold@google.com
>   28. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.an
>   29. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.an
>   30. mailto:tobias@strongswan.org
>   31. mailto:nharold@google.com
>   32. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.andro
>   33. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.andro
>   34. mailto:tobias@strongswan.org
>   35. https://us-phishalarm-ewt.proofpoint.com/EWT/v1/CRVmXkqW!tG3Tv5d8inv1_6DXc1X1B4ctthRq2qCkR8nIF_n_SOJiXQ-SqG_LUk--J5LZEwV9jGdXABQriDruLYnPEg$
>   36. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_reference_android_net_IpSecManager-23&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=9txNFXC-wFiRFV_QJlIQLW2AWSbERIOWcGHJfuQP2ZA&e=
>   37. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_reference_android_net_IpSecManager-23&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=9txNFXC-wFiRFV_QJlIQLW2AWSbERIOWcGHJfuQP2ZA&e=
>   38. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_reference_android_net_IpSecManager-23allocateSecurityParameterIndex-28java.net.InetAddress-29&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=UtpwkuoswT-6aK0he6dSS-0dvZfIcRjbfj_Eu4A-c6E&e=
>   39. https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.android.com_reference_android_net_IpSecManager-23allocateSecurityParameterIndex-28java.net.InetAddress&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=r6Wzn5LnVsk7Tgc5x4l_c04I_Hr_8TYqFn-YFi_gjqI&m=JsnNMKKypJWISL5BbGkfA2yD1_H51rd5M6YO4NG3WpRdteRwP5OdfTJFNEK0Xiec&s=fVmYoLIpyuMX3AtAkU7ejR1dno_8QtnhCKLMD4BmURc&e=
>   40. https://aka.ms/GetOutlookForMac

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net v3 5/5] iavf: refactor virtchnl polling into single function
From: Jose Ignacio Tornos Martinez @ 2026-04-16  5:51 UTC (permalink / raw)
  To: aleksandr.loktionov
  Cc: anthony.l.nguyen, davem, edumazet, intel-wired-lan,
	jesse.brandeburg, jtornosm, kuba, netdev, pabeni,
	przemyslaw.kitszel
In-Reply-To: <IA3PR11MB8986843CDCC4F6DC6CD015FDE5252@IA3PR11MB8986.namprd11.prod.outlook.com>

Hello Aleksandr,

Thank you for your comments.
I wanted to link this patch in some way with patch 3/5, but you are right:
as a refactoring, it is perhaps better suited for net-next.
Anyway, I am going to wait for Przemek and for
"iavf: add iavf_poll_virtchnl_response()" to be merged; after that I will
rebase and send another version of the series, dropping this patch for now.

Best regards
Jose Ignacio


^ permalink raw reply

* Re: [net-next v1 1/3] net: phy: motorcomm: Add yt8531_set_ds() mdio_locked bool parameter
From: Minda Chen @ 2026-04-16  6:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Frank, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <f308dd00-e589-48e8-8edb-d3b9ed5565e6@lunn.ch>



> 
> On Wed, Apr 15, 2026 at 05:26:52PM +0800, Minda Chen wrote:
> > By default, yt8531_set_ds() sets registers while taking the MDIO lock
> > and is only called for the YT8531 PHY. The newer YT8531s supports RGMII
> > and has the same pin drive strength settings as the YT8531, so it also
> > needs to call yt8531_set_ds(). However, its config init function,
> > yt8521_config_init(), already holds the MDIO lock via phy_select_page().
> >
> > Add unlocked ytphy API calls to yt8531_set_ds() and a new bool
> > parameter for the YT8531s RGMII case.
> 
> This is ugly.
> 
> Please try to modify the code so that both PHYs can call
> yt8531_set_ds() in the same locking context. You then don't need the
> mdio_locked parameter.
> 
>     Andrew
> 
> ---
> pw-bot: cr

Okay, thanks Andrew.

Hi Frank
   Could you review patch3? Thanks.
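
For reference, one way to read Andrew's suggestion is sketched below as
a user-space model. Everything here is an assumption for illustration:
the toy_* names are hypothetical, a flag stands in for the MDIO bus
lock, and the real driver may be restructured differently. The idea is
that the shared helper always expects the lock to be held, and each
PHY's init path takes the lock itself, so no mdio_locked flag is needed.

```c
/* Toy model of a single locking context for a shared helper. */
#include <assert.h>
#include <stdbool.h>

struct toy_bus {
	bool locked;		/* stands in for the MDIO bus lock */
	unsigned int ds_reg;	/* simulated drive strength register */
	unsigned int lock_count;
};

static void toy_bus_lock(struct toy_bus *bus)
{
	bus->locked = true;
	bus->lock_count++;
}

static void toy_bus_unlock(struct toy_bus *bus)
{
	bus->locked = false;
}

/* Shared helper: the caller must hold the bus lock. */
static int toy_set_ds(struct toy_bus *bus, unsigned int ds)
{
	if (!bus->locked)
		return -1;
	bus->ds_reg = ds;
	return 0;
}

/* First PHY: takes the lock only for the drive strength setup. */
static int toy_phy_a_config_init(struct toy_bus *bus)
{
	int ret;

	toy_bus_lock(bus);
	ret = toy_set_ds(bus, 3);
	toy_bus_unlock(bus);
	return ret;
}

/* Second PHY: already holds the lock for other register writes. */
static int toy_phy_b_config_init(struct toy_bus *bus)
{
	int ret;

	toy_bus_lock(bus);
	/* other page-selected register writes would go here */
	ret = toy_set_ds(bus, 5);
	toy_bus_unlock(bus);
	return ret;
}
```

Both init paths call the helper the same way, and calling it without
the lock is reported as an error instead of being handled by a flag.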

^ permalink raw reply

* RE: [net-next,v3,06/13] net: renesas: rswitch: add forwarding rules for gwca
From: Michael Dege @ 2026-04-16  6:19 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: paul@pbarker.dev, niklas.soderlund, kuba@kernel.org,
	Yoshihiro Shimoda, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, edumazet@google.com,
	linux-renesas-soc@vger.kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net
In-Reply-To: <20260402125933.234477-1-pabeni@redhat.com>

Hello Paolo,

Thank you very much for your review.

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, April 2, 2026 3:00 PM
> To: Michael Dege <michael.dege@renesas.com>
> Cc: paul@pbarker.dev; niklas.soderlund <niklas.soderlund@ragnatech.se>; kuba@kernel.org; Yoshihiro
> Shimoda <yoshihiro.shimoda.uh@renesas.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> edumazet@google.com; pabeni@redhat.com; linux-renesas-soc@vger.kernel.org; andrew+netdev@lunn.ch;
> davem@davemloft.net
> Subject: Re: [net-next,v3,06/13] net: renesas: rswitch: add forwarding rules for gwca
>
> This is an AI-generated review of your patch. The human sending this email has considered the AI
> review valid, or at least plausible.
>
> For local reproduction steps see
> https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: renesas: rswitch: add forwarding rules for gwca
>
> Add rules to forward packets from the Ethernet ports to the CPU port (GWCA) using L2 forwarding
> instead of port forwarding.
>
> > diff --git a/drivers/net/ethernet/renesas/rswitch_l2.c
> > b/drivers/net/ethernet/renesas/rswitch_l2.c
> > index 709524c8a5c4..cabf1f92ad0c 100644
> > --- a/drivers/net/ethernet/renesas/rswitch_l2.c
> > +++ b/drivers/net/ethernet/renesas/rswitch_l2.c
> > @@ -93,10 +93,32 @@ static void rswitch_update_l2_hw_forwarding(struct rswitch_private *priv)
> >     }
> >  }
> >
> > +static void rswitch_update_l2_hw_forwarding_gwca(struct
> > +rswitch_private *priv) {
> > +   struct rswitch_device *rdev;
> > +   u32 fwpc0_set, fwpc0_clr, fwpc2_set, fwpc2_clr;
> > +
> > +   fwpc0_clr = FWPC0_MACSSA | FWPC0_MACDSA | FWPC0_MACRUDA;
> > +   fwpc0_set = fwpc0_clr;
> > +   fwpc2_clr = FIELD_PREP(FWPC2_LTWFW, BIT(AGENT_INDEX_GWCA));
> > +   fwpc2_set = fwpc2_clr;
> > +
> > +   (priv->offload_brdev) ? (fwpc0_clr = 0, fwpc2_set = 0)
> > +                         : (fwpc0_set = 0, fwpc2_set = 0);
> >                                             ^^^^^^^^^^^
>
> Does the false branch have a copy-paste error? Looking at the ternary
> operator:
>
> The TRUE branch (when offload_brdev is set) assigns:
>     fwpc0_clr = 0
>     fwpc2_set = 0
>
> The FALSE branch (when offload_brdev is NULL) assigns:
>     fwpc0_set = 0
>     fwpc2_set = 0   <- assigns to fwpc2_set
>
> Both branches assign to fwpc2_set, leaving fwpc2_clr unchanged with the initialized value
> FIELD_PREP(FWPC2_LTWFW, BIT(AGENT_INDEX_GWCA)).
>
> This appears to break the logic for non-offloaded mode. The FALSE branch passes fwpc2_clr with the
> GWCA bit set to rswitch_modify(), which clears the GWCA bit from the FWPC2 registers. Is this the
> intended behavior when bridge offload is disabled?
>
> The symmetric pattern in the TRUE branch suggests the FALSE branch should be (fwpc0_set = 0, fwpc2_clr
> = 0) instead.

Good catch, this is indeed a copy-paste error. I have fixed it for the next version.

Best regards,

Michael
>
> > +
> > +   rswitch_modify(priv->addr, FWPC0(AGENT_INDEX_GWCA), fwpc0_clr,
> > +fwpc0_set);
> > +
> > +   rswitch_for_all_ports(priv, rdev) {
> > +           rswitch_modify(priv->addr, FWPC2(rdev->etha->index),
> > +                          fwpc2_clr, fwpc2_set);
> > +   }
> > +}
> > +
> >  void rswitch_update_l2_offload(struct rswitch_private *priv)  {
> >     rswitch_update_l2_hw_learning(priv);
> >     rswitch_update_l2_hw_forwarding(priv);
> > +   rswitch_update_l2_hw_forwarding_gwca(priv);
> >  }
>
> [ ... ]
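
For readers following the clear/set discussion above, here is a minimal
standalone sketch of the corrected mask selection. It is not the driver's
code: the bit values, `struct masks`, `modify()` and `gwca_masks()` are
hypothetical stand-ins, and the clear-then-set behaviour of `modify()` is an
assumption about how rswitch_modify() treats its two mask arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real FWPC0/FWPC2 layouts are not shown here. */
#define GWCA_FWD_BITS 0x07u /* stands in for FWPC0_MACSSA | MACDSA | MACRUDA */
#define GWCA_PORT_BIT 0x10u /* stands in for the GWCA bit in FWPC2_LTWFW */

struct masks {
	uint32_t fwpc0_clr, fwpc0_set;
	uint32_t fwpc2_clr, fwpc2_set;
};

/* Assumed read-modify-write semantics: clear bits first, then set bits. */
static uint32_t modify(uint32_t reg, uint32_t clr, uint32_t set)
{
	return (reg & ~clr) | set;
}

/* Pick one action per register: either clear the bits or set them. */
static struct masks gwca_masks(bool offloaded)
{
	struct masks m = {
		.fwpc0_clr = GWCA_FWD_BITS, .fwpc0_set = GWCA_FWD_BITS,
		.fwpc2_clr = GWCA_PORT_BIT, .fwpc2_set = GWCA_PORT_BIT,
	};

	if (offloaded) {
		m.fwpc0_clr = 0; /* keep only the set mask for FWPC0 */
		m.fwpc2_set = 0; /* keep only the clear mask for FWPC2 */
	} else {
		m.fwpc0_set = 0; /* keep only the clear mask for FWPC0 */
		m.fwpc2_clr = 0; /* the fix: zero fwpc2_clr, not fwpc2_set */
	}

	/* Sanity check: no register has both a clear and a set mask left. */
	assert(!(m.fwpc0_clr && m.fwpc0_set));
	assert(!(m.fwpc2_clr && m.fwpc2_set));
	return m;
}
```

With these masks the two modes are mirror images: offloaded mode sets the
FWPC0 bits and clears the GWCA bit in FWPC2, while non-offloaded mode clears
the FWPC0 bits and sets the GWCA bit in FWPC2. Writing the selection as a
plain if/else rather than a comma-expression ternary also makes an
asymmetric assignment easier to spot in review.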


^ permalink raw reply

* Re: [PATCH net-next 5/6] net: stmmac: move PHY handling out of __stmmac_open()/release()
From: Alexander Stein @ 2026-04-16  6:20 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, Maxime Coquelin, netdev, Paolo Abeni
In-Reply-To: <ad-LtOBrKREM1tCk@shell.armlinux.org.uk>

Am Mittwoch, 15. April 2026, 14:59:32 CEST schrieb Russell King (Oracle):
> On Wed, Apr 15, 2026 at 08:08:40AM +0200, Alexander Stein wrote:
> > Hi,
> > 
> > Am Dienstag, 23. September 2025, 13:26:19 CEST schrieb Russell King (Oracle):
> > > Move the PHY attachment/detachment from the network driver out of
> > > __stmmac_open() and __stmmac_release() into stmmac_open() and
> > > stmmac_release() where these actions will only happen when the
> > > interface is administratively brought up or down. It does not make
> > > sense to detach and re-attach the PHY during a change of MTU.
> > 
> > Sorry for coming up now. But I recently noticed this commit breaks changing
> > the MTU on i.MX8MP. Once I simply change the MTU I run into some DMA error:
> > $ ip link set dev end1 mtu 1400
> > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
> > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-1
> > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-2
> > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-3
> > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-4
> > imx-dwmac 30bf0000.ethernet end1: Link is Down
> > imx-dwmac 30bf0000.ethernet end1: Failed to reset the dma
> > imx-dwmac 30bf0000.ethernet end1: stmmac_hw_setup: DMA engine initialization failed
> 
> This basically means that a clock is missing. Please provide more
> information:
> 
> - what kernel version are you using?

Currently I am using v6.18.22.
$ ethtool -i end1
driver: st_gmac
version: 6.18.22
firmware-version: 
expansion-rom-version: 
bus-info: 30bf0000.ethernet
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

> - has EEE been negotiated?

No. It is marked as not supported.

$ ethtool --show-eee end1
EEE settings for end1:
        EEE status: not supported

> - does the problem persist when EEE is disabled?

As EEE is not supported the problem occurs even with EEE disabled.

> - which PHY is attached to stmmac?

It is a TI DP83867.

imx-dwmac 30bf0000.ethernet eth1: PHY [stmmac-1:03] driver [TI DP83867] (irq=136)

> - which PHY interface mode is being used to connect the PHY to stmmac?

For this interface,
    phy-mode = "rgmii-id";
is set.

In case it is helpful, my platform is arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts.
Thanks for assisting. If there are further questions, don't hesitate to ask.

Thanks and best regards
Alexander
-- 
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
http://www.tq-group.com/




^ permalink raw reply

* Re: [net,PATCH v3 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Sebastian Andrzej Siewior @ 2026-04-16  6:21 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
	Yicong Hui, linux-kernel
In-Reply-To: <7734527a-d08b-49fa-b258-c37c5ae2da55@nabladev.com>

On 2026-04-16 01:14:35 [+0200], Marek Vasut wrote:
> > spin_unlock_bh(&ks->statelock)? After that unlock, the softirq must be
> > processed and __netdev_alloc_skb() _could_ observe pending softirqs but
> > not from ks8851.
> Because __netdev_alloc_skb() also enables/disables BH , see the "else"

Yes. But there is no softirq raised in that part. That softirq is raised
by netif_wake_queue() within a bh disabled section. Therefore upon the
unlock the softirq must be invoked.
After that, the allocation later on may invoke softirqs which were
raised but I don't see how ks8851 can be part of it.
Before commit 0913ec336a6c0 ("net: ks8851: Fix deadlock with the SPI
chip variant") there was no _bh around it meaning the softirq was raised
but not invoked immediately. This happened on the bh unlock during
memory allocation. Therefore I am saying this backtrace is from an older
kernel.

If there is a flaw in my theory, please explain _how_ you managed
to get that backtrace. I am sure it must have come from an older kernel and
_now_ this lockup also happens on !RT kernels (except for the SPI
platform).

Sebastian

^ permalink raw reply

* [PATCH] selftests: net: add RDMA CM observability and regression scripts
From: Chenguang Zhao @ 2026-04-16  6:22 UTC (permalink / raw)
  To: Shuah Khan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Chenguang Zhao, netdev, linux-kselftest

Add a minimal RDMA CM selftest suite that captures observability
baselines and runs trace, counter-delta, and fault-injection oriented
checks, plus a review-loop helper for repeated validation rounds.

Usage (client side):
- export
  CM_WORKLOAD_CMD='ib_send_bw -d <dev> -i <port> -R -g <gid> <server_ip>'
  (User can customize CM_WORKLOAD_CMD)
- sudo -E make -C tools/testing/selftests
  TARGETS=drivers/net/rdma run_tests

Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
---
  The first patch adds a focused RDMA CM selftest suite under
  kselftest to make CM behavior easier to observe and validate
  in routine regression runs.

  It introduces baseline collection, trace-sequence checks,
  counter-delta checks, and failslab-based recovery checks, plus
  a review-loop script for one-shot serial execution. It also
  registers drivers/net/rdma in the top-level selftests TARGETS,
  so the suite runs through standard kselftest flow
  (make ... TARGETS=drivers/net/rdma run_tests) instead of requiring
  manual script-by-script execution.
---
 tools/testing/selftests/Makefile              |   1 +
 .../selftests/drivers/net/rdma/Makefile       |  13 ++
 .../selftests/drivers/net/rdma/README.md      | 168 ++++++++++++++++++
 .../drivers/net/rdma/rdma_cm_baseline.sh      |  58 ++++++
 .../drivers/net/rdma/rdma_cm_counter_delta.sh |  72 ++++++++
 .../net/rdma/rdma_cm_fault_injection.sh       |  95 ++++++++++
 .../drivers/net/rdma/rdma_cm_review_loop.sh   |  35 ++++
 .../net/rdma/rdma_cm_trace_sequence.sh        |  83 +++++++++
 .../selftests/drivers/net/rdma/rdma_common.sh | 126 +++++++++++++
 9 files changed, 651 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/net/rdma/Makefile
 create mode 100644 tools/testing/selftests/drivers/net/rdma/README.md
 create mode 100755 tools/testing/selftests/drivers/net/rdma/rdma_cm_baseline.sh
 create mode 100755 tools/testing/selftests/drivers/net/rdma/rdma_cm_counter_delta.sh
 create mode 100755 tools/testing/selftests/drivers/net/rdma/rdma_cm_fault_injection.sh
 create mode 100755 tools/testing/selftests/drivers/net/rdma/rdma_cm_review_loop.sh
 create mode 100755 tools/testing/selftests/drivers/net/rdma/rdma_cm_trace_sequence.sh
 create mode 100755 tools/testing/selftests/drivers/net/rdma/rdma_common.sh

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 984abb6d42ab..0df7034f46b2 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -22,6 +22,7 @@ TARGETS += drivers/ntsync
 TARGETS += drivers/s390x/uvdevice
 TARGETS += drivers/net
 TARGETS += drivers/net/bonding
+TARGETS += drivers/net/rdma
 TARGETS += drivers/net/netconsole
 TARGETS += drivers/net/team
 TARGETS += drivers/net/virtio_net
diff --git a/tools/testing/selftests/drivers/net/rdma/Makefile b/tools/testing/selftests/drivers/net/rdma/Makefile
new file mode 100644
index 000000000000..42d042aac1f0
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := \
+	rdma_cm_baseline.sh \
+	rdma_cm_trace_sequence.sh \
+	rdma_cm_counter_delta.sh \
+	rdma_cm_fault_injection.sh
+
+TEST_FILES := \
+	rdma_common.sh \
+	rdma_cm_review_loop.sh
+
+include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/rdma/README.md b/tools/testing/selftests/drivers/net/rdma/README.md
new file mode 100644
index 000000000000..a9caca638b20
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/README.md
@@ -0,0 +1,168 @@
+# RDMA CM Selftests Usage Guide
+
+These scripts provide baseline observability and regression checks for RDMA/CM
+paths under the Linux `kselftest` framework.
+
+Files:
+
+- `rdma_cm_baseline.sh`
+- `rdma_cm_trace_sequence.sh`
+- `rdma_cm_counter_delta.sh`
+- `rdma_cm_fault_injection.sh`
+- `rdma_cm_review_loop.sh`
+- `rdma_common.sh`
+
+The scripts use a fixed test flow and only require workload commands from
+environment variables.
+
+## 1. Use Cases
+
+- CM main-flow observability checks (REQ/REP/RTU)
+- CM counter delta validation
+- Recovery validation after fault injection (`failslab`)
+- One-shot serial regression run
+
+## 2. Requirements
+
+- root privileges
+- Reachable client/server network path
+- Perftest command available on the remote side (default: `ib_send_bw -R`)
+- For fault injection: kernel support for `failslab` and access to
+  `/sys/kernel/debug/failslab`
+
+## 3. Recommended Execution Order
+
+```bash
+./rdma_cm_baseline.sh
+./rdma_cm_trace_sequence.sh
+./rdma_cm_counter_delta.sh
+./rdma_cm_fault_injection.sh
+```
+
+Or run all in sequence:
+
+```bash
+./rdma_cm_review_loop.sh
+```
+
+## 4. Quick Start (Two Hosts)
+
+### 4.1 Server side (recommended: loop and keep listening)
+
+```bash
+while true; do
+  ib_send_bw -d <server_ibdev> -i <server_port> -R
+  sleep 1
+done
+```
+
+### 4.2 Client side (set workload command)
+
+```bash
+export CM_WORKLOAD_CMD='ib_send_bw -d rocep1s0f0 -i 1 -R -g 3 192.168.1.22'
+export CM_VALIDATE_RECOVERY_CMD="${CM_WORKLOAD_CMD}"
+
+./rdma_cm_review_loop.sh
+```
+
+### 4.3 Run through kselftest harness
+
+```bash
+sudo -E make -C tools/testing/selftests TARGETS=drivers/net/rdma run_tests
+```
+
+`sudo -E` keeps exported workload variables for test scripts.
+
+### 4.4 Run a single script directly
+
+```bash
+cd tools/testing/selftests/drivers/net/rdma
+sudo -E ./rdma_cm_counter_delta.sh
+```
+
+## 5. Configuration Parameters
+
+Only workload command variables are supported:
+
+- `CM_WORKLOAD_CMD`: required; workload command used by trace/counter/fault tests
+- `CM_VALIDATE_RECOVERY_CMD`: optional; command for recovery stage in fault
+  injection test (falls back to `CM_WORKLOAD_CMD`)
+
+Fixed internal settings:
+
+- Counter pre-wait: `2s`
+- Recovery pre-wait: `2s`
+- Failslab path: `/sys/kernel/debug/failslab`
+- Failslab knobs: `task-filter=1`, `probability=1`, `interval=100`, `times=1`
+- Counter limits: `cm_rx_duplicates.* <= 10`, `cm_tx_retries.* <= 10`
+- Trace log path: `/tmp/rdma_cm_trace.<timestamp>.log`
+
+## 6. Exit Codes
+
+- `0`: pass
+- `4`: skip (environment not ready, e.g. missing tracefs/failslab/counters)
+- other non-zero: fail
+
+## 7. Result Interpretation
+
+When running with kselftest (`make ... run_tests`), TAP output looks like:
+
+```text
+ok 1 selftests: drivers/net/rdma: rdma_cm_baseline.sh
+ok 2 selftests: drivers/net/rdma: rdma_cm_trace_sequence.sh
+not ok 3 selftests: drivers/net/rdma: rdma_cm_counter_delta.sh # exit=1
+```
+
+- `ok N ...`: that script passed
+- `not ok N ... # exit=1`: that script failed
+- `not ok N ... # exit=4`: that script was skipped by environment checks
+
+When running `rdma_cm_review_loop.sh` directly, check the final summary block:
+
+```text
+==== summary ====
+baseline=0
+trace=0
+counters=1
+fault_injection=0
+```
+
+Each value is the corresponding script return code.
+
+## 8. Common Issues
+
+### 8.1 `cm counters are unavailable under /sys/class/infiniband`
+
+The script did not find `cm_tx_msgs` (and related) counters. Check:
+
+- whether `cm_tx_msgs` exists under any available RDMA port path
+
+### 8.2 `missing CM send trace events (req/rep/rtu)`
+
+This usually means workload did not create a CM handshake. Verify
+`CM_WORKLOAD_CMD` and remote server readiness.
+
+### 8.3 `Unexpected CM event ... 8`
+
+Usually means the server was not ready for the next connection. Try:
+
+- keep server in a listening loop
+- ensure the remote server is still listening before the recovery stage
+
+### 8.4 `failslab is unavailable`
+
+Expected skip when failslab is not exposed by kernel/debugfs. Check:
+
+```bash
+mount | grep debugfs
+ls /sys/kernel/debug/failslab
+```
+
+## 9. Minimal Regression Profile
+
+```bash
+export CM_WORKLOAD_CMD='ib_send_bw -d <client_ibdev> -i 1 -R -g <gid_idx> <server_ip>'
+export CM_VALIDATE_RECOVERY_CMD="${CM_WORKLOAD_CMD}"
+
+./rdma_cm_review_loop.sh
+```
diff --git a/tools/testing/selftests/drivers/net/rdma/rdma_cm_baseline.sh b/tools/testing/selftests/drivers/net/rdma/rdma_cm_baseline.sh
new file mode 100755
index 000000000000..b0d8b3e46470
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/rdma_cm_baseline.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+source "${SCRIPT_DIR}/rdma_common.sh"
+
+require_root
+require_cmd date
+require_cmd uname
+
+trace_dir="$(tracefs_dir || true)"
+counter_root="$(find_cm_counter_root || true)"
+out_dir="/tmp/rdma_cm_baseline.$(date +%s)"
+dmesg_lines=400
+dmesg_pattern="ib_cm|infiniband|rdma|roce|mlx|hns_roce|irdma|siw|rxe"
+
+mkdir -p "${out_dir}"
+
+log_info "writing baseline to ${out_dir}"
+
+{
+	echo "timestamp=$(date -u +%FT%TZ)"
+	echo "kernel=$(uname -r)"
+	echo "hostname=$(uname -n)"
+	echo "dmesg_lines=${dmesg_lines}"
+	echo "dmesg_pattern=${dmesg_pattern}"
+} >"${out_dir}/env.txt"
+
+if [[ -n "${trace_dir}" && -d "${trace_dir}/events/ib_cma" ]]; then
+	find "${trace_dir}/events/ib_cma" -maxdepth 2 -name enable -print \
+		>"${out_dir}/trace_events.list" 2>/dev/null || true
+else
+	log_warn "tracefs or ib_cma trace events are unavailable"
+fi
+
+if [[ -n "${counter_root}" ]]; then
+	{
+		echo "counter_root=${counter_root}"
+		for group in "${RDMA_COUNTER_GROUPS[@]}"; do
+			for attr in "${RDMA_COUNTER_ATTRS[@]}"; do
+				value="$(read_cm_counter "${counter_root}" "${group}" "${attr}")"
+				echo "${group}.${attr}=${value}"
+			done
+		done
+	} >"${out_dir}/cm_counters.before"
+else
+	log_warn "cm counters are unavailable under /sys/class/infiniband"
+fi
+
+if command -v dmesg >/dev/null 2>&1; then
+	dmesg | tail -n "${dmesg_lines}" | grep -E "${dmesg_pattern}" \
+		>"${out_dir}/dmesg.rdma.tail" || true
+fi
+
+log_info "baseline collection completed"
+exit 0
diff --git a/tools/testing/selftests/drivers/net/rdma/rdma_cm_counter_delta.sh b/tools/testing/selftests/drivers/net/rdma/rdma_cm_counter_delta.sh
new file mode 100755
index 000000000000..060adf9fe78a
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/rdma_cm_counter_delta.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+source "${SCRIPT_DIR}/rdma_common.sh"
+
+require_root
+counter_root="$(find_cm_counter_root || true)"
+counter_wait_sec=2
+
+if [[ -z "${counter_root}" ]]; then
+	log_warn "cm counters are unavailable under /sys/class/infiniband"
+	exit "${ksft_skip}"
+fi
+
+declare -A before after
+
+for group in "${RDMA_COUNTER_GROUPS[@]}"; do
+	for attr in "${RDMA_COUNTER_ATTRS[@]}"; do
+		key="${group}.${attr}"
+		before["${key}"]="$(read_cm_counter "${counter_root}" "${group}" "${attr}")"
+	done
+done
+
+if [[ "${counter_wait_sec}" != "0" ]]; then
+	log_info "waiting ${counter_wait_sec}s before workload"
+	sleep "${counter_wait_sec}"
+fi
+
+workload_rc=0
+run_workload || workload_rc=$?
+if [[ "${workload_rc}" -eq "${ksft_skip}" ]]; then
+	exit "${ksft_skip}"
+fi
+if [[ "${workload_rc}" -ne 0 ]]; then
+	log_err "workload failed with rc=${workload_rc}"
+	exit "${workload_rc}"
+fi
+
+for group in "${RDMA_COUNTER_GROUPS[@]}"; do
+	for attr in "${RDMA_COUNTER_ATTRS[@]}"; do
+		key="${group}.${attr}"
+		after["${key}"]="$(read_cm_counter "${counter_root}" "${group}" "${attr}")"
+		delta=$((after["${key}"] - before["${key}"]))
+		echo "${key}.delta=${delta}"
+		if ((delta < 0)); then
+			log_err "counter regressed: ${key}"
+			exit 1
+		fi
+	done
+done
+
+dup_limit=10
+retry_limit=10
+
+for attr in "${RDMA_COUNTER_ATTRS[@]}"; do
+	dup_delta=$((after["cm_rx_duplicates.${attr}"] - before["cm_rx_duplicates.${attr}"]))
+	retry_delta=$((after["cm_tx_retries.${attr}"] - before["cm_tx_retries.${attr}"]))
+
+	if ((dup_delta > dup_limit)); then
+		log_err "duplicate counter exceeds limit: ${attr}=${dup_delta}"
+		exit 1
+	fi
+	if ((retry_delta > retry_limit)); then
+		log_err "retry counter exceeds limit: ${attr}=${retry_delta}"
+		exit 1
+	fi
+done
+
+exit 0
diff --git a/tools/testing/selftests/drivers/net/rdma/rdma_cm_fault_injection.sh b/tools/testing/selftests/drivers/net/rdma/rdma_cm_fault_injection.sh
new file mode 100755
index 000000000000..0202ee901386
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/rdma_cm_fault_injection.sh
@@ -0,0 +1,95 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+source "${SCRIPT_DIR}/rdma_common.sh"
+
+require_root
+
+debugfs_fail="/sys/kernel/debug/failslab"
+recovery_wait_sec=2
+if [[ ! -d "${debugfs_fail}" ]]; then
+	log_warn "failslab is unavailable: ${debugfs_fail}"
+	exit "${ksft_skip}"
+fi
+
+for knob in probability interval times task-filter; do
+	if [[ ! -f "${debugfs_fail}/${knob}" ]]; then
+		log_warn "failslab knob missing: ${knob}"
+		exit "${ksft_skip}"
+	fi
+done
+
+orig_probability="$(cat "${debugfs_fail}/probability")"
+orig_interval="$(cat "${debugfs_fail}/interval")"
+orig_times="$(cat "${debugfs_fail}/times")"
+orig_task_filter="$(cat "${debugfs_fail}/task-filter")"
+
+restore_knobs()
+{
+	echo "${orig_probability}" >"${debugfs_fail}/probability" || true
+	echo "${orig_interval}" >"${debugfs_fail}/interval" || true
+	echo "${orig_times}" >"${debugfs_fail}/times" || true
+	echo "${orig_task_filter}" >"${debugfs_fail}/task-filter" || true
+}
+
+trap restore_knobs EXIT
+
+log_failslab_state()
+{
+	local state="$1"
+	local task_filter probability interval times
+
+	task_filter="$(cat "${debugfs_fail}/task-filter")"
+	probability="$(cat "${debugfs_fail}/probability")"
+	interval="$(cat "${debugfs_fail}/interval")"
+	times="$(cat "${debugfs_fail}/times")"
+
+	log_info "failslab ${state}: task-filter=${task_filter} probability=${probability}"
+	log_info "failslab ${state}: interval=${interval} times=${times}"
+}
+
+echo 1 >"${debugfs_fail}/task-filter"
+echo 1 >"${debugfs_fail}/probability"
+echo 100 >"${debugfs_fail}/interval"
+echo 1 >"${debugfs_fail}/times"
+log_failslab_state "enabled"
+
+if [[ -z "${CM_WORKLOAD_CMD:-}" && -n "${CM_VALIDATE_RECOVERY_CMD:-}" ]]; then
+	CM_WORKLOAD_CMD="${CM_VALIDATE_RECOVERY_CMD}"
+	log_warn "CM_WORKLOAD_CMD is not set; fallback to CM_VALIDATE_RECOVERY_CMD"
+fi
+
+injected_rc=0
+run_workload || injected_rc=$?
+if [[ "${injected_rc}" -eq "${ksft_skip}" ]]; then
+	exit "${ksft_skip}"
+fi
+log_info "workload rc under injection=${injected_rc}"
+
+echo 0 >"${debugfs_fail}/probability"
+echo 0 >"${debugfs_fail}/times"
+echo 0 >"${debugfs_fail}/task-filter"
+log_failslab_state "disabled"
+
+recovery_cmd="${CM_VALIDATE_RECOVERY_CMD:-${CM_WORKLOAD_CMD:-}}"
+if [[ -z "${recovery_cmd}" ]]; then
+	log_warn "CM_VALIDATE_RECOVERY_CMD and CM_WORKLOAD_CMD are both unset"
+	exit "${ksft_skip}"
+fi
+
+if [[ "${recovery_wait_sec}" != "0" ]]; then
+	log_info "waiting ${recovery_wait_sec}s before recovery workload"
+	sleep "${recovery_wait_sec}"
+fi
+
+log_info "running recovery workload: ${recovery_cmd}"
+if ! bash -c "${recovery_cmd}"; then
+	log_err "recovery workload failed after disabling fault injection"
+	log_err "hint: ensure remote server is restarted and listening for a second connection"
+	exit 1
+fi
+
+exit 0
diff --git a/tools/testing/selftests/drivers/net/rdma/rdma_cm_review_loop.sh b/tools/testing/selftests/drivers/net/rdma/rdma_cm_review_loop.sh
new file mode 100755
index 000000000000..c156090b17e3
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/rdma_cm_review_loop.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+cd "${SCRIPT_DIR}"
+
+declare -A rc
+
+run_step()
+{
+	local name="$1"
+	local cmd="$2"
+
+	echo "==== ${name} ===="
+	if bash -c "${cmd}"; then
+		rc["${name}"]=0
+	else
+		rc["${name}"]=$?
+	fi
+	echo "==== ${name} rc=${rc["${name}"]} ===="
+}
+
+run_step baseline "./rdma_cm_baseline.sh"
+run_step trace "./rdma_cm_trace_sequence.sh"
+run_step counters "./rdma_cm_counter_delta.sh"
+run_step fault_injection "./rdma_cm_fault_injection.sh"
+
+echo "==== summary ===="
+for name in baseline trace counters fault_injection; do
+	echo "${name}=${rc["${name}"]}"
+done
+
+exit 0
diff --git a/tools/testing/selftests/drivers/net/rdma/rdma_cm_trace_sequence.sh b/tools/testing/selftests/drivers/net/rdma/rdma_cm_trace_sequence.sh
new file mode 100755
index 000000000000..7e68289345e8
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/rdma_cm_trace_sequence.sh
@@ -0,0 +1,83 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+source "${SCRIPT_DIR}/rdma_common.sh"
+
+require_root
+require_cmd bash
+require_cmd grep
+
+trace_dir="$(tracefs_dir || true)"
+if [[ -z "${trace_dir}" ]]; then
+	log_warn "tracefs is unavailable"
+	exit "${ksft_skip}"
+fi
+
+if [[ ! -d "${trace_dir}/events/ib_cma" ]]; then
+	log_warn "ib_cma trace events are unavailable"
+	exit "${ksft_skip}"
+fi
+
+workload_rc=0
+
+cleanup_trace()
+{
+	local event
+
+	for event in icm_send_req icm_send_rep icm_send_rtu icm_recv_unknown_attr; do
+		[[ -f "${trace_dir}/events/ib_cma/${event}/enable" ]] && \
+			echo 0 >"${trace_dir}/events/ib_cma/${event}/enable"
+	done
+	[[ -f "${trace_dir}/events/ib_cma/enable" ]] && echo 0 >"${trace_dir}/events/ib_cma/enable"
+	echo 0 >"${trace_dir}/tracing_on"
+}
+
+trap cleanup_trace EXIT
+
+echo 0 >"${trace_dir}/tracing_on"
+echo >"${trace_dir}/trace"
+echo 1 >"${trace_dir}/events/ib_cma/enable"
+
+for event in icm_send_req icm_send_rep icm_send_rtu; do
+	if [[ -f "${trace_dir}/events/ib_cma/${event}/enable" ]]; then
+		echo 1 >"${trace_dir}/events/ib_cma/${event}/enable"
+	fi
+done
+
+echo 1 >"${trace_dir}/tracing_on"
+run_workload || workload_rc=$?
+echo 0 >"${trace_dir}/tracing_on"
+
+if [[ "${workload_rc}" -eq "${ksft_skip}" ]]; then
+	exit "${ksft_skip}"
+fi
+
+trace_log="/tmp/rdma_cm_trace.$(date +%s).log"
+cat "${trace_dir}/trace" >"${trace_log}"
+log_info "captured trace at ${trace_log}"
+
+if ! grep -Eq "icm_send_(req|rep|rtu)" "${trace_log}"; then
+	log_err "missing CM send trace events (req/rep/rtu)"
+	exit 1
+fi
+
+err_lines="$(grep "icm_.*_err" "${trace_log}" || true)"
+if [[ -n "${err_lines}" ]]; then
+	# DREP send failure while already in TIMEWAIT is a common teardown
+	# race and is tolerated for this smoke-style validation script.
+	untolerated_err_lines="$(
+		printf '%s\n' "${err_lines}" | \
+			grep -Ev "icm_send_drep_err: .*state=TIMEWAIT" || true
+	)"
+	if [[ -n "${untolerated_err_lines}" ]]; then
+		log_err "error trace event detected in ib_cma path"
+		printf '%s\n' "${untolerated_err_lines}" >&2
+		exit 1
+	fi
+	log_warn "only tolerated TIMEWAIT drep errors observed"
+fi
+
+exit 0
diff --git a/tools/testing/selftests/drivers/net/rdma/rdma_common.sh b/tools/testing/selftests/drivers/net/rdma/rdma_common.sh
new file mode 100755
index 000000000000..ee3d8b0d86b2
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/rdma/rdma_common.sh
@@ -0,0 +1,126 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ksft_skip=4
+RET=0
+
+RDMA_COUNTER_GROUPS=(
+	cm_tx_msgs
+	cm_tx_retries
+	cm_rx_msgs
+	cm_rx_duplicates
+)
+
+RDMA_COUNTER_ATTRS=(
+	req
+	mra
+	rej
+	rep
+	rtu
+	dreq
+	drep
+	sidr_req
+	sidr_rep
+	lap
+	apr
+)
+
+log_info()
+{
+	echo "INFO: $*"
+}
+
+log_warn()
+{
+	echo "WARN: $*" >&2
+}
+
+log_err()
+{
+	echo "ERROR: $*" >&2
+}
+
+require_root()
+{
+	if [[ "$(id -u)" -ne 0 ]]; then
+		log_warn "this test requires root privileges"
+		exit "${ksft_skip}"
+	fi
+}
+
+require_cmd()
+{
+	local cmd="$1"
+
+	command -v "${cmd}" >/dev/null 2>&1 || {
+		log_warn "missing required command: ${cmd}"
+		exit "${ksft_skip}"
+	}
+}
+
+tracefs_dir()
+{
+	if [[ -d /sys/kernel/tracing ]]; then
+		echo /sys/kernel/tracing
+	elif [[ -d /sys/kernel/debug/tracing ]]; then
+		echo /sys/kernel/debug/tracing
+	else
+		return 1
+	fi
+}
+
+find_cm_counter_root()
+{
+	local base
+	local port
+	local candidate
+
+	for base in /sys/class/infiniband/*; do
+		[[ -d "${base}" ]] || continue
+
+		for port in "${base}"/ports/*; do
+			[[ -d "${port}" ]] || continue
+			# RoCE / newer sysfs: cm_* groups live directly under ports/<N>/
+			if [[ -d "${port}/cm_tx_msgs" ]]; then
+				echo "${port}"
+				return 0
+			fi
+			# Legacy layout: under counters/ or hw_counters/
+			for candidate in "${port}/counters" "${port}/hw_counters"; do
+				[[ -d "${candidate}/cm_tx_msgs" ]] || continue
+				echo "${candidate}"
+				return 0
+			done
+		done
+	done
+
+	return 1
+}
+
+read_cm_counter()
+{
+	local root="$1"
+	local group="$2"
+	local attr="$3"
+	local path="${root}/${group}/${attr}"
+
+	if [[ -f "${path}" ]]; then
+		cat "${path}" 2>/dev/null
+	else
+		echo 0
+	fi
+}
+
+run_workload()
+{
+	local cmd="${CM_WORKLOAD_CMD:-}"
+
+	if [[ -z "${cmd}" ]]; then
+		log_warn "CM_WORKLOAD_CMD is not set"
+		return "${ksft_skip}"
+	fi
+
+	log_info "running workload: ${cmd}"
+	bash -c "${cmd}"
+}
+
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v3 4/4] rust_binder: report netlink transactions
From: Alice Ryhl @ 2026-04-16  7:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Danilo Krummrich,
	Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, linux-kernel, rust-for-linux, netdev
In-Reply-To: <96fff93c-c7ae-4480-8e0e-9f72c02a7b34@lunn.ch>

On Thu, Apr 16, 2026 at 02:46:18AM +0200, Andrew Lunn wrote:
> On Wed, Apr 15, 2026 at 09:37:54AM +0000, Alice Ryhl wrote:
> > From: Carlos Llamas <cmllamas@google.com>
> > 
> > The Android Binder driver supports a netlink API that reports
> > transaction *failures* to a userapce daemon. This allows devices to
> 
> userspace ?

Good catch, thanks.

Alice

^ permalink raw reply

* RE: [net-next,v3,13/13] net: renesas: rswitch: add vlan aware switching
From: Michael Dege @ 2026-04-16  7:04 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: paul@pbarker.dev, niklas.soderlund, kuba@kernel.org,
	Yoshihiro Shimoda, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, edumazet@google.com,
	linux-renesas-soc@vger.kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net
In-Reply-To: <20260402125946.234499-1-pabeni@redhat.com>

Hello Paolo,

Thank you for your review comments.

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, April 2, 2026 3:00 PM
> To: Michael Dege <michael.dege@renesas.com>
> Cc: paul@pbarker.dev; niklas.soderlund <niklas.soderlund@ragnatech.se>; kuba@kernel.org; Yoshihiro
> Shimoda <yoshihiro.shimoda.uh@renesas.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> edumazet@google.com; pabeni@redhat.com; linux-renesas-soc@vger.kernel.org; andrew+netdev@lunn.ch;
> davem@davemloft.net
> Subject: Re: [net-next,v3,13/13] net: renesas: rswitch: add vlan aware switching
>
> This is an AI-generated review of your patch. The human sending this email has considered the AI
> review valid, or at least plausible.
>
> For local reproduction steps see
> https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: renesas: rswitch: add vlan aware switching
>
> This commit adds vlan support to the L2 HW bridge, with limited support on R-Car S4 (not fully 802.1Q
> compliant).
>
> > diff --git a/drivers/net/ethernet/renesas/Kconfig
> > b/drivers/net/ethernet/renesas/Kconfig
> > index 9b7559c88bee..2494ff60e8c2 100644
> > --- a/drivers/net/ethernet/renesas/Kconfig
> > +++ b/drivers/net/ethernet/renesas/Kconfig
>
> [ ... ]
>
> > diff --git a/drivers/net/ethernet/renesas/rswitch_l2.c
> > b/drivers/net/ethernet/renesas/rswitch_l2.c
> > index ea95a87ed234..b7d0e7f94ecd 100644
> > --- a/drivers/net/ethernet/renesas/rswitch_l2.c
> > +++ b/drivers/net/ethernet/renesas/rswitch_l2.c
>
> [ ... ]
>
> > @@ -212,61 +197,397 @@ static int rswitch_netdevice_event(struct
> > notifier_block *nb,
>
> [ ... ]
>
> > +static int rswitch_handle_port_obj_add(struct net_device *ndev,
> > +                                  struct notifier_block *nb,
> > +                                  struct switchdev_notifier_port_obj_info *info)
> > +{
> > +   struct switchdev_obj_port_vlan *p_vlan = SWITCHDEV_OBJ_PORT_VLAN(info->obj);
> > +   struct rswitch_private *priv;
> > +   int err;
> > +
> > +   priv = container_of(nb, struct rswitch_private, rswitch_switchdev_blocking_nb);
> > +
> > +   if ((p_vlan->flags & BRIDGE_VLAN_INFO_MASTER) ||
> > +       (p_vlan->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) ||
> > +       (p_vlan->flags & BRIDGE_VLAN_INFO_RANGE_END) ||
> > +       (p_vlan->flags & BRIDGE_VLAN_INFO_ONLY_OPTS))
> > +           return NOTIFY_DONE;
> > +
> > +   switch (info->obj->id) {
> > +   case SWITCHDEV_OBJ_ID_PORT_VLAN:
>
> Can the flag checks access the wrong memory? The code casts info->obj to switchdev_obj_port_vlan
> before checking info->obj->id. When the notifier delivers a non-VLAN object (such as
> SWITCHDEV_OBJ_ID_PORT_MDB for multicast group changes), SWITCHDEV_OBJ_PORT_VLAN() performs an
> incorrect container_of cast, and p_vlan->flags reads bytes from a different struct type.
>
> For switchdev_obj_port_mdb, the p_vlan->flags field overlaps with the first
> 2 bytes of the addr[ETH_ALEN] MAC address field. Multicast MAC addresses always have bit 0 of the
> first byte set, which matches BRIDGE_VLAN_INFO_MASTER (1<<0), causing the function to return
> NOTIFY_DONE instead of falling through to the default case that returns -EOPNOTSUPP.
>
> Other switchdev drivers (lan966x, mlxsw, cpsw) perform the SWITCHDEV_OBJ_PORT_VLAN cast inside the
> SWITCHDEV_OBJ_ID_PORT_VLAN case, not before the type switch.
>
> The same issue exists in rswitch_handle_port_obj_del().

Thank you, this makes sense. I have changed it.
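
For readers following the thread, the reordering suggested in the review can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the actual v4 change: the types `obj`, `obj_port_vlan`, `obj_port_mdb`, the constants `OBJ_ID_*`, `VLAN_INFO_MASTER`, `HANDLED`, `SKIPPED`, and the function `handle_port_obj_add()` are hypothetical, simplified stand-ins for the switchdev structures and the rswitch handler. The point it shows is that the object id is checked first, and the container_of-style cast and the flag test happen only inside the VLAN case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the switchdev object types. */
enum obj_id { OBJ_ID_PORT_VLAN = 1, OBJ_ID_PORT_MDB = 2 };

struct obj {
	enum obj_id id;
};

struct obj_port_vlan {
	struct obj obj;
	uint16_t flags;
	uint16_t vid;
};

struct obj_port_mdb {
	struct obj obj;
	unsigned char addr[6];	/* overlaps "flags" if cast to the VLAN type */
	uint16_t vid;
};

#define VLAN_INFO_MASTER (1 << 0)	/* stand-in for BRIDGE_VLAN_INFO_MASTER */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define HANDLED 0	/* object was a VLAN and was processed */
#define SKIPPED 1	/* object was a VLAN but its flags say "ignore" */

/* Check the object id first; only then treat it as a VLAN object. */
int handle_port_obj_add(struct obj *obj)
{
	struct obj_port_vlan *vlan;

	assert(obj != NULL);

	switch (obj->id) {
	case OBJ_ID_PORT_VLAN:
		/* The cast is safe here because the id has been verified. */
		vlan = container_of(obj, struct obj_port_vlan, obj);
		if (vlan->flags & VLAN_INFO_MASTER)
			return SKIPPED;
		return HANDLED;
	default:
		/* Non-VLAN objects (e.g. MDB) are never read as VLAN data. */
		return -EOPNOTSUPP;
	}
}
```

With this ordering, an MDB object carrying a multicast MAC address (bit 0 of the first byte set) reaches the default case and yields -EOPNOTSUPP, rather than being mistaken for a VLAN object with the master flag set.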

Best regards,

Michael
>
> > +           if (!is_rdev(ndev))
> > +                   err = rswitch_port_obj_do_add_gwca(ndev, priv, p_vlan);
> > +           else
> > +                   err = rswitch_port_obj_do_add(ndev, p_vlan);
> > +
> > +           if (err < 0)
> > +                   return err;
> > +
> > +           break;
> > +   default:
> > +           return -EOPNOTSUPP;
> > +   }
> > +
> > +   info->handled = true;
> > +
> > +   return NOTIFY_DONE;
> > +}
> > +
> > +static int rswitch_handle_port_obj_del(struct net_device *ndev,
> > +                                  struct notifier_block *nb,
> > +                                  struct switchdev_notifier_port_obj_info *info)
> > +{
> > +   struct switchdev_obj_port_vlan *p_vlan = SWITCHDEV_OBJ_PORT_VLAN(info->obj);
> > +   struct rswitch_private *priv;
> > +   int err;
> > +
> > +   priv = container_of(nb, struct rswitch_private, rswitch_switchdev_blocking_nb);
> > +
> > +   if ((p_vlan->flags & BRIDGE_VLAN_INFO_MASTER) ||
> > +       (p_vlan->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) ||
> > +       (p_vlan->flags & BRIDGE_VLAN_INFO_RANGE_END) ||
> > +       (p_vlan->flags & BRIDGE_VLAN_INFO_ONLY_OPTS))
> > +           return NOTIFY_DONE;
> > +
> > +   switch (info->obj->id) {
> > +   case SWITCHDEV_OBJ_ID_PORT_VLAN:
> > +           if (!is_rdev(ndev))
> > +                   err = rswitch_port_obj_do_del_gwca(ndev, priv, p_vlan);
> > +           else
> > +                   err = rswitch_port_obj_do_del(ndev, p_vlan);
> > +
> > +           if (err < 0)
> > +                   return err;
> > +
> > +           break;
> > +   default:
> > +           return -EOPNOTSUPP;
> > +   }
> > +
> > +   info->handled = true;
> > +
> > +   return NOTIFY_DONE;
> > +}
>
> [ ... ]


^ permalink raw reply

* [syzbot ci] Re: net: tunnel: fix stale transport header after GRE/TEB decap
From: syzbot ci @ 2026-04-16  7:04 UTC (permalink / raw)
  To: davem, dsahern, edumazet, horms, jiayuan.chen, kuba, linux-kernel,
	netdev, pabeni, pshelar, syzbot, tom
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260416034610.8873-1-jiayuan.chen@linux.dev>

syzbot ci has tested the following series

[v1] net: tunnel: fix stale transport header after GRE/TEB decap
https://lore.kernel.org/all/20260416034610.8873-1-jiayuan.chen@linux.dev
* [PATCH net v1 1/2] net: tunnel: fix stale transport header after GRE/TEB decap
* [PATCH net-next v1 2/2] net: add DEBUG_NET_WARN_ON_ONCE for negative transport offset

and found the following issue:
WARNING in udpv6_err

Full report is available here:
https://ci.syzbot.org/series/3886f2f1-a6d5-4c5c-8dc8-bc1cec577567

***

WARNING in udpv6_err

tree:      net
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net.git
base:      1f5ffc672165ff851063a5fd044b727ab2517ae3
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/06cf41a2-60fe-4f9b-8f68-57eb6d1e48cc/config

------------[ cut here ]------------
off < 0
WARNING: ./include/linux/skbuff.h:3239 at udpv6_err+0x1521/0x16d0, CPU#0: kworker/u9:2/51
Modules linked in:

CPU: 0 UID: 0 PID: 51 Comm: kworker/u9:2 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: wg-kex-wg0 wg_packet_handshake_send_worker

RIP: 0010:udpv6_err+0x1521/0x16d0
Code: f7 b0 01 89 44 24 30 49 bf 00 00 00 00 00 fc ff df e9 ea ef ff ff e8 8e f2 66 f7 90 0f 0b 90 e9 2a f9 ff ff e8 80 f2 66 f7 90 <0f> 0b 90 e9 bb f9 ff ff e8 72 f2 66 f7 90 0f 0b 90 e9 1a fa ff ff
RSP: 0018:ffffc900000074e0 EFLAGS: 00010246

RAX: ffffffff8a5e6c30 RBX: 0000000080000000 RCX: ffff888106e91d80
RDX: 0000000000000100 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc90000007670 R08: ffff88810be30ae0 R09: 000000000000234e
R10: 000000000000003c R11: ffffffff8a5e5710 R12: ffff888116513390
R13: ffff888114c38900 R14: fffffffffffffff8 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff88818dc43000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555561513898 CR3: 000000016fad2000 CR4: 00000000000006f0
Call Trace:
 <IRQ>
 icmpv6_notify+0x407/0x850
 icmpv6_rcv+0x13b0/0x1d80
 ip6_protocol_deliver_rcu+0xe37/0x1610
 ip6_input_finish+0x191/0x370
 NF_HOOK+0x336/0x3c0
 ip6_input+0x16a/0x270
 NF_HOOK+0x336/0x3c0
 process_backlog+0x7dd/0x1950
 __napi_poll+0xae/0x340
 net_rx_action+0x627/0xf70
 handle_softirqs+0x22a/0x840
 do_softirq+0x76/0xd0
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130
 wg_socket_send_skb_to_peer+0x16b/0x1d0
 wg_packet_handshake_send_worker+0x203/0x350
 process_scheduled_works+0xb5d/0x1860


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

^ permalink raw reply

* Re: [PATCH net 00/14] Netfilter/IPVS fixes for net
From: Pablo Neira Ayuso @ 2026-04-16  7:25 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260416013101.221555-1-pablo@netfilter.org>

Hi,

I am preparing a v2 to address some AI-generated comments; it should be
ready in a few hours.

Thanks.

On Thu, Apr 16, 2026 at 03:30:47AM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> The following patchset contains Netfilter/IPVS fixes for net: Mostly
> addressing very old bugs in the SIP conntrack helper string parser,
> unsafe arp_tables match support with legacy IEEE1394, restrict xt_realm
> to IPv4 and incorrect use of RCU lists in nat core and nftables. This
> batch also includes one IPVS MTU fix. The exception is a fix for a
> recent issue related to broken double-tagged vlan in the flowtable.
> 
> 1) Fix possible stack recursion in nft_fwd_netdev from egress path,
>    from Weiming Shi.
> 
> 2) Fix unsafe port parser in SIP helper, from Jenny Guanni Qu.
> 
> 3) Fix arp_tables match with IEEE1394 ARP payload, allowing to
>    reach bytes off the skb boundary, from Weiming Shi.
> 
> 4) Reject unsafe nfnetlink_osf configurations from control plane,
>    this is addressing a possible division by zero, from Xiang Mei.
> 
> 5) nft_osf actually only supports IPv4, restrict it.
> 
> 6) Fix double-tagged-vlan support (again) in the flowtable, from
>    Eric Woudstra.
> 
> 7) Remove unsafe use of sprintf to fix possible buffer overflow
>    in the SIP NAT helper, from Florian Westphal.
> 
> 8) Restrict xt_mac, xt_owner and xt_physdev to inet families only;
>    xt_realm is only for ipv4, otherwise null-pointer-deref is possible.
> 
> 9) Use kfree_rcu() in nat core to release hooks, this can be an issue
>    once nfnetlink_hook gets support to dump NAT hook information,
>    not currently a real issue but better fix it now.
> 
> 10) Fix MTU checks in IPVS, from Yingnan Zhang.
> 
> 11) Use list_del_rcu() in chain and flowtable hook unregistration,
>     concurrent RCU reader could be walking over the hook list,
>     from Florian Westphal.
> 
> 12) Add list_splice_rcu(), this is required to fix unsafe
>     splice to RCU protected hook list. Reviewed by Paul McKenney.
> 
> 13) Use list_splice_rcu() to splice new chain and flowtable hooks.
> 
> 14) Add shim nft_trans_hook object to track chain and flowtable
>     hook deletions and flag them as removed, instead of unsafely
>     moving around hooks in the RCU-protected hook list. This allows
>     to restore the previous state from the abort path.
> 
> Please, pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-04-16
> 
> Thanks.
> 
> ----------------------------------------------------------------
> 
> The following changes since commit 2dddb34dd0d07b01fa770eca89480a4da4f13153:
> 
>   net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers (2026-04-12 15:22:58 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-26-04-16
> 
> for you to fetch changes up to e349f90da812aeddd22c3914a2cc639b51e4eb48:
> 
>   netfilter: nf_tables: add hook transactions for device deletions (2026-04-16 02:47:58 +0200)
> 
> ----------------------------------------------------------------
> netfilter pull request 26-04-16
> 
> ----------------------------------------------------------------
> Eric Woudstra (1):
>       netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
> 
> Florian Westphal (2):
>       netfilter: conntrack: remove sprintf usage
>       netfilter: nf_tables: use list_del_rcu for netlink hooks
> 
> Jenny Guanni Qu (1):
>       netfilter: nf_conntrack_sip: add bounds-checked port parsing helper
> 
> Pablo Neira Ayuso (6):
>       netfilter: nft_osf: restrict it to ipv4
>       netfilter: xtables: restrict several matches to inet family
>       netfilter: nat: use kfree_rcu to release ops
>       rculist: add list_splice_rcu() for private lists
>       netfilter: nf_tables: join hook list via splice_list_rcu() in commit phase
>       netfilter: nf_tables: add hook transactions for device deletions
> 
> Weiming Shi (2):
>       netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
>       netfilter: arp_tables: fix IEEE1394 ARP payload parsing in arp_packet_match()
> 
> Xiang Mei (1):
>       netfilter: nfnetlink_osf: fix divide-by-zero in OSF_WSS_MODULO
> 
> Yingnan Zhang (1):
>       ipvs: fix MTU check for GSO packets in tunnel mode
> 
>  include/linux/rculist.h               |  29 ++++++
>  include/net/netfilter/nf_dup_netdev.h |  13 +++
>  include/net/netfilter/nf_tables.h     |  13 +++
>  net/ipv4/netfilter/arp_tables.c       |  14 ++-
>  net/ipv4/netfilter/iptable_nat.c      |   2 +-
>  net/ipv6/netfilter/ip6table_nat.c     |   2 +-
>  net/netfilter/ipvs/ip_vs_xmit.c       |  19 +++-
>  net/netfilter/nf_conntrack_sip.c      |  80 +++++++++++-----
>  net/netfilter/nf_dup_netdev.c         |  16 ----
>  net/netfilter/nf_flow_table_ip.c      |  25 ++++-
>  net/netfilter/nf_nat_amanda.c         |   2 +-
>  net/netfilter/nf_nat_core.c           |  10 +-
>  net/netfilter/nf_nat_sip.c            |  33 ++++---
>  net/netfilter/nf_tables_api.c         | 168 ++++++++++++++++++++++++----------
>  net/netfilter/nfnetlink_osf.c         |   4 +
>  net/netfilter/nft_fwd_netdev.c        |   7 ++
>  net/netfilter/nft_osf.c               |   6 +-
>  net/netfilter/xt_mac.c                |  34 ++++---
>  net/netfilter/xt_owner.c              |  37 +++++---
>  net/netfilter/xt_physdev.c            |  29 ++++--
>  net/netfilter/xt_realm.c              |   2 +-
>  21 files changed, 393 insertions(+), 152 deletions(-)
> 

^ permalink raw reply

