Netdev List

* Re: [PATCH] net: meth: check skb allocation in meth_init_rx_ring()
From: Andrew Lunn @ 2026-06-22  8:01 UTC
  To: Pavan Chebbi
  Cc: Haoxiang Li, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, stable
In-Reply-To: <CALs4sv2dr2QsFU_DUDNAMgr4MDxHcRrHqer+Kdm7dP+4TUT0eg@mail.gmail.com>

On Mon, Jun 22, 2026 at 11:27:41AM +0530, Pavan Chebbi wrote:
> On Mon, Jun 22, 2026 at 10:20 AM Haoxiang Li <haoxiang_li2024@163.com> wrote:
> >
> > meth_init_rx_ring() does not check the return value of alloc_skb().
> > If the allocation fails, the NULL skb is passed to skb_reserve() and
> > then dereferenced through skb->head.
> >
> > Add a check of the alloc_skb() return value to prevent a potential NULL pointer dereference.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
> > ---
> >  drivers/net/ethernet/sgi/meth.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/sgi/meth.c b/drivers/net/ethernet/sgi/meth.c
> > index f7c3a5a766b7..ceff3cc937ad 100644
> > --- a/drivers/net/ethernet/sgi/meth.c
> > +++ b/drivers/net/ethernet/sgi/meth.c
> > @@ -228,6 +228,9 @@ static int meth_init_rx_ring(struct meth_private *priv)
> >
> >         for (i = 0; i < RX_RING_ENTRIES; i++) {
> >                 priv->rx_skbs[i] = alloc_skb(METH_RX_BUFF_SIZE, 0);
> > +               if (!priv->rx_skbs[i])
> > +                       return -ENOMEM;
> > +
> 
> I think the fix is not complete. The caller meth_open() will not free
> any successfully allocated skbs if the function ever returns -ENOMEM.

There is also the question, does anybody care? Are SGI machines still
used? This is a Fast Ethernet driver, written in 2003. It has no
Maintainer. Maybe it would be better to just remove the driver?

At the very least, drop the Fixes: tag; it does not fit the stable rules.

https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

    Andrew

---
pw-bot: cr
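Pavan's objection is that an early return of -ENOMEM leaves the buffers allocated in earlier iterations with no owner. One common way to close that gap is an unwind loop that frees the slots already filled before returning, so the caller sees either a complete ring or an empty one. The following is a minimal userspace sketch under stated assumptions, not the meth driver code: malloc() stands in for alloc_skb(), and struct rx_ring, buf_alloc(), buf_free(), fail_at, live_allocs and the two size macros are hypothetical names chosen only to make the failure path testable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_ENTRIES	8	/* arbitrary ring size for this sketch */
#define RX_BUFF_SIZE	2048	/* arbitrary buffer size for this sketch */

struct rx_ring {
	void *bufs[RX_RING_ENTRIES];
};

static int fail_at = -1;	/* index whose allocation fails; -1 = none */
static int live_allocs;		/* buffers currently allocated */

/* Hypothetical allocator with fault injection; stands in for alloc_skb(). */
static void *buf_alloc(int idx)
{
	void *p;

	if (idx == fail_at)
		return NULL;
	p = malloc(RX_BUFF_SIZE);
	if (p)
		live_allocs++;
	return p;
}

static void buf_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Fill every slot, or free what was filled and report -ENOMEM. */
static int rx_ring_init(struct rx_ring *ring)
{
	int i;

	assert(ring != NULL);
	for (i = 0; i < RX_RING_ENTRIES; i++) {
		ring->bufs[i] = buf_alloc(i);
		if (!ring->bufs[i])
			goto unwind;
	}
	return 0;

unwind:
	/* Slot i failed; release slots 0..i-1 in reverse order. */
	while (--i >= 0) {
		buf_free(ring->bufs[i]);
		ring->bufs[i] = NULL;
	}
	return -ENOMEM;
}
```

Unwinding inside the callee keeps the contract all-or-nothing; the alternative is for the caller to run a ring-teardown routine on error, which only works if that routine tolerates NULL slots.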

* [PATCH 5/7] xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Eric Dumazet <edumazet@google.com>

KCSAN reported a data race on accesses to net->xfrm.policy_count.

Add missing READ_ONCE()/WRITE_ONCE() annotations on
xfrm_policy_count and xfrm_policy_default.
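READ_ONCE() and WRITE_ONCE() are kernel macros with no userspace equivalent, so the following is only a rough analogue using C11 relaxed atomics, not a claim that the two have identical semantics. Writers update the counter while holding a lock (standing in for whatever lock serialises the policy writers in the kernel), and the fast path reads it without the lock, as the policy_count checks in this patch do. All names in the sketch (policy_count, policy_lock, policy_count_add() and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one net->xfrm.policy_count[dir] slot. */
static atomic_uint policy_count = 0;
/* Hypothetical stand-in for the lock that serialises policy writers. */
static atomic_flag policy_lock = ATOMIC_FLAG_INIT;

static void policy_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&policy_lock, memory_order_acquire))
		;	/* spin */
}

static void policy_lock_release(void)
{
	atomic_flag_clear_explicit(&policy_lock, memory_order_release);
}

/* Reader side: a single untorn load, roughly what READ_ONCE() requests. */
static unsigned int policy_count_read(void)
{
	return atomic_load_explicit(&policy_count, memory_order_relaxed);
}

/* Writer side: load, then one marked store, as in
 * WRITE_ONCE(cnt, cnt + 1). Only the lock keeps writers from racing. */
static void policy_count_add(int delta)
{
	policy_lock_acquire();
	unsigned int cnt = atomic_load_explicit(&policy_count, memory_order_relaxed);
	atomic_store_explicit(&policy_count, cnt + (unsigned int)delta,
			      memory_order_relaxed);
	policy_lock_release();
}

/* Lockless fast path, like the "!policy_count[dir]" tests in the patch. */
static bool no_policies(void)
{
	return policy_count_read() == 0;
}
```

The update is a load followed by a separate store, not an atomic read-modify-write. That mirrors WRITE_ONCE(cnt, cnt + 1) and is safe only because the lock excludes concurrent writers; the annotation merely guarantees that lockless readers never observe a torn value.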

Fixes: 2518c7c2b3d7 ("[XFRM]: Hash policies when non-prefixed.")
Reported-by: syzbot+d85ba1c732720b9a4097@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2b9e96.99669fcc.12a77b.0006.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h     |  8 ++++----
 net/xfrm/xfrm_policy.c | 24 ++++++++++++------------
 net/xfrm/xfrm_user.c   | 18 +++++++++---------
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 874409127e29..35a743129329 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1250,8 +1250,8 @@ int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb,
 static inline bool __xfrm_check_nopolicy(struct net *net, struct sk_buff *skb,
 					 int dir)
 {
-	if (!net->xfrm.policy_count[dir] && !secpath_exists(skb))
-		return net->xfrm.policy_default[dir] == XFRM_USERPOLICY_ACCEPT;
+	if (!READ_ONCE(net->xfrm.policy_count[dir]) && !secpath_exists(skb))
+		return READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_ACCEPT;
 
 	return false;
 }
@@ -1351,8 +1351,8 @@ static inline int xfrm_route_forward(struct sk_buff *skb, unsigned short family)
 {
 	struct net *net = dev_net(skb->dev);
 
-	if (!net->xfrm.policy_count[XFRM_POLICY_OUT] &&
-	    net->xfrm.policy_default[XFRM_POLICY_OUT] == XFRM_USERPOLICY_ACCEPT)
+	if (!READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]) &&
+	    READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]) == XFRM_USERPOLICY_ACCEPT)
 		return true;
 
 	return (skb_dst(skb)->flags & DST_NOXFRM) ||
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 959544425692..1f4afd580105 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -685,7 +685,7 @@ static void xfrm_byidx_resize(struct net *net)
 
 static inline int xfrm_bydst_should_resize(struct net *net, int dir, int *total)
 {
-	unsigned int cnt = net->xfrm.policy_count[dir];
+	unsigned int cnt = READ_ONCE(net->xfrm.policy_count[dir]);
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
 
 	if (total)
@@ -711,12 +711,12 @@ static inline int xfrm_byidx_should_resize(struct net *net, int total)
 
 void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si)
 {
-	si->incnt = net->xfrm.policy_count[XFRM_POLICY_IN];
-	si->outcnt = net->xfrm.policy_count[XFRM_POLICY_OUT];
-	si->fwdcnt = net->xfrm.policy_count[XFRM_POLICY_FWD];
-	si->inscnt = net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX];
-	si->outscnt = net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX];
-	si->fwdscnt = net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX];
+	si->incnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN]);
+	si->outcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]);
+	si->fwdcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD]);
+	si->inscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX]);
+	si->outscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX]);
+	si->fwdscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX]);
 	si->spdhcnt = net->xfrm.policy_idx_hmask;
 	si->spdhmcnt = xfrm_policy_hashmax;
 }
@@ -2318,7 +2318,7 @@ static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
 	}
 
 	list_add(&pol->walk.all, &net->xfrm.policy_all);
-	net->xfrm.policy_count[dir]++;
+	WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] + 1);
 	xfrm_pol_hold(pol);
 }
 
@@ -2337,7 +2337,7 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 	}
 
 	list_del_init(&pol->walk.all);
-	net->xfrm.policy_count[dir]--;
+	WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] - 1);
 
 	return pol;
 }
@@ -3222,7 +3222,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
 
 		/* To accelerate a bit...  */
 		if (!if_id && ((dst_orig->flags & DST_NOXFRM) ||
-			       !net->xfrm.policy_count[XFRM_POLICY_OUT]))
+			       !READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT])))
 			goto nopol;
 
 		xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo, if_id);
@@ -3296,7 +3296,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
 
 nopol:
 	if ((!dst_orig->dev || !(dst_orig->dev->flags & IFF_LOOPBACK)) &&
-	    net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+	    READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
 		err = -EPERM;
 		goto error;
 	}
@@ -3750,7 +3750,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 		const bool is_crypto_offload = sp &&
 			(xfrm_input_state(skb)->xso.type == XFRM_DEV_OFFLOAD_CRYPTO);
 
-		if (net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+		if (READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOPOLS);
 			return 0;
 		}
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3b1cf29bc402..61eb5de33b87 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2485,9 +2485,9 @@ static int xfrm_notify_userpolicy(struct net *net)
 	}
 
 	up = nlmsg_data(nlh);
-	up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
-	up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
-	up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+	up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+	up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+	up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
 
 	nlmsg_end(skb, nlh);
 
@@ -2511,13 +2511,13 @@ static int xfrm_set_default(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct xfrm_userpolicy_default *up = nlmsg_data(nlh);
 
 	if (xfrm_userpolicy_is_valid(up->in))
-		net->xfrm.policy_default[XFRM_POLICY_IN] = up->in;
+		WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN], up->in);
 
 	if (xfrm_userpolicy_is_valid(up->fwd))
-		net->xfrm.policy_default[XFRM_POLICY_FWD] = up->fwd;
+		WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD], up->fwd);
 
 	if (xfrm_userpolicy_is_valid(up->out))
-		net->xfrm.policy_default[XFRM_POLICY_OUT] = up->out;
+		WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT], up->out);
 
 	rt_genid_bump_all(net);
 
@@ -2547,9 +2547,9 @@ static int xfrm_get_default(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	r_up = nlmsg_data(r_nlh);
-	r_up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
-	r_up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
-	r_up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+	r_up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+	r_up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+	r_up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
 	nlmsg_end(r_skb, r_nlh);
 
 	return nlmsg_unicast(xfrm_net_nlsk(net, skb), r_skb, portid);
-- 
2.43.0


* [PATCH 1/7] xfrm: use compat translator only for u64 alignment mismatch
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Sanman Pradhan <psanman@juniper.net>

The XFRM compat layer (CONFIG_XFRM_USER_COMPAT) translates 32-bit xfrm
netlink and setsockopt messages into the native 64-bit layout. It is
only needed on architectures where the 32-bit and 64-bit ABIs disagree
on u64 alignment, which the kernel encodes as COMPAT_FOR_U64_ALIGNMENT.

That symbol is defined only by arch/x86. XFRM_USER_COMPAT depends on it,
so the translator can never be built on any other architecture,
including arm64, which still provides a 32-bit compat ABI (CONFIG_COMPAT)
for AArch32 EL0 userspace. On arm64 the AArch32 EABI already aligns u64
to 8 bytes, identical to the AArch64 ABI, so no translation is required
and the native code path is correct for 32-bit tasks.
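The alignment difference shows up in any struct that places a u64 after a u32. In this sketch, #pragma pack(4), a compiler extension supported by GCC and Clang, approximates the i386 rule that caps u64 alignment at 4 bytes, while the unpacked struct shows the 8-byte alignment shared by x86-64, AArch64 and the AArch32 EABI. The struct and field names are hypothetical, and the expected offsets assume a 64-bit host such as x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "u64 must be 8 bytes");

/* Natural layout: u64 aligned to 8 (x86-64, AArch64, AArch32 EABI). */
struct rec_native {
	uint32_t id;
	uint64_t bytes;		/* offset 8, after 4 bytes of padding */
};

/* Same fields with alignment capped at 4, approximating i386 rules. */
#pragma pack(push, 4)
struct rec_i386 {
	uint32_t id;
	uint64_t bytes;		/* offset 4, no padding */
};
#pragma pack(pop)
```

A 32-bit x86 sender puts the u64 at offset 4 while the native parser expects it at offset 8. That mismatch is what the translator exists to fix, and it does not arise for AArch32 tasks.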

However, xfrm_user_rcv_msg() and xfrm_user_policy() gate on
in_compat_syscall() alone and then call xfrm_get_translator(), which
returns NULL when no translator is registered. On arm64 that is always
the case, so every xfrm netlink message and the XFRM_POLICY setsockopt
issued by a 32-bit task returns -EOPNOTSUPP. A 32-bit userspace process
on arm64 (and on any other arch with CONFIG_COMPAT but without
COMPAT_FOR_U64_ALIGNMENT) therefore cannot configure XFRM state or
policy through the XFRM_USER netlink API, and cannot use the XFRM_POLICY
setsockopt path, because both fail before reaching the native parser.

The translator series replaced the blanket compat rejection with a
translator lookup. That made the path usable on x86 when the translator
is available, but left architectures that cannot build the translator
permanently rejected even when their compat layout already matches the
native layout. Let those architectures use the native parser instead.

Gate the translator requirement on COMPAT_FOR_U64_ALIGNMENT instead of
on in_compat_syscall() alone. Gating on the ABI property rather than on
CONFIG_XFRM_USER_COMPAT is deliberate: on x86 with IA32_EMULATION=y but
XFRM_USER_COMPAT=n, a 32-bit task must still be rejected rather than
routed through the native parser, which would misread genuinely
4-byte-aligned x86-32 messages. COMPAT_FOR_U64_ALIGNMENT is the ABI
property that makes the XFRM translator mandatory.
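The resulting receive-path decision fits in a small truth table. In this sketch the three booleans model IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT), in_compat_syscall() and a non-NULL xfrm_get_translator() result; enum compat_path and classify() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

enum compat_path { PATH_NATIVE, PATH_TRANSLATE, PATH_REJECT };

/*
 * u64_align_mismatch: models IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT)
 * in_compat:          models in_compat_syscall()
 * have_translator:    models xfrm_get_translator() != NULL
 */
static enum compat_path classify(bool u64_align_mismatch, bool in_compat,
				 bool have_translator)
{
	/* Only an ABI with a u64 alignment mismatch needs the translator. */
	if (u64_align_mismatch && in_compat)
		return have_translator ? PATH_TRANSLATE : PATH_REJECT;
	/* Matching layouts (or native callers) use the native parser. */
	return PATH_NATIVE;
}
```

An AArch32 task on arm64 maps to classify(false, true, false) and takes the native path, while a 32-bit task on x86 with IA32_EMULATION=y but XFRM_USER_COMPAT=n maps to classify(true, true, false) and is still rejected.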

Only the receive/input direction needs the guard. The send, dump and
notification paths already call the translator as "if (xtr) { ... }"
with no error on NULL, so on arches without a translator they no-op and
the kernel emits native 64-bit-layout messages, which is what an AArch32
task expects.

Tested on Juniper SRX hardware: with the fix, 32-bit IPsec userspace
netlink and XFRM_POLICY setsockopt operations that previously failed
with -EOPNOTSUPP now succeed; x86 behaviour is unchanged by inspection.

Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
Cc: stable@vger.kernel.org
Signed-off-by: Sanman Pradhan <psanman@juniper.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c | 2 +-
 net/xfrm/xfrm_user.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 589c3b6e4679..d8457ceaf28c 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2976,7 +2976,7 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
 	if (IS_ERR(data))
 		return PTR_ERR(data);

-	if (in_compat_syscall()) {
+	if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
 		struct xfrm_translator *xtr = xfrm_get_translator();

 		if (!xtr) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 71a4b7278eba..3b1cf29bc402 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3472,7 +3472,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (!netlink_net_capable(skb, CAP_NET_ADMIN))
 		return -EPERM;

-	if (in_compat_syscall()) {
+	if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
 		struct xfrm_translator *xtr = xfrm_get_translator();

 		if (!xtr)
-- 
2.43.0

* [PATCH 3/7] xfrm: Fix dev use-after-free in xfrm async resumption
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Dong Chenchen <dongchenchen2@huawei.com>

The xfrm async resumption path holds a reference on skb->dev until after
transport_finish. However, xfrm_rcv_cb may switch skb->dev to a tunnel
device without taking a device reference, as vti_rcv_cb does. The
subsequent async resumption then drops a reference on the tunnel device
instead of the original one, which leads to a use-after-free of the
tunnel device and a refcount leak on the original device:

unregister_netdevice: waiting for vti1 to become free. Usage count = -2

Stash the original skb->dev to fix the refcount imbalance. Because the
new skb->dev set by xfrm_rcv_cb can race with device teardown, also
extend the RCU read-side critical section over xfrm_rcv_cb and
transport_finish.
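The refcount half of the fix follows a general rule: drop a reference through the same pointer that took it, even if the object's field is retargeted in between. The userspace model below is a hypothetical illustration; struct dev, struct pkt, hold(), put(), retarget_cb() and process() are made-up names, and the RCU part of the fix is not modelled.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct dev {
	int refcnt;
};

struct pkt {
	struct dev *dev;
};

static void hold(struct dev *d) { d->refcnt++; }
static void put(struct dev *d)  { d->refcnt--; }

/* Models a receive callback such as vti_rcv_cb(): it retargets the
 * packet to a tunnel device without taking a reference on it. */
static void retarget_cb(struct pkt *p, struct dev *tunnel)
{
	p->dev = tunnel;
}

/* Async-style processing: a reference is taken on entry and dropped at
 * the end. Stashing the pointer keeps hold() and put() on one object;
 * calling put(p->dev) instead would hit the tunnel device. */
static void process(struct pkt *p, struct dev *tunnel)
{
	assert(p && p->dev);
	struct dev *dev = p->dev;	/* stash before the callback runs */

	hold(dev);
	retarget_cb(p, tunnel);
	put(dev);
}
```

With the buggy put(p->dev), the tunnel device's count would drop below its starting value and the original device would keep the extra reference, which is the pattern behind the "Usage count = -2" message above.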

Fixes: 1c428b038400 ("xfrm: hold dev ref until after transport_finish NF_HOOK")
Reported-by: Xu Chunxiao <xuchunxiao3@huawei.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/xfrm4_input.c |  2 --
 net/ipv6/xfrm6_input.c |  2 --
 net/xfrm/xfrm_input.c  | 29 ++++++++++++++++-------------
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index c2eac844bcdb..f6f2a8ef3f88 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -76,8 +76,6 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
 	NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
 		dev_net(dev), NULL, skb, dev, NULL,
 		xfrm4_rcv_encap_finish);
-	if (async)
-		dev_put(dev);
 	return 0;
 }
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 699a001ac166..89d0443b5307 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -71,8 +71,6 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		dev_net(dev), NULL, skb, dev, NULL,
 		xfrm6_transport_finish2);
-	if (async)
-		dev_put(dev);
 	return 0;
 }
 
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index e4c2cd24936d..eecab337bd0a 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -467,6 +467,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 {
 	const struct xfrm_state_afinfo *afinfo;
 	struct net *net = dev_net(skb->dev);
+	struct net_device *dev = skb->dev;
 	int err;
 	__be32 seq;
 	__be32 seq_hi;
@@ -493,7 +494,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 					       LINUX_MIB_XFRMINSTATEINVALID);
 
 			if (encap_type == -1)
-				dev_put(skb->dev);
+				dev_put(dev);
 			goto drop;
 		}
 
@@ -655,16 +656,16 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 		if (!crypto_done) {
 			spin_unlock(&x->lock);
-			dev_hold(skb->dev);
+			dev_hold(dev);
 
 			nexthdr = x->type->input(x, skb);
 			if (nexthdr == -EINPROGRESS) {
 				if (async)
-					dev_put(skb->dev);
+					dev_put(dev);
 				return 0;
 			}
 
-			dev_put(skb->dev);
+			dev_put(dev);
 			spin_lock(&x->lock);
 		}
 resume:
@@ -699,7 +700,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		err = xfrm_inner_mode_input(x, skb);
 		if (err == -EINPROGRESS) {
 			if (async)
-				dev_put(skb->dev);
+				dev_put(dev);
 			return 0;
 		} else if (err) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
@@ -726,9 +727,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		crypto_done = false;
 	} while (!err);
 
+	rcu_read_lock();
 	err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-	if (err)
+	if (err) {
+		rcu_read_unlock();
 		goto drop;
+	}
 
 	nf_reset_ct(skb);
 
@@ -739,8 +743,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		if (skb_valid_dst(skb))
 			skb_dst_drop(skb);
 		if (async)
-			dev_put(skb->dev);
+			dev_put(dev);
 		gro_cells_receive(&gro_cells, skb);
+		rcu_read_unlock();
 		return 0;
 	} else {
 		xo = xfrm_offload(skb);
@@ -748,23 +753,21 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			xfrm_gro = xo->flags & XFRM_GRO;
 
 		err = -EAFNOSUPPORT;
-		rcu_read_lock();
 		afinfo = xfrm_state_afinfo_get_rcu(x->props.family);
 		if (likely(afinfo))
 			err = afinfo->transport_finish(skb, xfrm_gro || async);
-		rcu_read_unlock();
 		if (xfrm_gro) {
 			sp = skb_sec_path(skb);
 			if (sp)
 				sp->olen = 0;
 			if (skb_valid_dst(skb))
 				skb_dst_drop(skb);
-			if (async)
-				dev_put(skb->dev);
 			gro_cells_receive(&gro_cells, skb);
-			return err;
 		}
 
+		if (async)
+			dev_put(dev);
+		rcu_read_unlock();
 		return err;
 	}
 
@@ -772,7 +775,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 	spin_unlock(&x->lock);
 drop:
 	if (async)
-		dev_put(skb->dev);
+		dev_put(dev);
 	xfrm_rcv_cb(skb, family, x && x->type ? x->type->proto : nexthdr, -1);
 	kfree_skb(skb);
 	return 0;
-- 
2.43.0


* [PATCH 7/7] xfrm: validate selector family and prefixlen during match
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Eric Dumazet <edumazet@google.com>

syzbot reported a shift-out-of-bounds in xfrm_selector_match(), caused
by an AF_UNSPEC selector with a large prefixlen (e.g. 128) being matched
against an IPv4 flow (when XFRM_STATE_AF_UNSPEC is set).

Fix this by:

- Rejecting mismatched families in xfrm_selector_match().
- Returning false in addr4_match() if prefixlen > 32.
- Returning false in addr_match() if prefixlen > 128 (prevents overflow).
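Both bounds checks can be exercised in userspace. This sketch re-creates addr4_match() and addr_match() from the hunks below using htonl() from <arpa/inet.h>. It is simplified (a zero-length prefix is handled unconditionally rather than only when sizeof(long) == 4) and does not cover the family check added to xfrm_selector_match().

```c
#include <arpa/inet.h>	/* htonl() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* a1/a2 are IPv4 addresses in network byte order. */
static bool addr4_match(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	if (prefixlen == 0)
		return true;	/* avoid a shift by 32 */
	if (prefixlen > 32)
		return false;	/* 32 - prefixlen would be negative */
	return !((a1 ^ a2) & htonl(~UINT32_C(0) << (32 - prefixlen)));
}

/* tokens are up to four 32-bit words (IPv6-sized), network byte order. */
static bool addr_match(const void *token1, const void *token2,
		       unsigned int prefixlen)
{
	const uint32_t *a1 = token1, *a2 = token2;
	unsigned int pdw, pbi;

	if (prefixlen > 128)
		return false;	/* pdw would index past a 16-byte address */

	pdw = prefixlen >> 5;	/* whole 32-bit words in the prefix */
	pbi = prefixlen & 0x1f;	/* leftover bits in the next word */

	if (pdw && memcmp(a1, a2, pdw << 2))
		return false;
	if (pbi) {
		uint32_t mask = htonl(~UINT32_C(0) << (32 - pbi));

		if ((a1[pdw] ^ a2[pdw]) & mask)
			return false;
	}
	return true;
}
```

Without the first check in addr_match(), a prefixlen of, say, 160 would make memcmp() compare 20 bytes of a 16-byte address.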

Fixes: 3f0ab59e6537 ("xfrm: validate new SA's prefixlen using SA family when sel.family is unset")
Reported-by: syzbot+9383b1ff0df4b29ca5e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2fbe35.be3f099c.2836ae.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h     | 7 +++++++
 net/xfrm/xfrm_policy.c | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 35a743129329..f8c909b0f0c3 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -943,6 +943,9 @@ static inline bool addr_match(const void *token1, const void *token2,
 	unsigned int pdw;
 	unsigned int pbi;
 
+	if (prefixlen > 128)
+		return false;
+
 	pdw = prefixlen >> 5;	  /* num of whole u32 in prefix */
 	pbi = prefixlen &  0x1f;  /* num of bits in incomplete u32 in prefix */
 
@@ -967,6 +970,10 @@ static inline bool addr4_match(__be32 a1, __be32 a2, u8 prefixlen)
 	/* C99 6.5.7 (3): u32 << 32 is undefined behaviour */
 	if (sizeof(long) == 4 && prefixlen == 0)
 		return true;
+
+	if (prefixlen > 32)
+		return false;
+
 	return !((a1 ^ a2) & htonl(~0UL << (32 - prefixlen)));
 }
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 1f4afd580105..639934f30016 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -242,6 +242,9 @@ __xfrm6_selector_match(const struct xfrm_selector *sel, const struct flowi *fl)
 bool xfrm_selector_match(const struct xfrm_selector *sel, const struct flowi *fl,
 			 unsigned short family)
 {
+	if (family != sel->family && sel->family != AF_UNSPEC)
+		return false;
+
 	switch (family) {
 	case AF_INET:
 		return __xfrm4_selector_match(sel, fl);
-- 
2.43.0


* [PATCH 4/7] xfrm: Fix xfrm state cache insertion race
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Herbert Xu <herbert@gondor.apana.org.au>

The xfrm input state cache insertion code checks the validity of
the state before acquiring the global xfrm_state_lock.  Thus it's
possible for someone else to kill the state after it passed the
validity check, and then the insertion will add the dead state
to the cache.

Fix this by moving the validity check inside the lock.
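This is the standard cure for a check-then-act race: perform the check and the action inside one critical section so the state cannot change between them. The following is a single-file model under a simplifying assumption: state_kill() clears validity and drops the cache entry under the same lock that cache_insert() takes. The atomic_flag spinlock, struct state and both functions are hypothetical stand-ins for xfrm_state_lock and the kernel structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an xfrm state and its cache membership. */
struct state {
	bool valid;
	bool cached;
};

static atomic_flag state_lock = ATOMIC_FLAG_INIT;

static void lock(void)
{
	while (atomic_flag_test_and_set_explicit(&state_lock, memory_order_acquire))
		;	/* spin */
}

static void unlock(void)
{
	atomic_flag_clear_explicit(&state_lock, memory_order_release);
}

/* Killer side: in this model, invalidation and cache removal both
 * happen under the lock. */
static void state_kill(struct state *x)
{
	lock();
	x->valid = false;
	x->cached = false;
	unlock();
}

/* Fixed insert side: the validity test and the insertion share one
 * critical section, so state_kill() cannot run between them. */
static void cache_insert(struct state *x)
{
	lock();
	if (x->valid)
		x->cached = true;
	unlock();
}
```

In the pre-fix ordering the valid test ran before lock(), so a kill landing between the test and the insert would leave a dead state in the cache.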

This entire function is called on the input path, where BH must
already be disabled (e.g., its caller, xfrm_input(), acquires its
spinlocks without disabling BH).

So there is no need to disable BH here or take the RCU read lock.
Remove both and replace them with an assertion that trips if BH
is accidentally enabled on some future calling path.

Fixes: 81a331a0e72d ("xfrm: Add an inbound percpu state cache.")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d8457ceaf28c..9e87f7028201 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1207,9 +1207,11 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 	struct hlist_head *state_cache_input;
 	struct xfrm_state *x = NULL;
 
+	/* BH is always disabled on the input path. */
+	lockdep_assert_in_softirq();
+
 	state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
 
-	rcu_read_lock();
 	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
 		if (x->props.family != family ||
 		    x->id.spi       != spi ||
@@ -1227,20 +1229,25 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 	xfrm_hash_ptrs_get(net, &state_ptrs);
 
 	x = __xfrm_state_lookup(&state_ptrs, mark, daddr, spi, proto, family);
-
-	if (x && x->km.state == XFRM_STATE_VALID) {
-		spin_lock_bh(&net->xfrm.xfrm_state_lock);
-		if (hlist_unhashed(&x->state_cache_input)) {
+	if (x) {
+		spin_lock(&net->xfrm.xfrm_state_lock);
+		if (x->km.state != XFRM_STATE_VALID) {
+			/*
+			 * The state is about to be destroyed.
+			 *
+			 * Don't add it to the cache but still
+			 * return it to the caller.
+			 */
+		} else if (hlist_unhashed(&x->state_cache_input)) {
 			hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
 		} else {
 			hlist_del_rcu(&x->state_cache_input);
 			hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
 		}
-		spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+		spin_unlock(&net->xfrm.xfrm_state_lock);
 	}
 
 out:
-	rcu_read_unlock();
 	return x;
 }
 EXPORT_SYMBOL(xfrm_input_state_lookup);
-- 
2.43.0


* [PATCH 2/7] net: af_key: initialize alg_key_len for IPComp states
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Zijing Yin <yzjaurora@gmail.com>

pfkey_msg2xfrm_state() handles the IPComp (SADB_X_SATYPE_IPCOMP) case by
allocating x->calg and copying only the algorithm name:

	x->calg = kmalloc_obj(*x->calg);
	if (!x->calg) {
		err = -ENOMEM;
		goto out;
	}
	strcpy(x->calg->alg_name, a->name);
	x->props.calgo = sa->sadb_sa_encrypt;

Unlike the authentication (x->aalg) and encryption (x->ealg) branches of
the same function, the compression branch never initializes
calg->alg_key_len.  IPComp carries no key and the allocation only
reserves sizeof(struct xfrm_algo) (i.e. no room for a key), so the field
is left containing uninitialized slab data.

calg->alg_key_len is later used as a length by xfrm_algo_clone() when an
IPComp state is cloned during XFRM_MSG_MIGRATE:

	xfrm_state_migrate()
	  xfrm_state_clone_and_setup()
	    x->calg = xfrm_algo_clone(orig->calg);
	      kmemdup(orig, xfrm_alg_len(orig));

where xfrm_alg_len() returns sizeof(*alg) + (alg_key_len + 7) / 8.  With
a non-zero garbage alg_key_len, kmemdup() reads past the end of the
68-byte calg object.  Adding an IPComp SA via PF_KEY and then migrating
it triggers (net-next, KASAN, init_on_alloc=0):

  BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x44/0x60
  Read of size 4164 at addr ff11000025a74980 by task diag2/9287
  CPU: 3 UID: 0 PID: 9287 Comm: diag2 7.1.0-rc6-g903db046d557 #1
  Call Trace:
   <TASK>
   dump_stack_lvl+0x10e/0x1f0
   print_report+0xf7/0x600
   kasan_report+0xe4/0x120
   kasan_check_range+0x105/0x1b0
   __asan_memcpy+0x23/0x60
   kmemdup_noprof+0x44/0x60
   xfrm_state_migrate+0x70a/0x1da0
   xfrm_migrate+0x753/0x18a0
   xfrm_do_migrate+0xb47/0xf10
   xfrm_user_rcv_msg+0x411/0xb50
   netlink_rcv_skb+0x158/0x420
   xfrm_netlink_rcv+0x71/0x90
   netlink_unicast+0x584/0x850
   netlink_sendmsg+0x8b0/0xdc0
   ____sys_sendmsg+0x9f7/0xb90
   ___sys_sendmsg+0x134/0x1d0
   __sys_sendmsg+0x16d/0x220
   do_syscall_64+0x116/0x7d0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>

  Allocated by task 9287:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0xaa/0xb0
   pfkey_add+0x2652/0x2ea0
   pfkey_process+0x6d0/0x830
   pfkey_sendmsg+0x42c/0x850
   __sys_sendto+0x461/0x4b0
   __x64_sys_sendto+0xe0/0x1c0
   do_syscall_64+0x116/0x7d0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  The buggy address belongs to the object at ff11000025a74980
   which belongs to the cache kmalloc-96 of size 96
  The buggy address is located 0 bytes inside of
   allocated 68-byte region [ff11000025a74980, ff11000025a749c4)

Depending on the uninitialized value the same field can instead request
an oversized kmemdup() allocation and make the migration clone fail.

The XFRM netlink path is not affected: verify_one_alg() rejects an
XFRMA_ALG_COMP attribute shorter than xfrm_alg_len(), so a calg added via
XFRM_MSG_NEWSA is always self-consistent.

Initialize calg->alg_key_len to 0, matching the aalg/ealg branches.
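The size arithmetic can be checked directly. The struct below reproduces the layout of the UAPI struct xfrm_algo, xfrm_alg_len() applies the formula quoted above, and make_calg() is a hypothetical userspace version of the IPComp branch with the fix applied. As an illustration only, alg_key_len = 32768 gives 68 + 4096 = 4164 bytes, the same size as the KASAN read in the report; the real garbage value is whatever the slab happened to contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the UAPI struct xfrm_algo. */
struct xfrm_algo {
	char		alg_name[64];
	unsigned int	alg_key_len;	/* in bits */
	char		alg_key[];
};

/* sizeof(*alg) + (alg_key_len + 7) / 8, as quoted in the commit message. */
static size_t xfrm_alg_len(const struct xfrm_algo *alg)
{
	return sizeof(*alg) + (alg->alg_key_len + 7) / 8;
}

/* Hypothetical IPComp branch: no key, so reserve no key space and
 * record a key length of zero. */
static struct xfrm_algo *make_calg(const char *name)
{
	struct xfrm_algo *calg = malloc(sizeof(*calg));

	if (!calg)
		return NULL;
	strncpy(calg->alg_name, name, sizeof(calg->alg_name) - 1);
	calg->alg_name[sizeof(calg->alg_name) - 1] = '\0';
	calg->alg_key_len = 0;	/* the one-line fix */
	return calg;
}
```

With alg_key_len left at 0, xfrm_alg_len() equals the allocation size, so a later kmemdup() of that length stays inside the object.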

Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: stable@vger.kernel.org
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/key/af_key.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9cffeef18cd9..3216f897a305 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1218,6 +1218,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 				goto out;
 			}
 			strcpy(x->calg->alg_name, a->name);
+			x->calg->alg_key_len = 0;
 			x->props.calgo = sa->sadb_sa_encrypt;
 		} else {
 			int keysize = 0;
-- 
2.43.0


* [PATCH 6/7] espintcp: use sk_msg_free_partial to fix partial send
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20260622075726.29685-1-steffen.klassert@secunet.com>

From: Sabrina Dubroca <sd@queasysnail.net>

sk_msg_free_partial() keeps the skmsg consistent at every iteration,
without manual handling of uncharges and offsets. This simplifies the
code and fixes some skmsg accounting bugs that occur when only part of
the contents is sent.
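The shape of the new loop can be shown with a toy scatter list. In this sketch msg_consume() plays the role of sk_msg_free_partial() by retiring fully sent entries and trimming a partially sent one, and short_send() is a hypothetical transport that accepts at most 3 bytes per call so that every accounting path is exercised. Page references and socket memory uncharging, which the real helper also handles, are omitted, and all names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGS 8

/* Toy scatter-list entry: the slice [off, off + len) of buf. */
struct seg {
	const char *buf;
	size_t off;
	size_t len;
};

/* Toy message: live entries are data[start..end-1]; size is the
 * number of bytes not yet sent. */
struct msg {
	struct seg data[MAX_SEGS];
	unsigned int start, end;
	size_t size;
};

/* Stand-in for sk_msg_free_partial(): consume 'bytes' from the front so
 * the message is consistent after every call. */
static void msg_consume(struct msg *m, size_t bytes)
{
	assert(bytes <= m->size);
	while (bytes && m->start < m->end) {
		struct seg *s = &m->data[m->start];

		if (bytes >= s->len) {		/* entry fully sent: retire it */
			bytes -= s->len;
			m->size -= s->len;
			m->start++;
		} else {			/* partially sent: trim it */
			s->off += bytes;
			s->len -= bytes;
			m->size -= bytes;
			bytes = 0;
		}
	}
}

/* Hypothetical transport that accepts at most 3 bytes per call. */
static char out[64];
static size_t out_len;

static long short_send(const char *p, size_t len)
{
	size_t n = len < 3 ? len : 3;

	assert(out_len + n <= sizeof(out));
	memcpy(out + out_len, p, n);
	out_len += n;
	return (long)n;
}

/* Same loop shape as the patched espintcp_sendskmsg_locked(): send the
 * current head entry, account for what went out, repeat until empty. */
static int send_all(struct msg *m)
{
	do {
		struct seg *s = &m->data[m->start];
		long ret = short_send(s->buf + s->off, s->len);

		if (ret < 0)
			return (int)ret;	/* message is still consistent */
		msg_consume(m, (size_t)ret);
	} while (m->size);
	return 0;
}
```

Because the head entry is re-read from start on every iteration, a short send needs no separate retry label or saved offset.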

Cc: stable@vger.kernel.org
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Reported-by: Aaron Esau <aaron1esau@gmail.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/espintcp.c | 34 +++++++---------------------------
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index d9035546375e..374e1b964438 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -212,43 +212,23 @@ static int espintcp_sendskmsg_locked(struct sock *sk,
 	struct sk_msg *skmsg = &emsg->skmsg;
 	bool more = flags & MSG_MORE;
 	struct scatterlist *sg;
-	int done = 0;
 	int ret;
 
-	sg = &skmsg->sg.data[skmsg->sg.start];
 	do {
 		struct bio_vec bvec;
-		size_t size = sg->length - emsg->offset;
-		int offset = sg->offset + emsg->offset;
-		struct page *p;
-
-		emsg->offset = 0;
 
+		sg = &skmsg->sg.data[skmsg->sg.start];
 		if (sg_is_last(sg) && !more)
 			msghdr.msg_flags &= ~MSG_MORE;
 
-		p = sg_page(sg);
-retry:
-		bvec_set_page(&bvec, p, size, offset);
-		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
-		ret = tcp_sendmsg_locked(sk, &msghdr, size);
-		if (ret < 0) {
-			emsg->offset = offset - sg->offset;
-			skmsg->sg.start += done;
+		bvec_set_page(&bvec, sg_page(sg), sg->length, sg->offset);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, sg->length);
+		ret = tcp_sendmsg_locked(sk, &msghdr, sg->length);
+		if (ret < 0)
 			return ret;
-		}
-
-		if (ret != size) {
-			offset += ret;
-			size -= ret;
-			goto retry;
-		}
 
-		done++;
-		put_page(p);
-		sk_mem_uncharge(sk, sg->length);
-		sg = sg_next(sg);
-	} while (sg);
+		sk_msg_free_partial(sk, skmsg, ret);
+	} while (skmsg->sg.size);
 
 	memset(emsg, 0, sizeof(*emsg));
 
-- 
2.43.0


* [PATCH 0/7] pull request (net): ipsec 2026-06-22
From: Steffen Klassert @ 2026-06-22  7:57 UTC
  To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev

1) xfrm: use compat translator only for u64 alignment mismatch
   Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT
   so 32-bit compat tasks on arches whose 32-bit ABI already matches
   the native 64-bit layout are no longer rejected with -EOPNOTSUPP.
   From Sanman Pradhan.

2) net: af_key: initialize alg_key_len for IPComp states
   Initialize alg_key_len to 0 in the IPComp branch of
   pfkey_msg2xfrm_state() so an uninitialized value cannot drive
   xfrm_alg_len() into a slab-out-of-bounds kmemdup during
   XFRM_MSG_MIGRATE. From Zijing Yin.

3) xfrm: Fix dev use-after-free in xfrm async resumption
   Stash the original skb->dev and extend the RCU critical section
   across xfrm_rcv_cb() and transport_finish() to prevent a
   tunnel-device UAF and original-device refcount leak when a
   callback replaces skb->dev. From Dong Chenchen.

4) xfrm: Fix xfrm state cache insertion race
   Move the state-validity check inside xfrm_state_lock in the
   input state cache insertion path so a state cannot be killed
   between the check and the insert. From Herbert Xu.

5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
   Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count
   and xfrm_policy_default to silence the KCSAN data race reported
   on net->xfrm.policy_count. From Eric Dumazet.

6) espintcp: use sk_msg_free_partial to fix partial send
   Replace the manual skmsg accounting in espintcp with
   sk_msg_free_partial() so the skmsg stays consistent on every
   iteration and the partial-send accounting bugs go away.
   From Sabrina Dubroca.

7) xfrm: validate selector family and prefixlen during match
   Reject mismatched address families in xfrm_selector_match() and
   bound prefixlen in addr4_match()/addr_match() to prevent the
   shift-out-of-bounds syzbot reported when an AF_UNSPEC selector
   with a large prefixlen is matched against an IPv4 flow.
   From Eric Dumazet.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 9bf10032894f429b3e221de63cf95a8544511a90:

  Merge branch 'tipc-fix-netlink-gate-and-receive-path-bugs' (2026-06-11 16:01:19 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git tags/ipsec-2026-06-22

for you to fetch changes up to 40f0b1047918539f0b0f795ac65e35336b4c2c78:

  xfrm: validate selector family and prefixlen during match (2026-06-17 11:17:27 +0200)

----------------------------------------------------------------
ipsec-2026-06-22

----------------------------------------------------------------
Dong Chenchen (1):
      xfrm: Fix dev use-after-free in xfrm async resumption

Eric Dumazet (2):
      xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
      xfrm: validate selector family and prefixlen during match

Herbert Xu (1):
      xfrm: Fix xfrm state cache insertion race

Sabrina Dubroca (1):
      espintcp: use sk_msg_free_partial to fix partial send

Sanman Pradhan (1):
      xfrm: use compat translator only for u64 alignment mismatch

Zijing Yin (1):
      net: af_key: initialize alg_key_len for IPComp states

 include/net/xfrm.h     | 15 +++++++++++----
 net/ipv4/xfrm4_input.c |  2 --
 net/ipv6/xfrm6_input.c |  2 --
 net/key/af_key.c       |  1 +
 net/xfrm/espintcp.c    | 34 +++++++---------------------------
 net/xfrm/xfrm_input.c  | 29 ++++++++++++++++-------------
 net/xfrm/xfrm_policy.c | 27 +++++++++++++++------------
 net/xfrm/xfrm_state.c  | 23 +++++++++++++++--------
 net/xfrm/xfrm_user.c   | 20 ++++++++++----------
 9 files changed, 75 insertions(+), 78 deletions(-)
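
Item 7 mentions a shift-out-of-bounds in addr4_match() when prefixlen is large. An IPv4 prefix test usually builds its mask by shifting all-ones left by 32 - prefixlen, and in C that shift is undefined when the count is 32 or more or is negative. Below is a minimal sketch of a bounded comparison. addr4_match_bounded() is a hypothetical helper. It reports "no match" for prefixlen > 32, which may not be how the actual patch handles that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical helper, not the kernel's addr4_match(): compare the top
 * prefixlen bits of two IPv4 addresses given in network byte order. */
static bool addr4_match_bounded(uint32_t a1, uint32_t a2, unsigned int prefixlen)
{
	if (prefixlen == 0)
		return true;            /* avoids a shift by 32, undefined on uint32_t */
	if (prefixlen > 32)
		return false;           /* out-of-range prefix: never match an IPv4 flow */
	uint32_t mask = htonl(UINT32_MAX << (32 - prefixlen));

	return ((a1 ^ a2) & mask) == 0;
}
```

For example, 192.168.0.1 and 192.168.0.254 match at /24, while a prefixlen of 200 (such as one coming from an AF_UNSPEC selector) is rejected instead of feeding an invalid shift count.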


* Re: Re: [PATCH net-next v8 3/6] net: stmmac: eic7700: make RGMII delay properties optional
From: Andrew Lunn @ 2026-06-22  7:52 UTC (permalink / raw)
  To: 李志
  Cc: Maxime Chevallier, devicetree, andrew+netdev, davem, edumazet,
	kuba, robh, krzk+dt, conor+dt, netdev, pabeni, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, pjw, palmer, aou, alex, linux-riscv,
	linux-stm32, linux-arm-kernel, linux-kernel, ningyu, linmin,
	pinkesh.vaghela, pritesh.patel, weishangjuan, horms, lee
In-Reply-To: <512b77d5.993b.19eed207fc9.Coremail.lizhi2@eswincomputing.com>

> I'm preparing a v9 of the series. The next revision will address the
> issues reported by Sashiko review, mainly DT binding schema and DTS
> warnings.
> 
> Before I post v9, I'd like to check whether you have any concerns or
> suggestions regarding the driver changes.

From what I remember, I think the patch was OK, but I've looked at
100s of other patches since then. The commit message sounds like the
basic design is correct.

     Andrew


* Re: "ip help" output is an error
From: David Laight @ 2026-06-22  7:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Dmitri Seletski, netdev
In-Reply-To: <20260621082105.1196ef72@phoenix.local>

On Sun, 21 Jun 2026 08:21:05 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Sat, 20 Jun 2026 10:36:31 +0100
> Dmitri Seletski <drjoms@gmail.com> wrote:
> 
> > Hello iproute2 maintainers,
> > 
> > I am reporting an inconsistency regarding the exit status of the ip help 
> > command.
> > 
> > Current Behavior:
> > When running ip help, the command prints the help documentation to 
> > stdout, but exits with a non-zero status (error). This causes issues in 
> > shell scripts that rely on exit codes for control flow.
> > 
> > Steps to reproduce:
> > bash
> > 
> > # This returns "FAIL" because the exit code is non-zero
> > if ip help > /dev/null; then
> >      echo "SUCCESS"
> > else
> >      echo "FAIL"
> > fi
> > 
> > Expected Behavior:
> > Since the command successfully performs the requested task (displaying 
> > help information) and does not encounter a system error, it should 
> > return an exit code of 0.
> > 
> > Context:
> > This behavior breaks standard Bash logic for automation. For example:
> > ip help && echo "This will not execute"
> > 
> > "ip help |grep br" - this will bring no result.
> > 
> > Current version tested: iproute2-6.19.0
> > 
> > Thank you for your time and for maintaining this tool.
> > 
> > Regards,
> > Dmitri Seletski
> > 
> >   
> 
> Yes, iproute2 doesn't do a great job of handling error codes
> with usage vs help. It's a bug and no one has bothered to fix it.
> 

The version I've got does write(2, "Usage...", 972); exit(-1);
Changing it to do write(1, ...) is likely to break scripts, and making
it do exit(0) is likely to cause new scripts to fail on old systems.

The 'grep' works fine if you redirect stderr to stdout. 

The exit(-1) is a bug; the parameter is only 8 bits and the high bit
is expected to be used to indicate abnormal termination (eg by a signal).
That should probably be changed to exit(1), there doesn't seem to be
a standard way to differentiate between command line errors and
operational ones.

	David
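
David's point about exit(-1) is easy to check. POSIX hands only the low 8 bits of the exit status to the parent, so exit(-1) shows up as 255. The sketch below forks a child and reads that status back with waitpid(). exit_status_of() is a hypothetical helper, not iproute2 code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits with 'code' and return the status the parent
 * sees (what a shell reports in $?). Returns -1 on failure. */
static int exit_status_of(int code)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);            /* _exit() avoids flushing the parent's stdio buffers */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);     /* only the low 8 bits survive */
}
```

exit_status_of(-1) yields 255 and exit_status_of(256) yields 0. That is why a small positive code such as 1 is the portable way to signal a command-line error.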



* [PATCH v4] net: mvneta: re-enable percpu interrupt on resume
From: Yun Zhou @ 2026-06-22  7:43 UTC (permalink / raw)
  To: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	bigeasy, clrkwllms, rostedt
  Cc: netdev, linux-kernel, linux-rt-devel, yun.zhou

On Marvell MPIC platforms (Armada 370/XP/38x), mvneta uses a percpu
IRQ disable/enable scheme for NAPI: the ISR (mvneta_percpu_isr) calls
disable_percpu_irq() to mask the MPIC per-CPU interrupt and schedules
NAPI poll, which calls enable_percpu_irq() on completion to unmask.

If suspend occurs while NAPI poll is pending (between
disable_percpu_irq in the ISR and enable_percpu_irq in poll
completion), the interrupt is never re-enabled:

  1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
     => MPIC masked, percpu_enabled cpumask bit cleared
  2. NAPI poll does not complete before suspend proceeds
     (on PREEMPT_RT this is highly likely since softirqs run in
     ksoftirqd which gets frozen; on non-RT it can happen when
     softirq processing is deferred to ksoftirqd)
  3. mvneta_stop_dev => napi_disable(): cancels the pending poll
     without executing the completion path
  4. suspend_device_irqs => IRQCHIP_MASK_ON_SUSPEND: masks MPIC
     (already masked, but records IRQS_SUSPENDED)
  5. Resume: mpic_resume checks irq_percpu_is_enabled() => false
     (bit was cleared in step 1) => skips unmask
  6. mvneta_start_dev only restores device-level INTR_NEW_MASK,
     does not touch the MPIC per-CPU mask

Result: MPIC per-CPU interrupt stays masked permanently. The NIC
generates interrupts (INTR_NEW_CAUSE != 0) but the CPU never
receives them, causing complete loss of network connectivity.

Fix by calling on_each_cpu(mvneta_percpu_enable) in the resume path
to unconditionally unmask the MPIC per-CPU interrupt regardless of
pre-suspend state.

Fixes: 12bb03b436da ("net: mvneta: Handle per-cpu interrupts")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v4:
  - Rewrite commit message with accurate root cause analysis.

v3:
  - Dropped the free_irq/request_irq approach (incorrect root cause).
  - Instead, call on_each_cpu(mvneta_percpu_enable) in the resume path
    to ensure the MPIC percpu IRQ is unmasked, matching mvneta_open().
  - Updated commit message with correct root cause analysis.

v2:
  - Move request_irq before cpuhp registration in resume (matching
    mvneta_open ordering) so that failure does not leave cpuhp
    callbacks registered on a non-functional device.
  - On request_irq failure, call netif_device_detach() to prevent
    further traffic on the dead interface.

 drivers/net/ethernet/marvell/mvneta.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 488f2663ad2c..543e566425c1 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5918,6 +5918,9 @@ static int mvneta_resume(struct device *device)
 	rtnl_unlock();
 	mvneta_set_rx_mode(dev);
 
+	if (!pp->neta_armada3700)
+		on_each_cpu(mvneta_percpu_enable, pp, true);
+
 	return 0;
 }
 #endif
-- 
2.43.0
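
The failure sequence in the commit message reduces to a small state model. This sketch is hypothetical userspace code, not mvneta or irqchip code. A bitmask stands in for the per-CPU enabled mask. model_resume() either leaves that mask alone, as mpic_resume() effectively does for a CPU whose bit is clear, or re-enables every CPU unconditionally, which is the effect of the added on_each_cpu(mvneta_percpu_enable) call.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS_MODEL 4
#define ALL_CPUS_MODEL ((1u << NCPUS_MODEL) - 1)

struct irq_model {
	unsigned int percpu_enabled;    /* bit n set: CPU n's interrupt unmasked */
	bool napi_pending[NCPUS_MODEL];
};

/* ISR: mask this CPU's interrupt and schedule a poll. */
static void model_isr(struct irq_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS_MODEL);
	m->percpu_enabled &= ~(1u << cpu);      /* disable_percpu_irq() */
	m->napi_pending[cpu] = true;            /* napi_schedule() */
}

/* Poll completion: the only place the interrupt is unmasked again. */
static void model_poll_complete(struct irq_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS_MODEL);
	m->napi_pending[cpu] = false;
	m->percpu_enabled |= 1u << cpu;         /* enable_percpu_irq() */
}

/* Stop path: pending polls are dropped without running completion. */
static void model_napi_disable(struct irq_model *m)
{
	for (int cpu = 0; cpu < NCPUS_MODEL; cpu++)
		m->napi_pending[cpu] = false;
}

/* Resume: without the fix, a CPU whose bit is clear stays masked;
 * with it, every CPU is re-enabled regardless of prior state. */
static void model_resume(struct irq_model *m, bool with_fix)
{
	if (with_fix)
		m->percpu_enabled = ALL_CPUS_MODEL;
}
```

Running ISR, napi_disable and resume without the fix leaves CPU 1 masked forever. With the fix, the mask is fully restored.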



* Re: [syzbot] [net?] INFO: task hung in nsim_destroy (4)
From: syzbot @ 2026-06-22  7:42 UTC (permalink / raw)
  To: andrew+netdev, andrew, davem, edumazet, kuba, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <000000000000f9be320619be1c0a@google.com>

syzbot has found a reproducer for the following issue on:

HEAD commit:    b85966adbf5d Merge tag 'net-next-7.2' of git://git.kernel...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11f167b6580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9a9f723a32776544
dashboard link: https://syzkaller.appspot.com/bug?extid=8141dcbd23a8f857798a
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15cf400a580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1013a50e580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d65306d96573/disk-b85966ad.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ef43139aab0e/vmlinux-b85966ad.xz
kernel image: https://storage.googleapis.com/syzbot-assets/26d4d1ab67c3/bzImage-b85966ad.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8141dcbd23a8f857798a@syzkaller.appspotmail.com

INFO: task kworker/R-netns:8 blocked for more than 140 seconds.
      Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/R-netns state:D stack:27768 pid:8     tgid:8     ppid:2      task_flags:0x4208060 flags:0x00080000
Workqueue: netns cleanup_net
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5504 [inline]
 __schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
 __schedule_loop kernel/sched/core.c:7307 [inline]
 schedule+0x164/0x360 kernel/sched/core.c:7322
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7379
 __mutex_lock_common kernel/locking/mutex.c:726 [inline]
 __mutex_lock+0x7bf/0x1550 kernel/locking/mutex.c:821
 rtnl_net_lock include/linux/rtnetlink.h:130 [inline]
 rtnl_net_dev_lock+0x257/0x2f0 net/core/dev.c:2163
 unregister_netdevice_notifier_dev_net+0x96/0x450 net/core/dev.c:2208
 nsim_destroy+0xfd/0x800 drivers/net/netdevsim/netdev.c:1183
 __nsim_dev_port_del+0x14e/0x200 drivers/net/netdevsim/dev.c:1547
 nsim_dev_port_del_all drivers/net/netdevsim/dev.c:1561 [inline]
 nsim_dev_reload_destroy+0x288/0x490 drivers/net/netdevsim/dev.c:1785
 nsim_dev_reload_down+0x8a/0xc0 drivers/net/netdevsim/dev.c:1038
 devlink_reload+0x1c5/0x890 net/devlink/dev.c:462
 devlink_pernet_pre_exit+0x1ff/0x420 net/devlink/core.c:560
 ops_pre_exit_list net/core/net_namespace.c:161 [inline]
 ops_undo_list+0x17d/0x8d0 net/core/net_namespace.c:234
 cleanup_net+0x572/0x810 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 rescuer_thread+0x7b6/0x10b0 kernel/workqueue.c:3621
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
INFO: task kworker/0:1:10 blocked for more than 142 seconds.
      Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:1     state:D stack:27096 pid:10    tgid:10    ppid:2      task_flags:0x4208060 flags:0x00080000
Workqueue: events request_firmware_work_func
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5504 [inline]
 __schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
 __schedule_loop kernel/sched/core.c:7307 [inline]
 schedule+0x164/0x360 kernel/sched/core.c:7322
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7379
 __mutex_lock_common kernel/locking/mutex.c:726 [inline]
 __mutex_lock+0x7bf/0x1550 kernel/locking/mutex.c:821
 regdb_fw_cb+0x7d/0x1c0 net/wireless/reg.c:1005
 request_firmware_work_func+0xf2/0x1a0 drivers/base/firmware_loader/main.c:1152
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
INFO: task kworker/u8:0:12 blocked for more than 143 seconds.
      Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:0    state:D stack:23864 pid:12    tgid:12    ppid:2      task_flags:0x4208060 flags:0x00080000
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5504 [inline]
 __schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
 __schedule_loop kernel/sched/core.c:7307 [inline]
 schedule+0x164/0x360 kernel/sched/core.c:7322
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7379
 __mutex_lock_common kernel/locking/mutex.c:726 [inline]
 __mutex_lock+0x7bf/0x1550 kernel/locking/mutex.c:821
 rtnl_net_lock include/linux/rtnetlink.h:130 [inline]
 addrconf_dad_work+0x116/0x15c0 net/ipv6/addrconf.c:4223
 </TASK>
INFO: task syz-executor831:5632 blocked for more than 144 seconds.
      Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor831 state:R  running task     stack:22336 pid:5632  tgid:5632  ppid:5629   task_flags:0x400140 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5504 [inline]
 __schedule+0x17d9/0x56c0 kernel/sched/core.c:7228
 </TASK>
INFO: lockdep is turned off.
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 31 Comm: khungtaskd Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 nmi_cpu_backtrace+0x274/0x2d0 lib/nmi_backtrace.c:113
 nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
 __sys_info lib/sys_info.c:157 [inline]
 sys_info+0x135/0x170 lib/sys_info.c:165
 check_hung_uninterruptible_tasks kernel/hung_task.c:353 [inline]
 watchdog+0xfd7/0x1030 kernel/hung_task.c:561
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 808 Comm: kworker/0:2 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: wg-crypt-wg0 wg_packet_encrypt_worker
RIP: 0010:memset_orig+0x25/0xb0 arch/x86/lib/memset_64.S:64
Code: 90 90 90 90 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 74 48 89 d1 48 c1 e9 06 <74> 39 66 0f 1f 84 00 00 00 00 00 48 ff c9 48 89 07 48 89 47 08 48
RSP: 0018:ffffc90000006f20 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffc90000007078 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffffc90000007078
RBP: 1ffffffff2189698 R08: ffffc90000007087 R09: 0000000000000000
R10: ffffc90000007078 R11: fffff52000000e11 R12: dffffc0000000000
R13: ffffffff90c4b4c0 R14: ffffc90000007028 R15: ffffc90000007070
FS:  0000000000000000(0000) GS:ffff88812527c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055968609b450 CR3: 000000000e746000 CR4: 00000000003526f0
Call Trace:
 <IRQ>
 unwind_next_frame+0xd04/0x2550 arch/x86/kernel/unwind_orc.c:621
 __unwind_start+0x514/0x660 arch/x86/kernel/unwind_orc.c:787
 unwind_start arch/x86/include/asm/unwind.h:64 [inline]
 arch_stack_walk+0xe3/0x150 arch/x86/kernel/stacktrace.c:24
 stack_trace_save+0xa9/0x100 kernel/stacktrace.c:122
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2700 [inline]
 slab_free mm/slub.c:6310 [inline]
 kmem_cache_free+0x182/0x650 mm/slub.c:6437
 nft_synproxy_eval_v4+0x383/0x530 net/netfilter/nft_synproxy.c:61
 nft_synproxy_do_eval+0x335/0x550 net/netfilter/nft_synproxy.c:142
 expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
 nft_do_chain+0x48d/0x1b10 net/netfilter/nf_tables_core.c:285
 nft_do_chain_inet+0x360/0x4b0 net/netfilter/nft_chain_filter.c:162
 nf_hook_entry_hookfn include/linux/netfilter.h:158 [inline]
 nf_hook_slow+0xc5/0x220 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:273 [inline]
 NF_HOOK+0x21f/0x3c0 include/linux/netfilter.h:316
 NF_HOOK+0x336/0x3c0 include/linux/netfilter.h:318
 __netif_receive_skb_one_core net/core/dev.c:6206 [inline]
 __netif_receive_skb net/core/dev.c:6319 [inline]
 process_backlog+0xa34/0x1860 net/core/dev.c:6670
 __napi_poll+0xaa/0x330 net/core/dev.c:7729
 napi_poll net/core/dev.c:7792 [inline]
 net_rx_action+0x61d/0xf50 net/core/dev.c:7949
 handle_softirqs+0x225/0x840 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 spin_unlock_bh include/linux/spinlock.h:396 [inline]
 ptr_ring_consume_bh include/linux/ptr_ring.h:393 [inline]
 wg_packet_encrypt_worker+0x16e2/0x1760 drivers/net/wireguard/send.c:293
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.314 msecs


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


* Re: [PATCH iwl-net] idpf: fix max_vport related crash on allocation error during init
From: Simon Horman @ 2026-06-22  7:30 UTC (permalink / raw)
  To: Emil Tantilov
  Cc: intel-wired-lan, netdev, anthony.l.nguyen, przemyslaw.kitszel,
	andrew+netdev, davem, edumazet, kuba, pabeni, madhu.chittim
In-Reply-To: <20260618192325.8694-1-emil.s.tantilov@intel.com>

On Thu, Jun 18, 2026 at 12:23:25PM -0700, Emil Tantilov wrote:
> Set adapter->max_vports only after successful allocation of vports, netdevs
> and vport_config buffers. This fixes possible crashes on reset or rmmod
> following a failed allocation on init.
> 
> [  305.981402] idpf 0000:83:00.0: enabling device (0100 -> 0102)
> [  305.994464] idpf 0000:83:00.0: Device HW Reset initiated
> [  320.416872] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  320.416918] #PF: supervisor read access in kernel mode
> [  320.416942] #PF: error_code(0x0000) - not-present page
> [  320.416963] PGD 2099657067 P4D 0
> [  320.416983] Oops: Oops: 0000 [#1] SMP NOPTI
> ...
> [  320.417093] RIP: 0010:idpf_remove+0x118/0x200 [idpf]
> [  320.417130] Code: 8b bb 98 09 00 00 e8 17 0f 5b e5 48 8b bb e8 08 00 00 e8 0b 0f 5b e5 66 83 bb 28 06 00 00 00 48 8b bb 20 06 00 00 74 49 31 ed <48> 8b 04 ef 48 85 c0 74 2f 48 8b 78 20 e8 66 58 91 e5 48 8b 83 20
> [  320.417183] RSP: 0018:ff7322212903fdb8 EFLAGS: 00010246
> [  320.417205] RAX: 0000000000000000 RBX: ff4463de40300000 RCX: ff7322212903fd4c
> [  320.417228] RDX: 0000000000000001 RSI: ffffffffa7f7d100 RDI: 0000000000000000
> [  320.417250] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [  320.417272] R10: 0000000000000001 R11: ff4463de3a638f58 R12: ff4463be89ac7000
> [  320.417294] R13: ff4463be89ac7198 R14: ff4463be94fc7198 R15: ffffffffc0f10f20
> [  320.417317] FS:  00007f963c0e6740(0000) GS:ff4463fdd65d8000(0000) knlGS:0000000000000000
> [  320.417342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  320.417362] CR2: 0000000000000000 CR3: 00000020ba674002 CR4: 0000000000773ef0
> [  320.417385] PKRU: 55555554
> [  320.417398] Call Trace:
> [  320.417412]  <TASK>
> [  320.417429]  pci_device_remove+0x42/0xb0
> [  320.417459]  device_release_driver_internal+0x1a9/0x210
> [  320.417492]  driver_detach+0x4b/0x90
> [  320.417516]  bus_remove_driver+0x70/0x100
> [  320.417539]  pci_unregister_driver+0x2e/0xb0
> [  320.417564]  __do_sys_delete_module.constprop.0+0x190/0x2f0
> [  320.417592]  ? kmem_cache_free+0x31e/0x550
> [  320.417619]  ? lockdep_hardirqs_on_prepare+0xde/0x190
> [  320.417644]  ? do_syscall_64+0x38/0x6b0
> [  320.417665]  do_syscall_64+0xc8/0x6b0
> [  320.417683]  ? clear_bhb_loop+0x30/0x80
> [  320.417706]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  320.417727] RIP: 0033:0x7f963bb30beb
> 
> Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>

FTR, an AI-generated review of this patch is available on sashiko.dev.
I think that the issue raised there can be looked at in the context of
possible follow-up.
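
The ordering rule in this fix can be sketched in userspace: set the count only after the arrays it bounds have been allocated. Everything below is hypothetical. The struct and helpers borrow only the idea of adapter->max_vports from the commit message and are not idpf code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: remove() trusts max_vports as the bound for arrays
 * that init() allocates. */
struct adapter_model {
	void **vports;
	void **netdevs;
	unsigned int max_vports;
};

/* fail_at simulates an allocation failure: 0 = none, 1 = vports, 2 = netdevs. */
static int adapter_init(struct adapter_model *a, unsigned int n, int fail_at)
{
	a->vports = NULL;
	a->netdevs = NULL;
	a->max_vports = 0;

	a->vports = fail_at == 1 ? NULL : calloc(n, sizeof(*a->vports));
	if (!a->vports)
		return -1;
	a->netdevs = fail_at == 2 ? NULL : calloc(n, sizeof(*a->netdevs));
	if (!a->netdevs) {
		free(a->vports);
		a->vports = NULL;
		return -1;
	}
	/* Publish the loop bound only after every array it covers exists. */
	a->max_vports = n;
	return 0;
}

/* Returns how many slots were walked. With a stale max_vports and NULL
 * arrays, the driver's equivalent loop would dereference NULL. */
static unsigned int adapter_remove(struct adapter_model *a)
{
	unsigned int walked = 0;

	for (unsigned int i = 0; i < a->max_vports; i++) {
		assert(a->vports && a->netdevs);
		walked++;
	}
	free(a->vports);
	free(a->netdevs);
	a->vports = NULL;
	a->netdevs = NULL;
	a->max_vports = 0;
	return walked;
}
```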


* Re: [PATCH v1 net] ipv4: fib: Don't ignore error route in local/main tables.
From: Ido Schimmel @ 2026-06-22  7:05 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Ahern, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260619212753.3367244-1-kuniyu@google.com>

On Fri, Jun 19, 2026 at 09:27:20PM +0000, Kuniyuki Iwashima wrote:
> When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added,
> fib_lookup() performs route lookup directly on two tables.
> 
> Since the first lookup does not properly bail out, the result
> of an error route in the merged local/main table could be
> overwritten by another route in the default table:
> 
>   # unshare -n
>   # ip link set lo up
>   # ip route add 192.168.0.0/24 dev lo table 253
>   # ip route add unreachable 192.168.0.0/24
>   # ip route get 192.168.0.1
>   192.168.0.1 dev lo table default uid 0
>       cache <local>
> 
> Once a random rule is added, the error route is respected:
> 
>   # ip rule add table 0
>   # ip rule del table 0
>   # ip route get 192.168.0.1
>   RTNETLINK answers: No route to host
> 
> Let's fix the inconsistent behaviour.
> 
> Fixes: f4530fa574df ("ipv4: Avoid overhead when no custom FIB rules are installed.")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
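
The inconsistency in the commit message comes down to which return codes let the lookup fall through to the next table. The model below is hypothetical and is neither the kernel code nor the patch. Each table returns one of three things:

- 0 for a usable route,
- -EAGAIN for no match, the convention fib_rules_lookup() uses when deciding whether to try the next rule,
- another negative errno for an error route.

The tables mirror the reproducer: an unreachable 192.168.0.0/24 in main and a normal 192.168.0.0/24 route in the default table.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef int (*table_lookup_fn)(uint32_t daddr);

/* main: "unreachable 192.168.0.0/24", nothing else */
static int main_table(uint32_t daddr)
{
	return (daddr & 0xffffff00u) == 0xc0a80000u ? -EHOSTUNREACH : -EAGAIN;
}

/* default (table 253): "192.168.0.0/24 dev lo" */
static int default_table(uint32_t daddr)
{
	return (daddr & 0xffffff00u) == 0xc0a80000u ? 0 : -EAGAIN;
}

/* Falls through on any error, so an error route in main is overwritten. */
static int lookup_buggy(table_lookup_fn tb_main, table_lookup_fn tb_default, uint32_t daddr)
{
	assert(tb_main && tb_default);
	int err = tb_main(daddr);

	if (!err)
		return 0;
	err = tb_default(daddr);
	return err == -EAGAIN ? -ENETUNREACH : err;
}

/* Falls through only on "no match", as the rules-based path does. */
static int lookup_fixed(table_lookup_fn tb_main, table_lookup_fn tb_default, uint32_t daddr)
{
	assert(tb_main && tb_default);
	int err = tb_main(daddr);

	if (err != -EAGAIN)
		return err;
	err = tb_default(daddr);
	return err == -EAGAIN ? -ENETUNREACH : err;
}
```

With these tables, lookup_buggy() returns 0 for 192.168.0.1 because the default-table route wins. lookup_fixed() returns -EHOSTUNREACH, which matches the "No route to host" seen once a rule exists.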


* Re: [PATCH] net/mlx5: Free steering tag data on release
From: Tariq Toukan @ 2026-06-22  6:53 UTC (permalink / raw)
  To: lirongqing, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-rdma, linux-kernel
In-Reply-To: <20260613153725.1874-1-lirongqing@baidu.com>



On 13/06/2026 18:37, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> mlx5_st_alloc_index() allocates an mlx5_st_idx_data object for
> each new steering tag table index and stores it in the xarray.
> When the last user releases the index, mlx5_st_dealloc_index()
> removes the entry from the xarray but did not free the backing
> object, leaking memory.
> 
> Free idx_data after erasing the xarray entry once the refcount
> reaches zero.
> 
> Fixes: 888a7776f4fb0 ("net/mlx5: Add support for device steering tag")
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/lib/st.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
> index 997be91..7cedc34 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
> @@ -175,6 +175,7 @@ int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index)
>   
>   	if (refcount_dec_and_test(&idx_data->usecount)) {
>   		xa_erase(&st->idx_xa, st_index);
> +		kfree(idx_data);
>   		/* We leave PCI config space as was before, no mkey will refer to it */
>   	}
>   

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
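
The fix completes a standard get/put pattern: the object is allocated on the first reference, and on the last one it must be both unlinked and freed. Below is a hypothetical userspace sketch. A fixed array stands in for the xarray and plain counters stand in for refcount_t.

```c
#include <assert.h>
#include <stdlib.h>

#define ST_SLOTS_MODEL 8

struct st_idx_model {
	unsigned int usecount;
};

static struct st_idx_model *st_table[ST_SLOTS_MODEL]; /* stands in for st->idx_xa */
static int st_live_objects;                            /* allocated, not yet freed */

/* Take a reference, allocating the per-index object on first use. */
static int st_get(unsigned int idx)
{
	assert(idx < ST_SLOTS_MODEL);
	if (!st_table[idx]) {
		st_table[idx] = calloc(1, sizeof(*st_table[idx]));
		if (!st_table[idx])
			return -1;
		st_live_objects++;
	}
	st_table[idx]->usecount++;
	return 0;
}

/* Drop a reference; on the last one, unlink the object and free it. */
static void st_put(unsigned int idx)
{
	struct st_idx_model *d;

	assert(idx < ST_SLOTS_MODEL);
	d = st_table[idx];
	assert(d && d->usecount > 0);
	if (--d->usecount == 0) {
		st_table[idx] = NULL;   /* like xa_erase() */
		free(d);                /* the step the patch adds with kfree() */
		st_live_objects--;
	}
}
```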



* Re: [PATCH] net/mlx5: Fix L3 tunnel entropy refcount leak
From: Tariq Toukan @ 2026-06-22  6:49 UTC (permalink / raw)
  To: lirongqing, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-rdma, linux-kernel
In-Reply-To: <20260613153631.1752-1-lirongqing@baidu.com>



On 13/06/2026 18:36, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> mlx5_tun_entropy_refcount_inc() counts both VXLAN and L2-to-L3
> tunnel reformat entries as entropy-enabling users. The matching
> decrement path only handled VXLAN, leaving L2-to-L3 tunnel entries
> counted after release.
> 
> Handle MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL in
> mlx5_tun_entropy_refcount_dec() as well so the enabling entry
> refcount remains balanced.
> 
> Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c
> index 4571c56..97f6097 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c
> @@ -176,7 +176,8 @@ void mlx5_tun_entropy_refcount_dec(struct mlx5_tun_entropy *tun_entropy,
>   				   int reformat_type)
>   {
>   	mutex_lock(&tun_entropy->lock);
> -	if (reformat_type == MLX5_REFORMAT_TYPE_L2_TO_VXLAN)
> +	if (reformat_type == MLX5_REFORMAT_TYPE_L2_TO_VXLAN ||
> +	    reformat_type == MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL)
>   		tun_entropy->num_enabling_entries--;
>   	else if (reformat_type == MLX5_REFORMAT_TYPE_L2_TO_NVGRE &&
>   		 --tun_entropy->num_disabling_entries == 0)

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
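
The patch makes the decrement condition match the increment. One way to keep such pairs balanced is to share a single classification helper between both paths. The sketch below is hypothetical userspace code with made-up enum names. It leaves out the extra work the real NVGRE branch does when num_disabling_entries reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MLX5_REFORMAT_TYPE_* values. */
enum reformat_model {
	RF_L2_TO_VXLAN,
	RF_L2_TO_NVGRE,
	RF_L2_TO_L3_TUNNEL,
	RF_OTHER,
};

struct tun_entropy_model {
	int num_enabling_entries;
	int num_disabling_entries;
};

/* Single classification used by both paths so they cannot drift apart. */
static bool enables_entropy(enum reformat_model t)
{
	return t == RF_L2_TO_VXLAN || t == RF_L2_TO_L3_TUNNEL;
}

static void entropy_inc(struct tun_entropy_model *te, enum reformat_model t)
{
	if (enables_entropy(t))
		te->num_enabling_entries++;
	else if (t == RF_L2_TO_NVGRE)
		te->num_disabling_entries++;
}

static void entropy_dec(struct tun_entropy_model *te, enum reformat_model t)
{
	if (enables_entropy(t)) {
		assert(te->num_enabling_entries > 0);
		te->num_enabling_entries--;
	} else if (t == RF_L2_TO_NVGRE) {
		assert(te->num_disabling_entries > 0);
		te->num_disabling_entries--;
	}
}
```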



* Re: [PATCH net 3/3] net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable
From: Shay Drori @ 2026-06-22  6:40 UTC (permalink / raw)
  To: Simon Horman, tariqt
  Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, mbloch,
	leon, ohartoov, edwards, msanalla, phaddad, parav, gbayer, kees,
	moshe, rongweil, jacob.e.keller, netdev, linux-rdma, linux-kernel,
	gal
In-Reply-To: <20260618124820.890808-3-horms@kernel.org>



On 18/06/2026 15:48, Simon Horman wrote:
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> 
> Simon says: I think the first two issues flagged below warrant
>              consideration within the context of this patch.
> 
>              But the third and final issue flagged below as pre-existing
>              seems to be something to consider in the context of
>              possible follow-up.
> ---
> net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable
> 
> This commit addresses a bad memory access crash by skipping peer flows when
> mlx5_lag_get_dev_seq() returns an error. This occurs when the peer isn't in
> the LAG or when no device is marked as master.
> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> index a9001d1c902ff..c6e6534a5e23b 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> @@ -2155,6 +2155,9 @@ static void mlx5e_tc_del_fdb_peers_flow(struct mlx5e_tc_flow *flow)
>>        devcom = flow->priv->mdev->priv.eswitch->devcom;
>>        mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
>>                i = mlx5_lag_get_dev_seq(peer_esw->dev);
>> +             if (i < 0)
>> +                     continue;
>> +
> 
> [Severity: Critical]
> Does skipping the remainder of this loop prevent the flow from being
> unlinked from the doubly-linked list?
> 
> By using continue here in mlx5e_tc_del_fdb_peers_flow(), we skip calling
> mlx5e_tc_del_fdb_peer_flow(), which means flow->peer[i] is never
> unlinked from the esw->offloads.peer_flows[original_i] list.

Only when a LAG member is removed from the ldev will mlx5_lag_get_dev_seq()
return an error.
Before the LAG member is removed, esw->devcom is cleaned up, which invokes
mlx5e_tc_clean_fdb_peer_flows(), which removes all peer flows.
Hence, no flow remains.

> 
> If the flow is subsequently freed via mlx5e_flow_put(), will this leave a
> freed list node resident in the active list and guarantee a use-after-free
> list corruption during the next list traversal?
> 
>>                mlx5e_tc_del_fdb_peer_flow(flow, i);
>>        }
>>   }
> 
> [ ... ]
> 
>> @@ -5526,6 +5529,9 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw)
>>
>>        mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
>>                i = mlx5_lag_get_dev_seq(peer_esw->dev);
>> +             if (i < 0)
>> +                     continue;
>> +
> 
> [Severity: Critical]
> Similarly to the above, does this skip unlinking during flow cleanup when
> the LAG sequence is unavailable?

By the time get_dev_seq() can fail, the member is already out of the ldev,
and clean_fdb_peer_flows() (devcom unpair) ran before that, while seq
was still valid, so the flows are already unlinked.
The guard just covers that later window; nothing remains to leak.


> 
>>                list_for_each_entry_safe(flow, tmp, &esw->offloads.peer_flows[i], peer[i])
>>                        mlx5e_tc_del_fdb_peers_flow(flow);
>>        }
> 
> 
> [Severity: Critical]
> This isn't a bug introduced by this patch, but is there a similar missing
> bounds check during flow addition in mlx5e_tc_add_fdb_peer_flow() in this
> same file?
> 
> If a flow is added concurrently with LAG teardown, mlx5_lag_get_dev_seq()
> can return -ENOENT (-2):
> 
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:mlx5e_tc_add_fdb_peer_flow() {
>      ...
>      int i = mlx5_lag_get_dev_seq(peer_esw->dev);
>      ...
>      list_add_tail(&flow->peer[i], &esw->offloads.peer_flows[i]);
>      ...
> }
> 
> Does this result in an out-of-bounds write to a negative array index?


No. If the LAG is destroyed, then is_peer_flow_needed() will return false and
we won't enter here.
The whole peer loop runs under the devcom read lock
(mlx5_devcom_for_each_peer_begin), while devcom unpair (which is what
precedes LAG member removal and runs clean_fdb_peer_flows) takes the
write lock. The read lock therefore blocks teardown for the duration, so
mlx5_lag_get_dev_seq() can't go negative here.

* Re: [PATCH net] ipv6: Fix null-ptr-deref in fib6_nh_mtu_change().
From: Ido Schimmel @ 2026-06-22  6:37 UTC (permalink / raw)
  To: xmei5; +Cc: dsahern, netdev, davem, edumazet, pabeni, kuba, horms, bestswngs
In-Reply-To: <20260619045334.2427073-1-xmei5@asu.edu>

On Thu, Jun 18, 2026 at 09:53:34PM -0700, xmei5@asu.edu wrote:
> From: Xiang Mei <xmei5@asu.edu>
> 
> fib6_nh_mtu_change() re-fetches idev via __in6_dev_get(arg->dev) and
> dereferences idev->cnf.mtu6 without a NULL check. addrconf_ifdown()
> clears dev->ip6_ptr with RCU_INIT_POINTER() after rt6_disable_ip() has
> released tb6_lock, so the RA-driven MTU walk can observe a NULL idev and
> oops. The caller rt6_mtu_change_route() guards its own __in6_dev_get(),
> but this re-fetch is unguarded; nexthop-backed routes survive
> addrconf_ifdown()'s flush, so the walk still reaches it after ip6_ptr is
> nulled.
> 
> Return 0 when idev is NULL, matching rt6_mtu_change_route() and the
> fib6_mtu() fix in commit 5ad509c1fdad ("ipv6: Fix null-ptr-deref in
> fib6_mtu().").
> 
>   Oops: general protection fault, ... KASAN: null-ptr-deref in range
>         [0x00000000000002a8-0x00000000000002af]
>   RIP: 0010:fib6_nh_mtu_change+0x203/0x990
>    rt6_mtu_change_route+0x141/0x1d0
>    __fib6_clean_all+0xd0/0x160
>    rt6_mtu_change+0xb4/0x100
>    ndisc_router_discovery+0x24b5/0x2cb0
>    icmpv6_rcv+0x12e9/0x1710
>    ipv6_rcv+0x39b/0x410
> 
> Fixes: c0b220cf7d80 ("ipv6: Refactor exception functions")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
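The race described in the patch is a check-then-re-fetch pattern: the caller validated its own idev, but the callee fetches the pointer again after addrconf_ifdown() may have cleared it. A minimal single-threaded model follows; the `*_model` types and functions are hypothetical simplifications of net_device, inet6_dev, __in6_dev_get() and fib6_nh_mtu_change(), and RCU is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; RCU is not modeled. */
struct inet6_dev_model {
	unsigned int mtu6;
};

struct net_device_model {
	struct inet6_dev_model *ip6_ptr;	/* cleared on ifdown */
};

/* Stand-in for __in6_dev_get(): NULL once ifdown has cleared ip6_ptr. */
static struct inet6_dev_model *
in6_dev_get_model(const struct net_device_model *dev)
{
	return dev->ip6_ptr;
}

/* Callee that re-fetches idev, with the NULL check the patch adds.
 * Returns 0 either way, like the fix; *mtu is only read from idev
 * (and written) if idev still exists. */
static int nh_mtu_change_model(const struct net_device_model *dev,
			       unsigned int *mtu)
{
	const struct inet6_dev_model *idev = in6_dev_get_model(dev);

	if (!idev)
		return 0;
	*mtu = idev->mtu6;
	return 0;
}
```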

* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-22  6:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König, David Howells, Simona Vetter, Randy Dunlap,
	Luca Ceresoli, Philipp Stanner, linux-block, LKML,
	open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
	io-uring, audit, bpf, Network Development, dri-devel,
	linux-perf-use., linux-trace-kernel, kexec, live-patching,
	linux-modules, Linux Crypto Mailing List, Linux Power Management,
	rcu, sched-ext, linux-mm, virtualization, damon,
	clang-built-linux, chengkaitao, Muchun Song
In-Reply-To: <CAADnVQJmPWFT01b7DuLdtafv=8FyB84GYHNZ8zSTck+9Aw0JpA@mail.gmail.com>



On 2026/6/22 13:28, Alexei Starovoitov wrote:
> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>
>> [ ... ]
>>
>>  124 files changed, 875 insertions(+), 681 deletions(-)
> 
> Not sure what you were thinking, but this diff stat
> is not landable.

[PATCH v3 1/7] and [PATCH v3 2/7] contain the main logic and can
be merged directly. They are also compatible with the old API.
[PATCH v3 3/7] through [PATCH v3 7/7] are just simple interface
replacements and do not change any functional logic. They can be
left unmerged for now; individual modules can pick them up later
if needed.
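The compatibility claim above (the new helpers also accept the old 4-argument call shape) rests on dispatching on the number of macro arguments. The following userspace sketch shows one way such a variadic macro can accept both the 3-argument form (hidden cursor) and the 4-argument form (caller-owned cursor). It is an illustration under assumptions, not the code from [PATCH v3 1/7]: the `lfe_*` and `LFE_*` helpers and `struct item` are hypothetical, `__COUNTER__` token pasting stands in for the kernel's `__UNIQUE_ID()`, the list primitives are minimal re-implementations of those in include/linux/list.h, and it relies on the GCC/Clang `__typeof__` and `__COUNTER__` extensions.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;	/* the kernel uses poison values instead */
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define lfe_next(pos, member) \
	container_of((pos)->member.next, __typeof__(*(pos)), member)

/* Build a unique identifier once; stands in for the kernel's __UNIQUE_ID(). */
#define LFE_CONCAT_(a, b) a##b
#define LFE_CONCAT(a, b) LFE_CONCAT_(a, b)
#define LFE_UNIQ(prefix) LFE_CONCAT(prefix, __COUNTER__)

/* 4-argument form: the caller owns the temporary cursor (old _safe() shape). */
#define lfe_mutable_explicit(pos, n, head, member)                      \
	for (pos = container_of((head)->next, __typeof__(*pos), member), \
	     n = lfe_next(pos, member);                                    \
	     &pos->member != (head);                                       \
	     pos = n, n = lfe_next(n, member))

/* 3-argument form: the temporary cursor is a loop-local with a unique name. */
#define lfe_mutable_scoped(pos, n, head, member)                        \
	for (__typeof__(pos) n = (pos = container_of((head)->next,       \
				__typeof__(*pos), member),               \
				lfe_next(pos, member));                  \
	     &pos->member != (head);                                       \
	     pos = n, n = lfe_next(n, member))
#define lfe_mutable_hidden(pos, head, member) \
	lfe_mutable_scoped(pos, LFE_UNIQ(lfe_tmp_), head, member)

/* Pick the implementation by argument count: 3 -> hidden, 4 -> explicit. */
#define LFE_PICK(_1, _2, _3, _4, name, ...) name
#define list_for_each_entry_mutable(...)                                \
	LFE_PICK(__VA_ARGS__, lfe_mutable_explicit, lfe_mutable_hidden,  \
		 lfe_unused)(__VA_ARGS__)

struct item {			/* hypothetical entry type for the demo */
	int val;
	struct list_head node;
};

/* Drops even values with the short form, then drains the list with the
 * explicit form. Returns the sum of the drained (odd) values, or -1 if
 * n_items does not fit the demo array. */
static int demo_sum_odd(int n_items)
{
	struct item items[16];
	struct list_head head;
	struct item *pos, *tmp;
	int sum = 0;

	if (n_items < 0 || n_items > 16)
		return -1;
	list_init(&head);
	for (int i = 0; i < n_items; i++) {
		items[i].val = i;
		list_add_tail(&items[i].node, &head);
	}

	list_for_each_entry_mutable(pos, &head, node)		/* 3 args */
		if (pos->val % 2 == 0)
			list_del(&pos->node);

	list_for_each_entry_mutable(pos, tmp, &head, node) {	/* 4 args */
		sum += pos->val;
		list_del(&pos->node);
	}
	assert(head.next == &head);	/* every entry was removed */
	return sum;
}
```

The unique name is generated once and passed as a macro argument, so every use of the hidden cursor inside the loop expands to the same identifier.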

In v2, Andy Shevchenko mentioned: "If it's done by Linus himself
during the day when he prepares -rc1, it's fine." Even so, the
changes in this patch series are indeed quite large and touch
almost every subsystem. I have only converted part of them for
now, so I wanted to send this out first and see what people think.

-- 
Thanks
Kaitao Cheng

* Re: [PATCH] net: meth: check skb allocation in meth_init_rx_ring()
From: Pavan Chebbi @ 2026-06-22  5:57 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, stable
In-Reply-To: <20260622044914.664749-1-haoxiang_li2024@163.com>

On Mon, Jun 22, 2026 at 10:20 AM Haoxiang Li <haoxiang_li2024@163.com> wrote:
>
> meth_init_rx_ring() does not check the return value of alloc_skb().
> If the allocation fails, the NULL skb is passed to skb_reserve() and
> then dereferenced through skb->head.
>
> Add check for alloc_skb() to prevent potential null pointer dereference.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
> ---
>  drivers/net/ethernet/sgi/meth.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/sgi/meth.c b/drivers/net/ethernet/sgi/meth.c
> index f7c3a5a766b7..ceff3cc937ad 100644
> --- a/drivers/net/ethernet/sgi/meth.c
> +++ b/drivers/net/ethernet/sgi/meth.c
> @@ -228,6 +228,9 @@ static int meth_init_rx_ring(struct meth_private *priv)
>
>         for (i = 0; i < RX_RING_ENTRIES; i++) {
>                 priv->rx_skbs[i] = alloc_skb(METH_RX_BUFF_SIZE, 0);
> +               if (!priv->rx_skbs[i])
> +                       return -ENOMEM;
> +

I think the fix is not complete. The caller meth_open() will not free
any successfully allocated skbs if the function ever returns -ENOMEM.

>                 /* 8byte status vector + 3quad padding + 2byte padding,
>                  * to put data on 64bit aligned boundary */
>                 skb_reserve(priv->rx_skbs[i],METH_RX_HEAD);
> --
> 2.25.1
>
>
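To make the point above concrete, here is a userspace sketch of one way to complete the fix: free the buffers that were already allocated before returning -ENOMEM. It is a model under assumptions, not a patch against meth.c: `struct rx_ring`, `rx_buf_alloc()`, `rx_ring_init()`, `rx_ring_free()`, the sizes and the failure-injection counter are hypothetical, and malloc() stands in for alloc_skb(). Doing the cleanup in the caller, meth_open(), would be an alternative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define RX_RING_ENTRIES 8	/* hypothetical ring size */
#define RX_BUFF_SIZE 2048	/* hypothetical buffer size */

struct rx_ring {		/* hypothetical stand-in for meth_private */
	void *bufs[RX_RING_ENTRIES];
};

static int fail_at = -1;	/* index at which allocation is forced to fail */
static int live_bufs;		/* buffers currently allocated */

/* Stand-in for alloc_skb(): returns NULL on (injected or real) failure. */
static void *rx_buf_alloc(int idx)
{
	void *p = (idx == fail_at) ? NULL : malloc(RX_BUFF_SIZE);

	if (p)
		live_bufs++;
	return p;
}

static void rx_buf_free(void *p)
{
	if (p) {
		live_bufs--;
		free(p);
	}
}

/* On failure, free every buffer allocated so far before returning, so the
 * caller does not have to know how far the loop got. */
static int rx_ring_init(struct rx_ring *r)
{
	int i;

	assert(r);
	for (i = 0; i < RX_RING_ENTRIES; i++) {
		r->bufs[i] = rx_buf_alloc(i);
		if (!r->bufs[i])
			goto err_unwind;
	}
	return 0;

err_unwind:
	while (--i >= 0) {
		rx_buf_free(r->bufs[i]);
		r->bufs[i] = NULL;
	}
	return -ENOMEM;
}

static void rx_ring_free(struct rx_ring *r)
{
	for (int i = 0; i < RX_RING_ENTRIES; i++) {
		rx_buf_free(r->bufs[i]);
		r->bufs[i] = NULL;
	}
}
```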

* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Alexei Starovoitov @ 2026-06-22  5:28 UTC (permalink / raw)
  To: Kaitao Cheng
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König, David Howells, Simona Vetter, Randy Dunlap,
	Luca Ceresoli, Philipp Stanner, linux-block, LKML,
	open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
	io-uring, audit, bpf, Network Development, dri-devel,
	linux-perf-use., linux-trace-kernel, kexec, live-patching,
	linux-modules, Linux Crypto Mailing List, Linux Power Management,
	rcu, sched-ext, linux-mm, virtualization, damon,
	clang-built-linux, chengkaitao
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>

On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>
> From: chengkaitao <chengkaitao@kylinos.cn>
>
> The list_for_each*_safe() helpers are used when the loop body may remove
> the current entry.  Their current interface, however, forces every caller
> to define a temporary cursor outside the macro and pass it in, even when
> the caller never uses that cursor directly.  For most call sites this
> extra cursor is just boilerplate required by the macro implementation.
>
> This is awkward because the saved next pointer is an internal detail of
> the iteration.  Callers that only remove or move the current entry do not
> need to spell it out.
>
> The _safe() suffix has also caused confusion.  Christian Koenig pointed
> out that the name is easy to read as a thread-safe variant, especially
> for beginners, even though it only means that the iterator keeps enough
> state to tolerate removal of the current entry.  He suggested _mutable()
> as a clearer description of what the loop permits.
>
> Add *_mutable() iterator variants for list, hlist and llist.  The new
> helpers are variadic and support both forms.  In the common case, the
> caller omits the temporary cursor and the macro creates a unique internal
> cursor with typeof(pos) and __UNIQUE_ID().  If a loop really needs an
> explicit temporary cursor, the caller can still pass it and the helper
> keeps the existing *_safe() behaviour.
>
> For example, a call site may use the shorter form:
>
>   list_for_each_entry_mutable(pos, head, member)
>
> or keep the explicit temporary cursor form:
>
>   list_for_each_entry_mutable(pos, tmp, head, member)
>
> The existing *_safe() helpers remain available for compatibility.  This
> series only converts users in mm, block, kernel, init and io_uring.  If
> this approach looks acceptable, the remaining users can be converted in
> follow-up series.
>
> Changes in v3 (Christian König, Andy Shevchenko):
> - Convert safe list walks to mutable iterators
>
> Changes in v2 (Muchun Song, Andy Shevchenko):
> - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>   cursor change directly in the existing list_for_each_entry*() helpers.
> - Open-code special list walks that rely on updating the loop cursor in
>   the body, preserving their existing traversal semantics.
>
> Link to v2:
> https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/
>
> Link to v1:
> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
>
> Kaitao Cheng (7):
>   list: Add mutable iterator variants
>   llist: Add mutable iterator variants
>   mm: Use mutable list iterators
>   block: Use mutable list iterators
>   kernel: Use mutable list iterators
>   initramfs: Use mutable list iterator
>   io_uring: Use mutable list iterators
>
>  block/bfq-iosched.c                 |  17 +-
>  block/blk-cgroup.c                  |  12 +-
>  block/blk-flush.c                   |   4 +-
>  block/blk-iocost.c                  |  18 +-
>  block/blk-mq.c                      |   8 +-
>  block/blk-throttle.c                |   4 +-
>  block/kyber-iosched.c               |   4 +-
>  block/partitions/ldm.c              |   8 +-
>  block/sed-opal.c                    |   4 +-
>  include/linux/list.h                | 269 ++++++++++++++++++++++++----
>  include/linux/llist.h               |  81 +++++++--
>  init/initramfs.c                    |   5 +-
>  io_uring/cancel.c                   |   6 +-
>  io_uring/poll.c                     |   3 +-
>  io_uring/rw.c                       |   4 +-
>  io_uring/timeout.c                  |   8 +-
>  io_uring/uring_cmd.c                |   3 +-
>  kernel/audit_tree.c                 |   4 +-
>  kernel/audit_watch.c                |  16 +-
>  kernel/auditfilter.c                |   4 +-
>  kernel/auditsc.c                    |   4 +-
>  kernel/bpf/arena.c                  |  10 +-
>  kernel/bpf/arraymap.c               |   8 +-
>  kernel/bpf/bpf_local_storage.c      |   3 +-
>  kernel/bpf/bpf_lru_list.c           |  25 ++-
>  kernel/bpf/btf.c                    |  18 +-
>  kernel/bpf/cgroup.c                 |   7 +-
>  kernel/bpf/cpumap.c                 |   4 +-
>  kernel/bpf/devmap.c                 |  10 +-
>  kernel/bpf/helpers.c                |   8 +-
>  kernel/bpf/local_storage.c          |   4 +-
>  kernel/bpf/memalloc.c               |  16 +-
>  kernel/bpf/offload.c                |   8 +-
>  kernel/bpf/states.c                 |   4 +-
>  kernel/bpf/stream.c                 |   4 +-
>  kernel/bpf/verifier.c               |   6 +-
>  kernel/cgroup/cgroup-v1.c           |   4 +-
>  kernel/cgroup/cgroup.c              |  54 +++---
>  kernel/cgroup/dmem.c                |  12 +-
>  kernel/cgroup/rdma.c                |   8 +-
>  kernel/events/core.c                |  44 +++--
>  kernel/events/uprobes.c             |  12 +-
>  kernel/exit.c                       |   8 +-
>  kernel/fail_function.c              |   4 +-
>  kernel/gcov/clang.c                 |   4 +-
>  kernel/irq_work.c                   |   4 +-
>  kernel/kexec_core.c                 |   4 +-
>  kernel/kprobes.c                    |  16 +-
>  kernel/livepatch/core.c             |   4 +-
>  kernel/livepatch/core.h             |   4 +-
>  kernel/liveupdate/kho_block.c       |   4 +-
>  kernel/liveupdate/luo_flb.c         |   4 +-
>  kernel/locking/rwsem.c              |   2 +-
>  kernel/locking/test-ww_mutex.c      |   2 +-
>  kernel/module/main.c                |  11 +-
>  kernel/padata.c                     |   4 +-
>  kernel/power/snapshot.c             |   8 +-
>  kernel/power/wakelock.c             |   4 +-
>  kernel/printk/printk.c              |  11 +-
>  kernel/ptrace.c                     |   4 +-
>  kernel/rcu/rcutorture.c             |   3 +-
>  kernel/rcu/tasks.h                  |   9 +-
>  kernel/rcu/tree.c                   |   6 +-
>  kernel/resource.c                   |   4 +-
>  kernel/sched/core.c                 |   4 +-
>  kernel/sched/ext.c                  |  22 +--
>  kernel/sched/fair.c                 |  28 +--
>  kernel/sched/topology.c             |   4 +-
>  kernel/sched/wait.c                 |   4 +-
>  kernel/seccomp.c                    |   4 +-
>  kernel/signal.c                     |  11 +-
>  kernel/smp.c                        |   4 +-
>  kernel/taskstats.c                  |   8 +-
>  kernel/time/clockevents.c           |   6 +-
>  kernel/time/clocksource.c           |   4 +-
>  kernel/time/posix-cpu-timers.c      |   4 +-
>  kernel/time/posix-timers.c          |   3 +-
>  kernel/torture.c                    |   3 +-
>  kernel/trace/bpf_trace.c            |   4 +-
>  kernel/trace/ftrace.c               |  49 +++--
>  kernel/trace/ring_buffer.c          |  25 ++-
>  kernel/trace/trace.c                |  12 +-
>  kernel/trace/trace_dynevent.c       |   6 +-
>  kernel/trace/trace_dynevent.h       |   5 +-
>  kernel/trace/trace_events.c         |  35 ++--
>  kernel/trace/trace_events_filter.c  |   4 +-
>  kernel/trace/trace_events_hist.c    |   8 +-
>  kernel/trace/trace_events_trigger.c |  17 +-
>  kernel/trace/trace_events_user.c    |  16 +-
>  kernel/trace/trace_stat.c           |   4 +-
>  kernel/user-return-notifier.c       |   3 +-
>  kernel/workqueue.c                  |  16 +-
>  mm/backing-dev.c                    |   8 +-
>  mm/balloon.c                        |   8 +-
>  mm/cma.c                            |   4 +-
>  mm/compaction.c                     |   4 +-
>  mm/damon/core.c                     |   4 +-
>  mm/damon/sysfs-schemes.c            |   4 +-
>  mm/dmapool.c                        |   4 +-
>  mm/huge_memory.c                    |   8 +-
>  mm/hugetlb.c                        |  56 +++---
>  mm/hugetlb_vmemmap.c                |  16 +-
>  mm/khugepaged.c                     |  14 +-
>  mm/kmemleak.c                       |   7 +-
>  mm/ksm.c                            |  25 +--
>  mm/list_lru.c                       |   4 +-
>  mm/memcontrol-v1.c                  |   8 +-
>  mm/memory-failure.c                 |  12 +-
>  mm/memory-tiers.c                   |   4 +-
>  mm/migrate.c                        |  23 ++-
>  mm/mmu_notifier.c                   |   9 +-
>  mm/page_alloc.c                     |   8 +-
>  mm/page_reporting.c                 |   2 +-
>  mm/percpu.c                         |  11 +-
>  mm/pgtable-generic.c                |   4 +-
>  mm/rmap.c                           |  10 +-
>  mm/shmem.c                          |   9 +-
>  mm/slab_common.c                    |  14 +-
>  mm/slub.c                           |  33 ++--
>  mm/swapfile.c                       |   4 +-
>  mm/userfaultfd.c                    |  12 +-
>  mm/vmalloc.c                        |  24 +--
>  mm/vmscan.c                         |   7 +-
>  mm/zsmalloc.c                       |   4 +-
>  124 files changed, 875 insertions(+), 681 deletions(-)

Not sure what you were thinking, but this diff stat
is not landable.

pw-bot: cr

* Re: [PATCH v3 5/7] kernel: Use mutable list iterators
From: bot+bpf-ci @ 2026-06-22  5:22 UTC (permalink / raw)
  To: kaitao.cheng, paul, eparis, ast, daniel, andrii, eddyz87, memxor,
	davem, kuba, hawk, john.fastabend, tj, hannes, mkoutny, dev,
	mripard, natalie.vock, peterz, mingo, acme, namhyung, mhiramat,
	oleg, oberpar, akpm, baoquan.he, rppt, pasha.tatashin, pratyush,
	naveen, jpoimboe, jikos, mbenes, pmladek, will, boqun, mcgrof,
	petr.pavlu, da.gomez, samitolvanen, steffen.klassert,
	daniel.m.jordan, rafael, dave, paulmck, josh, frederic,
	neeraj.upadhyay, joelagnelf, urezki, juri.lelli, vincent.guittot,
	kees, bsingharora, anna-maria, tglx, jstultz, kpsingh,
	mattbobrowski, nathan, martin.lau, song, mark.rutland,
	mathieu.desnoyers, dietmar.eggemann, void, rostedt
  Cc: audit, linux-kernel, bpf, netdev, cgroups, dri-devel,
	linux-perf-users, linux-trace-kernel, kexec, live-patching,
	linux-modules, linux-crypto, linux-pm, rcu, sched-ext, llvm,
	chengkaitao, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260622042811.31684-1-kaitao.cheng@linux.dev>

> diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
> index c4dfbc293bae..9e076106bee7 100644
> --- a/kernel/trace/trace_dynevent.c
> +++ b/kernel/trace/trace_dynevent.c

[ ... ]

> @@ -100,7 +100,7 @@ int dyn_event_release(const char *raw_command, struct dyn_event_operations *type
>  		return -EINVAL;
>
>  	mutex_lock(&event_mutex);
> -	for_each_dyn_event_safe(pos, n) {
> +	for_each_dyn_event_safe(pos) {
>  		if (type && type != pos->ops)
>  			continue;
>  		if (!pos->ops->match(system, event,

Does this leave the variable n unused? Looking at the function
declaration around line 74, it still has:

	struct dyn_event *pos, *n;

but after this change n is no longer passed to for_each_dyn_event_safe().
The macro now hides the temporary cursor internally, so n appears to be
dead code. This would trigger -Wunused-variable warnings and break builds
with CONFIG_WERROR=y.

For comparison, the sibling change at line 219 (dyn_events_release_all)
correctly removed the *tmp variable from its declaration.

> diff --git a/kernel/trace/trace_dynevent.h b/kernel/trace/trace_dynevent.h
> index beee3f8d7544..a4dc0812284f 100644
> --- a/kernel/trace/trace_dynevent.h
> +++ b/kernel/trace/trace_dynevent.h
> @@ -115,10 +115,9 @@ int dyn_event_create(const char *raw_command, struct dyn_event_operations *type)
>  /*
>   * for_each_dyn_event	-	iterate over the dyn_event list safely
>   * @pos:	the struct dyn_event * to use as a loop cursor
> - * @n:		the struct dyn_event * to use as temporary storage
>   */
> -#define for_each_dyn_event_safe(pos, n)	\
> -	list_for_each_entry_safe(pos, n, &dyn_event_list, list)
> +#define for_each_dyn_event_safe(pos)	\
> +	list_for_each_entry_mutable(pos, &dyn_event_list, list)


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27930295027

* Re: [PATCH v2 1/2] drm/drm_ras: Add drm_ras netlink error event
From: Tauro, Riana @ 2026-06-22  5:19 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, dri-devel, netdev, aravind.iddamsetty, anshuman.gupta,
	rodrigo.vivi, joonas.lahtinen, kuba, simona.vetter, airlied,
	pratik.bari, joshua.santosh.ranjan, ashwin.kumar.kulkarni,
	shubham.kumar, ravi.kishore.koppuravuri, maarten.lankhorst,
	mallesh.koujalagi, soham.purkait, Zack McKevitt, Lijo Lazar,
	Hawking Zhang, David S. Miller, Paolo Abeni, Eric Dumazet
In-Reply-To: <ajPDFcp36k1mGcdB@black.igk.intel.com>


On 18-06-2026 15:36, Raag Jadav wrote:
> On Thu, Jun 11, 2026 at 10:51:46AM +0530, Riana Tauro wrote:
>> Define a new netlink event 'error-event' and a new multicast group
>> 'error-notify' in drm_ras. Each event contains device name, node and
>> error information to identify the error triggering the event.
>>
>> Add drm_ras_nl_error_event() to trigger an event from the driver.
>> Userspace must subscribe to 'error-notify' to receive 'error-event'
>> notifications.
>>
>> Usage:
>>
>> $ sudo ynl --family drm_ras --subscribe error-notify
> ...
>
>>   operations:
>>     list:
>> @@ -124,3 +151,24 @@ operations:
>>         do:
>>           request:
>>             attributes: *id-attrs
>> +    -
>> +      name: error-event
>> +      doc: >-
>> +           Notify userspace of an error event.
>> +           The event includes the device, node and error information
>> +           of the error that triggered the event.
>> +      attribute-set: error-event-attrs
>> +      mcgrp: error-notify
> This looks much closer to "notify:" property, which IIUC it's not. Looking
> at some of the existing examples, a better name could be something like
> 'error-monitor' or 'error-report' to make it a bit distinguishable.

Yeah, makes sense. Will change the group name.
Thank you for the review :)

Thanks
Riana

>
> Or perhaps it could be just me without the coffee :(
> so I'll leave it to you.
>
> Reviewed-by: Raag Jadav <raag.jadav@intel.com>
>
>> +      event:
>> +        attributes:
>> +          - device-name
>> +          - node-id
>> +          - node-name
>> +          - error-id
>> +          - error-name
>> +          - error-value
>> +
>> +mcast-groups:
>> +  list:
>> +    -
>> +      name: error-notify

* RE: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Selvamani Rajagopal @ 2026-06-22  5:14 UTC (permalink / raw)
  To: Parthiban.Veerasooran@microchip.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org,
	conor+dt@kernel.org, Piergiorgio Beruto
  Cc: andrew@lunn.ch, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Conor.Dooley@microchip.com,
	devicetree@vger.kernel.org
In-Reply-To: <CYYPR02MB9828CD98EEEB9B218A940E4483E32@CYYPR02MB9828.namprd02.prod.outlook.com>

> 
> AI review bot Sashiko suggested one potential issue where skb pointers aren't protected.
> But those concerns are in transmit path. This crash seems to be in receive path. If you
> think that might help, I can generate a patch for that.


Parthiban,

I just submitted a patch for the "net" tree. I did see one crash, though its
signature was different from yours. As I remember, yours was a NULL pointer
access; mine was caused by trying to place data beyond the "end" point.

Anyway, if you have time to spare and want to check whether it fixes your
crash, I would appreciate the feedback.

https://patchwork.kernel.org/project/netdevbpf/list/?series=1114495

> 
> What do you suggest? Since you are able to see the crash, would you have time to
> investigate?
> 
> Sincerely
> Selva
