Netdev List

* Re: [PATCH v2 net-next 00/15] ip6mr: No RTNL for RTNL_FAMILY_IP6MR rtnetlink.
From: Kuniyuki Iwashima @ 2026-04-12 22:14 UTC
  To: Jakub Kicinski
  Cc: David S . Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260412141114.143d0858@kernel.org>

On Sun, Apr 12, 2026 at 2:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 12 Apr 2026 13:50:59 -0700 Kuniyuki Iwashima wrote:
> > > Hitting a bunch of:
> > >
> > >   SKIP      no netlink MFC interface
> > >
> > > on the new test here. Do we need to add something to .../config ?
> >
> > No, I used SKIP() intentionally because only IPv4 has the MFC
> > netlink interface and IPv6 does not have the corresponding one.
> >
> > Should I just return 0 in this case instead of SKIP() ?
>
> XFAIL

Will use XFAIL_ADD().

Thanks !

* Re: [PATCH net-next] ppp: tear down bridge before clearing pch->chan
From: Jakub Kicinski @ 2026-04-12 22:27 UTC
  To: qingfang.deng
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	yuehaibing, kuniyu, kees, bigeasy, linux-ppp, netdev,
	linux-kernel, tparkin, jchapman, gnault
In-Reply-To: <20260410093859.69237-1-qingfang.deng@linux.dev>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ppp: tear down bridge before clearing pch->chan

This commit moves ppp_unbridge_channels() before pch->chan is cleared
and removes the NULL check for pchb->chan in ppp_channel_bridge_input(),
relying on synchronize_rcu() inside ppp_unbridge_channels() for safety.

> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index b097d1b38ac9c..3a609d48a4248 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -2285,17 +2285,11 @@ static bool ppp_channel_bridge_input(struct channel *pch, struct sk_buff *skb)
>  		goto out_rcu;
>  
>  	spin_lock_bh(&pchb->downl);
> -	if (!pchb->chan) {
> -		/* channel got unregistered */
> -		kfree_skb(skb);
> -		goto outl;
> -	}
>  
>  	skb_scrub_packet(skb, !net_eq(pch->chan_net, pchb->chan_net));
>  	if (!pchb->chan->ops->start_xmit(pchb->chan, skb))

Could removing this check lead to a NULL pointer dereference if both channels
in a bridge (A and B) are unregistered concurrently?

If Thread 1 unregisters A and calls ppp_unbridge_channels(A), it clears
both A->bridge and B->bridge to NULL, then blocks on synchronize_rcu().

If Thread 2 unregisters B and calls ppp_unbridge_channels(B), it finds
B->bridge is already NULL and returns -EINVAL early, bypassing its own
synchronize_rcu() call.

Thread 2 would then proceed to set B->chan = NULL while an RCU reader
(Thread 3) might still be active here, leading to a crash when it tries
to access pchb->chan->ops->start_xmit.
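
The interleaving above can be replayed deterministically. A minimal
single-threaded C model, assuming a hypothetical struct chan_model and
model_unbridge() (not kernel code); synchronize_rcu() is reduced to a
counter so the skipped grace period becomes visible:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of two bridged PPP channels.
 * No real locking or RCU: the interleaving is replayed step by step. */
struct chan_model {
	struct chan_model *bridge;  /* peer channel, NULL when unbridged */
	bool chan_present;          /* stands in for pch->chan != NULL */
	int grace_periods_waited;   /* synchronize_rcu() calls made */
};

/* Mirrors the early -EINVAL return described in the review. */
static int model_unbridge(struct chan_model *pch)
{
	struct chan_model *peer = pch->bridge;

	if (!peer)
		return -EINVAL;     /* bails out before waiting for readers */
	pch->bridge = NULL;
	if (peer->bridge == pch)
		peer->bridge = NULL;
	pch->grace_periods_waited++;  /* stands in for synchronize_rcu() */
	return 0;
}

/* Thread 1 unbridges A, then Thread 2 unbridges B and clears B->chan.
 * Returns true if B->chan was cleared without B waiting for a grace
 * period, i.e. an in-flight reader could still dereference it. */
static bool model_concurrent_unregister(struct chan_model *a,
					struct chan_model *b)
{
	int ret;

	model_unbridge(a);          /* Thread 1 clears both bridge pointers */
	ret = model_unbridge(b);    /* Thread 2 sees NULL and returns early */
	b->chan_present = false;    /* Thread 2: pch->chan = NULL */
	return ret == -EINVAL && b->grace_periods_waited == 0;
}
```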

[ ... ]

> @@ -2997,6 +2991,8 @@ ppp_unregister_channel(struct ppp_channel *chan)
>  	 * the channel's start_xmit or ioctl routine before we proceed.
>  	 */
>  	ppp_disconnect_channel(pch);
> +	ppp_unbridge_channels(pch);
> +

Does moving ppp_unbridge_channels() here introduce a re-bridging race window?

At this point, the channel is still in the pn->all_channels list. After
ppp_unbridge_channels(pch) completes, pch->bridge is NULL.

Could a concurrent PPPIOCBRIDGECHAN ioctl from another channel find this
channel via ppp_find_channel() and successfully establish a new bridge
with it, since ppp_bridge_channels() only verifies that pch->bridge is NULL?

If so, ppp_unregister_channel() would then unconditionally set pch->chan = NULL
and remove it from the list without tearing down the newly formed bridge.
This regression could crash the peer channel when it processes a packet.
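
The re-bridging window can be modelled the same way. Again a
hypothetical, single-threaded sketch (struct rchan, model_bridge() and
model_rebridge_window() are not kernel code); model_bridge() performs
only the bridge == NULL check that the review attributes to
ppp_bridge_channels():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rchan {
	struct rchan *bridge;  /* peer, NULL when unbridged */
	bool chan_present;     /* stands in for pch->chan != NULL */
	bool listed;           /* still on pn->all_channels */
};

/* Hypothetical bridge ioctl: it can only find listed channels (the
 * ppp_find_channel() lookup) and only checks the bridge pointers. */
static bool model_bridge(struct rchan *x, struct rchan *y)
{
	if (!y->listed)
		return false;
	if (x->bridge || y->bridge)
		return false;
	x->bridge = y;
	y->bridge = x;
	return true;
}

/* Replays unregistration with a racing PPPIOCBRIDGECHAN landing between
 * the unbridge step and clearing ->chan. Returns true if the peer ends
 * up bridged to a channel that no longer has a ->chan. */
static bool model_rebridge_window(struct rchan *dying, struct rchan *peer)
{
	bool bridged;

	dying->bridge = NULL;                 /* ppp_unbridge_channels() done */
	bridged = model_bridge(peer, dying);  /* racing ioctl succeeds */
	dying->chan_present = false;          /* pch->chan = NULL */
	dying->listed = false;                /* removed from all_channels */
	return bridged && peer->bridge == dying && !dying->chan_present;
}
```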

>  	down_write(&pch->chan_sem);
>  	spin_lock_bh(&pch->downl);
>  	pch->chan = NULL;
-- 
pw-bot: cr

* [PATCH net v3] netfilter: nft_set_rbtree: fix use count leak on transaction abort
From: Marko Jevtic @ 2026-04-12 22:28 UTC
  To: pablo, fw, netfilter-devel
  Cc: phil, coreteam, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel

nft_rbtree_abort() does not handle elements moved to the expired list
by inline GC during __nft_rbtree_insert(). When inline GC encounters
expired elements during overlap detection, it calls
nft_rbtree_gc_elem_move() which deactivates element data (decrementing
chain/object use counts), removes the element from the rbtree, and
queues it for deferred freeing. On commit, these elements are freed
via nft_rbtree_gc_queue(). On abort, however, the expired list is
ignored entirely.

This leaves use counts permanently decremented after abort.

This restores transactional semantics by ensuring that inline GC side
effects are fully rolled back on abort:

- Introduce a separate tx_gc list for elements collected during insert
  (transaction-scoped), distinct from the existing expired list used
  by commit-time gc_scan (commit-scoped). This prevents abort from
  touching committed expired elements left over after a prior gc_queue
  run hit OOM.

- On commit: splice tx_gc into expired after publishing the new binary
  search blob, then drain via gc_queue as before.

- On abort: iterate tx_gc, re-activate element data (restoring use
  counts), and re-insert into the rbtree. Elements remain expired and
  will be properly collected on the next successful commit.

- Extract nft_rbtree_node_insert() helper from __nft_rbtree_insert()
  to share the tree insertion logic with the abort restore path.

- Add WARN_ON_ONCE in commit early-return path to catch any violation
  of the invariant that tx_gc is empty when no tree changes occurred.

- Reset start_rbe_cookie on abort so insertion state from a failed
  transaction does not persist.

Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Marko Jevtic <marko.jevtic@codereflect.io>
---
v3:
- add Fixes tag
- narrow the changelog to the abort-side use-count accounting bug

v2:
- introduce a transaction-scoped tx_gc list for insert-time GC
- restore tx_gc entries on abort and splice them to expired on commit
- export nft_setelem_data_activate() and factor out nft_rbtree_node_insert()

 include/net/netfilter/nf_tables.h |  3 +
 net/netfilter/nf_tables_api.c     |  4 +-
 net/netfilter/nft_set_rbtree.c    | 96 ++++++++++++++++++++++---------
 3 files changed, 74 insertions(+), 29 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index ec8a8ec9c..f8c912332 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1910,6 +1910,9 @@ struct nft_trans_gc *nft_trans_gc_catchall_async(struct nft_trans_gc *gc,
 						 unsigned int gc_seq);
 struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc);
 
+void nft_setelem_data_activate(const struct net *net,
+				 const struct nft_set *set,
+				 struct nft_elem_priv *elem_priv);
 void nft_setelem_data_deactivate(const struct net *net,
 				 const struct nft_set *set,
 				 struct nft_elem_priv *elem_priv);
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 8c42247a1..8e783db3f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5837,7 +5837,7 @@ static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
 	}
 }
 
-static void nft_setelem_data_activate(const struct net *net,
-				      const struct nft_set *set,
-				      struct nft_elem_priv *elem_priv);
+void nft_setelem_data_activate(const struct net *net,
+			       const struct nft_set *set,
+			       struct nft_elem_priv *elem_priv);
 
@@ -7656,7 +7656,7 @@ static int nft_setelem_active_next(const struct net *net,
 	return nft_set_elem_active(ext, genmask);
 }
 
-static void nft_setelem_data_activate(const struct net *net,
-				      const struct nft_set *set,
-				      struct nft_elem_priv *elem_priv)
+void nft_setelem_data_activate(const struct net *net,
+			       const struct nft_set *set,
+			       struct nft_elem_priv *elem_priv)
 {
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 737c339de..e1f76d6ef 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -36,6 +36,7 @@ struct nft_rbtree {
 	unsigned long		start_rbe_cookie;
 	unsigned long		last_gc;
 	struct list_head	expired;
+	struct list_head	tx_gc;
 	u64			last_tstamp;
 };
 
@@ -194,14 +195,14 @@ nft_rbtree_get(const struct net *net, const struct nft_set *set,
 
 static void nft_rbtree_gc_elem_move(struct net *net, struct nft_set *set,
 				    struct nft_rbtree *priv,
-				    struct nft_rbtree_elem *rbe)
+				    struct nft_rbtree_elem *rbe,
+				    struct list_head *target_list)
 {
 	lockdep_assert_held_write(&priv->lock);
 	nft_setelem_data_deactivate(net, set, &rbe->priv);
 	rb_erase(&rbe->node, &priv->root);
 
-	/* collected later on in commit callback */
-	list_add(&rbe->list, &priv->expired);
+	list_add(&rbe->list, target_list);
 }
 
 static const struct nft_rbtree_elem *
@@ -229,10 +230,10 @@ nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
 	rbe_prev = NULL;
 	if (prev) {
 		rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
-		nft_rbtree_gc_elem_move(net, set, priv, rbe_prev);
+		nft_rbtree_gc_elem_move(net, set, priv, rbe_prev, &priv->tx_gc);
 	}
 
-	nft_rbtree_gc_elem_move(net, set, priv, rbe);
+	nft_rbtree_gc_elem_move(net, set, priv, rbe, &priv->tx_gc);
 
 	return rbe_prev;
 }
@@ -335,6 +336,35 @@ static bool nft_rbtree_insert_same_interval(const struct net *net,
 	return false;
 }
 
+static void nft_rbtree_node_insert(const struct nft_set *set,
+				   struct nft_rbtree *priv,
+				   struct nft_rbtree_elem *new)
+{
+	struct nft_rbtree_elem *rbe;
+	struct rb_node *parent, **p;
+	int d;
+
+	lockdep_assert_held_write(&priv->lock);
+
+	parent = NULL;
+	p = &priv->root.rb_node;
+	while (*p) {
+		parent = *p;
+		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
+		d = nft_rbtree_cmp(set, rbe, new);
+		if (d < 0)
+			p = &parent->rb_left;
+		else if (d > 0)
+			p = &parent->rb_right;
+		else if (nft_rbtree_interval_end(rbe))
+			p = &parent->rb_left;
+		else
+			p = &parent->rb_right;
+	}
+	rb_link_node_rcu(&new->node, parent, p);
+	rb_insert_color(&new->node, &priv->root);
+}
+
 static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 			       struct nft_rbtree_elem *new,
 			       struct nft_elem_priv **elem_priv, u64 tstamp)
@@ -516,25 +546,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 		return -ENOTEMPTY;
 
 	/* Accepted element: pick insertion point depending on key value */
-	parent = NULL;
-	p = &priv->root.rb_node;
-	while (*p != NULL) {
-		parent = *p;
-		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
-		d = nft_rbtree_cmp(set, rbe, new);
-
-		if (d < 0)
-			p = &parent->rb_left;
-		else if (d > 0)
-			p = &parent->rb_right;
-		else if (nft_rbtree_interval_end(rbe))
-			p = &parent->rb_left;
-		else
-			p = &parent->rb_right;
-	}
-
-	rb_link_node_rcu(&new->node, parent, p);
-	rb_insert_color(&new->node, &priv->root);
+	nft_rbtree_node_insert(set, priv, new);
 	return 0;
 }
 
@@ -920,11 +932,11 @@ static void nft_rbtree_gc_scan(struct nft_set *set)
 		 */
 		write_lock_bh(&priv->lock);
 		if (rbe_end) {
-			nft_rbtree_gc_elem_move(net, set, priv, rbe_end);
+			nft_rbtree_gc_elem_move(net, set, priv, rbe_end, &priv->expired);
 			rbe_end = NULL;
 		}
 
-		nft_rbtree_gc_elem_move(net, set, priv, rbe);
+		nft_rbtree_gc_elem_move(net, set, priv, rbe, &priv->expired);
 		write_unlock_bh(&priv->lock);
 	}
 
@@ -974,6 +986,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 	rwlock_init(&priv->lock);
 	priv->root = RB_ROOT;
 	INIT_LIST_HEAD(&priv->expired);
+	INIT_LIST_HEAD(&priv->tx_gc);
 
 	priv->array = NULL;
 	priv->array_next = NULL;
@@ -1000,6 +1013,11 @@ static void nft_rbtree_destroy(const struct nft_ctx *ctx,
 		nf_tables_set_elem_destroy(ctx, set, &rbe->priv);
 	}
 
+	list_for_each_entry_safe(rbe, next, &priv->tx_gc, list) {
+		list_del(&rbe->list);
+		nf_tables_set_elem_destroy(ctx, set, &rbe->priv);
+	}
+
 	while ((node = priv->root.rb_node) != NULL) {
 		rb_erase(node, &priv->root);
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
@@ -1047,8 +1065,10 @@ static void nft_rbtree_commit(struct nft_set *set)
 	struct rb_node *node;
 
 	/* No changes, skip, eg. elements updates only. */
-	if (!priv->array_next)
+	if (!priv->array_next) {
+		WARN_ON_ONCE(!list_empty(&priv->tx_gc));
 		return;
+	}
 
 	/* GC can be performed if the binary search blob is going
 	 * to be rebuilt.  It has to be done in two phases: first
@@ -1116,13 +1136,35 @@ static void nft_rbtree_commit(struct nft_set *set)
 	/* New blob is public, queue collected entries for freeing.
 	 * call_rcu ensures elements stay around until readers are done.
 	 */
+	list_splice_tail_init(&priv->tx_gc, &priv->expired);
 	nft_rbtree_gc_queue(set);
 }
 
 static void nft_rbtree_abort(const struct nft_set *set)
 {
 	struct nft_rbtree *priv = nft_set_priv(set);
+	struct nft_rbtree_elem *rbe, *tmp;
 	struct nft_array *array_next;
+	struct net *net;
+
+	/* Restore elements that inline GC moved to the tx_gc list during
+	 * insert: their data was deactivated (use counts decremented) but
+	 * the transaction was aborted, so re-activate and re-insert to
+	 * undo GC side effects and restore transactional rollback semantics.
+	 */
+	if (!list_empty(&priv->tx_gc)) {
+		net = read_pnet(&set->net);
+
+		write_lock_bh(&priv->lock);
+		list_for_each_entry_safe(rbe, tmp, &priv->tx_gc, list) {
+			list_del_init(&rbe->list);
+			nft_setelem_data_activate(net, set, &rbe->priv);
+			nft_rbtree_node_insert(set, priv, rbe);
+		}
+		write_unlock_bh(&priv->lock);
+	}
+
+	priv->start_rbe_cookie = 0;
 
 	if (!priv->array_next)
 		return;
-- 
2.43.0

* Re: [PATCH net v2] net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers
From: patchwork-bot+netdevbpf @ 2026-04-12 22:30 UTC
  To: Daniel Golle
  Cc: nbd, lorenzo, andrew+netdev, davem, edumazet, kuba, pabeni,
	matthias.bgg, angelogioacchino.delregno, pablo, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <ec995ab8ce8be423267a1cc093147a74d2eb9d82.1775789829.git.daniel@makrotopia.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 03:57:52 +0100 you wrote:
> The PPE enforces output frame size limits via per-tag-layer VLAN_MTU
> registers that the driver never initializes. The hardware defaults do
> not account for PPPoE overhead, causing the PPE to punt encapsulated
> frames back to the CPU instead of forwarding them.
> 
> Initialize the registers at PPE start and on MTU changes using the
> maximum GMAC MTU. This is a conservative approximation -- the actual
> per-PPE requirement depends on egress path, but using the global
> maximum ensures the limits are never too small.
> 
> [...]
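
As an illustration only: the email does not describe the register
layout, so the helper names, the per-tag-layer formula and the 8-byte
PPPoE overhead below are assumptions. The sketch picks the largest
configured GMAC MTU, as the description says, and derives a limit for
each VLAN tag depth from it:

```c
#include <assert.h>
#include <stddef.h>

#define HYP_VLAN_HLEN  4  /* one 802.1Q tag */
#define HYP_PPPOE_HLEN 8  /* PPPoE header (6) + PPP protocol field (2) */

/* Hypothetical: the largest MTU configured on any GMAC. */
static unsigned int max_gmac_mtu(const unsigned int *mtus, size_t n)
{
	unsigned int max = 0;

	for (size_t i = 0; i < n; i++)
		if (mtus[i] > max)
			max = mtus[i];
	return max;
}

/* Hypothetical per-tag-layer limit: room for `tags` VLAN tags plus a
 * PPPoE header on top of the MTU, so encapsulated frames fit. */
static unsigned int tag_layer_limit(unsigned int mtu, unsigned int tags)
{
	return mtu + tags * HYP_VLAN_HLEN + HYP_PPPOE_HLEN;
}
```

Using the global maximum can only overestimate a given PPE's
requirement, which matches the "never too small" argument above.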

Here is the summary with links:
  - [net,v2] net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers
    https://git.kernel.org/netdev/net/c/2dddb34dd0d0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH net-next v2 1/2] pppox: remove sk_pppox() helper
From: patchwork-bot+netdevbpf @ 2026-04-12 22:30 UTC
  To: Qingfang Deng
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, kees, gnault,
	ericwouds, dawid.osuchowski, netdev, linux-kernel, linux-ppp
In-Reply-To: <20260410054954.114031-1-qingfang.deng@linux.dev>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 13:49:49 +0800 you wrote:
> The sk member can be directly accessed from struct pppox_sock without
> relying on type casting. Remove the sk_pppox() helper and update all
> call sites to use po->sk directly.
> 
> Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
> ---
> Changes in v2: none
>  Link to v1: https://lore.kernel.org/r/20260408015138.280687-1-qingfang.deng@linux.dev
> 
> [...]
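
For readers unfamiliar with the idiom, a userspace sketch of
container_of() versus a plain cast. The macro below is a simplified
stand-in (the kernel version adds type checking), and struct
pppox_sock_model is hypothetical, loosely mirroring struct pppox_sock,
which embeds its struct sock:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(), without type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sock_model { int state; };

/* Hypothetical layout with the sock embedded as the first member. */
struct pppox_sock_model {
	struct sock_model sk;
	int chan_index;
};

/* Cast-based lookup: only correct while sk is the first member. */
static struct pppox_sock_model *pppox_sk_cast(struct sock_model *sk)
{
	return (struct pppox_sock_model *)sk;
}

/* container_of()-based lookup: correct wherever sk sits in the struct. */
static struct pppox_sock_model *pppox_sk_model(struct sock_model *sk)
{
	return container_of(sk, struct pppox_sock_model, sk);
}
```

Going the other way needs no helper at all: &po->sk gives the embedded
sock directly.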

Here is the summary with links:
  - [net-next,v2,1/2] pppox: remove sk_pppox() helper
    https://git.kernel.org/netdev/net-next/c/105369d627b9
  - [net-next,v2,2/2] pppox: convert pppox_sk() to use container_of()
    https://git.kernel.org/netdev/net-next/c/6bc78039a77a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH net v3] netfilter: nft_set_rbtree: fix use count leak on transaction abort
From: Florian Westphal @ 2026-04-12 22:31 UTC
  To: Marko Jevtic
  Cc: pablo, netfilter-devel, phil, coreteam, davem, edumazet, kuba,
	pabeni, horms, netdev, linux-kernel
In-Reply-To: <20260412222801.34965-1-marko.jevtic@codereflect.io>

Marko Jevtic <marko.jevtic@codereflect.io> wrote:
> nft_rbtree_abort() does not handle elements moved to the expired list
> by inline GC during __nft_rbtree_insert(). When inline GC encounters
> expired elements during overlap detection, it calls
> nft_rbtree_gc_elem_move() which deactivates element data (decrementing
> chain/object use counts), removes the element from the rbtree, and
> queues it for deferred freeing. On commit, these elements are freed
> via nft_rbtree_gc_queue(). On abort, however, the expired list is
> ignored entirely.
> 
> This leaves use counts permanently decremented after abort.

I have not seen a reason/answer why this needs to be rolled back.
GC is an implementation detail, it's not part of the transaction.

It could also be done from a work queue, for example, not just from insert
or commit.

I see no reason to change the existing approach.

* Re: [PATCH net-next v2 0/2] IPA v5.2 support
From: patchwork-bot+netdevbpf @ 2026-04-12 22:40 UTC
  To: Luca Weiss
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
	conor+dt, elder, ~postmarketos/upstreaming, phone-devel,
	linux-arm-msm, netdev, devicetree, linux-kernel,
	krzysztof.kozlowski, horms
In-Reply-To: <20260410-ipa-v5-2-v2-0-778422a05060@fairphone.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 09:40:06 +0200 you wrote:
> Add support for IPA v5.2 which can be found in the Milos SoC.
> 
> Note: This series has been split up into two, one for net(-next), one
> for the qcom dts bits.
> 
> Changes in v2:
> - Split the series, drop applied IPA fixes, mark as net-next
> - Pick up tags
> - Link to v1: https://patch.msgid.link/20260403-milos-ipa-v1-0-01e9e4e03d3e@fairphone.com
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] dt-bindings: net: qcom,ipa: add Milos compatible
    https://git.kernel.org/netdev/net-next/c/d471d70cc964
  - [net-next,v2,2/2] net: ipa: add IPA v5.2 configuration data
    https://git.kernel.org/netdev/net-next/c/4bf38bac1b2e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH v3 0/2] net: fix skb_ext BUILD_BUG_ON failures with GCOV
From: patchwork-bot+netdevbpf @ 2026-04-12 22:40 UTC
  To: Konstantin Khorenko
  Cc: davem, edumazet, kuba, pabeni, horms, linux, arnd, oberpar,
	zaslonko, netdev, linux-kernel, ptikhomirov, vasileios.almpanis
In-Reply-To: <20260410162150.3105738-1-khorenko@virtuozzo.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 19:21:48 +0300 you wrote:
> This mini-series fixes build failures in net/core/skbuff.c when the
> kernel is built with CONFIG_GCOV_PROFILE_ALL=y.
> 
> This is part of a larger effort to add -fprofile-update=atomic to
> global CFLAGS_GCOV (posted earlier as a combined series):
>   https://lore.kernel.org/lkml/20260401142020.1434243-1-khorenko@virtuozzo.com/T/#t
> 
> [...]

Here is the summary with links:
  - [v3,1/2] net: fix skb_ext_total_length() BUILD_BUG_ON with CONFIG_GCOV_PROFILE_ALL
    https://git.kernel.org/netdev/net-next/c/c0b4382c86e3
  - [v3,2/2] net: add noinline __init __no_profile to skb_extensions_init() for GCOV compatibility
    https://git.kernel.org/netdev/net-next/c/29b1ee8788c5

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH bpf v4 0/2] bpf: fix short IPv4/IPv6 handling in test_run_skb
From: patchwork-bot+netdevbpf @ 2026-04-12 22:50 UTC
  To: sun jian
  Cc: ast, daniel, andrii, davem, edumazet, kuba, pabeni, shuah,
	martin.lau, eddyz87, song, yonghong.song, john.fastabend, kpsingh,
	sdf, haoluo, jolsa, horms, syzbot+619b9ef527f510a57cfc, bpf,
	netdev, linux-kselftest, linux-kernel
In-Reply-To: <20260408034623.180320-1-sun.jian.kdev@gmail.com>

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed,  8 Apr 2026 11:46:21 +0800 you wrote:
> bpf_prog_test_run_skb() may access IPv4/IPv6 network headers based on
> skb->protocol even when the provided test input only contains an
> Ethernet header.
> 
> Fix it by rejecting such short IPv4/IPv6 inputs before accessing the
> L3 headers, and add a selftest that exercises the reported
> bpf_skb_adjust_room() path on ETH_HLEN-sized IPv4/IPv6 EtherType
> inputs.
> 
> [...]
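
The kind of length check described above, sketched in userspace under
assumptions: l3_header_fits() is a hypothetical helper, not the code that
was applied; the EtherType is taken in host byte order; and the minimum
header sizes are the fixed IPv4 (20-byte) and IPv6 (40-byte) headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14      /* same value as the kernel's ETH_HLEN */
#define ETYPE_IPV4      0x0800
#define ETYPE_IPV6      0x86DD
#define IPV4_MIN_HDR    20
#define IPV6_HDR        40

/* Returns false when the input claims IPv4/IPv6 but is too short to
 * hold the minimum L3 header; other EtherTypes are left alone. */
static bool l3_header_fits(uint16_t ethertype, size_t len)
{
	if (len < ETH_HDR_LEN)
		return false;
	switch (ethertype) {
	case ETYPE_IPV4:
		return len >= ETH_HDR_LEN + IPV4_MIN_HDR;
	case ETYPE_IPV6:
		return len >= ETH_HDR_LEN + IPV6_HDR;
	default:
		return true;
	}
}
```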

Here is the summary with links:
  - [bpf,v4,1/2] bpf: reject short IPv4/IPv6 inputs in bpf_prog_test_run_skb
    https://git.kernel.org/bpf/bpf-next/c/12bec2bd4b76
  - [bpf,v4,2/2] selftests/bpf: cover short IPv4/IPv6 inputs with adjust_room
    https://git.kernel.org/bpf/bpf-next/c/f1cc94665df9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH net-next v2 1/2] keys, dns: drop unused upayload->data NUL terminator
From: Thorsten Blum @ 2026-04-12 23:04 UTC
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Tim Bird, netdev, linux-kernel
In-Reply-To: <20260412141004.22c6686c@kernel.org>

On Sun, Apr 12, 2026 at 02:10:04PM -0700, Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 00:57:02 +0200 Thorsten Blum wrote:
> > In dns_resolver_preparse(), do not NUL-terminate ->data and allocate one
> > byte less. The NUL terminator is never used and only ->datalen bytes are
> > accessed.
> 
> I can't see where this is used at all.
> Please write better commit messages, there's no way this 1 byte
> is worth the amount of time I wasted trying to review this :/

The point of patch 1/2 is not the removed NUL terminator itself, but to
prepare for patch 2/2, which adds __counted_by() and requires ->datalen
to match the number of elements in ->data.

Currently, that is not the case because ->data includes an extra NUL
despite never being used as a C string. Removing the unused terminator
makes the length match the allocation size and allows adding the
__counted_by() annotation.

I can fold this into the __counted_by() patch if you prefer.

* [PATCH net-next v3] r8169: Use napi_schedule_irqoff()
From: Matt Vollrath @ 2026-04-12 23:29 UTC
  To: netdev
  Cc: Matt Vollrath, edumazet, pabeni, hkallweit1, kuba, andrew+netdev,
	nic_swsd

napi_schedule() masks hard interrupts while doing its work, which is
redundant when called from an interrupt handler where hard interrupts
are already masked. Use napi_schedule_irqoff() instead to bypass this
redundant masking. This is an optimization.

This is a partial reversion of a previous fix:
Commit 2734a24e6e5d ("r8169: fix issue with forced threading in combination with shared interrupts")
was applied in 2020 to work around an issue with forced threading.
IRQ handlers were run without interrupts masked, and RX interrupts could
be missed in the race, causing delays.

This was fixed in 2021 by masking interrupts in forced thread context:
Commit 81e2073c175b ("genirq: Disable interrupts for force threaded handlers")

Compatibility with PREEMPT_RT also came in 2021:
Commit 8380c81d5c4f ("net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT")

Tested on a Lenovo RTL8168h/8111h.

Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 791277e750ba..4c0ad0de3410 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4873,7 +4873,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		phy_mac_interrupt(tp->phydev);

 	rtl_irq_disable(tp);
-	napi_schedule(&tp->napi);
+	napi_schedule_irqoff(&tp->napi);
 out:
 	rtl_ack_events(tp, status);

-- 
2.43.0

Changes:
v3:
* Describe the history of this schedule call
v2:
* CC the maintainers, make the CI board green

* Re: [PATCH net-next v2 1/2] keys, dns: drop unused upayload->data NUL terminator
From: Jakub Kicinski @ 2026-04-13  0:05 UTC
  To: Thorsten Blum
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Tim Bird, netdev, linux-kernel
In-Reply-To: <adwlFgKPdW4zDVb_@linux.dev>

On Mon, 13 Apr 2026 01:04:54 +0200 Thorsten Blum wrote:
> On Sun, Apr 12, 2026 at 02:10:04PM -0700, Jakub Kicinski wrote:
> > On Fri, 10 Apr 2026 00:57:02 +0200 Thorsten Blum wrote:  
> > > In dns_resolver_preparse(), do not NUL-terminate ->data and allocate one
> > > byte less. The NUL terminator is never used and only ->datalen bytes are
> > > accessed.  
> > 
> > I can't see where this is used at all.
> > Please write better commit messages, there's no way this 1 byte
> > is worth the amount of time I wasted trying to review this :/  
> 
> The point of patch 1/2 is not the removed NUL terminator itself, but to
> prepare for patch 2/2, which adds __counted_by() and requires ->datalen
> to match the number of elements in ->data.
> 
> Currently, that is not the case because ->data includes an extra NUL
> despite never being used as a C string. Removing the unused terminator
> makes the length match the allocation size and allows adding the
> __counted_by() annotation.
> 
> I can fold this into the __counted_by() patch if you prefer.

I understand that part, but I don't get where the data, from which
the terminating character is removed, is used. The only other access
I saw was freeing it; the rest of the callback seems to be looking
at the error, not the data.

* Re: [PATCH v2] nfc: hci: fix OOB heap read on short HCP frames
From: Ashutosh Desai @ 2026-04-13  0:06 UTC
  To: kuba; +Cc: netdev, edumazet, davem, pabeni, horms, linux-kernel
In-Reply-To: <20260412134218.34cbe88d@kernel.org>

On Sun, 12 Apr 2026 13:42:18 -0700 Jakub Kicinski wrote:
> As Eric mentioned elsewhere - he did not suggest any of this,
> merely reviewed your submission.

Agreed, that tag was incorrect on my part. I will remove it in the
next version.

> How did a broken packet get enqueued in the first place?

You are right to point that out. nfc_hci_recv_from_llc() already
gates the queue with pskb_may_pull(), so a short skb cannot reach
nfc_hci_msg_rx_work() to begin with. The same holds for the nci
path. Those two checks are redundant and will be dropped in v3.

* Re: [PATCH net-next v3] r8169: Use napi_schedule_irqoff()
From: Jakub Kicinski @ 2026-04-13  0:06 UTC
  To: Matt Vollrath
  Cc: netdev, edumazet, pabeni, hkallweit1, andrew+netdev, nic_swsd
In-Reply-To: <20260412232914.31463-1-tactii@gmail.com>

On Sun, 12 Apr 2026 19:29:14 -0400 Matt Vollrath wrote:
> napi_schedule() masks hard interrupts while doing its work, which is
> redundant when called from an interrupt handler where hard interrupts
> are already masked. Use napi_schedule_irqoff() instead to bypass this
> redundant masking. This is an optimization.

Linus tagged the final v7.0, so net-next is closed. For more information, see:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html

* Re: [PATCH net-next v9 00/10] net: phy_port: SFP modules representation and phy_port listing
From: Russell King (Oracle) @ 2026-04-13  0:29 UTC
  To: Jakub Kicinski
  Cc: Maxime Chevallier, Paolo Abeni, davem, Andrew Lunn, Eric Dumazet,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260412142732.5dec7ebe@kernel.org>

On Sun, Apr 12, 2026 at 02:27:32PM -0700, Jakub Kicinski wrote:
> On Thu, 9 Apr 2026 10:40:13 +0200 Maxime Chevallier wrote:
> > Let's see if the PHY crew have things to say on the overall approach :)
> 
> Not a word from them. I suspect we need to call a meeting or just apply
> this after the merge window..

Sorry, no opportunity to review this has presented itself yet, and
none will for a few more days due to appointments.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH net-next v2 1/2] keys, dns: drop unused upayload->data NUL terminator
From: Thorsten Blum @ 2026-04-13  0:31 UTC
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Tim Bird, netdev, linux-kernel
In-Reply-To: <20260412170508.1f33a371@kernel.org>

On Sun, Apr 12, 2026 at 05:05:08PM -0700, Jakub Kicinski wrote:
> On Mon, 13 Apr 2026 01:04:54 +0200 Thorsten Blum wrote:
> > On Sun, Apr 12, 2026 at 02:10:04PM -0700, Jakub Kicinski wrote:
> > > On Fri, 10 Apr 2026 00:57:02 +0200 Thorsten Blum wrote:  
> > > > In dns_resolver_preparse(), do not NUL-terminate ->data and allocate one
> > > > byte less. The NUL terminator is never used and only ->datalen bytes are
> > > > accessed.  
> > > 
> > > I can't see where this is used at all.
> > > Please write better commit messages, there's no way this 1 byte
> > > is worth the amount of time I wasted trying to review this :/  
> > 
> > The point of patch 1/2 is not the removed NUL terminator itself, but to
> > prepare for patch 2/2, which adds __counted_by() and requires ->datalen
> > to match the number of elements in ->data.
> > 
> > Currently, that is not the case because ->data includes an extra NUL
> > despite never being used as a C string. Removing the unused terminator
> > makes the length match the allocation size and allows adding the
> > __counted_by() annotation.
> > 
> > I can fold this into the __counted_by() patch if you prefer.
> 
> I understand that part, but I don't get where the data, from which
> the terminating character is removed, is used. The only other access
> I saw was freeing it; the rest of the callback seems to be looking
> at the error, not the data.

->data and ->datalen are used in multiple places.

For example, in dns_query() in net/dns_resolver/dns_query.c:

	upayload = user_key_payload_locked(rkey);
	len = upayload->datalen;

	if (_result) {
		ret = -ENOMEM;
		*_result = kmemdup_nul(upayload->data, len, GFP_KERNEL);
		if (!*_result)
			goto put;
	}

In cifs_set_cifscreds() in fs/smb/client/connect.c:

	/* find first : in payload */
	payload = upayload->data;
	delim = strnchr(payload, upayload->datalen, ':');
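
Both call sites above bound their accesses by ->datalen: kmemdup_nul()
terminates the copy it returns, and strnchr() examines at most count
bytes, so neither needs a NUL inside ->data. A userspace sketch of those
semantics (kmemdup_nul_model() and strnchr_model() are simplified
stand-ins, without gfp flags):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes and terminate the copy; the source need not be
 * NUL-terminated. */
static char *kmemdup_nul_model(const char *src, size_t len)
{
	char *p = malloc(len + 1);

	if (!p)
		return NULL;
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Search at most count bytes for c, stopping early at a NUL. */
static const char *strnchr_model(const char *s, size_t count, int c)
{
	while (count--) {
		if (*s == (char)c)
			return s;
		if (*s++ == '\0')
			break;
	}
	return NULL;
}
```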

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Sam Edwards @ 2026-04-13  1:42 UTC
  To: Russell King (Oracle)
  Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, netdev, Paolo Abeni
In-Reply-To: <aduq7Lvkfrz971Rb@shell.armlinux.org.uk>

On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> As the dwmac 5.0 core receive path seems to lock up after the first
> RBU, I never see more than one of those at a time.
>
> Right now, I consider this pretty much unsolvable - I've spent quite
> some time looking at it and trying various approaches, nothing seems
> to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> paths does seem to improve the situation a little with the 480Mbps
> case, because I think it means that we're reading the descriptors in
> a more timely manner after the hardware has updated them.

Hey Russell,

I'd like to repro this but I currently can't boot net-next. My issue
is the same as [1], and the patch to fix it [2] isn't yet committed
anywhere apparently.

This prevents my Jetson Xavier NX from starting at all (and after
enough attempts, corrupts eMMC); I'm surprised you're not suffering
the same effects. But because this bug lives in the IOMMU subsystem
(and it has somewhat inconsistent effects), perhaps this is just a
different way it manifests? Could you confirm whether your dwmac hang
happens with IOMMU disabled, and/or with [1] reverted or [2] applied?

I'm using a defconfig build and a fairly minimal cmdline (just
console=, root=, and rootwait).

Cheers,
Sam

[1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
[2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/


* Re: [net-next v38] mctp pcc: Implement MCTP over PCC Transport
From: Jeremy Kerr @ 2026-04-13  2:15 UTC
  To: Jakub Kicinski, admiyo
  Cc: matt, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, sudeep.holla, Jonathan.Cameron, lihuisong
In-Reply-To: <20260410032441.1844450-1-kuba@kernel.org>

Hi Adam,

> > +       memcpy_toio(outbox->chan->shmem,  skb->data, skb->len);
> 
> Is it possible to read out of bounds here if the skb is fragmented?
> 
> The skb->data pointer only points to the linear portion of the packet, while
> skb->len represents the total packet length including page fragments.
> skb_cow_head() does not linearize the packet, so a call to skb_linearize()
> might be needed before copying.

I assume that we should only be seeing linear skbs here, as the driver
does not advertise NETIF_F_FRAGLIST or NETIF_F_SG.

(that said, this could support fragmented skbs quite easily, but that
would be more suitable for a follow-up change)
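
To make the linear/non-linear distinction concrete, here is a userspace model; it is not the kernel's struct sk_buff, and `struct buf_sketch`, `buf_is_linear()` and `copy_to_shmem()` are hypothetical names. `len` counts all bytes while `headlen` counts only those reachable from `data`, so a flat copy of `len` bytes from `data` is only safe when the two are equal, which is what skb_headlen(skb) == skb->len expresses in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: bytes [0, headlen) live at data, the remaining
 * len - headlen bytes live in a separate fragment, like paged skb data. */
struct buf_sketch {
	const unsigned char *data;	/* linear part */
	size_t headlen;			/* bytes in the linear part */
	const unsigned char *frag;	/* non-linear part, NULL if none */
	size_t len;			/* total length */
};

static bool buf_is_linear(const struct buf_sketch *b)
{
	return b->len == b->headlen;
}

/* Copy the whole packet into a flat destination. Returns the number of
 * bytes copied, or 0 if dst is too small. The fragment is copied
 * separately instead of being read through data. */
static size_t copy_to_shmem(unsigned char *dst, size_t dst_size,
			    const struct buf_sketch *b)
{
	if (b->len > dst_size)
		return 0;
	memcpy(dst, b->data, b->headlen);
	if (!buf_is_linear(b))
		memcpy(dst + b->headlen, b->frag, b->len - b->headlen);
	return b->len;
}
```

With `memcpy(dst, b->data, b->len)` instead, the non-linear case would read `len - headlen` bytes past the linear area, which is the over-read the review describes.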

Cheers,


Jeremy


* [PATCH net,v2 1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
From: KhaiWenTan @ 2026-04-13  2:03 UTC
  To: andrew+netdev, davem, edumazet, kuba, pabeni, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, ovidiu.panait.rb,
	vladimir.oltean
  Cc: netdev, linux-stm32, linux-arm-kernel, linux-kernel,
	yoong.siang.song, hong.aun.looi, khai.wen.tan, KhaiWenTan

get_interfaces() updates both plat->phy_interfaces and
mdio_bus_data->default_an_inband based on a SERDES register read.
Because get_interfaces() is called after default_an_inband has already
been copied into phylink_config, dwmac-intel regressed: phylink_config
ends up with an incorrect default_an_inband value.

Fix this by calling priv->plat->get_interfaces() before copying
mdio_bus_data->default_an_inband into config->default_an_inband, so
that config->default_an_inband holds the updated value.
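
The ordering problem can be reduced to a small userspace model. All names here (`struct plat_sketch`, `get_interfaces_sketch()`, `setup_old()`, `setup_fixed()`) are hypothetical and only mimic the shape of stmmac_phylink_setup(): a value copied before the callback that updates it is stale, while copying it afterwards picks up the SERDES-derived setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mdio_bus_data, phylink_config and plat. */
struct mdio_sketch { bool default_an_inband; };
struct config_sketch { bool default_an_inband; };
struct plat_sketch { struct mdio_sketch *mdio_bus_data; };

/* Models a get_interfaces() callback that derives default_an_inband
 * from a SERDES register read (here simply forced to true). */
static void get_interfaces_sketch(struct plat_sketch *plat)
{
	if (plat->mdio_bus_data)
		plat->mdio_bus_data->default_an_inband = true;
}

/* Old order: copy first, then run the callback, so the copy is stale. */
static void setup_old(struct plat_sketch *plat, struct config_sketch *cfg)
{
	if (plat->mdio_bus_data)
		cfg->default_an_inband = plat->mdio_bus_data->default_an_inband;
	get_interfaces_sketch(plat);
}

/* Fixed order: run the callback first, then copy the updated value. */
static void setup_fixed(struct plat_sketch *plat, struct config_sketch *cfg)
{
	get_interfaces_sketch(plat);
	if (plat->mdio_bus_data)
		cfg->default_an_inband = plat->mdio_bus_data->default_an_inband;
}
```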

Fixes: d3836052fe09 ("net: stmmac: intel: convert speed_mode_2500() to get_interfaces()")
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
---
v2:
  - update commit message for better understanding (Russell King)
  - corrected the blamed commit (Russell King)
v1: https://patchwork.kernel.org/project/netdevbpf/patch/20260410020735.327590-1-khai.wen.tan@linux.intel.com/
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 13d3cac056be..c92054648a7e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1345,10 +1345,6 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
 	priv->tx_lpi_clk_stop = priv->plat->flags &
 				STMMAC_FLAG_EN_TX_LPI_CLOCKGATING;
 
-	mdio_bus_data = priv->plat->mdio_bus_data;
-	if (mdio_bus_data)
-		config->default_an_inband = mdio_bus_data->default_an_inband;
-
 	/* Get the PHY interface modes (at the PHY end of the link) that
 	 * are supported by the platform.
 	 */
@@ -1356,6 +1352,10 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
 		priv->plat->get_interfaces(priv, priv->plat->bsp_priv,
 					   config->supported_interfaces);
 
+	mdio_bus_data = priv->plat->mdio_bus_data;
+	if (mdio_bus_data)
+		config->default_an_inband = mdio_bus_data->default_an_inband;
+
 	/* Set the platform/firmware specified interface mode if the
 	 * supported interfaces have not already been provided using
 	 * phy_interface as a last resort.
-- 
2.43.0



* [PATCH v3] nfc: hci: fix out-of-bounds read in HCP header parsing
From: Ashutosh Desai @ 2026-04-13  2:43 UTC
  To: netdev; +Cc: kuba, edumazet, davem, pabeni, horms, linux-kernel,
	Ashutosh Desai
In-Reply-To: <20260413000627.3273477-1-ashutoshdesai993@gmail.com>

nfc_hci_recv_from_llc() and nci_hci_data_received_cb() cast skb->data
to struct hcp_packet and read the message header byte without checking
that enough data is present in the linear sk_buff area. A malicious NFC
peer can send a 1-byte HCP frame that passes through the SHDLC layer
and reaches these functions, causing an out-of-bounds heap read.

Fix this by adding pskb_may_pull() before each cast to ensure the full
2-byte HCP header is pulled into the linear area before it is accessed.
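
The check can be illustrated with a small userspace parser. It is only a sketch: `HCP_HEADER_LEN_SKETCH`, the bit layout assumed in `hcp_type_sketch()` and `parse_hcp_sketch()` are illustrative stand-ins for NFC_HCI_HCP_HEADER_LEN, HCP_MSG_GET_TYPE() and the pskb_may_pull() test. In a flat buffer a plain length check plays the role of pskb_may_pull(); the point is that the length is verified before the message header byte is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCP_HEADER_LEN_SKETCH 2	/* packet header byte + message header byte */

/* Assumed layout: the message type sits in the top two bits. */
static int hcp_type_sketch(uint8_t msg_header)
{
	return (msg_header & 0xc0) >> 6;
}

/* Returns the message type, or -1 if the frame is too short to hold
 * both header bytes (where the patch frees the skb and returns). */
static int parse_hcp_sketch(const uint8_t *frame, size_t len)
{
	if (len < HCP_HEADER_LEN_SKETCH)
		return -1;	/* reading frame[1] would run past a 1-byte frame */
	return hcp_type_sketch(frame[1]);
}
```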

Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
---
v3: drop redundant pskb_may_pull checks from msg_rx_work functions,
    remove incorrect Suggested-by tag
v2: switch skb->len check to pskb_may_pull

 net/nfc/hci/core.c | 5 +++++
 net/nfc/nci/hci.c  | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
index 0d33c81a1..cd9cf6c94 100644
--- a/net/nfc/hci/core.c
+++ b/net/nfc/hci/core.c
@@ -904,6 +904,11 @@ static void nfc_hci_recv_from_llc(struct nfc_hci_dev *hdev, struct sk_buff *skb)
 	 * unblock waiting cmd context. Otherwise, enqueue to dispatch
 	 * in separate context where handler can also execute command.
 	 */
+	if (!pskb_may_pull(hcp_skb, NFC_HCI_HCP_HEADER_LEN)) {
+		kfree_skb(hcp_skb);
+		return;
+	}
+
 	packet = (struct hcp_packet *)hcp_skb->data;
 	type = HCP_MSG_GET_TYPE(packet->message.header);
 	if (type == NFC_HCI_HCP_RESPONSE) {
diff --git a/net/nfc/nci/hci.c b/net/nfc/nci/hci.c
index 40ae8e5a7..6e633da25 100644
--- a/net/nfc/nci/hci.c
+++ b/net/nfc/nci/hci.c
@@ -482,6 +482,11 @@ void nci_hci_data_received_cb(void *context,
 	 * unblock waiting cmd context. Otherwise, enqueue to dispatch
 	 * in separate context where handler can also execute command.
 	 */
+	if (!pskb_may_pull(hcp_skb, NCI_HCI_HCP_HEADER_LEN)) {
+		kfree_skb(hcp_skb);
+		return;
+	}
+
 	packet = (struct nci_hcp_packet *)hcp_skb->data;
 	type = NCI_HCP_MSG_GET_TYPE(packet->message.header);
 	if (type == NCI_HCI_HCP_RESPONSE) {
-- 
2.34.1



* RE: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
From: Tung Quang Nguyen @ 2026-04-13  3:06 UTC
  To: Ren Wei
  Cc: jmaloy@redhat.com, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	yifanwucs@gmail.com, tomapufckgml@gmail.com, yuantan098@gmail.com,
	bird@lzu.edu.cn, enjou1224z@gmail.com, caoruide123@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <1316452e465e9a96fce44ec15130a14f3872149f.1775809727.git.caoruide123@gmail.com>

>Subject: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
>
>From: Ruide Cao <caoruide123@gmail.com>
>
>tipc_get_gap_ack_blks() reads len, ugack_cnt and bgack_cnt directly from
>msg_data(hdr) before verifying that a STATE message actually contains the
>fixed Gap ACK block header in its logical data area.
>
>A peer that negotiates TIPC_GAP_ACK_BLOCK can send a short STATE message
>with a declared TIPC payload shorter than struct tipc_gap_ack_blks and still
>append a few physical bytes after the header. The helper then trusts those
>bytes as Gap ACK metadata, and the forged bgack_cnt/len values can drive the
>broadcast receive path into kmemdup() beyond the skb boundary.
Can you explain how that peer can alter the STATE message? If it can, what concrete values are used, and in which fields of the STATE message?
>
>Fix this by rejecting Gap ACK parsing unless the logical STATE payload is large
>enough to cover the fixed header, and by rejecting declared Gap ACK lengths
>that are smaller than the fixed header or larger than the logical payload.
>Return 0 for invalid lengths so malformed Gap ACK data is not treated as a
>valid payload offset, and drop unicast STATE messages that advertise Gap ACK
>support but still yield an invalid Gap ACK length. This keeps malformed Gap
>ACK data ignored without misaligning monitor payload parsing.
>
>Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
>Cc: stable@kernel.org
>Reported-by: Yifan Wu <yifanwucs@gmail.com>
>Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>Co-developed-by: Yuan Tan <yuantan098@gmail.com>
>Signed-off-by: Yuan Tan <yuantan098@gmail.com>
>Suggested-by: Xin Liu <bird@lzu.edu.cn>
>Tested-by: Ren Wei <enjou1224z@gmail.com>
>Signed-off-by: Ruide Cao <caoruide123@gmail.com>
>Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>---
> net/tipc/link.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
>diff --git a/net/tipc/link.c b/net/tipc/link.c
>index 49dfc098d89b..44678d98939a 100644
>--- a/net/tipc/link.c
>+++ b/net/tipc/link.c
>@@ -1415,12 +1415,22 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l,
> 			  struct tipc_msg *hdr, bool uc)
> {
> 	struct tipc_gap_ack_blks *p;
>-	u16 sz = 0;
>+	u16 sz = 0, dlen = msg_data_sz(hdr);
>
> 	/* Does peer support the Gap ACK blocks feature? */
> 	if (l->peer_caps & TIPC_GAP_ACK_BLOCK) {
>+		u16 min_sz = struct_size(p, gacks, 0);
>+
>+		if (dlen < min_sz)
>+			goto ignore;
This check is redundant: with the existing sanity check, invalid Gap ACK blocks will not be used to release acked messages in the transmit queue.
>+
> 		p = (struct tipc_gap_ack_blks *)msg_data(hdr);
> 		sz = ntohs(p->len);
>+		if (sz < min_sz || sz > dlen) {
>+			sz = 0;
>+			goto ignore;
>+		}
This check is redundant. The existing sanity check is good enough.
>+
> 		/* Sanity check */
> 		if (sz == struct_size(p, gacks, size_add(p->ugack_cnt, p->bgack_cnt))) {
> 			/* Good, check if the desired type exists */
>@@ -1434,6 +1444,8 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l,
> 			}
> 		}
> 	}
>+
>+ignore:
> 	/* Other cases: ignore! */
> 	p = NULL;
>
>@@ -2270,7 +2282,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
> 	case STATE_MSG:
> 		/* Validate Gap ACK blocks, drop if invalid */
> 		glen = tipc_get_gap_ack_blks(&ga, l, hdr, true);
>-		if (glen > dlen)
>+		if (glen > dlen || ((l->peer_caps & TIPC_GAP_ACK_BLOCK) && !glen))
This check is redundant. The existing sanity check is good enough.
> 			break;
>
> 		l->rcv_nxt_state = msg_seqno(hdr) + 1;
>--
>2.34.1
>
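
For reference while weighing this, the two layers of checking discussed above can be modelled in userspace. This is a sketch only: the 4-byte header and 4-byte entry sizes, `gack_counts_consistent()` and `gack_len_in_bounds()` are hypothetical simplifications of struct tipc_gap_ack_blks and tipc_get_gap_ack_blks(). The first function models the existing consistency test between `len` and the block counts; the second models the bounds checks the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: a 4-byte fixed header (len, ugack_cnt, bgack_cnt) and
 * 4-byte Gap ACK entries. */
#define GACK_HDR_SZ   4u
#define GACK_ENTRY_SZ 4u

/* Existing-style check: the declared length must equal the size implied
 * by the block counts read from the same header. */
static bool gack_counts_consistent(uint16_t len, uint8_t ugack, uint8_t bgack)
{
	return len == GACK_HDR_SZ + (size_t)(ugack + bgack) * GACK_ENTRY_SZ;
}

/* Proposed-style check: the logical payload (dlen) must hold the fixed
 * header before it is read, and the declared length must fit inside it. */
static bool gack_len_in_bounds(uint16_t dlen, uint16_t len)
{
	return dlen >= GACK_HDR_SZ && len >= GACK_HDR_SZ && len <= dlen;
}
```

The test values show a header whose counts agree with its declared length while that length exceeds a 4-byte logical payload; whether such a case can actually reach kmemdup() in the broadcast path is the open question in this thread.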



* Re: [PATCH bpf-next v2 2/3] bpf: Use kmalloc_nolock() universally in local storage
From: Slava Imameev @ 2026-04-13  3:48 UTC
  To: alexei.starovoitov
  Cc: ameryhung, andrii, ast, bot+bpf-ci, bpf, clm, daniel, eddyz87,
	ihor.solodrai, kernel-team, martin.lau, memxor, netdev,
	yonghong.song, linux-open-source
In-Reply-To: <CAADnVQKeFF--bgnZZSU12UY0muuwYA=7EdzLyOi837oZs+bXTA@mail.gmail.com>

On Fri, 10 Apr 2026 21:39:00 -0700 Alexei Starovoitov wrote:
> >
> >
> > This allows value sizes up to ~65KB. Before this patch, socket and
> > inode storage used bpf_map_kzalloc() (backed by regular kmalloc)
> > which could handle those large sizes. After this patch, any
> > elem_size above KMALLOC_MAX_CACHE_SIZE will silently fail: the map
> > creation succeeds via bpf_local_storage_map_alloc_check() but every
> > element allocation returns NULL.
> >
> > Should BPF_LOCAL_STORAGE_MAX_VALUE_SIZE be updated to use
> > KMALLOC_MAX_CACHE_SIZE instead of KMALLOC_MAX_SIZE now that all
> > storage types go through kmalloc_nolock()?
> >
> > Slava Imameev raised the same concern for task storage in
> > https://lore.kernel.org/bpf/20260410014341.47043-1-slava.imameev@crowdstrike.com/
> 
> Right. Let's update it, but I don't think it's a regression.
> On a loaded system kmalloc_large() rarely succeeds for order 2+.
> That's why kmalloc_nolock() doesn't attempt to bridge that gap.
> One or two contiguous physical pages is the best one can expect.
> In early bpf days we picked KMALLOC_MAX_SIZE assuming that
> it's a realistic max for kmalloc().
> It turned out to be wishful thinking.
> kmalloc_large concept should really be removed.
> It deceives users into thinking that it's usable.

In defense of supporting 8KB-64KB allocations for local storage:
one could consider BPF_MAP_TYPE_HASH with BPF_F_NO_PREALLOC as a
replacement for the missing 8KB-64KB local storage allocation
support. However, those map entry allocations can fail with
similar probability, since they depend on the same underlying
allocator.
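
A sketch of the creation-time limit being discussed, assuming 4 KiB pages and SLUB, where KMALLOC_MAX_CACHE_SIZE is two pages (8 KiB). The constants and `value_size_ok()` are hypothetical userspace stand-ins, not the kernel's BPF_LOCAL_STORAGE_MAX_VALUE_SIZE definition, and the per-element overhead is an arbitrary placeholder. Deriving the cap from the largest size the allocator can actually serve makes an oversized value fail at map creation instead of at every element allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, the largest kmalloc
 * cache is two pages, and ELEM_OVERHEAD_SKETCH stands in for the
 * per-element header that precedes the value. */
#define PAGE_SIZE_SKETCH       4096u
#define MAX_CACHE_SIZE_SKETCH  (2u * PAGE_SIZE_SKETCH)
#define ELEM_OVERHEAD_SKETCH   96u
#define MAX_VALUE_SIZE_SKETCH  (MAX_CACHE_SIZE_SKETCH - ELEM_OVERHEAD_SKETCH)

static_assert(MAX_VALUE_SIZE_SKETCH < MAX_CACHE_SIZE_SKETCH,
	      "element overhead must leave room below the cache limit");

/* Creation-time check: reject value sizes whose element could never be
 * served from a kmalloc cache, rather than failing every later insert. */
static bool value_size_ok(size_t value_size)
{
	return value_size > 0 && value_size <= MAX_VALUE_SIZE_SKETCH;
}
```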



* [PATCH net-next] pppoe: optimize hash with word access
From: Qingfang Deng @ 2026-04-13  3:52 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Qingfang Deng, Guillaume Nault, Kees Cook,
	Eric Woudstra, netdev, linux-kernel

Currently, hash_item() processes the 6-byte Ethernet address and the
2-byte session ID byte-wise to compute a hash.

Optimize this by using 16-bit word operations: XOR three 16-bit words
from the Ethernet address and the 16-bit session ID, then fold the
result. This reduces the total number of loads and XORs. The Ethernet
addresses in a skb and struct pppoe_addr are both 2-byte aligned, so the
u16 pointer cast is safe.
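
The word-wise scheme produces the same hash as the byte-wise one: XOR acts on each bit position independently, so folding the 16-bit XOR of the four words into 8 bits yields the same byte as XORing all eight bytes individually, on either endianness. The userspace sketch below checks this, assuming PPPOE_HASH_BITS is 4 and that the result is masked to that many bits; the function names are hypothetical. It uses memcpy() rather than a u16 pointer cast, since arbitrary userspace buffers do not carry the 2-byte alignment guarantee described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6
#endif
#define HASH_BITS_SKETCH 4	/* assumed value of PPPOE_HASH_BITS */

static_assert(8 % HASH_BITS_SKETCH == 0, "8 must be a multiple of the hash bits");

/* Fold an 8-bit value down to HASH_BITS_SKETCH bits, as hash_item() does. */
static uint8_t fold(uint8_t hash)
{
	for (unsigned int i = 8; (i >>= 1) >= HASH_BITS_SKETCH;)
		hash ^= hash >> i;
	return hash & ((1u << HASH_BITS_SKETCH) - 1);
}

/* Old scheme: XOR all six address bytes and both session-ID bytes. */
static uint8_t hash_bytewise(uint16_t sid, const uint8_t addr[ETH_ALEN])
{
	uint8_t hash = 0;

	for (int i = 0; i < ETH_ALEN; i++)
		hash ^= addr[i];
	hash ^= sid & 0xff;
	hash ^= sid >> 8;
	return fold(hash);
}

/* New scheme: XOR three 16-bit words and the session ID, then fold the
 * high byte into the low byte before the final fold. */
static uint8_t hash_wordwise(uint16_t sid, const uint8_t addr[ETH_ALEN])
{
	uint16_t w[3], h16;

	memcpy(w, addr, sizeof(w));	/* alignment-safe load of three words */
	h16 = w[0] ^ w[1] ^ w[2] ^ sid;
	return fold((uint8_t)((h16 >> 8) ^ h16));
}
```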

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
---
 drivers/net/ppp/pppoe.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index d546a7af0d54..e2e70628958b 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -136,15 +136,15 @@ static inline int cmp_addr(struct pppoe_addr *a, __be16 sid, char *addr)
 #error 8 must be a multiple of PPPOE_HASH_BITS
 #endif
 
-static int hash_item(__be16 sid, unsigned char *addr)
+static u8 hash_item(__be16 sid, const u8 addr[ETH_ALEN])
 {
-	unsigned char hash = 0;
+	const u16 *addr16 = (const u16 *)addr;
 	unsigned int i;
+	u16 hash16;
+	u8 hash;
 
-	for (i = 0; i < ETH_ALEN; i++)
-		hash ^= addr[i];
-	for (i = 0; i < sizeof(sid_t) * 8; i += 8)
-		hash ^= (__force __u32)sid >> i;
+	hash16 = addr16[0] ^ addr16[1] ^ addr16[2] ^ (__force u16)sid;
+	hash = (hash16 >> 8) ^ hash16;
 	for (i = 8; (i >>= 1) >= PPPOE_HASH_BITS;)
 		hash ^= hash >> i;
 
-- 
2.43.0



* [PATCH v3 net] openvswitch: limit vport upcall portids to the number of CPUs
From: Weiming Shi @ 2026-04-13  3:55 UTC
  To: Aaron Conole, Eelco Chaudron, Ilya Maximets, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Thomas Graf, Pravin B Shelar, Alex Wang, netdev,
	dev, linux-kernel, Xiang Mei, Weiming Shi

The vport netlink reply helpers allocate a fixed-size skb with
nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
array via ovs_vport_get_upcall_portids().  Since
ovs_vport_set_upcall_portids() accepts any non-zero multiple of
sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
array large enough to overflow the reply buffer, causing nla_put() to
fail with -EMSGSIZE and hitting BUG_ON(err < 0).  On systems with
unprivileged user namespaces enabled (e.g., Ubuntu default), this is
reachable via unshare -Urn since OVS vport mutation operations use
GENL_UNS_ADMIN_PERM.

  kernel BUG at net/openvswitch/datapath.c:2414!
  Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
  CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1
  RIP: 0010:ovs_vport_cmd_set+0x34c/0x400
  Call Trace:
   <TASK>
   genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116)
   genl_rcv_msg (net/netlink/genetlink.c:1194)
   netlink_rcv_skb (net/netlink/af_netlink.c:2550)
   genl_rcv (net/netlink/genetlink.c:1219)
   netlink_unicast (net/netlink/af_netlink.c:1344)
   netlink_sendmsg (net/netlink/af_netlink.c:1894)
   __sys_sendto (net/socket.c:2206)
   __x64_sys_sendto (net/socket.c:2209)
   do_syscall_64 (arch/x86/entry/syscall_64.c:63)
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
   </TASK>
  Kernel panic - not syncing: Fatal exception

Reject attempts to set more PIDs than num_possible_cpus() in
ovs_vport_set_upcall_portids(), and pre-compute the worst-case reply
size in ovs_vport_cmd_msg_size() based on that bound, similar to the
existing ovs_dp_cmd_msg_size().
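
The reasoning behind the fix can be sketched in userspace: once the array length is capped at the number of possible CPUs, the attribute's worst-case footprint is known in advance and the reply buffer can be sized for it. The helpers below are hypothetical restatements, not the kernel's nla_total_size() or ovs_vport_set_upcall_portids(); the 4-byte attribute header and 4-byte alignment follow the netlink attribute format, and `ncpus` stands in for num_possible_cpus().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Netlink attributes: a 4-byte header, total padded to 4 bytes. */
static size_t nla_align_sketch(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

static size_t nla_total_size_sketch(size_t payload)
{
	return nla_align_sketch(4 + payload);
}

/* Mirrors the proposed validation: non-empty, a whole number of u32s,
 * and no more entries than possible CPUs. Returns 0 or -1. */
static int validate_portids_sketch(size_t attr_len, unsigned int ncpus)
{
	if (attr_len == 0 || attr_len % sizeof(uint32_t))
		return -1;
	if (attr_len / sizeof(uint32_t) > ncpus)
		return -1;
	return 0;
}

/* Worst-case space the PID array attribute can need, given the cap. */
static size_t portids_attr_worst_case(unsigned int ncpus)
{
	return nla_total_size_sketch((size_t)ncpus * sizeof(uint32_t));
}
```

Any array accepted by validate_portids_sketch() fits in portids_attr_worst_case(), which is why a reply sized from that bound cannot hit -EMSGSIZE on this attribute.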

Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's.")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v3:
 - Cap PID array at num_possible_cpus() in ovs_vport_set_upcall_portids().
 - Add ovs_vport_cmd_msg_size() for worst-case reply allocation.
 - Keep BUG_ON()s, fix Fixes tag.
v2:
 - Dynamically size reply skb instead of using fixed NLMSG_DEFAULT_SIZE.
 - Drop WARN_ON_ONCE; use plain error returns instead.

 net/openvswitch/datapath.c | 23 +++++++++++++++++++++--
 net/openvswitch/vport.c    |  3 +++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e209099218b4..4049bfa1c4df 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2184,9 +2184,28 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
 	return err;
 }

+static size_t ovs_vport_cmd_msg_size(void)
+{
+	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
+
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_PORT_NO */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_TYPE */
+	msgsize += nla_total_size(IFNAMSIZ);
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_IFINDEX */
+	msgsize += nla_total_size(sizeof(s32)); /* OVS_VPORT_ATTR_NETNSID */
+	msgsize += nla_total_size_64bit(sizeof(struct ovs_vport_stats));
+	msgsize += nla_total_size(nla_total_size_64bit(sizeof(u64)) +
+				  nla_total_size_64bit(sizeof(u64)));
+	msgsize += nla_total_size(num_possible_cpus() * sizeof(u32));
+	msgsize += nla_total_size(nla_total_size(sizeof(u16)) +
+				  nla_total_size(nla_total_size(0)));
+
+	return msgsize;
+}
+
 static struct sk_buff *ovs_vport_cmd_alloc_info(void)
 {
-	return nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	return genlmsg_new(ovs_vport_cmd_msg_size(), GFP_KERNEL);
 }

 /* Called with ovs_mutex, only via ovs_dp_notify_wq(). */
@@ -2196,7 +2215,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *vport, struct net *net,
 	struct sk_buff *skb;
 	int retval;

-	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	skb = ovs_vport_cmd_alloc_info();
 	if (!skb)
 		return ERR_PTR(-ENOMEM);

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 23f629e94a36..ccd43bc47bc6 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -406,6 +406,9 @@ int ovs_vport_set_upcall_portids(struct vport *vport, const struct nlattr *ids)
 	if (!nla_len(ids) || nla_len(ids) % sizeof(u32))
 		return -EINVAL;

+	if (nla_len(ids) / sizeof(u32) > num_possible_cpus())
+		return -EINVAL;
+
 	old = ovsl_dereference(vport->upcall_portids);

 	vport_portids = kmalloc(sizeof(*vport_portids) + nla_len(ids),
-- 
2.43.0


* Re: [PATCH] RDS: Fix memory leak in rds_rdma_extra_size()
From: Allison Henderson @ 2026-04-13  4:18 UTC
  To: Xiaobo Liu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20260412124455.2008-1-cppcoffee@gmail.com>

On Sun, 2026-04-12 at 20:44 +0800, Xiaobo Liu wrote:
> Free iov->iov when copy_from_user() or page count validation fails in rds_rdma_extra_size().
> 
> This preserves the existing success path and avoids leaking the allocated iovec array on error.

Hi Xiaobo,

Thanks for catching this.  The fix itself looks correct, but it will need your
Signed-off-by line.  Also be sure to note the target tree and subsystem in the subject
line like this "[PATCH net v2] net/rds: Fix memory leak in rds_rdma_extra_size()", and
make sure the commit message wraps at about 72 characters.  Other than that I think
the patch looks good.
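
For reference, the single-exit cleanup pattern the patch uses can be shown in isolation. This is a userspace sketch with hypothetical names (`struct vec_sketch`, `extra_size_sketch()`), with calloc()/free() standing in for the kernel allocators and a non-positive entry standing in for nr_pages == 0: every error path jumps to one label that releases the array and clears the pointer, so nothing is leaked and no dangling pointer is left behind.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rds_iov_vector. */
struct vec_sketch {
	int *iov;
	int len;
};

/* Allocates v->iov, then validates each entry. On any validation failure
 * the array is freed and the pointer cleared before returning < 0. */
static int extra_size_sketch(struct vec_sketch *v, const int *src, int n)
{
	int ret = 0, total = 0;

	v->iov = calloc((size_t)n, sizeof(*v->iov));
	if (!v->iov)
		return -1;

	for (int i = 0; i < n; i++) {
		if (src[i] <= 0) {	/* stands in for nr_pages == 0 */
			ret = -2;
			goto out;
		}
		v->iov[i] = src[i];
		total += src[i];
	}
	v->len = n;
	ret = total;
out:
	if (ret < 0) {
		free(v->iov);
		v->iov = NULL;
	}
	return ret;
}
```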

Thank you!
Allison

> ---
>  net/rds/rdma.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index aa6465dc7..91a20c1e2 100644
> --- a/net/rds/rdma.c
> +++ b/net/rds/rdma.c
> @@ -560,6 +560,7 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  	struct rds_iovec *vec;
>  	struct rds_iovec __user *local_vec;
>  	int tot_pages = 0;
> +	int ret = 0;
>  	unsigned int nr_pages;
>  	unsigned int i;
>  
> @@ -578,16 +579,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  	vec = &iov->iov[0];
>  
>  	if (copy_from_user(vec, local_vec, args->nr_local *
> -			   sizeof(struct rds_iovec)))
> -		return -EFAULT;
> +			   sizeof(struct rds_iovec))) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
>  	iov->len = args->nr_local;
>  
>  	/* figure out the number of pages in the vector */
>  	for (i = 0; i < args->nr_local; i++, vec++) {
>  
>  		nr_pages = rds_pages_in_vec(vec);
> -		if (nr_pages == 0)
> -			return -EINVAL;
> +		if (nr_pages == 0) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  
>  		tot_pages += nr_pages;
>  
> @@ -595,11 +600,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
>  		 * so tot_pages cannot overflow without first going negative.
>  		 */
> -		if (tot_pages < 0)
> -			return -EINVAL;
> +		if (tot_pages < 0) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  	}
>  
> -	return tot_pages * sizeof(struct scatterlist);
> +	ret = tot_pages * sizeof(struct scatterlist);
> +
> +out:
> +	if (ret < 0) {
> +		kfree(iov->iov);
> +		iov->iov = NULL;
> +	}
> +	return ret;
>  }
>  
>  /*

