Netdev List


* Re: [PATCH bpf-next 2/2] selftests/bpf: Cover small conntrack opts error writes
From: Emil Tsalapatis @ 2026-06-16 22:34 UTC (permalink / raw)
  To: Yiyang Chen, bpf, netfilter-devel
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms, andrii,
	eddyz87, ast, daniel, memxor, martin.lau, song, yonghong.song,
	jolsa, emil, shuah, kartikey406, coreteam, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <c4c898dd23181b676ebf6b6b4d9c54f51bb69c75.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

On Tue Jun 16, 2026 at 1:42 AM EDT, Yiyang Chen wrote:
> Add a conntrack kfunc regression check for opts__sz values that do not
> cover opts->error. The BPF program initializes opts->error with a guard
> value, calls the lookup and allocation kfuncs with opts__sz set to
> sizeof(opts->netns_id), and verifies that the guard is still intact
> after the kfunc returns NULL.
>
> Without the conntrack wrapper guard, the kfunc error path overwrites
> that guard with -EINVAL even though the verifier checked only the first
> four bytes of the options object.
>
> Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>

> ---
>  .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 +++++
>  .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 +++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
> index b33dba4b126e2..14d4c1793aed5 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
> @@ -5,6 +5,8 @@
>  #include "test_bpf_nf.skel.h"
>  #include "test_bpf_nf_fail.skel.h"
>  
> +#define CT_OPTS_ERROR_GUARD 0x12345678
> +
>  static char log_buf[1024 * 1024];
>  
>  struct {
> @@ -119,6 +121,10 @@ static void test_bpf_nf_ct(int mode)
>  	ASSERT_EQ(skel->bss->test_einval_reserved_new, -EINVAL, "Test EINVAL for reserved in new struct not set to 0");
>  	ASSERT_EQ(skel->bss->test_einval_netns_id, -EINVAL, "Test EINVAL for netns_id < -1");
>  	ASSERT_EQ(skel->bss->test_einval_len_opts, -EINVAL, "Test EINVAL for len__opts != NF_BPF_CT_OPTS_SZ");
> +	ASSERT_EQ(skel->bss->test_einval_len_opts_small_lookup, CT_OPTS_ERROR_GUARD,
> +		  "Test no error write for lookup opts__sz before error field");
> +	ASSERT_EQ(skel->bss->test_einval_len_opts_small_alloc, CT_OPTS_ERROR_GUARD,
> +		  "Test no error write for alloc opts__sz before error field");
>  	ASSERT_EQ(skel->bss->test_eproto_l4proto, -EPROTO, "Test EPROTO for l4proto != TCP or UDP");
>  	ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id");
>  	ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup");
> diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
> index 076fbf03a1268..df43649ecb785 100644
> --- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c
> +++ b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
> @@ -10,6 +10,8 @@
>  #define EINVAL 22
>  #define ENOENT 2
>  
> +#define CT_OPTS_ERROR_GUARD 0x12345678
> +
>  #define NF_CT_ZONE_DIR_ORIG (1 << IP_CT_DIR_ORIGINAL)
>  #define NF_CT_ZONE_DIR_REPL (1 << IP_CT_DIR_REPLY)
>  
> @@ -19,6 +21,8 @@ int test_einval_reserved = 0;
>  int test_einval_reserved_new = 0;
>  int test_einval_netns_id = 0;
>  int test_einval_len_opts = 0;
> +int test_einval_len_opts_small_lookup = 0;
> +int test_einval_len_opts_small_alloc = 0;
>  int test_eproto_l4proto = 0;
>  int test_enonet_netns_id = 0;
>  int test_enoent_lookup = 0;
> @@ -124,6 +128,28 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
>  	else
>  		test_einval_len_opts = opts_def.error;
>  
> +	opts_def.error = CT_OPTS_ERROR_GUARD;
> +	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
> +		       sizeof(opts_def.netns_id));
> +	if (ct) {
> +		bpf_ct_release(ct);
> +		test_einval_len_opts_small_lookup = -EINVAL;
> +	} else {
> +		test_einval_len_opts_small_lookup = opts_def.error;
> +	}
> +
> +	opts_def.error = CT_OPTS_ERROR_GUARD;
> +	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
> +		      sizeof(opts_def.netns_id));
> +	if (ct) {
> +		ct = bpf_ct_insert_entry(ct);
> +		if (ct)
> +			bpf_ct_release(ct);
> +		test_einval_len_opts_small_alloc = -EINVAL;
> +	} else {
> +		test_einval_len_opts_small_alloc = opts_def.error;
> +	}
> +
>  	opts_def.l4proto = IPPROTO_ICMP;
>  	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
>  		       sizeof(opts_def));
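
A minimal userspace sketch of the rule this test checks: the error field should be written back only when opts__sz reaches past the end of `error`. The struct is a hypothetical stand-in. It assumes only the leading netns_id/error order described in the commit message and leaves the remaining fields unmodeled. `ct_opts_report_error()` is an illustrative name, not the conntrack wrapper itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: netns_id first, error second (as assumed from the
 * commit message); the remaining option fields are not modeled. */
struct ct_opts_sketch {
	int32_t netns_id;
	int32_t error;
	uint8_t rest[8];
};

/* True if a caller-supplied size covers the whole error field. */
static int opts_covers_error(uint32_t opts__sz)
{
	return opts__sz >= offsetof(struct ct_opts_sketch, error) +
			   sizeof(((struct ct_opts_sketch *)0)->error);
}

/* Hypothetical error path: report err only if the caller's buffer,
 * as sized by opts__sz, actually contains the error field. */
static void ct_opts_report_error(struct ct_opts_sketch *opts,
				 uint32_t opts__sz, int err)
{
	if (opts && opts_covers_error(opts__sz))
		opts->error = err;
}
```

With opts__sz == sizeof(opts->netns_id), the guard value survives. This matches what the selftest expects from both the lookup and the allocation kfuncs.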


^ permalink raw reply

* [PATCH] netdevsim: Fix deadlock in del_device_store() and nsim_bus_exit()
From: Moksh Panicker @ 2026-06-16 22:26 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	skhan, Moksh Panicker, syzbot+1cf303af03cf30b1275a

del_device_store() and nsim_bus_exit() both hold nsim_bus_dev_list_lock
while calling nsim_bus_dev_del(), which calls device_unregister(), which
internally acquires the device lock. If another thread already holds
the device lock and tries to acquire nsim_bus_dev_list_lock, a deadlock
occurs:

  INFO: task hung in nsim_bus_dev_del

Fix this by releasing nsim_bus_dev_list_lock before calling
nsim_bus_dev_del() in both locations, after the devices have already
been removed from the list with list_del().

Reported-by: syzbot+1cf303af03cf30b1275a@syzkaller.appspot.com
Closes: https://syzkaller.appspot.com/bug?extid=1cf303af03cf30b1275a
Signed-off-by: Moksh Panicker <mokshpanicker.7@gmail.com>
---
 drivers/net/netdevsim/bus.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index 41483e371..0f02ff8ad 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -241,11 +241,12 @@ del_device_store(const struct bus_type *bus, const char *buf, size_t count)
 		if (nsim_bus_dev->dev.id != id)
 			continue;
 		list_del(&nsim_bus_dev->list);
-		nsim_bus_dev_del(nsim_bus_dev);
 		err = 0;
 		break;
 	}
 	mutex_unlock(&nsim_bus_dev_list_lock);
+	if (!err)
+		nsim_bus_dev_del(nsim_bus_dev);
 	return !err ? count : err;
 }
 static BUS_ATTR_WO(del_device);
@@ -527,11 +528,11 @@ void nsim_bus_exit(void)
 		complete(&nsim_bus_devs_released);
 
 	mutex_lock(&nsim_bus_dev_list_lock);
-	list_for_each_entry_safe(nsim_bus_dev, tmp, &nsim_bus_dev_list, list) {
+	list_for_each_entry_safe(nsim_bus_dev, tmp, &nsim_bus_dev_list, list)
 		list_del(&nsim_bus_dev->list);
-		nsim_bus_dev_del(nsim_bus_dev);
-	}
 	mutex_unlock(&nsim_bus_dev_list_lock);
+	list_for_each_entry_safe(nsim_bus_dev, tmp, &nsim_bus_dev_list, list)
+		nsim_bus_dev_del(nsim_bus_dev);
 
 	wait_for_completion(&nsim_bus_devs_released);
 
-- 
2.34.1
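
The locking rule behind this fix is "unlink under the list lock, delete after dropping it". Below is a single-threaded sketch of that rule. The names (`sim_dev`, `sim_bus_exit()`, the boolean lock flag) are hypothetical stand-ins. The assertion in `sim_dev_del()` models the lock-order dependency described in the commit message. After list_del(), an entry can no longer be reached from the shared list. The sketch therefore moves the entries onto a private list while holding the lock and walks that private list afterwards. In kernel code, list_splice_init() onto a local LIST_HEAD is one way to do the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node on a singly linked list. */
struct sim_dev {
	int id;
	bool deleted;
	struct sim_dev *next;
};

static bool list_lock_held;       /* models nsim_bus_dev_list_lock */
static struct sim_dev *dev_list;  /* models nsim_bus_dev_list */

static void list_lock(void)   { assert(!list_lock_held); list_lock_held = true; }
static void list_unlock(void) { assert(list_lock_held); list_lock_held = false; }

/* Models nsim_bus_dev_del(): it ends up taking the device lock, so it
 * must never run while the list lock is held. */
static void sim_dev_del(struct sim_dev *d)
{
	assert(!list_lock_held);
	d->deleted = true;
}

static void sim_bus_exit(void)
{
	struct sim_dev *local, *d, *next;

	/* Detach every entry while holding the lock... */
	list_lock();
	local = dev_list;
	dev_list = NULL;
	list_unlock();

	/* ...then delete with the lock released, walking the private
	 * list because the shared list is now empty. */
	for (d = local; d; d = next) {
		next = d->next;
		d->next = NULL;
		sim_dev_del(d);
	}
}
```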


^ permalink raw reply related

* Re: [PATCH v2] [net] net: airoha: Fix QoS counter configuration for Tx-fwd channels
From: patchwork-bot+netdevbpf @ 2026-06-16 22:11 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161132384.2164449.18407700117859190327@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 16 Jun 2026 18:50:29 +0800 you wrote:
> In airoha_qdma_init_qos_stats(), the Tx-fwd counter was incorrectly
> using register index (i << 1) instead of ((i << 1) + 1). This caused
> the Tx-fwd configuration to overwrite the Tx-cpu configuration for
> each QoS channel, resulting in incorrect QoS statistics.
> 
> Fix by using the correct register index ((i << 1) + 1) for Tx-fwd
> counter configuration.
> 
> [...]

Here is the summary with links:
  - [v2,net] net: airoha: Fix QoS counter configuration for Tx-fwd channels
    https://git.kernel.org/netdev/net-next/c/1402ecccf563

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
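
Under the indexing rule in the quoted commit message, each QoS channel gets two adjacent counter slots. The sketch below replaces the QDMA registers with a plain array and assumes 4 channels. With the old `i << 1` index for Tx-fwd, the second write in the loop would land on the Tx-cpu slot and overwrite it.

```c
#include <stdint.h>

#define QOS_CHANNELS 4  /* hypothetical channel count for the sketch */

/* Slot 2*i holds Tx-cpu and slot 2*i + 1 holds Tx-fwd for channel i. */
static unsigned int tx_cpu_index(unsigned int i) { return i << 1; }
static unsigned int tx_fwd_index(unsigned int i) { return (i << 1) + 1; }

/* Program every channel into a fake register file, tagging each slot
 * with its direction (0x100 = Tx-cpu, 0x200 = Tx-fwd) and channel. */
static void init_qos_stats(uint32_t regs[2 * QOS_CHANNELS])
{
	for (unsigned int i = 0; i < QOS_CHANNELS; i++) {
		regs[tx_cpu_index(i)] = 0x100 | i;
		regs[tx_fwd_index(i)] = 0x200 | i;
	}
}
```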



^ permalink raw reply

* Re: [PATCH] net: airoha: Fix QoS counter configuration for Tx-fwd channels
From: patchwork-bot+netdevbpf @ 2026-06-16 22:11 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178160712947.2156222.3765685889775458986@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 16 Jun 2026 18:50:29 +0800 you wrote:
> In airoha_qdma_init_qos_stats(), the Tx-fwd counter was incorrectly
> using register index (i << 1) instead of ((i << 1) + 1). This caused
> the Tx-fwd configuration to overwrite the Tx-cpu configuration for
> each QoS channel, resulting in incorrect QoS statistics.
> 
> Fix by using the correct register index ((i << 1) + 1) for Tx-fwd
> counter configuration.
> 
> [...]

Here is the summary with links:
  - net: airoha: Fix QoS counter configuration for Tx-fwd channels
    https://git.kernel.org/netdev/net-next/c/1402ecccf563

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH v2] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-16 22:08 UTC (permalink / raw)
  To: anthony.l.nguyen, przemyslaw.kitszel; +Cc: intel-wired-lan, netdev

When the admin queue command to read NVM returns EBUSY, the driver
currently treats it as a fatal error and aborts the entire read
operation. This can cause spurious NVM read failures during periods of
high firmware activity.

Add retry logic to ice_read_flat_nvm() that handles EBUSY responses
from the admin queue. When an EBUSY error is encountered, release the
NVM resource lock, wait for ICE_SQ_SEND_DELAY_TIME_MS, re-acquire it,
and retry the failed read. The retry is attempted up to
ICE_SQ_SEND_MAX_EXECUTE times before giving up.

The code was taken from the out-of-tree (OOT) ice driver 1.15.4
release. An additional change resets last_cmd on retry, so that an
EBUSY on the final chunk does not end the loop and every command is
retried properly.

Fixes: e94509906d6b ("ice: create function to read a section of the NVM and Shadow RAM")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
---
Changes in v2:
- change ICE_AQ_RC_EBUSY -> LIBIE_AQ_RC_EBUSY

 drivers/net/ethernet/intel/ice/ice_nvm.c | 25 +++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 7e187a804dfa..b3120605d66f 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -67,6 +67,7 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 {
 	u32 inlen = *length;
 	u32 bytes_read = 0;
+	int retry_cnt = 0;
 	bool last_cmd;
 	int status;
 
@@ -96,11 +97,25 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 					 offset, read_size,
 					 data + bytes_read, last_cmd,
 					 read_shadow_ram, NULL);
-		if (status)
-			break;
-
-		bytes_read += read_size;
-		offset += read_size;
+		if (status) {
+			if (hw->adminq.sq_last_status != LIBIE_AQ_RC_EBUSY ||
+			    retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
+				break;
+			ice_debug(hw, ICE_DBG_NVM,
+				  "NVM read EBUSY error, retry %d\n",
+				  retry_cnt + 1);
+			last_cmd = false;
+			ice_release_nvm(hw);
+			msleep(ICE_SQ_SEND_DELAY_TIME_MS);
+			status = ice_acquire_nvm(hw, ICE_RES_READ);
+			if (status)
+				break;
+			retry_cnt++;
+		} else {
+			bytes_read += read_size;
+			offset += read_size;
+			retry_cnt = 0;
+		}
 	} while (!last_cmd);
 
 	*length = bytes_read;
-- 
2.34.1
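
Here is a userspace model of the retry policy. It is a sketch, not the driver code. `fake_read_chunk()` is a hypothetical stub for the admin queue read, MAX_RETRIES stands in for ICE_SQ_SEND_MAX_EXECUTE, and the release/sleep/re-acquire step is reduced to a comment. The sketch writes the bound as `retry_cnt >= MAX_RETRIES`, so a chunk is retried at most MAX_RETRIES times, which matches the wording of the commit message. The patch itself compares `retry_cnt > ICE_SQ_SEND_MAX_EXECUTE`, which by this reading permits one extra attempt.

```c
#include <errno.h>

#define MAX_RETRIES 3  /* stand-in for ICE_SQ_SEND_MAX_EXECUTE */

/* Hypothetical firmware stub: returns -EBUSY for the first busy_left calls. */
struct fake_fw {
	int busy_left;
	int calls;
};

static int fake_read_chunk(struct fake_fw *fw)
{
	fw->calls++;
	if (fw->busy_left > 0) {
		fw->busy_left--;
		return -EBUSY;
	}
	return 0;
}

/* Read `chunks` chunks, retrying a chunk on -EBUSY at most MAX_RETRIES
 * times; the retry counter is reset after every successful chunk. */
static int read_with_retry(struct fake_fw *fw, int chunks, int *done)
{
	int retry_cnt = 0;
	int status = 0;

	*done = 0;
	while (*done < chunks) {
		status = fake_read_chunk(fw);
		if (status) {
			if (status != -EBUSY || retry_cnt >= MAX_RETRIES)
				break;
			/* Driver: release NVM, msleep(), re-acquire NVM. */
			retry_cnt++;
			continue;
		}
		(*done)++;
		retry_cnt = 0;
	}
	return status;
}
```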


^ permalink raw reply related

* Re: [PATCH v1 bpf-next 0/2] bpf: bpf_redirect_peer egress redirection
From: Paul Chaignon @ 2026-06-16 22:06 UTC (permalink / raw)
  To: Jordan Rife
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Stanislav Fomichev
In-Reply-To: <CABi4-ogYKX9T_gWcXsKSs5-y=3GA_WqwfyjobmCxexTtQ_H86w@mail.gmail.com>

On Tue, Jun 16, 2026 at 01:49:26PM -0700, Jordan Rife wrote:
> > IMO, calling it BPF_F_EGRESS would be less confusing. It's a shame we
> > can't have the same flag API between bpf_redirect() and
> > bpf_redirect_peer(), but this is creating inconsistent semantics for
> > the terms egress/ingress across the two helpers.
> 
> Yeah, one annoying thing about BPF_F_EGRESS is that it would only
> apply to bpf_redirect_peer, so you still have inconsistencies across

Yes, that's what I meant by "we can't have the same flag API" :)
Alternatively, we could define BPF_F_EGRESS as 1ULL << 1, for both
helpers, but I'm not sure it's worth it. Maybe Daniel will have another
idea?

> helpers. Perhaps this is less weird than having BPF_F_INGRESS perform
> an egress redirection though.
> 
> Jordan

^ permalink raw reply

* [PATCH] net/sched: dualpi2: fix GSO backlog accounting
From: Xingquan Liu @ 2026-06-16 22:02 UTC (permalink / raw)
  To: netdev; +Cc: Jamal Hadi Salim, Jiri Pirko, Victor Nogueira, Xingquan Liu,
	stable

When DualPI2 splits a GSO skb into N segments, it propagates N
additional packets to its parent before returning NET_XMIT_SUCCESS.
The parent then accounts for the original skb once more, leaving its
qlen one larger than the number of packets actually queued.

With QFQ as the parent, after all real packets are dequeued, QFQ still
has a non-zero qlen while its in-service aggregate has no active
classes. qfq_choose_next_agg() returns NULL and qfq_dequeue() passes
the result to qfq_peek_skb(), causing a NULL pointer dereference.

Count only successfully queued segments and propagate the difference
between the original skb and those segments. Return success whenever
at least one segment was queued.

Fixes: 8f9516daedd6 ("sched: Add enqueue/dequeue of dualpi2 qdisc")
Cc: stable@vger.kernel.org
Signed-off-by: Xingquan Liu <b1n@b1n.io>
---
 net/sched/sch_dualpi2.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index dfec3c99eb45..37d6a8960310 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -461,7 +461,7 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		if (IS_ERR_OR_NULL(nskb))
 			return qdisc_drop(skb, sch, to_free);

-		cnt = 1;
+		cnt = 0;
 		byte_len = 0;
 		orig_len = qdisc_pkt_len(skb);
 		skb_list_walk_safe(nskb, nskb, next) {
@@ -488,16 +488,15 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 				byte_len += nskb->len;
 			}
 		}
-		if (cnt > 1) {
+		if (cnt > 0) {
 			/* The caller will add the original skb stats to its
 			 * backlog, compensate this if any nskb is enqueued.
 			 */
-			--cnt;
-			byte_len -= orig_len;
+			qdisc_tree_reduce_backlog(sch, 1 - cnt,
+						  orig_len - byte_len);
 		}
-		qdisc_tree_reduce_backlog(sch, -cnt, -byte_len);
 		consume_skb(skb);
-		return err;
+		return cnt > 0 ? NET_XMIT_SUCCESS : err;
 	}
 	return dualpi2_enqueue_skb(skb, sch, to_free);
 }

base-commit: fbc6a80cb5d3fd4ac4b56e8c9d791dd17be890c4
--
Xingquan Liu
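
The accounting can be checked with some arithmetic. In the sketch below, `reduce_backlog()` models the effect of qdisc_tree_reduce_backlog() on an ancestor, which subtracts n packets and len bytes. The sketch makes two assumptions. First, the parent adds the original skb once when the child returns NET_XMIT_SUCCESS. Second, the loop body that the diff elides increments cnt once per queued segment. With three segments queued, the old arithmetic leaves the parent at four packets and the fixed arithmetic leaves it at three. Both versions end with the correct byte backlog.

```c
/* Hypothetical parent qdisc counters. */
struct parent { int qlen; int backlog; };

/* Models qdisc_tree_reduce_backlog() on one ancestor. */
static void reduce_backlog(struct parent *p, int n, int len)
{
	p->qlen -= n;
	p->backlog -= len;
}

/* Parent side of a successful enqueue: it accounts the original skb. */
static void parent_accounts_orig(struct parent *p, int orig_len)
{
	p->qlen += 1;
	p->backlog += orig_len;
}

/* Old arithmetic, for queued > 0 segments totalling byte_len bytes. */
static void enqueue_gso_old(struct parent *p, int queued, int orig_len,
			    int byte_len)
{
	int cnt = 1 + queued;  /* old code started cnt at 1 */

	if (cnt > 1) {
		--cnt;
		byte_len -= orig_len;
	}
	reduce_backlog(p, -cnt, -byte_len);
	parent_accounts_orig(p, orig_len);
}

/* Fixed arithmetic: compensate only if at least one segment was queued. */
static void enqueue_gso_fixed(struct parent *p, int cnt, int orig_len,
			      int byte_len)
{
	if (cnt > 0) {
		reduce_backlog(p, 1 - cnt, orig_len - byte_len);
		parent_accounts_orig(p, orig_len);
	}
}
```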

^ permalink raw reply related

* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: patchwork-bot+netdevbpf @ 2026-06-16 22:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, geert,
	chleroy, npiggin, mpe, maddy, linux-mips, linux-m68k,
	linuxppc-dev
In-Reply-To: <20260615222935.947233-1-kuba@kernel.org>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 15 Jun 2026 15:29:33 -0700 you wrote:
> This tiny series moves appletalk out of tree, to:
> 
>   https://github.com/linux-netdev/mod-orphan
> 
> Core maintainers are unable to keep up with the rate of security
> bug reports and fixes. Nobody seems to care about appletalk enough
> to review the patches.
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] appletalk: stop storing per-interface state in struct net_device
    https://git.kernel.org/netdev/net-next/c/023f9b0f2f4f
  - [net-next,2/2] appletalk: move the protocol out of tree
    https://git.kernel.org/netdev/net-next/c/8a398a0c189e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v2] net: skmsg: preserve sg.copy across SG transforms
From: patchwork-bot+netdevbpf @ 2026-06-16 22:00 UTC (permalink / raw)
  To: Yiming Qian
  Cc: security, john.fastabend, jakub, kuba, sd, davem, edumazet,
	pabeni, horms, keenanat2000, netdev, bpf, linux-kernel, stable
In-Reply-To: <20260610062137.49075-1-yimingqian591@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 10 Jun 2026 06:21:36 +0000 you wrote:
> The sk_msg sg.copy bitmap is part of the scatterlist entry ownership
> state. A set bit tells sk_msg_compute_data_pointers() not to expose the
> entry through writable BPF ctx->data. This protects entries backed by
> pages that are not private to the sk_msg, such as splice-backed file
> page-cache pages.
> 
> Several sk_msg transform paths move, copy, split, or compact
> msg->sg.data[] entries without moving the matching sg.copy bit. This can
> make an externally backed entry arrive at a new slot with a clear copy
> bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable
> ctx->data and BPF stores can modify the original page cache.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: skmsg: preserve sg.copy across SG transforms
    https://git.kernel.org/netdev/net/c/406e8a651a7b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
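
The invariant this fix restores is that a slot's copy bit travels with its entry. Below is a minimal sketch that uses a hypothetical fixed-size array and a 32-bit mask in place of msg->sg.data[] and the sg.copy bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 8  /* hypothetical ring size for the sketch */

/* Hypothetical model: data[] holds entries, copy is a per-slot bitmap. */
struct msg_sketch {
	int data[MAX_SLOTS];
	uint32_t copy;  /* bit i set: slot i must not be exposed writable */
};

static bool slot_copy(const struct msg_sketch *m, int i)
{
	return m->copy & (1u << i);
}

/* Move the entry in slot `from` to slot `to`, carrying its copy bit
 * and clearing the bit on the vacated slot. */
static void move_slot(struct msg_sketch *m, int from, int to)
{
	m->data[to] = m->data[from];
	if (slot_copy(m, from))
		m->copy |= 1u << to;
	else
		m->copy &= ~(1u << to);
	m->copy &= ~(1u << from);
}
```

Without the bit handling in `move_slot()`, the entry from slot 2 would reach slot 1 with a clear copy bit. That is the state the commit message describes as letting BPF write into externally backed pages.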



^ permalink raw reply

* Re: [Intel-wired-lan] e1000e: Report link down after "Detected Hardware Unit Hang" ?
From: Andrew Lunn @ 2026-06-16 21:59 UTC (permalink / raw)
  To: Ruinskiy, Dima
  Cc: Helge Deller, Helge Deller, Tony Nguyen, Przemek Kitszel,
	intel-wired-lan, netdev
In-Reply-To: <51828156-e859-44db-9926-c076796d0f75@intel.com>

> This does not seem like the right direction to me.
> 
> The "Detected Hardware Unit Hang" print does not indicate that the interface
> is dead, but that the transmitter is stalled.
> 
> This can be due to an unusually high load, or a HW fault / race condition
> with another component, etc.
> 
> When a hang is detected, the transmitter is stopped with netif_stop_queue()
> and eventually ndo_tx_timeout triggers a full reset to the device, which in
> many cases recovers it from the hang.

Does a full reset cause the link to be negotiated again? If so, there
is no harm in setting the carrier down. If the reset is successful,
the carrier will be restored. However, if the reset does not recover
the system, does the carrier stay down?

    Andrew


^ permalink raw reply

* Re: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned folio
From: Bobby Eshleman @ 2026-06-16 21:59 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann,
	Sumit Semwal, Christian König, Shuah Khan, Jason Gunthorpe,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kselftest@vger.kernel.org,
	sdf@fomichev.me, razor@blackwall.org, daniel@iogearbox.net,
	almasrymina@google.com, matttbe@kernel.org, skhawaja@google.com,
	dw@davidwei.uk, Bobby Eshleman
In-Reply-To: <ajG4zaK9zu7qZT1+@devvm29614.prn0.facebook.com>

On Tue, Jun 16, 2026 at 01:57:49PM -0700, Bobby Eshleman wrote:
> On Tue, Jun 16, 2026 at 06:04:03AM +0000, Kasireddy, Vivek wrote:
> > Adding Jason to this discussion.
> > 
> > Hi Bobby,
> > 
> > > Subject: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned
> > > folio
> > > 
> > > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > > 
> > > get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> > > underlying folio was larger.
> > > 
> > > Instead, walk folios[] and emit one sg entry per folio. When folios
> > We have recently merged a patch (that will make it into 7.2) from Jason that
> > replaced sg_set_folio() with sg_alloc_table_from_pages() in udmabuf driver:
> > https://gitlab.freedesktop.org/drm/tip/-/commit/5bf888673e0dda5a53220fa0c4956271a46c353c
> > 
> > Since you are relying on sg_set_folio(), the core argument against its usage
> > in udmabuf is that it doesn't work well with offsets > PAGE_SIZE, resulting
> > in a malformed scatterlist. Not sure if this can be fixed easily.
> > 
> > > represent large pages (as is for MFD_HUGETLB), each sg entry is a large
> > > page. Normal PAGE_SIZE sg tables are unchanged.
> > > 
> > > This is helpful for importers like net/core/devmem that expect dmabuf sg
> > IMO, udmabuf needs to detect whether importers can handle segments that
> > are > PAGE_SIZE and set the entries appropriately. Please look into how the
> > GPU drivers and other dmabuf exporters/importers handle this situation, so
> > that we can adopt best practices to address this issue.
> > 
> > Thanks,
> > Vivek
> 
> Hey Vivek,
> 
> It looks like that patch might solve my problem. I'll apply and
> troubleshoot from there.
> 
> Thanks!
> 
> Best,
> Bobby

Good news for me, that patch solves the problem. Thanks for bringing
that up! I can drop my udmabuf patch when I respin the series.

Best,
Bobby

^ permalink raw reply

* Re: [PATCH] e1000: Remove redundant else after return
From: Andrew Lunn @ 2026-06-16 21:51 UTC (permalink / raw)
  To: Lovekesh Solanki
  Cc: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
	edumazet, kuba, pabeni, netdev
In-Reply-To: <20260616210008.109635-1-lovekeshsolanki00@gmail.com>

On Wed, Jun 17, 2026 at 02:30:08AM +0530, Lovekesh Solanki wrote:
> The else branch is needless because the preceding branch
> unconditionally returns -ENOMEM.
> 
> Reduce nesting by removing the unnecessary else.
> 
> Signed-off-by: Lovekesh Solanki <lovekeshsolanki00@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 9b09eb144b81..3d97e952c916 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1546,11 +1546,10 @@ static int e1000_setup_tx_resources(struct e1000_adapter *adapter,
>  			      "for the transmit descriptor ring\n");
>  			vfree(txdr->buffer_info);
>  			return -ENOMEM;
> -		} else {
> +		}
>  			/* Free old allocation, new allocation was successful */
>  			dma_free_coherent(&pdev->dev, txdr->size, olddesc,
>  					  olddma);
> -		}

Hi Lovekesh

Please review this patch yourself and tell us what is wrong with it.

Also, please read

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html


    Andrew

---
pw-bot: cr
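
This clean-up is aiming for the usual early-return shape: once the else is gone, the code after the failing branch moves out one indentation level. Below is a minimal userspace sketch with hypothetical names (`struct ring`, `realloc_ring()`). It is not the e1000 code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical ring with one descriptor buffer. */
struct ring {
	void *desc;
	size_t size;
};

static int realloc_ring(struct ring *r, size_t size, int simulate_fail)
{
	void *olddesc = r->desc;
	void *newdesc = simulate_fail ? NULL : malloc(size);

	if (!newdesc)
		return -ENOMEM;  /* early return: no else needed below */

	/* Free old allocation, new allocation was successful.
	 * With the else removed, this sits at the outer indentation. */
	free(olddesc);
	r->desc = newdesc;
	r->size = size;
	return 0;
}
```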

^ permalink raw reply

* Re: [syzbot] [net?] WARNING in tls_err_abort
From: Sabrina Dubroca @ 2026-06-16 21:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: syzbot, davem, edumazet, horms, john.fastabend, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <20260616142359.43300727@kernel.org>

2026-06-16, 14:23:59 -0700, Jakub Kicinski wrote:
> On Tue, 16 Jun 2026 23:00:54 +0200 Sabrina Dubroca wrote:
> > 2026-06-16, 08:28:16 -0700, Jakub Kicinski wrote:
> > > On Tue, 16 Jun 2026 17:19:22 +0200 Sabrina Dubroca wrote:  
> > > > I suspect err==0, and sock_error() consumed sk_err in between (the
> > > > alternative would be err > 0).
> > > > 
> > > > Something like this?  
> > > 
> > > Makes sense, but what's eating sk_err?  
> > 
> > The 2 remaining sock_error() in tls_rx_rec_wait()? [1]
> 
> How did that elude my grep..

:)

> > > Don't we depend on it being set
> > > to avoid further state transitions once we hit a crypto error?  
> > 
> > I kind of thought so too.
> 
> In which case the question is whether we should try to remove 
> the sock_error() instead? (stating the obvious I guess)

That would make sense, but we can't prevent sock_error() being called
from some helper.

The only relevant one for ktls at the moment seems to be
sk_stream_error(), and I think via sk_stream_wait_memory() we can hit
that EPIPE.


tls_sw_sendmsg_locked has
...
end:
	ret = sk_stream_error(sk, msg->msg_flags, ret);
	return copied > 0 ? copied : ret;


int sk_stream_error(struct sock *sk, int flags, int err)
{
	if (err == -EPIPE)
		err = sock_error(sk) ? : -EPIPE;
...

-- 
Sabrina
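
Here is a concrete view of the consuming behaviour. sock_error() returns the pending error and clears sk_err in the same step. Any path that calls it, either directly or through sk_stream_error() on -EPIPE as quoted above, therefore leaves later readers with sk_err == 0. The model below is simplified and single-threaded, and the `_sketch` names are hypothetical. The kernel clears the field with xchg().

```c
#include <errno.h>

/* Hypothetical stand-in for struct sock: only the pending error. */
struct sock_sketch {
	int sk_err;
};

/* Models sock_error(): return the pending error as a negative errno
 * and clear it, so a second call returns 0. */
static int sock_error_sketch(struct sock_sketch *sk)
{
	int err = sk->sk_err;

	sk->sk_err = 0;
	return -err;
}

/* Models the -EPIPE branch of sk_stream_error() quoted above. */
static int sk_stream_error_sketch(struct sock_sketch *sk, int err)
{
	if (err == -EPIPE) {
		int sock_err = sock_error_sketch(sk);

		err = sock_err ? sock_err : -EPIPE;
	}
	return err;
}
```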

^ permalink raw reply

* Re: [PATCH net] net: serialize netif_running() check in enqueue_to_backlog()
From: Kuniyuki Iwashima @ 2026-06-16 21:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, syzbot+965506b59a2de0b6905c,
	Julian Anastasov
In-Reply-To: <20260616141317.407791-1-edumazet@google.com>

On Tue, Jun 16, 2026 at 7:13 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Syzbot reported a KASAN slab-use-after-free in fib_rules_lookup().
>
> The root cause is a race condition where packets can escape the backlog
> flushing during device unregistration (e.g., during netns exit).
>
> Commit e9e4dd3267d0 ("net: do not process device backlog during unregistration")
> introduced a lockless netif_running() check in enqueue_to_backlog() to
> prevent queuing packets to an unregistering device.
>
> However, this creates a TOCTOU race window.
>
> A lockless transmitter (like veth_xmit) can pass
> the check before dev_close() clears IFF_UP. If the transmitter is then
> delayed, flush_all_backlogs() can run and finish before the transmitter
> grabs the backlog lock and queues the packet. The packet then escapes
> the flush and triggers UAF later when processed.
>
> Fix this by moving the netif_running() check inside the backlog lock.
> This serializes the check with the flush work (which also grabs the lock).
> We then either queue the packet before the flush runs (so it gets flushed),
> or check netif_running() after the flush/close completes (so it gets dropped).
>
> Fixes: e9e4dd3267d0 ("net: do not process device backlog during unregistration")
> Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/6a315824.b0403584.28d0ff.0000.GAE@google.com/T/#u
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

Thanks for catching this !
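
Modeling the interleaving explicitly makes the window visible in a deterministic way. The sketch below uses hypothetical names and no real locking. `race` forces the close-and-flush to run at the worst moment for each ordering. When the check sits outside the lock, the packet is queued after the flush has finished. When the check sits under the lock, the packet is dropped.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model of one device and its backlog. */
struct model {
	bool running;  /* netif_running() */
	int queued;    /* packets left in the backlog */
};

static void close_and_flush(struct model *m)
{
	m->running = false;  /* dev_close() clears IFF_UP */
	m->queued = 0;       /* flush_all_backlogs() drains the queue */
}

/* Old order: lockless check, then lock and queue later. */
static bool enqueue_unlocked_check(struct model *m, bool race)
{
	if (!m->running)
		return false;          /* dropped */
	if (race)
		close_and_flush(m);    /* flush finishes inside the window */
	/* lock; queue; unlock */
	m->queued++;
	return true;
}

/* New order: the check runs under the backlog lock, which the flush also
 * takes, so the flush is entirely before or entirely after it. */
static bool enqueue_locked_check(struct model *m, bool race)
{
	if (race)
		close_and_flush(m);    /* can only run before we take the lock */
	/* lock */
	if (!m->running)
		return false;          /* unlock; drop */
	m->queued++;
	/* unlock */
	return true;
}
```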

^ permalink raw reply

* Re: [PATCH net-next v2] net: dsa: Fix skb ownership in taggers
From: Vladimir Oltean @ 2026-06-16 21:37 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Florian Fainelli, Jonas Gorski,
	Hauke Mehrtens, Kurt Kanzenbach, Woojung Huh, UNGLinuxDriver,
	Chester A. Unal, Daniel Golle, Matthias Brugger,
	AngeloGioacchino Del Regno, Wei Fang, Clark Wang,
	Clément Léger, George McCollister, David Yang, netdev,
	Sashiko AI Review
In-Reply-To: <20260616-dsa-fix-free-skb-v2-1-9dbda6a19e97@kernel.org>

On Tue, Jun 16, 2026 at 11:36:22AM +0200, Linus Walleij wrote:
> The tag_8021q.c tagger calls vlan_insert_tag() in dsa_8021q_xmit().
> vlan_insert_tag() will consume the skb with kfree_skb() on failure
> and return NULL.
> 
> When NULL is returned as error code to ->xmit() in dsa_user_xmit()
> it will free the same skb again leading to a double-free.
> 
> The idea of dsa_user_xmit() and dsa_switch_rcv() dropping the skb
> they held before the call to ->xmit() and ->rcv() is conceptually
> wrong: the pattern elsewhere in the networking code is that consumers
> drop their skb:s on failure.
> 
> Modify the ->xmit() and ->rcv() call sites to not drop the SKB if
> the taggers return NULL from any of these calls. Move those drops into
> the taggers so every callback error path that retains ownership consumes
> the skb before returning NULL.
> 
> Keep the existing helper ownership rules: VLAN insertion helpers already
> free on failure (this is the case in tag_8021q.c), while deferred
> transmit paths either transfer the skb reference to worker context or
> hold a worker reference with skb_get() and drop the caller's reference.
> 
> For SJA1105 meta RX, transfer the buffered stampable skb under the meta
> lock and return NULL while the skb is waiting for its meta frame: the
> skb is not dropped in this case.
> 
> Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
> Closes: https://lore.kernel.org/r/20260610153952.1685895-1-kuba@kernel.org/
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Assisted-by: Codex:gpt-5-5
> Acked-by: David Yang <mmyangfl@gmail.com> # yt921x
> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
> Signed-off-by: Linus Walleij <linusw@kernel.org>
> ---
> Changes in v2:
> - In some instances __skb_pad() and __skb_put_padto() followed by a
>   kfree_skb() could be simplified to just call skb_pad() and
>   skb_put_padto() which will free the skb on failure.
> - Use a label and goto for the kfree_skb(); return NULL; in
>   the netc_rcv() callback in tag_netc.c as requested.
> - Collect ACKs.
> - Retag for net-next.
> - Link to v1: https://patch.msgid.link/20260616-dsa-fix-free-skb-v1-1-fd30b35dcf66@kernel.org
> ---

From my perspective, the tradeoff between pros and cons is not so well
explained. Consider the following not mentioned in the commit message:

- Changing the kfree_skb() convention, without any mechanical obstacle
  preventing the backporting of patches that are written assuming one
  convention down to trees expecting the other (obstacle like a failure
  to compile, for example, which would warn people of their otherwise
  silent incompatibility), is an avoidable experience (at best) from a
  maintenance perspective.

- Has anyone proven that a real problem exists? Because dsa_user_xmit()
  -> skb_ensure_writable_head_tail() has run successfully at this stage,
  so we know that dev->needed_headroom bytes are available for writing.
  Because DSA uses VLAN as a tag, dsa_user_setup_tagger() will increase
  dev->needed_headroom by VLAN_HLEN for the tag_8021q protocols, so
  vlan_insert_tag() should not fail. I've looked at this function and it
  seems not to be coded up to fail for any other reason.

Otherwise, sure, it seems cleaner this way, but the way I see it, it
risks introducing more issues than it fixes. If maintainers feel
differently about this, please go ahead, but given that I don't
really have a lot of time to do proper review during this period, I'm
more on the pragmatic side on this one.

^ permalink raw reply

* Re: [syzbot] [net?] WARNING in tls_err_abort
From: Jakub Kicinski @ 2026-06-16 21:23 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: syzbot, davem, edumazet, horms, john.fastabend, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <ajG5hg9oJvyxPplG@krikkit>

On Tue, 16 Jun 2026 23:00:54 +0200 Sabrina Dubroca wrote:
> 2026-06-16, 08:28:16 -0700, Jakub Kicinski wrote:
> > On Tue, 16 Jun 2026 17:19:22 +0200 Sabrina Dubroca wrote:  
> > > I suspect err==0, and sock_error() consumed sk_err in between (the
> > > alternative would be err > 0).
> > > 
> > > Something like this?  
> > 
> > Makes sense, but what's eating sk_err?  
> 
> The 2 remaining sock_error() in tls_rx_rec_wait()? [1]

How did that elude my grep..

> > Don't we depend on it being set
> > to avoid further state transitions once we hit a crypto error?  
> 
> I kind of thought so too.

In which case the question is whether we should try to remove 
the sock_error() instead? (stating the obvious I guess)

> > I thought that's why we don't consume sk_err in recvmsg and sendmsg in
> > the first place (we are not calling sock_error() anywhere)  
> 
> Umm...
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/tls/tls_sw.c#n1095
> 
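For context on what "eating" sk_err means: sock_error() is a consuming read. It returns the pending error as a negative errno and atomically resets sk_err to 0, so any later code that relies on sk_err still being set sees 0. A minimal userspace model with C11 atomics (`struct toy_sock` and both functions are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_sock {
	atomic_int sk_err; /* pending positive errno, 0 if none */
};

/* Model of sock_error(): fetch-and-clear, return a negative errno. */
static int toy_sock_error(struct toy_sock *sk)
{
	int err;

	if (atomic_load(&sk->sk_err) == 0)
		return 0;
	err = atomic_exchange(&sk->sk_err, 0);
	assert(err >= 0); /* sk_err holds a positive errno */
	return -err;
}

/* Non-consuming check: reports the error but leaves sk_err set. */
static int toy_sock_peek_error(struct toy_sock *sk)
{
	return -atomic_load(&sk->sk_err);
}
```

Any path that calls the consuming variant clears the state that later transitions depend on; a peek-style read does not.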


^ permalink raw reply

* Re: [PATCH] net: faraday: ftmac100: convert to devm resource management
From: Jakub Kicinski @ 2026-06-16 21:21 UTC (permalink / raw)
  To: Jack Lee; +Cc: davem, andrew+netdev, edumazet, pabeni, netdev, linux-kernel
In-Reply-To: <20260616203233.55234-1-skunkolee@gmail.com>

On Tue, 16 Jun 2026 13:32:33 -0700 Jack Lee wrote:
> Replace manual resource management with device-managed alternatives:
> - alloc_etherdev() -> devm_alloc_etherdev()
> - request_mem_region() + ioremap() -> devm_platform_ioremap_resource()
> 
> This simplifies error handling by removing manual cleanup in error
> paths and the remove function, and eliminates the risk of resource
> leaks.

net-next is closed right now. Also:

Quoting documentation:

  Clean-up patches
  ~~~~~~~~~~~~~~~~
  
  Netdev discourages patches which perform simple clean-ups, which are not in
  the context of other work. For example:
  
  * Addressing ``checkpatch.pl``, and other trivial coding style warnings
  * Addressing :ref:`Local variable ordering<rcs>` issues
  * Conversions to device-managed APIs (``devm_`` helpers)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  

  This is because it is felt that the churn that such changes produce comes
  at a greater cost than the value of such clean-ups.
  
  Conversely, spelling and grammar fixes are not discouraged.
  
See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#clean-up-patches
-- 
pw-bot: reject

^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Jakub Kicinski @ 2026-06-16 21:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, Petr Mladek, John Ogness,
	Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
	linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260616170257.GH49951@noisy.programming.kicks-ass.net>

On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote:
> > So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> > to NBCON console infrastructure"). Because from here now on writes are
> > deferred to the nbcon thread. So this is purely about -stable in this case.  
> 
> Hmm, I thought netconsole had some reserved skbs and could do writes
> 'atomic' like? That said, it was 2.6 era the last time I looked at
> netconsole.

Yes, that part is fine. The problem is that netconsole tries
to reap Tx completions if the Tx queue is full. We can't call
skb destructors in irq context, so we put the completed skbs on
a queue and try to arm the softirq to get to them later.
Arming the softirq causes a ksoftirqd wakeup.

We already skip the completion polling if we detect getting called
from the same networking driver. It's best effort, anyway.
Networking-side fix would be to toss another OR condition into
the skip. But we don't have one that'd work cleanly :S
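The deferral described above can be sketched as a toy userspace model. All names are hypothetical (`struct toy_cpu`, `toy_free_any()`, `toy_softirq()`); in the kernel the corresponding path is roughly dev_kfree_skb_any() putting the skb on a per-CPU completion queue and raising NET_TX_SOFTIRQ. The sketch only captures the ordering: in irq context the destructor is not run, the packet is queued, and a softirq is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet; "freed" counts destructor runs. */
struct toy_pkt {
	struct toy_pkt *next;
	int *freed;
};

struct toy_cpu {
	struct toy_pkt *completion_queue; /* deferred frees */
	bool softirq_pending;             /* models raising the Tx softirq */
};

/* Free a completed packet. In irq context the destructor must not run,
 * so the packet is queued and a softirq is requested instead.
 */
static void toy_free_any(struct toy_cpu *cpu, struct toy_pkt *p, bool irq_ctx)
{
	assert(p != NULL);
	if (irq_ctx) {
		p->next = cpu->completion_queue;
		cpu->completion_queue = p;
		cpu->softirq_pending = true; /* in the kernel this can wake ksoftirqd */
		return;
	}
	(*p->freed)++;
}

/* Model of the softirq handler draining the deferred queue. */
static void toy_softirq(struct toy_cpu *cpu)
{
	struct toy_pkt *p = cpu->completion_queue;

	cpu->completion_queue = NULL;
	cpu->softirq_pending = false;
	while (p) {
		struct toy_pkt *next = p->next;

		(*p->freed)++;
		p = next;
	}
}
```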

^ permalink raw reply

* Re: [PATCH net-next V3 2/7] netdevsim: Register devlink after device init
From: Jakub Kicinski @ 2026-06-16 21:05 UTC (permalink / raw)
  To: Mark Bloch
  Cc: Jiri Pirko, Eric Dumazet, Paolo Abeni, Andrew Lunn,
	David S. Miller, Jonathan Corbet, Shuah Khan, Simon Horman,
	Sunil Goutham, Linu Cherian, Geetha sowjanya, hariprasad,
	Subbaraya Sundeep, Bharat Bhushan, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, Ethan Nelson-Moore, linux-doc,
	netdev, linux-rdma
In-Reply-To: <7635d50c-1c82-4090-8907-53a72444fc04@nvidia.com>

On Tue, 16 Jun 2026 20:29:25 +0300 Mark Bloch wrote:
> I think the explicit helper is the cleanest option here, without any
> workqueue fallback inside devlink. It avoids depending on devl_register()
> ordering, and makes the support explicit per driver.
> 
> Does that sound like an acceptable direction?

I'd much rather have the workqueue with the purely theoretical race
with user space than a bunch of drivers that don't act on the cmdline
params.

^ permalink raw reply

* [PATCH] e1000: Remove redundant else after return
From: Lovekesh Solanki @ 2026-06-16 21:00 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: przemyslaw.kitszel, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, Lovekesh Solanki

The else branch is needless because the preceding branch
unconditionally returns -ENOMEM.

Reduce nesting by removing the unnecessary else.

Signed-off-by: Lovekesh Solanki <lovekeshsolanki00@gmail.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 9b09eb144b81..3d97e952c916 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1546,11 +1546,10 @@ static int e1000_setup_tx_resources(struct e1000_adapter *adapter,
 			      "for the transmit descriptor ring\n");
 			vfree(txdr->buffer_info);
 			return -ENOMEM;
-		} else {
+		}
 			/* Free old allocation, new allocation was successful */
 			dma_free_coherent(&pdev->dev, txdr->size, olddesc,
 					  olddma);
-		}
 	}
 	memset(txdr->desc, 0, txdr->size);
 
-- 
2.54.0
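The transformation in this patch can be shown in isolation: once a branch ends in an unconditional return, the code that sat in the else can move to the outer level with identical behavior. A standalone sketch; the function names are hypothetical and unrelated to e1000:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Before: the else is redundant because the if-branch always returns. */
static int replace_buf_nested(void **buf, size_t size)
{
	void *old;

	assert(buf != NULL);
	old = *buf;
	*buf = malloc(size);
	if (!*buf) {
		*buf = old; /* keep the old allocation on failure */
		return -ENOMEM;
	} else {
		free(old);
	}
	return 0;
}

/* After: same behavior, one less level of nesting. */
static int replace_buf_flat(void **buf, size_t size)
{
	void *old;

	assert(buf != NULL);
	old = *buf;
	*buf = malloc(size);
	if (!*buf) {
		*buf = old;
		return -ENOMEM;
	}
	/* Free old allocation, new allocation was successful. */
	free(old);
	return 0;
}
```

When applying this pattern, the statements that were inside the else block also move out by one indentation level.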


^ permalink raw reply related

* Re: [syzbot] [net?] WARNING in tls_err_abort
From: Sabrina Dubroca @ 2026-06-16 21:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: syzbot, davem, edumazet, horms, john.fastabend, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <20260616082816.4dd0f035@kernel.org>

2026-06-16, 08:28:16 -0700, Jakub Kicinski wrote:
> On Tue, 16 Jun 2026 17:19:22 +0200 Sabrina Dubroca wrote:
> > I suspect err==0, and sock_error() consumed sk_err in between (the
> > alternative would be err > 0).
> > 
> > Something like this?
> 
> Makes sense, but what's eating sk_err?

The 2 remaining sock_error() in tls_rx_rec_wait()? [1]

> Don't we depend on it being set
> to avoid further state transitions once we hit a crypto error?

I kind of thought so too.

> I thought that's why we don't consume sk_err in recvmsg and sendmsg in
> the first place (we are not calling sock_error() anywhere)

Umm...
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/tls/tls_sw.c#n1095

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned folio
From: Bobby Eshleman @ 2026-06-16 20:57 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann,
	Sumit Semwal, Christian König, Shuah Khan, Jason Gunthorpe,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kselftest@vger.kernel.org,
	sdf@fomichev.me, razor@blackwall.org, daniel@iogearbox.net,
	almasrymina@google.com, matttbe@kernel.org, skhawaja@google.com,
	dw@davidwei.uk, Bobby Eshleman
In-Reply-To: <IA0PR11MB71852246277F773AC41DAAA3F8E52@IA0PR11MB7185.namprd11.prod.outlook.com>

On Tue, Jun 16, 2026 at 06:04:03AM +0000, Kasireddy, Vivek wrote:
> Adding Jason to this discussion.
> 
> Hi Bobby,
> 
> > Subject: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned
> > folio
> > 
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> > underlying folio was larger.
> > 
> > Instead, walk folios[] and emit one sg entry per folio. When folios
> We have recently merged a patch (that will make it into 7.2) from Jason that
> replaced sg_set_folio() with sg_alloc_table_from_pages() in udmabuf driver:
> https://gitlab.freedesktop.org/drm/tip/-/commit/5bf888673e0dda5a53220fa0c4956271a46c353c
> 
> Since you are relying on sg_set_folio(), the core argument against its usage
> in udmabuf is that it doesn't work well with offsets > PAGE_SIZE, resulting
> in a malformed scatterlist. Not sure if this can be fixed easily.
> 
> > represent large pages (as is for MFD_HUGETLB), each sg entry is a large
> > page. Normal PAGE_SIZE sg tables are unchanged.
> > 
> > This is helpful for importers like net/core/devmem that expect dmabuf sg
> IMO, udmabuf needs to detect whether importers can handle segments that
> are > PAGE_SIZE and set the entries appropriately. Please look into how the
> GPU drivers and other dmabuf exporters/importers handle this situation, so
> that we can adopt best practices to address this issue.
> 
> Thanks,
> Vivek

Hey Vivek,

It sounds like that patch might solve my problem. I'll apply and
troubleshoot from there.

Thanks!

Best,
Bobby

> 
> > entries to be size and length aligned. Prior to this patch udmabuf
> > handed over one PAGE_SIZE sg entry per page, so devmem only saw
> > PAGE_SIZE chunks regardless of the underlying folio size.
> > 
> > dma_map_sgtable() does not always merge contiguous pages for us, so we
> > do this internally before exporting.
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> >  drivers/dma-buf/udmabuf.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 47 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> > index 94b8ecb892bb..9b751dd98b12 100644
> > --- a/drivers/dma-buf/udmabuf.c
> > +++ b/drivers/dma-buf/udmabuf.c
> > @@ -141,26 +141,68 @@ static void vunmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
> >  	vm_unmap_ram(map->vaddr, ubuf->pagecount);
> >  }
> > 
> > +/* Return the number of contiguous pages backed by the folio at @i.
> > + * A udmabuf may map only part of a folio, or reference the same folio
> > + * in multiple non-contiguous runs, so folio_nr_pages() can't be used.
> > + */
> > +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t i)
> > +{
> > +	struct folio *f = ubuf->folios[i];
> > +	pgoff_t j;
> > +
> > +	for (j = 1; i + j < ubuf->pagecount; j++) {
> > +		if (ubuf->folios[i + j] != f)
> > +			break;
> > +		/* Same folio, but not a sequential offset within it. */
> > +		if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * PAGE_SIZE)
> > +			break;
> > +	}
> > +	return j;
> > +}
> > +
> > +/* Count the contiguous folio runs in @ubuf, one sg entry per run.
> > + *
> > + * Coalescing folios into a single sg entry up front lets importers actually
> > + * see large chunks. We can't rely on dma_map_sgtable() to do this for us as
> > + * the dma_map_direct() path preserves the input scatterlist lengths verbatim.
> > + */
> > +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
> > +{
> > +	unsigned int nents = 0;
> > +	pgoff_t i;
> > +
> > +	for (i = 0; i < ubuf->pagecount; i += udmabuf_folio_nr_pages(ubuf, i))
> > +		nents++;
> > +	return nents;
> > +}
> > +
> >  static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
> >  				     enum dma_data_direction direction)
> >  {
> >  	struct udmabuf *ubuf = buf->priv;
> > -	struct sg_table *sg;
> >  	struct scatterlist *sgl;
> > -	unsigned int i = 0;
> > +	struct sg_table *sg;
> > +	pgoff_t i, run;
> > +	unsigned int nents;
> >  	int ret;
> > 
> > +	nents = udmabuf_sg_nents(ubuf);
> > +
> >  	sg = kzalloc_obj(*sg);
> >  	if (!sg)
> >  		return ERR_PTR(-ENOMEM);
> > 
> > -	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> > +	ret = sg_alloc_table(sg, nents, GFP_KERNEL);
> >  	if (ret < 0)
> >  		goto err_alloc;
> > 
> > -	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> > -		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> > +	sgl = sg->sgl;
> > +	for (i = 0; i < ubuf->pagecount; i += run) {
> > +		run = udmabuf_folio_nr_pages(ubuf, i);
> > +		sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
> >  			     ubuf->offsets[i]);
> > +		sgl = sg_next(sgl);
> > +	}
> > 
> >  	ret = dma_map_sgtable(dev, sg, direction, 0);
> >  	if (ret < 0)
> > 
> > --
> > 2.53.0-Meta
> 
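The run-coalescing rule in the quoted patch can be checked with a userspace model. `struct toy_ubuf`, `toy_run_len()` and `toy_sg_nents()` are hypothetical stand-ins: folios are represented by integer identities and offsets by plain byte counts. The model only counts runs; it does not address the sg_set_folio() offset concern raised above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct udmabuf. */
struct toy_ubuf {
	const int *folios;            /* folio identity of page i */
	const unsigned long *offsets; /* byte offset of page i inside its folio */
	size_t pagecount;
};

/* Same rule as udmabuf_folio_nr_pages() in the patch: extend the run while
 * the next page is in the same folio at the next sequential offset.
 */
static size_t toy_run_len(const struct toy_ubuf *u, size_t i)
{
	size_t j;

	assert(i < u->pagecount);
	for (j = 1; i + j < u->pagecount; j++) {
		if (u->folios[i + j] != u->folios[i])
			break;
		if (u->offsets[i + j] != u->offsets[i] + j * TOY_PAGE_SIZE)
			break;
	}
	return j;
}

/* Same as udmabuf_sg_nents(): one sg entry per contiguous run. */
static size_t toy_sg_nents(const struct toy_ubuf *u)
{
	size_t i, nents = 0;

	for (i = 0; i < u->pagecount; i += toy_run_len(u, i))
		nents++;
	return nents;
}
```

For example, pages mapping folios {1, 1, 1, 2, 1, 1} at offsets {0, 4K, 8K, 0, 0, 4K} give three entries of 3, 1 and 2 pages, because folio 1 is referenced in two separate runs.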

^ permalink raw reply

* Re: [PATCH net] octeontx2-af: Validate NIX maximum LFs correctly
From: Subbaraya Sundeep @ 2026-06-16 20:51 UTC (permalink / raw)
  To: Simon Horman
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, netdev, linux-kernel
In-Reply-To: <20260603170759.GG3766816@horms.kernel.org>

Hi Simon,

On 2026-06-03 at 22:37:59, Simon Horman (horms@kernel.org) wrote:
> On Wed, Jun 03, 2026 at 05:59:46PM +0100, Simon Horman wrote:
> > From: 'Simon Horman' <horms@kernel.org>
> > 
> > This is an AI-generated review of your patch. The human sending this
> > email has considered the AI review valid, or at least plausible.
> > Full review at: https://sashiko.dev
> 
> Sorry, my tooling isn't helping me much today.
> There was supposed to be some commentary in my previous email.
> Let me add it.
> 
> > ---
> > octeontx2-af: Validate NIX maximum LFs correctly
> > 
> > This patch changes the validation condition for setting the maximum
> > number of NIX LFs via devlink. It replaces the check for assigned MCAM
> > entries with a check for assigned NIX LFs to determine if resizing
> > is allowed.
> > 
> > > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> > > index 6494a9ee2f0dd..8ba4b5ba9d2c7 100644
> > > --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> > > +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> > [ ... ]
> > > @@ -1521,11 +1523,18 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
> > >  	/* Do not allow user to modify maximum NIX LFs while mcam entries
> > >  	 * have already been assigned.
> > >  	 */
> > > -	mcam = &rvu->hw->mcam;
> > > -	if (mcam->bmap_fcnt < mcam->bmap_entries) {
> > > -		NL_SET_ERR_MSG_MOD(extack,
> > > -				   "mcam entries have already been assigned, can't resize");
> > > -		return -EPERM;
> > 
> 
> The issue below looks like it should be looked at in the context of this
> patch.
> 
> > [Severity: High]
> > Does removing the check for MCAM entries allow the devlink parameter
> > set operation to silently destroy active MCAM tracking state?
> > 
> > The .set callback unconditionally calls npc_mcam_rsrcs_deinit() and
> > npc_mcam_rsrcs_init(), which frees and reallocates the mcam->bmap. Since
> > MCAM entries can be allocated independently of NIX LFs (e.g. during
> > rvu_switch_enable() or default switchdev/PTP rules), could this cause
> > resource leaks and use-after-free bugs when untracked rules are later
> > accessed?
>
Some background:
 Say the NIX hardware supports 128 LFs. Then:
 1. 128 MCAM entries are always reserved by the AF to receive unicast traffic for each LF.
 2. PFs/VFs request an LF attachment during their probe via an mbox message to the AF.
 3. The devlink command to set the number of NIX LFs was added later, at the request of
    customers/users, so that if their system uses only 2 PFs, they can use the remaining
    126 MCAM entries for additional traffic steering.
 4. The only caveat is that the user has to load the PF and VF drivers as modules
    after setting the NIX LFs via the AF devlink as the first command.
> > 
> > > +	blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
> > > +	while (blkaddr) {
> > > +		block = &rvu->hw->block[blkaddr];
> 
> While the one below seems like it can be assessed in the context
> of possible follow-up.
> 
> > [Severity: High]
> > This is a pre-existing issue, but does this locklessly read the resource
> > states?
> > 
> > It appears that the devlink validate and set callbacks execute without
> > holding the driver's resource locks (rvu->rsrc_lock or mcam->lock).
> > Could a concurrent mailbox request allocate an LF immediately after this
> > lockless validation check succeeds, leading to a race where the .set
> > callback frees structures that are actively being used?
> > 
Given the background above, this is not possible.
> > > +
> > > +		free_lfs = rvu_rsrc_free_count(&block->lf);
> > > +		if (free_lfs != block->lf.max) {
> > > +			NL_SET_ERR_MSG_MOD(extack,
> > > +					   "mcam entries have already been assigned, can't resize");
> > 
> 
> The one below seems less important. But perhaps it would be good
> to address if there is a v2 of this patch.
> 
> > [Severity: Medium]
> > Since the validation logic was updated to evaluate NIX LF allocations,
> > should this error message and the comment above be updated as well? The
> > message still states that "mcam entries have already been assigned", which
> > might confuse users since it's actually evaluating NIX LFs.
> > 
Sure, will modify the message.

Thanks,
Sundeep

> > > +			return -EPERM;
> > > +		}
> > > +
> > > +		blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
> > >  	}
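The validation rule in the patch and the arithmetic in the background above can be modeled in a few lines of userspace C. `struct toy_nix_block` and both functions are hypothetical; the loop mirrors the patch's check that every NIX block still has all of its LFs free (`free_lfs == block->lf.max`) before a resize is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block LF accounting. */
struct toy_nix_block {
	unsigned int lf_max;  /* total LFs in the block */
	unsigned int lf_free; /* LFs not yet attached to a PF/VF */
};

/* Mirrors the patch: refuse to resize if any NIX block has an LF attached. */
static bool toy_maxlf_resize_allowed(const struct toy_nix_block *blocks,
				     size_t nblocks)
{
	size_t b;

	for (b = 0; b < nblocks; b++) {
		assert(blocks[b].lf_free <= blocks[b].lf_max);
		if (blocks[b].lf_free != blocks[b].lf_max)
			return false;
	}
	return true;
}

/* Per the background above: one unicast MCAM entry is reserved per NIX LF,
 * so lowering the LF count releases the difference for other steering rules.
 */
static unsigned int toy_mcam_entries_released(unsigned int hw_lfs,
					      unsigned int wanted_lfs)
{
	assert(wanted_lfs <= hw_lfs);
	return hw_lfs - wanted_lfs;
}
```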

^ permalink raw reply

* Re: [PATCH v1 bpf-next 0/2] bpf: bpf_redirect_peer egress redirection
From: Jordan Rife @ 2026-06-16 20:49 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Stanislav Fomichev
In-Reply-To: <ajAXF8Nvg91xU4f2@mail.gmail.com>

> IMO, calling it BPF_F_EGRESS would be less confusing. It's a shame we
> can't have the same flag API between bpf_redirect() and
> bpf_redirect_peer(), but this is creating inconsistent semantics for
> the terms egress/ingress across the two helpers.

Yeah, one annoying thing about BPF_F_EGRESS is that it would only
apply to bpf_redirect_peer, so you still have inconsistencies across
helpers. Perhaps this is less weird than having BPF_F_INGRESS perform
an egress redirection though.

Jordan

^ permalink raw reply

* Re: [PATCH net v2 1/2] iov_iter: export iov_iter_restore
From: Jens Axboe @ 2026-06-16 20:47 UTC (permalink / raw)
  To: Octavian Purdila, netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo
In-Reply-To: <20260613000953.467473-2-tavip@google.com>

On 6/12/26 6:09 PM, Octavian Purdila wrote:
> Export iov_iter_restore so that it can be used by modules.
> 
> This is needed by the virtio vsock transport (which can be built as a
> module) to restore the msg_iter state when transmission fails.
> 
> Signed-off-by: Octavian Purdila <tavip@google.com>
> ---
>  lib/iov_iter.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 243662af1af73..067e745f9ef53 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
>  		i->__iov -= state->nr_segs - i->nr_segs;
>  	i->nr_segs = state->nr_segs;
>  }
> +EXPORT_SYMBOL(iov_iter_restore);

I don't have a problem exporting this to modules, but any new export
should be _GPL. So please change it to that.
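For readers unfamiliar with the helper: the pattern the series relies on is to snapshot the iterator before a transmit attempt (in the kernel, typically with iov_iter_save_state()) and roll it back with iov_iter_restore() if the attempt fails. A userspace model; `struct toy_iter`, `struct toy_iter_state` and the functions are hypothetical and far simpler than the real struct iov_iter:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified iterator over one flat buffer. */
struct toy_iter {
	size_t pos;   /* bytes consumed so far */
	size_t count; /* bytes remaining */
};

struct toy_iter_state {
	size_t pos;
	size_t count;
};

static void toy_iter_save(const struct toy_iter *it, struct toy_iter_state *st)
{
	st->pos = it->pos;
	st->count = it->count;
}

static void toy_iter_restore(struct toy_iter *it, const struct toy_iter_state *st)
{
	it->pos = st->pos;
	it->count = st->count;
}

static void toy_iter_advance(struct toy_iter *it, size_t n)
{
	assert(n <= it->count);
	it->pos += n;
	it->count -= n;
}

/* Consume up to @n bytes for a send attempt; on failure, roll the iterator
 * back so the caller can retry with the same data. Returns bytes sent or -1.
 */
static long toy_send(struct toy_iter *it, size_t n, int fail)
{
	struct toy_iter_state st;

	if (n > it->count)
		n = it->count;
	toy_iter_save(it, &st);
	toy_iter_advance(it, n); /* data copied into a packet */
	if (fail) {
		toy_iter_restore(it, &st);
		return -1;
	}
	return (long)n;
}
```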

-- 
Jens Axboe

^ permalink raw reply
