Netdev List
* [PATCH bpf v3 1/2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Sechang Lim @ 2026-06-18 10:27 UTC
  To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski
  Cc: Simon Horman, Bobby Eshleman, Jiayuan Chen, netdev, bpf,
	linux-kernel
In-Reply-To: <20260618102718.2331468-1-rhkrqnwk98@gmail.com>

sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
to find the length of the next message. strparser assembles a message out
of several received skbs by chaining them onto the head's frag_list and
recording where to append the next one in strp->skb_nextp:

	*strp->skb_nextp = skb;
	strp->skb_nextp = &skb->next;

and then calls the parser on the head:

	len = (*strp->cb.parse_msg)(strp, head);

The parser is only meant to inspect the skb, but the program may call
bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
Once the head carries a frag_list these go

	... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail

and __pskb_pull_tail() frees the frag_list skbs that strparser still
tracks through skb_nextp:

	while ((list = skb_shinfo(skb)->frag_list) != insp) {
		skb_shinfo(skb)->frag_list = list->next;
		consume_skb(list);
	}

strp->skb_nextp now points into a freed sk_buff. The next segment of
the same message arrives in __strp_recv(), which links it with
*strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
and the write happen in different __strp_recv() calls, so the message
has to span at least three segments before it triggers.

  BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
  Write of size 8 at addr ffff88810db86140 by task repro/349

  Call Trace:
   <IRQ>
   __strp_recv+0x447/0xda0
   __tcp_read_sock+0x13d/0x590
   tcp_bpf_strp_read_sock+0x195/0x320
   strp_data_ready+0x267/0x340
   sk_psock_strp_data_ready+0x1ce/0x350
   tcp_data_queue+0x1364/0x2fd0
   tcp_rcv_established+0xe07/0x1640
   [...]

  Allocated by task 349:
   skb_clone+0x17b/0x210
   __strp_recv+0x2c3/0xda0
   __tcp_read_sock+0x13d/0x590
   [...]

  Freed by task 349:
   kmem_cache_free+0x150/0x570
   __pskb_pull_tail+0x57b/0xc20
   skb_ensure_writable+0x236/0x260
   __bpf_skb_change_tail+0x1d4/0x590
   sk_skb_change_tail+0x2a/0x40
   bpf_prog_1b285dcd6c41373e+0x27/0x30
   bpf_prog_run_pin_on_cpu+0xf3/0x260
   sk_psock_strp_parse+0x118/0x1e0
   __strp_recv+0x4f6/0xda0
   [...]
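
The ordering described above (the free happens in one __strp_recv() call, the
write in the next) can be modelled in user space. This is only a sketch under
stated assumptions: struct seg, struct parser, pull_frag_list(),
recv_segment() and first_stale_segment() are hypothetical names, not kernel
code, and a freed segment is only flagged rather than actually released so
that the model itself never touches freed memory. It also assumes the parser
runs on every segment, which the kernel only does while the message length is
still unknown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an sk_buff; not the kernel structure. */
struct seg {
	struct seg *next;
	struct seg *frag_list;	/* only meaningful on the head */
	bool freed;		/* flagged instead of really freed */
};

/* Toy stand-in for the strparser state. */
struct parser {
	struct seg *head;
	struct seg **nextp;	/* slot where the next segment gets linked */
	struct seg *slot_owner;	/* segment that contains *nextp */
};

/* Models the pull: every segment on the head's frag_list is consumed. */
static void pull_frag_list(struct seg *head)
{
	struct seg *s;

	while ((s = head->frag_list) != NULL) {
		head->frag_list = s->next;
		s->freed = true;
	}
}

/* Returns true if linking this segment would write into a freed segment. */
static bool recv_segment(struct parser *p, struct seg *s, bool parser_resizes)
{
	if (!p->head) {
		p->head = s;
		p->nextp = &s->frag_list;
		p->slot_owner = s;
	} else {
		if (p->slot_owner->freed)
			return true;	/* stale nextp: the kernel would write here */
		*p->nextp = s;
		p->nextp = &s->next;
		p->slot_owner = s;
	}
	/* The parser runs on the head after the segment has been linked. */
	if (parser_resizes)
		pull_frag_list(p->head);
	return false;
}

/* 1-based index of the first segment whose link is stale, or 0 if none. */
static int first_stale_segment(int nsegs, bool parser_resizes)
{
	struct seg segs[8] = { 0 };
	struct parser p = { 0 };
	int i;

	for (i = 0; i < nsegs && i < 8; i++) {
		if (recv_segment(&p, &segs[i], parser_resizes))
			return i + 1;
	}
	return 0;
}
```

In this model the second segment is the first one that can be freed, and the
stale link only happens when a third segment arrives, which matches the
three-segment requirement stated above.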

The same resize also leaves the head's length inconsistent with its
frags, so a later __pskb_pull_tail() can instead hit the
BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.

A stream parser is only meant to measure the next message, not to modify
the packet. Reject a parser whose program can change packet data
(prog->aux->changes_pkt_data) at attach time. The check is shared by
sock_map_prog_update() and sock_map_link_update_prog(), which between them
cover prog attach, link create and link update. Verdict programs are
unaffected and may still modify the skb.
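
As a rough user-space illustration of which combinations the attach-time check
rejects, consider the sketch below. The names enum attach_kind, struct prog,
attach_check() and try_attach() are hypothetical stand-ins for
enum bpf_attach_type, struct bpf_prog and the helper added by this patch; only
the decision logic is mirrored.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's attach type and program. */
enum attach_kind { ATTACH_STREAM_PARSER, ATTACH_STREAM_VERDICT };

struct prog {
	bool changes_pkt_data;	/* models prog->aux->changes_pkt_data */
};

/* Reject only a stream parser that can change packet data. */
static int attach_check(enum attach_kind kind, const struct prog *prog)
{
	if (prog && kind == ATTACH_STREAM_PARSER && prog->changes_pkt_data)
		return -EINVAL;
	return 0;
}

/* Convenience wrapper: build a program description and try to attach it. */
static int try_attach(enum attach_kind kind, bool changes_pkt_data)
{
	struct prog p = { .changes_pkt_data = changes_pkt_data };

	return attach_check(kind, &p);
}
```

A NULL program (the detach case) always passes, and a verdict program passes
whether or not it modifies the packet.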

Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
---
 net/core/sock_map.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 99e3789492a0..c60ba6d292f9 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1515,6 +1515,17 @@ static int sock_map_prog_link_lookup(struct bpf_map *map, struct bpf_prog ***ppr
 	return 0;
 }
 
+static int sock_map_prog_attach_check(enum bpf_attach_type attach_type,
+				      struct bpf_prog *prog)
+{
+	/* A stream parser must not modify the skb, only measure it. */
+	if (prog && attach_type == BPF_SK_SKB_STREAM_PARSER &&
+	    prog->aux->changes_pkt_data)
+		return -EINVAL;
+
+	return 0;
+}
+
 /* Handle the following four cases:
  * prog_attach: prog != NULL, old == NULL, link == NULL
  * prog_detach: prog == NULL, old != NULL, link == NULL
@@ -1533,6 +1544,10 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 	if (ret)
 		return ret;
 
+	ret = sock_map_prog_attach_check(which, prog);
+	if (ret)
+		return ret;
+
 	/* for prog_attach/prog_detach/link_attach, return error if a bpf_link
 	 * exists for that prog.
 	 */
@@ -1776,6 +1791,11 @@ static int sock_map_link_update_prog(struct bpf_link *link,
 		ret = -EINVAL;
 		goto out;
 	}
+
+	ret = sock_map_prog_attach_check(link->attach_type, prog);
+	if (ret)
+		goto out;
+
 	if (!sockmap_link->map) {
 		ret = -ENOLINK;
 		goto out;
-- 
2.43.0



* [PATCH bpf v3 0/2] bpf, sockmap: reject a packet-modifying SK_SKB stream parser
From: Sechang Lim @ 2026-06-18 10:27 UTC
  To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski
  Cc: Simon Horman, Bobby Eshleman, Jiayuan Chen, netdev, bpf,
	linux-kernel

A BPF_PROG_TYPE_SK_SKB stream parser runs on strparser's message head,
which can chain skbs through frag_list. A parser that resizes the skb
frees the frag_list segments that strparser still tracks through
skb_nextp, leading to a use-after-free.

A stream parser is only meant to measure the next message, not to modify
the packet, so reject a packet-modifying parser at attach time rather
than working around the resize at runtime.

v3:
 - reject the parser at attach time instead of cloning the skb at
   runtime (Kuniyuki Iwashima, Jiayuan Chen)
 - add a selftest (Bobby Eshleman)

v2:
 - https://lore.kernel.org/all/20260612123553.2724240-1-rhkrqnwk98@gmail.com/

v1:
 - https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@gmail.com/

Sechang Lim (2):
  bpf, sockmap: fix use-after-free when the stream parser resizes the
    skb
  selftests/bpf: test rejection of a packet-modifying SK_SKB stream
    parser

 net/core/sock_map.c                           | 20 ++++++++++++
 .../selftests/bpf/prog_tests/sockmap_strp.c   | 31 +++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_strp.c   |  7 +++++
 3 files changed, 58 insertions(+)

-- 
2.43.0



* [PATCH bpf-next v2 2/2] selftests/bpf: Cover small conntrack opts error writes
From: Yiyang Chen @ 2026-06-18 10:18 UTC
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781765747.git.chenyy23@mails.tsinghua.edu.cn>

Add a conntrack kfunc regression check for opts__sz values that do not
cover opts->error. The BPF program initializes opts->error with a guard
value, calls the lookup and allocation kfuncs with opts__sz set to
sizeof(opts->netns_id), and verifies that the guard is still intact
after the kfunc returns NULL.

Without the conntrack wrapper guard, the kfunc error path overwrites
that guard with -EINVAL even though the verifier checked only the first
four bytes of the options object.

Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")
Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
---
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 +++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
index b33dba4b126e2..14d4c1793aed5 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
@@ -5,6 +5,8 @@
 #include "test_bpf_nf.skel.h"
 #include "test_bpf_nf_fail.skel.h"
 
+#define CT_OPTS_ERROR_GUARD 0x12345678
+
 static char log_buf[1024 * 1024];
 
 struct {
@@ -119,6 +121,10 @@ static void test_bpf_nf_ct(int mode)
 	ASSERT_EQ(skel->bss->test_einval_reserved_new, -EINVAL, "Test EINVAL for reserved in new struct not set to 0");
 	ASSERT_EQ(skel->bss->test_einval_netns_id, -EINVAL, "Test EINVAL for netns_id < -1");
 	ASSERT_EQ(skel->bss->test_einval_len_opts, -EINVAL, "Test EINVAL for len__opts != NF_BPF_CT_OPTS_SZ");
+	ASSERT_EQ(skel->bss->test_einval_len_opts_small_lookup, CT_OPTS_ERROR_GUARD,
+		  "Test no error write for lookup opts__sz before error field");
+	ASSERT_EQ(skel->bss->test_einval_len_opts_small_alloc, CT_OPTS_ERROR_GUARD,
+		  "Test no error write for alloc opts__sz before error field");
 	ASSERT_EQ(skel->bss->test_eproto_l4proto, -EPROTO, "Test EPROTO for l4proto != TCP or UDP");
 	ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id");
 	ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup");
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
index 076fbf03a1268..df43649ecb785 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
@@ -10,6 +10,8 @@
 #define EINVAL 22
 #define ENOENT 2
 
+#define CT_OPTS_ERROR_GUARD 0x12345678
+
 #define NF_CT_ZONE_DIR_ORIG (1 << IP_CT_DIR_ORIGINAL)
 #define NF_CT_ZONE_DIR_REPL (1 << IP_CT_DIR_REPLY)
 
@@ -19,6 +21,8 @@ int test_einval_reserved = 0;
 int test_einval_reserved_new = 0;
 int test_einval_netns_id = 0;
 int test_einval_len_opts = 0;
+int test_einval_len_opts_small_lookup = 0;
+int test_einval_len_opts_small_alloc = 0;
 int test_eproto_l4proto = 0;
 int test_enonet_netns_id = 0;
 int test_enoent_lookup = 0;
@@ -124,6 +128,28 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
 	else
 		test_einval_len_opts = opts_def.error;
 
+	opts_def.error = CT_OPTS_ERROR_GUARD;
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def.netns_id));
+	if (ct) {
+		bpf_ct_release(ct);
+		test_einval_len_opts_small_lookup = -EINVAL;
+	} else {
+		test_einval_len_opts_small_lookup = opts_def.error;
+	}
+
+	opts_def.error = CT_OPTS_ERROR_GUARD;
+	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		      sizeof(opts_def.netns_id));
+	if (ct) {
+		ct = bpf_ct_insert_entry(ct);
+		if (ct)
+			bpf_ct_release(ct);
+		test_einval_len_opts_small_alloc = -EINVAL;
+	} else {
+		test_einval_len_opts_small_alloc = opts_def.error;
+	}
+
 	opts_def.l4proto = IPPROTO_ICMP;
 	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
 		       sizeof(opts_def));
-- 
2.34.1



* [PATCH bpf-next v2 0/2] bpf: Guard conntrack opts error writes
From: Yiyang Chen @ 2026-06-18 10:18 UTC
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

The conntrack lookup/allocation kfuncs expose an opts/opts__sz pair.
The verifier checks the caller-provided opts__sz range, but the wrappers
currently write opts->error after internal errors even when opts__sz is too
small to include that field.

Patch 1 writes opts->error only when opts__sz includes it, and uses a
single helper to fold ERR_PTR returns into the kfunc ABI result while keeping
the local nfct result variable in each wrapper.
Patch 2 adds a bpf_nf regression check that keeps a guard in opts->error
while passing opts__sz covering only netns_id.

The regression check follows the existing bpf_nf test shape.  Before the
fix, the guard is overwritten with -EINVAL even though opts__sz covers only
the first four bytes of the options object.  After the fix, the kfunc still
returns NULL for the invalid size, but the guard remains intact.

Validation, rebased and tested on bpf-next master e771677c937d
("Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd"):

  git diff --check origin/master..HEAD: OK
  scripts/checkpatch.pl --strict on 1/2 and 2/2: OK
  make O=/root/ebpf-verifier-bug-detection/kernel-build/bpf-next \
    net/netfilter/nf_conntrack_bpf.o: OK
  Focused QEMU direct-runner against XDP and TC lookup/alloc paths:
    unpatched bpf-next e771677c937d: guard overwritten with -EINVAL
    patched v2 007dfd0341cd: guard preserved as 0x12345678
  QEMU upstream bpf_nf selftest with CONFIG_NF_CONNTRACK_MARK,
  CONFIG_NF_CONNTRACK_ZONES, and legacy iptables enabled:
    ./test_progs -t bpf_nf -vv: OK
  git am of exported 1/2 and 2/2 on a fresh worktree at base: OK
  range-diff between branch commits and git-am result: equivalent

Changes in v2:
  - Rebased onto current bpf-next master.
  - Reworked patch 1 to use bpf_ct_opts_result() for the ERR_PTR-to-NULL
    conversion and guarded opts->error write, as suggested by Alexei.
  - Kept the local nfct result variable in each wrapper before returning
    through bpf_ct_opts_result().
  - Added matching Fixes tags to the selftest patch so the regression test
    can be backported with the fix.

v1: https://lore.kernel.org/bpf/cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn/

Yiyang Chen (2):
  bpf: Guard conntrack opts error writes
  selftests/bpf: Cover small conntrack opts error writes

 net/netfilter/nf_conntrack_bpf.c              | 35 +++++++------------
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 ++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 ++++++++++++++
 3 files changed, 45 insertions(+), 22 deletions(-)


base-commit: e771677c937da5808f7b6c1f0e4a97ec1a84f8a8
-- 
2.34.1



* [PATCH bpf-next v2 1/2] bpf: Guard conntrack opts error writes
From: Yiyang Chen @ 2026-06-18 10:18 UTC
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781765747.git.chenyy23@mails.tsinghua.edu.cn>

The conntrack lookup and allocation kfuncs take an opts pointer
together with an opts__sz argument. The verifier checks only the memory
range described by opts__sz, but the wrappers unconditionally write
opts->error whenever the internal lookup or allocation helper returns an
error.

For an invalid size smaller than the end of opts->error, that write can
land outside the verifier-checked range. Keep returning NULL for invalid
arguments, but only report the error through opts->error when the
supplied size includes the field.

This preserves error reporting for the supported 12-byte and 16-byte
layouts, and for other invalid sizes that still include opts->error.
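
The size rule can be tried out in user space with the sketch below. It is an
illustration under assumptions, not the kernel code: struct ct_opts only
reproduces the two leading fields described here (a 4-byte netns_id followed by
error) plus placeholder padding up to 16 bytes, and OFFSETOFEND(),
report_error() and error_after_failure() are hypothetical names standing in for
offsetofend() and the new helper.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: only the first two fields matter for this sketch. */
struct ct_opts {
	int32_t netns_id;
	int32_t error;
	uint8_t rest[8];	/* placeholder for the remaining option bytes */
};

/* User-space equivalent of the kernel's offsetofend(). */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Write the error only if the caller-declared size covers the field. */
static void report_error(struct ct_opts *opts, uint32_t opts_sz, int err)
{
	if (opts_sz >= OFFSETOFEND(struct ct_opts, error))
		opts->error = err;
}

/* Seed a guard value, simulate a failing call, and return what is left. */
static int32_t error_after_failure(uint32_t opts_sz)
{
	struct ct_opts opts = { .error = 0x12345678 };

	report_error(&opts, opts_sz, -EINVAL);
	return opts.error;
}
```

With these assumptions, any declared size below 8 bytes leaves the guard
untouched, while 8 bytes and above (including the 12-byte and 16-byte layouts)
still receive -EINVAL.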

Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")
Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
---
 net/netfilter/nf_conntrack_bpf.c | 35 ++++++++++++--------------------
 1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 40c261cd0af38..f98d1d4b42c3d 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -65,6 +65,15 @@ enum {
 	NF_BPF_CT_OPTS_SZ = 16,
 };
 
+static void *bpf_ct_opts_result(struct bpf_ct_opts *opts, u32 opts__sz, void *ret)
+{
+	if (!IS_ERR(ret))
+		return ret;
+	if (opts__sz >= offsetofend(struct bpf_ct_opts, error))
+		opts->error = PTR_ERR(ret);
+	return NULL;
+}
+
 static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple,
 				 u32 tuple_len, u8 protonum, u8 dir,
 				 struct nf_conntrack_tuple *tuple)
@@ -297,12 +306,7 @@ bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
 
 	nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz,
 				       opts, opts__sz, 10);
-	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
-		return NULL;
-	}
-
-	return (struct nf_conn___init *)nfct;
+	return (struct nf_conn___init *)bpf_ct_opts_result(opts, opts__sz, nfct);
 }
 
 /* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a
@@ -331,11 +335,7 @@ bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
 
 	caller_net = dev_net(ctx->rxq->dev);
 	nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
-	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
-		return NULL;
-	}
-	return nfct;
+	return bpf_ct_opts_result(opts, opts__sz, nfct);
 }
 
 /* bpf_skb_ct_alloc - Allocate a new CT entry
@@ -363,12 +363,7 @@ bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
 
 	net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
 	nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10);
-	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
-		return NULL;
-	}
-
-	return (struct nf_conn___init *)nfct;
+	return (struct nf_conn___init *)bpf_ct_opts_result(opts, opts__sz, nfct);
 }
 
 /* bpf_skb_ct_lookup - Lookup CT entry for the given tuple, and acquire a
@@ -397,11 +392,7 @@ bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
 
 	caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
 	nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
-	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
-		return NULL;
-	}
-	return nfct;
+	return bpf_ct_opts_result(opts, opts__sz, nfct);
 }
 
 /* bpf_ct_insert_entry - Add the provided entry into a CT map
-- 
2.34.1



* [PATCH v1 3/3] thunderbolt: Drop comma after device id array terminator
From: Uwe Kleine-König (The Capable Hub) @ 2026-06-18 10:14 UTC
  To: Mika Westerberg, Yehezkel Bernat, Andreas Noever
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, linux-usb
In-Reply-To: <cover.1781776904.git.u.kleine-koenig@baylibre.com>

The usual style for other device id arrays doesn't have a comma after
the terminating entry.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
---
 drivers/net/thunderbolt/main.c | 2 +-
 drivers/thunderbolt/dma_test.c | 2 +-
 drivers/thunderbolt/stream.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index edfcfc41a316..c1003e06a8bd 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -1455,7 +1455,7 @@ static DEFINE_SIMPLE_DEV_PM_OPS(tbnet_pm_ops, tbnet_suspend, tbnet_resume);
 
 static const struct tb_service_id tbnet_ids[] = {
 	{ TB_SERVICE("network", 1) },
-	{ },
+	{ }
 };
 MODULE_DEVICE_TABLE(tbsvc, tbnet_ids);
 
diff --git a/drivers/thunderbolt/dma_test.c b/drivers/thunderbolt/dma_test.c
index 63e6bbf00e12..519c67678b08 100644
--- a/drivers/thunderbolt/dma_test.c
+++ b/drivers/thunderbolt/dma_test.c
@@ -689,7 +689,7 @@ static const struct dev_pm_ops dma_test_pm_ops = {
 
 static const struct tb_service_id dma_test_ids[] = {
 	{ TB_SERVICE("dma_test", 1) },
-	{ },
+	{ }
 };
 MODULE_DEVICE_TABLE(tbsvc, dma_test_ids);
 
diff --git a/drivers/thunderbolt/stream.c b/drivers/thunderbolt/stream.c
index b28e4e95b422..68d81958262e 100644
--- a/drivers/thunderbolt/stream.c
+++ b/drivers/thunderbolt/stream.c
@@ -1630,7 +1630,7 @@ static const struct dev_pm_ops tbstream_pm_ops = {
 
 static const struct tb_service_id tbstream_ids[] = {
 	{ TB_SERVICE("stream", 1) },
-	{ },
+	{ }
 };
 MODULE_DEVICE_TABLE(tbsvc, tbstream_ids);
 
-- 
2.47.3



* [PATCH v1 1/3] thunderbolt: Stop passing matched device ID to .probe()
From: Uwe Kleine-König (The Capable Hub) @ 2026-06-18 10:14 UTC
  To: Mika Westerberg, Yehezkel Bernat, Andreas Noever
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, linux-usb
In-Reply-To: <cover.1781776904.git.u.kleine-koenig@baylibre.com>

No driver makes use of that parameter, so drop it and don't spend the
effort to determine the matching entry.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
---
 drivers/net/thunderbolt/main.c | 2 +-
 drivers/thunderbolt/dma_test.c | 2 +-
 drivers/thunderbolt/domain.c   | 4 +---
 drivers/thunderbolt/stream.c   | 2 +-
 include/linux/thunderbolt.h    | 2 +-
 5 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index f8f97e8e2226..edfcfc41a316 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -1335,7 +1335,7 @@ static void tbnet_generate_mac(struct net_device *dev)
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 }
 
-static int tbnet_probe(struct tb_service *svc, const struct tb_service_id *id)
+static int tbnet_probe(struct tb_service *svc)
 {
 	struct tb_xdomain *xd = tb_service_parent(svc);
 	struct net_device *dev;
diff --git a/drivers/thunderbolt/dma_test.c b/drivers/thunderbolt/dma_test.c
index 7877319b1b03..63e6bbf00e12 100644
--- a/drivers/thunderbolt/dma_test.c
+++ b/drivers/thunderbolt/dma_test.c
@@ -636,7 +636,7 @@ static void dma_test_debugfs_init(struct tb_service *svc)
 	debugfs_create_file("test", 0200, debugfs_dir, svc, &test_fops);
 }
 
-static int dma_test_probe(struct tb_service *svc, const struct tb_service_id *id)
+static int dma_test_probe(struct tb_service *svc)
 {
 	struct tb_xdomain *xd = tb_service_parent(svc);
 	struct dma_test *dt;
diff --git a/drivers/thunderbolt/domain.c b/drivers/thunderbolt/domain.c
index 479fa4d265c2..24611f05b3cd 100644
--- a/drivers/thunderbolt/domain.c
+++ b/drivers/thunderbolt/domain.c
@@ -77,12 +77,10 @@ static int tb_service_probe(struct device *dev)
 {
 	struct tb_service *svc = tb_to_service(dev);
 	struct tb_service_driver *driver;
-	const struct tb_service_id *id;
 
 	driver = container_of(dev->driver, struct tb_service_driver, driver);
-	id = __tb_service_match(dev, &driver->driver);
 
-	return driver->probe(svc, id);
+	return driver->probe(svc);
 }
 
 static void tb_service_remove(struct device *dev)
diff --git a/drivers/thunderbolt/stream.c b/drivers/thunderbolt/stream.c
index c1f5c55583d0..b28e4e95b422 100644
--- a/drivers/thunderbolt/stream.c
+++ b/drivers/thunderbolt/stream.c
@@ -1540,7 +1540,7 @@ static void tbstream_group_detach_stream(struct tbstream *stream)
 	config_group_put(&sg->group);
 }
 
-static int tbstream_probe(struct tb_service *svc, const struct tb_service_id *id)
+static int tbstream_probe(struct tb_service *svc)
 {
 	struct tbstream *stream;
 
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index feb1af175cfd..d9dec4322aa0 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -465,7 +465,7 @@ static inline struct tb_service *tb_to_service(struct device *dev)
  */
 struct tb_service_driver {
 	struct device_driver driver;
-	int (*probe)(struct tb_service *svc, const struct tb_service_id *id);
+	int (*probe)(struct tb_service *svc);
 	void (*remove)(struct tb_service *svc);
 	void (*shutdown)(struct tb_service *svc);
 	const struct tb_service_id *id_table;
-- 
2.47.3



* [PATCH v1 0/3] thunderbolt: A few cleanups
From: Uwe Kleine-König (The Capable Hub) @ 2026-06-18 10:14 UTC
  To: Mika Westerberg, Yehezkel Bernat, Andreas Noever
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, linux-usb

Hello,

I'm currently working on a project that includes looking at all device
ID structures from <linux/mod_devicetable.h>. While doing that for
tb_service_id, I spotted these patch opportunities.

These are all non-critical and my quest doesn't depend on them, so
there is no urgency to apply these patches. My suggestion is to apply them
via the thunderbolt tree during the next merge window with an ack from
the network guys.

The first patch touches drivers/net and drivers/thunderbolt. It could
theoretically be split, but that would result in at least 3 commits, which
seems excessive for handling three drivers, so I kept it as a single patch.

The third patch is a style change and so is subjective. Drop it if you
don't like it. Here splitting would be easy, but given that patch #1
already touches the same files, letting these go in together without
splitting seems sensible.

Best regards
Uwe

Uwe Kleine-König (The Capable Hub) (3):
  thunderbolt: Stop passing matched device ID to .probe()
  thunderbolt: Assert that a service driver has a probe callback
  thunderbolt: Drop comma after device id array terminator

 drivers/net/thunderbolt/main.c | 4 ++--
 drivers/thunderbolt/dma_test.c | 4 ++--
 drivers/thunderbolt/domain.c   | 4 +---
 drivers/thunderbolt/stream.c   | 4 ++--
 drivers/thunderbolt/xdomain.c  | 3 +++
 include/linux/thunderbolt.h    | 2 +-
 6 files changed, 11 insertions(+), 10 deletions(-)


base-commit: 4fa3f5fabb30bf00d7475d5a33459ea83d639bf9
-- 
2.47.3



* Re: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Paul Menzel @ 2026-06-18 10:11 UTC
  To: Chia-Lin Kao (AceLan)
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	netdev, linux-kernel
In-Reply-To: <20260618073324.1843310-1-acelan.kao@canonical.com>

Dear Chia-Lin,


Thank you for your patch.

Am 18.06.26 um 09:33 schrieb Chia-Lin Kao (AceLan) via Intel-wired-lan:
> Some systems support MAC passthrough for dock Ethernet controllers by
> having firmware rewrite the receive address registers after the controller
> reset completes.

Please give one example system.

> igc resets the controller before reading RAL0/RAH0, so that reset can
> restore the controller native MAC address temporarily. If the driver reads
> the registers immediately, it can race the firmware rewrite and keep the
> native dock MAC instead of the host passthrough MAC.
> 
> For LMVP devices, poll RAL0/RAH0 after reset and before reading the MAC

What is LMVP?

> address. Stop once the address registers change to another valid Ethernet
> address, allowing firmware a bounded window to complete the passthrough
> update.

What are the downsides of this approach? Longer reset times?

Please add instructions on how to test this.

> Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
> ---
>   drivers/net/ethernet/intel/igc/igc_main.c | 48 +++++++++++++++++++++++
>   1 file changed, 48 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 2c9e2dfd8499..fa9752ed8bc5 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -11,6 +11,7 @@
>   #include <net/pkt_sched.h>
>   #include <linux/bpf_trace.h>
>   #include <net/xdp_sock_drv.h>
> +#include <linux/etherdevice.h>
>   #include <linux/pci.h>
>   #include <linux/mdio.h>
>   
> @@ -69,6 +70,52 @@ static const struct pci_device_id igc_pci_tbl[] = {
>   
>   MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
>   
> +static void igc_read_rar0(struct igc_hw *hw, u8 *addr, u32 *ral, u32 *rah)
> +{
> +	*ral = rd32(IGC_RAL(0));
> +	*rah = rd32(IGC_RAH(0));
> +
> +	addr[0] = *ral & 0xff;
> +	addr[1] = (*ral >> 8) & 0xff;
> +	addr[2] = (*ral >> 16) & 0xff;
> +	addr[3] = (*ral >> 24) & 0xff;
> +	addr[4] = *rah & 0xff;
> +	addr[5] = (*rah >> 8) & 0xff;

This looks like a common pattern, but there does not seem to be a 
generic Linux implementation. Maybe `igc_read_mac_addr()` in 
`drivers/net/ethernet/intel/igc/igc_nvm.c` can be used?

> +}
> +
> +static bool igc_is_lmvp_device(struct pci_dev *pdev)
> +{
> +	switch (pdev->device) {
> +	case IGC_DEV_ID_I225_LMVP:
> +	case IGC_DEV_ID_I226_LMVP:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
> +static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
> +					      struct igc_hw *hw)
> +{
> +	u8 addr[ETH_ALEN] __aligned(2);
> +	u32 orig_ral, orig_rah;
> +	u32 ral, rah;
> +	int i;
> +
> +	if (!igc_is_lmvp_device(pdev))
> +		return;
> +
> +	igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
> +
> +	for (i = 0; i < 100; i++) {
> +		msleep(100);

Up to ten seconds delay(?) sounds excessive. Please elaborate in the 
commit message.

> +		igc_read_rar0(hw, addr, &ral, &rah);
> +		if ((ral != orig_ral || rah != orig_rah) &&
> +		    is_valid_ether_addr(addr))
> +			return;
> +	}

No error in case this didn’t work?

> +}
> +
>   enum latency_range {
>   	lowest_latency = 0,
>   	low_latency = 1,
> @@ -7259,6 +7306,7 @@ static int igc_probe(struct pci_dev *pdev,
>   	 * known good starting state
>   	 */
>   	hw->mac.ops.reset_hw(hw);
> +	igc_wait_for_lmvp_mac_passthrough(pdev, hw);
>   
>   	if (igc_get_flash_presence_i225(hw)) {
>   		if (hw->nvm.ops.validate(hw) < 0) {


Kind regards,

Paul


* Re: [PATCH v2 2/2] drm/xe/xe_drm_ras: Add error-event support in XE drm_ras
From: Raag Jadav @ 2026-06-18 10:07 UTC
  To: Riana Tauro
  Cc: intel-xe, dri-devel, netdev, aravind.iddamsetty, anshuman.gupta,
	rodrigo.vivi, joonas.lahtinen, kuba, simona.vetter, airlied,
	pratik.bari, joshua.santosh.ranjan, ashwin.kumar.kulkarni,
	shubham.kumar, ravi.kishore.koppuravuri, maarten.lankhorst,
	mallesh.koujalagi, soham.purkait
In-Reply-To: <20260611052144.784969-6-riana.tauro@intel.com>

On Thu, Jun 11, 2026 at 10:51:47AM +0530, Riana Tauro wrote:
> Add error-event support in XE drm_ras to notify userspace
> when an error occurs.
> 
> $ sudo ynl --family drm_ras --output-json --subscribe error-notify

Same comment as on the first patch, but up to you.

> {
>     "name": "error-event",
>      "msg": {
>          "device-name": "0000:03:00.0",
>          "node-id": 1,
>          "node-name": "uncorrectable-errors",
>          "error-id": 1,
>          "error-name": "core-compute",
>          "error-value": 1
>      }
> }
> 
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>

Reviewed-by: Raag Jadav <raag.jadav@intel.com>


* Re: [PATCH bpf-next v3 0/3] bpf: bidirectional VLAN support for bpf_fib_lookup()
From: Toke Høiland-Jørgensen @ 2026-06-18 10:07 UTC
  To: Avinash Duduskar, ast, daniel, andrii
  Cc: ameryhung, a.s.protopopov, bpf, davem, dsahern, eddyz87, edumazet,
	emil, eyal.birger, hawk, horms, john.fastabend, jolsa, kpsingh,
	kuba, leon.hwang, linux-kernel, linux-kselftest, martin.lau,
	memxor, netdev, pabeni, rongtao, sdf, shuah, song, yatsenko,
	yonghong.song
In-Reply-To: <20260617224729.1428662-1-avinash.duduskar@gmail.com>

Avinash Duduskar <avinash.duduskar@gmail.com> writes:

> This series adds VLAN awareness to bpf_fib_lookup() in both directions.
> BPF_FIB_LOOKUP_VLAN resolves a VLAN egress to its underlying real device
> plus the VLAN tag (XDP programs need this because VLAN devices have no XDP
> xmit), and BPF_FIB_LOOKUP_VLAN_INPUT runs the lookup as if a tagged frame
> had arrived on the matching VLAN subinterface, for iif policy routing and
> VRF table selection.
>
> The l3mdev/VRF flow-init fix that was patch 1 in v1 and v2 has been split
> out and sent to bpf on its own, since it is an independent Fixes:-tagged
> fix that routes to stable on its own schedule. This series is otherwise
> independent of it: on the default CONFIG_INIT_STACK_ALL_ZERO the VRF
> selftests pass with or without the fix. Only the one full-lookup VRF arm
> ("IPv4 VLAN input, tag selects VRF table") depends on it, and only on
> INIT_STACK_ALL_PATTERN or NONE builds, where the uninitialized
> flowi_l3mdev otherwise misses the l3mdev rule and the lookup falls
> through to the main table. Applying the l3mdev fix first closes that
> window.
>
> Changes v2 -> v3 (all from Toke's review unless noted):
>
> - Split the l3mdev/VRF flow-init fix out to a standalone bpf submission
>   (it was patch 1 in v2).
>
> - Patch 2 (VLAN_INPUT): bpf_fib_vlan_input_dev() returns a
>   struct net_device * with ERR_PTR() for the -EINVAL case and NULL for
>   NOT_FWDED, instead of an int return and a **dev out-parameter.
>
> - Trim the BPF_FIB_LOOKUP_VLAN and BPF_FIB_LOOKUP_VLAN_INPUT UAPI doc
>   blocks, and drop the in-function comments that restated the commit
>   message or the flag doc.
>
> - Patch 1 (VLAN egress): on the skb path without tot_len, the deferred mtu
>   check now runs against the resolved egress (VLAN) device, not the parent
>   params->ifindex was swapped to, so a VLAN device with a smaller mtu than
>   its parent is no longer checked against, or reported as, the parent's
>   larger mtu. Found by the bpf ci bot; this was an open question in v2.
>
> - Patch 3 (selftests): re-run every case through bpf_xdp_fib_lookup() as
>   well, since the feature targets XDP; and flip the no-tot_len mtu arm to
>   expect the VLAN device's mtu after the fix above.
>
> Open questions (defaults chosen, noted here in case a maintainer
> prefers otherwise):
>
> 1. An unmatched, down, or foreign-netns tag returns
>    BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
>    fib_get_table() finds no table, rather than a new return code.
>
> 2. BPF_FIB_LOOKUP_OUTPUT | BPF_FIB_LOOKUP_VLAN_INPUT is rejected with
>    -EINVAL; restricting now keeps relaxing later backward-compatible.
>
> 3. The name BPF_FIB_LOOKUP_VLAN_INPUT reads oddly next to
>    BPF_FIB_LOOKUP_OUTPUT. A pair like _VLAN_EGRESS/_VLAN_INGRESS is an
>    option while nothing is merged.

These three are fine as-is, I think.

> 4. The egress flag leaves a VLAN it cannot reduce to a physical parent
>    plus one tag (QinQ, or a parent in another namespace) as SUCCESS with
>    the VLAN device's ifindex and the vlan fields zero, like a plain
>    lookup. The input side instead fails closed (NOT_FWDED) on the
>    cross-namespace case. An XDP caller cannot xmit on a VLAN device, and
>    a zero h_vlan_proto does not distinguish this result from a physical
>    egress, so returning NOT_FWDED would be safer for XDP. But the two
>    cases differ: a foreign-netns parent is clearly fail-worthy, while a
>    QinQ egress is still a forwardable route (tc xmits on the inner VLAN
>    device), so failing it closed would reject a usable route. Should
>    egress signal NOT_FWDED, for both or only foreign-netns? I left it
>    best-effort, but will change it if you prefer.

This one is a bit more ambiguous. Specifically, the inability for an XDP
program to distinguish between a route that actually targets a physical
device, and one that targets a VLAN device that couldn't be resolved for
whatever reason.

Since this is a new feature that's opt-in, I think I would lean towards
failing lookups with a new error code (BPF_FIB_LKUP_RET_VLAN_FAILURE,
say) if the lookup finds a VLAN device but can't actually resolve the
parent. That way the XDP program can repeat the lookup without the
BPF_FIB_LOOKUP_VLAN flag if it really wants the ifindex of that VLAN
device, but that will be explicit and not hidden.
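
The retry pattern suggested above could look roughly like the sketch
below. It is a userspace model under stated assumptions, not BPF code:
mock_fib_lookup() is a hypothetical stand-in for bpf_fib_lookup(), and
FIB_LOOKUP_VLAN and FIB_RET_VLAN_FAILURE are illustrative stand-ins for
BPF_FIB_LOOKUP_VLAN and the tentatively named
BPF_FIB_LKUP_RET_VLAN_FAILURE, whose real values would only be fixed by
the UAPI header if the proposal is merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not the real UAPI constants. */
#define FIB_LOOKUP_VLAN (1u << 0)

enum fib_ret {
	FIB_RET_SUCCESS = 0,
	FIB_RET_VLAN_FAILURE = 1,	/* VLAN found, parent not resolvable */
};

struct fib_result {
	int ifindex;		/* device the caller should transmit on */
	bool vlan_resolved;	/* true if ifindex is the VLAN's parent */
};

/* Hypothetical stand-in for bpf_fib_lookup(). The route targets the VLAN
 * device vlan_ifindex; parent_ifindex is 0 when the parent cannot be
 * resolved (for example QinQ or a parent in another namespace).
 */
static enum fib_ret mock_fib_lookup(unsigned int flags, int vlan_ifindex,
				    int parent_ifindex, struct fib_result *res)
{
	if (flags & FIB_LOOKUP_VLAN) {
		if (!parent_ifindex)
			return FIB_RET_VLAN_FAILURE;
		res->ifindex = parent_ifindex;
		res->vlan_resolved = true;
		return FIB_RET_SUCCESS;
	}

	res->ifindex = vlan_ifindex;
	res->vlan_resolved = false;
	return FIB_RET_SUCCESS;
}

/* Try the VLAN-aware lookup first; on a VLAN failure, explicitly repeat
 * the lookup without the flag to get the VLAN device's own ifindex.
 */
static enum fib_ret lookup_with_fallback(int vlan_ifindex, int parent_ifindex,
					 struct fib_result *res)
{
	enum fib_ret ret;

	assert(res);
	ret = mock_fib_lookup(FIB_LOOKUP_VLAN, vlan_ifindex, parent_ifindex,
			      res);
	if (ret == FIB_RET_VLAN_FAILURE)
		ret = mock_fib_lookup(0, vlan_ifindex, parent_ifindex, res);

	return ret;
}
```

The point of the design is that the fallback is a visible decision in
the caller: vlan_resolved is only false when the program chose to retry
without the flag.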

-Toke


^ permalink raw reply

* Re: [PATCH v2 1/2] drm/drm_ras: Add drm_ras netlink error event
From: Raag Jadav @ 2026-06-18 10:06 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, dri-devel, netdev, aravind.iddamsetty, anshuman.gupta,
	rodrigo.vivi, joonas.lahtinen, kuba, simona.vetter, airlied,
	pratik.bari, joshua.santosh.ranjan, ashwin.kumar.kulkarni,
	shubham.kumar, ravi.kishore.koppuravuri, maarten.lankhorst,
	mallesh.koujalagi, soham.purkait, Zack McKevitt, Lijo Lazar,
	Hawking Zhang, David S. Miller, Paolo Abeni, Eric Dumazet
In-Reply-To: <20260611052144.784969-5-riana.tauro@intel.com>

On Thu, Jun 11, 2026 at 10:51:46AM +0530, Riana Tauro wrote:
> Define a new netlink event 'error-event' and a new multicast group
> 'error-notify' in drm_ras. Each event contains device name, node and
> error information to identify the error triggering the event.
> 
> Add drm_ras_nl_error_event() to trigger an event from the driver.
> Userspace must subscribe to 'error-notify' to receive 'error-event'
> notifications.
> 
> Usage:
> 
> $ sudo ynl --family drm_ras --subscribe error-notify

...

>  operations:
>    list:
> @@ -124,3 +151,24 @@ operations:
>        do:
>          request:
>            attributes: *id-attrs
> +    -
> +      name: error-event
> +      doc: >-
> +           Notify userspace of an error event.
> +           The event includes the device, node and error information
> +           of the error that triggered the event.
> +      attribute-set: error-event-attrs
> +      mcgrp: error-notify

This looks much closer to "notify:" property, which IIUC it's not. Looking
at some of the existing examples, a better name could be something like
'error-monitor' or 'error-report' to make it a bit distinguishable.

Or perhaps it could be just me without the coffee :(
so I'll leave it to you.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>

> +      event:
> +        attributes:
> +          - device-name
> +          - node-id
> +          - node-name
> +          - error-id
> +          - error-name
> +          - error-value
> +
> +mcast-groups:
> +  list:
> +    -
> +      name: error-notify

^ permalink raw reply

* [PATCH net v2] net: ti: icssg: Fix XSK zero copy TX during application wakeup
From: Meghana Malladi @ 2026-06-18 10:03 UTC (permalink / raw)
  To: diogo.ivo, vadim.fedorenko, haokexin, devnexen, horms,
	jacob.e.keller, m-malladi, pabeni, kuba, edumazet, davem,
	andrew+netdev
  Cc: linux-kernel, netdev, linux-arm-kernel, srk, Vignesh Raghavendra,
	Roger Quadros, danishanwar

emac_xsk_xmit_zc() handles zero-copy TX and is called from NAPI
context. The user application wakes up the kernel when it initiates
a transmit, which triggers NAPI to start processing the TX packets.
The num_tx check inside emac_tx_complete_packets() returns early if
no packets were completed, which prevents the call to
emac_xsk_xmit_zc(). Remove this check so that an application wakeup
can initiate zero-copy transmit traffic.

Add __netif_tx_lock() to ensure that the TX queue is protected
from concurrent access during the transmission of XDP frames.
This fixes a netdev watchdog timeout seen on long runs.

Fixes: e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
---

v2-v1:
- Added back xsk_tx_release() inside emac_xsk_xmit_zc()
- Added a check for budget>0 to protect the AF_XDP path
- Move txq_trans_cond_update() inside xsk_frames_done check
Above changes address the comments given by Jakub Kicinski <kuba@kernel.org>

v1: https://lore.kernel.org/all/20260611185744.2498070-5-m-malladi@ti.com/

 drivers/net/ethernet/ti/icssg/icssg_common.c | 23 ++++++++++----------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index 82ddef9c17d5..6973d4714246 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -93,8 +93,8 @@ void prueth_ndev_del_tx_napi(struct prueth_emac *emac, int num)
 }
 EXPORT_SYMBOL_GPL(prueth_ndev_del_tx_napi);
 
-static int emac_xsk_xmit_zc(struct prueth_emac *emac,
-			    unsigned int q_idx)
+static void emac_xsk_xmit_zc(struct prueth_emac *emac,
+			     unsigned int q_idx)
 {
 	struct prueth_tx_chn *tx_chn = &emac->tx_chns[q_idx];
 	struct xsk_buff_pool *pool = tx_chn->xsk_pool;
@@ -115,7 +115,7 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
 	 * necessary
 	 */
 	if (descs_avail <= MAX_SKB_FRAGS)
-		return 0;
+		return;
 
 	descs_avail -= MAX_SKB_FRAGS;
 
@@ -170,8 +170,8 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
 		num_tx++;
 	}
 
-	xsk_tx_release(tx_chn->xsk_pool);
-	return num_tx;
+	if (num_tx)
+		xsk_tx_release(tx_chn->xsk_pool);
 }
 
 void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
@@ -279,9 +279,6 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
 		num_tx++;
 	}
 
-	if (!num_tx)
-		return 0;
-
 	netif_txq = netdev_get_tx_queue(ndev, chn);
 	netdev_tx_completed_queue(netif_txq, num_tx, total_bytes);
 
@@ -297,16 +294,18 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
 		__netif_tx_unlock(netif_txq);
 	}
 
-	if (tx_chn->xsk_pool) {
-		if (xsk_frames_done)
+	if (budget && tx_chn->xsk_pool) {
+		if (xsk_frames_done) {
 			xsk_tx_completed(tx_chn->xsk_pool, xsk_frames_done);
+			txq_trans_cond_update(netif_txq);
+		}
 
 		if (xsk_uses_need_wakeup(tx_chn->xsk_pool))
 			xsk_set_tx_need_wakeup(tx_chn->xsk_pool);
 
-		netif_txq = netdev_get_tx_queue(ndev, chn);
-		txq_trans_cond_update(netif_txq);
+		__netif_tx_lock(netif_txq, smp_processor_id());
 		emac_xsk_xmit_zc(emac, chn);
+		__netif_tx_unlock(netif_txq);
 	}
 
 	return num_tx;

base-commit: 7d8297e26b4e20b5d1c3c3fe51fe81a1c7fbc823
-- 
2.43.0
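
The control-flow change in this patch can be modelled in a few lines.
The sketch below is a simplified userspace model, not the driver code:
mock_xsk_xmit_zc() is a hypothetical stand-in for emac_xsk_xmit_zc(),
and the two functions keep only the branches that decide whether the
zero-copy transmit runs. Locking and completion accounting are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts calls to the stand-in for emac_xsk_xmit_zc(). */
static int xmit_calls;

static void mock_xsk_xmit_zc(void)
{
	xmit_calls++;
}

/* Flow before the patch: with nothing to complete, the function returns
 * before the zero-copy transmit is ever attempted.
 */
static int tx_complete_before(int num_tx, int budget, bool has_xsk_pool)
{
	assert(num_tx >= 0 && budget >= 0);

	if (!num_tx)
		return 0;

	if (has_xsk_pool)
		mock_xsk_xmit_zc();

	return num_tx;
}

/* Flow after the patch: no early return; the transmit runs whenever there
 * is NAPI budget and an XSK pool is bound to the channel.
 */
static int tx_complete_after(int num_tx, int budget, bool has_xsk_pool)
{
	assert(num_tx >= 0 && budget >= 0);

	if (budget && has_xsk_pool)
		mock_xsk_xmit_zc();

	return num_tx;
}
```

In this model an application wakeup with no pending completions
(num_tx == 0) never reaches the transmit in the "before" flow, while the
"after" flow reaches it as long as budget is non-zero.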


^ permalink raw reply related

* Re: [PATCH net v3] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Lorenzo Bianconi @ 2026-06-18 10:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wayen Yan, netdev, horms, pabeni, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <20260617161951.52abe413@kernel.org>

> On Sun, 14 Jun 2026 07:30:54 +0800 Wayen Yan wrote:
> > In airoha_dev_select_queue(), the expression:
> > 
> >   queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;
> > 
> > implicitly converts to unsigned arithmetic: when skb->priority is 0
> > (the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
> > and UINT_MAX % 8 = 7, routing default best-effort packets to the
> > highest-priority QoS queue. This causes QoS inversion where the
> > majority of traffic on a PON gateway starves actual high-priority
> > flows (VoIP, gaming, etc.).
> > 
> > Fix by guarding the subtraction: when priority is 0, map to queue 0
> > (lowest priority), otherwise apply the original (priority - 1) % 8
> > mapping.
> > 
> > Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
> > Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > Reviewed-by: Joe Damato <joe@dama.to>
> > Signed-off-by: Wayen Yan <win847@gmail.com>
> > ---
> >  drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> > index 31cdb11cd7..d476ef83c3 100644
> > --- a/drivers/net/ethernet/airoha/airoha_eth.c
> > +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> > @@ -1933,7 +1933,7 @@ static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
> >  	 */
> >  	channel = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
> >  	channel = channel % AIROHA_NUM_QOS_CHANNELS;
> > -	queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
> > +	queue = skb->priority ? (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES : 0;
> 
> Hi Lorenzo, is there a reason we're subtracting 1 here in the first
> place? Could be just me, but may be worth adding a comment here.
> 
> Intuitively if we are "narrowing" 16 prios to 8 queues it'd make most
> sense to group the adjacent ones -- divide by two.
> 
> Please respin with some sort of an explanation..

IIRC this is a leftover of the ETS offload support.
I agree it is right to just do:

	queue = skb->priority % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
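
For reference, the three mappings discussed in this thread can be
compared in a small standalone sketch. It assumes a 32-bit unsigned
priority, as for skb->priority, and 8 QoS queues; the function names
are made up for this example and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QOS_QUEUES 8u	/* mirrors AIROHA_NUM_QOS_QUEUES */

/* Original expression: priority 0 wraps to UINT32_MAX before the modulo. */
static uint32_t queue_original(uint32_t priority)
{
	return (priority - 1) % NUM_QOS_QUEUES;
}

/* v3 of the patch: guard the subtraction so that priority 0 maps to 0. */
static uint32_t queue_guarded(uint32_t priority)
{
	return priority ? (priority - 1) % NUM_QOS_QUEUES : 0;
}

/* Plain modulo, as suggested in this reply. */
static uint32_t queue_plain(uint32_t priority)
{
	return priority % NUM_QOS_QUEUES;
}

/* Self-check documenting the results for a few priorities. */
static void check_mappings(void)
{
	assert(queue_original(0) == 7);	/* the reported QoS inversion */
	assert(queue_guarded(0) == 0);
	assert(queue_plain(0) == 0);

	assert(queue_original(1) == 0);
	assert(queue_guarded(1) == 0);
	assert(queue_plain(1) == 1);

	assert(queue_original(8) == 7);
	assert(queue_guarded(8) == 7);
	assert(queue_plain(8) == 0);
}
```

Note that the guarded and plain variants agree for priority 0 but map
every non-zero priority to a different queue, so choosing between them
changes the priority-to-queue assignment for classified traffic.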

@Wayen: can you please respin fixing the issue? Please also add my Acked-by:

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

Regards,
Lorenzo

> 
> >  	queue = channel * AIROHA_NUM_QOS_QUEUES + queue;
> >  
> >  	return queue < dev->num_tx_queues ? queue : 0;
> -- 
> pw-bot: cr

^ permalink raw reply

* Re: [RESEND PATCH v1] net: dsa: motorcomm: add yt92xx dsa driver
From: Kyle Switch @ 2026-06-18  9:59 UTC (permalink / raw)
  To: David Yang
  Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, ming.xu, xiaolin.xu, jianmin.wang, de.ge
In-Reply-To: <CAAXyoMMYxRTwHD6QmpAkspCtiY853KkYuOAUR=qV0v9g5w9v+g@mail.gmail.com>



On 6/17/26 19:15, David Yang wrote:
> On Wed, Jun 17, 2026 at 10:37 AM Kyle Switch <kyle.switch@motor-comm.com> wrote:
>>>> +/* To define the from cpu tag format 8 bytes:
>>>> + *
>>>> + * 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7
>>>> + *|<----------TPID 0x9988---------->|
>>>> + *|<--RESERVE-->|<-----DST PORT---->|
>>>> + *|-|<---------RESERVE------------->|
>>>> + *|<------------------------------->|
>>>> + */
>>>> +#define YT922X_TAG_FORMAT2_NAME "yt922x-8b"
>>>> +#define YT922X_FORMAT2_TAG_LEN                  8
>>>> +#define YT922X_PKT_TYPE          GENMASK(15, 14)
>>>> +#define YT922X_8B_CPUTAG_PKT_FROM_CPU      0x1
>>>> +#define YT922X_8B_CPUTAG_SRC_PORT          GENMASK(6, 2)
>>>> +#define YT922X_8B_CPUTAG_DST_PORTMASK      GENMASK(8, 0)
>>>> +#define YT922X_8B_CPUTAG_DST_PORTMASK_0      BIT(15)
>>>> +#define YT922X_8B_CPUTAG_DST_PORTMASK_0_EN      0x1
>>>> +#define YT922X_8B_CPUTAG_FORCE_DST         BIT(9)
>>>> +#define YT922X_8B_CPUTAG_FORCE_DST_EN      0x1
>>>
>>> If yt922x tag format shares no common with yt921x, make a new tag driver.
>>
>> Ans: thank you for your suggestion, we will consider whether to create a new driver in the new file.
> 
> I'm not an expert in this, but if yt922x tag does support cpu codes
> and priority, please consider updating yt921x tagger to support it,
> even if you don't use or test these features for now.
> 

Ans: By "updating yt921x tagger", do you mean adding CPU code and DSCP priority support to the yt922x tag driver?
We were considering implementing it in a subsequent patch, but either way the yt922x DSA driver will support it when we submit it.

>>>
>>>> +static struct dsa_tag_driver *dsa_tag_driver_array[] = {
>>>> +       &DSA_TAG_DRIVER_NAME(yt921x_netdev_ops),
>>>> +       &DSA_TAG_DRIVER_NAME(yt922x_4b_netdev_ops),
>>>> +       &DSA_TAG_DRIVER_NAME(yt922x_8b_netdev_ops),
>>>> +};
>>>
>>> If both are supported by the chip and 4b does nothing more than 8b
>>> does, do not bother with it.
>>
>> Ans: 4b and 8b dsa tag may have different application scenarios. from my opinion,
>>      1. 4b dsa tag can save 4 bytes of payload
>>      2. 8b dsa tag carry more package info.
> 
> We do not support every tag protocol. For DSA switches,
>   - the conduit interface supports jumbo frames so there is room for
> the DSA header, or
>   - you end up with MTU less than 1500 anyway.
> 4-byte reduction does not make a practical difference here. An
> alternative protocol poses 2x work to everyone else, and unnecessarily
> exposes your driver to interoperability issues, as pointed by Andrew.
> 
> As I've commented before, if there is a particular reason to add
> 4-byte protocol, leave it behind for the moment, and focus on a
> minimal yt922x_dsa_switch_ops + yt922x_netdev_ops for your first
> patchset without any offloading supports. This way, others can easily
> see your changes and move the work forward efficiently.

Ans: Thank you for your advice; the 8-byte DSA tag driver will be supported first.


^ permalink raw reply

* Re: [RESEND PATCH v1] net: dsa: motorcomm: add yt92xx dsa driver
From: Kyle Switch @ 2026-06-18  9:53 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Yang, olteanv, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, ming.xu, xiaolin.xu, jianmin.wang, de.ge
In-Reply-To: <a689a734-bfd2-4e8f-85dd-9ae210b3161a@lunn.ch>



On 6/17/26 17:07, Andrew Lunn wrote:
>>>> +#define CMM_PARAM_CHK(expr, err_code)    \
>>>> +       do {                             \
>>>> +               if ((u32)(expr)) {       \
>>>> +                       return err_code; \
>>>> +               }                        \
>>>> +       } while (0)
>>>> +
>>>> +#define CMM_ERR_CHK(op, ret)           \
>>>> +       do {                           \
>>>> +               ret = (op);            \
>>>> +               if (ret != CMM_ERR_OK) \
>>>> +                       return ret;    \
>>>> +       } while (0)
>>>
>>> Do not use macros like this.
>>
>> Ans: Acknowledged, i will consider how to optimize them in the future.
> 
> It is not about optimization. Hiding a return statement in a macro is
> very bad style. It will lead to locking bugs, and resource leaks,
> because nobody knows the return is there.
> 

Ans: This issue will be fixed before the next patch is sent.
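
To illustrate the concern raised above, here is a small userspace
sketch of how a return hidden in a macro can skip cleanup, next to an
explicit goto-based unwind. All names (mock_lock(), mock_hw_op(),
ERR_CHK_HIDDEN() and so on) are made up for this example; the lock is
modelled as a plain counter rather than a real kernel lock.

```c
#include <assert.h>

#define SKETCH_EINVAL 22	/* numeric value of EINVAL on Linux */

static int lock_depth;		/* > 0 while the mock lock is held */

static void mock_lock(void)
{
	lock_depth++;
}

static void mock_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Stand-in for a hardware access that can fail. */
static int mock_hw_op(int fail)
{
	return fail ? -SKETCH_EINVAL : 0;
}

/* Same shape as the quoted CMM_ERR_CHK(): returns from the caller. */
#define ERR_CHK_HIDDEN(op, ret)		\
	do {				\
		ret = (op);		\
		if (ret != 0)		\
			return ret;	\
	} while (0)

/* Looks balanced, but the macro's return skips mock_unlock() on failure. */
static int configure_hidden(int fail)
{
	int ret;

	mock_lock();
	ERR_CHK_HIDDEN(mock_hw_op(fail), ret);
	mock_unlock();
	return 0;
}

/* Explicit control flow: every exit path visibly passes the unlock. */
static int configure_explicit(int fail)
{
	int ret;

	mock_lock();
	ret = mock_hw_op(fail);
	if (ret)
		goto out_unlock;

	/* further configuration steps would go here */

out_unlock:
	mock_unlock();
	return ret;
}
```

When mock_hw_op() fails, configure_hidden() returns with the mock lock
still held, while configure_explicit() releases it on every path.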

>>>> +/*
>>>> + * Macro Definition
>>>> + */
>>>> +#ifndef NULL
>>>> +#define NULL 0
>>>> +#endif
>>>> +
>>>> +#ifndef FALSE
>>>> +#define FALSE 0
>>>> +#endif
>>>> +
>>>> +#ifndef TRUE
>>>> +#define TRUE 1
>>>> +#endif
>>>
>>> Nonsense.
>>
>> Ans: Acknowledge, will be fixed later.
> 
> No. They will be fixed now.
> 

Ans: This issue will be fixed before the next patch is sent.

>>>> +       /* Print chipid here since we are interested in lower 16 bits */
>>>> +       dev_info(dev,
>>>> +                "Motorcomm %s ethernet switch.\n",
>>>> +                info->name);
>>>
>>> Stop copy-n-paste.
>>
>> Ans: Sry for this, i will recheck the code to make sure each line of comments and code
>> meaningful again.
> 
> Also, consider the comments. Do the comments add anything useful which
> is not already obvious from the code. Comments should be about "Why?".
> 
>>>> --- a/include/uapi/linux/if_ether.h
>>>> +++ b/include/uapi/linux/if_ether.h
>>>> @@ -118,7 +118,7 @@
>>>>  #define ETH_P_QINQ1    0x9100          /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>>  #define ETH_P_QINQ2    0x9200          /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>>  #define ETH_P_QINQ3    0x9300          /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>> -#define ETH_P_YT921X   0x9988          /* Motorcomm YT921x DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>> +#define ETH_P_YT92XX   0x9988          /* Motorcomm YT92xx DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>>  #define ETH_P_EDSA     0xDADA          /* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>>  #define ETH_P_DSA_8021Q        0xDADB          /* Fake VLAN Header for DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>>  #define ETH_P_DSA_A5PSW        0xE001          /* A5PSW Tag Value [ NOT AN OFFICIALLY REGISTERED ID ] */
>>>
>>> UAPI stands for User-space API. Do not change it unless there is a
>>> very very good reason.
>>>
>>
>> Ans: The default tpid both yt921x and yt922x is 0x9988. I have modified this to 
>> allow for simultaneous use in both yt922x and yt921x scenarios.
> 
> As pointed out, this is UAPI. Any changes to this file need a good
> explanation how it does not change the user API. Do this break
> backwards compatibility with user space applications? Maybe tcpdump or
> wireshark has a dissector which expects ETH_P_YT921X and you have just
> broken it?
> 

Ans: I now have a better understanding of what UAPI represents.
If a new DSA driver is added in a subsequent patch, we will consider adding a new definition instead of modifying the existing one.

>>>> +#define YT922X_TAG_FORMAT2_NAME "yt922x-8b"
>>>> +#define YT922X_FORMAT2_TAG_LEN                  8
>>>> +#define YT922X_PKT_TYPE          GENMASK(15, 14)
>>>> +#define YT922X_8B_CPUTAG_PKT_FROM_CPU      0x1
>>>> +#define YT922X_8B_CPUTAG_SRC_PORT          GENMASK(6, 2)
>>>> +#define YT922X_8B_CPUTAG_DST_PORTMASK      GENMASK(8, 0)
>>>> +#define YT922X_8B_CPUTAG_DST_PORTMASK_0      BIT(15)
>>>> +#define YT922X_8B_CPUTAG_DST_PORTMASK_0_EN      0x1
>>>> +#define YT922X_8B_CPUTAG_FORCE_DST         BIT(9)
>>>> +#define YT922X_8B_CPUTAG_FORCE_DST_EN      0x1
>>>
>>> If yt922x tag format shares no common with yt921x, make a new tag driver.
>>
>> Ans: thank you for your suggestion, we will consider whether to create a new driver in the new file.
> 
> When you look at other tag drivers, you will also notice some drivers
> implement two taggers in one file. So consider this if there is any
> shared code.
> 

Ans: OK, the tag driver will follow the approach used by the existing tag drivers.

>>>> +static struct dsa_tag_driver *dsa_tag_driver_array[] = {
>>>> +       &DSA_TAG_DRIVER_NAME(yt921x_netdev_ops),
>>>> +       &DSA_TAG_DRIVER_NAME(yt922x_4b_netdev_ops),
>>>> +       &DSA_TAG_DRIVER_NAME(yt922x_8b_netdev_ops),
>>>> +};
>>>
>>> If both are supported by the chip and 4b does nothing more than 8b
>>> does, do not bother with it.
>>
>> Ans: 4b and 8b dsa tag may have different application scenarios. from my opinion,
>>      1. 4b dsa tag can save 4 bytes of payload
>>      2. 8b dsa tag carry more package info.
> 
> How do you plan to swap between the different formats?
> 
> The user perspective is that the machine has a collection of interface
> which are used just as normal, using Linux tools likes like
> iproute2. If the user enables a feature which requires the 8b tag
> format, will you change the format from the DSA driver? And swap back
> to the 4 byte format when the feature is no longer needed?
> 

Ans: After considering your and David's comments and suggestions, we will break this patch into a series of
small patches that include only the 8-byte tag driver for now.
If the 4-byte tag driver scenario is required later, we will use the "change_tag_protocol" mechanism of the DSA driver.

As you mentioned in an earlier mail: "One thing i need to point out. Linux has a long tradition of not
replacing existing code with a new implementation. You take the existing code and step by step improve it."
I want to explain the patch in more detail.

Step 1. We do not attempt to remove the existing driver implementation or change the behavior of existing software.
We will retain the software layer of the existing driver, but encapsulate the hardware operations behind
functional interfaces. The advantage is that this is easy to maintain and makes it easy to support other Motorcomm switch series.

For example, the VLAN add operation in the DSA driver:

Existing code:

yt921x_vlan_add(struct yt921x_priv *priv, int port, u16 vid, bool untagged)
{
 u64 mask64;
 u64 ctrl64;

 mask64 = YT921X_VLAN_CTRL_PORTn(port) |
   YT921X_VLAN_CTRL_PORTS(priv->cpu_ports_mask);
 ctrl64 = mask64;

 mask64 |= YT921X_VLAN_CTRL_UNTAG_PORTn(port);
 if (untagged)
  ctrl64 |= YT921X_VLAN_CTRL_UNTAG_PORTn(port);

 return yt921x_reg64_update_bits(priv, YT921X_VLANn_CTRL(vid),
     mask64, ctrl64);
}

after patch:

yt921x_vlan_add(struct yt921x_priv *priv, int port, u16 vid, bool untagged)
{
 struct yt_port_mask member = {};
 struct yt_port_mask untag = {};

 member.portbits[0] = BIT(port) | priv->cpu_ports_mask;
 if (untagged)
  untag.portbits[0] = BIT(port);

 /* Use the encapsulated interface to complete the hardware configuration.
  * Differences between Motorcomm series are handled in
  * drivers/net/dsa/motorcomm/switch/yt_vlan.c.
  */
 return yt_vlan_port_set(priv->unit, vid, member, untag);
}

Step 2. If Step 1 is accepted, the plan would then be to replace the hardware configuration in the existing DSA driver
with the encapsulated interfaces step by step, one functional module at a time (VLAN, mirror, LAG, etc.), and finally to submit the yt922x DSA driver.

> 	Andrew

^ permalink raw reply

* Re: [PATCH net-next v9 01/10] enic: verify firmware supports V2 SR-IOV at probe time
From: Breno Leitao @ 2026-06-18  9:32 UTC (permalink / raw)
  To: Satish Kharat
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Sesidhar Baddela
In-Reply-To: <20260617-enic-sriov-v2-admin-channel-v2-v9-1-37f5f5af4c93@cisco.com>

On Wed, Jun 17, 2026 at 06:53:24PM -0700, Satish Kharat wrote:
> During PF probe, query the firmware get-supported-feature interface
> to verify that the running firmware supports V2 SR-IOV. Firmware
> version 5.3(4.72) and later report VIC_FEATURE_SRIOV via
> CMD_GET_SUPP_FEATURE_VER. If the firmware does not support the
> feature, set vf_type to ENIC_VF_TYPE_NONE and log a warning so the
> admin knows a firmware upgrade is needed.
> 
> VIC_FEATURE_SRIOV is assigned the explicit value 4 to match the
> firmware ABI.  Slot 3 (firmware's VIC_FEATURE_PTP) is reserved with
> a comment rather than a placeholder enum entry, since PTP is not
> used by the upstream driver.
> 
> Suggested-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Satish Kharat <satishkh@cisco.com>

Reviewed-by: Breno Leitao <leitao@debian.org>

FWIW: net-next is closed now.
https://lore.kernel.org/all/20260615085310.014e4e31@kernel.org/

^ permalink raw reply

* [PATCH net v2] net: ethernet: ti: icssg: guard PA stat lookups
From: Philippe Schenker @ 2026-06-18  9:30 UTC (permalink / raw)
  To: netdev
  Cc: Philippe Schenker, Simon Horman, danishanwar, rogerq,
	linux-arm-kernel, stable, Andrew Lunn, David Carlier,
	David S. Miller, Eric Dumazet, Jacob Keller, Jakub Kicinski,
	Kevin Hao, Meghana Malladi, Paolo Abeni, Vadim Fedorenko,
	linux-kernel

From: Philippe Schenker <philippe.schenker@impulsing.ch>

icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name()
with FW PA stat names regardless of whether the PA stats block is
present on the hardware.  emac_get_stat_by_name() already guards the
PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer
is NULL the lookup falls through to netdev_err() and returns -EINVAL.
Because ndo_get_stats64 is polled regularly by the networking stack
this produces thousands of log entries of the form:

  icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR

A secondary consequence is that the int(-EINVAL) return value is
implicitly widened to a near-ULLONG_MAX unsigned value when accumulated
into the __u64 fields of rtnl_link_stats64, silently corrupting the
rx_errors, rx_dropped and tx_dropped counters reported by `ip -s link`.

Every other PA-aware code path in the driver is already guarded with
the same `if (emac->prueth->pa_stats)` check.  Apply the same guard
here.

Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")
Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
Reviewed-by: Simon Horman <horms@kernel.org>

Cc: danishanwar@ti.com
Cc: rogerq@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org

---

Changes in v2:
- Removed newline between Fixes tag and Signed-off-by
- Use return in if statement to guard so we get rid
  of the 80 char warnings.
- Added Simon's Reviewed-by. Thanks!

 drivers/net/ethernet/ti/icssg/icssg_common.c | 49 +++++++++++---------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index a28a608f9bf4..d9af6419e032 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -1628,28 +1628,35 @@ void icssg_ndo_get_stats64(struct net_device *ndev,
 	stats->rx_over_errors = emac_get_stat_by_name(emac, "rx_over_errors");
 	stats->multicast      = emac_get_stat_by_name(emac, "rx_multicast_frames");
 
-	stats->rx_errors  = ndev->stats.rx_errors +
-			    emac_get_stat_by_name(emac, "FW_RX_ERROR") +
-			    emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
-			    emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
-			    emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
-			    emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
-	stats->rx_dropped = ndev->stats.rx_dropped +
-			    emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
-			    emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
-			    emac_get_stat_by_name(emac, "FW_INF_SAV") +
-			    emac_get_stat_by_name(emac, "FW_INF_SA_DL") +
-			    emac_get_stat_by_name(emac, "FW_INF_PORT_BLOCKED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+	stats->rx_errors  = ndev->stats.rx_errors;
+	stats->rx_dropped = ndev->stats.rx_dropped;
 	stats->tx_errors  = ndev->stats.tx_errors;
-	stats->tx_dropped = ndev->stats.tx_dropped +
-			    emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
-			    emac_get_stat_by_name(emac, "FW_TX_DROPPED_PACKET") +
-			    emac_get_stat_by_name(emac, "FW_TX_TS_DROPPED_PACKET") +
-			    emac_get_stat_by_name(emac, "FW_TX_JUMBO_FRM_CUTOFF");
+	stats->tx_dropped = ndev->stats.tx_dropped;
+
+	if (!emac->prueth->pa_stats)
+		return;
+
+	stats->rx_errors  +=
+			emac_get_stat_by_name(emac, "FW_RX_ERROR") +
+			emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
+			emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
+			emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
+			emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
+	stats->rx_dropped +=
+			emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
+			emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
+			emac_get_stat_by_name(emac, "FW_INF_SAV") +
+			emac_get_stat_by_name(emac, "FW_INF_SA_DL") +
+			emac_get_stat_by_name(emac, "FW_INF_PORT_BLOCKED") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+	stats->tx_dropped +=
+			emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
+			emac_get_stat_by_name(emac, "FW_TX_DROPPED_PACKET") +
+			emac_get_stat_by_name(emac, "FW_TX_TS_DROPPED_PACKET") +
+			emac_get_stat_by_name(emac, "FW_TX_JUMBO_FRM_CUTOFF");
 }
 EXPORT_SYMBOL_GPL(icssg_ndo_get_stats64);
 
-- 
2.54.0

base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
branch: fix-icssg_common-pa-stats-errors__master-7-1

^ permalink raw reply related

* Re: [PATCH net] net: ethernet: ti: icssg: guard PA stat lookups
From: Philippe Schenker @ 2026-06-18  9:29 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, danishanwar, rogerq, linux-arm-kernel, stable,
	Andrew Lunn, David Carlier, David S. Miller, Eric Dumazet,
	Jacob Keller, Jakub Kicinski, Kevin Hao, Meghana Malladi,
	Paolo Abeni, Vadim Fedorenko, linux-kernel
In-Reply-To: <20260618091004.GG827683@horms.kernel.org>

Hi Simon

Thanks for the review and I'll send a v2 with that blank line removed.
Saw it right after sending the patch.

Philippe

On Thu, 2026-06-18 at 10:10 +0100, Simon Horman wrote:
> On Tue, Jun 16, 2026 at 04:35:34PM +0200, Philippe Schenker wrote:
> > From: Philippe Schenker <philippe.schenker@impulsing.ch>
> > 
> > icssg_ndo_get_stats64() unconditionally calls
> > emac_get_stat_by_name()
> > with FW PA stat names regardless of whether the PA stats block is
> > present on the hardware.  emac_get_stat_by_name() already guards
> > the
> > PA stats lookup with `if (emac->prueth->pa_stats)`; when that
> > pointer
> > is NULL the lookup falls through to netdev_err() and returns -
> > EINVAL.
> > Because ndo_get_stats64 is polled regularly by the networking stack
> > this produces thousands of log entries of the form:
> > 
> >   icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR
> > 
> > A secondary consequence is that the int(-EINVAL) return value is
> > implicitly widened to a near-ULLONG_MAX unsigned value when
> > accumulated
> > into the __u64 fields of rtnl_link_stats64, silently corrupting the
> > rx_errors, rx_dropped and tx_dropped counters reported by `ip -s
> > link`.
> > 
> > Every other PA-aware code path in the driver is already guarded
> > with
> > the same `if (emac->prueth->pa_stats)` check.  Apply the same guard
> > here.
> > 
> > Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")
> 
> nit: no blank line between tags
> 
> > 
> > Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
> > 
> > Cc: danishanwar@ti.com
> > Cc: rogerq@kernel.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: stable@vger.kernel.org
> 
> Reviewed-by: Simon Horman <horms@kernel.org>
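
The counter corruption described in the quoted commit message can be
reproduced in plain userspace C. The sketch below is only an illustration
under stated assumptions: lookup_stat(), accumulate_unguarded() and
accumulate_guarded() are hypothetical stand-ins, not driver functions, and
the value 3 for a present statistic is arbitrary.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a stat lookup: returns a small count when the
 * PA stats block is present and -EINVAL when it is not.
 */
static int lookup_stat(int have_pa_stats)
{
	return have_pa_stats ? 3 : -EINVAL;
}

/*
 * Unguarded accumulation: the int result is converted to uint64_t before
 * the addition, so -EINVAL becomes a value just below UINT64_MAX.
 */
static uint64_t accumulate_unguarded(uint64_t counter, int have_pa_stats)
{
	return counter + lookup_stat(have_pa_stats);
}

/* Guarded accumulation: skip the lookup when the stats block is absent. */
static uint64_t accumulate_guarded(uint64_t counter, int have_pa_stats)
{
	if (have_pa_stats)
		counter += lookup_stat(have_pa_stats);
	return counter;
}
```

With a zero counter and no PA stats block, the unguarded variant yields
UINT64_MAX - EINVAL + 1, while the guarded variant leaves the counter at 0.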



* Re: [PATCH net 5/5] afs: Fix uncancelled rxrpc OOB message handler
From: Simon Horman @ 2026-06-18  9:29 UTC (permalink / raw)
  To: dhowells
  Cc: Simon Horman, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, d4n.for.sec, n05ec, jaltman, stable
In-Reply-To: <20260616155749.2125907-6-dhowells@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
afs: Fix uncancelled rxrpc OOB message handler

This commit cancels AFS OOB message processing during shutdown and moves it
to afs_wq. It also prevents queuing if the net namespace is no longer live.

Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com

> diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
> index fd2d260fb25f9..6241f9349f6b6 100644
> --- a/fs/afs/rxrpc.c
> +++ b/fs/afs/rxrpc.c
[ ... ]
> @@ -128,6 +128,7 @@ void afs_close_socket(struct afs_net *net)
>  	_enter("");
>  
>  	cancel_work_sync(&net->charge_preallocation_work);
> +	cancel_work_sync(&net->rx_oob_work);
>  	kernel_listen(net->socket, 0);
>  	flush_workqueue(afs_async_calls);
>  	flush_workqueue(afs_wq);

[Severity: High]
This is a pre-existing issue, but does placing the cancellation and flush
before kernel_sock_shutdown() leave a window open for the work to be queued
afterwards?

If kernel_sock_shutdown() is what synchronously disables the notify_oob
callback from firing, couldn't a delayed callback still queue work on afs_wq
after we've already flushed it here?

[ ... ]
> @@ -985,5 +986,6 @@ static void afs_rx_notify_oob(struct sock *sk, struct sk_buff *oob)
>  {
>  	struct afs_net *net = sk->sk_user_data;
>  
> -	schedule_work(&net->rx_oob_work);
> +	if (net->live)
> +		queue_work(afs_wq, &net->rx_oob_work);
>  }

[Severity: High]
This is a pre-existing issue, but does this lockless check of net->live allow
a race condition that leads to a use-after-free?

Consider this sequence:

CPU1 (teardown)
afs_net_exit()
    net->live = false;
    afs_close_socket()
        cancel_work_sync(&net->rx_oob_work);
        flush_workqueue(afs_wq);
        kernel_sock_shutdown()  // blocks waiting for recvmsg_lock

CPU2 (incoming OOB)
rxrpc_notify_socket_oob()
    spin_lock_irq(&rx->recvmsg_lock);
    afs_rx_notify_oob()
        // Reads net->live == true (stale read, or delayed execution)
        queue_work(afs_wq, &net->rx_oob_work);
    spin_unlock_irq(&rx->recvmsg_lock);

CPU1 then finishes kernel_sock_shutdown() and frees the net namespace, but
afs_wq is never flushed again.

Would moving cancel_work_sync() and flush_workqueue() to the end of
afs_close_socket(), after kernel_sock_shutdown() completes, prevent this race?
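
The ordering question above can be illustrated with a deterministic,
single-threaded model. This is a sketch only: struct model and every
function in it are hypothetical, and it assumes (as the review itself only
asks) that the socket shutdown is what stops the OOB callback from queuing
further work. It does not model locking or real concurrency.

```c
#include <stdbool.h>

struct model {
	bool callback_enabled;	/* cleared by the modelled socket shutdown */
	bool work_pending;	/* set when the modelled OOB callback queues work */
};

/* Modelled OOB notification: queues work only while callbacks are enabled. */
static void notify_oob(struct model *m)
{
	if (m->callback_enabled)
		m->work_pending = true;
}

/* Modelled cancel_work_sync() + flush_workqueue(): nothing stays pending. */
static void cancel_and_flush(struct model *m)
{
	m->work_pending = false;
}

/* Modelled socket shutdown: no callbacks fire afterwards. */
static void sock_shutdown(struct model *m)
{
	m->callback_enabled = false;
}

/* Flush first: a late notification lands between the flush and shutdown. */
static bool teardown_flush_first(void)
{
	struct model m = { .callback_enabled = true, .work_pending = false };

	cancel_and_flush(&m);
	notify_oob(&m);		/* late callback inside the window */
	sock_shutdown(&m);
	return m.work_pending;	/* work is still queued at teardown end */
}

/* Shutdown first: anything queued earlier is drained by the final flush. */
static bool teardown_shutdown_first(void)
{
	struct model m = { .callback_enabled = true, .work_pending = false };

	notify_oob(&m);		/* callback may still fire before shutdown */
	sock_shutdown(&m);
	notify_oob(&m);		/* no effect: callbacks are disabled */
	cancel_and_flush(&m);
	return m.work_pending;
}
```

In this model teardown_flush_first() ends with work still pending, while
teardown_shutdown_first() does not, which is the behaviour the suggested
reordering is meant to achieve.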


* Re: [PATCH net 4/5] afs: Fix further netns teardown to cancel the preallocation charger
From: Simon Horman @ 2026-06-18  9:29 UTC (permalink / raw)
  To: dhowells
  Cc: Simon Horman, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, d4n.for.sec, n05ec, jaltman, stable
In-Reply-To: <20260616155749.2125907-5-dhowells@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
afs: Fix further netns teardown to cancel the preallocation charger

This commit modifies the afs network namespace teardown process to wait for
the preallocation charger work item. It addresses a small window where the
work item could be requeued after being cancelled by flushing the workqueue
on which the charger runs.

Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com

> diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
> --- a/fs/afs/rxrpc.c
> +++ b/fs/afs/rxrpc.c
> @@ -130,6 +130,7 @@ void afs_close_socket(struct afs_net *net)
>  	cancel_work_sync(&net->charge_preallocation_work);
>  	kernel_listen(net->socket, 0);
>  	flush_workqueue(afs_async_calls);
> +	flush_workqueue(afs_wq);

[Severity: High]
Will flushing the system-wide global workqueue here cause cross-namespace
stalls during teardown?

It appears afs_wq is allocated once in afs_init() as a system-wide workqueue
shared by all AFS network namespaces:

fs/afs/main.c:afs_init() {
    ...
    afs_wq = alloc_workqueue("afs", WQ_PERCPU, 0);
    ...
}

Other namespaces use afs_wq for operations like cell management, which can
block for tens of seconds waiting for userspace DNS lookups via the
request_key mechanism.

Since afs_close_socket() runs in the context of the global cleanup_net
worker thread, does this stall the destruction of all network namespaces
system-wide until unrelated AFS work across the entire system finishes?

Would it be safer to explicitly call
cancel_work_sync(&net->charge_preallocation_work) a second time here instead
of flushing the shared queue?


* Re: [PATCH net 1/5] rxrpc: input: reject ACKALL outside transmit phase
From: Simon Horman @ 2026-06-18  9:27 UTC (permalink / raw)
  To: dhowells
  Cc: Simon Horman, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, d4n.for.sec, n05ec, jaltman, stable
In-Reply-To: <20260616155749.2125907-2-dhowells@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
rxrpc: input: reject ACKALL outside transmit phase

rxrpc_input_ackall() accepts ACKALL packets without checking whether
the call is in a state that can legitimately have outstanding transmit
buffers. A forged ACKALL can reach a new service call and trigger a null
pointer dereference.

> diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
> index ce761466b02d6..37881dffa898b 100644
> --- a/net/rxrpc/input.c
> +++ b/net/rxrpc/input.c

[ ... ]

> @@ -1214,8 +1214,22 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
>  static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
>  {
>  	struct rxrpc_ack_summary summary = { 0 };
> +	rxrpc_seq_t top = READ_ONCE(call->tx_top);
> +
> +	switch (__rxrpc_call_state(call)) {
> +	case RXRPC_CALL_CLIENT_SEND_REQUEST:
> +	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
> +	case RXRPC_CALL_SERVER_SEND_REPLY:
> +	case RXRPC_CALL_SERVER_AWAIT_ACK:
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	if (call->tx_bottom == top)
> +		return;
>  
> -	if (rxrpc_rotate_tx_window(call, call->tx_top, &summary))
> +	if (rxrpc_rotate_tx_window(call, top, &summary))

[Severity: High]
This isn't a bug introduced by this patch, but it appears the fix might be
incomplete. The patch prevents a NULL pointer dereference in
rxrpc_input_ackall() by checking if call->tx_bottom == top before calling
rxrpc_rotate_tx_window().

However, does the same vulnerability remain in rxrpc_receiving_reply()?

If a client in the RXRPC_CALL_CLIENT_SEND_REQUEST state receives a forged or
early DATA packet (acting as a reply) before it has queued any transmit data,
call->tx_top and call->tx_bottom will both be 0, and call->tx_queue will be
NULL.

rxrpc_receiving_reply() will then unconditionally call:

    if (!rxrpc_rotate_tx_window(call, top, &summary)) {

Because rxrpc_rotate_tx_window() uses a do { ... } while (before_eq(seq, to))
loop, it executes exactly once for seq = 1 and to = 0, which dereferences
the NULL tq:

    if (tq->bufs[ix]->flags & RXRPC_LAST_PACKET) {

Should the empty window check be placed at the top of rxrpc_rotate_tx_window()
instead, or mirrored in rxrpc_receiving_reply() to protect against early
replies?

>  		rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall);
>  }
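
The "executes exactly once" observation about the do/while loop can be
checked in isolation. The following is a userspace sketch, not the kernel
code: before_eq() is modelled on the usual wrapping serial-number
comparison, and rotate_iterations() and rotate_iterations_checked() are
hypothetical helpers that only count loop passes instead of touching a
transmit queue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number comparison, assumed to match the kernel helper's semantics. */
static bool before_eq(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) <= 0;
}

/* Count how many times a do/while over (bottom, to] runs its body. */
static unsigned int rotate_iterations(uint32_t bottom, uint32_t to)
{
	uint32_t seq = bottom + 1;
	unsigned int n = 0;

	do {
		n++;		/* the real loop would dereference tq here */
		seq++;
	} while (before_eq(seq, to));

	return n;
}

/* Same walk with an empty-window check in front of the loop. */
static unsigned int rotate_iterations_checked(uint32_t bottom, uint32_t to)
{
	if (bottom == to)
		return 0;
	return rotate_iterations(bottom, to);
}
```

For bottom == to == 0 the unchecked loop body still runs once, whereas the
checked variant runs it zero times; for a non-empty window such as bottom
0 and to 3 both variants run the body three times.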


* [PATCH net v2 2/2] dpaa2-switch: fix VLAN upper check not rejecting bridge join
From: Ioana Ciornei @ 2026-06-18  9:28 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
  Cc: f.fainelli, vladimir.oltean, linux-kernel
In-Reply-To: <20260618092813.432535-1-ioana.ciornei@nxp.com>

The blamed commit refactored the prechangeupper event handling but
failed to actually return an error in case
dpaa2_switch_prevent_bridging_with_8021q_upper() detected an 802.1q upper
on a port which tries to join a bridge. Fix this by returning err
instead of 0.

Fixes: 45035febc495 ("net: dpaa2-switch: refactor prechangeupper sanity checks")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
---
Changes in v2:
- none

 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 83ccefdac59f..858ba844ac51 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -2212,7 +2212,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
 	if (err) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Cannot join a bridge while VLAN uppers are present");
-		return 0;
+		return err;
 	}
 
 	netdev_for_each_lower_dev(upper_dev, other_dev, iter) {
-- 
2.25.1
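
The effect of the one-line change can be seen in a small userspace sketch.
Everything here is hypothetical: check_vlan_uppers() stands in for
dpaa2_switch_prevent_bridging_with_8021q_upper(), and -EOPNOTSUPP is an
assumed error code chosen for illustration, not one taken from the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical check: fails when the port already has a VLAN upper. */
static int check_vlan_uppers(bool has_vlan_upper)
{
	return has_vlan_upper ? -EOPNOTSUPP : 0;
}

/* Before the fix: the error is detected but 0 is returned to the caller. */
static int sanity_checks_before(bool has_vlan_upper)
{
	int err = check_vlan_uppers(has_vlan_upper);

	if (err)
		return 0;	/* bug: the bridge join is not rejected */
	return 0;
}

/* After the fix: the error is propagated, so the join is rejected. */
static int sanity_checks_after(bool has_vlan_upper)
{
	int err = check_vlan_uppers(has_vlan_upper);

	if (err)
		return err;
	return 0;
}
```

With a VLAN upper present, sanity_checks_before() reports success and
sanity_checks_after() returns the negative error; without one, both
return 0.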



* [PATCH net v2 1/2] dpaa2-switch: do not accept VLAN uppers while bridged
From: Ioana Ciornei @ 2026-06-18  9:28 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
  Cc: f.fainelli, vladimir.oltean, linux-kernel
In-Reply-To: <20260618092813.432535-1-ioana.ciornei@nxp.com>

The dpaa2-switch driver does not support VLAN uppers while its ports are
bridged. The driver tried to prevent this by rejecting a bridge join
while VLAN uppers exist, but the reverse order was still possible.

This patch adds a check so that the dpaa2-switch also does not accept
VLAN uppers while bridged.

Fixes: f48298d3fbfa ("staging: dpaa2-switch: move the driver out of staging")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
---
Changes in v2:
- patch is new

 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 45f276c2c3ec..83ccefdac59f 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -2233,6 +2233,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
 static int dpaa2_switch_port_prechangeupper(struct net_device *netdev,
 					    struct netdev_notifier_changeupper_info *info)
 {
+	struct ethsw_port_priv *port_priv;
 	struct netlink_ext_ack *extack;
 	struct net_device *upper_dev;
 	int err;
@@ -2251,6 +2252,13 @@ static int dpaa2_switch_port_prechangeupper(struct net_device *netdev,
 
 		if (!info->linking)
 			dpaa2_switch_port_pre_bridge_leave(netdev);
+	} else if (is_vlan_dev(upper_dev)) {
+		port_priv = netdev_priv(netdev);
+		if (port_priv->fdb->bridge_dev) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Cannot accept VLAN uppers while bridged");
+			return -EOPNOTSUPP;
+		}
 	}
 
 	return 0;
-- 
2.25.1



* [PATCH net v2 0/2] dpaa2-switch: reject VLAN uppers while bridged
From: Ioana Ciornei @ 2026-06-18  9:28 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
  Cc: f.fainelli, vladimir.oltean, linux-kernel

The dpaa2-switch driver does not support VLAN uppers on its ports while
they are bridged. The check which should have prevented a port with a
VLAN upper from joining a bridge was poorly refactored and didn't
actually return an error. Patch 2/2 fixes that.

On the other hand, the driver didn't reject the addition of a VLAN upper
while bridged. Patch 1/2 fixes that.

Changes in v2:
- added patch 1/2

Ioana Ciornei (2):
  dpaa2-switch: do not accept VLAN uppers while bridged
  dpaa2-switch: fix VLAN upper check not rejecting bridge join

 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.25.1



