Netdev List
* [PATCH iproute2-next] ipaddress: add support for showing IPv4 devconf attributes
From: Fernando Fernandez Mancera @ 2026-06-12 23:17 UTC
  To: netdev
  Cc: dsahern, stephen, davem, edumazet, kuba, pabeni, horms,
	Fernando Fernandez Mancera

This patch adds support for showing IPv4 devconf attributes in the
detailed output of an interface, e.g. "ip -d link show dev enp1s0".

Additionally, this refactors 'print_af_spec()' to sequentially process
both AF_INET and AF_INET6 attributes rather than returning early if
AF_INET6 is missing.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
Note/question: Is this too verbose? Maybe we should introduce a new
option to query this on its own? I do not think this will scale when
IPv6 is added, although for IPv6 we can limit it to "-6" usage only.
---
 ip/ipaddress.c | 241 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 201 insertions(+), 40 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 6017bc83..b066ec53 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -23,6 +23,7 @@
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
 #include <linux/if_infiniband.h>
+#include <linux/ip.h>
 #include <linux/sockios.h>
 #include <linux/net_namespace.h>
 
@@ -294,53 +295,213 @@ static void print_linktype(FILE *fp, struct rtattr *tb)
 	close_json_object();
 }
 
+static void print_inet(FILE *fp, struct rtattr *inet_attr)
+{
+	struct rtattr *tb[IFLA_INET_MAX + 1];
+
+	parse_rtattr_nested(tb, IFLA_INET_MAX, inet_attr);
+
+	if (tb[IFLA_INET_CONF] && show_details) {
+		int *conf = RTA_DATA(tb[IFLA_INET_CONF]);
+		int max_elements = RTA_PAYLOAD(tb[IFLA_INET_CONF]) / sizeof(int);
+
+		if (max_elements >= IPV4_DEVCONF_FORWARDING)
+			print_string(PRINT_ANY, "forwarding", "forwarding %s ",
+				     conf[IPV4_DEVCONF_FORWARDING - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_MC_FORWARDING)
+			print_string(PRINT_ANY, "mc_forwarding", "mc_forwarding %s ",
+				     conf[IPV4_DEVCONF_MC_FORWARDING - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_PROXY_ARP)
+			print_string(PRINT_ANY, "proxy_arp", "proxy_arp %s ",
+				     conf[IPV4_DEVCONF_PROXY_ARP - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_ACCEPT_REDIRECTS)
+			print_string(PRINT_ANY, "accept_redirects",
+				     "accept_redirects %s ",
+				     conf[IPV4_DEVCONF_ACCEPT_REDIRECTS - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_SECURE_REDIRECTS)
+			print_string(PRINT_ANY, "secure_redirects",
+				     "secure_redirects %s ",
+				     conf[IPV4_DEVCONF_SECURE_REDIRECTS - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_SEND_REDIRECTS)
+			print_string(PRINT_ANY, "send_redirects", "send_redirects %s ",
+				     conf[IPV4_DEVCONF_SEND_REDIRECTS - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_SHARED_MEDIA)
+			print_string(PRINT_ANY, "shared_media", "shared_media %s ",
+				     conf[IPV4_DEVCONF_SHARED_MEDIA - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_RP_FILTER)
+			print_int(PRINT_ANY, "rp_filter", "rp_filter %d ",
+				  conf[IPV4_DEVCONF_RP_FILTER - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_ACCEPT_SOURCE_ROUTE)
+			print_string(PRINT_ANY, "accept_source_route",
+				     "accept_source_route %s ",
+				     conf[IPV4_DEVCONF_ACCEPT_SOURCE_ROUTE - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_BOOTP_RELAY)
+			print_string(PRINT_ANY, "bootp_relay", "bootp_relay %s ",
+				     conf[IPV4_DEVCONF_BOOTP_RELAY - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_LOG_MARTIANS)
+			print_string(PRINT_ANY, "log_martians", "log_martians %s ",
+				     conf[IPV4_DEVCONF_LOG_MARTIANS - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_TAG)
+			print_int(PRINT_ANY, "tag", "tag %d ",
+				  conf[IPV4_DEVCONF_TAG - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_ARPFILTER)
+			print_string(PRINT_ANY, "arpfilter", "arpfilter %s ",
+				     conf[IPV4_DEVCONF_ARPFILTER - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_MEDIUM_ID)
+			print_int(PRINT_ANY, "medium_id", "medium_id %d ",
+				  conf[IPV4_DEVCONF_MEDIUM_ID - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_NOXFRM)
+			print_string(PRINT_ANY, "noxfrm", "noxfrm %s ",
+				     conf[IPV4_DEVCONF_NOXFRM - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_NOPOLICY)
+			print_string(PRINT_ANY, "nopolicy", "nopolicy %s ",
+				     conf[IPV4_DEVCONF_NOPOLICY - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_FORCE_IGMP_VERSION)
+			print_int(PRINT_ANY, "force_igmp_version", "force_igmp_version %d ",
+				  conf[IPV4_DEVCONF_FORCE_IGMP_VERSION - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_ARP_ANNOUNCE)
+			print_int(PRINT_ANY, "arp_announce", "arp_announce %d ",
+				  conf[IPV4_DEVCONF_ARP_ANNOUNCE - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_ARP_IGNORE)
+			print_int(PRINT_ANY, "arp_ignore", "arp_ignore %d ",
+				  conf[IPV4_DEVCONF_ARP_IGNORE - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_PROMOTE_SECONDARIES)
+			print_string(PRINT_ANY, "promote_secondaries",
+				     "promote_secondaries %s ",
+				     conf[IPV4_DEVCONF_PROMOTE_SECONDARIES - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_ARP_ACCEPT)
+			print_int(PRINT_ANY, "arp_accept", "arp_accept %d ",
+				  conf[IPV4_DEVCONF_ARP_ACCEPT - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_ARP_NOTIFY)
+			print_string(PRINT_ANY, "arp_notify", "arp_notify %s ",
+				     conf[IPV4_DEVCONF_ARP_NOTIFY - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_ACCEPT_LOCAL)
+			print_string(PRINT_ANY, "accept_local", "accept_local %s ",
+				     conf[IPV4_DEVCONF_ACCEPT_LOCAL - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_SRC_VMARK)
+			print_string(PRINT_ANY, "src_vmark", "src_vmark %s ",
+				     conf[IPV4_DEVCONF_SRC_VMARK - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_PROXY_ARP_PVLAN)
+			print_string(PRINT_ANY, "proxy_arp_pvlan", "proxy_arp_pvlan %s ",
+				     conf[IPV4_DEVCONF_PROXY_ARP_PVLAN - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_ROUTE_LOCALNET)
+			print_string(PRINT_ANY, "route_localnet", "route_localnet %s ",
+				     conf[IPV4_DEVCONF_ROUTE_LOCALNET - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_BC_FORWARDING)
+			print_string(PRINT_ANY, "bc_forwarding", "bc_forwarding %s ",
+				     conf[IPV4_DEVCONF_BC_FORWARDING - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL)
+			print_int(PRINT_ANY, "igmpv2_unsolicited_report_interval",
+				  "igmpv2_unsolicited_report_interval %d ",
+				  conf[IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL)
+			print_int(PRINT_ANY, "igmpv3_unsolicited_report_interval",
+				  "igmpv3_unsolicited_report_interval %d ",
+				  conf[IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL - 1]);
+
+		if (max_elements >= IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN)
+			print_string(PRINT_ANY, "ignore_routes_with_linkdown",
+				     "ignore_routes_with_linkdown %s ",
+				     conf[IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN - 1] ?
+				     "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST)
+			print_string(PRINT_ANY, "drop_unicast_in_l2_multicast",
+				     "drop_unicast_in_l2_multicast %s ",
+				     conf[IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST - 1] ?
+				     "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_DROP_GRATUITOUS_ARP)
+			print_string(PRINT_ANY, "drop_gratuitous_arp",
+				     "drop_gratuitous_arp %s ",
+				     conf[IPV4_DEVCONF_DROP_GRATUITOUS_ARP - 1] ? "on" : "off");
+
+		if (max_elements >= IPV4_DEVCONF_ARP_EVICT_NOCARRIER)
+			print_string(PRINT_ANY, "arp_evict_nocarrier",
+				     "arp_evict_nocarrier %s ",
+				     conf[IPV4_DEVCONF_ARP_EVICT_NOCARRIER - 1] ? "on" : "off");
+	}
+}
+
 static void print_af_spec(FILE *fp, struct rtattr *af_spec_attr)
 {
-	struct rtattr *inet6_attr;
 	struct rtattr *tb[IFLA_INET6_MAX + 1];
+	struct rtattr *inet6_attr;
+	struct rtattr *inet_attr;
 
-	inet6_attr = parse_rtattr_one_nested(AF_INET6, af_spec_attr);
-	if (!inet6_attr)
-		return;
+	inet_attr = parse_rtattr_one_nested(AF_INET, af_spec_attr);
+	if (inet_attr)
+		print_inet(fp, inet_attr);
 
-	parse_rtattr_nested(tb, IFLA_INET6_MAX, inet6_attr);
+	inet6_attr = parse_rtattr_one_nested(AF_INET6, af_spec_attr);
+	if (inet6_attr) {
+		parse_rtattr_nested(tb, IFLA_INET6_MAX, inet6_attr);
 
-	if (tb[IFLA_INET6_ADDR_GEN_MODE]) {
-		__u8 mode = rta_getattr_u8(tb[IFLA_INET6_ADDR_GEN_MODE]);
-		SPRINT_BUF(b1);
+		if (tb[IFLA_INET6_ADDR_GEN_MODE]) {
+			__u8 mode = rta_getattr_u8(tb[IFLA_INET6_ADDR_GEN_MODE]);
 
-		switch (mode) {
-		case IN6_ADDR_GEN_MODE_EUI64:
-			print_string(PRINT_ANY,
-				     "inet6_addr_gen_mode",
-				     "addrgenmode %s ",
-				     "eui64");
-			break;
-		case IN6_ADDR_GEN_MODE_NONE:
-			print_string(PRINT_ANY,
-				     "inet6_addr_gen_mode",
-				     "addrgenmode %s ",
-				     "none");
-			break;
-		case IN6_ADDR_GEN_MODE_STABLE_PRIVACY:
-			print_string(PRINT_ANY,
-				     "inet6_addr_gen_mode",
-				     "addrgenmode %s ",
-				     "stable_secret");
-			break;
-		case IN6_ADDR_GEN_MODE_RANDOM:
-			print_string(PRINT_ANY,
-				     "inet6_addr_gen_mode",
-				     "addrgenmode %s ",
-				     "random");
-			break;
-		default:
-			snprintf(b1, sizeof(b1), "%#.2hhx", mode);
-			print_string(PRINT_ANY,
-				     "inet6_addr_gen_mode",
-				     "addrgenmode %s ",
-				     b1);
-			break;
+			SPRINT_BUF(b1);
+			switch (mode) {
+			case IN6_ADDR_GEN_MODE_EUI64:
+				print_string(PRINT_ANY,
+					     "inet6_addr_gen_mode",
+					     "addrgenmode %s ",
+					     "eui64");
+				break;
+			case IN6_ADDR_GEN_MODE_NONE:
+				print_string(PRINT_ANY,
+					     "inet6_addr_gen_mode",
+					     "addrgenmode %s ",
+					     "none");
+				break;
+			case IN6_ADDR_GEN_MODE_STABLE_PRIVACY:
+				print_string(PRINT_ANY,
+					     "inet6_addr_gen_mode",
+					     "addrgenmode %s ",
+					     "stable_secret");
+				break;
+			case IN6_ADDR_GEN_MODE_RANDOM:
+				print_string(PRINT_ANY,
+					     "inet6_addr_gen_mode",
+					     "addrgenmode %s ",
+					     "random");
+				break;
+			default:
+				snprintf(b1, sizeof(b1), "%#.2hhx", mode);
+				print_string(PRINT_ANY,
+					     "inet6_addr_gen_mode",
+					     "addrgenmode %s ",
+					     b1);
+				break;
+			}
 		}
 	}
 }
-- 
2.54.0
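
The bounds checks in print_inet() above rely on the IPV4_DEVCONF_*
identifiers being 1-based while the IFLA_INET_CONF payload is a 0-based
int array. A minimal sketch of that check as a helper follows; the
DEMO_ constants and demo_devconf_get() are hypothetical names used only
for illustration and are not part of the patch or of iproute2.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the 1-based IPV4_DEVCONF_* values in <linux/ip.h>;
 * the DEMO_ names are hypothetical and exist only for this sketch. */
enum {
	DEMO_DEVCONF_FORWARDING = 1,
	DEMO_DEVCONF_MC_FORWARDING,
	DEMO_DEVCONF_PROXY_ARP,
};

/* Return 1 and store the value if attribute 'id' fits in the payload,
 * 0 if the payload is shorter than this id requires. */
static int demo_devconf_get(const int *conf, size_t payload_bytes,
			    int id, int *val)
{
	size_t max_elements = payload_bytes / sizeof(int);

	if (id < 1 || (size_t)id > max_elements)
		return 0;

	*val = conf[id - 1];	/* ids are 1-based, the array is 0-based */
	return 1;
}

static void demo_devconf_selfcheck(void)
{
	int conf[2] = { 1, 0 };	/* payload covers ids 1 and 2 only */
	int val = -1;

	assert(demo_devconf_get(conf, sizeof(conf),
				DEMO_DEVCONF_FORWARDING, &val));
	assert(val == 1);
	assert(demo_devconf_get(conf, sizeof(conf),
				DEMO_DEVCONF_MC_FORWARDING, &val));
	assert(val == 0);
	/* id 3 is past the end of the payload, so nothing is read */
	assert(!demo_devconf_get(conf, sizeof(conf),
				 DEMO_DEVCONF_PROXY_ARP, &val));
}
```

Calling demo_devconf_selfcheck() from a test main() exercises both the
in-range and the out-of-range case. A helper of this shape, driven by a
table of (id, name, is_bool) entries, might also be one way to shorten
the long run of if blocks, though that is only a suggestion.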



* Re: [PATCH net-next v3 2/2] rds: convert to getsockopt_iter: manual merge
From: Jakub Kicinski @ 2026-06-13  0:44 UTC
  To: Matthieu Baerts
  Cc: Breno Leitao, Allison Henderson, linux-kernel, netdev, linux-rdma,
	rds-devel, linux-kselftest, kernel-team, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Shuah Khan, Andy Grover,
	Mark Brown, Linux Next Mailing List
In-Reply-To: <b91ff67e-ce74-4edf-a8b0-08be04586485@kernel.org>

On Fri, 12 Jun 2026 13:41:00 +0200 Matthieu Baerts wrote:
> > I was aware of the conflict but didn't realize a note would be helpful
> > for the merge. I should have included one.
> > 
> > Could you point me to an example commit/patch that contains such a note so I
> > can understand the expected format and procedure?  
> 
> In this particular example, I think it would have been easier to have
> waited for the fix to land in net-next -- after the weekly sync with net
> -- and then send the net-next patches.
> 
> When this cannot be avoided, then you can mention the conflict, and
> ideally share a diff of the resolution, plus a description, especially
> when it is not obvious, when simply saying "take the version from X" is
> helpful, when extra modifications are needed, etc. e.g. [1]. Something
> similar to what Mark is usually doing on the linux-next ML, or what I
> did here.

Thanks for explaining! This conflict was avoidable but I didn't find
the appropriately polite explanation within me :)

When conflicting code is _already committed_ to net-next we can deal
with the conflict. If a patch has only been posted but not committed and
we notice a bug, the net-next patch should be explicitly withdrawn and
reposted once the fix has propagated.


* Re: [PATCH bpf-next v3 6/7] bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
From: Kuniyuki Iwashima @ 2026-06-13  0:44 UTC
  To: jiayuan.chen
  Cc: andrii, ast, bpf, cong.wang, daniel, davem, eddyz87, edumazet,
	emil, hawk, horms, ihor.solodrai, jakub, john.fastabend, jolsa,
	kuba, linux-kernel, linux-kselftest, martin.lau, memxor, netdev,
	pabeni, rhkrqnwk98, sdf, shuah, song, yonghong.song,
	Kuniyuki Iwashima
In-Reply-To: <20260612130919.299124-7-jiayuan.chen@linux.dev>

From: Jiayuan Chen <jiayuan.chen@linux.dev>
Date: Fri, 12 Jun 2026 21:07:50 +0800
> From: Sechang Lim <rhkrqnwk98@gmail.com>
> 
> start and len are u32, so
> 
> 	u64 last = start + len;
> 
> evaluates start + len in 32-bit and wraps before storing it in last.
> The bounds check
> 
> 	if (start >= offset + l || last > msg->sg.size)
> 		return -EINVAL;
> 
> can then be passed with an out-of-range start/len, after which the pop
> loop runs off the end of the scatterlist and sk_msg_shift_left() calls
> put_page() on the empty msg->sg.end slot:
> 
>   Oops: general protection fault, probably for non-canonical address
>   0xdffffc0000000001: 0000 [#1] SMP KASAN PTI
>   KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
>   RIP: 0010:sk_msg_shift_left net/core/filter.c:2957 [inline]
>   RIP: 0010:____bpf_msg_pop_data net/core/filter.c:3103 [inline]
>   RIP: 0010:bpf_msg_pop_data+0x753/0x1a10 net/core/filter.c:2984
>   Call Trace:
>    <TASK>
>    bpf_prog_4cc92c278f4d5d56+0x1b1/0x1e8
>    bpf_prog_run_pin_on_cpu+0x107/0x320 include/linux/filter.h:746
>    sk_psock_msg_verdict+0x357/0x7f0 net/core/skmsg.c:934
>    tcp_bpf_send_verdict net/ipv4/tcp_bpf.c:420 [inline]
>    tcp_bpf_sendmsg+0x766/0x1ae0 net/ipv4/tcp_bpf.c:583
>    __sock_sendmsg+0x153/0x1c0 net/socket.c:802
>    __sys_sendto+0x326/0x430 net/socket.c:2265
>    __x64_sys_sendto+0xe3/0x100 net/socket.c:2268
>    do_syscall_64+0x14c/0x480
>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
>    </TASK>
> 
> Widen the addition with a (u64) cast so the bound is evaluated in
> 64-bit and a len near U32_MAX no longer wraps below msg->sg.size.
> 
> While here, change pop from int to u32. It counts bytes against the
> unsigned scatterlist lengths and can never be negative, so the signed
> type only invites sign-confusion in the pop loop.
> 
> Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
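
For readers following along, the wrap described in the quoted commit
message can be reproduced in plain userspace C. This is only a sketch
under the assumption that unsigned int is 32 bits wide; the demo_
helpers are hypothetical and are not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy form: the sum is computed in 32 bits and wraps before it is
 * widened into the 64-bit result. */
static uint64_t demo_last_wrapping(uint32_t start, uint32_t len)
{
	uint64_t last = start + len;

	return last;
}

/* Fixed form: cast one operand first so the addition happens in 64 bits. */
static uint64_t demo_last_widened(uint32_t start, uint32_t len)
{
	return (uint64_t)start + len;
}

/* Simplified stand-in for the "last > msg->sg.size" half of the check. */
static int demo_in_bounds(uint64_t last, uint32_t size)
{
	return last <= size;
}

static void demo_overflow_selfcheck(void)
{
	uint32_t start = 16;
	uint32_t len = UINT32_MAX - 7;	/* 16 + len wraps to 8 in 32 bits */
	uint32_t size = 4096;

	assert(demo_last_wrapping(start, len) == 8);
	assert(demo_in_bounds(demo_last_wrapping(start, len), size));

	assert(demo_last_widened(start, len) == 4294967304ULL);
	assert(!demo_in_bounds(demo_last_widened(start, len), size));
}
```

With the wrapping form the out-of-range request passes the bound, while
the widened form rejects it.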


* Re: [PATCH bpf-next v3 4/7] bpf, sockmap: keep sk_msg copy state in sync
From: Kuniyuki Iwashima @ 2026-06-13  0:40 UTC
  To: jiayuan.chen
  Cc: 2045gemini, andrii, ast, bpf, cong.wang, daniel, davem, eddyz87,
	edumazet, emil, hawk, horms, ihor.solodrai, jakub, john.fastabend,
	jolsa, kuba, linux-kernel, linux-kselftest, martin.lau, memxor,
	netdev, pabeni, rhkrqnwk98, rollkingzzc, sdf, shuah, song, stable,
	yonghong.song, Kuniyuki Iwashima
In-Reply-To: <20260612130919.299124-5-jiayuan.chen@linux.dev>

From: Jiayuan Chen <jiayuan.chen@linux.dev>
Date: Fri, 12 Jun 2026 21:07:48 +0800
> From: Zhang Cen <rollkingzzc@gmail.com>
> 
> SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries
> with this bit set are copied before data/data_end are exposed to SK_MSG
> BPF programs for direct packet access.
> 
> bpf_msg_pull_data(), bpf_msg_push_data(), and bpf_msg_pop_data()
> rewrite the sk_msg scatterlist ring by collapsing, splitting, and
> shifting entries. These operations move msg->sg.data[] entries, but the
> parallel copy bitmap can be left behind on the old slot. A copied entry
> can then return to msg->sg.start with its copy bit clear and be exposed
> as directly writable packet data.
> 
> This corruption path requires an attached SK_MSG BPF program that calls
> the mutating helpers; ordinary sockmap/TLS traffic that never runs
> push/pop/pull helper sequences is not affected.
> 
> Keep msg->sg.copy synchronized with scatterlist entry moves, preserve
> the copy bit when an entry is split, clear it when a helper replaces an
> entry with a private page, and clear slots vacated by pull-data
> compaction.
> 
> Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
> Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
> Cc: stable@vger.kernel.org
> Co-developed-by: Han Guidong <2045gemini@gmail.com>
> Reviewed-by: John Fastabend <john.fastabend@gmail.com>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Signed-off-by: Han Guidong <2045gemini@gmail.com>
> Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
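
The invariant described in the quoted commit message (the copy bit has
to travel with the scatterlist entry it describes) can be sketched in
userspace as below. struct demo_ring and demo_move_entry() are
hypothetical simplifications; the real sk_msg ring layout and the real
helpers differ.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SLOTS 8

/* Hypothetical, simplified stand-in for the sk_msg scatterlist ring. */
struct demo_ring {
	uint32_t len[DEMO_SLOTS];	/* stand-in for msg->sg.data[] */
	uint32_t copy;			/* one bit per slot, like msg->sg.copy */
};

/* Move an entry and carry its copy bit along with it. */
static void demo_move_entry(struct demo_ring *r, unsigned int dst,
			    unsigned int src)
{
	if (dst == src)
		return;

	r->len[dst] = r->len[src];
	r->len[src] = 0;

	if (r->copy & (1u << src))
		r->copy |= 1u << dst;
	else
		r->copy &= ~(1u << dst);

	/* the vacated slot must not keep a stale bit */
	r->copy &= ~(1u << src);
}

static void demo_ring_selfcheck(void)
{
	struct demo_ring r = { .len = { 0 }, .copy = 0 };

	r.len[3] = 100;
	r.copy = 1u << 3;		/* slot 3 holds a copied entry */

	demo_move_entry(&r, 0, 3);

	assert(r.len[0] == 100 && r.len[3] == 0);
	assert(r.copy == (1u << 0));	/* bit followed the entry */
}
```

If only the len[] assignment were done, slot 0 would hold the copied
entry with its bit clear, which is the situation the patch describes.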


* Re: [PATCH net] selftests: iou-zcrx: defer listen() until after zcrx setup
From: patchwork-bot+netdevbpf @ 2026-06-13  0:40 UTC
  To: Dragos Tatulea
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, shuah, dw, axboe,
	cratiu, netdev, linux-kselftest, linux-kernel
In-Reply-To: <20260611160341.3697227-2-dtatulea@nvidia.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 11 Jun 2026 19:03:41 +0300 you wrote:
> The server binds the queues for zero-copy after listen(). If the client
> does a connect() during this time it can fail with EHOSTUNREACH on
> a cold system. This was encountered with the mlx5 driver where binding
> the .ndo_queue_start() is a slow operation during which no packets
> can be exchanged.
> 
> This change moves listen() after queue binding, when the test server is
> fully operational.
> 
> [...]

Here is the summary with links:
  - [net] selftests: iou-zcrx: defer listen() until after zcrx setup
    https://git.kernel.org/netdev/net-next/c/ec782be97d2d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v2 0/2] ipv6: mcast: annotate data races in /proc/net/igmp6
From: patchwork-bot+netdevbpf @ 2026-06-13  0:30 UTC
  To: Yuyang Huang
  Cc: davem, dsahern, edumazet, idosch, kuba, pabeni, horms,
	linux-kernel, netdev
In-Reply-To: <20260609081113.7613-1-sigefriedhyy@gmail.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  9 Jun 2026 17:11:11 +0900 you wrote:
> /proc/net/igmp6 walks IPv6 multicast memberships under RCU without
> holding idev->mc_lock, taking a lockless snapshot of two fields that
> writers update under the lock: mca_flags and mca_work.timer.expires.
> 
> Patch 1 adds WRITE_ONCE() to all mca_flags update sites and READ_ONCE()
> to the procfs reader.  Patch 2 does the same for the timer.expires read
> in the procfs path.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] ipv6: mcast: annotate data-races around mca_flags
    https://git.kernel.org/netdev/net-next/c/d0dc208808a2
  - [net-next,v2,2/2] ipv6: mcast: annotate igmp6 timer expiry race
    https://git.kernel.org/netdev/net-next/c/1ea2f885a76b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v3 0/5] net: mdio: realtek-rtl9300: Add RTL931x support
From: patchwork-bot+netdevbpf @ 2026-06-13  0:30 UTC
  To: Markus Stockhausen
  Cc: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, netdev,
	chris.packham, daniel, robh, krzk+dt, conor+dt, devicetree
In-Reply-To: <20260610194145.4153668-1-markus.stockhausen@gmx.de>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 10 Jun 2026 21:41:40 +0200 you wrote:
> The Realtek Otto switch platform consists of four different series
> 
> - RTL838x aka maple   : 28 port 1G Switches
> - RTL839x aka cypress : 52 port 1G Switches
> - RTL930x aka longan  : 28 port 1G/2.5G/10G Switches
> - RTL931x aka mango   : 56 port 1G/2.5G/10G Switches
> 
> [...]

Here is the summary with links:
  - [net-next,v3,1/5] dt-bindings: net: realtek,rtl9301-mdio: Add RTL931x series
    https://git.kernel.org/netdev/net-next/c/a390863b493e
  - [net-next,v3,2/5] net: mdio: realtek-rtl9300: Add prefix to register field defines
    https://git.kernel.org/netdev/net-next/c/29a540b56e51
  - [net-next,v3,3/5] net: mdio: realtek-rtl9300: Make otto_emdio_read_cmd() generic
    https://git.kernel.org/netdev/net-next/c/6e1d8b024de7
  - [net-next,v3,4/5] net: mdio: realtek-rtl9300: Add registers for high port count models
    https://git.kernel.org/netdev/net-next/c/3e8035b861c2
  - [net-next,v3,5/5] net: mdio: realtek-rtl9300: Add support for RTL931x
    https://git.kernel.org/netdev/net-next/c/5ebdcac59aff

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net 0/4] Avoid mistaken parent class deactivation during peek
From: patchwork-bot+netdevbpf @ 2026-06-13  0:30 UTC
  To: Victor Nogueira
  Cc: davem, edumazet, kuba, pabeni, horms, jhs, jiri, netdev,
	anirudhrudr, pctammela, ij, henrist, chia-yu.chang
In-Reply-To: <20260610192855.3121513-1-victor@mojatatu.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 10 Jun 2026 16:28:51 -0300 you wrote:
> Several qdiscs (fq_codel, codel and dualpi2) may drop packets while
> peeking at their queue. When that happens they call
> qdisc_tree_reduce_backlog() to notify the parent of the backlog/qlen
> change. The problem is that they do so *before* reincrementing the qlen
> that peek had temporarily decremented.
> 
> If the qlen momentarily drops to zero while peek still has an skb to
> return, qdisc_tree_reduce_backlog() ends up invoking the parent's
> qlen_notify() callback even though the child is not actually empty. The
> parent then deactivates the class, while the child still holds a packet.
> For parents such as QFQ this desync corrupts the active class list and
> leads to wild memory accesses and NULL pointer dereferences (see the
> per-patch splats). For HFSC it might lead to stalls [1].
> 
> [...]

Here is the summary with links:
  - [net,1/4] net/sched: sch_fq_codel: Do not call qdisc_tree_reduce_backlog during peek before restoring qlen
    https://git.kernel.org/netdev/net/c/097f6fc7b1ae
  - [net,2/4] net/sched: sch_codel: Do not call qdisc_tree_reduce_backlog during peek before restoring qlen
    https://git.kernel.org/netdev/net/c/52f1da34c9f4
  - [net,3/4] net/sched: sch_dualpi2: Do not call qdisc_tree_reduce_backlog during peek before restoring qlen
    https://git.kernel.org/netdev/net/c/15cd0c93bf4f
  - [net,4/4] selftests/tc-testing: Verify child qdisc will not mistakenly deactivate QFQ parent
    https://git.kernel.org/netdev/net/c/101f1047c2f6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf-next v3 3/7] bpf, sockmap: zero-initialize pages allocated in bpf_msg_push_data
From: Kuniyuki Iwashima @ 2026-06-13  0:28 UTC
  To: jiayuan.chen
  Cc: andrii, ast, bestswngs, bpf, cong.wang, daniel, davem, eddyz87,
	edumazet, emil, hawk, horms, ihor.solodrai, jakub, john.fastabend,
	jolsa, kuba, linux-kernel, linux-kselftest, martin.lau, memxor,
	mmmxny, netdev, pabeni, rhkrqnwk98, sdf, shuah, song, xmei5,
	yonghong.song
In-Reply-To: <20260612130919.299124-4-jiayuan.chen@linux.dev>

From: Jiayuan Chen <jiayuan.chen@linux.dev>
Date: Fri, 12 Jun 2026 21:07:47 +0800
> From: Weiming Shi <bestswngs@gmail.com>
> 
> bpf_msg_push_data() allocates pages via alloc_pages() without
> __GFP_ZERO. In the non-copy path, the entire page of uninitialized
> heap content is added directly to the sk_msg scatterlist, which is
> then transmitted over TCP to userspace via tcp_bpf_push(). In the
> copy path, a gap of len bytes between the front and back memcpy
> regions is similarly left uninitialized.
> 
> This leads to a kernel heap information leak: stale page content
> including kernel pointers from the direct-map and vmemmap regions
> is transmitted to userspace, which can be used to defeat KASLR.
> 
> Add __GFP_ZERO to the alloc_pages() call to ensure the allocated
> page is always zeroed before it enters the scatterlist.
> 
> Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
> Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> Tested-by: Xiang Mei <xmei5@asu.edu>
> Tested-by: Xinyu Ma <mmmxny@gmail.com>
> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
>  net/core/filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3e555f276ba80..6e345ca65ca14 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2832,7 +2832,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  	if (unlikely(copy + len < copy))
>  		return -EINVAL;
>  
> -	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP | __GFP_ZERO,

This is a red flag.

We have a bunch of KMSAN reports due to raw/packet sockets,
which require CAP_NET_ADMIN, and we leave them unfixed although
some people attempted to "fix" them by adding __GFP_ZERO to
__alloc_skb().



>  			   get_order(copy + len));
>  	if (unlikely(!page))
>  		return -ENOMEM;
> -- 
> 2.43.0


* [PATCH] net: airoha: Fix always-true condition in PPE1 queue reservation loop
From: Wayen.Yan @ 2026-06-13  0:23 UTC
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

In airoha_fe_pse_ports_init(), the inner condition for PPE1 queue
reservation is identical to the for-loop bound, making it always true
and the else branch dead code:

  for (q = 0; q < pse_port_num_queues[FE_PSE_PORT_PPE1]; q++) {
      if (q < pse_port_num_queues[FE_PSE_PORT_PPE1])  /* always true */
          set RSV_PAGES;
      else
          set 0;  /* unreachable */
  }

The intended behavior is to reserve pages only for the first half of
the queues, matching the PPE2 implementation on line 334 which
correctly uses the /2 divisor. Fix the PPE1 condition accordingly.

Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Wayen.Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11..999f517 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -311,7 +311,7 @@ static void airoha_fe_pse_ports_init(struct airoha_eth *eth)
 					 PSE_QUEUE_RSV_PAGES);
 	/* PPE1 */
 	for (q = 0; q < pse_port_num_queues[FE_PSE_PORT_PPE1]; q++) {
-		if (q < pse_port_num_queues[FE_PSE_PORT_PPE1])
+		if (q < pse_port_num_queues[FE_PSE_PORT_PPE1] / 2)
 			airoha_fe_set_pse_oq_rsv(eth, FE_PSE_PORT_PPE1, q,
 						 PSE_QUEUE_RSV_PAGES);
 		else
-- 
2.51.0
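
The effect of the changed condition can be checked with a small
userspace sketch. The queue count, the reservation value and the demo_
helpers below are hypothetical; only the shape of the loop follows the
commit message above.

```c
#include <assert.h>

#define DEMO_NUM_QUEUES	8	/* hypothetical queue count */
#define DEMO_RSV_PAGES	0x40	/* hypothetical reservation value */

/* Reserve pages for queues below 'limit', zero for the rest. */
static void demo_set_rsv(unsigned int *rsv, unsigned int num_queues,
			 unsigned int limit)
{
	unsigned int q;

	for (q = 0; q < num_queues; q++)
		rsv[q] = q < limit ? DEMO_RSV_PAGES : 0;
}

static unsigned int demo_count_reserved(const unsigned int *rsv,
					unsigned int num_queues)
{
	unsigned int q, n = 0;

	for (q = 0; q < num_queues; q++)
		if (rsv[q])
			n++;
	return n;
}

static void demo_rsv_selfcheck(void)
{
	unsigned int rsv[DEMO_NUM_QUEUES];

	/* old condition: limit equals the loop bound, else branch is dead */
	demo_set_rsv(rsv, DEMO_NUM_QUEUES, DEMO_NUM_QUEUES);
	assert(demo_count_reserved(rsv, DEMO_NUM_QUEUES) == 8);

	/* new condition: only the first half of the queues is reserved */
	demo_set_rsv(rsv, DEMO_NUM_QUEUES, DEMO_NUM_QUEUES / 2);
	assert(demo_count_reserved(rsv, DEMO_NUM_QUEUES) == 4);
	assert(rsv[3] == DEMO_RSV_PAGES);
	assert(rsv[4] == 0);
}
```

With the limit equal to the loop bound every queue gets a reservation;
with the halved limit the second half is set to 0, which is the
behaviour the commit message describes as intended.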




* [PATCH] net: airoha: Fix non-standard return value in airoha_ppe_get_wdma_info()
From: Wayen.Yan @ 2026-06-13  0:22 UTC
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

airoha_ppe_get_wdma_info() returns -1 when the last path in the
forwarding path stack is not of type DEV_PATH_MTK_WDMA. This is not
a standard kernel error code. Replace it with -EINVAL since the
input path type is invalid from the caller's perspective.

Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Wayen.Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_ppe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 5c9dff6..7260177 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -264,7 +264,7 @@ static int airoha_ppe_get_wdma_info(struct net_device *dev, const u8 *addr,
 
 	path = &stack.path[stack.num_paths - 1];
 	if (path->type != DEV_PATH_MTK_WDMA)
-		return -1;
+		return -EINVAL;
 
 	info->idx = path->mtk_wdma.wdma_idx;
 	info->bss = path->mtk_wdma.bss;
-- 
2.51.0
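
One way to see why a bare -1 is misleading: on Linux EPERM is 1, so a
caller that treats the return value as a negative errno would read -1
as -EPERM. The sketch below uses hypothetical demo_ functions in place
of the driver code and assumes a Linux <errno.h>.

```c
#include <assert.h>
#include <errno.h>

/* Old behaviour: a bare -1 on a path type mismatch. */
static int demo_get_info_old(int type_matches)
{
	return type_matches ? 0 : -1;
}

/* New behaviour: a named errno value. */
static int demo_get_info_new(int type_matches)
{
	return type_matches ? 0 : -EINVAL;
}

static void demo_errno_selfcheck(void)
{
	/* -1 is indistinguishable from -EPERM when read as an errno */
	assert(-demo_get_info_old(0) == EPERM);

	assert(demo_get_info_new(0) == -EINVAL);
	assert(demo_get_info_new(1) == 0);
}
```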




* Re: [PATCH bpf-next v3 2/7] bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
From: Kuniyuki Iwashima @ 2026-06-13  0:17 UTC
  To: jiayuan.chen
  Cc: andrii, ast, bestswngs, bpf, cong.wang, daniel, davem, eddyz87,
	edumazet, emil, hawk, horms, ihor.solodrai, jakub, john.fastabend,
	jolsa, kuba, linux-kernel, linux-kselftest, martin.lau, memxor,
	netdev, pabeni, rhkrqnwk98, sdf, shuah, song, xmei5,
	yonghong.song, Kuniyuki Iwashima
In-Reply-To: <20260612130919.299124-3-jiayuan.chen@linux.dev>

From: Jiayuan Chen <jiayuan.chen@linux.dev>
Date: Fri, 12 Jun 2026 21:07:46 +0800
> From: Weiming Shi <bestswngs@gmail.com>
> 
> When bpf_msg_push_data() splits a scatterlist element into head and
> tail, the tail's page offset is advanced by `start` (absolute message
> byte offset) instead of `start - offset` (byte position within the
> element). This makes rsge.offset overshoot by `offset` bytes, pointing
> to the wrong location within the page or beyond its boundary. Consumers
> of the corrupted entry either silently read wrong data or trigger an
> out-of-bounds access.
> 
>  BUG: KASAN: slab-use-after-free in bpf_msg_pull_data (net/core/filter.c:2728)
>  Read of size 32752 at addr ffff8881042f0010 by task poc/130
>  Call Trace:
>   __asan_memcpy (mm/kasan/shadow.c:105)
>   bpf_msg_pull_data (net/core/filter.c:2728)
>   bpf_prog_run_pin_on_cpu (include/linux/bpf.h:1402)
>   sk_psock_msg_verdict (net/core/skmsg.c:934)
>   tcp_bpf_send_verdict (net/ipv4/tcp_bpf.c:421)
>   sock_sendmsg_nosec (net/socket.c:727)
> 
> Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
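
The offset arithmetic in the quoted commit message can be illustrated
with a simplified split helper. struct demo_sge and demo_split() are
hypothetical stand-ins, with elem_start playing the role of `offset`
(the message byte offset at which the element begins); this is a sketch
of the arithmetic only, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified scatterlist element. */
struct demo_sge {
	uint32_t page_off;	/* byte offset within the page */
	uint32_t length;	/* bytes covered by this element */
};

/* Split 'sge' at absolute message offset 'start'. The tail must advance
 * by the position within the element (start - elem_start), not by the
 * absolute message offset. */
static void demo_split(const struct demo_sge *sge, uint32_t elem_start,
		       uint32_t start, struct demo_sge *head,
		       struct demo_sge *tail)
{
	uint32_t front = start - elem_start;

	head->page_off = sge->page_off;
	head->length = front;

	tail->page_off = sge->page_off + front;
	tail->length = sge->length - front;
}

static void demo_split_selfcheck(void)
{
	struct demo_sge sge = { .page_off = 0, .length = 100 };
	struct demo_sge head, tail;

	/* element covers message bytes 300..399, split at byte 340 */
	demo_split(&sge, 300, 340, &head, &tail);

	assert(head.length == 40);
	assert(tail.page_off == 40 && tail.length == 60);
	/* the tail ends exactly where the original element ended */
	assert(tail.page_off + tail.length == sge.page_off + sge.length);
}
```

Advancing the tail by `start` instead would give a page offset of 340
for a 100-byte element, which is the overshoot the commit message
describes.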


* Re: [PATCH bpf-next v3 1/7] bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
From: Kuniyuki Iwashima @ 2026-06-13  0:09 UTC
  To: jiayuan.chen
  Cc: andrii, ast, bestswngs, bpf, cong.wang, daniel, davem, eddyz87,
	edumazet, emil, hawk, horms, ihor.solodrai, jakub, john.fastabend,
	jolsa, kuba, linux-kernel, linux-kselftest, martin.lau, memxor,
	mmmxny, netdev, pabeni, rhkrqnwk98, sdf, shuah, song, xmei5,
	yonghong.song, Kuniyuki Iwashima
In-Reply-To: <20260612130919.299124-2-jiayuan.chen@linux.dev>

From: Jiayuan Chen <jiayuan.chen@linux.dev>
Date: Fri, 12 Jun 2026 21:07:45 +0800
> From: Weiming Shi <bestswngs@gmail.com>
> 
> When the scatterlist ring is full or nearly full, bpf_msg_push_data()
> enters a copy fallback path and computes copy + len for the page
> allocation size. Since len comes from BPF with arg3_type = ARG_ANYTHING
> and both are u32, a crafted len can wrap the sum to a small value,
> causing an undersized allocation followed by an out-of-bounds memcpy.
> 
>  BUG: unable to handle page fault for address: ffffed104089a402
>  Oops: Oops: 0000 [#1] SMP KASAN NOPTI
>  Call Trace:
>   __asan_memcpy (mm/kasan/shadow.c:105)
>   bpf_msg_push_data (net/core/filter.c:2852 net/core/filter.c:2788)
>   bpf_prog_9ed8b5711920a7d7+0x2e/0x36
>   sk_psock_msg_verdict (net/core/skmsg.c:934)
>   tcp_bpf_sendmsg (net/ipv4/tcp_bpf.c:421 net/ipv4/tcp_bpf.c:584)
>   __sys_sendto (net/socket.c:2206)
>   do_syscall_64 (arch/x86/entry/syscall_64.c:94)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> 
> Add an overflow check before the allocation.
> 
> Link: https://lore.kernel.org/all/20260424155913.A19FDC19425@smtp.kernel.org
> Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> Tested-by: Xiang Mei <xmei5@asu.edu>
> Tested-by: Xinyu Ma <mmmxny@gmail.com>
> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> To sashiko:
> 
> Regarding bpf_msg_push_data() reading "copy = msg->sg.data[i].length" with
> i == msg->sg.end (appending at the very end of a full/near-full ring):
> 
> This is pre-existing code, not touched by this series, and reproducing it needs
> a narrow combination -- a pure append at the end so the loop exits with
> i == msg->sg.end, a full/near-full ring, plus a prior push/pop history that
> leaves a stale length in the otherwise-unused end slot. A freshly built ring
> zeroes that slot, so copy stays 0. We don't consider it practically reproducible.
> 
> Even then it's already covered: the overflow check in patch 1 ("copy + len <
> copy") rejects the dangerous case, and __GFP_ZERO in patch 3 prevents any data
> exposure. Not worth fixing here.
> ---
>  net/core/filter.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9590877b0714f..3c8f1cedb217f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2829,6 +2829,9 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  	if (!space || (space == 1 && start != offset))
>  		copy = msg->sg.data[i].length;
>  
> +	if (unlikely(copy + len < copy))
> +		return -EINVAL;

Who wants to push E2BIG "metadata or option" ? :(
https://docs.ebpf.io/linux/helper-function/bpf_msg_push_data/

I feel this is the same class of "bug" we discussed recently,
but given the change is just a small validation,

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
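
For reference, the "copy + len < copy" test in the hunk above relies on
unsigned wraparound: a u32 sum that overflows is always smaller than either
operand. A minimal userspace sketch of the same check follows. The helper
name checked_alloc_len() and its signature are hypothetical and are not part
of net/core/filter.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not kernel code: stores copy + len in *total and
 * returns 0 when the sum fits in 32 bits, or returns -EINVAL when the
 * sum would wrap.
 */
int checked_alloc_len(uint32_t copy, uint32_t len, uint32_t *total)
{
	assert(total != NULL);

	/* Unsigned addition wraps modulo 2^32, so a wrapped sum is
	 * smaller than copy. The cast keeps the comparison in 32 bits
	 * even on a platform where int is wider than uint32_t.
	 */
	if ((uint32_t)(copy + len) < copy)
		return -EINVAL;

	*total = copy + len;
	return 0;
}
```

With copy = 4096 and len = UINT32_MAX - 100 the sum wraps to 3995, which is
below copy, so the helper returns -EINVAL instead of handing a tiny size to
the allocator.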

^ permalink raw reply

* [PATCH net v2 2/2] vsock/virtio: restore msg_iter on transmission failure
From: Octavian Purdila @ 2026-06-13  0:09 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo, Octavian Purdila,
	syzbot+28e5f3d207b14bae122a
In-Reply-To: <20260613000953.467473-1-tavip@google.com>

When transmission fails in virtio_transport_send_pkt_info, the msg_iter
might have been partially advanced. If we don't restore it, the next
attempt to send data will use an incorrect iterator state, leading to
desync and warnings like "send_pkt() returns 0, but X expected".

Specifically, this can happen in the following scenario, triggered by
the syzkaller repro:

1. A write-only VMA (PROT_WRITE only) is partially populated by a
   prior TUN write that failed with -EIO but still faulted in some
   pages.
2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
   buffer from this VMA.
3. The first packet (64KB) is sent successfully because the pages are
   populated.
4. The second packet allocation fails because GUP fast pins the first page
   but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
   permissions.
5. The iterator is advanced by the partially successful GUP (68KB total
   advanced: 64KB from first packet + 4KB from second), but the send loop
   breaks and only reports 64KB sent. This creates a 4KB desync.
6. The next retry starts with a non-zero iov_offset, disabling zerocopy
   and falling back to copy mode.
7. In copy mode, the transmission succeeds for the next packets but
   exhausts the iterator early because of the desync.
8. The final retry sees an empty iterator but zerocopy is re-enabled
   (offset resets). It attempts to send the remaining bytes with zerocopy
   but pins 0 pages, creating an empty packet.
9. The transport sends the empty packet, triggering the warning because
   the returned bytes (header only) do not match the expected payload size.
10. The loop continues to spin, allocating ubuf_info each time, eventually
    exhausting sysctl_optmem_max and returning -ENOMEM to userspace.

Restore msg_iter to its original state before the packet allocation
and transmission attempt if they fail.
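
The save/restore idea can be illustrated with a small userspace model. This
is only a sketch under stated assumptions: struct toy_iter, struct
toy_iter_state and the toy_* functions are hypothetical stand-ins for
struct iov_iter, struct iov_iter_state, iov_iter_save_state() and
iov_iter_restore(), and the model rolls back per attempt rather than once
after the send loop as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter (hypothetical). */
struct toy_iter {
	size_t offset;	/* bytes already consumed */
	size_t count;	/* bytes still available */
};

/* Toy stand-in for struct iov_iter_state (hypothetical). */
struct toy_iter_state {
	size_t offset;
	size_t count;
};

void toy_iter_save(const struct toy_iter *it, struct toy_iter_state *st)
{
	st->offset = it->offset;
	st->count = it->count;
}

void toy_iter_restore(struct toy_iter *it, const struct toy_iter_state *st)
{
	it->offset = st->offset;
	it->count = st->count;
}

/* Models a packet allocation that consumes `pinned` bytes of the iterator
 * and fails when fewer than `want` bytes could be pinned, similar to a
 * partially successful GUP.
 */
int toy_alloc_pkt(struct toy_iter *it, size_t want, size_t pinned)
{
	assert(pinned <= want && want <= it->count);

	it->offset += pinned;
	it->count -= pinned;
	return pinned == want ? 0 : -EFAULT;
}

/* Sends one packet. Returns the bytes sent or a negative errno. On failure
 * the iterator is rolled back to its state before the attempt, so the
 * caller's byte accounting and the iterator stay in sync.
 */
long toy_send_one(struct toy_iter *it, size_t want, size_t pinned)
{
	struct toy_iter_state st;
	int ret;

	toy_iter_save(it, &st);
	ret = toy_alloc_pkt(it, want, pinned);
	if (ret < 0) {
		toy_iter_restore(it, &st);
		return ret;
	}
	return (long)want;
}
```

Starting from a 128KB iterator, a first 64KB send succeeds and a second one
that pins only 4KB fails with -EFAULT; after the rollback the iterator is
still at offset 64KB, which avoids the 4KB desync from step 5.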

Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
Assisted-by: gemini:gemini-3.1-pro
Signed-off-by: Octavian Purdila <tavip@google.com>
---
 net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b10666937c490..2baa5a6ebd750 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
 	u32 src_cid, src_port, dst_cid, dst_port;
 	const struct virtio_transport *t_ops;
+	struct iov_iter_state msg_iter_state;
 	struct virtio_vsock_sock *vvs;
 	struct ubuf_info *uarg = NULL;
 	u32 pkt_len = info->pkt_len;
@@ -368,8 +369,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 		struct sk_buff *skb;
 		size_t skb_len;
 
+		/* Save iterator state in case allocation or transmission fails
+		 * so we can restore it and retry.
+		 */
+		if (info->msg)
+			iov_iter_save_state(&info->msg->msg_iter, &msg_iter_state);
+
 		skb_len = min(max_skb_len, rest_len);
 
+		/* Note: virtio_transport_alloc_skb() can advance info->msg->msg_iter
+		 * even if it fails (e.g. partial GUP success).
+		 */
 		skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
 						 uarg,
 						 src_cid, src_port,
@@ -399,6 +409,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 			break;
 	} while (rest_len);
 
+	if (info->msg && ret < 0)
+		iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
 	virtio_transport_put_credit(vvs, rest_len);
 
 	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related

* [PATCH net v2 1/2] iov_iter: export iov_iter_restore
From: Octavian Purdila @ 2026-06-13  0:09 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo, Octavian Purdila
In-Reply-To: <20260613000953.467473-1-tavip@google.com>

Export iov_iter_restore so that it can be used by modules.

This is needed by the virtio vsock transport (which can be built as a
module) to restore the msg_iter state when transmission fails.

Signed-off-by: Octavian Purdila <tavip@google.com>
---
 lib/iov_iter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 243662af1af73..067e745f9ef53 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 		i->__iov -= state->nr_segs - i->nr_segs;
 	i->nr_segs = state->nr_segs;
 }
+EXPORT_SYMBOL(iov_iter_restore);
 
 /*
  * Extract a list of contiguous pages from an ITER_FOLIOQ iterator.  This does
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related

* [PATCH net v2 0/2] vsock/virtio: fix msg_iter desync on transmission failure
From: Octavian Purdila @ 2026-06-13  0:09 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo, Octavian Purdila

This series fixes a msg_iter desync issue in the virtio vsock transport
that can lead to warnings and eventual -ENOMEM under specific failure
scenarios (e.g. partial GUP failure during MSG_ZEROCOPY transmission).

To fix this, we need to restore the msg_iter state on transmission failure.
However, since virtio vsock transport can be built as a module, we first
need to export iov_iter_restore.

Patch 1 exports iov_iter_restore.
Patch 2 implements the msg_iter restoration in virtio vsock.

Changes in v2:
- Use iov_iter_save_state()/iov_iter_restore() (Stefano)
- Use a single restore point (Stefano)
- Reverse xmas tree (Stefano)
- Added comments in the code (Stefano)

v1: https://lore.kernel.org/all/20260609004809.1285028-1-tavip@google.com/

Octavian Purdila (2):
  iov_iter: export iov_iter_restore
  vsock/virtio: restore msg_iter on transmission failure

 lib/iov_iter.c                          |  1 +
 net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
 2 files changed, 14 insertions(+)

-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply

* Re: [PATCH net v2 2/2] geneve: validate inner network offset in geneve_gro_complete()
From: Xiang Mei @ 2026-06-13  0:00 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jakub Kicinski, Eric Dumazet, Andrew Lunn,
	David S . Miller, Weiming Shi, Kyle Zeng
In-Reply-To: <fcc21647-d7ac-4b9b-ac61-42fec517cb99@redhat.com>

On Thu, Jun 11, 2026 at 12:47:52PM +0200, Paolo Abeni wrote:
> On 6/9/26 6:13 AM, Xiang Mei wrote:
> > Even with both paths gated on gs->gro_hint, geneve_gro_complete()
> > re-derives the inner dispatch type and length from the packet and the
> > current gs->gro_hint, independently of geneve_gro_receive(). The two can
> > disagree if gs->gro_hint flips under a concurrent geneve_quiesce()/
> > geneve_unquiesce() (sk_user_data is NULL across a synchronize_net()), or if
> > the re-read option bytes differ from the ones receive parsed.
> > 
> > geneve_gro_receive() already records the inner network header position in
> > NAPI_GRO_CB()->inner_network_offset. Have geneve_gro_complete() check the
> > offset it is about to dispatch at against that value, adding ETH_HLEN in
> > the ETH_P_TEB case where eth_gro_complete() steps over the inner MAC
> > header, and bail out on mismatch instead of trusting the re-derivation.
> > 
> > Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
> > Assisted-by: Claude:claude-opus-4-8
> > Tested-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > ---
> > v2: Add patch for race condition found by Sashiko
> > 
> >  drivers/net/geneve.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> > index d0dc5d6c46df..028740e97740 100644
> > --- a/drivers/net/geneve.c
> > +++ b/drivers/net/geneve.c
> > @@ -956,6 +956,19 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
> >  	type = gh->proto_type;
> >  	geneve_sk_gro_hint_off(sk, gh, &type, &gh_len);
> >  
> > +	/* Bail out if our inner network offset disagrees with gro_receive().
> > +	 * ETH_P_TEB adds ETH_HLEN for the inner MAC header.
> > +	 */
> > +	if (skb->encapsulation) {
> 
> I think the disagreement could happen even in the opposite direction,
> i.e. gro_receive does not see hints available, but gro_complete does.
> 
> > +		unsigned int inner_nh = nhoff + gh_len;
> > +
> > +		if (type == htons(ETH_P_TEB))
> > +			inner_nh += ETH_HLEN;
> 
> This does not work when the innermost headers carry a VLAN tag.

Yes, you're right. The check is wrong.


inner_network_offset is the inner L3 offset recorded by gro_receive. For a
tagged inner frame eth_gro_receive hands off to vlan_gro_receive, which
pulls the VLAN tag before the inner ip*_gro_receive records it, so it ends
up at nhoff + gh_len + ETH_HLEN + n*VLAN_HLEN. We only add ETH_HLEN, so
'!=' drops a valid frame.

Do you think a lower bound works?

For example:

	if (skb->encapsulation) {
		unsigned int inner_nh = nhoff + gh_len;

		if (type == htons(ETH_P_TEB))
			inner_nh += ETH_HLEN;

		if (unlikely(inner_nh > NAPI_GRO_CB(skb)->inner_network_offset))
			return -EINVAL;
	}

A VLAN tag only pushes inner_network_offset later, so a valid packet has
inner_nh <= inner_network_offset and passes. The opposite direction you
mentioned (receive does not see the hint, complete does) inflates gh_len
in complete, so inner_nh goes past inner_network_offset and is rejected.

The only case this does not reject is the reverse (receive uses the hint,
complete does not), where inner_nh lands before inner_network_offset.
That is in bounds and only reachable across a concurrent
geneve_changelink() quiesce/unquiesce, so not attacker reachable. If you
want that too, we need to record the offset gro_receive used in
napi_gro_cb and compare by '=='.
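
The offset arithmetic above can be checked with a small userspace model.
This is a sketch only: the TOY_* constants and all function names are
hypothetical, the recorded offset follows the formula
nhoff + gh_len + ETH_HLEN + n*VLAN_HLEN given earlier, and nhoff = 50 is an
arbitrary example value.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ETH_HLEN	14u	/* Ethernet header length */
#define TOY_VLAN_HLEN	 4u	/* one 802.1Q tag */

/* Offset gro_receive would record for an ETH_P_TEB inner frame that
 * carries nr_vlan VLAN tags.
 */
unsigned int recorded_inner_off(unsigned int nhoff, unsigned int gh_len,
				unsigned int nr_vlan)
{
	return nhoff + gh_len + TOY_ETH_HLEN + nr_vlan * TOY_VLAN_HLEN;
}

/* Offset gro_complete derives for ETH_P_TEB; it knows nothing about VLANs. */
unsigned int derived_inner_nh(unsigned int nhoff, unsigned int gh_len)
{
	return nhoff + gh_len + TOY_ETH_HLEN;
}

/* The v2 check: accept only an exact match. */
bool strict_check_ok(unsigned int inner_nh, unsigned int recorded)
{
	return inner_nh == recorded;
}

/* The proposed check: reject only when inner_nh overshoots. */
bool lower_bound_check_ok(unsigned int inner_nh, unsigned int recorded)
{
	return inner_nh <= recorded;
}

void toy_geneve_selfcheck(void)
{
	/* One VLAN tag: the strict check drops a valid frame, the lower
	 * bound accepts it.
	 */
	assert(!strict_check_ok(derived_inner_nh(50, 8),
				recorded_inner_off(50, 8, 1)));
	assert(lower_bound_check_ok(derived_inner_nh(50, 8),
				    recorded_inner_off(50, 8, 1)));

	/* complete sees a hint that receive did not: gh_len is inflated
	 * (8 -> 16 here), inner_nh overshoots and is rejected.
	 */
	assert(!lower_bound_check_ok(derived_inner_nh(50, 16),
				     recorded_inner_off(50, 8, 0)));

	/* Reverse case: inner_nh lands before the recorded offset and
	 * the lower bound does not reject it.
	 */
	assert(lower_bound_check_ok(derived_inner_nh(50, 8),
				    recorded_inner_off(50, 16, 0)));
}
```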

Xiang

> 
> /P
> 

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] net: remove tls_toe
From: patchwork-bot+netdevbpf @ 2026-06-13  0:00 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: netdev, ayush.sawal, john.fastabend, kuba, davem, andrew+netdev,
	edumazet, pabeni, horms
In-Reply-To: <cover.1781165969.git.sd@queasysnail.net>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 11 Jun 2026 12:21:32 +0200 you wrote:
> This series removes the tls_toe feature, its single user (chtls), and
> cleans up the EXPORT_SYMBOL()s that no other module requires.
> 
> Driver changes only compile-tested.
> 
> v2:
>  - fix small issues in the docs clean up
>  - also remove NETIF_F_HW_TLS_RECORD (Sashiko)
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] tls: remove tls_toe and the related driver
    https://git.kernel.org/netdev/net-next/c/cdae65fc43f2
  - [net-next,v2,2/2] net: remove some unused EXPORT_SYMBOL()s
    https://git.kernel.org/netdev/net-next/c/f51a442dc15a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 0/5] rxrpc: Miscellaneous fixes
From: patchwork-bot+netdevbpf @ 2026-06-13  0:00 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, marc.dionne, kuba, davem, edumazet, pabeni, horms,
	linux-afs, linux-kernel
In-Reply-To: <20260609140911.838677-1-dhowells@redhat.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  9 Jun 2026 15:09:04 +0100 you wrote:
> Here are some miscellaneous AF_RXRPC fixes:
> 
>  (1) Make sure rxrpc_verify_data() allocates a buffer, even if the DATA packet
>      being looked at is zero length to avoid potential NULL-pointer
>      exceptions.
> 
>  (2) Don't move an OOB message (e.g. an RxGK CHALLENGE) off the receive queue
>      onto the pending queue in recvmsg() if MSG_PEEK is specified.
> 
> [...]

Here is the summary with links:
  - [net,1/5] rxrpc: rxrpc_verify_data ensure rx_dec_buffer alloc
    https://git.kernel.org/netdev/net/c/16c8ae9735c5
  - [net,2/5] rxrpc: Don't move a peeked OOB message onto the pending queue
    https://git.kernel.org/netdev/net/c/5801cff7d5d7
  - [net,3/5] rxrpc: Fix UAF in rxgk_issue_challenge()
    https://git.kernel.org/netdev/net/c/107a4cb0d47e
  - [net,4/5] afs: Fix netns teardown to cancel the preallocation charger
    https://git.kernel.org/netdev/net/c/47694fbc9d24
  - [net,5/5] rxrpc: serialize kernel accept preallocation with socket teardown
    https://git.kernel.org/netdev/net/c/dc175389b18c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] ethtool: tsconfig: always take rtnl_lock
From: patchwork-bot+netdevbpf @ 2026-06-12 23:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, andrew,
	gal, jacob.e.keller, sdf, kory.maincent
In-Reply-To: <20260611200355.2020663-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 11 Jun 2026 13:03:55 -0700 you wrote:
> mlx5 throws ASSERT_RTNL() warnings on timestamp config, because
> it tries to update features. mlx5e_hwtstamp_set() calls
> netdev_update_features().
> 
> I missed this while grepping the drivers because tsconfig goes
> through ndo_hwtstamp_set/get, not ethtool ops, even tho the new
> uAPI is in ethtool Netlink. We could add a dedicated opt out bit
> for mlx5, but NDOs were not supposed to be part of the ethtool locking
> conversion in the first place.
> 
> [...]

Here is the summary with links:
  - [net-next] ethtool: tsconfig: always take rtnl_lock
    https://git.kernel.org/netdev/net-next/c/f48cd5b47bfe

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] ip_tunnel: annotate data-races around t->err_count and t->err_time
From: patchwork-bot+netdevbpf @ 2026-06-12 23:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, idosch, dsahern, netdev, eric.dumazet
In-Reply-To: <20260611165247.2710257-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 11 Jun 2026 16:52:47 +0000 you wrote:
> ip_tunnel_xmit() runs locklessly (dev->lltx == true).
> 
> ipgre_err() and ipip_err() also run locklessly.
> 
> We need to add READ_ONCE() and WRITE_ONCE() annotations
> around t->err_count and t->err_time.
> 
> [...]

Here is the summary with links:
  - [net] ip_tunnel: annotate data-races around t->err_count and t->err_time
    https://git.kernel.org/netdev/net-next/c/80a7e3507d86

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v3] tcp: clear sock_ops cb flags before force-closing a child socket
From: patchwork-bot+netdevbpf @ 2026-06-12 23:50 UTC (permalink / raw)
  To: Sechang Lim
  Cc: edumazet, ncardwell, davem, kuba, pabeni, kuniyu, horms, brakmo,
	ast, jiayuan.chen, netdev, linux-kernel, bpf
In-Reply-To: <20260611092923.1895982-1-rhkrqnwk98@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 11 Jun 2026 09:29:18 +0000 you wrote:
> A child socket inherits the listener's bpf_sock_ops_cb_flags via
> sk_clone_lock(). If its setup fails in tcp_v4_syn_recv_sock() /
> tcp_v6_syn_recv_sock(), the child is freed through put_and_exit, where
> inet_csk_prepare_forced_close() drops the socket lock and tcp_done() runs
> without it.
> 
> If BPF_SOCK_OPS_STATE_CB_FLAG was inherited, tcp_done() -> tcp_set_state()
> calls tcp_call_bpf(), which expects the lock and trips sock_owned_by_me():
> 
> [...]

Here is the summary with links:
  - [net,v3] tcp: clear sock_ops cb flags before force-closing a child socket
    https://git.kernel.org/netdev/net/c/990348e5bb45

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 1/1] net: atm: reject out-of-range traffic classes in QoS validation
From: patchwork-bot+netdevbpf @ 2026-06-12 23:50 UTC (permalink / raw)
  To: Ren Wei; +Cc: linux-atm-general, netdev, 3chas3, yuantan098, bird, zcliangcn
In-Reply-To: <58f02c6f73d9818fd5d2022e1116759fdde6116b.1780965530.git.zcliangcn@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  9 Jun 2026 16:34:37 +0800 you wrote:
> From: Zhengchuan Liang <zcliangcn@gmail.com>
> 
> Reject ATM traffic classes above ATM_ANYCLASS in check_tp().
> SO_ATMQOS stores the supplied QoS after check_qos() succeeds, so
> accepting larger values leaves invalid traffic_class values in
> vcc->qos.
> 
> [...]

Here is the summary with links:
  - [net,1/1] net: atm: reject out-of-range traffic classes in QoS validation
    https://git.kernel.org/netdev/net/c/cdf19f380e46

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2 net] virtio_net: do not allow tunnel csum offload for non GSO packets
From: patchwork-bot+netdevbpf @ 2026-06-12 23:50 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem,
	edumazet, kuba, virtualization, g.goller, f.ebner
In-Reply-To: <6c3b6c47fb05c100f384630dc48f3975cf37b67a.1781195144.git.pabeni@redhat.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 11 Jun 2026 18:36:48 +0200 you wrote:
> Fiona reports broken connectivity for a virtio net setup using a UDP
> tunnel inside the guest and a NIC with no UDP tunnel TSO support in the host.
> 
> Currently the virtio_net driver exposes csum offload for UDP-tunneled,
> TCP non GSO packets. Such packets reach the host as CSUM_PARTIAL ones
> with the 'encapsulation' flag cleared, as the virtio specification does
> not support this specific kind of offload.
> 
> [...]

Here is the summary with links:
  - [v2,net] virtio_net: do not allow tunnel csum offload for non GSO packets
    https://git.kernel.org/netdev/net/c/86c51f0f2313

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 4/5] afs: Fix netns teardown to cancel the preallocation charger
From: Jakub Kicinski @ 2026-06-12 23:48 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Marc Dionne, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, linux-afs, linux-kernel, Li Daming, Ren Wei,
	Jeffrey Altman, stable
In-Reply-To: <20260609140911.838677-5-dhowells@redhat.com>

On Tue,  9 Jun 2026 15:09:08 +0100 David Howells wrote:
> Fix the teardown of an afs network namespace to make sure it cancels the
> work item that keeps the preallocated rxrpc call/conn/peer queue charged
> before incoming calls are disabled (i.e. listen 0).
> 
> Also, if net->live is false because the afs netns is being deleted, make
> afs_charge_preallocation() skip charging and make afs_rx_new_call() avoid
> requeuing the charger.
> 
> (This was found by AI review).

Both Sashikos think this patch is still racy FWIW but not a blocker IMO

^ permalink raw reply

