Netdev List


* Re: [PATCH net] net/sched: act_bpf: use rcu_dereference_bh() to read the filter
From: patchwork-bot+netdevbpf @ 2026-07-01  2:20 UTC (permalink / raw)
  To: Sechang Lim
  Cc: davem, edumazet, kuba, pabeni, jhs, jiri, daniel, john.fastabend,
	sdf, ast, andrii, martin.lau, horms, bpf, netdev, linux-kernel
In-Reply-To: <20260629154112.1164986-1-rhkrqnwk98@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Jun 2026 15:41:06 +0000 you wrote:
> tcf_bpf_act() can run from the tc egress path, which holds only
> rcu_read_lock_bh(), but reads prog->filter with rcu_dereference() and
> trips lockdep:
> 
>   WARNING: suspicious RCU usage
>   net/sched/act_bpf.c:47 suspicious rcu_dereference_check() usage!
>   1 lock held by syz.2.1588/12756:
>    #0: (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit net/core/dev.c:4792
>    tcf_bpf_act+0x6ae/0x940 net/sched/act_bpf.c:47
>    tcf_classify+0x6e4/0x1080 net/sched/cls_api.c:1860
>    sch_handle_egress net/core/dev.c:4545 [inline]
>    __dev_queue_xmit+0x2185/0x2c00 net/core/dev.c:4808
>    packet_sendmsg+0x3dfa/0x5120 net/packet/af_packet.c:3114
> 
> [...]

Here is the summary with links:
  - [net] net/sched: act_bpf: use rcu_dereference_bh() to read the filter
    https://git.kernel.org/netdev/net/c/adc49c7ba690

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH net] gue: validate REMCSUM private option length
From: Qihang @ 2026-07-01  2:26 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, kuba, davem, pabeni, Qihang

GUE private flags can indicate that remote checksum offload metadata is
present. The private flags field itself is accounted for by
guehdr_flags_len(), but guehdr_priv_flags_len() currently returns 0 even
when GUE_PFLAG_REMCSUM is set.

This lets a packet with only the private flags field pass
validate_gue_flags(), after which gue_remcsum() and gue_gro_remcsum()
read the missing REMCSUM start/offset fields from the following bytes.

Account for GUE_PLEN_REMCSUM when GUE_PFLAG_REMCSUM is present so that
malformed packets are rejected during option validation.

Fixes: c1aa8347e73e ("gue: Protocol constants for remote checksum offload")
Signed-off-by: Qihang <q.h.hack.winter@gmail.com>
---
 include/net/gue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/gue.h b/include/net/gue.h
index dfca298bec9c..caefd6da8693 100644
--- a/include/net/gue.h
+++ b/include/net/gue.h
@@ -80,7 +80,7 @@ static inline size_t guehdr_flags_len(__be16 flags)

 static inline size_t guehdr_priv_flags_len(__be32 flags)
 {
-	return 0;
+	return (flags & GUE_PFLAG_REMCSUM) ? GUE_PLEN_REMCSUM : 0;
 }

 /* Validate standard and private flags. Returns non-zero (meaning invalid)
-- 
2.50.1 (Apple Git-155)
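
The length accounting described in the commit message above can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: the sk_* names are hypothetical, and the constants are host-order stand-ins that are assumed (not confirmed by this thread) to mirror GUE_FLAG_PRIV, GUE_LEN_PRIV, GUE_PFLAG_REMCSUM and GUE_PLEN_REMCSUM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-order stand-ins for the GUE constants (assumed values). */
#define SK_FLAG_PRIV     0x0001u      /* private flags field is present */
#define SK_LEN_PRIV      4u           /* size of the private flags field */
#define SK_PFLAG_REMCSUM 0x80000000u  /* REMCSUM start/offset fields follow */
#define SK_PLEN_REMCSUM  4u           /* size of the REMCSUM fields */

/* Bytes of options implied by the standard flags. */
static size_t sk_flags_len(uint16_t flags)
{
	return (flags & SK_FLAG_PRIV) ? SK_LEN_PRIV : 0;
}

/* Bytes of options implied by the private flags (the patched behaviour). */
static size_t sk_priv_flags_len(uint32_t pflags)
{
	return (pflags & SK_PFLAG_REMCSUM) ? SK_PLEN_REMCSUM : 0;
}

/* Returns non-zero (invalid) when optlen cannot hold everything the flags
 * claim is present.
 */
static int sk_validate(uint16_t flags, uint32_t pflags, size_t optlen)
{
	size_t len = sk_flags_len(flags);

	if (len > optlen)
		return 1;

	if (flags & SK_FLAG_PRIV) {
		len += sk_priv_flags_len(pflags);
		if (len > optlen)
			return 1;
	}
	return 0;
}
```

With sk_priv_flags_len() returning 0 unconditionally, a 4-byte option area carrying only the private flags field would pass the check even though REMCSUM is set. With the length accounted for, the same input is rejected.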


* Re: [PATCH iproute2-next v2 2/2] devlink: support u64-array values in devlink param show/set
From: Ratheesh Kannoth @ 2026-07-01  2:29 UTC (permalink / raw)
  To: David Ahern
  Cc: stephen, kuba, linux-kernel, netdev, andrew+netdev, edumazet,
	pabeni, jiri
In-Reply-To: <9d9cceae-3934-4dd6-ae8e-af995ae6b0ab@kernel.org>

On 2026-06-30 at 20:06:17, David Ahern (dsahern@kernel.org) wrote:
> On 6/29/26 7:50 PM, Ratheesh Kannoth wrote:
> > diff --git a/devlink/devlink.c b/devlink/devlink.c
> > index 9372e92f..3c29601d 100644
> > --- a/devlink/devlink.c
> > +++ b/devlink/devlink.c
> > @@ -3496,13 +3496,115 @@ static const struct param_val_conv param_val_conv[] = {
> >  };
> >
> >  #define PARAM_VAL_CONV_LEN ARRAY_SIZE(param_val_conv)
> > +#define DEVLINK_PARAM_MAX_ARRAY_SIZE 32
>
> Why 32? Is that based on current code?
Yes, this aligns with the current kernel-side limits. See:
https://lore.kernel.org/all/20260609040453.711932-5-rkannoth@marvell.com/

> How does the kernel side handle
> the number of parameters? What happens if the kernel sends more than 32
> parameters - from a user's perspective, not this code and processing the
> output?
The kernel strictly validates and restricts the number of parameters. To be safe, this patch
adds an explicit bounds check to prevent userspace issues if that threshold is ever crossed.

Since "union devlink_param_value" is omitted from the UAPI, we have to define
DEVLINK_PARAM_MAX_ARRAY_SIZE here. Ideally, the underlying structures would move to the UAPI in
the future, which would let us share a single definition and avoid this hardcoded value in userspace.

>
> > +
> > +struct devlink_param_u64_array {
> > +	uint64_t size;
> > +	uint64_t val[DEVLINK_PARAM_MAX_ARRAY_SIZE];
> > +};
> > +
> > +
> > +static int param_value_u64_array_put_from_str(struct nlmsghdr *nlh,
> > +					      const char *param_value,
> > +					      const struct devlink_param_u64_array *cur)
> > +{
> > +	struct devlink_param_u64_array new_arr = {};
> > +	char *copy, *token, *saveptr = NULL;
> > +	char delim[] = " ,";
> > +	uint64_t val;
> > +	int err;
> > +
> > +	copy = strdup(param_value);
> > +	if (!copy)
> > +		return -ENOMEM;
> > +
> > +	token = strtok_r(copy, delim, &saveptr);
> > +	while (token) {
> > +		if (new_arr.size >= DEVLINK_PARAM_MAX_ARRAY_SIZE) {
> > +			free(copy);
> > +			pr_err("Too many array elements (max %d)\n",
> > +			       DEVLINK_PARAM_MAX_ARRAY_SIZE);
> > +			return -EINVAL;
> > +		}
> > +		err = get_u64((__u64 *)&val, token, 10);
> > +		if (err) {
> > +			free(copy);
> > +			pr_err("Value \"%s\" is not a number or not within range\n",
> > +			       token);
> > +			return err;
> > +		}
> > +		new_arr.val[new_arr.size++] = val;
> > +		token = strtok_r(NULL, delim, &saveptr);
> > +	}
> > +	free(copy);
> > +
> > +	if (cur && param_value_u64_array_equal(&new_arr, cur))
> > +		return 1;
> > +
> > +	for (uint64_t i = 0; i < new_arr.size; i++)
>
> put the declaration at the top of the function with the rest of them.
> global comment; fix all of them.
ACK.
>
> > +		mnl_attr_put_u64(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, new_arr.val[i]);
>
> Why can't this put be done in the loop above as the string is processed?
ACK.
>


* Re: [PATCH] net: usb: cx82310_eth: stop parsing reboot marker as packet
From: Tianchu Chen @ 2026-07-01  2:45 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: andrew+netdev, davem, edumazet, pabeni, linux-usb, netdev
In-Reply-To: <20260630154255.2954c33a@kernel.org>

July 1, 2026 at 6:42 AM, "Jakub Kicinski" <kuba@kernel.org> wrote:

> 
> On Tue, 30 Jun 2026 10:30:53 +0000 Tianchu Chen wrote:
> 
> > 
> > June 30, 2026 at 8:44 AM, "Jakub Kicinski" <kuba@kernel.org> wrote:
> >  On Thu, 25 Jun 2026 15:32:04 +0000 Tianchu Chen wrote:
> >  > From: Tianchu Chen <flynnnchen@tencent.com>
> >  > 
> >  > Discovered by Atuin - Automated Vulnerability Discovery Engine.
> >  > 
> >  > cx82310_rx_fixup() treats an RX length of 0xffff as a device reboot
> >  > marker and schedules work to re-enable ethernet mode, but then continues
> >  > processing the marker as a normal packet length. This is an out-of-bounds
> >  > heap write controlled by the usb device.
> >  > 
> >  Where? Can you be more specific in the commit message? At a glance 
> >  the accesses seem to be bound-checked with skb->len.
> >  -- 
> >  pw-bot: cr
> >  
> >  
> >  
> >  The "len > skb->len" check bounds the source read, but the overflow is on the
> >  destination buffer.
> >  
> >  The buggy path is:
> >  
> >  	if (len == 0xffff) {
> >  		netdev_info(dev->net, "router was rebooted, re-enabling ethernet mode");
> >  		schedule_work(&priv->reenable_work);
> >  		/* <- BUG: missing return; 0xffff bypasses the oversized-length reject */
> >  	} else if (len > CX82310_MTU) {
> >  		netdev_err(dev->net, "RX packet too long: %d B\n", len);
> >  		return 0;
> >  	}
> >  	if (len > skb->len) {
> >  		dev->partial_len = skb->len; // skb->len is bounded by the USB transfer size (4K)
> >  		dev->partial_rem = len - skb->len;
> >  		memcpy((void *)dev->partial_data, skb->data,
> >  		       dev->partial_len); /* <- TRIGGER: can copy 4K bytes into 1516-byte partial_data */
> > 
> If skb->len (== dev->partial_len) is not bound-checked to the size
> of dev->partial_data - aren't there more paths that could hit this
> overflow? Are you fixing the right thing?

Yes, skb->len and CX82310_MTU are different limits here.

skb->len is the amount of data received in the current USB RX URB. This
driver sets dev->rx_urb_size to 4096, so skb->len can be much larger than
the network frame MTU.

The safety invariant for the partial_data copy is not that skb->len is
MTU-bounded by itself. It is that, for normal frames, the code first rejects
len > CX82310_MTU, and then reaches the partial-packet path only when
len > skb->len. Therefore, on the normal path:

	skb->len < len <= CX82310_MTU

so copying skb->len bytes into partial_data is safe, since partial_data is
allocated as dev->hard_mtu = CX82310_MTU + 2.

The 0xffff reboot marker is the only case that breaks that invariant:
len == 0xffff is handled in a separate branch, and it just bypasses the
"len > CX82310_MTU" check entirely.

So the only case that can trigger this OOB write is len == 0xffff, which is a
reboot signal and should not be parsed as a normal packet.

Also, skb->len is governed by the USB RX URB size, not by the network MTU;
it is a different length limit from CX82310_MTU.

I believe this addresses the concern, but please let me know if you see any
remaining issue and/or would prefer a v2.

Best regards,
Tianchu
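
A minimal userspace sketch of the invariant described above, assuming the fix is an early return on the reboot marker. The sk_* names are hypothetical and this is not the driver code. The MTU value of 1514 is inferred from the 1516-byte partial_data and hard_mtu = CX82310_MTU + 2 mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MTU           1514u          /* inferred: 1516-byte buffer minus 2 */
#define SK_PARTIAL_BUF   (SK_MTU + 2u)  /* size of the partial_data buffer */
#define SK_REBOOT_MARKER 0xffffu        /* RX length that signals a reboot */

enum sk_action {
	SK_SKIP_MARKER,   /* reboot marker: schedule work, do not parse */
	SK_REJECT,        /* oversized frame: drop */
	SK_PARTIAL_COPY,  /* frame continues in the next URB */
	SK_FULL_PACKET,   /* whole frame is in this URB */
};

/* Classification with the early return on the marker (assumed fix). */
static enum sk_action sk_classify(uint16_t len, size_t skb_len)
{
	if (len == SK_REBOOT_MARKER)
		return SK_SKIP_MARKER;
	if (len > SK_MTU)
		return SK_REJECT;
	if (len > skb_len)
		return SK_PARTIAL_COPY;  /* here skb_len < len <= SK_MTU */
	return SK_FULL_PACKET;
}

/* Bytes the partial path would copy when the marker branch falls through,
 * which is the behaviour discussed in the quoted snippet.
 */
static size_t sk_buggy_copy_len(uint16_t len, size_t skb_len)
{
	if (len != SK_REBOOT_MARKER && len > SK_MTU)
		return 0;                     /* rejected, nothing copied */
	return (len > skb_len) ? skb_len : 0; /* copies skb_len bytes */
}
```

For a normal frame the copy length stays below SK_PARTIAL_BUF because skb_len < len <= SK_MTU. For len == 0xffff with a 4096-byte URB, the fall-through variant would copy 4096 bytes, which is the overflow in question.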


* [PATCH v2 iproute2-next] ss: stop displaying dccp sockets
From: Yafang Shao @ 2026-07-01  2:50 UTC (permalink / raw)
  To: stephen; +Cc: kuniyu, laoar.shao, netdev
In-Reply-To: <20260630185600.43d55c7c@phoenix.local>

DCCP support was retired in kernel commit 2a63dd0edf38 ("net: Retire
DCCP socket."). However, ss still attempts to query DCCP sockets via
netlink, which triggers repeated SELinux warnings in dmesg:

  SELinux: unrecognized netlink message: protocol=4 nlmsg_type=19 \
    sclass=netlink_tcpdiag_socket pid=188945 comm=ss

Stop sending DCCPDIAG_GETSOCK netlink messages to suppress these
warnings and align ss with the kernel change.

After this commit, running `ss -d` fails with:

  # ./misc/ss -d
  ./misc/ss: invalid option -- 'd'
  [...]

  # ./misc/ss --dccp
  ./misc/ss: unrecognized option '--dccp'
  [...]

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
---
 man/man8/ss.8 |  5 +----
 misc/ss.c     | 41 ++++++-----------------------------------
 2 files changed, 7 insertions(+), 39 deletions(-)

diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 70e0a566..3871612d 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -377,9 +377,6 @@ Display TCP sockets.
 .B \-u, \-\-udp
 Display UDP sockets.
 .TP
-.B \-d, \-\-dccp
-Display DCCP sockets.
-.TP
 .B \-w, \-\-raw
 Display RAW sockets.
 .TP
@@ -411,7 +408,7 @@ supported: unix, inet, inet6, link, netlink, vsock, tipc, xdp.
 .B \-A QUERY, \-\-query=QUERY, \-\-socket=QUERY
 List of socket tables to dump, separated by commas. The following identifiers
 are understood: all, inet, tcp, udp, raw, unix, packet, netlink, unix_dgram,
-unix_stream, unix_seqpacket, packet_raw, packet_dgram, dccp, sctp, tipc,
+unix_stream, unix_seqpacket, packet_raw, packet_dgram, sctp, tipc,
 vsock_stream, vsock_dgram, xdp, mptcp. Any item in the list may optionally be
 prefixed by an exclamation mark
 .RB ( ! )
diff --git a/misc/ss.c b/misc/ss.c
index 14e9f27a..b5f59a37 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -195,7 +195,6 @@ static const char *dg_proto;
 enum {
 	TCP_DB,
 	MPTCP_DB,
-	DCCP_DB,
 	UDP_DB,
 	RAW_DB,
 	UNIX_DG_DB,
@@ -215,7 +214,7 @@ enum {
 #define PACKET_DBM ((1<<PACKET_DG_DB)|(1<<PACKET_R_DB))
 #define UNIX_DBM ((1<<UNIX_DG_DB)|(1<<UNIX_ST_DB)|(1<<UNIX_SQ_DB))
 #define ALL_DB ((1<<MAX_DB)-1)
-#define INET_L4_DBM ((1<<TCP_DB)|(1<<MPTCP_DB)|(1<<UDP_DB)|(1<<DCCP_DB)|(1<<SCTP_DB))
+#define INET_L4_DBM ((1<<TCP_DB)|(1<<MPTCP_DB)|(1<<UDP_DB)|(1<<SCTP_DB))
 #define INET_DBM (INET_L4_DBM | (1<<RAW_DB))
 #define VSOCK_DBM ((1<<VSOCK_ST_DB)|(1<<VSOCK_DG_DB))
 
@@ -274,10 +273,6 @@ static const struct filter default_dbs[MAX_DB] = {
 		.states   = SS_CONN,
 		.families = FAMILY_MASK(AF_INET) | FAMILY_MASK(AF_INET6),
 	},
-	[DCCP_DB] = {
-		.states   = SS_CONN,
-		.families = FAMILY_MASK(AF_INET) | FAMILY_MASK(AF_INET6),
-	},
 	[UDP_DB] = {
 		.states   = (1 << SS_ESTABLISHED),
 		.families = FAMILY_MASK(AF_INET) | FAMILY_MASK(AF_INET6),
@@ -388,13 +383,12 @@ static int filter_db_parse(struct filter *f, const char *s)
 		int dbs[MAX_DB + 1];
 	} db_name_tbl[] = {
 #define ENTRY(name, ...) { #name, { __VA_ARGS__, MAX_DB } }
-		ENTRY(all, UDP_DB, DCCP_DB, TCP_DB, MPTCP_DB, RAW_DB,
+		ENTRY(all, UDP_DB, TCP_DB, MPTCP_DB, RAW_DB,
 			   UNIX_ST_DB, UNIX_DG_DB, UNIX_SQ_DB,
 			   PACKET_R_DB, PACKET_DG_DB, NETLINK_DB,
 			   SCTP_DB, VSOCK_ST_DB, VSOCK_DG_DB, XDP_DB),
-		ENTRY(inet, UDP_DB, DCCP_DB, TCP_DB, MPTCP_DB, SCTP_DB, RAW_DB),
+		ENTRY(inet, UDP_DB, TCP_DB, MPTCP_DB, SCTP_DB, RAW_DB),
 		ENTRY(udp, UDP_DB),
-		ENTRY(dccp, DCCP_DB),
 		ENTRY(tcp, TCP_DB),
 		ENTRY(mptcp, MPTCP_DB),
 		ENTRY(sctp, SCTP_DB),
@@ -935,8 +929,6 @@ static const char *proto_name(int protocol)
 		return "mptcp";
 	case IPPROTO_SCTP:
 		return "sctp";
-	case IPPROTO_DCCP:
-		return "dccp";
 	case IPPROTO_ICMPV6:
 		return "icmp6";
 	}
@@ -3897,8 +3889,6 @@ static int tcpdiag_send(int fd, int protocol, struct filter *f)
 
 	if (protocol == IPPROTO_TCP)
 		req.nlh.nlmsg_type = TCPDIAG_GETSOCK;
-	else if (protocol == IPPROTO_DCCP)
-		req.nlh.nlmsg_type = DCCPDIAG_GETSOCK;
 	else
 		return -1;
 
@@ -4134,7 +4124,7 @@ static int inet_show_netlink(struct filter *f, FILE *dump_fp, int protocol)
 
 	/* Suppress netlink errors. Older kernels do not support extended
 	 * protocol requests using INET_DIAG_REQ_PROTOCOL, and some protocols
-	 * may not be available in the running kernel (e.g. SCTP, DCCP).
+	 * may not be available in the running kernel (e.g. SCTP).
 	 * In both cases the kernel returns EINVAL which would cause
 	 * rtnl_dump_error() to print a confusing "RTNETLINK answers" error.
 	 */
@@ -4309,18 +4299,6 @@ static int mptcp_show(struct filter *f)
 	return 0;
 }
 
-static int dccp_show(struct filter *f)
-{
-	if (!filter_af_get(f, AF_INET) && !filter_af_get(f, AF_INET6))
-		return 0;
-
-	if (!getenv("PROC_NET_DCCP") && !getenv("PROC_ROOT")
-	    && inet_show_netlink(f, NULL, IPPROTO_DCCP) == 0)
-		return 0;
-
-	return 0;
-}
-
 static int sctp_show(struct filter *f)
 {
 	if (!filter_af_get(f, AF_INET) && !filter_af_get(f, AF_INET6))
@@ -5779,7 +5757,6 @@ static void _usage(FILE *dest)
 "   -M, --mptcp         display only MPTCP sockets\n"
 "   -S, --sctp          display only SCTP sockets\n"
 "   -u, --udp           display only UDP sockets\n"
-"   -d, --dccp          display only DCCP sockets\n"
 "   -w, --raw           display only RAW sockets\n"
 "   -x, --unix          display only Unix domain sockets\n"
 "       --tipc          display only TIPC sockets\n"
@@ -5795,7 +5772,7 @@ static void _usage(FILE *dest)
 "       --inet-sockopt  show various inet socket options\n"
 "\n"
 "   -A, --query=QUERY, --socket=QUERY\n"
-"       QUERY := {all|inet|tcp|mptcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|packet_raw|packet_dgram|netlink|dccp|sctp|vsock_stream|vsock_dgram|tipc|xdp}[,QUERY]\n"
+"       QUERY := {all|inet|tcp|mptcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|packet_raw|packet_dgram|netlink|sctp|vsock_stream|vsock_dgram|tipc|xdp}[,QUERY]\n"
 "\n"
 "   -D, --diag=FILE     Dump raw information about TCP sockets to FILE\n"
 "   -F, --filter=FILE   read filter information from FILE\n"
@@ -5907,7 +5884,6 @@ static const struct option long_opts[] = {
 	{ "threads", 0, 0, 'T' },
 	{ "bpf", 0, 0, 'b' },
 	{ "events", 0, 0, 'E' },
-	{ "dccp", 0, 0, 'd' },
 	{ "tcp", 0, 0, 't' },
 	{ "sctp", 0, 0, 'S' },
 	{ "udp", 0, 0, 'u' },
@@ -5961,7 +5937,7 @@ int main(int argc, char *argv[])
 	int state_filter = 0;
 
 	while ((ch = getopt_long(argc, argv,
-				 "dhalBetuwxnro460spTbEf:mMiA:D:F:vVzZN:KHQSO",
+				 "halBetuwxnro460spTbEf:mMiA:D:F:vVzZN:KHQSO",
 				 long_opts, NULL)) != EOF) {
 		switch (ch) {
 		case 'n':
@@ -5996,9 +5972,6 @@ int main(int argc, char *argv[])
 		case 'E':
 			follow_events = 1;
 			break;
-		case 'd':
-			filter_db_set(&current_filter, DCCP_DB, true);
-			break;
 		case 't':
 			filter_db_set(&current_filter, TCP_DB, true);
 			break;
@@ -6290,8 +6263,6 @@ int main(int argc, char *argv[])
 		udp_show(&current_filter);
 	if (current_filter.dbs & (1<<TCP_DB))
 		tcp_show(&current_filter);
-	if (current_filter.dbs & (1<<DCCP_DB))
-		dccp_show(&current_filter);
 	if (current_filter.dbs & (1<<SCTP_DB))
 		sctp_show(&current_filter);
 	if (current_filter.dbs & VSOCK_DBM)
-- 
2.52.0



* Re: [PATCH iproute2-next v2 2/2] devlink: support u64-array values in devlink param show/set
From: Ratheesh Kannoth @ 2026-07-01  2:57 UTC (permalink / raw)
  To: David Ahern
  Cc: stephen, kuba, linux-kernel, netdev, andrew+netdev, edumazet,
	pabeni, jiri
In-Reply-To: <9d9cceae-3934-4dd6-ae8e-af995ae6b0ab@kernel.org>

On 2026-06-30 at 20:06:17, David Ahern (dsahern@kernel.org) wrote:
> On 6/29/26 7:50 PM, Ratheesh Kannoth wrote:
> > --- a/devlink/devlink.c
> > +++ b/devlink/devlink.c
> > +{
> > +	struct devlink_param_u64_array new_arr = {};
> > +	char *copy, *token, *saveptr = NULL;
> > +	char delim[] = " ,";
> > +	uint64_t val;
> > +	int err;
> > +
> > +	copy = strdup(param_value);
> > +	if (!copy)
> > +		return -ENOMEM;
> > +
> > +	token = strtok_r(copy, delim, &saveptr);
> > +	while (token) {
> > +		if (new_arr.size >= DEVLINK_PARAM_MAX_ARRAY_SIZE) {
> > +			free(copy);
> > +			pr_err("Too many array elements (max %d)\n",
> > +			       DEVLINK_PARAM_MAX_ARRAY_SIZE);
> > +			return -EINVAL;
> > +		}
> > +		err = get_u64((__u64 *)&val, token, 10);
> > +		if (err) {
> > +			free(copy);
> > +			pr_err("Value \"%s\" is not a number or not within range\n",
> > +			       token);
> > +			return err;
> > +		}
> > +		new_arr.val[new_arr.size++] = val;
> > +		token = strtok_r(NULL, delim, &saveptr);
> > +	}
> > +	free(copy);
> > +
> > +	if (cur && param_value_u64_array_equal(&new_arr, cur))
> > +		return 1;
> > +
> > +	for (uint64_t i = 0; i < new_arr.size; i++)
> > +		mnl_attr_put_u64(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, new_arr.val[i]);
>
> Why can't this put be done in the loop above as the string is processed?

We need to complete parsing first to check if the new array is identical to the current one
(param_value_u64_array_equal()). If they match, the function returns early without populating
the Netlink attributes. This follows the convention used for other data types in devlink.
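
The parse-first, compare, then emit ordering can be sketched as below. This is an illustrative userspace version only: the sk_* names are hypothetical, it uses plain strtoull() rather than the iproute2 get_u64() helper, and it fills an output array where the real code would call mnl_attr_put_u64().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_MAX_ARRAY 32  /* mirrors DEVLINK_PARAM_MAX_ARRAY_SIZE in the patch */

struct sk_u64_array {
	uint64_t size;
	uint64_t val[SK_MAX_ARRAY];
};

/* Parse a space- or comma-separated list of decimal u64 values. */
static int sk_parse(const char *s, struct sk_u64_array *out)
{
	const char *delim = " ,";

	out->size = 0;
	s += strspn(s, delim);
	while (*s) {
		unsigned long long v;
		char *end;

		if (out->size >= SK_MAX_ARRAY)
			return -EINVAL;         /* too many elements */
		if (*s < '0' || *s > '9')
			return -EINVAL;         /* reject signs and garbage */
		errno = 0;
		v = strtoull(s, &end, 10);
		if (errno == ERANGE)
			return -ERANGE;
		if (*end && !strchr(delim, *end))
			return -EINVAL;         /* trailing junk in the token */
		out->val[out->size++] = v;
		s = end + strspn(end, delim);
	}
	return 0;
}

static bool sk_equal(const struct sk_u64_array *a,
		     const struct sk_u64_array *b)
{
	return a->size == b->size &&
	       memcmp(a->val, b->val, a->size * sizeof(a->val[0])) == 0;
}

/* Returns 1 when the new list equals the current one (nothing to send),
 * 0 when to_send holds the elements to emit, or a negative errno.
 */
static int sk_prepare_set(const char *s, const struct sk_u64_array *cur,
			  struct sk_u64_array *to_send)
{
	int err = sk_parse(s, to_send);

	if (err)
		return err;
	if (cur && sk_equal(to_send, cur))
		return 1;
	return 0;
}
```

Because the equality check needs the complete list, the attributes can only be emitted after the parse loop has finished. Emitting inside the loop would leave a partially populated message in the "unchanged" case.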


* Re:Re: [PATCH] net: ipv4: fix TOCTOU race in __ip_do_redirect
From: huanglei @ 2026-07-01  3:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: dsahern, idosch, davem, kuba, pabeni, horms, netdev, linux-kernel,
	Lei Huang
In-Reply-To: <CANn89iLuPPWYKCV02e_35n6Wkh7E=jQfo9aoz=0A6jq14wm6Mw@mail.gmail.com>

Thank you for the review. You are absolutely right.

I traced the callers and confirmed that fib_compute_spec_dst() and __ip_do_redirect() are both invoked from protocol handlers called via ip_protocol_deliver_rcu() in ip_local_deliver_finish(), which holds rcu_read_lock() across the entire dispatch. The rcu_read_lock()/rcu_read_unlock() pair inside fib_lookup() is merely nested within that outer section, so res.fi and res.nhc remain protected by the outer lock after fib_lookup() returns.

Thanks again for the correction.


At 2026-06-30 20:31:09, "Eric Dumazet" <edumazet@google.com> wrote:
>On Tue, Jun 30, 2026 at 5:24 AM Lei Huang <huanglei814@163.com> wrote:
>>
>> From: Lei Huang <huanglei@kylinos.cn>
>>
>> fib_lookup() internally acquires and releases rcu_read_lock and always uses
>> FIB_LOOKUP_NOREF (no refcount on fib_info). After it returns, res (a local
>> struct fib_result on the stack) has its nhc field pointing into the
>> fib_info internal nexthop array, but RCU protection is already dropped.
>> A concurrent route deletion can free the fib_info via kfree_rcu, making
>> res.nhc a stale pointer. Subsequent FIB_RES_NHC(res) reads this stale value
>> and update_or_create_fnhe() dereferences it, causing UAF.
>>
>> Fix by wrapping the entire fib_lookup + FIB_RES_NHC + update_or_create_fnhe
>> region in an explicit rcu_read_lock/unlock to keep the fib_info alive
>> throughout the critical section.
>>
>> Signed-off-by: Lei Huang <huanglei@kylinos.cn>
>
>You forgot to include a Fixes: tag.
>
>Please read Documentation/process/maintainer-netdev.rst
>
>Anyway, this patch isn't needed; all callers of this helper already
>use rcu_read_lock().
>
>I am guessing all of them are called from ip_protocol_deliver_rcu()
>
>If you think about this, LOCKDEP would have fired a warning years ago
>at line 769:
>
>in_dev = __in_dev_get_rcu(dev);
>
>
>pw-bot: cr


* RE: [PATCH net-next v7 4/4] net: phy: realtek: load firmware for RTL8261C_CG
From: Javen @ 2026-07-01  3:12 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: hkallweit1@gmail.com, linux@armlinux.org.uk, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	顾晓军, nb@tipi-net.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, daniel@makrotopia.org,
	vladimir.oltean@nxp.com, nic_swsd@realtek.com
In-Reply-To: <1ac195f7-29c2-45a3-84a9-cfc5366aee6f@lunn.ch>

>
>On Mon, Jun 29, 2026 at 02:47:18PM +0800, javen wrote:
>> From: Javen Xu <javen_xu@realsil.com.cn>
>>
>> Add support for loading firmware, which is used to download some
>> parameters for RTL8261C_CG.
>>
>> Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
>> ---
>> Changes in v2:
>>  - remove __pack, struct rtl8261x_fw_header and rtl8261x_fw_entry will
>> not pad
>>  - reverse xmas tree for some definition
>>  - add explanation on rtl_phy_write_mmd_bits()
>>
>> Changes in v3:
>>  - add struct rtl8261x_priv
>>
>> Changes in v4:
>>  - add struct device *dev
>>
>> Changes in v5:
>>  - no changes
>>
>> Changes in v6:
>>  - replace rtl_phy_write_mmd_bits with phy_modify_mmd, keep mdio lock
>>  - check msb and lsb at the beginning of rtl8261x_fw_execute_entry()
>>  - add comments on rtl8261x_config_init()
>>
>> Changes in v7:
>>  - no changes
>> ---
>>  drivers/net/phy/realtek/realtek_main.c | 220 +++++++++++++++++++++++++
>>  1 file changed, 220 insertions(+)
>>
>> diff --git a/drivers/net/phy/realtek/realtek_main.c b/drivers/net/phy/realtek/realtek_main.c
>> index ef3700894ebf..bf7bc19fb44c 100644
>> --- a/drivers/net/phy/realtek/realtek_main.c
>> +++ b/drivers/net/phy/realtek/realtek_main.c
>> @@ -8,7 +8,9 @@
>>   * Copyright (c) 2004 Freescale Semiconductor, Inc.
>>   */
>>  #include <linux/bitops.h>
>> +#include <linux/crc32.h>
>>  #include <linux/ethtool_netlink.h>
>> +#include <linux/firmware.h>
>>  #include <linux/of.h>
>>  #include <linux/phy.h>
>>  #include <linux/pm_wakeirq.h>
>> @@ -281,6 +283,42 @@
>>                                        RTL8261X_INT_ALDPS_CHG | \
>>                                        RTL8261X_INT_JABBER)
>>
>> +#define FW_MAIN_MAGIC                        0x52544C38
>> +#define FW_SUB_MAGIC_8261C           0x32363143
>> +#define RTL8261X_POLL_TIMEOUT_MS     100
>> +
>> +#define RTL8261C_CE_FW_NAME  "rtl_nic/rtl8261c.bin"
>> +MODULE_FIRMWARE(RTL8261C_CE_FW_NAME);
>> +
>> +enum rtl8261x_fw_op {
>> +     OP_WRITE = 0x00,        /* Write */
>> +     OP_POLL  = 0x02,        /* Polling */
>> +};
>> +
>> +struct rtl8261x_fw_header {
>> +     __le32 main_magic;      /* Main magic number 0x52544C38 ("RTL8") */
>> +     __le32 sub_magic;       /* Sub magic number */
>> +     __le16 version_major;   /* Major version */
>> +     __le16 version_minor;   /* Minor version */
>> +     __le16 num_entries;     /* Number of entries */
>> +     __le16 reserved;        /* Reserved */
>> +     __le32 crc32;           /* CRC32 checksum */
>> +};
>> +
>> +struct rtl8261x_fw_entry {
>> +     __u8  type;             /* Operation type (OP_*) */
>> +     __u8  dev;              /* MMD device */
>> +     __le16 addr;            /* Register address */
>> +     __u8  msb;              /* MSB bit position */
>> +     __u8  lsb;              /* LSB bit position */
>> +     __le16 value;           /* Value to write/compare */
>> +     __le16 timeout_ms;      /* Poll timeout in milliseconds */
>> +     __u8  poll_set;         /* Poll for set (1) or clear (0) */
>> +     __u8  reserved;         /* Reserved */
>> +};
>
>Are there other devices which need firmware download? Do they use the
>same header? I'm just wondering if this will be reused by other devices?
>
Hi Andrew,

Currently, RTL8261C is the only device which needs this firmware download flow.

Future Realtek PHY ICs which require firmware download are expected to use the same firmware format, so
the current header/entry definition is intended to be reusable.

BRs,
Javen

>        Andrew
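
For readers following the format discussion, here is a rough sketch of how a header with the quoted layout could be sanity-checked from a raw byte buffer. It is an assumption-laden illustration, not the driver's code: the sk_* names and return codes are hypothetical, the 20-byte header and 12-byte entry sizes are derived from the quoted struct definitions assuming no padding, and the CRC check is left out because the thread does not say which bytes it covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_FW_MAIN_MAGIC      0x52544C38u  /* FW_MAIN_MAGIC from the patch */
#define SK_FW_SUB_MAGIC_8261C 0x32363143u  /* FW_SUB_MAGIC_8261C from the patch */
#define SK_HDR_LEN            20u          /* derived: 4+4+2+2+2+2+4 bytes */
#define SK_ENTRY_LEN          12u          /* derived: 1+1+2+1+1+2+2+1+1 bytes */

/* Read little-endian fields without relying on host byte order. */
static uint32_t sk_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t sk_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the entry count on success, or a negative code:
 * -1 short header, -2 bad main magic, -3 bad sub magic,
 * -4 entries do not fit in the buffer.
 */
static int sk_fw_check(const uint8_t *buf, size_t len)
{
	size_t n;

	if (len < SK_HDR_LEN)
		return -1;
	if (sk_le32(buf) != SK_FW_MAIN_MAGIC)
		return -2;
	if (sk_le32(buf + 4) != SK_FW_SUB_MAGIC_8261C)
		return -3;

	n = sk_le16(buf + 12);  /* num_entries field */
	if (len - SK_HDR_LEN < n * SK_ENTRY_LEN)
		return -4;
	return (int)n;
}
```

If later PHYs reuse the format as stated above, only the sub magic would need to vary per device in a check of this shape.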


* [PATCH iproute2-next v3 0/2] devlink: support u64-array devlink parameters
From: Ratheesh Kannoth @ 2026-07-01  3:13 UTC (permalink / raw)
  To: stephen, dsahern, kuba, linux-kernel, netdev
  Cc: rkannoth, andrew+netdev, edumazet, pabeni, jiri

The kernel gained support for devlink parameters of type
DEVLINK_VAR_ATTR_TYPE_U64_ARRAY.  These parameters carry a variable-length
list of u64 values encoded as multiple DEVLINK_ATTR_PARAM_VALUE_DATA
attributes.  This is used by drivers that need to expose ordered lists of
configuration values, such as the Marvell CN20K npc_srch_order parameter.

This series updates the devlink tool to handle the new UAPI and adds
show/set support for u64-array parameters on both device and port params.

Patch 1 switches devlink param show/set to use DEVLINK_VAR_ATTR_TYPE_*
constants instead of generic MNL_TYPE_* values when interpreting
DEVLINK_ATTR_PARAM_TYPE.  The kernel now reports param types using
devlink_var_attr_type, so userspace must use the matching symbols.

Patch 2 adds parsing, display, and configuration support for
DEVLINK_VAR_ATTR_TYPE_U64_ARRAY.  Values are shown as a space-separated
list of u64 elements.  Setting accepts a space- or comma-separated list
and emits one DEVLINK_ATTR_PARAM_VALUE_DATA attribute per element.

Tested on CN20K hardware with npc_srch_order:

  # show search order
  devlink dev param show pci/0002:01:00.0 name npc_srch_order
  pci/0002:01:00.0:
    name npc_srch_order type driver-specific
      values:
        cmode runtime value  value  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

  # set search order
  devlink dev param set pci/0002:01:00.0 name npc_srch_order \
    value 31,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30 \
    cmode runtime

Ratheesh Kannoth (2):
  devlink: use DEVLINK_VAR_ATTR_TYPE_* in param show/set
  devlink: support u64-array values in devlink param show/set

 devlink/devlink.c            | 178 ++++++++++++++++++++++++++++++++---
 include/uapi/linux/devlink.h |   1 +
 2 files changed, 164 insertions(+), 15 deletions(-)

--
v2 -> v3: Addressed David's comments
	https://lore.kernel.org/netdev/akSCBN0N_7ug1-Fy@rkannoth-OptiPlex-7090/

v1 -> v2: Addressed David's comments
	https://lore.kernel.org/netdev/20260615041042.549715-1-rkannoth@marvell.com/

2.43.0


* [PATCH iproute2-next v3 1/2] devlink: use DEVLINK_VAR_ATTR_TYPE_* in param show/set
From: Ratheesh Kannoth @ 2026-07-01  3:13 UTC (permalink / raw)
  To: stephen, dsahern, kuba, linux-kernel, netdev
  Cc: rkannoth, andrew+netdev, edumazet, pabeni, jiri
In-Reply-To: <20260701031359.839221-1-rkannoth@marvell.com>

Replace MNL_TYPE_* constants with DEVLINK_VAR_ATTR_TYPE_* when
handling DEVLINK_ATTR_PARAM_TYPE in param value display and set
paths. The kernel uAPI now exposes these values directly via
devlink_var_attr_type.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 devlink/devlink.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 434a91fe..803ea5d7 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -3526,7 +3526,7 @@ static int pr_out_param_value_print(const char *nla_name, int nla_type,
 	print_string(PRINT_FP, NULL, " %s ", label);
 
 	switch (nla_type) {
-	case MNL_TYPE_U8:
+	case DEVLINK_VAR_ATTR_TYPE_U8:
 		if (conv_exists) {
 			err = param_val_conv_str_get(param_val_conv,
 						     PARAM_VAL_CONV_LEN,
@@ -3541,7 +3541,7 @@ static int pr_out_param_value_print(const char *nla_name, int nla_type,
 				   mnl_attr_get_u8(val_attr));
 		}
 		break;
-	case MNL_TYPE_U16:
+	case DEVLINK_VAR_ATTR_TYPE_U16:
 		if (conv_exists) {
 			err = param_val_conv_str_get(param_val_conv,
 						     PARAM_VAL_CONV_LEN,
@@ -3556,7 +3556,7 @@ static int pr_out_param_value_print(const char *nla_name, int nla_type,
 				   mnl_attr_get_u16(val_attr));
 		}
 		break;
-	case MNL_TYPE_U32:
+	case DEVLINK_VAR_ATTR_TYPE_U32:
 		if (conv_exists) {
 			err = param_val_conv_str_get(param_val_conv,
 						     PARAM_VAL_CONV_LEN,
@@ -3571,11 +3571,11 @@ static int pr_out_param_value_print(const char *nla_name, int nla_type,
 				   mnl_attr_get_u32(val_attr));
 		}
 		break;
-	case MNL_TYPE_STRING:
+	case DEVLINK_VAR_ATTR_TYPE_STRING:
 		print_string(PRINT_ANY, label, "%s",
 			     mnl_attr_get_str(val_attr));
 		break;
-	case MNL_TYPE_FLAG:
+	case DEVLINK_VAR_ATTR_TYPE_FLAG:
 		if (flag_as_u8)
 			print_bool(PRINT_ANY, label, "%s",
 				   mnl_attr_get_u8(val_attr));
@@ -3753,22 +3753,22 @@ static int cmd_dev_param_set_cb(const struct nlmsghdr *nlh, void *data)
 			ctx->cmode_found = true;
 			val_attr = nla_value[DEVLINK_ATTR_PARAM_VALUE_DATA];
 			switch (nla_type) {
-			case MNL_TYPE_U8:
+			case DEVLINK_VAR_ATTR_TYPE_U8:
 				ctx->value.vu8 = mnl_attr_get_u8(val_attr);
 				break;
-			case MNL_TYPE_U16:
+			case DEVLINK_VAR_ATTR_TYPE_U16:
 				ctx->value.vu16 = mnl_attr_get_u16(val_attr);
 				break;
-			case MNL_TYPE_U32:
+			case DEVLINK_VAR_ATTR_TYPE_U32:
 				ctx->value.vu32 = mnl_attr_get_u32(val_attr);
 				break;
-			case MNL_TYPE_U64:
+			case DEVLINK_VAR_ATTR_TYPE_U64:
 				ctx->value.vu64 = mnl_attr_get_u64(val_attr);
 				break;
-			case MNL_TYPE_STRING:
+			case DEVLINK_VAR_ATTR_TYPE_STRING:
 				ctx->value.vstr = mnl_attr_get_str(val_attr);
 				break;
-			case MNL_TYPE_FLAG:
+			case DEVLINK_VAR_ATTR_TYPE_FLAG:
 				ctx->value.vbool = val_attr ? true : false;
 				break;
 			}
@@ -3841,7 +3841,7 @@ static int cmd_dev_param_set(struct dl *dl)
 
 	mnl_attr_put_u8(nlh, DEVLINK_ATTR_PARAM_TYPE, ctx.nla_type);
 	switch (ctx.nla_type) {
-	case MNL_TYPE_U8:
+	case DEVLINK_VAR_ATTR_TYPE_U8:
 		if (conv_exists) {
 			err = param_val_conv_uint_get(param_val_conv,
 						      PARAM_VAL_CONV_LEN,
@@ -3858,7 +3858,7 @@ static int cmd_dev_param_set(struct dl *dl)
 			return 0;
 		mnl_attr_put_u8(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, val_u8);
 		break;
-	case MNL_TYPE_U16:
+	case DEVLINK_VAR_ATTR_TYPE_U16:
 		if (conv_exists) {
 			err = param_val_conv_uint_get(param_val_conv,
 						      PARAM_VAL_CONV_LEN,
@@ -3875,7 +3875,7 @@ static int cmd_dev_param_set(struct dl *dl)
 			return 0;
 		mnl_attr_put_u16(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, val_u16);
 		break;
-	case MNL_TYPE_U32:
+	case DEVLINK_VAR_ATTR_TYPE_U32:
 		if (conv_exists) {
 			err = param_val_conv_uint_get(param_val_conv,
 						      PARAM_VAL_CONV_LEN,
@@ -3892,7 +3892,7 @@ static int cmd_dev_param_set(struct dl *dl)
 			return 0;
 		mnl_attr_put_u32(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, val_u32);
 		break;
-	case MNL_TYPE_U64:
+	case DEVLINK_VAR_ATTR_TYPE_U64:
 		if (conv_exists)
 			err = param_val_conv_uint_get(param_val_conv,
 						      PARAM_VAL_CONV_LEN,
@@ -3907,7 +3907,7 @@ static int cmd_dev_param_set(struct dl *dl)
 			return 0;
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, val_u64);
 		break;
-	case MNL_TYPE_FLAG:
+	case DEVLINK_VAR_ATTR_TYPE_FLAG:
 		err = str_to_bool(dl->opts.param_value, &val_bool);
 		if (err)
 			goto err_param_value_parse;
@@ -3917,7 +3917,7 @@ static int cmd_dev_param_set(struct dl *dl)
 			mnl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA,
 				     0, NULL);
 		break;
-	case MNL_TYPE_STRING:
+	case DEVLINK_VAR_ATTR_TYPE_STRING:
 		mnl_attr_put_strz(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA,
 				  dl->opts.param_value);
 		if (!strcmp(dl->opts.param_value, ctx.value.vstr))
-- 
2.43.0


^ permalink raw reply related

* [PATCH iproute2-next v3 2/2] devlink: support u64-array values in devlink param show/set
From: Ratheesh Kannoth @ 2026-07-01  3:13 UTC (permalink / raw)
  To: stephen, dsahern, kuba, linux-kernel, netdev
  Cc: rkannoth, andrew+netdev, edumazet, pabeni, jiri
In-Reply-To: <20260701031359.839221-1-rkannoth@marvell.com>

Add support for DEVLINK_VAR_ATTR_TYPE_U64_ARRAY parameters that carry
multiple DEVLINK_ATTR_PARAM_VALUE_DATA attributes. Parse and display
u64 array values in param show, and accept space- or comma-separated
u64 values in devlink and port param set commands.

  - Show search order

  devlink dev param show pci/0002:01:00.0 name npc_srch_order
  pci/0002:01:00.0:
    name npc_srch_order type driver-specific
      values:
        cmode runtime value  value  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

  - Set search order

  devlink dev param set pci/0002:01:00.0 name npc_srch_order value 31,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,\
		22,23,24,25,26,27,28,29,30 cmode runtime

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 devlink/devlink.c | 157 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 153 insertions(+), 4 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 803ea5d7..c6566937 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -3515,13 +3515,116 @@ static const struct param_val_conv param_val_conv[] = {
 };
 
 #define PARAM_VAL_CONV_LEN ARRAY_SIZE(param_val_conv)
+#define DEVLINK_PARAM_MAX_ARRAY_SIZE 32
+
+struct devlink_param_u64_array {
+	uint64_t size;
+	uint64_t val[DEVLINK_PARAM_MAX_ARRAY_SIZE];
+};
+
+static int param_value_nested_u64_attr_cb(const struct nlattr *attr, void *data)
+{
+	struct devlink_param_u64_array *arr = data;
+	unsigned int len;
+
+	if (mnl_attr_get_type(attr) != DEVLINK_ATTR_PARAM_VALUE_DATA)
+		return MNL_CB_OK;
+
+	if (arr->size >= DEVLINK_PARAM_MAX_ARRAY_SIZE)
+		return MNL_CB_ERROR;
+
+	len = mnl_attr_get_payload_len(attr);
+	if (len == sizeof(uint32_t))
+		arr->val[arr->size++] = mnl_attr_get_u32(attr);
+	else if (len == sizeof(uint64_t))
+		arr->val[arr->size++] = mnl_attr_get_u64(attr);
+	else
+		return MNL_CB_ERROR;
+
+	return MNL_CB_OK;
+}
+
+static int param_value_u64_array_fill(struct nlattr *nl,
+				      struct devlink_param_u64_array *arr)
+{
+	int err;
+
+	arr->size = 0;
+	err = mnl_attr_parse_nested(nl, param_value_nested_u64_attr_cb, arr);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	return 0;
+}
+
+static bool param_value_u64_array_equal(const struct devlink_param_u64_array *a,
+					const struct devlink_param_u64_array *b)
+{
+	uint64_t i;
+
+	if (a->size != b->size)
+		return false;
+
+	for (i = 0; i < a->size; i++) {
+		if (a->val[i] != b->val[i])
+			return false;
+	}
+
+	return true;
+}
+
+static int param_value_u64_array_put_from_str(struct nlmsghdr *nlh,
+					      const char *param_value,
+					      const struct devlink_param_u64_array *cur)
+{
+	struct devlink_param_u64_array new_arr = {};
+	char *copy, *token, *saveptr = NULL;
+	char delim[] = " ,";
+	uint64_t val;
+	int err, i;
+
+	copy = strdup(param_value);
+	if (!copy)
+		return -ENOMEM;
+
+	token = strtok_r(copy, delim, &saveptr);
+	while (token) {
+		if (new_arr.size >= DEVLINK_PARAM_MAX_ARRAY_SIZE) {
+			free(copy);
+			pr_err("Too many array elements (max %d)\n",
+			       DEVLINK_PARAM_MAX_ARRAY_SIZE);
+			return -EINVAL;
+		}
+		err = get_u64((__u64 *)&val, token, 10);
+		if (err) {
+			free(copy);
+			pr_err("Value \"%s\" is not a number or not within range\n",
+			       token);
+			return err;
+		}
+		new_arr.val[new_arr.size++] = val;
+		token = strtok_r(NULL, delim, &saveptr);
+	}
+	free(copy);
+
+	/* Check current and new values. If both are equal, bail out */
+	if (cur && param_value_u64_array_equal(&new_arr, cur))
+		return 1;
+
+	for (i = 0; i < new_arr.size; i++)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, new_arr.val[i]);
+
+	return 0;
+}
 
 static int pr_out_param_value_print(const char *nla_name, int nla_type,
 				     struct nlattr *val_attr, bool conv_exists,
-				     const char *label, bool flag_as_u8)
+				     const char *label, bool flag_as_u8, struct nlattr *nl)
 {
+	struct devlink_param_u64_array u64_arr = { };
 	const char *vstr;
-	int err;
+	char buffer[1024] = "";
+	int err, cnt = 0;
 
 	print_string(PRINT_FP, NULL, " %s ", label);
 
@@ -3582,6 +3685,20 @@ static int pr_out_param_value_print(const char *nla_name, int nla_type,
 		else
 			print_bool(PRINT_ANY, label, "%s", val_attr);
 		break;
+	case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
+		err = param_value_u64_array_fill(flag_as_u8 ? val_attr : nl, &u64_arr);
+		if (err)
+			return err;
+
+		for (uint64_t i = 0; i < u64_arr.size; i++) {
+			if (i)
+				cnt += snprintf(buffer + cnt, sizeof(buffer) - cnt, " ");
+			cnt += snprintf(buffer + cnt, sizeof(buffer) - cnt,
+					"%" PRIu64, u64_arr.val[i]);
+		}
+
+		print_string(PRINT_ANY, label, "%s", buffer);
+		break;
 	}
 
 	return 0;
@@ -3601,6 +3718,7 @@ static void pr_out_param_value(struct dl *dl, const char *nla_name,
 
 	if (!nla_value[DEVLINK_ATTR_PARAM_VALUE_CMODE] ||
 	    (nla_type != MNL_TYPE_FLAG &&
+	     nla_type != DEVLINK_VAR_ATTR_TYPE_U64_ARRAY &&
 	     !nla_value[DEVLINK_ATTR_PARAM_VALUE_DATA]))
 		return;
 
@@ -3614,14 +3732,14 @@ static void pr_out_param_value(struct dl *dl, const char *nla_name,
 					    nla_name);
 
 	err = pr_out_param_value_print(nla_name, nla_type, val_attr,
-				       conv_exists, "value", false);
+				       conv_exists, "value", false, nl);
 	if (err)
 		return;
 
 	val_attr = nla_value[DEVLINK_ATTR_PARAM_VALUE_DEFAULT];
 	if (val_attr) {
 		err = pr_out_param_value_print(nla_name, nla_type, val_attr,
-					       conv_exists, "default", true);
+					       conv_exists, "default", true, nl);
 		if (err)
 			return;
 	}
@@ -3704,6 +3822,7 @@ struct param_ctx {
 		uint64_t vu64;
 		const char *vstr;
 		bool vbool;
+		struct devlink_param_u64_array u64arr;
 	} value;
 };
 
@@ -3745,6 +3864,7 @@ static int cmd_dev_param_set_cb(const struct nlmsghdr *nlh, void *data)
 
 		if (!nla_value[DEVLINK_ATTR_PARAM_VALUE_CMODE] ||
 		    (nla_type != MNL_TYPE_FLAG &&
+		     nla_type != DEVLINK_VAR_ATTR_TYPE_U64_ARRAY &&
 		     !nla_value[DEVLINK_ATTR_PARAM_VALUE_DATA]))
 			return MNL_CB_ERROR;
 
@@ -3771,6 +3891,12 @@ static int cmd_dev_param_set_cb(const struct nlmsghdr *nlh, void *data)
 			case DEVLINK_VAR_ATTR_TYPE_FLAG:
 				ctx->value.vbool = val_attr ? true : false;
 				break;
+			case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
+				err = param_value_u64_array_fill(param_value_attr,
+								 &ctx->value.u64arr);
+				if (err)
+					return MNL_CB_ERROR;
+				break;
 			}
 			break;
 		}
@@ -3923,6 +4049,14 @@ static int cmd_dev_param_set(struct dl *dl)
 		if (!strcmp(dl->opts.param_value, ctx.value.vstr))
 			return 0;
 		break;
+	case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
+		err = param_value_u64_array_put_from_str(nlh, dl->opts.param_value,
+							 &ctx.value.u64arr);
+		if (err == 1)
+			return 0;
+		if (err)
+			return err;
+		break;
 	default:
 		printf("Value type not supported\n");
 		return -ENOTSUP;
@@ -5369,6 +5503,7 @@ static int cmd_port_param_set_cb(const struct nlmsghdr *nlh, void *data)
 
 		if (!nla_value[DEVLINK_ATTR_PARAM_VALUE_CMODE] ||
 		    (nla_type != MNL_TYPE_FLAG &&
+		     nla_type != DEVLINK_VAR_ATTR_TYPE_U64_ARRAY &&
 		     !nla_value[DEVLINK_ATTR_PARAM_VALUE_DATA]))
 			return MNL_CB_ERROR;
 
@@ -5391,6 +5526,12 @@ static int cmd_port_param_set_cb(const struct nlmsghdr *nlh, void *data)
 			case MNL_TYPE_FLAG:
 				ctx->value.vbool = val_attr ? true : false;
 				break;
+			case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
+				err = param_value_u64_array_fill(param_value_attr,
+								 &ctx->value.u64arr);
+				if (err)
+					return MNL_CB_ERROR;
+				break;
 			}
 			break;
 		}
@@ -5519,6 +5660,14 @@ static int cmd_port_param_set(struct dl *dl)
 		if (!strcmp(dl->opts.param_value, ctx.value.vstr))
 			return 0;
 		break;
+	case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
+		err = param_value_u64_array_put_from_str(nlh, dl->opts.param_value,
+							 &ctx.value.u64arr);
+		if (err == 1)
+			return 0;
+		if (err)
+			return err;
+		break;
 	default:
 		printf("Value type not supported\n");
 		return -ENOTSUP;
-- 
2.43.0


^ permalink raw reply related
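
The set path in the devlink u64-array patch above splits the value string on
spaces and commas and caps the result at 32 elements. A minimal standalone
sketch of that parsing step follows. It is not the iproute2 code: it swaps
iproute2's get_u64() for plain strtoull(), and parse_u64_list() and
EX_MAX_ELEMS are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() and strtok_r() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_MAX_ELEMS 32	/* assumption: mirrors the 32-element cap in the patch */

/* Parse a space- or comma-separated list of decimal u64 values.
 * Returns the number of values stored in out[], or a negative errno.
 */
static int parse_u64_list(const char *str, uint64_t *out, size_t max)
{
	char *copy, *token, *end, *saveptr = NULL;
	size_t n = 0;
	int ret;

	assert(str && out);

	/* strtok_r() modifies its input, so work on a private copy */
	copy = strdup(str);
	if (!copy)
		return -ENOMEM;

	for (token = strtok_r(copy, " ,", &saveptr); token;
	     token = strtok_r(NULL, " ,", &saveptr)) {
		unsigned long long v;

		if (n >= max) {
			ret = -E2BIG;
			goto out;
		}
		/* strtoull() accepts a leading sign; reject it explicitly */
		if (*token == '-' || *token == '+') {
			ret = -EINVAL;
			goto out;
		}
		errno = 0;
		v = strtoull(token, &end, 10);
		if (errno || end == token || *end != '\0') {
			ret = -EINVAL;
			goto out;
		}
		out[n++] = v;
	}
	ret = (int)n;
out:
	free(copy);
	return ret;
}
```

Using a single exit label keeps the free() of the copy in one place, which
avoids repeating it on every error branch.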

* [PATCH] net: phylink: reject unsupported speed/duplex in ksettings_set() with PHY
From: muhammad.nazim.amirul.nazle.asmade @ 2026-07-01  3:17 UTC (permalink / raw)
  To: linux, andrew, hkallweit1
  Cc: davem, edumazet, kuba, pabeni, netdev, linux-kernel

From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>

When using ethtool to change speed and duplex on a phylink-managed
interface with a PHY attached, the requested speed/duplex combination
is not validated against the MAC's supported capabilities before being
passed down to the PHY layer.

Commit df0acdc59b09 ("net: phylink: fix ksettings_set() ethtool call")
and commit 03c44a21d033 ("net: phylink: actually fix ksettings_set()
ethtool call") introduced masking of the PHY advertising modes against
pl->supported, but did not add an explicit check that the requested
speed/duplex itself is within the MAC's capability set.

The AUTONEG_DISABLE path in the non-PHY case already uses
phy_caps_lookup() to validate speed/duplex against pl->supported.
Extend the same validation to the pl->phydev path so that ethtool
requests for unsupported speed/duplex combinations are rejected with
-EINVAL before reaching the PHY layer.

Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
 drivers/net/phy/phylink.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 087ac63f9193..22f9bbd381bd 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2989,6 +2989,10 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
 	if (pl->phydev) {
 		struct ethtool_link_ksettings phy_kset = *kset;
 
+		if (!phy_caps_lookup(kset->base.speed, kset->base.duplex,
+				     pl->supported, true))
+			return -EINVAL;
+
 		linkmode_and(phy_kset.link_modes.advertising,
 			     phy_kset.link_modes.advertising,
 			     pl->supported);
-- 
2.43.7


^ permalink raw reply related
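
The idea in the phylink patch above is to reject a speed/duplex pair the MAC
cannot do before the request reaches the PHY layer. A rough userspace sketch
of that "validate first, then hand down" ordering follows. It does not use
the kernel's phy_caps_lookup() or link-mode bitmaps; the capability table and
all ex_ names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum ex_duplex { EX_DUPLEX_HALF, EX_DUPLEX_FULL };

struct ex_link_cap {
	int speed;		/* Mb/s */
	enum ex_duplex duplex;
};

/* Hypothetical MAC capability set: no 10 Mb/s, no 1000 Mb/s half duplex */
static const struct ex_link_cap ex_mac_caps[] = {
	{ 100, EX_DUPLEX_HALF },
	{ 100, EX_DUPLEX_FULL },
	{ 1000, EX_DUPLEX_FULL },
};

static_assert(sizeof(ex_mac_caps) > 0, "capability table must not be empty");

/* Return true if the MAC supports this exact speed/duplex pair */
static bool ex_cap_supported(int speed, enum ex_duplex duplex)
{
	for (size_t i = 0; i < sizeof(ex_mac_caps) / sizeof(ex_mac_caps[0]); i++) {
		if (ex_mac_caps[i].speed == speed &&
		    ex_mac_caps[i].duplex == duplex)
			return true;
	}
	return false;
}

/* Validate against the MAC first; only then would the PHY see the request */
static int ex_set_fixed_link(int speed, enum ex_duplex duplex)
{
	if (!ex_cap_supported(speed, duplex))
		return -EINVAL;

	/* a real driver would pass the settings to the PHY layer here */
	return 0;
}
```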

* [PATCH net-next v2] ipv4: hold a consistent view of rt->dst.dev under RCU
From: xuanqiang.luo @ 2026-07-01  3:16 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Ido Schimmel
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, linux-kernel,
	Xuanqiang Luo
In-Reply-To: <20260630094250.29386-1-xuanqiang.luo@linux.dev>

From: Xuanqiang Luo <luoxuanqiang@kylinos.cn>

rt_flush_dev() walks the per-CPU uncached route list and rewrites
rt->dst.dev in-place to blackhole_netdev under spin_lock_bh().
This lock does not exclude RCU readers, which may load rt->dst.dev
multiple times within a single rcu_read_lock() region.

ip_rt_send_redirect() is a typical example: it reads rt->dst.dev
three times to obtain in_dev, the L3 master ifindex, and net.
A concurrent device unregistration can repoint rt->dst.dev to
blackhole_netdev between those reads, making the reader combine
state from two different net_devices — for instance, an in_dev
from the real device but a netns and peer lookup from the blackhole
device.  ip_rt_get_source() has the same problem: it reads
rt->dst.dev four times to obtain the output ifindex, the netns,
and the source address, so a concurrent flush can cause the source
selection to mix state from different devices.

Take a single dst_dev_rcu() snapshot of rt->dst.dev at the start
of each affected RCU reader and use that snapshot throughout, so
concurrent flushes cannot cause mid-function inconsistency.
Publish the in-place write in rt_flush_dev() with rcu_assign_pointer()
to match the readers.

Fixes: caacf05e5ad1a ("ipv4: Properly purge netdev references on uncached routes.")
Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
---
v2:
- Use dst_dev_rcu() and dev_net_rcu() for the RCU readers.
- Use rcu_assign_pointer() when publishing the uncached route device
  replacement.
- Slightly adjust the commit message wording because this issue was found
  by inspection, not from an observed user-visible failure.

v1: https://lore.kernel.org/all/20260630094250.29386-1-xuanqiang.luo@linux.dev/

 net/ipv4/route.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3f3de5164d6e5..57f38467e6d0c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -873,6 +873,7 @@ static void ipv4_negative_advice(struct sock *sk,
 void ip_rt_send_redirect(struct sk_buff *skb)
 {
 	struct rtable *rt = skb_rtable(skb);
+	struct net_device *dev;
 	struct in_device *in_dev;
 	struct inet_peer *peer;
 	struct net *net;
@@ -880,15 +881,16 @@ void ip_rt_send_redirect(struct sk_buff *skb)
 	int vif;
 
 	rcu_read_lock();
-	in_dev = __in_dev_get_rcu(rt->dst.dev);
+	dev = dst_dev_rcu(&rt->dst);
+	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev || !IN_DEV_TX_REDIRECTS(in_dev)) {
 		rcu_read_unlock();
 		return;
 	}
 	log_martians = IN_DEV_LOG_MARTIANS(in_dev);
-	vif = l3mdev_master_ifindex_rcu(rt->dst.dev);
+	vif = l3mdev_master_ifindex_rcu(dev);
 
-	net = dev_net(rt->dst.dev);
+	net = dev_net_rcu(dev);
 	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, vif);
 	if (!peer) {
 		rcu_read_unlock();
@@ -1287,29 +1289,32 @@ void ip_rt_get_source(u8 *addr, struct sk_buff *skb, struct rtable *rt)
 {
 	__be32 src;
 
-	if (rt_is_output_route(rt))
+	rcu_read_lock();
+	if (rt_is_output_route(rt)) {
 		src = ip_hdr(skb)->saddr;
-	else {
+	} else {
 		struct fib_result res;
 		struct iphdr *iph = ip_hdr(skb);
+		struct net_device *dev = dst_dev_rcu(&rt->dst);
+		struct net *net = dev_net_rcu(dev);
 		struct flowi4 fl4 = {
 			.daddr = iph->daddr,
 			.saddr = iph->saddr,
 			.flowi4_dscp = ip4h_dscp(iph),
-			.flowi4_oif = rt->dst.dev->ifindex,
+			.flowi4_oif = dev->ifindex,
 			.flowi4_iif = skb->dev->ifindex,
 			.flowi4_mark = skb->mark,
 		};
 
-		rcu_read_lock();
-		if (fib_lookup(dev_net(rt->dst.dev), &fl4, &res, 0) == 0)
-			src = fib_result_prefsrc(dev_net(rt->dst.dev), &res);
+		if (fib_lookup(net, &fl4, &res, 0) == 0)
+			src = fib_result_prefsrc(net, &res);
 		else
-			src = inet_select_addr(rt->dst.dev,
+			src = inet_select_addr(dev,
 					       rt_nexthop(rt, iph->daddr),
 					       RT_SCOPE_UNIVERSE);
-		rcu_read_unlock();
 	}
+	rcu_read_unlock();
+
 	memcpy(addr, &src, 4);
 }
 
@@ -1565,7 +1570,7 @@ void rt_flush_dev(struct net_device *dev)
 		list_for_each_entry_safe(rt, safe, &ul->head, dst.rt_uncached) {
 			if (rt->dst.dev != dev)
 				continue;
-			rt->dst.dev = blackhole_netdev;
+			rcu_assign_pointer(rt->dst.dev_rcu, blackhole_netdev);
 			netdev_ref_replace(dev, blackhole_netdev,
 					   &rt->dst.dev_tracker, GFP_ATOMIC);
 			list_del_init(&rt->dst.rt_uncached);
-- 
2.43.0

^ permalink raw reply related
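
The snapshot pattern described in the ipv4 patch above can be illustrated
outside the kernel with C11 atomics. This is only an analogy: it models the
single load of the device pointer and the published store, not RCU grace
periods or object lifetime, and every ex_ name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct ex_device {
	int ifindex;
	int netns_id;
};

struct ex_route {
	_Atomic(struct ex_device *) dev;	/* may be repointed concurrently */
};

struct ex_view {
	int ifindex;
	int netns_id;
};

/* Reader: load the pointer once and derive every field from that snapshot,
 * so both fields are guaranteed to describe the same device.
 */
static struct ex_view ex_route_view(struct ex_route *rt)
{
	struct ex_device *dev =
		atomic_load_explicit(&rt->dev, memory_order_acquire);
	struct ex_view v;

	assert(dev != NULL);
	v.ifindex = dev->ifindex;
	v.netns_id = dev->netns_id;
	return v;
}

/* Writer: publish the replacement pointer with release semantics */
static void ex_route_flush(struct ex_route *rt, struct ex_device *blackhole)
{
	atomic_store_explicit(&rt->dev, blackhole, memory_order_release);
}
```

A reader that instead dereferenced rt->dev separately for each field could
see the real device for one field and the replacement for the other, which
is the inconsistency the patch describes.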

* [PATCH net-next v2 0/1] net: rnpgbe: fix mailbox endianness
From: Dong Yibo @ 2026-07-01  3:22 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko
  Cc: netdev, linux-kernel, dong100, yaojun

The rnpgbe mailbox exchanges data through 32-bit MMIO registers in
little-endian wire format.  The original code had two problems:

 1. FW structs with __le16/__le32 fields were cast to (u32 *) before
    reaching the transport, hiding the endian annotations from sparse.

 2. No cpu_to_le32()/le32_to_cpu() conversion was performed between
    the CPU-endian MMIO values and the little-endian payload, causing
    data corruption on big-endian systems.

v2 fixes this by introducing union wrappers around the FW structs
and adding the missing byte-order conversions in the transport layer.
All pointer casts on the mailbox data path are eliminated.

Changelog:
v1 -> v2:
- Remove all pointer casts on the mailbox data path.  Use union
  wrappers (mbx_fw_cmd_req_u, mbx_fw_cmd_reply_u) that overlay
  each FW struct with a __le32 dwords[] array.  Callers fill
  named fields with cpu_to_le16/32() and pass dwords[] directly
  to the transport — no casts needed.
- Change transport signatures from u32 */void * to explicit
  __le32 * so sparse can verify endian correctness.
- Add comments in mucse_read_mbx_pf() and mucse_write_mbx_pf()
  explaining why memcpy_toio() cannot replace the readl()/writel()
  loop (the mailbox uses 32-bit MMIO registers, not byte-
  addressable RAM).

Links:
v1: https://lore.kernel.org/netdev/20260617083531.251119-1-dong100@mucse.com/

Dong Yibo (1):
  net: rnpgbe: fix mailbox endianness and remove pointer casts

 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c    | 26 ++++--
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h    |  5 +-
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c | 82 ++++++++++---------
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h | 14 ++++
 4 files changed, 80 insertions(+), 47 deletions(-)

-- 
2.25.1


^ permalink raw reply
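
The transport-layer conversion described in the cover letter above can be
sketched in userspace. This is a toy model, not the driver: ex_regs[] stands
in for the 32-bit mailbox registers, ex_le32 is a hypothetical wire-format
type, and the conversion helpers are written out by hand instead of using
the kernel's cpu_to_le32()/le32_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MBX_WORDS 8

/* Wire-format dword: bytes are always stored least significant first */
typedef struct { uint8_t b[4]; } ex_le32;

static ex_le32 ex_cpu_to_le32(uint32_t v)
{
	ex_le32 r = { {
		(uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff), (uint8_t)((v >> 24) & 0xff),
	} };
	return r;
}

static uint32_t ex_le32_to_cpu(ex_le32 v)
{
	return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8) |
	       ((uint32_t)v.b[2] << 16) | ((uint32_t)v.b[3] << 24);
}

/* Stand-in for the mailbox registers; each holds a CPU-endian value */
static uint32_t ex_regs[EX_MBX_WORDS];

/* Write path: convert each wire dword to a CPU value, one word at a time */
static void ex_mbx_write(const ex_le32 *msg, size_t words)
{
	assert(words <= EX_MBX_WORDS);
	for (size_t i = 0; i < words; i++)
		ex_regs[i] = ex_le32_to_cpu(msg[i]);
}

/* Read path: convert each CPU-endian register value back to wire format */
static void ex_mbx_read(ex_le32 *msg, size_t words)
{
	assert(words <= EX_MBX_WORDS);
	for (size_t i = 0; i < words; i++)
		msg[i] = ex_cpu_to_le32(ex_regs[i]);
}
```

Because the wire type is distinct from uint32_t, passing an unconverted CPU
value where a wire dword is expected fails to compile, which is roughly the
check sparse performs on __le32 in the kernel.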

* [PATCH net-next v2 1/1] net: rnpgbe: fix mailbox endianness and remove pointer casts
From: Dong Yibo @ 2026-07-01  3:22 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko
  Cc: netdev, linux-kernel, dong100, yaojun
In-Reply-To: <20260701032208.1843156-1-dong100@mucse.com>

The rnpgbe mailbox exchanges data through 32-bit MMIO registers in
little-endian wire format. The original code had two problems:

  1. FW structs (with __le16/__le32 fields) were cast to (u32 *)
     before reaching the mailbox transport, hiding the endian
     annotations from sparse.

  2. No cpu_to_le32()/le32_to_cpu() conversion was done between
     CPU-endian MMIO values and the little-endian payload, causing
     data corruption on big-endian systems.

Fix by adding the missing byte-order conversions in the transport
layer and introducing union wrappers (mbx_fw_cmd_req_u,
mbx_fw_cmd_reply_u) that overlay each FW struct with a __le32
dwords[] array. Callers fill named fields using cpu_to_le16/32(),
then pass dwords[] to the transport, which now takes explicit
__le32 * instead of u32 *. This eliminates all pointer casts on
the mailbox data path and lets sparse verify the conversions.

Fixes: 4543534c3ef5 ("net: rnpgbe: Add basic mbx ops support")
Signed-off-by: Dong Yibo <dong100@mucse.com>
---
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c    | 26 ++++--
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h    |  5 +-
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c | 82 ++++++++++---------
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h | 14 ++++
 4 files changed, 80 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c
index de5e29230b3c..c46408698263 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c
@@ -166,18 +166,23 @@ static void mucse_mbx_inc_pf_ack(struct mucse_hw *hw)
  *
  * Return: 0 on success, negative errno on failure
  **/
-static int mucse_read_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
+static int mucse_read_mbx_pf(struct mucse_hw *hw, __le32 *msg, u16 size)
 {
-	const int size_in_words = size / sizeof(u32);
+	const int size_in_words = size / sizeof(__le32);
 	struct mucse_mbx_info *mbx = &hw->mbx;
+	int off = MUCSE_MBX_FWPF_SHM;
 	int err;
 
 	err = mucse_obtain_mbx_lock_pf(hw);
 	if (err)
 		return err;
 
+	/* memcpy_fromio() is unsuitable: the mailbox uses 32-bit MMIO
+	 * registers, not byte-addressable RAM. readl() guarantees
+	 * the required 32-bit access width.
+	 */
 	for (int i = 0; i < size_in_words; i++)
-		msg[i] = mbx_data_rd32(mbx, MUCSE_MBX_FWPF_SHM + 4 * i);
+		msg[i] = cpu_to_le32(mbx_data_rd32(mbx, off + 4 * i));
 	/* Hw needs write data_reg at last */
 	mbx_data_wr32(mbx, MUCSE_MBX_FWPF_SHM, 0);
 	/* flush reqs as we have read this request data */
@@ -236,7 +241,7 @@ static int mucse_poll_for_msg(struct mucse_hw *hw)
  * Return: 0 if it successfully received a message notification and
  * copied it into the receive buffer, negative errno on failure
  **/
-int mucse_poll_and_read_mbx(struct mucse_hw *hw, u32 *msg, u16 size)
+int mucse_poll_and_read_mbx(struct mucse_hw *hw, __le32 *msg, u16 size)
 {
 	int err;
 
@@ -290,9 +295,9 @@ static void mucse_mbx_inc_pf_req(struct mucse_hw *hw)
  * Return: 0 if it successfully copied message into the buffer,
  * negative errno on failure
  **/
-static int mucse_write_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
+static int mucse_write_mbx_pf(struct mucse_hw *hw, const __le32 *msg, u16 size)
 {
-	const int size_in_words = size / sizeof(u32);
+	const int size_in_words = size / sizeof(__le32);
 	struct mucse_mbx_info *mbx = &hw->mbx;
 	int err;
 
@@ -300,8 +305,12 @@ static int mucse_write_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
 	if (err)
 		return err;
 
+	/* memcpy_toio() would decompose into arbitrary-width accesses;
+	 * the mailbox requires 32-bit MMIO writes via writel().
+	 */
 	for (int i = 0; i < size_in_words; i++)
-		mbx_data_wr32(mbx, MUCSE_MBX_FWPF_SHM + i * 4, msg[i]);
+		mbx_data_wr32(mbx, MUCSE_MBX_FWPF_SHM + i * 4,
+			      le32_to_cpu(msg[i]));
 
 	/* flush acks as we are overwriting the message buffer */
 	hw->mbx.fw_ack = mucse_mbx_get_fwack(mbx);
@@ -360,7 +369,8 @@ static int mucse_poll_for_ack(struct mucse_hw *hw)
  * Return: 0 if it successfully copied message into the buffer and
  * received an ack to that message within delay * timeout_cnt period
  **/
-int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw, u32 *msg, u16 size)
+int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw, const __le32 *msg,
+				 u16 size)
 {
 	int err;
 
diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h
index e6fcc8d1d3ca..75b88b18b04d 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h
@@ -14,7 +14,8 @@
 #define MUCSE_MBX_REQ             BIT(0) /* Request a req to mailbox */
 #define MUCSE_MBX_PFU             BIT(3) /* PF owns the mailbox buffer */
 
-int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw, u32 *msg, u16 size);
+int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw,
+				 const __le32 *msg, u16 size);
 void mucse_init_mbx_params_pf(struct mucse_hw *hw);
-int mucse_poll_and_read_mbx(struct mucse_hw *hw, u32 *msg, u16 size);
+int mucse_poll_and_read_mbx(struct mucse_hw *hw, __le32 *msg, u16 size);
 #endif /* _RNPGBE_MBX_H */
diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c
index 8c8bd5e8e1db..5ba74997beac 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c
@@ -20,32 +20,32 @@
  * Return: 0 on success, negative errno on failure
  **/
 static int mucse_fw_send_cmd_wait_resp(struct mucse_hw *hw,
-				       struct mbx_fw_cmd_req *req,
-				       struct mbx_fw_cmd_reply *reply)
+				       union mbx_fw_cmd_req_u *req,
+				       union mbx_fw_cmd_reply_u *reply)
 {
-	int len = le16_to_cpu(req->datalen);
+	int len = le16_to_cpu(req->r.datalen);
 	int retry_cnt = 3;
 	int err;
 
 	mutex_lock(&hw->mbx.lock);
-	err = mucse_write_and_wait_ack_mbx(hw, (u32 *)req, len);
+	err = mucse_write_and_wait_ack_mbx(hw, req->dwords, len);
 	if (err)
 		goto out;
 	do {
-		err = mucse_poll_and_read_mbx(hw, (u32 *)reply,
-					      sizeof(*reply));
+		err = mucse_poll_and_read_mbx(hw, reply->dwords,
+					      sizeof(reply->r));
 		if (err)
 			goto out;
 		/* mucse_write_and_wait_ack_mbx return 0 means fw has
 		 * received request, wait for the expect opcode
 		 * reply with 'retry_cnt' times.
 		 */
-	} while (--retry_cnt >= 0 && reply->opcode != req->opcode);
+	} while (--retry_cnt >= 0 && reply->r.opcode != req->r.opcode);
 out:
 	mutex_unlock(&hw->mbx.lock);
 	if (!err && retry_cnt < 0)
 		return -ETIMEDOUT;
-	if (!err && reply->error_code)
+	if (!err && reply->r.error_code)
 		return -EIO;
 
 	return err;
@@ -61,17 +61,19 @@ static int mucse_fw_send_cmd_wait_resp(struct mucse_hw *hw,
  **/
 static int mucse_mbx_get_info(struct mucse_hw *hw)
 {
-	struct mbx_fw_cmd_req req = {
-		.datalen = cpu_to_le16(MUCSE_MBX_REQ_HDR_LEN),
-		.opcode  = cpu_to_le16(GET_HW_INFO),
+	union mbx_fw_cmd_req_u req = {
+		.r = {
+			.datalen = cpu_to_le16(MUCSE_MBX_REQ_HDR_LEN),
+			.opcode  = cpu_to_le16(GET_HW_INFO),
+		},
 	};
-	struct mbx_fw_cmd_reply reply = {};
+	union mbx_fw_cmd_reply_u reply = {};
 	int err;
 
 	err = mucse_fw_send_cmd_wait_resp(hw, &req, &reply);
 	if (!err)
 		hw->pfvfnum = FIELD_GET(GENMASK_U16(7, 0),
-					le16_to_cpu(reply.hw_info.pfnum));
+					le16_to_cpu(reply.r.hw_info.pfnum));
 
 	return err;
 }
@@ -111,21 +113,23 @@ int mucse_mbx_sync_fw(struct mucse_hw *hw)
  **/
 int mucse_mbx_powerup(struct mucse_hw *hw, bool is_powerup)
 {
-	struct mbx_fw_cmd_req req = {
-		.datalen = cpu_to_le16(sizeof(req.powerup) +
-				       MUCSE_MBX_REQ_HDR_LEN),
-		.opcode  = cpu_to_le16(POWER_UP),
-		.powerup = {
-			/* fw needs this to reply correct cmd */
-			.version = cpu_to_le32(GENMASK_U32(31, 0)),
-			.status  = cpu_to_le32(is_powerup ? 1 : 0),
+	union mbx_fw_cmd_req_u req = {
+		.r = {
+			.datalen = cpu_to_le16(sizeof(req.r.powerup) +
+					       MUCSE_MBX_REQ_HDR_LEN),
+			.opcode  = cpu_to_le16(POWER_UP),
+			.powerup = {
+				/* fw needs this to reply correct cmd */
+				.version = cpu_to_le32(GENMASK_U32(31, 0)),
+				.status  = cpu_to_le32(is_powerup ? 1 : 0),
+			},
 		},
 	};
 	int len, err;
 
-	len = le16_to_cpu(req.datalen);
+	len = le16_to_cpu(req.r.datalen);
 	mutex_lock(&hw->mbx.lock);
-	err = mucse_write_and_wait_ack_mbx(hw, (u32 *)&req, len);
+	err = mucse_write_and_wait_ack_mbx(hw, req.dwords, len);
 	mutex_unlock(&hw->mbx.lock);
 
 	return err;
@@ -142,11 +146,13 @@ int mucse_mbx_powerup(struct mucse_hw *hw, bool is_powerup)
  **/
 int mucse_mbx_reset_hw(struct mucse_hw *hw)
 {
-	struct mbx_fw_cmd_req req = {
-		.datalen = cpu_to_le16(MUCSE_MBX_REQ_HDR_LEN),
-		.opcode  = cpu_to_le16(RESET_HW),
+	union mbx_fw_cmd_req_u req = {
+		.r = {
+			.datalen = cpu_to_le16(MUCSE_MBX_REQ_HDR_LEN),
+			.opcode  = cpu_to_le16(RESET_HW),
+		},
 	};
-	struct mbx_fw_cmd_reply reply = {};
+	union mbx_fw_cmd_reply_u reply = {};
 
 	return mucse_fw_send_cmd_wait_resp(hw, &req, &reply);
 }
@@ -166,24 +172,26 @@ int mucse_mbx_get_macaddr(struct mucse_hw *hw, int pfvfnum,
 			  u8 *mac_addr,
 			  int port)
 {
-	struct mbx_fw_cmd_req req = {
-		.datalen      = cpu_to_le16(sizeof(req.get_mac_addr) +
-					    MUCSE_MBX_REQ_HDR_LEN),
-		.opcode       = cpu_to_le16(GET_MAC_ADDRESS),
-		.get_mac_addr = {
-			.port_mask = cpu_to_le32(BIT(port)),
-			.pfvf_num  = cpu_to_le32(pfvfnum),
+	union mbx_fw_cmd_req_u req = {
+		.r = {
+			.datalen      = cpu_to_le16(sizeof(req.r.get_mac_addr) +
+						    MUCSE_MBX_REQ_HDR_LEN),
+			.opcode       = cpu_to_le16(GET_MAC_ADDRESS),
+			.get_mac_addr = {
+				.port_mask = cpu_to_le32(BIT(port)),
+				.pfvf_num  = cpu_to_le32(pfvfnum),
+			},
 		},
 	};
-	struct mbx_fw_cmd_reply reply = {};
+	union mbx_fw_cmd_reply_u reply = {};
 	int err;
 
 	err = mucse_fw_send_cmd_wait_resp(hw, &req, &reply);
 	if (err)
 		return err;
 
-	if (le32_to_cpu(reply.mac_addr.ports) & BIT(port))
-		memcpy(mac_addr, reply.mac_addr.addrs[port].mac, ETH_ALEN);
+	if (le32_to_cpu(reply.r.mac_addr.ports) & BIT(port))
+		memcpy(mac_addr, reply.r.mac_addr.addrs[port].mac, ETH_ALEN);
 	else
 		return -ENODATA;
 
diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h
index fb24fc12b613..fe996aeffc4d 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h
@@ -80,6 +80,20 @@ struct mbx_fw_cmd_reply {
 	};
 } __packed;
 
+/* Union wrappers to expose struct as __le32 dword array for mailbox
+ * transport, eliminating the need for pointer casts.  The __packed
+ * structs have no padding, so dwords[] overlays the fields exactly.
+ */
+union mbx_fw_cmd_req_u {
+	struct mbx_fw_cmd_req r;
+	__le32 dwords[sizeof(struct mbx_fw_cmd_req) / sizeof(__le32)];
+};
+
+union mbx_fw_cmd_reply_u {
+	struct mbx_fw_cmd_reply r;
+	__le32 dwords[sizeof(struct mbx_fw_cmd_reply) / sizeof(__le32)];
+};
+
 int mucse_mbx_sync_fw(struct mucse_hw *hw);
 int mucse_mbx_powerup(struct mucse_hw *hw, bool is_powerup);
 int mucse_mbx_reset_hw(struct mucse_hw *hw);
-- 
2.25.1


^ permalink raw reply related
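
The union wrappers in the rnpgbe patch above overlay a firmware struct with
a dword array so callers can pass the array without a cast. A small
standalone sketch of the same layout trick follows. The struct ex_req layout
and all ex_ names are hypothetical, plain uint16_t/uint32_t stand in for
__le16/__le32, and the static assertions are an addition of this sketch
(the patch relies on the struct sizes being multiples of four).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout; every field holds little-endian wire bytes */
struct ex_req {
	uint16_t datalen;
	uint16_t opcode;
	uint32_t status;
};

/* Overlay the struct with a dword array for a word-oriented transport */
union ex_req_u {
	struct ex_req r;
	uint32_t dwords[sizeof(struct ex_req) / sizeof(uint32_t)];
};

/* If the struct size were not a multiple of 4, dwords[] would silently
 * drop the trailing bytes, so check it at compile time.
 */
static_assert(sizeof(struct ex_req) % sizeof(uint32_t) == 0,
	      "request size must be a whole number of dwords");
static_assert(sizeof(union ex_req_u) == sizeof(struct ex_req),
	      "dwords[] must overlay the struct exactly");

/* Produce a value whose in-memory bytes are little-endian on any host */
static uint16_t ex_to_le16(uint16_t v)
{
	const uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	uint16_t r;

	memcpy(&r, b, sizeof(r));
	return r;
}

static uint32_t ex_to_le32(uint32_t v)
{
	const uint8_t b[4] = {
		(uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff), (uint8_t)((v >> 24) & 0xff),
	};
	uint32_t r;

	memcpy(&r, b, sizeof(r));
	return r;
}

/* Fill named fields; the caller then passes req.dwords to the transport */
static union ex_req_u ex_make_req(uint16_t opcode, uint32_t status)
{
	union ex_req_u req = {
		.r = {
			.datalen = ex_to_le16((uint16_t)sizeof(struct ex_req)),
			.opcode  = ex_to_le16(opcode),
			.status  = ex_to_le32(status),
		},
	};
	return req;
}
```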

* [PATCH net-next v2] ipv4: hold a consistent view of rt->dst.dev under RCU
From: xuanqiang.luo @ 2026-07-01  3:24 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Ido Schimmel
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, linux-kernel,
	Xuanqiang Luo

From: Xuanqiang Luo <luoxuanqiang@kylinos.cn>

rt_flush_dev() walks the per-CPU uncached route list and rewrites
rt->dst.dev in-place to blackhole_netdev under spin_lock_bh().
This lock does not exclude RCU readers, which may load rt->dst.dev
multiple times within a single rcu_read_lock() region.

ip_rt_send_redirect() is a typical example: it reads rt->dst.dev
three times to obtain in_dev, the L3 master ifindex, and net.
A concurrent device unregistration can repoint rt->dst.dev to
blackhole_netdev between those reads, making the reader combine
state from two different net_devices — for instance, an in_dev
from the real device but a netns and peer lookup from the blackhole
device.  ip_rt_get_source() has the same problem: it reads
rt->dst.dev four times to obtain the output ifindex, the netns,
and the source address, so a concurrent flush can cause the source
selection to mix state from different devices.

Take a single dst_dev_rcu() snapshot of rt->dst.dev at the start
of each affected RCU reader and use that snapshot throughout, so
concurrent flushes cannot cause mid-function inconsistency.
Publish the in-place write in rt_flush_dev() with rcu_assign_pointer()
to match the readers.

Fixes: caacf05e5ad1a ("ipv4: Properly purge netdev references on uncached routes.")
Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
---
v2:
- Use dst_dev_rcu() and dev_net_rcu() for the RCU readers.
- Use rcu_assign_pointer() when publishing the uncached route device
  replacement.
- Slightly adjust the commit message wording because this issue was found
  by inspection, not from an observed user-visible failure.

v1: https://lore.kernel.org/all/20260630094250.29386-1-xuanqiang.luo@linux.dev/

 net/ipv4/route.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3f3de5164d6e5..57f38467e6d0c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -873,6 +873,7 @@ static void ipv4_negative_advice(struct sock *sk,
 void ip_rt_send_redirect(struct sk_buff *skb)
 {
 	struct rtable *rt = skb_rtable(skb);
+	struct net_device *dev;
 	struct in_device *in_dev;
 	struct inet_peer *peer;
 	struct net *net;
@@ -880,15 +881,16 @@ void ip_rt_send_redirect(struct sk_buff *skb)
 	int vif;
 
 	rcu_read_lock();
-	in_dev = __in_dev_get_rcu(rt->dst.dev);
+	dev = dst_dev_rcu(&rt->dst);
+	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev || !IN_DEV_TX_REDIRECTS(in_dev)) {
 		rcu_read_unlock();
 		return;
 	}
 	log_martians = IN_DEV_LOG_MARTIANS(in_dev);
-	vif = l3mdev_master_ifindex_rcu(rt->dst.dev);
+	vif = l3mdev_master_ifindex_rcu(dev);
 
-	net = dev_net(rt->dst.dev);
+	net = dev_net_rcu(dev);
 	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, vif);
 	if (!peer) {
 		rcu_read_unlock();
@@ -1287,29 +1289,32 @@ void ip_rt_get_source(u8 *addr, struct sk_buff *skb, struct rtable *rt)
 {
 	__be32 src;
 
-	if (rt_is_output_route(rt))
+	rcu_read_lock();
+	if (rt_is_output_route(rt)) {
 		src = ip_hdr(skb)->saddr;
-	else {
+	} else {
 		struct fib_result res;
 		struct iphdr *iph = ip_hdr(skb);
+		struct net_device *dev = dst_dev_rcu(&rt->dst);
+		struct net *net = dev_net_rcu(dev);
 		struct flowi4 fl4 = {
 			.daddr = iph->daddr,
 			.saddr = iph->saddr,
 			.flowi4_dscp = ip4h_dscp(iph),
-			.flowi4_oif = rt->dst.dev->ifindex,
+			.flowi4_oif = dev->ifindex,
 			.flowi4_iif = skb->dev->ifindex,
 			.flowi4_mark = skb->mark,
 		};
 
-		rcu_read_lock();
-		if (fib_lookup(dev_net(rt->dst.dev), &fl4, &res, 0) == 0)
-			src = fib_result_prefsrc(dev_net(rt->dst.dev), &res);
+		if (fib_lookup(net, &fl4, &res, 0) == 0)
+			src = fib_result_prefsrc(net, &res);
 		else
-			src = inet_select_addr(rt->dst.dev,
+			src = inet_select_addr(dev,
 					       rt_nexthop(rt, iph->daddr),
 					       RT_SCOPE_UNIVERSE);
-		rcu_read_unlock();
 	}
+	rcu_read_unlock();
+
 	memcpy(addr, &src, 4);
 }
 
@@ -1565,7 +1570,7 @@ void rt_flush_dev(struct net_device *dev)
 		list_for_each_entry_safe(rt, safe, &ul->head, dst.rt_uncached) {
 			if (rt->dst.dev != dev)
 				continue;
-			rt->dst.dev = blackhole_netdev;
+			rcu_assign_pointer(rt->dst.dev_rcu, blackhole_netdev);
 			netdev_ref_replace(dev, blackhole_netdev,
 					   &rt->dst.dev_tracker, GFP_ATOMIC);
 			list_del_init(&rt->dst.rt_uncached);
-- 
2.43.0

^ permalink raw reply related
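
The "take one snapshot, then use it throughout" idea from the route.c patch above can be sketched in user space. This is only an illustration under stated assumptions: the toy_* types and functions are hypothetical, C11 atomics stand in for rcu_assign_pointer() and dst_dev_rcu(), and acquire ordering is a stronger substitute for the dependency ordering RCU readers rely on. Object lifetime (the grace period) is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins for struct net_device and struct rtable (hypothetical). */
struct toy_dev {
	int ifindex;
	int netns_id;
};

struct toy_route {
	_Atomic(struct toy_dev *) dev;	/* plays the role of rt->dst.dev */
};

struct toy_view {
	int ifindex;
	int netns_id;
};

/* Reader: load the pointer once, then derive every field from that copy. */
static struct toy_view toy_route_read(struct toy_route *rt)
{
	struct toy_dev *dev = atomic_load_explicit(&rt->dev,
						   memory_order_acquire);
	struct toy_view v;

	assert(dev != NULL);		/* the sketch never stores NULL */
	v.ifindex = dev->ifindex;	/* both fields come from the same */
	v.netns_id = dev->netns_id;	/* device, whichever one it is    */
	return v;
}

/* Writer: publish the replacement device with release ordering. */
static void toy_route_flush(struct toy_route *rt, struct toy_dev *blackhole)
{
	atomic_store_explicit(&rt->dev, blackhole, memory_order_release);
}
```

Because the reader dereferences only its local copy, a flush that lands between the two field reads cannot make it combine fields from two devices; re-reading rt->dev for each field would allow exactly that.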

* [PATCH v2] xfrm: iptfs: propagate SKBFL_SHARED_FRAG in iptfs_skb_add_frags()
From: Chen YanJun @ 2026-07-01  3:31 UTC (permalink / raw)
  To: steffen.klassert, herbert, davem; +Cc: netdev, moomichen

From: Chen YanJun <moomichen@tencent.com>

When iptfs_skb_add_frags() copies frag references from the source
frag walk into a new SKB, it increments the page reference count via
__skb_frag_ref() but does not propagate SKBFL_SHARED_FRAG to the
destination SKB's skb_shinfo->flags.

If the source SKB carries shared frags (e.g. from a page-pool backed
receive path), the new inner SKB will appear to ESP as having privately
owned frags.  A subsequent esp_input() call for a nested transport-mode
SA then takes the no-COW fast path and decrypts in place, writing over
pages that are still referenced by the outer IPTFS SKB.  This causes
kernel-visible memory corruption and can trigger a panic.

All other frag-transfer helpers in the kernel (skb_try_coalesce,
skb_gro_receive, __pskb_copy_fclone, skb_shift, skb_segment) correctly
propagate SKBFL_SHARED_FRAG; align iptfs_skb_add_frags() with this
convention by setting the flag inside the loop immediately after
__skb_frag_ref() and nr_frags++, so every exit path that attaches a frag
unconditionally propagates SKBFL_SHARED_FRAG.

Fixes: 5f2b6a909574 ("xfrm: iptfs: add skb-fragment sharing code")
Signed-off-by: Chen YanJun <moomichen@tencent.com>

---
v2: move SKBFL_SHARED_FRAG assignment inside the loop, immediately after
    __skb_frag_ref()/nr_frags++, to also cover the early-return path
    that fires when the requested length is satisfied mid-frag (pointed
    out by maintainer review of v1).
---
 net/xfrm/xfrm_iptfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
index ad810d1f97c0..597aedeac26e 100644
--- a/net/xfrm/xfrm_iptfs.c
+++ b/net/xfrm/xfrm_iptfs.c
@@ -480,6 +480,7 @@ static int iptfs_skb_add_frags(struct sk_buff *skb,
 		}
 		__skb_frag_ref(tofrag);
 		shinfo->nr_frags++;
+		shinfo->flags |= SKBFL_SHARED_FRAG;

 		/* see if we are done */
 		fraglen = tofrag->len;
-- 
2.47.0

^ permalink raw reply related
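
Why the flag has to be set inside the loop in the iptfs patch above can be shown with a small user-space model. This is a sketch, not the kernel code: every toy_* name is hypothetical, TOY_SHARED_FRAG merely stands in for SKBFL_SHARED_FRAG, and a plain integer replaces the page reference count.

```c
#include <stddef.h>

#define TOY_MAX_FRAGS   4
#define TOY_SHARED_FRAG 0x1u	/* stands in for SKBFL_SHARED_FRAG */

struct toy_frag {
	int *page_refcnt;	/* reference count of the backing page */
	size_t len;
};

struct toy_shinfo {
	unsigned int flags;
	unsigned int nr_frags;
	struct toy_frag frags[TOY_MAX_FRAGS];
};

/*
 * Attach references covering up to `len` bytes of the `src` frags to `dst`.
 * Returns the number of bytes still wanted afterwards.
 */
static size_t toy_add_frags(struct toy_shinfo *dst, const struct toy_frag *src,
			    size_t nsrc, size_t len)
{
	for (size_t i = 0; i < nsrc && dst->nr_frags < TOY_MAX_FRAGS; i++) {
		struct toy_frag *to = &dst->frags[dst->nr_frags];

		*to = src[i];
		if (to->len > len)
			to->len = len;	/* request ends inside this frag */
		(*to->page_refcnt)++;	/* the page gains another owner */
		dst->nr_frags++;
		dst->flags |= TOY_SHARED_FRAG;	/* set before any exit path */

		len -= to->len;
		if (len == 0)
			return 0;	/* early return: flag already set */
	}
	return len;
}
```

If the flag were set only after the loop, the early return taken when the request is satisfied mid-frag would leave the destination claiming private ownership of a page it shares.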

* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: Askar Safin @ 2026-07-01  4:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Hildenbrand (Arm), akpm, avagin, axboe, collin.funk1,
	david.laight.linux, dhowells, fuse-devel, hch, jack, joannelkoong,
	kernel, linux-api, linux-fsdevel, linux-kernel, linux-mm, luto,
	metze, miklos, netdev, patches, pfalcato, torvalds, val, viro, w,
	willy
In-Reply-To: <20260629-bauland-knabbern-abgeladen-c0acbfa62cc2@brauner>

On Mon, Jun 29, 2026 at 11:56 AM Christian Brauner <brauner@kernel.org> wrote:
> The amount of regression reports that we got in short succession doesn't
> make it likely that we can merge a plain degradation.

Let me repeat: this v2 patchset fixes all regressions found so far,
except for the major CRIU performance regression.

-- 
Askar Safin

^ permalink raw reply

* [PATCH v3] net/liquidio: drop cached VF pci_dev LUT
From: Yuho Choi @ 2026-07-01  4:08 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev
  Cc: Kory Maincent, Zilin Guan, Arend van Spriel, Marco Crivellari,
	Uwe Kleine-König (The Capable Hub), Vadim Fedorenko,
	linux-kernel, Yuho Choi

The PF SR-IOV enable path caches VF pci_dev pointers in
dpiring_to_vfpcidev_lut[] by iterating with pci_get_device(). Those
entries do not own a reference, because the iterator drops the previous
device reference on each step. The cached pointer is then dereferenced
later when handling OCTEON_VF_FLR_REQUEST.

Replace the cached VF mapping with runtime lookup on the mailbox DPI
ring: derive the VF index from q_no, resolve the VF via exported PCI
IOV helpers, validate it with the PF pointer and VF ID, then issue
pcie_flr() and drop the reference with pci_dev_put(). Remove the
unused VF lookup table initialization and cleanup.

Fixes: ca6139ffc67ee ("liquidio CN23XX: sysfs VF config support")
Fixes: 8c978d059224 ("liquidio CN23XX: Mailbox support")
Signed-off-by: Yuho Choi <dbgh9129@gmail.com>
---
Changes in v3:
- Drop unrelated Xgene PCI probe note from the commit message.
Changes in v2:
- Replace use of pci_iov_virtfn_bus() with runtime VF lookup using exported helpers.
 .../net/ethernet/cavium/liquidio/lio_main.c   | 27 ---------------
 .../ethernet/cavium/liquidio/octeon_device.h  |  3 --
 .../ethernet/cavium/liquidio/octeon_mailbox.c | 33 ++++++++++++++++++-
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 0db08ac3d098..e303956b4bf1 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -3779,9 +3779,7 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 static int octeon_enable_sriov(struct octeon_device *oct)
 {
 	unsigned int num_vfs_alloced = oct->sriov_info.num_vfs_alloced;
-	struct pci_dev *vfdev;
 	int err;
-	u32 u;
 
 	if (OCTEON_CN23XX_PF(oct) && num_vfs_alloced) {
 		err = pci_enable_sriov(oct->pci_dev,
@@ -3794,23 +3792,6 @@ static int octeon_enable_sriov(struct octeon_device *oct)
 			return err;
 		}
 		oct->sriov_info.sriov_enabled = 1;
-
-		/* init lookup table that maps DPI ring number to VF pci_dev
-		 * struct pointer
-		 */
-		u = 0;
-		vfdev = pci_get_device(PCI_VENDOR_ID_CAVIUM,
-				       OCTEON_CN23XX_VF_VID, NULL);
-		while (vfdev) {
-			if (vfdev->is_virtfn &&
-			    (vfdev->physfn == oct->pci_dev)) {
-				oct->sriov_info.dpiring_to_vfpcidev_lut[u] =
-					vfdev;
-				u += oct->sriov_info.rings_per_vf;
-			}
-			vfdev = pci_get_device(PCI_VENDOR_ID_CAVIUM,
-					       OCTEON_CN23XX_VF_VID, vfdev);
-		}
 	}
 
 	return num_vfs_alloced;
@@ -3818,8 +3799,6 @@ static int octeon_enable_sriov(struct octeon_device *oct)
 
 static int lio_pci_sriov_disable(struct octeon_device *oct)
 {
-	int u;
-
 	if (pci_vfs_assigned(oct->pci_dev)) {
 		dev_err(&oct->pci_dev->dev, "VFs are still assigned to VMs.\n");
 		return -EPERM;
@@ -3827,12 +3806,6 @@ static int lio_pci_sriov_disable(struct octeon_device *oct)
 
 	pci_disable_sriov(oct->pci_dev);
 
-	u = 0;
-	while (u < MAX_POSSIBLE_VFS) {
-		oct->sriov_info.dpiring_to_vfpcidev_lut[u] = NULL;
-		u += oct->sriov_info.rings_per_vf;
-	}
-
 	oct->sriov_info.num_vfs_alloced = 0;
 	dev_info(&oct->pci_dev->dev, "oct->pf_num:%d disabled VFs\n",
 		 oct->pf_num);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index 19344b21f8fb..858a0fff2cc0 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -390,9 +390,6 @@ struct octeon_sriov_info {
 
 	struct lio_trusted_vf	trusted_vf;
 
-	/*lookup table that maps DPI ring number to VF pci_dev struct pointer*/
-	struct pci_dev *dpiring_to_vfpcidev_lut[MAX_POSSIBLE_VFS];
-
 	u64	vf_macaddr[MAX_POSSIBLE_VFS];
 
 	u16	vf_vlantci[MAX_POSSIBLE_VFS];
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
index ad685f5d0a13..697fcdc41e3c 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
@@ -26,6 +26,31 @@
 #include "octeon_mailbox.h"
 #include "cn23xx_pf_device.h"
 
+static struct pci_dev *lio_vf_pci_dev_by_qno(struct octeon_device *oct, u32 q_no)
+{
+	struct pci_dev *vfdev = NULL;
+	int vfidx;
+
+	if (!oct->sriov_info.rings_per_vf)
+		return NULL;
+
+	if (q_no % oct->sriov_info.rings_per_vf)
+		return NULL;
+
+	vfidx = q_no / oct->sriov_info.rings_per_vf;
+	if (vfidx >= oct->sriov_info.num_vfs_alloced)
+		return NULL;
+
+	while ((vfdev = pci_get_device(PCI_VENDOR_ID_CAVIUM,
+				       OCTEON_CN23XX_VF_VID, vfdev))) {
+		if (pci_physfn(vfdev) && pci_physfn(vfdev) == oct->pci_dev &&
+		    pci_iov_vf_id(vfdev) == vfidx)
+			return vfdev;
+	}
+
+	return NULL;
+}
+
 /**
  * octeon_mbox_read:
  * @mbox: Pointer mailbox
@@ -237,6 +262,7 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
 				   struct octeon_mbox_cmd *mbox_cmd)
 {
 	struct octeon_device *oct = mbox->oct_dev;
+	struct pci_dev *vfdev;
 
 	switch (mbox_cmd->msg.s.cmd) {
 	case OCTEON_VF_ACTIVE:
@@ -260,7 +286,12 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
 		dev_info(&oct->pci_dev->dev,
 			 "got a request for FLR from VF that owns DPI ring %u\n",
 			 mbox->q_no);
-		pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no]);
+		vfdev = lio_vf_pci_dev_by_qno(oct, mbox->q_no);
+		if (!vfdev)
+			break;
+
+		pcie_flr(vfdev);
+		pci_dev_put(vfdev);
 		break;
 
 	case OCTEON_PF_CHANGED_VF_MACADDR:
-- 
2.43.0


^ permalink raw reply related
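
The ring-number validation that the liquidio patch above performs before looking up the VF can be isolated as plain arithmetic. The helper below is a hypothetical user-space sketch (toy_vf_index_from_qno() is not a driver function); it assumes, as the removed lookup table did, that a VF is identified by the first ring of its block of rings_per_vf rings.

```c
#include <stdint.h>

/*
 * Map a DPI ring number to a VF index, or return -1 when the ring is not
 * the first ring of an allocated VF.
 */
static int toy_vf_index_from_qno(uint32_t q_no, uint32_t rings_per_vf,
				 uint32_t num_vfs)
{
	uint32_t vfidx;

	if (rings_per_vf == 0)
		return -1;	/* not configured; also avoids dividing by 0 */
	if (q_no % rings_per_vf)
		return -1;	/* not the first ring of a VF */

	vfidx = q_no / rings_per_vf;
	if (vfidx >= num_vfs)
		return -1;	/* beyond the allocated VFs */

	return (int)vfidx;
}
```

With 8 rings per VF and 4 VFs, ring 16 maps to VF 2, while ring 12 (mid-block) and ring 32 (past the last VF) are both rejected.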

* RE: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
From: Selvamani Rajagopal @ 2026-07-01  4:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Selvamani Rajagopal via B4 Relay, Parthiban Veerasooran,
	Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Lunn
In-Reply-To: <20260630154647.468076db@kernel.org>

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, June 30, 2026 3:47 PM
> Subject: Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two
> different kernel instances
> 
> > I believe xmit path and IRQ thread would run in different kernel
> > instances. Imagine oa_tc6_try_spi_transfer call fails in threaded
> > IRQ. It would set disable_irq. If xmit function didn't see that when
> > it checked, but it is set before placing skb buffer in the
> > waiting_tx_skb pointer (due to skb_linearize for example), the skb
> > would be stuck in waiting_tx_skb.
> 
> Perhaps, but wouldn't that cause a stall not a leak?

I should have mentioned the disable_traffic flag instead of disable_irq,
though the IRQ would also be disabled when this flag is set.

Once it is set, there won't be any more traffic unless the driver is
unloaded and reloaded. In this state, the tx queue would be blocked. The skb
pointer would be held in waiting_tx_skb forever (the skb is lost as far as
the kernel is concerned).

> Please do your digging and submit high quality patches which don't

While I appreciate you spending time to review this, I do believe the changes
in the patch are the right approach. Not sure what I am missing here.

^ permalink raw reply

* Re: [PATCH bpf-next v5 0/3] bpf, sockmap: reject a packet-modifying SK_SKB stream parser
From: Ihor Solodrai @ 2026-07-01  4:48 UTC (permalink / raw)
  To: Sechang Lim, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, Jakub Sitnicki, Eduard Zingerman
  Cc: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Kumar Kartikeya Dwivedi, Simon Horman,
	Shuah Khan, Jiayuan Chen, Bobby Eshleman, netdev, bpf,
	linux-kselftest, linux-kernel
In-Reply-To: <20260620024423.4141004-1-rhkrqnwk98@gmail.com>

On 2026-06-19 7:44 p.m., Sechang Lim wrote:
> A BPF_PROG_TYPE_SK_SKB stream parser runs on strparser's message head,
> which can chain skbs through frag_list. A parser that resizes the skb
> frees the frag_list segments that strparser still tracks through
> skb_nextp, leading to a use-after-free.
> 
> A stream parser is only meant to measure the next message, not to modify
> the packet, so reject a packet-modifying parser at attach time.
> 
> v5:
>   - target bpf-next instead of bpf
>   - add Reviewed-by tag (Jiayuan Chen)
> 
> v4:
>   - https://lore.kernel.org/all/20260619062959.3277612-1-rhkrqnwk98@gmail.com/
> 
> v3:
>   - https://lore.kernel.org/all/20260618102718.2331468-1-rhkrqnwk98@gmail.com/
> 
> v2:
>   - https://lore.kernel.org/all/20260612123553.2724240-1-rhkrqnwk98@gmail.com/
> 
> v1:
>   - https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@gmail.com/
> 
> Sechang Lim (3):
>    selftests/bpf: don't modify the skb in the strparser parser prog
>    bpf, sockmap: reject a packet-modifying SK_SKB stream parser
>    selftests/bpf: test rejection of a packet-modifying SK_SKB stream
>      parser


Hi Sechang, all,

This series broke test_maps (test_sockmap subtest) on the bpf
tree. Currently on BPF CI the test fails on bpf, but passes on
bpf-next (it doesn't have the series yet).

test_maps fails with:

     + taskset 0xF ./test_maps
     [    8.352378] clocksource: Watchdog remote CPU 2 read timed out
     Failed sockmap unexpected timeout

See test_maps.c:995 in test_sockmap(): the 30s select() times out and
test_maps exits 1. Note there is no "Failed stream parser bpf prog
attach" message; the parser attaches fine.

The series was merged into bpf on 2026-06-26 00:42 UTC.

CI runs:
   last good (pre-merge, 06-25): 
https://github.com/kernel-patches/bpf/actions/runs/28158326456
   first bad (post-merge, 06-26): 
https://github.com/kernel-patches/bpf/actions/runs/28210181858
   recent bad (06-30): 
https://github.com/kernel-patches/bpf/actions/runs/28475936023

Confirmed locally: reverting the 3 commits and rebuilding makes
test_sockmap pass again.

Could you please help investigate?

Thanks!


> 
>   net/core/sock_map.c                           | 20 ++++++++++++
>   .../selftests/bpf/prog_tests/sockmap_strp.c   | 31 +++++++++++++++++++
>   .../selftests/bpf/progs/sockmap_parse_prog.c  | 22 -------------
>   .../selftests/bpf/progs/test_sockmap_strp.c   |  7 +++++
>   4 files changed, 58 insertions(+), 22 deletions(-)
> 


^ permalink raw reply

* Re: [PATCH bpf-next v5 0/3] bpf, sockmap: reject a packet-modifying SK_SKB stream parser
From: Jiayuan Chen @ 2026-07-01  5:03 UTC (permalink / raw)
  To: Ihor Solodrai, Sechang Lim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, Jakub Sitnicki, Eduard Zingerman
  Cc: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Kumar Kartikeya Dwivedi, Simon Horman,
	Shuah Khan, Bobby Eshleman, netdev, bpf, linux-kselftest,
	linux-kernel
In-Reply-To: <e3a91acd-2b4d-4e93-a3bb-a0e9ee5ede0f@linux.dev>


On 7/1/26 12:48 PM, Ihor Solodrai wrote:
> On 2026-06-19 7:44 p.m., Sechang Lim wrote:
>> A BPF_PROG_TYPE_SK_SKB stream parser runs on strparser's message head,
>> which can chain skbs through frag_list. A parser that resizes the skb
>> frees the frag_list segments that strparser still tracks through
>> skb_nextp, leading to a use-after-free.
>>
>> A stream parser is only meant to measure the next message, not to modify
>> the packet, so reject a packet-modifying parser at attach time.
>>
>> v5:
>>   - target bpf-next instead of bpf
>>   - add Reviewed-by tag (Jiayuan Chen)
>>
>> v4:
>>   - 
>> https://lore.kernel.org/all/20260619062959.3277612-1-rhkrqnwk98@gmail.com/
>>
>> v3:
>>   - 
>> https://lore.kernel.org/all/20260618102718.2331468-1-rhkrqnwk98@gmail.com/
>>
>> v2:
>>   - 
>> https://lore.kernel.org/all/20260612123553.2724240-1-rhkrqnwk98@gmail.com/
>>
>> v1:
>>   - 
>> https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@gmail.com/
>>
>> Sechang Lim (3):
>>    selftests/bpf: don't modify the skb in the strparser parser prog
>>    bpf, sockmap: reject a packet-modifying SK_SKB stream parser
>>    selftests/bpf: test rejection of a packet-modifying SK_SKB stream
>>      parser
>
>
> Hi Sechang, all,
>
> This series broke test_maps (test_sockmap subtest) on the bpf
> tree. Currently on BPF CI the test fails on bpf, but passes on
> bpf-next (it doesn't have the series yet).
>
> test_maps fails with:
>
>     + taskset 0xF ./test_maps
>     [    8.352378] clocksource: Watchdog remote CPU 2 read timed out
>     Failed sockmap unexpected timeout
>
> See test_maps.c:995 in test_sockmap(): the 30s select() times out and
> test_maps exits 1. Note there is no "Failed stream parser bpf prog
> attach" message, the parser attaches fine.
>
> The series was merged into bpf on 2026-06-26 00:42 UTC
>
> CI runs:
>   last good (pre-merge, 06-25): 
> https://github.com/kernel-patches/bpf/actions/runs/28158326456
>   first bad (post-merge, 06-26): 
> https://github.com/kernel-patches/bpf/actions/runs/28210181858
>   recent bad (06-30): 
> https://github.com/kernel-patches/bpf/actions/runs/28475936023
>
> Confirmed locally reverting the 3 commits and rebuilding makes
> test_sockmap pass again.
>
> Could you please help investigate?

I'll work on this.



^ permalink raw reply

* Re: [PATCH net 5/9] netfilter: nfnetlink_cthelper: cap to maximum number of expectation per master
From: Florian Westphal @ 2026-07-01  5:04 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo
In-Reply-To: <20260630045243.2657-6-fw@strlen.de>

Florian Westphal <fw@strlen.de> wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> 
> If userspace helper policy updates sets maximum number of expectation to
> zero, cap it to NF_CT_EXPECT_MAX_CNT (255) on updates too.
> 
> Fixes: 397c8300972f ("netfilter: nf_conntrack_helper: cap maximum number of expectation at helper registration")
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Pablo, can you please check

https://sashiko.dev/#/message/20260630045243.2657-6-fw%40strlen.de

?

AFAICS the comment is correct, but it should be handled in a
followup patch rather than a v2.

^ permalink raw reply

* Re: [PATCH net 3/9] netfilter: ipset: fix race between dump and ip_set_list resize
From: Florian Westphal @ 2026-07-01  5:09 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, kadlec, xmei5
In-Reply-To: <20260630045243.2657-4-fw@strlen.de>

Florian Westphal <fw@strlen.de> wrote:
> From: Xiang Mei <xmei5@asu.edu>

Xiang, Jozsef, could you please have a look at

https://sashiko.dev/#/patchset/20260630045243.2657-1-fw%40strlen.de

AFAICS it's correct but should be handled in a followup patch rather
than a v2.

Thanks!

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net-next] i40e: Avoid repeating RX filter warning
From: Rinitha, SX @ 2026-07-01  5:23 UTC (permalink / raw)
  To: Chris Packham, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Lobakin, Aleksander
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Blair Steven, Carl Smith
In-Reply-To: <20260514003733.1718771-1-chris.packham@alliedtelesis.co.nz>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Chris Packham
> Sent: 14 May 2026 06:08
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Lobakin, Aleksander <aleksander.lobakin@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Blair Steven <blair.steven@alliedtelesis.co.nz>; Carl Smith <carl.smith@alliedtelesis.co.nz>; Chris Packham <chris.packham@alliedtelesis.co.nz>
> Subject: [Intel-wired-lan] [PATCH net-next] i40e: Avoid repeating RX filter warning
>
> When the i40e runs out of space for RX filters the driver switches to promiscuous mode and warns that it has done so. In scenarios with a large number of these filters this can generate a lot of warnings. For example:
>
>   $ dmesg -c > /dev/null
>   $ ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 1
>   $ ip link set dev eth7 master br0
>   $ bridge vlan add vid 1 dev eth7 pvid untagged self
>   $ bridge vlan add vid 2-4094 dev eth7 tagged
>   $ dmesg
>   [   25.601705] i40e 0000:01:00.1: Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
>   [   25.601833] i40e 0000:01:00.1: Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
>   [   25.601961] i40e 0000:01:00.1: Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
>   [   25.602088] i40e 0000:01:00.1: Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
>   [   25.602216] i40e 0000:01:00.1: Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
>   [   25.602344] i40e 0000:01:00.1: Error LIBIE_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
>   ...
>
> Use test_and_set_bit() so that the warning is only issued when the driver enables promiscuous mode and not on the addition of subsequent RX filters.
>
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> ---
>
> Resend with net-next tag
>
> drivers/net/ethernet/intel/i40e/i40e_main.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>

Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)

^ permalink raw reply
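
The warn-once behaviour that the i40e patch above gets from test_and_set_bit() can be approximated in user space with C11 atomics. This is a hedged sketch: the toy_* names and the TOY_OVERFLOW_PROMISC bit are hypothetical, atomic_fetch_or() stands in for the kernel helper, and a counter replaces the actual log message.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_OVERFLOW_PROMISC (1u << 0)	/* hypothetical state bit */

static atomic_uint toy_state;	/* static storage, so it starts at zero */
static int toy_warnings;	/* how many times the warning fired */

/* Atomically set `bit` and report whether it was already set. */
static bool toy_test_and_set(atomic_uint *word, unsigned int bit)
{
	return atomic_fetch_or(word, bit) & bit;
}

/* Called each time a filter cannot be added because the table is full. */
static void toy_filter_overflow(void)
{
	if (!toy_test_and_set(&toy_state, TOY_OVERFLOW_PROMISC))
		toy_warnings++;	/* warn only on the 0 -> 1 transition */
}
```

Calling toy_filter_overflow() repeatedly leaves the bit set but bumps the counter only once, which is the difference between a plain set_bit() followed by an unconditional warning and the test-and-set form. The counter itself is not atomic, so the sketch is single-threaded as written.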
